Merge remote-tracking branch 'upstream/master' into nikvas0/index_mutate

Commit 7b7517ff85 by Nikita Vasilev, 2019-06-24 16:44:44 +03:00
1354 changed files with 142914 additions and 11917 deletions

.gitmodules

@ -82,3 +82,6 @@
[submodule "contrib/simdjson"]
path = contrib/simdjson
url = https://github.com/lemire/simdjson.git
[submodule "contrib/rapidjson"]
path = contrib/rapidjson
url = https://github.com/Tencent/rapidjson


@ -1,8 +1,122 @@
## ClickHouse release 19.7.3.9, 2019-05-30
### New Features
* Allow limiting the range of a setting that can be specified by the user.
These constraints can be set up in the user settings profile.
[#4931](https://github.com/yandex/ClickHouse/pull/4931) ([Vitaly
Baranov](https://github.com/vitlibar))
* Add a second version of the function `groupUniqArray` with an optional
`max_size` parameter that limits the size of the resulting array. This
behavior is similar to the `groupArray(max_size)(x)` function.
[#5026](https://github.com/yandex/ClickHouse/pull/5026) ([Guillaume
Tassery](https://github.com/YiuRULE))
* For TSVWithNames/CSVWithNames input file formats, the column order can now
be determined from the file header. This is controlled by the
`input_format_with_names_use_header` setting.
[#5081](https://github.com/yandex/ClickHouse/pull/5081)
([Alexander](https://github.com/Akazz))
### Bug Fixes
* Fixed crash with uncompressed_cache + JOIN during merge (#5197)
[#5133](https://github.com/yandex/ClickHouse/pull/5133) ([Danila
Kutenin](https://github.com/danlark1))
* Fixed segmentation fault on a clickhouse-client query to system tables. #5066
[#5127](https://github.com/yandex/ClickHouse/pull/5127)
([Ivan](https://github.com/abyss7))
* Fixed data loss on heavy load via KafkaEngine (#4736)
[#5080](https://github.com/yandex/ClickHouse/pull/5080)
([Ivan](https://github.com/abyss7))
* Fixed very rare data race condition that could happen when executing a query with UNION ALL involving at least two SELECTs from system.columns, system.tables, system.parts, system.parts_tables or tables of Merge family and performing ALTER of columns of the related tables concurrently. [#5189](https://github.com/yandex/ClickHouse/pull/5189) ([alexey-milovidov](https://github.com/alexey-milovidov))
### Performance Improvements
* Use radix sort for sorting by a single numeric column in `ORDER BY` without
`LIMIT`. [#5106](https://github.com/yandex/ClickHouse/pull/5106),
[#4439](https://github.com/yandex/ClickHouse/pull/4439)
([Evgenii Pravda](https://github.com/kvinty),
[alexey-milovidov](https://github.com/alexey-milovidov))
### Documentation
* Translate documentation for some table engines to Chinese.
[#5107](https://github.com/yandex/ClickHouse/pull/5107),
[#5094](https://github.com/yandex/ClickHouse/pull/5094),
[#5087](https://github.com/yandex/ClickHouse/pull/5087)
([张风啸](https://github.com/AlexZFX)),
[#5068](https://github.com/yandex/ClickHouse/pull/5068) ([never
lee](https://github.com/neverlee))
### Build/Testing/Packaging Improvements
* Print UTF-8 characters properly in `clickhouse-test`.
[#5084](https://github.com/yandex/ClickHouse/pull/5084)
([alexey-milovidov](https://github.com/alexey-milovidov))
* Add command line parameter for clickhouse-client to always load suggestion
data. [#5102](https://github.com/yandex/ClickHouse/pull/5102)
([alexey-milovidov](https://github.com/alexey-milovidov))
* Resolve some of PVS-Studio warnings.
[#5082](https://github.com/yandex/ClickHouse/pull/5082)
([alexey-milovidov](https://github.com/alexey-milovidov))
* Update LZ4 [#5040](https://github.com/yandex/ClickHouse/pull/5040) ([Danila
Kutenin](https://github.com/danlark1))
* Add gperf to build requirements for upcoming pull request #5030.
[#5110](https://github.com/yandex/ClickHouse/pull/5110)
([proller](https://github.com/proller))
## ClickHouse release 19.6.2.11, 2019-05-13
### New Features
* TTL expressions for columns and tables. [#4212](https://github.com/yandex/ClickHouse/pull/4212) ([Anton Popov](https://github.com/CurtizJ))
* Added support for `brotli` compression for HTTP responses (Accept-Encoding: br) [#4388](https://github.com/yandex/ClickHouse/pull/4388) ([Mikhail](https://github.com/fandyushin))
* Added new function `isValidUTF8` for checking whether a set of bytes is correctly utf-8 encoded. [#4934](https://github.com/yandex/ClickHouse/pull/4934) ([Danila Kutenin](https://github.com/danlark1))
* Add new load balancing policy `first_or_random` which sends queries to the first specified host and, if it is inaccessible, sends queries to random hosts of the shard. Useful for cross-replication topology setups. [#5012](https://github.com/yandex/ClickHouse/pull/5012) ([nvartolomei](https://github.com/nvartolomei))
### Experimental Features
* Add setting `index_granularity_bytes` (adaptive index granularity) for MergeTree* tables family. [#4826](https://github.com/yandex/ClickHouse/pull/4826) ([alesapin](https://github.com/alesapin))
### Improvements
* Added support for non-constant and negative size and length arguments for function `substringUTF8`. [#4989](https://github.com/yandex/ClickHouse/pull/4989) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Disable push-down to right table in left join, left table in right join, and both tables in full join. This fixes wrong JOIN results in some cases. [#4846](https://github.com/yandex/ClickHouse/pull/4846) ([Ivan](https://github.com/abyss7))
* `clickhouse-copier`: automatically upload task configuration from the `--task-file` option [#4876](https://github.com/yandex/ClickHouse/pull/4876) ([proller](https://github.com/proller))
* Added typos handler for storage factory and table functions factory. [#4891](https://github.com/yandex/ClickHouse/pull/4891) ([Danila Kutenin](https://github.com/danlark1))
* Support asterisks and qualified asterisks for multiple joins without subqueries [#4898](https://github.com/yandex/ClickHouse/pull/4898) ([Artem Zuikov](https://github.com/4ertus2))
* Make missing column error message more user friendly. [#4915](https://github.com/yandex/ClickHouse/pull/4915) ([Artem Zuikov](https://github.com/4ertus2))
### Performance Improvements
* Significant speedup of ASOF JOIN [#4924](https://github.com/yandex/ClickHouse/pull/4924) ([Martijn Bakker](https://github.com/Gladdy))
### Backward Incompatible Changes
* HTTP header `Query-Id` was renamed to `X-ClickHouse-Query-Id` for consistency. [#4972](https://github.com/yandex/ClickHouse/pull/4972) ([Mikhail](https://github.com/fandyushin))
### Bug Fixes
* Fixed potential null pointer dereference in `clickhouse-copier`. [#4900](https://github.com/yandex/ClickHouse/pull/4900) ([proller](https://github.com/proller))
* Fixed error on query with JOIN + ARRAY JOIN [#4938](https://github.com/yandex/ClickHouse/pull/4938) ([Artem Zuikov](https://github.com/4ertus2))
* Fixed hanging on start of the server when a dictionary depends on another dictionary via a database with engine=Dictionary. [#4962](https://github.com/yandex/ClickHouse/pull/4962) ([Vitaly Baranov](https://github.com/vitlibar))
* Partially fix `distributed_product_mode = local`. Columns of local tables are now allowed in WHERE/HAVING/ORDER BY/... via table aliases. An exception is thrown if the table does not have an alias. Access to the columns without table aliases is not possible yet. [#4986](https://github.com/yandex/ClickHouse/pull/4986) ([Artem Zuikov](https://github.com/4ertus2))
* Fix potentially wrong result for `SELECT DISTINCT` with `JOIN` [#5001](https://github.com/yandex/ClickHouse/pull/5001) ([Artem Zuikov](https://github.com/4ertus2))
* Fixed very rare data race condition that could happen when executing a query with UNION ALL involving at least two SELECTs from system.columns, system.tables, system.parts, system.parts_tables or tables of Merge family and performing ALTER of columns of the related tables concurrently. [#5189](https://github.com/yandex/ClickHouse/pull/5189) ([alexey-milovidov](https://github.com/alexey-milovidov))
### Build/Testing/Packaging Improvements
* Fixed test failures when running clickhouse-server on a different host [#4713](https://github.com/yandex/ClickHouse/pull/4713) ([Vasily Nemkov](https://github.com/Enmk))
* clickhouse-test: Disable color control sequences in a non-tty environment. [#4937](https://github.com/yandex/ClickHouse/pull/4937) ([alesapin](https://github.com/alesapin))
* clickhouse-test: Allow using any test database (remove the `test.` qualification where possible) [#5008](https://github.com/yandex/ClickHouse/pull/5008) ([proller](https://github.com/proller))
* Fix ubsan errors [#5037](https://github.com/yandex/ClickHouse/pull/5037) ([Vitaly Baranov](https://github.com/vitlibar))
* Yandex LFAlloc was added to ClickHouse to allocate MarkCache and UncompressedCache data in different ways to catch segfaults more reliably [#4995](https://github.com/yandex/ClickHouse/pull/4995) ([Danila Kutenin](https://github.com/danlark1))
* Python util to help with backports and changelogs. [#4949](https://github.com/yandex/ClickHouse/pull/4949) ([Ivan](https://github.com/abyss7))
## ClickHouse release 19.5.4.22, 2019-05-13
### Bug fixes
* Fixed possible crash in bitmap* functions [#5220](https://github.com/yandex/ClickHouse/pull/5220) [#5228](https://github.com/yandex/ClickHouse/pull/5228) ([Andy Yang](https://github.com/andyyzh))
* Fixed very rare data race condition that could happen when executing a query with UNION ALL involving at least two SELECTs from system.columns, system.tables, system.parts, system.parts_tables or tables of Merge family and performing ALTER of columns of the related tables concurrently. [#5189](https://github.com/yandex/ClickHouse/pull/5189) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed error `Set for IN is not created yet in case of using single LowCardinality column in the left part of IN`. This error happened if a LowCardinality column was part of the primary key. #5031 [#5154](https://github.com/yandex/ClickHouse/pull/5154) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Modification of the retention function: previously, if a row satisfied both the first and the Nth condition, only the first satisfied condition was added to the data state. Now all conditions that are satisfied in a row of data are added to the data state. [#5119](https://github.com/yandex/ClickHouse/pull/5119) ([小路](https://github.com/nicelulu))
## ClickHouse release 19.5.3.8, 2019-04-18
### Bug fixes
* Fixed type of setting `max_partitions_per_insert_block` from boolean to UInt64. [#5028](https://github.com/yandex/ClickHouse/pull/5028) ([Mohammad Hossein Sekhavat](https://github.com/mhsekhavat))
## ClickHouse release 19.5.2.6, 2019-04-15
### New Features
@ -294,7 +408,7 @@
* Added support of `Nullable` types in `mysql` table function. [#4198](https://github.com/yandex/ClickHouse/pull/4198) ([Emmanuel Donin de Rosière](https://github.com/edonin))
* Support for arbitrary constant expressions in `LIMIT` clause. [#4246](https://github.com/yandex/ClickHouse/pull/4246) ([k3box](https://github.com/k3box))
* Added `topKWeighted` aggregate function that takes additional argument with (unsigned integer) weight. [#4245](https://github.com/yandex/ClickHouse/pull/4245) ([Andrew Golman](https://github.com/andrewgolman))
* `StorageJoin` now supports `join_overwrite` setting that allows overwriting existing values of the same key. [#3973](https://github.com/yandex/ClickHouse/pull/3973) ([Amos Bird](https://github.com/amosbird)
* `StorageJoin` now supports `join_any_take_last_row` setting that allows overwriting existing values of the same key. [#3973](https://github.com/yandex/ClickHouse/pull/3973) ([Amos Bird](https://github.com/amosbird)
* Added function `toStartOfInterval`. [#4304](https://github.com/yandex/ClickHouse/pull/4304) ([Vitaly Baranov](https://github.com/vitlibar))
* Added `RowBinaryWithNamesAndTypes` format. [#4200](https://github.com/yandex/ClickHouse/pull/4200) ([Oleg V. Kozlyuk](https://github.com/DarkWanderer))
* Added `IPv4` and `IPv6` data types. More effective implementations of `IPv*` functions. [#3669](https://github.com/yandex/ClickHouse/pull/3669) ([Vasily Nemkov](https://github.com/Enmk))
@ -1763,7 +1877,7 @@ This release contains bug fixes for the previous release 1.1.54276:
- cleanup_delay_period sets how often to start cleanup to remove outdated data.
- replicated_can_become_leader can prevent a replica from becoming the leader (and assigning merges).
* Accelerated cleanup to remove outdated data from ZooKeeper.
* Multiple improvements and fixes for clustered DDL queries. Of particular interest is the new setting distributed_ddl_task_timeout, which limits the time to wait for a response from the servers in the cluster.
* Multiple improvements and fixes for clustered DDL queries. Of particular interest is the new setting distributed_ddl_task_timeout, which limits the time to wait for a response from the servers in the cluster. If a DDL request has not been performed on all hosts, the response will contain a timeout error and the request will be executed in async mode.
* Improved display of stack traces in the server logs.
* Added the "none" value for the compression method.
* You can use multiple dictionaries_config sections in config.xml.


@ -1,3 +1,119 @@
## ClickHouse release 19.7.3.9, 2019-05-30
### New Features
* Added the ability to limit the values of configuration settings that the
user can specify. These constraints can be set up in the user settings
profile. [#4931](https://github.com/yandex/ClickHouse/pull/4931)
([Vitaly Baranov](https://github.com/vitlibar))
* Added a variant of the `groupUniqArray` function with an additional
`max_size` parameter that limits the size of the resulting array, similar
to the `groupArray(max_size)(x)` function.
[#5026](https://github.com/yandex/ClickHouse/pull/5026) ([Guillaume
Tassery](https://github.com/YiuRULE))
* For input files in the TSVWithNames and CSVWithNames formats, the column
order can now be determined from the file header. This behavior is
controlled by the `input_format_with_names_use_header` setting.
[#5081](https://github.com/yandex/ClickHouse/pull/5081)
([Alexander](https://github.com/Akazz))
### Bug Fixes
* Fixed a crash during merge when using uncompressed_cache and JOIN
(#5197). [#5133](https://github.com/yandex/ClickHouse/pull/5133) ([Danila
Kutenin](https://github.com/danlark1))
* Fixed a segmentation fault on a query to system tables (#5066).
[#5127](https://github.com/yandex/ClickHouse/pull/5127)
([Ivan](https://github.com/abyss7))
* Fixed loss of ingested data under heavy load via KafkaEngine
(#4736). [#5080](https://github.com/yandex/ClickHouse/pull/5080)
([Ivan](https://github.com/abyss7))
* Fixed very rare data race condition that could happen when executing a query with UNION ALL involving at least two SELECTs from system.columns, system.tables, system.parts, system.parts_tables or tables of Merge family and performing ALTER of columns of the related tables concurrently. [#5189](https://github.com/yandex/ClickHouse/pull/5189) ([alexey-milovidov](https://github.com/alexey-milovidov))
### Performance Improvements
* Radix sort is now used for sorting numeric columns in `ORDER BY` without
`LIMIT`. [#5106](https://github.com/yandex/ClickHouse/pull/5106),
[#4439](https://github.com/yandex/ClickHouse/pull/4439) ([Evgenii
Pravda](https://github.com/kvinty),
[alexey-milovidov](https://github.com/alexey-milovidov))
### Documentation
* Translated documentation for some table engines to Chinese.
[#5107](https://github.com/yandex/ClickHouse/pull/5107),
[#5094](https://github.com/yandex/ClickHouse/pull/5094),
[#5087](https://github.com/yandex/ClickHouse/pull/5087)
([张风啸](https://github.com/AlexZFX)),
[#5068](https://github.com/yandex/ClickHouse/pull/5068) ([never
lee](https://github.com/neverlee))
### Build/Testing/Packaging Improvements
* UTF-8 characters are now displayed properly in `clickhouse-test`.
[#5084](https://github.com/yandex/ClickHouse/pull/5084)
([alexey-milovidov](https://github.com/alexey-milovidov))
* Added a command line parameter for `clickhouse-client` to always load
suggestion data.
[#5102](https://github.com/yandex/ClickHouse/pull/5102)
([alexey-milovidov](https://github.com/alexey-milovidov))
* Resolved some PVS-Studio warnings.
[#5082](https://github.com/yandex/ClickHouse/pull/5082)
([alexey-milovidov](https://github.com/alexey-milovidov))
* Updated the LZ4 library.
[#5040](https://github.com/yandex/ClickHouse/pull/5040) ([Danila
Kutenin](https://github.com/danlark1))
* Added gperf to the build dependencies to support the upcoming PR #5030.
[#5110](https://github.com/yandex/ClickHouse/pull/5110)
([proller](https://github.com/proller))
## ClickHouse release 19.6.2.11, 2019-05-13
### New Features
* TTL expressions that allow configuring the lifetime and automatic cleanup of data in a table or in its individual columns. [#4212](https://github.com/yandex/ClickHouse/pull/4212) ([Anton Popov](https://github.com/CurtizJ))
* Added support for the `brotli` compression algorithm in HTTP responses (`Accept-Encoding: br`). This was already supported for POST request bodies. [#4388](https://github.com/yandex/ClickHouse/pull/4388) ([Mikhail](https://github.com/fandyushin))
* Added the `isValidUTF8` function for checking whether a string contains valid UTF-8 data. [#4934](https://github.com/yandex/ClickHouse/pull/4934) ([Danila Kutenin](https://github.com/danlark1))
* Added a new load balancing policy (`load_balancing`) `first_or_random`, which sends queries to the first specified host and, if it is unavailable, to random hosts of the shard. Useful for cross-replication topologies. [#5012](https://github.com/yandex/ClickHouse/pull/5012) ([nvartolomei](https://github.com/nvartolomei))
### Experimental Features
* Added the `index_granularity_bytes` setting (adaptive index granularity) for tables of the MergeTree* family. [#4826](https://github.com/yandex/ClickHouse/pull/4826) ([alesapin](https://github.com/alesapin))
### Improvements
* Added support for non-constant and negative offset and length arguments for the `substringUTF8` function. [#4989](https://github.com/yandex/ClickHouse/pull/4989) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Disabled push-down to the right table in LEFT JOIN, to the left table in RIGHT JOIN, and to both tables in FULL JOIN. This fixes wrong JOIN results in some cases. [#4846](https://github.com/yandex/ClickHouse/pull/4846) ([Ivan](https://github.com/abyss7))
* `clickhouse-copier`: the task configuration is automatically uploaded to ZooKeeper from the `--task-file` option [#4876](https://github.com/yandex/ClickHouse/pull/4876) ([proller](https://github.com/proller))
* Added typo-tolerant suggestions for table engine names and table function names. [#4891](https://github.com/yandex/ClickHouse/pull/4891) ([Danila Kutenin](https://github.com/danlark1))
* Support for `select *` and `select tablename.*` expressions for multiple joins without subqueries [#4898](https://github.com/yandex/ClickHouse/pull/4898) ([Artem Zuikov](https://github.com/4ertus2))
* Error messages about missing columns are now more user friendly. [#4915](https://github.com/yandex/ClickHouse/pull/4915) ([Artem Zuikov](https://github.com/4ertus2))
### Performance Improvements
* Significant speedup of ASOF JOIN [#4924](https://github.com/yandex/ClickHouse/pull/4924) ([Martijn Bakker](https://github.com/Gladdy))
### Backward Incompatible Changes
* The HTTP header `Query-Id` was renamed to `X-ClickHouse-Query-Id` for consistency. [#4972](https://github.com/yandex/ClickHouse/pull/4972) ([Mikhail](https://github.com/fandyushin))
### Bug Fixes
* Fixed possible null pointer dereferences in `clickhouse-copier`. [#4900](https://github.com/yandex/ClickHouse/pull/4900) ([proller](https://github.com/proller))
* Fixed errors in queries with JOIN + ARRAY JOIN [#4938](https://github.com/yandex/ClickHouse/pull/4938) ([Artem Zuikov](https://github.com/4ertus2))
* Fixed hanging at server start when an external dictionary depends on another dictionary via a table from a database with the `Dictionary` engine. [#4962](https://github.com/yandex/ClickHouse/pull/4962) ([Vitaly Baranov](https://github.com/vitlibar))
* With `distributed_product_mode = 'local'`, columns of local tables can now be used correctly in WHERE/HAVING/ORDER BY/... via table aliases. An exception is thrown if the table does not have an alias. Access to the columns without table aliases is not possible yet. [#4986](https://github.com/yandex/ClickHouse/pull/4986) ([Artem Zuikov](https://github.com/4ertus2))
* Fixed a potentially incorrect result for `SELECT DISTINCT` with `JOIN` [#5001](https://github.com/yandex/ClickHouse/pull/5001) ([Artem Zuikov](https://github.com/4ertus2))
* Fixed very rare data race condition that could happen when executing a query with UNION ALL involving at least two SELECTs from system.columns, system.tables, system.parts, system.parts_tables or tables of Merge family and performing ALTER of columns of the related tables concurrently. [#5189](https://github.com/yandex/ClickHouse/pull/5189) ([alexey-milovidov](https://github.com/alexey-milovidov))
### Build/Testing/Packaging Improvements
* Fixed test failures when `clickhouse-server` is running on a remote host [#4713](https://github.com/yandex/ClickHouse/pull/4713) ([Vasily Nemkov](https://github.com/Enmk))
* `clickhouse-test`: disabled color control sequences when not running in a terminal. [#4937](https://github.com/yandex/ClickHouse/pull/4937) ([alesapin](https://github.com/alesapin))
* `clickhouse-test`: allow using databases other than test [#5008](https://github.com/yandex/ClickHouse/pull/5008) ([proller](https://github.com/proller))
* Fixed errors when running tests under UBSan [#5037](https://github.com/yandex/ClickHouse/pull/5037) ([Vitaly Baranov](https://github.com/vitlibar))
* Added the Yandex LFAlloc allocator to allocate MarkCache and UncompressedCache data in different ways to catch memory overruns more reliably [#4995](https://github.com/yandex/ClickHouse/pull/4995) ([Danila Kutenin](https://github.com/danlark1))
* Added a utility to simplify backporting changes to old releases and composing changelogs. [#4949](https://github.com/yandex/ClickHouse/pull/4949) ([Ivan](https://github.com/abyss7))
## ClickHouse release 19.5.4.22, 2019-05-13
### Bug Fixes
* Fixed possible crashes in bitmap* functions [#5220](https://github.com/yandex/ClickHouse/pull/5220) [#5228](https://github.com/yandex/ClickHouse/pull/5228) ([Andy Yang](https://github.com/andyyzh))
* Fixed very rare data race condition that could happen when executing a query with UNION ALL involving at least two SELECTs from system.columns, system.tables, system.parts, system.parts_tables or tables of Merge family and performing ALTER of columns of the related tables concurrently. [#5189](https://github.com/yandex/ClickHouse/pull/5189) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed error `Set for IN is not created yet in case of using single LowCardinality column in the left part of IN`. This error happened when a LowCardinality column was part of the primary key. #5031 [#5154](https://github.com/yandex/ClickHouse/pull/5154) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Fixed the retention function: previously only the first satisfied condition was added to the data state. Now all conditions that are satisfied in a row of data are added to the state. [#5119](https://github.com/yandex/ClickHouse/pull/5119) ([小路](https://github.com/nicelulu))
## ClickHouse release 19.5.3.8, 2019-04-18
### Bug Fixes
@ -286,7 +402,7 @@
* Added support for `Nullable` types in the `mysql` table function. [#4198](https://github.com/yandex/ClickHouse/pull/4198) ([Emmanuel Donin de Rosière](https://github.com/edonin))
* Added support for arbitrary constant expressions in the `LIMIT` clause. [#4246](https://github.com/yandex/ClickHouse/pull/4246) ([k3box](https://github.com/k3box))
* Added the `topKWeighted` aggregate function: a variant of `topK` that allows specifying a (non-negative integer) weight for the added value. [#4245](https://github.com/yandex/ClickHouse/pull/4245) ([Andrew Golman](https://github.com/andrewgolman))
* The `Join` engine now supports the `join_overwrite` setting, which allows overwriting existing values of the same key. [#3973](https://github.com/yandex/ClickHouse/pull/3973) ([Amos Bird](https://github.com/amosbird))
* The `Join` engine now supports the `join_any_take_last_row` setting, which allows overwriting existing values of the same key. [#3973](https://github.com/yandex/ClickHouse/pull/3973) ([Amos Bird](https://github.com/amosbird))
* Added the `toStartOfInterval` function. [#4304](https://github.com/yandex/ClickHouse/pull/4304) ([Vitaly Baranov](https://github.com/vitlibar))
* Added the `toStartOfTenMinutes` function. [#4298](https://github.com/yandex/ClickHouse/pull/4298) ([Vitaly Baranov](https://github.com/vitlibar))
* Added the `RowBinaryWithNamesAndTypes` format. [#4200](https://github.com/yandex/ClickHouse/pull/4200) ([Oleg V. Kozlyuk](https://github.com/DarkWanderer))
@ -1710,7 +1826,7 @@
- cleanup_delay_period sets how often cleanup of outdated data is started.
- replicated_can_become_leader can prevent a replica from becoming the leader (and assigning merges).
* Accelerated cleanup of outdated data from ZooKeeper.
* Multiple improvements and fixes for clustered DDL queries. In particular, the distributed_ddl_task_timeout setting was added, which limits the time to wait for responses from the servers in the cluster.
* Multiple improvements and fixes for clustered DDL queries. In particular, the distributed_ddl_task_timeout setting was added, which limits the time to wait for responses from the servers in the cluster. If the request has not completed on all nodes within the allotted time, the response will contain a timeout error and the request will continue to execute in asynchronous mode.
* Improved display of stack traces in the server logs.
* Added the "none" compression method.
* Multiple dictionaries_config sections can be used in config.xml.


@ -67,7 +67,11 @@ if (NOT MAKE_STATIC_LIBRARIES)
option (CLICKHOUSE_SPLIT_BINARY "Make several binaries instead one bundled (clickhouse-server, clickhouse-client, ... )" OFF)
endif ()
if (SPLIT_SHARED_LIBRARIES)
if (MAKE_STATIC_LIBRARIES AND SPLIT_SHARED_LIBRARIES)
message(FATAL_ERROR "Defining SPLIT_SHARED_LIBRARIES=1 without MAKE_STATIC_LIBRARIES=0 has no effect.")
endif()
if (NOT MAKE_STATIC_LIBRARIES AND SPLIT_SHARED_LIBRARIES)
set(BUILD_SHARED_LIBS 1 CACHE INTERNAL "")
endif ()
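
The new guard turns a previously silently ignored combination into a configure-time error. As a sketch, the two invocations the check now distinguishes (option names as defined in this file):

```cmake
# Sketch of the two configurations the new check distinguishes.
# Valid: build many shared libraries instead of one bundled binary.
#   cmake -DMAKE_STATIC_LIBRARIES=0 -DSPLIT_SHARED_LIBRARIES=1 ..
# Now a hard error instead of a no-op: SPLIT_SHARED_LIBRARIES is
# meaningless when static libraries are requested.
#   cmake -DMAKE_STATIC_LIBRARIES=1 -DSPLIT_SHARED_LIBRARIES=1 ..
```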
@ -93,6 +97,11 @@ if (COMPILER_GCC OR COMPILER_CLANG)
set (CXX_WARNING_FLAGS "${CXX_WARNING_FLAGS} -Wnon-virtual-dtor")
endif ()
if (COMPILER_GCC AND CMAKE_CXX_COMPILER_VERSION VERSION_GREATER "8.3.0")
# Warnings in protobuf generating
set (CXX_WARNING_FLAGS "${CXX_WARNING_FLAGS} -Wno-array-bounds")
endif ()
if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
# clang: warning: argument unused during compilation: '-stdlib=libc++'
# clang: warning: argument unused during compilation: '-specs=/usr/share/dpkg/no-pie-compile.specs' [-Wunused-command-line-argument]
@ -105,7 +114,7 @@ option (ENABLE_TESTS "Enables tests" ON)
if (CMAKE_SYSTEM_PROCESSOR MATCHES "amd64|x86_64")
option (USE_INTERNAL_MEMCPY "Use internal implementation of 'memcpy' function instead of provided by libc. Only for x86_64." ON)
if (OS_LINUX AND NOT UNBUNDLED AND MAKE_STATIC_LIBRARIES AND CMAKE_VERSION VERSION_GREATER "3.9.0")
if (OS_LINUX AND NOT UNBUNDLED AND MAKE_STATIC_LIBRARIES AND NOT SPLIT_SHARED_LIBRARIES AND CMAKE_VERSION VERSION_GREATER "3.9.0")
option (GLIBC_COMPATIBILITY "Set to TRUE to enable compatibility with older glibc libraries. Only for x86_64, Linux. Implies USE_INTERNAL_MEMCPY." ON)
endif ()
endif ()
@ -226,7 +235,7 @@ if (OS_LINUX AND NOT UNBUNDLED AND (GLIBC_COMPATIBILITY OR USE_LIBCXX))
set (CMAKE_POSTFIX_VARIABLE "CMAKE_${CMAKE_BUILD_TYPE_UC}_POSTFIX")
# FIXME: glibc-compatibility may be non-static in some builds!
set (DEFAULT_LIBS "${DEFAULT_LIBS} libs/libglibc-compatibility/libglibc-compatibility${${CMAKE_POSTFIX_VARIABLE}}.a")
set (DEFAULT_LIBS "${DEFAULT_LIBS} ${ClickHouse_BINARY_DIR}/libs/libglibc-compatibility/libglibc-compatibility${${CMAKE_POSTFIX_VARIABLE}}.a")
endif ()
# Add Libc. GLIBC is actually a collection of interdependent libraries.
@ -310,6 +319,7 @@ include (cmake/find_rt.cmake)
include (cmake/find_execinfo.cmake)
include (cmake/find_readline_edit.cmake)
include (cmake/find_re2.cmake)
include (cmake/find_libgsasl.cmake)
include (cmake/find_rdkafka.cmake)
include (cmake/find_capnp.cmake)
include (cmake/find_llvm.cmake)
@ -317,7 +327,6 @@ include (cmake/find_cpuid.cmake) # Freebsd, bundled
if (NOT USE_CPUID)
include (cmake/find_cpuinfo.cmake) # Debian
endif()
include (cmake/find_libgsasl.cmake)
include (cmake/find_libxml2.cmake)
include (cmake/find_brotli.cmake)
include (cmake/find_protobuf.cmake)
@ -328,12 +337,14 @@ include (cmake/find_base64.cmake)
include (cmake/find_hyperscan.cmake)
include (cmake/find_lfalloc.cmake)
include (cmake/find_simdjson.cmake)
include (cmake/find_rapidjson.cmake)
find_contrib_lib(cityhash)
find_contrib_lib(farmhash)
find_contrib_lib(metrohash)
find_contrib_lib(btrie)
find_contrib_lib(double-conversion)
include (cmake/find_parquet.cmake)
if (ENABLE_TESTS)
include (cmake/find_gtest.cmake)
endif ()
@ -378,5 +389,26 @@ if (GLIBC_COMPATIBILITY)
add_glibc_compat(arrow)
add_glibc_compat(protoc)
add_glibc_compat(thrift_static)
add_glibc_compat(cityhash)
add_glibc_compat(farmhash)
add_glibc_compat(murmurhash)
add_glibc_compat(metrohash)
add_glibc_compat(metrohash128)
add_glibc_compat(consistent-hashing)
add_glibc_compat(double-conversion)
add_glibc_compat(cctz)
add_glibc_compat(kj)
add_glibc_compat(simdjson)
add_glibc_compat(apple_rt)
add_glibc_compat(re2)
add_glibc_compat(re2_st)
add_glibc_compat(hs_compile_shared)
add_glibc_compat(hs_exec_shared)
add_glibc_compat(hs_shared)
add_glibc_compat(widechar_width)
add_glibc_compat(string_utils)
add_glibc_compat(consistent-hashing-sumbur)
add_glibc_compat(boost_program_options_internal)
add_glibc_compat(boost_system_internal)
add_glibc_compat(boost_regex_internal)
endif ()
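
add_glibc_compat is defined elsewhere in the tree and is not shown in this diff; as a hypothetical sketch only, a helper of this shape would link the bundled compatibility shims into each listed target:

```cmake
# Hypothetical sketch only; the real helper may differ.
macro (add_glibc_compat target_name)
    if (TARGET ${target_name})
        target_link_libraries (${target_name} PRIVATE glibc-compatibility)
    endif ()
endmacro ()
```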


@ -12,7 +12,7 @@ ClickHouse is an open-source column-oriented database management system that all
* You can also [fill this form](https://forms.yandex.com/surveys/meet-yandex-clickhouse-team/) to meet Yandex ClickHouse team in person.
## Upcoming Events
* ClickHouse at [Percona Live 2019](https://www.percona.com/live/19/other-open-source-databases-track) in Austin on May 28-30.
* [ClickHouse Community Meetup in Beijing](https://www.huodongxing.com/event/2483759276200) on June 8.
* [ClickHouse Community Meetup in Shenzhen](https://www.huodongxing.com/event/3483759917300) on October 20.
* [ClickHouse Community Meetup in Shanghai](https://www.huodongxing.com/event/4483760336000) on October 27.
* [ClickHouse on HighLoad++ Siberia](https://www.highload.ru/siberia/2019/abstracts/5348) on June 24-25.
* [ClickHouse Meetup in Novosibirsk](https://events.yandex.ru/events/ClickHouse/26-June-2019/) on June 26.
* [ClickHouse Meetup in Shenzhen](https://www.huodongxing.com/event/3483759917300) on October 20.
* [ClickHouse Meetup in Shanghai](https://www.huodongxing.com/event/4483760336000) on October 27.

SECURITY.md

@ -0,0 +1,17 @@
# Security Policy
## Supported Versions
The following versions of ClickHouse server are
currently being supported with security updates:
| Version | Supported |
| ------- | ------------------ |
| 1.x | :x: |
| 18.x | :x: |
| 19.x | :white_check_mark: |
## Reporting a Vulnerability
To report a potential vulnerability in ClickHouse please use the security advisory feature of GitHub:
https://github.com/yandex/ClickHouse/security/advisories


@ -5,7 +5,7 @@ endmacro()
macro(add_headers_and_sources prefix common_path)
add_glob(${prefix}_headers RELATIVE ${CMAKE_CURRENT_SOURCE_DIR} ${common_path}/*.h)
add_glob(${prefix}_sources ${common_path}/*.cpp ${common_path}/*.h)
add_glob(${prefix}_sources ${common_path}/*.cpp ${common_path}/*.c ${common_path}/*.h)
endmacro()
macro(add_headers_only prefix common_path)
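
With the added `${common_path}/*.c` glob, plain C sources (for example the roaring and LFAlloc .c files touched elsewhere in this commit) are collected alongside the C++ ones. A usage sketch, with hypothetical prefix and directory names:

```cmake
# Hypothetical usage: populates dbms_headers and dbms_sources,
# now including any *.c files found in the directory.
add_headers_and_sources(dbms src/Common)
add_library(dbms STATIC ${dbms_headers} ${dbms_sources})
```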


@ -1,9 +1,12 @@
option (USE_INTERNAL_BOOST_LIBRARY "Set to FALSE to use system boost library instead of bundled" ${NOT_UNBUNDLED})
# Test random file existing in all package variants
if (USE_INTERNAL_BOOST_LIBRARY AND NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/boost/libs/system/src/error_code.cpp")
message (WARNING "submodules in contrib/boost is missing. to fix try run: \n git submodule update --init --recursive")
set (USE_INTERNAL_BOOST_LIBRARY 0)
if (NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/boost/libs/system/src/error_code.cpp")
if(USE_INTERNAL_BOOST_LIBRARY)
message(WARNING "submodules in contrib/boost is missing. to fix try run: \n git submodule update --init --recursive")
endif()
set (USE_INTERNAL_BOOST_LIBRARY 0)
set (MISSING_INTERNAL_BOOST_LIBRARY 1)
endif ()
if (NOT USE_INTERNAL_BOOST_LIBRARY)
@ -21,10 +24,9 @@ if (NOT USE_INTERNAL_BOOST_LIBRARY)
set (Boost_INCLUDE_DIRS "")
set (Boost_SYSTEM_LIBRARY "")
endif ()
endif ()
if (NOT Boost_SYSTEM_LIBRARY)
if (NOT Boost_SYSTEM_LIBRARY AND NOT MISSING_INTERNAL_BOOST_LIBRARY)
set (USE_INTERNAL_BOOST_LIBRARY 1)
set (Boost_SYSTEM_LIBRARY boost_system_internal)
set (Boost_PROGRAM_OPTIONS_LIBRARY boost_program_options_internal)
@ -44,7 +46,6 @@ if (NOT Boost_SYSTEM_LIBRARY)
# For packaged version:
list (APPEND Boost_INCLUDE_DIRS "${ClickHouse_SOURCE_DIR}/contrib/boost")
endif ()
message (STATUS "Using Boost: ${Boost_INCLUDE_DIRS} : ${Boost_PROGRAM_OPTIONS_LIBRARY},${Boost_SYSTEM_LIBRARY},${Boost_FILESYSTEM_LIBRARY},${Boost_REGEX_LIBRARY}")

cmake/find_gperf.cmake

@ -0,0 +1,13 @@
# Check if gperf was installed
find_program(GPERF gperf)
if(GPERF)
option(ENABLE_GPERF "Use gperf function hash generator tool" ON)
endif()
if (ENABLE_GPERF)
if(NOT GPERF)
message(FATAL_ERROR "Could not find the program gperf")
endif()
set(USE_GPERF 1)
endif()
message(STATUS "Using gperf=${USE_GPERF}: ${GPERF}")
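
A consumer of the exported GPERF/USE_GPERF variables would typically gate a code-generation step on them; a hypothetical sketch (the file names are made up):

```cmake
# Hypothetical sketch: generate a perfect-hash table only when gperf was found.
if (USE_GPERF)
    add_custom_command (
        OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/keywords.inc
        COMMAND ${GPERF} --output-file=${CMAKE_CURRENT_BINARY_DIR}/keywords.inc keywords.gperf
        DEPENDS keywords.gperf
        WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR})
endif ()
```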


@ -1,4 +1,5 @@
if (NOT SANITIZE AND NOT ARCH_ARM AND NOT ARCH_32 AND NOT ARCH_PPC64LE AND NOT OS_FREEBSD AND NOT APPLE)
# TODO(danlark1). Disable LFAlloc for a while to fix mmap count problem
if (NOT OS_LINUX AND NOT SANITIZE AND NOT ARCH_ARM AND NOT ARCH_32 AND NOT ARCH_PPC64LE AND NOT OS_FREEBSD AND NOT APPLE)
option (ENABLE_LFALLOC "Set to FALSE to disable LFAlloc" ${NOT_UNBUNDLED})
endif ()


@ -22,4 +22,8 @@ elseif (NOT MISSING_INTERNAL_LIBGSASL_LIBRARY AND NOT APPLE AND NOT ARCH_32)
set (LIBGSASL_LIBRARY libgsasl)
endif ()
message (STATUS "Using libgsasl: ${LIBGSASL_INCLUDE_DIR} : ${LIBGSASL_LIBRARY}")
if(LIBGSASL_LIBRARY AND LIBGSASL_INCLUDE_DIR)
set (USE_LIBGSASL 1)
endif()
message (STATUS "Using libgsasl=${USE_LIBGSASL}: ${LIBGSASL_INCLUDE_DIR} : ${LIBGSASL_LIBRARY}")


@ -0,0 +1,28 @@
option(ENABLE_RAPIDJSON "Use rapidjson" ON)
if(NOT ENABLE_RAPIDJSON)
return()
endif()
option(USE_INTERNAL_RAPIDJSON_LIBRARY "Set to FALSE to use system rapidjson library instead of bundled" ${NOT_UNBUNDLED})
if(NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/rapidjson/include/rapidjson/rapidjson.h")
if(USE_INTERNAL_RAPIDJSON_LIBRARY)
message(WARNING "submodule contrib/rapidjson is missing. to fix try run: \n git submodule update --init --recursive")
set(USE_INTERNAL_RAPIDJSON_LIBRARY 0)
endif()
set(MISSING_INTERNAL_RAPIDJSON_LIBRARY 1)
endif()
if(NOT USE_INTERNAL_RAPIDJSON_LIBRARY)
find_path(RAPIDJSON_INCLUDE_DIR NAMES rapidjson/rapidjson.h PATHS ${RAPIDJSON_INCLUDE_PATHS})
endif()
if(RAPIDJSON_INCLUDE_DIR)
set(USE_RAPIDJSON 1)
elseif(NOT MISSING_INTERNAL_RAPIDJSON_LIBRARY)
set(RAPIDJSON_INCLUDE_DIR "${ClickHouse_SOURCE_DIR}/contrib/rapidjson/include")
set(USE_INTERNAL_RAPIDJSON_LIBRARY 1)
set(USE_RAPIDJSON 1)
endif()
message(STATUS "Using rapidjson=${USE_RAPIDJSON}: ${RAPIDJSON_INCLUDE_DIR}")
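
This follows the same prefer-system-else-bundled pattern as the re2, zlib and zstd changes below: USE_RAPIDJSON and RAPIDJSON_INCLUDE_DIR are set for consumers to pick up. A hypothetical consumer sketch (target name assumed):

```cmake
# Hypothetical consumer: attach the detected rapidjson headers to a target.
if (USE_RAPIDJSON)
    target_include_directories (dbms SYSTEM PRIVATE ${RAPIDJSON_INCLUDE_DIR})
endif ()
```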


@ -10,7 +10,7 @@ endif ()
if (ENABLE_RDKAFKA)
if (OS_LINUX AND NOT ARCH_ARM)
if (OS_LINUX AND NOT ARCH_ARM AND USE_LIBGSASL)
option (USE_INTERNAL_RDKAFKA_LIBRARY "Set to FALSE to use system librdkafka instead of the bundled" ${NOT_UNBUNDLED})
endif ()


@ -1,5 +1,13 @@
option (USE_INTERNAL_RE2_LIBRARY "Set to FALSE to use system re2 library instead of bundled [slower]" ${NOT_UNBUNDLED})
if(NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/re2/CMakeLists.txt")
if(USE_INTERNAL_RE2_LIBRARY)
message(WARNING "submodule contrib/re2 is missing. to fix try run: \n git submodule update --init --recursive")
endif()
set(USE_INTERNAL_RE2_LIBRARY 0)
set(MISSING_INTERNAL_RE2_LIBRARY 1)
endif()
if (NOT USE_INTERNAL_RE2_LIBRARY)
find_library (RE2_LIBRARY re2)
find_path (RE2_INCLUDE_DIR NAMES re2/re2.h PATHS ${RE2_INCLUDE_PATHS})


@ -9,6 +9,6 @@ if (NOT HAVE_AVX2)
endif ()
option (USE_SIMDJSON "Use simdjson" ON)
set (SIMDJSON_INCLUDE_DIR "${ClickHouse_SOURCE_DIR}/contrib/simdjson/include")
set (SIMDJSON_LIBRARY "simdjson")
message(STATUS "Using simdjson=${USE_SIMDJSON}: ${SIMDJSON_LIBRARY}")


@ -2,20 +2,28 @@ if (NOT OS_FREEBSD AND NOT ARCH_32)
option (USE_INTERNAL_ZLIB_LIBRARY "Set to FALSE to use system zlib library instead of bundled" ${NOT_UNBUNDLED})
endif ()
if (NOT MSVC)
set (INTERNAL_ZLIB_NAME "zlib-ng" CACHE INTERNAL "")
else ()
set (INTERNAL_ZLIB_NAME "zlib" CACHE INTERNAL "")
if (NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/${INTERNAL_ZLIB_NAME}")
message (WARNING "Will use standard zlib, please clone manually:\n git clone https://github.com/madler/zlib.git ${ClickHouse_SOURCE_DIR}/contrib/${INTERNAL_ZLIB_NAME}")
endif ()
endif ()
if(NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/${INTERNAL_ZLIB_NAME}/zlib.h")
if(USE_INTERNAL_ZLIB_LIBRARY)
message(WARNING "submodule contrib/${INTERNAL_ZLIB_NAME} is missing. to fix try run: \n git submodule update --init --recursive")
endif()
set(USE_INTERNAL_ZLIB_LIBRARY 0)
set(MISSING_INTERNAL_ZLIB_LIBRARY 1)
endif()
if (NOT USE_INTERNAL_ZLIB_LIBRARY)
find_package (ZLIB)
endif ()
if (NOT ZLIB_FOUND)
if (NOT MSVC)
set (INTERNAL_ZLIB_NAME "zlib-ng" CACHE INTERNAL "")
else ()
set (INTERNAL_ZLIB_NAME "zlib" CACHE INTERNAL "")
if (NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/${INTERNAL_ZLIB_NAME}")
message (WARNING "Will use standard zlib, please clone manually:\n git clone https://github.com/madler/zlib.git ${ClickHouse_SOURCE_DIR}/contrib/${INTERNAL_ZLIB_NAME}")
endif ()
endif ()
if (NOT ZLIB_FOUND AND NOT MISSING_INTERNAL_ZLIB_LIBRARY)
set (USE_INTERNAL_ZLIB_LIBRARY 1)
set (ZLIB_INCLUDE_DIR "${ClickHouse_SOURCE_DIR}/contrib/${INTERNAL_ZLIB_NAME}" "${ClickHouse_BINARY_DIR}/contrib/${INTERNAL_ZLIB_NAME}" CACHE INTERNAL "") # generated zconf.h
set (ZLIB_INCLUDE_DIRS ${ZLIB_INCLUDE_DIR}) # for poco


@ -1,9 +1,12 @@
option (USE_INTERNAL_ZSTD_LIBRARY "Set to FALSE to use system zstd library instead of bundled" ${NOT_UNBUNDLED})
if (USE_INTERNAL_ZSTD_LIBRARY AND NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/zstd/lib/zstd.h")
message (WARNING "submodule contrib/zstd is missing. to fix try run: \n git submodule update --init --recursive")
set (USE_INTERNAL_ZSTD_LIBRARY 0)
endif ()
if(NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/zstd/lib/zstd.h")
if(USE_INTERNAL_ZSTD_LIBRARY)
message(WARNING "submodule contrib/zstd is missing. to fix try run: \n git submodule update --init --recursive")
endif()
set(USE_INTERNAL_ZSTD_LIBRARY 0)
set(MISSING_INTERNAL_ZSTD_LIBRARY 1)
endif()
if (NOT USE_INTERNAL_ZSTD_LIBRARY)
find_library (ZSTD_LIBRARY zstd)
@ -11,7 +14,7 @@ if (NOT USE_INTERNAL_ZSTD_LIBRARY)
endif ()
if (ZSTD_LIBRARY AND ZSTD_INCLUDE_DIR)
else ()
elseif (NOT MISSING_INTERNAL_ZSTD_LIBRARY)
set (USE_INTERNAL_ZSTD_LIBRARY 1)
set (ZSTD_LIBRARY zstd)
set (ZSTD_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/zstd/lib)


@ -1,6 +1,9 @@
include (CheckCXXSourceCompiles)
include (CMakePushCheckState)
set(THREADS_PREFER_PTHREAD_FLAG ON)
find_package(Threads)
cmake_push_check_state ()
if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
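
Setting THREADS_PREFER_PTHREAD_FLAG before find_package(Threads) makes CMake prefer the -pthread compiler flag over linking -lpthread directly. Consumers can then link the standard imported target; a minimal sketch:

```cmake
# Minimal sketch: Threads::Threads is the imported target from FindThreads.
set (THREADS_PREFER_PTHREAD_FLAG ON)
find_package (Threads REQUIRED)
add_executable (example main.cpp)
target_link_libraries (example PRIVATE Threads::Threads)
```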


@ -1,8 +1,8 @@
# Third-party libraries may have substandard code.
if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-unused-function -Wno-unused-variable -Wno-unused-but-set-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-maybe-uninitialized -Wno-format -Wno-misleading-indentation -Wno-stringop-overflow -Wno-implicit-function-declaration -Wno-return-type -Wno-array-bounds -Wno-bool-compare -Wno-int-conversion -Wno-switch")
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-old-style-cast -Wno-unused-function -Wno-unused-variable -Wno-unused-but-set-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-non-virtual-dtor -Wno-maybe-uninitialized -Wno-format -Wno-misleading-indentation -Wno-implicit-fallthrough -Wno-class-memaccess -Wno-sign-compare -std=c++1z")
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-unused-function -Wno-unused-variable -Wno-unused-but-set-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-maybe-uninitialized -Wno-format -Wno-misleading-indentation -Wno-stringop-overflow -Wno-implicit-function-declaration -Wno-return-type -Wno-array-bounds -Wno-bool-compare -Wno-int-conversion -Wno-switch -Wno-stringop-truncation")
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-old-style-cast -Wno-unused-function -Wno-unused-variable -Wno-unused-but-set-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-non-virtual-dtor -Wno-maybe-uninitialized -Wno-format -Wno-misleading-indentation -Wno-implicit-fallthrough -Wno-class-memaccess -Wno-sign-compare -Wno-array-bounds -Wno-missing-attributes -Wno-stringop-truncation -std=c++1z")
elseif (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-unused-function -Wno-unused-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-format -Wno-parentheses-equality -Wno-tautological-constant-compare -Wno-tautological-constant-out-of-range-compare -Wno-implicit-function-declaration -Wno-return-type -Wno-pointer-bool-conversion -Wno-enum-conversion -Wno-int-conversion -Wno-switch -Wno-string-plus-int")
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-old-style-cast -Wno-unused-function -Wno-unused-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-non-virtual-dtor -Wno-format -Wno-inconsistent-missing-override -std=c++1z")

contrib/boost

@ -1 +1 @@
Subproject commit 471ea208abb92a5cba7d3a08a819bb728f27e95f
Subproject commit 830e51edb59c4f37a8638138581e1e56c29ac44f


@ -1,5 +1,5 @@
set(BROTLI_SOURCE_DIR ${CMAKE_SOURCE_DIR}/contrib/brotli/c)
set(BROTLI_BINARY_DIR ${CMAKE_BINARY_DIR}/contrib/brotli/c)
set(BROTLI_SOURCE_DIR ${ClickHouse_SOURCE_DIR}/contrib/brotli/c)
set(BROTLI_BINARY_DIR ${ClickHouse_BINARY_DIR}/contrib/brotli/c)
set(SRCS
${BROTLI_SOURCE_DIR}/dec/bit_reader.c


@ -1,4 +1,4 @@
set(CPPKAFKA_DIR ${CMAKE_SOURCE_DIR}/contrib/cppkafka)
set(CPPKAFKA_DIR ${ClickHouse_SOURCE_DIR}/contrib/cppkafka)
set(SRCS
${CPPKAFKA_DIR}/src/configuration.cpp


@ -1,6 +1,6 @@
add_library(roaring
roaring.c
roaring.h
roaring.hh)
roaring/roaring.h
roaring/roaring.hh)
target_include_directories (roaring PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})


@ -1,5 +1,5 @@
/* auto-generated on Tue Dec 18 09:42:59 CST 2018. Do not edit! */
#include "roaring.h"
#include "roaring/roaring.h"
/* used for http://dmalloc.com/ Dmalloc - Debug Malloc Library */
#ifdef DMALLOC

contrib/hyperscan

@ -1 +1 @@
Subproject commit 05b0f9064cca4bd55548dedb0a32ed9461146c1e
Subproject commit 01e6b83f9fbdb4020cd68a5287bf3a0471eeb272


@ -1,4 +1,4 @@
set(JEMALLOC_SOURCE_DIR ${CMAKE_SOURCE_DIR}/contrib/jemalloc)
set(JEMALLOC_SOURCE_DIR ${ClickHouse_SOURCE_DIR}/contrib/jemalloc)
set(SRCS
${JEMALLOC_SOURCE_DIR}/src/arena.c
@ -40,6 +40,10 @@ if(CMAKE_SYSTEM_NAME MATCHES "Darwin")
list(APPEND SRCS ${JEMALLOC_SOURCE_DIR}/src/zone.c)
endif()
if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -w")
endif ()
add_library(jemalloc STATIC ${SRCS})
target_include_directories(jemalloc PUBLIC


@ -9,7 +9,7 @@ endif()
SET(WITH_KERBEROS false)
# project and source dir
set(HDFS3_ROOT_DIR ${CMAKE_SOURCE_DIR}/contrib/libhdfs3)
set(HDFS3_ROOT_DIR ${ClickHouse_SOURCE_DIR}/contrib/libhdfs3)
set(HDFS3_SOURCE_DIR ${HDFS3_ROOT_DIR}/src)
set(HDFS3_COMMON_DIR ${HDFS3_SOURCE_DIR}/common)


@ -81,7 +81,7 @@
#define PCG_128BIT_CONSTANT(high,low) \
((pcg128_t(high) << 64) + low)
#else
#include "pcg_uint128.hpp" // Y_IGNORE
#include "pcg_uint128.hpp"
namespace pcg_extras {
typedef pcg_extras::uint_x4<uint32_t,uint64_t> pcg128_t;
}

contrib/librdkafka

@ -1 +1 @@
Subproject commit 8695b9d63ac0fe1b891b511d5b36302ffc84d4e2
Subproject commit 8681f884020e880a4c6cda3cfc672f0669e1f38e


@ -1,4 +1,4 @@
set(RDKAFKA_SOURCE_DIR ${CMAKE_SOURCE_DIR}/contrib/librdkafka/src)
set(RDKAFKA_SOURCE_DIR ${ClickHouse_SOURCE_DIR}/contrib/librdkafka/src)
set(SRCS
${RDKAFKA_SOURCE_DIR}/crc32c.c
@ -33,6 +33,7 @@ set(SRCS
${RDKAFKA_SOURCE_DIR}/rdkafka_roundrobin_assignor.c
${RDKAFKA_SOURCE_DIR}/rdkafka_sasl.c
${RDKAFKA_SOURCE_DIR}/rdkafka_sasl_plain.c
${RDKAFKA_SOURCE_DIR}/rdkafka_sasl_scram.c
${RDKAFKA_SOURCE_DIR}/rdkafka_subscription.c
${RDKAFKA_SOURCE_DIR}/rdkafka_timer.c
${RDKAFKA_SOURCE_DIR}/rdkafka_topic.c
@ -58,7 +59,7 @@ add_library(rdkafka ${SRCS})
target_include_directories(rdkafka SYSTEM PUBLIC include)
target_include_directories(rdkafka SYSTEM PUBLIC ${RDKAFKA_SOURCE_DIR}) # Because weird logic with "include_next" is used.
target_include_directories(rdkafka SYSTEM PRIVATE ${ZSTD_INCLUDE_DIR}/common) # Because wrong path to "zstd_errors.h" is used.
target_link_libraries(rdkafka PUBLIC ${ZLIB_LIBRARIES} ${ZSTD_LIBRARY} ${LZ4_LIBRARY})
target_link_libraries(rdkafka PUBLIC ${ZLIB_LIBRARIES} ${ZSTD_LIBRARY} ${LZ4_LIBRARY} ${LIBGSASL_LIBRARY})
if(OPENSSL_SSL_LIBRARY AND OPENSSL_CRYPTO_LIBRARY)
target_link_libraries(rdkafka PUBLIC ${OPENSSL_SSL_LIBRARY} ${OPENSSL_CRYPTO_LIBRARY})
endif()


@ -12,7 +12,7 @@
#define ENABLE_SHAREDPTR_DEBUG 0
#define ENABLE_LZ4_EXT 1
#define ENABLE_SSL 1
//#define ENABLE_SASL 1
#define ENABLE_SASL 1
#define MKL_APP_NAME "librdkafka"
#define MKL_APP_DESC_ONELINE "The Apache Kafka C/C++ library"
// distro
@ -62,7 +62,7 @@
// libssl
#define WITH_SSL 1
// WITH_SASL_SCRAM
//#define WITH_SASL_SCRAM 1
#define WITH_SASL_SCRAM 1
// crc32chw
#if !defined(__PPC__)
#define WITH_CRC32C_HW 1


@ -1,5 +1,5 @@
#if __has_include(<rdkafka.h>) // maybe bundled
# include_next <rdkafka.h> // Y_IGNORE
# include_next <rdkafka.h>
#else // system
# include_next <librdkafka/rdkafka.h>
#endif


@ -1,5 +1,5 @@
set(LIBXML2_SOURCE_DIR ${CMAKE_SOURCE_DIR}/contrib/libxml2)
set(LIBXML2_BINARY_DIR ${CMAKE_BINARY_DIR}/contrib/libxml2)
set(LIBXML2_SOURCE_DIR ${ClickHouse_SOURCE_DIR}/contrib/libxml2)
set(LIBXML2_BINARY_DIR ${ClickHouse_BINARY_DIR}/contrib/libxml2)
set(SRCS
${LIBXML2_SOURCE_DIR}/SAX.c


@ -1,5 +1,5 @@
set(MARIADB_CLIENT_SOURCE_DIR ${CMAKE_SOURCE_DIR}/contrib/mariadb-connector-c)
set(MARIADB_CLIENT_BINARY_DIR ${CMAKE_BINARY_DIR}/contrib/mariadb-connector-c)
set(MARIADB_CLIENT_SOURCE_DIR ${ClickHouse_SOURCE_DIR}/contrib/mariadb-connector-c)
set(MARIADB_CLIENT_BINARY_DIR ${ClickHouse_BINARY_DIR}/contrib/mariadb-connector-c)
set(SRCS
#${MARIADB_CLIENT_SOURCE_DIR}/libmariadb/bmove_upp.c

contrib/rapidjson

@ -0,0 +1 @@
Subproject commit 01950eb7acec78818d68b762efc869bba2420d82

contrib/simdjson

@ -1 +1 @@
Subproject commit 681cd3369860f4eada49a387cbff93030f759c95
Subproject commit 2151ad7f34cf773a23f086e941d661f8a8873144


@ -1,15 +1,7 @@
if (NOT HAVE_AVX2)
message (FATAL_ERROR "No AVX2 support")
endif ()
if(MAKE_STATIC_LIBRARIES)
set(SIMDJSON_LIB_TYPE STATIC)
MESSAGE(STATUS "Building static library ${SIMDJSON_LIBRARY}")
else()
set(SIMDJSON_LIB_TYPE SHARED)
MESSAGE(STATUS "Building dynamic library ${SIMDJSON_LIBRARY}")
endif()
set(SIMDJSON_INCLUDE_DIR "${ClickHouse_SOURCE_DIR}/contrib/simdjson/include")
set(SIMDJSON_SRC_DIR "${SIMDJSON_INCLUDE_DIR}/../src")
set(SIMDJSON_SRC
${SIMDJSON_SRC_DIR}/jsonioutil.cpp
@ -21,6 +13,6 @@ set(SIMDJSON_SRC
${SIMDJSON_SRC_DIR}/parsedjsoniterator.cpp
)
add_library(${SIMDJSON_LIBRARY} ${SIMDJSON_LIB_TYPE} ${SIMDJSON_SRC})
target_include_directories(${SIMDJSON_LIBRARY} PRIVATE "${SIMDJSON_INCLUDE_DIR}")
add_library(${SIMDJSON_LIBRARY} ${SIMDJSON_SRC})
target_include_directories(${SIMDJSON_LIBRARY} PUBLIC "${SIMDJSON_INCLUDE_DIR}")
target_compile_options(${SIMDJSON_LIBRARY} PRIVATE -mavx2 -mbmi -mbmi2 -mpclmul)


@ -1,5 +1,5 @@
set(ODBC_SOURCE_DIR ${CMAKE_SOURCE_DIR}/contrib/unixodbc)
set(ODBC_BINARY_DIR ${CMAKE_BINARY_DIR}/contrib/unixodbc)
set(ODBC_SOURCE_DIR ${ClickHouse_SOURCE_DIR}/contrib/unixodbc)
set(ODBC_BINARY_DIR ${ClickHouse_BINARY_DIR}/contrib/unixodbc)
set(SRCS


@ -49,19 +49,20 @@ if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
endif ()
if (NOT CMAKE_CXX_COMPILER_VERSION VERSION_LESS 7)
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS}")
if (WEVERYTHING)
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-return-std-move-in-c++11")
endif ()
endif ()
if (NOT CMAKE_CXX_COMPILER_VERSION VERSION_LESS 8)
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wextra-semi-stmt -Wshadow-field -Wstring-plus-int -Wempty-init-stmt")
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wshadow-field -Wstring-plus-int")
if(NOT APPLE)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wextra-semi-stmt -Wempty-init-stmt")
endif()
endif ()
if (NOT CMAKE_CXX_COMPILER_VERSION VERSION_LESS 9)
if (WEVERYTHING)
if (WEVERYTHING AND NOT APPLE)
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-ctad-maybe-unsupported")
endif ()
endif ()
@ -189,8 +190,17 @@ target_link_libraries (clickhouse_common_io
${Poco_Net_LIBRARY}
${Poco_Util_LIBRARY}
${Poco_Foundation_LIBRARY}
${RE2_LIBRARY}
${RE2_ST_LIBRARY}
)
if(RE2_LIBRARY)
target_link_libraries(clickhouse_common_io PUBLIC ${RE2_LIBRARY})
endif()
if(RE2_ST_LIBRARY)
target_link_libraries(clickhouse_common_io PUBLIC ${RE2_ST_LIBRARY})
endif()
target_link_libraries(clickhouse_common_io
PUBLIC
${CITYHASH_LIBRARIES}
PRIVATE
${ZLIB_LIBRARIES}
@ -208,7 +218,9 @@ target_link_libraries (clickhouse_common_io
)
target_include_directories(clickhouse_common_io SYSTEM BEFORE PUBLIC ${RE2_INCLUDE_DIR})
if(RE2_INCLUDE_DIR)
target_include_directories(clickhouse_common_io SYSTEM BEFORE PUBLIC ${RE2_INCLUDE_DIR})
endif()
if (USE_LFALLOC)
target_include_directories (clickhouse_common_io SYSTEM BEFORE PUBLIC ${LFALLOC_INCLUDE_DIR})
@ -336,7 +348,7 @@ if (USE_JEMALLOC)
target_include_directories (dbms SYSTEM BEFORE PRIVATE ${JEMALLOC_INCLUDE_DIR}) # used in Interpreters/AsynchronousMetrics.cpp
endif ()
target_include_directories (dbms PUBLIC ${DBMS_INCLUDE_DIR})
target_include_directories (dbms PUBLIC ${DBMS_INCLUDE_DIR} PRIVATE ${CMAKE_CURRENT_BINARY_DIR}/src/Formats/include)
target_include_directories (clickhouse_common_io PUBLIC ${DBMS_INCLUDE_DIR})
target_include_directories (clickhouse_common_io SYSTEM PUBLIC ${PCG_RANDOM_INCLUDE_DIR})
target_include_directories (clickhouse_common_io SYSTEM BEFORE PUBLIC ${DOUBLE_CONVERSION_INCLUDE_DIR})
@ -356,6 +368,6 @@ if (ENABLE_TESTS AND USE_GTEST)
# attach all dbms gtest sources
grep_gtest_sources(${ClickHouse_SOURCE_DIR}/dbms dbms_gtest_sources)
add_executable(unit_tests_dbms ${dbms_gtest_sources})
target_link_libraries(unit_tests_dbms PRIVATE ${GTEST_BOTH_LIBRARIES} dbms clickhouse_common_zookeeper)
target_link_libraries(unit_tests_dbms PRIVATE ${GTEST_BOTH_LIBRARIES} clickhouse_functions clickhouse_parsers dbms clickhouse_common_zookeeper)
add_check(unit_tests_dbms)
endif ()


@ -1,11 +1,11 @@
# These strings are autochanged from release_lib.sh:
set(VERSION_REVISION 54420)
set(VERSION_REVISION 54423)
set(VERSION_MAJOR 19)
set(VERSION_MINOR 8)
set(VERSION_PATCH 1)
set(VERSION_GITHASH a76e504f45ff4a74e8c492bd269f022352d5f6d9)
set(VERSION_DESCRIBE v19.8.1.1-testing)
set(VERSION_STRING 19.8.1.1)
set(VERSION_MINOR 11)
set(VERSION_PATCH 0)
set(VERSION_GITHASH badb6ab8310ed94e20f43ac9a9a227f7a2590009)
set(VERSION_DESCRIBE v19.11.0-testing)
set(VERSION_STRING 19.11.0)
# end of autochange
set(VERSION_EXTRA "" CACHE STRING "")


@ -209,6 +209,9 @@ else ()
install (FILES ${CMAKE_CURRENT_BINARY_DIR}/clickhouse-obfuscator DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse)
list(APPEND CLICKHOUSE_BUNDLE clickhouse-obfuscator)
endif ()
if(ENABLE_CLICKHOUSE_ODBC_BRIDGE)
list(APPEND CLICKHOUSE_BUNDLE clickhouse-odbc-bridge)
endif()
# install always because the debian package wants these files:
add_custom_target (clickhouse-clang ALL COMMAND ${CMAKE_COMMAND} -E create_symlink clickhouse clickhouse-clang DEPENDS clickhouse)


@ -54,10 +54,10 @@ public:
const String & host_, UInt16 port_, bool secure_, const String & default_database_,
const String & user_, const String & password_, const String & stage,
bool randomize_, size_t max_iterations_, double max_time_,
const String & json_path_, const ConnectionTimeouts & timeouts, const Settings & settings_)
const String & json_path_, const Settings & settings_)
:
concurrency(concurrency_), delay(delay_), queue(concurrency),
connections(concurrency, host_, port_, default_database_, user_, password_, timeouts, "benchmark", Protocol::Compression::Enable, secure_ ? Protocol::Secure::Enable : Protocol::Secure::Disable),
connections(concurrency, host_, port_, default_database_, user_, password_, "benchmark", Protocol::Compression::Enable, secure_ ? Protocol::Secure::Enable : Protocol::Secure::Disable),
randomize(randomize_), max_iterations(max_iterations_), max_time(max_time_),
json_path(json_path_), settings(settings_), global_context(Context::createGlobal()), pool(concurrency)
{
@ -240,7 +240,8 @@ private:
std::uniform_int_distribution<size_t> distribution(0, queries.size() - 1);
for (size_t i = 0; i < concurrency; ++i)
pool.schedule(std::bind(&Benchmark::thread, this, connections.get()));
pool.schedule(std::bind(&Benchmark::thread, this,
connections.get(ConnectionTimeouts::getTCPTimeoutsWithoutFailover(settings))));
InterruptListener interrupt_listener;
info_per_interval.watch.restart();
@ -310,7 +311,9 @@ private:
void execute(ConnectionPool::Entry & connection, Query & query)
{
Stopwatch watch;
RemoteBlockInputStream stream(*connection, query, {}, global_context, &settings, nullptr, Tables(), query_processing_stage);
RemoteBlockInputStream stream(
*connection,
query, {}, global_context, &settings, nullptr, Tables(), query_processing_stage);
Progress progress;
stream.setProgressCallback([&progress](const Progress & value) { progress.incrementPiecewiseAtomically(value); });
@ -325,8 +328,8 @@ private:
double seconds = watch.elapsedSeconds();
std::lock_guard lock(mutex);
info_per_interval.add(seconds, progress.rows, progress.bytes, info.rows, info.bytes);
info_total.add(seconds, progress.rows, progress.bytes, info.rows, info.bytes);
info_per_interval.add(seconds, progress.read_rows, progress.read_bytes, info.rows, info.bytes);
info_total.add(seconds, progress.read_rows, progress.read_bytes, info.rows, info.bytes);
}
@ -485,7 +488,6 @@ int mainEntryClickHouseBenchmark(int argc, char ** argv)
options["iterations"].as<size_t>(),
options["timelimit"].as<double>(),
options["json"].as<std::string>(),
ConnectionTimeouts::getTCPTimeoutsWithoutFailover(settings),
settings);
return benchmark.run();
}


@ -26,8 +26,8 @@ elseif (EXISTS ${INTERNAL_COMPILER_BIN_ROOT}${INTERNAL_COMPILER_EXECUTABLE})
set (COPY_HEADERS_COMPILER "${INTERNAL_COMPILER_BIN_ROOT}${INTERNAL_COMPILER_EXECUTABLE}")
endif ()
if (COPY_HEADERS_COMPILER AND OS_LINUX)
add_custom_target (copy-headers [ -f ${TMP_HEADERS_DIR}/dbms/src/Interpreters/SpecializedAggregator.h ] || env CLANG=${COPY_HEADERS_COMPILER} BUILD_PATH=${ClickHouse_BINARY_DIR} DESTDIR=${ClickHouse_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/copy_headers.sh ${ClickHouse_SOURCE_DIR} ${TMP_HEADERS_DIR} DEPENDS ${COPY_HEADERS_DEPENDS} WORKING_DIRECTORY ${ClickHouse_SOURCE_DIR} SOURCES copy_headers.sh)
if (COPY_HEADERS_COMPILER)
add_custom_target (copy-headers [ -f ${TMP_HEADERS_DIR}/dbms/src/Interpreters/SpecializedAggregator.h ] || env CLANG=${COPY_HEADERS_COMPILER} BUILD_PATH=${ClickHouse_BINARY_DIR} DESTDIR=${ClickHouse_SOURCE_DIR} CMAKE_CXX_COMPILER_VERSION=${CMAKE_CXX_COMPILER_VERSION} ${CMAKE_CURRENT_SOURCE_DIR}/copy_headers.sh ${ClickHouse_SOURCE_DIR} ${TMP_HEADERS_DIR} DEPENDS ${COPY_HEADERS_DEPENDS} WORKING_DIRECTORY ${ClickHouse_SOURCE_DIR} SOURCES copy_headers.sh)
if (USE_INTERNAL_LLVM_LIBRARY)
set (CLANG_HEADERS_DIR "${ClickHouse_SOURCE_DIR}/contrib/llvm/clang/lib/Headers")


@ -38,26 +38,28 @@ for header in $START_HEADERS; do
START_HEADERS_INCLUDE+="-include $header "
done
# The -mcx16 option is used so that more header files get picked up (with some margin).
# The latter options are the same that are added while building packages.
# TODO: Does not work on macos:
GCC_ROOT=`$CLANG -v 2>&1 | grep "Selected GCC installation"| sed -n -e 's/^.*: //p'`
for src_file in $(echo | $CLANG -M -xc++ -std=c++1z -Wall -Werror -msse4 -mcx16 -mpopcnt -O3 -g -fPIC -fstack-protector -D_FORTIFY_SOURCE=2 \
# TODO: Does not work on macos?
GCC_ROOT=${GCC_ROOT:=/usr/lib/clang/${CMAKE_CXX_COMPILER_VERSION}}
# The -mcx16 option is used so that more header files get picked up (with some margin).
# The latter options are the same that are added while building packages.
for src_file in $(echo | $CLANG -M -xc++ -std=c++1z -Wall -Werror -msse2 -msse4 -mcx16 -mpopcnt -O3 -g -fPIC -fstack-protector -D_FORTIFY_SOURCE=2 \
-I $GCC_ROOT/include \
-I $GCC_ROOT/include-fixed \
$(cat "$BUILD_PATH/include_directories.txt") \
$START_HEADERS_INCLUDE \
- |
tr -d '\\' |
sed --posix -E -e 's/^-\.o://');
sed -E -e 's/^-\.o://');
do
dst_file=$src_file;
[ -n $BUILD_PATH ] && dst_file=$(echo $dst_file | sed --posix -E -e "s!^$BUILD_PATH!!")
[ -n $DESTDIR ] && dst_file=$(echo $dst_file | sed --posix -E -e "s!^$DESTDIR!!")
dst_file=$(echo $dst_file | sed --posix -E -e 's/build\///') # for simplicity reasons, will put generated headers near the rest.
mkdir -p "$DST/$(echo $dst_file | sed --posix -E -e 's/\/[^/]*$/\//')";
[ -n $BUILD_PATH ] && dst_file=$(echo $dst_file | sed -E -e "s!^$BUILD_PATH!!")
[ -n $DESTDIR ] && dst_file=$(echo $dst_file | sed -E -e "s!^$DESTDIR!!")
dst_file=$(echo $dst_file | sed -E -e 's/build\///') # for simplicity reasons, will put generated headers near the rest.
mkdir -p "$DST/$(echo $dst_file | sed -E -e 's/\/[^/]*$/\//')";
cp "$src_file" "$DST/$dst_file";
done
@ -68,9 +70,9 @@ done
for src_file in $(ls -1 $($CLANG -v -xc++ - <<<'' 2>&1 | grep '^ /' | grep 'include' | grep -E '/lib/clang/|/include/clang/')/*.h | grep -vE 'arm|altivec|Intrin');
do
dst_file=$src_file;
[ -n $BUILD_PATH ] && dst_file=$(echo $dst_file | sed --posix -E -e "s!^$BUILD_PATH!!")
[ -n $DESTDIR ] && dst_file=$(echo $dst_file | sed --posix -E -e "s!^$DESTDIR!!")
mkdir -p "$DST/$(echo $dst_file | sed --posix -E -e 's/\/[^/]*$/\//')";
[ -n $BUILD_PATH ] && dst_file=$(echo $dst_file | sed -E -e "s!^$BUILD_PATH!!")
[ -n $DESTDIR ] && dst_file=$(echo $dst_file | sed -E -e "s!^$DESTDIR!!")
mkdir -p "$DST/$(echo $dst_file | sed -E -e 's/\/[^/]*$/\//')";
cp "$src_file" "$DST/$dst_file";
done
@ -79,9 +81,9 @@ if [ -d "$SOURCE_PATH/contrib/boost/libs/smart_ptr/include/boost/smart_ptr/detai
for src_file in $(ls -1 $SOURCE_PATH/contrib/boost/libs/smart_ptr/include/boost/smart_ptr/detail/*);
do
dst_file=$src_file;
[ -n $BUILD_PATH ] && dst_file=$(echo $dst_file | sed --posix -E -e "s!^$BUILD_PATH!!")
[ -n $DESTDIR ] && dst_file=$(echo $dst_file | sed --posix -E -e "s!^$DESTDIR!!")
mkdir -p "$DST/$(echo $dst_file | sed --posix -E -e 's/\/[^/]*$/\//')";
[ -n $BUILD_PATH ] && dst_file=$(echo $dst_file | sed -E -e "s!^$BUILD_PATH!!")
[ -n $DESTDIR ] && dst_file=$(echo $dst_file | sed -E -e "s!^$DESTDIR!!")
mkdir -p "$DST/$(echo $dst_file | sed -E -e 's/\/[^/]*$/\//')";
cp "$src_file" "$DST/$dst_file";
done
fi
@ -90,9 +92,9 @@ if [ -d "$SOURCE_PATH/contrib/boost/boost/smart_ptr/detail" ]; then
for src_file in $(ls -1 $SOURCE_PATH/contrib/boost/boost/smart_ptr/detail/*);
do
dst_file=$src_file;
[ -n $BUILD_PATH ] && dst_file=$(echo $dst_file | sed --posix -E -e "s!^$BUILD_PATH!!")
[ -n $DESTDIR ] && dst_file=$(echo $dst_file | sed --posix -E -e "s!^$DESTDIR!!")
mkdir -p "$DST/$(echo $dst_file | sed --posix -E -e 's/\/[^/]*$/\//')";
[ -n $BUILD_PATH ] && dst_file=$(echo $dst_file | sed -E -e "s!^$BUILD_PATH!!")
[ -n $DESTDIR ] && dst_file=$(echo $dst_file | sed -E -e "s!^$DESTDIR!!")
mkdir -p "$DST/$(echo $dst_file | sed -E -e 's/\/[^/]*$/\//')";
cp "$src_file" "$DST/$dst_file";
done
fi


@ -1,6 +1,19 @@
set(CLICKHOUSE_CLIENT_SOURCES ${CMAKE_CURRENT_SOURCE_DIR}/Client.cpp)
set(CLICKHOUSE_CLIENT_SOURCES
${CMAKE_CURRENT_SOURCE_DIR}/Client.cpp
${CMAKE_CURRENT_SOURCE_DIR}/ConnectionParameters.cpp
)
set(CLICKHOUSE_CLIENT_LINK PRIVATE clickhouse_common_config clickhouse_functions clickhouse_aggregate_functions clickhouse_common_io ${LINE_EDITING_LIBS} ${Boost_PROGRAM_OPTIONS_LIBRARY})
set(CLICKHOUSE_CLIENT_INCLUDE SYSTEM PRIVATE ${READLINE_INCLUDE_DIR})
set(CLICKHOUSE_CLIENT_INCLUDE SYSTEM PRIVATE ${READLINE_INCLUDE_DIR} PRIVATE ${CMAKE_CURRENT_BINARY_DIR}/include)
include(CheckSymbolExists)
check_symbol_exists(readpassphrase readpassphrase.h HAVE_READPASSPHRASE)
configure_file(config_client.h.in ${CMAKE_CURRENT_BINARY_DIR}/include/config_client.h)
if(NOT HAVE_READPASSPHRASE)
add_subdirectory(readpassphrase)
list(APPEND CLICKHOUSE_CLIENT_LINK PRIVATE readpassphrase)
endif()
clickhouse_program_add(client)


@ -65,9 +65,10 @@
#include <AggregateFunctions/registerAggregateFunctions.h>
#include <Common/Config/configReadClient.h>
#include <Storages/ColumnsDescription.h>
#include <common/argsToConfig.h>
#if USE_READLINE
#include "Suggest.h" // Y_IGNORE
#include "Suggest.h"
#endif
#ifndef __clang__
@ -203,7 +204,6 @@ private:
ConnectionParameters connection_parameters;
void initialize(Poco::Util::Application & self)
{
Poco::Util::Application::initialize(self);
@ -336,7 +336,7 @@ private:
DateLUT::instance();
if (!context.getSettingsRef().use_client_time_zone)
{
const auto & time_zone = connection->getServerTimezone();
const auto & time_zone = connection->getServerTimezone(connection_parameters.timeouts);
if (!time_zone.empty())
{
try
@ -435,7 +435,7 @@ private:
#if USE_READLINE
int res = read_history(history_file.c_str());
if (res)
throwFromErrno("Cannot read history from file " + history_file, ErrorCodes::CANNOT_READ_HISTORY);
std::cerr << "Cannot read history from file " + history_file + ": "+ errnoToString(ErrorCodes::CANNOT_READ_HISTORY);
#endif
}
else /// Create history file.
@ -520,7 +520,6 @@ private:
connection_parameters.default_database,
connection_parameters.user,
connection_parameters.password,
connection_parameters.timeouts,
"client",
connection_parameters.compression,
connection_parameters.security);
@ -536,11 +535,14 @@ private:
connection->setThrottler(throttler);
}
connection->getServerVersion(server_name, server_version_major, server_version_minor, server_version_patch, server_revision);
connection->getServerVersion(connection_parameters.timeouts,
server_name, server_version_major, server_version_minor, server_version_patch, server_revision);
server_version = toString(server_version_major) + "." + toString(server_version_minor) + "." + toString(server_version_patch);
if (server_display_name = connection->getServerDisplayName(); server_display_name.length() == 0)
if (
server_display_name = connection->getServerDisplayName(connection_parameters.timeouts);
server_display_name.length() == 0)
{
server_display_name = config().getString("host", "localhost");
}
@ -612,7 +614,7 @@ private:
#if USE_READLINE && HAVE_READLINE_HISTORY
if (!history_file.empty() && append_history(1, history_file.c_str()))
throwFromErrno("Cannot append history to file " + history_file, ErrorCodes::CANNOT_APPEND_HISTORY);
std::cerr << "Cannot append history to file " + history_file + ": " + errnoToString(ErrorCodes::CANNOT_APPEND_HISTORY);
#endif
prev_input = input;
@ -751,7 +753,7 @@ private:
}
if (!test_hint.checkActual(actual_server_error, actual_client_error, got_exception, last_exception))
connection->forceConnected();
connection->forceConnected(connection_parameters.timeouts);
if (got_exception && !ignore_error)
{
@ -827,7 +829,7 @@ private:
if (with_output && with_output->settings_ast)
apply_query_settings(*with_output->settings_ast);
connection->forceConnected();
connection->forceConnected(connection_parameters.timeouts);
/// INSERT query for which data transfer is needed (not an INSERT SELECT) is processed separately.
if (insert && !insert->select)
@ -866,7 +868,7 @@ private:
std::cout << std::endl
<< processed_rows << " rows in set. Elapsed: " << watch.elapsedSeconds() << " sec. ";
if (progress.rows >= 1000)
if (progress.read_rows >= 1000)
writeFinalProgress();
std::cout << std::endl << std::endl;
@ -898,7 +900,7 @@ private:
/// Process the query that doesn't require transferring data blocks to the server.
void processOrdinaryQuery()
{
connection->sendQuery(query, query_id, QueryProcessingStage::Complete, &context.getSettingsRef(), nullptr, true);
connection->sendQuery(connection_parameters.timeouts, query, query_id, QueryProcessingStage::Complete, &context.getSettingsRef(), nullptr, true);
sendExternalTables();
receiveResult();
}
@ -916,7 +918,7 @@ private:
if (!parsed_insert_query.data && (is_interactive || (stdin_is_not_tty && std_in.eof())))
throw Exception("No data to insert", ErrorCodes::NO_DATA_TO_INSERT);
connection->sendQuery(query_without_data, query_id, QueryProcessingStage::Complete, &context.getSettingsRef(), nullptr, true);
connection->sendQuery(connection_parameters.timeouts, query_without_data, query_id, QueryProcessingStage::Complete, &context.getSettingsRef(), nullptr, true);
sendExternalTables();
/// Receive description of table structure.
@ -1063,7 +1065,7 @@ private:
bool cancelled = false;
// TODO: get the poll_interval from commandline.
const auto receive_timeout = connection->getTimeouts().receive_timeout;
const auto receive_timeout = connection_parameters.timeouts.receive_timeout;
constexpr size_t default_poll_interval = 1000000; /// in microseconds
constexpr size_t min_poll_interval = 5000; /// in microseconds
const size_t poll_interval
@ -1420,23 +1422,23 @@ private:
<< " Progress: ";
message
<< formatReadableQuantity(progress.rows) << " rows, "
<< formatReadableSizeWithDecimalSuffix(progress.bytes);
<< formatReadableQuantity(progress.read_rows) << " rows, "
<< formatReadableSizeWithDecimalSuffix(progress.read_bytes);
size_t elapsed_ns = watch.elapsed();
if (elapsed_ns)
message << " ("
<< formatReadableQuantity(progress.rows * 1000000000.0 / elapsed_ns) << " rows/s., "
<< formatReadableSizeWithDecimalSuffix(progress.bytes * 1000000000.0 / elapsed_ns) << "/s.) ";
<< formatReadableQuantity(progress.read_rows * 1000000000.0 / elapsed_ns) << " rows/s., "
<< formatReadableSizeWithDecimalSuffix(progress.read_bytes * 1000000000.0 / elapsed_ns) << "/s.) ";
else
message << ". ";
written_progress_chars = message.count() - prefix_size - (increment % 8 == 7 ? 10 : 13); /// Don't count invisible output (escape sequences).
/// If the approximate number of rows to process is known, we can display a progress bar and percentage.
if (progress.total_rows > 0)
if (progress.total_rows_to_read > 0)
{
size_t total_rows_corrected = std::max(progress.rows, progress.total_rows);
size_t total_rows_corrected = std::max(progress.read_rows, progress.total_rows_to_read);
/// To avoid flicker, display progress bar only if .5 seconds have passed since query execution start
/// and the query is less than halfway done.
@ -1444,7 +1446,7 @@ private:
if (elapsed_ns > 500000000)
{
/// Trigger to start displaying progress bar. If query is mostly done, don't display it.
if (progress.rows * 2 < total_rows_corrected)
if (progress.read_rows * 2 < total_rows_corrected)
show_progress_bar = true;
if (show_progress_bar)
@ -1452,7 +1454,7 @@ private:
ssize_t width_of_progress_bar = static_cast<ssize_t>(terminal_size.ws_col) - written_progress_chars - strlen(" 99%");
if (width_of_progress_bar > 0)
{
std::string bar = UnicodeBar::render(UnicodeBar::getWidth(progress.rows, 0, total_rows_corrected, width_of_progress_bar));
std::string bar = UnicodeBar::render(UnicodeBar::getWidth(progress.read_rows, 0, total_rows_corrected, width_of_progress_bar));
message << "\033[0;32m" << bar << "\033[0m";
if (width_of_progress_bar > static_cast<ssize_t>(bar.size() / UNICODE_BAR_CHAR_SIZE))
message << std::string(width_of_progress_bar - bar.size() / UNICODE_BAR_CHAR_SIZE, ' ');
@ -1461,7 +1463,7 @@ private:
}
/// Underestimate percentage a bit to avoid displaying 100%.
message << ' ' << (99 * progress.rows / total_rows_corrected) << '%';
message << ' ' << (99 * progress.read_rows / total_rows_corrected) << '%';
}
message << ENABLE_LINE_WRAPPING;
@ -1474,14 +1476,14 @@ private:
void writeFinalProgress()
{
std::cout << "Processed "
<< formatReadableQuantity(progress.rows) << " rows, "
<< formatReadableSizeWithDecimalSuffix(progress.bytes);
<< formatReadableQuantity(progress.read_rows) << " rows, "
<< formatReadableSizeWithDecimalSuffix(progress.read_bytes);
size_t elapsed_ns = watch.elapsed();
if (elapsed_ns)
std::cout << " ("
<< formatReadableQuantity(progress.rows * 1000000000.0 / elapsed_ns) << " rows/s., "
<< formatReadableSizeWithDecimalSuffix(progress.bytes * 1000000000.0 / elapsed_ns) << "/s.) ";
<< formatReadableQuantity(progress.read_rows * 1000000000.0 / elapsed_ns) << " rows/s., "
<< formatReadableSizeWithDecimalSuffix(progress.read_bytes * 1000000000.0 / elapsed_ns) << "/s.) ";
else
std::cout << ". ";
}
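Since this commit renames the Progress counters from rows/bytes to read_rows/read_bytes, every throughput computation in the client follows the same per-second scaling; a standalone illustration (ProgressSnapshot is a made-up stand-in for the renamed fields):

#include <cstdint>
#include <iostream>

/// Made-up mirror of the renamed Progress members, for illustration only.
struct ProgressSnapshot { uint64_t read_rows = 0; uint64_t read_bytes = 0; };

void printThroughput(const ProgressSnapshot & p, uint64_t elapsed_ns)
{
    if (!elapsed_ns)
        return;
    /// Same arithmetic as writeFinalProgress: scale raw counters to per-second rates.
    std::cout << (p.read_rows * 1000000000.0 / elapsed_ns) << " rows/s., "
              << (p.read_bytes * 1000000000.0 / elapsed_ns) << " bytes/s.\n";
}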
@ -1549,7 +1551,7 @@ public:
* where possible args are file, name, format, structure, types.
* Split these groups before processing.
*/
using Arguments = std::vector<const char *>;
using Arguments = std::vector<std::string>;
Arguments common_arguments{""}; /// 0th argument is ignored.
std::vector<Arguments> external_tables_arguments;
@ -1671,8 +1673,7 @@ public:
("types", po::value<std::string>(), "types")
;
/// Parse main commandline options.
po::parsed_options parsed = po::command_line_parser(
common_arguments.size(), common_arguments.data()).options(main_description).run();
po::parsed_options parsed = po::command_line_parser(common_arguments).options(main_description).run();
po::variables_map options;
po::store(parsed, options);
po::notify(options);
@ -1705,8 +1706,7 @@ public:
for (size_t i = 0; i < external_tables_arguments.size(); ++i)
{
/// Parse commandline options related to external tables.
po::parsed_options parsed_tables = po::command_line_parser(
external_tables_arguments[i].size(), external_tables_arguments[i].data()).options(external_description).run();
po::parsed_options parsed_tables = po::command_line_parser(external_tables_arguments[i]).options(external_description).run();
po::variables_map external_options;
po::store(parsed_tables, external_options);
@ -1802,6 +1802,9 @@ public:
}
if (options.count("suggestion_limit"))
config().setInt("suggestion_limit", options["suggestion_limit"].as<int>());
argsToConfig(common_arguments, config(), 100);
}
};
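The switch of Arguments from std::vector<const char *> to std::vector<std::string> works because boost::program_options::command_line_parser has an overload taking a vector of strings directly, with no argc/argv pointer juggling. A self-contained sketch (the option names here are illustrative, not the client's real set):

#include <boost/program_options.hpp>
#include <iostream>
#include <string>
#include <vector>

namespace po = boost::program_options;

int main()
{
    /// Unlike the argc/argv overload, the vector form has no implicit
    /// program-name slot: every element is parsed as an argument.
    std::vector<std::string> args{"--host=localhost", "--port=9000"};

    po::options_description desc("options");
    desc.add_options()
        ("host", po::value<std::string>(), "server host")
        ("port", po::value<int>(), "server port");

    po::variables_map vm;
    po::store(po::command_line_parser(args).options(desc).run(), vm);
    po::notify(vm);

    std::cout << vm["host"].as<std::string>() << ":" << vm["port"].as<int>() << "\n";
}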


@ -0,0 +1,63 @@
#include "ConnectionParameters.h"
#include <fstream>
#include <iostream>
#include <Core/Defines.h>
#include <Core/Protocol.h>
#include <Core/Types.h>
#include <IO/ConnectionTimeouts.h>
#include <Poco/Util/AbstractConfiguration.h>
#include <Common/Exception.h>
#include <common/setTerminalEcho.h>
#include <ext/scope_guard.h>
#include <readpassphrase.h>
namespace DB
{
namespace ErrorCodes
{
extern const int BAD_ARGUMENTS;
}
ConnectionParameters::ConnectionParameters(const Poco::Util::AbstractConfiguration & config)
{
bool is_secure = config.getBool("secure", false);
security = is_secure ? Protocol::Secure::Enable : Protocol::Secure::Disable;
host = config.getString("host", "localhost");
port = config.getInt(
"port", config.getInt(is_secure ? "tcp_port_secure" : "tcp_port", is_secure ? DBMS_DEFAULT_SECURE_PORT : DBMS_DEFAULT_PORT));
default_database = config.getString("database", "");
/// changed the default value to "default" to fix the issue when the user in the prompt is blank
user = config.getString("user", "default");
bool password_prompt = false;
if (config.getBool("ask-password", false))
{
if (config.has("password"))
throw Exception("Specified both --password and --ask-password. Remove one of them", ErrorCodes::BAD_ARGUMENTS);
password_prompt = true;
}
else
{
password = config.getString("password", "");
/// if the value of --password is omitted, the password will be set implicitly to "\n"
if (password == "\n")
password_prompt = true;
}
if (password_prompt)
{
std::string prompt{"Password for user (" + user + "): "};
char buf[1000] = {};
if (auto result = readpassphrase(prompt.c_str(), buf, sizeof(buf), 0))
password = result;
}
compression = config.getBool("compression", true) ? Protocol::Compression::Enable : Protocol::Compression::Disable;
timeouts = ConnectionTimeouts(
Poco::Timespan(config.getInt("connect_timeout", DBMS_DEFAULT_CONNECT_TIMEOUT_SEC), 0),
Poco::Timespan(config.getInt("send_timeout", DBMS_DEFAULT_SEND_TIMEOUT_SEC), 0),
Poco::Timespan(config.getInt("receive_timeout", DBMS_DEFAULT_RECEIVE_TIMEOUT_SEC), 0),
Poco::Timespan(config.getInt("tcp_keep_alive_timeout", 0), 0));
}
}
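For reference on the new prompt: readpassphrase() reads from /dev/tty with echo disabled by default (flags = 0 means RPP_ECHO_OFF), NUL-terminates the buffer, and returns it on success or NULL on error, so the result can be assigned directly. A sketch assuming the bundled readpassphrase.h; askPassword is a hypothetical wrapper:

#include <readpassphrase.h>  /// bundled compat header; a system header on *BSD
#include <cstddef>
#include <string>

/// Hypothetical wrapper around the prompt logic above.
std::string askPassword(const std::string & user)
{
    std::string prompt = "Password for user (" + user + "): ";
    char buf[1000] = {};
    /// Flags = 0 (RPP_ECHO_OFF): prompt on the tty, read with echo turned off.
    if (const char * result = readpassphrase(prompt.c_str(), buf, sizeof(buf), 0))
        return result;
    return {};  /// error (e.g. no tty available); caller treats it as empty
}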


@ -1,90 +1,30 @@
#pragma once
#include <iostream>
#include <Core/Types.h>
#include <string>
#include <Core/Protocol.h>
#include <Core/Defines.h>
#include <Common/Exception.h>
#include <IO/ConnectionTimeouts.h>
#include <common/setTerminalEcho.h>
#include <ext/scope_guard.h>
#include <Poco/Util/AbstractConfiguration.h>
namespace Poco::Util
{
class AbstractConfiguration;
}
namespace DB
{
namespace ErrorCodes
{
extern const int BAD_ARGUMENTS;
}
struct ConnectionParameters
{
String host;
std::string host;
UInt16 port{};
String default_database;
String user;
String password;
std::string default_database;
std::string user;
std::string password;
Protocol::Secure security = Protocol::Secure::Disable;
Protocol::Compression compression = Protocol::Compression::Enable;
ConnectionTimeouts timeouts;
ConnectionParameters() {}
ConnectionParameters(const Poco::Util::AbstractConfiguration & config)
{
bool is_secure = config.getBool("secure", false);
security = is_secure
? Protocol::Secure::Enable
: Protocol::Secure::Disable;
host = config.getString("host", "localhost");
port = config.getInt("port",
config.getInt(is_secure ? "tcp_port_secure" : "tcp_port",
is_secure ? DBMS_DEFAULT_SECURE_PORT : DBMS_DEFAULT_PORT));
default_database = config.getString("database", "");
/// changed the default value to "default" to fix the issue when the user in the prompt is blank
user = config.getString("user", "default");
bool password_prompt = false;
if (config.getBool("ask-password", false))
{
if (config.has("password"))
throw Exception("Specified both --password and --ask-password. Remove one of them", ErrorCodes::BAD_ARGUMENTS);
password_prompt = true;
}
else
{
password = config.getString("password", "");
/// if the value of --password is omitted, the password will be set implicitly to "\n"
if (password == "\n")
password_prompt = true;
}
if (password_prompt)
{
std::cout << "Password for user (" << user << "): ";
setTerminalEcho(false);
SCOPE_EXIT({
setTerminalEcho(true);
});
std::getline(std::cin, password);
std::cout << std::endl;
}
compression = config.getBool("compression", true)
? Protocol::Compression::Enable
: Protocol::Compression::Disable;
timeouts = ConnectionTimeouts(
Poco::Timespan(config.getInt("connect_timeout", DBMS_DEFAULT_CONNECT_TIMEOUT_SEC), 0),
Poco::Timespan(config.getInt("send_timeout", DBMS_DEFAULT_SEND_TIMEOUT_SEC), 0),
Poco::Timespan(config.getInt("receive_timeout", DBMS_DEFAULT_RECEIVE_TIMEOUT_SEC), 0),
Poco::Timespan(config.getInt("tcp_keep_alive_timeout", 0), 0));
}
ConnectionParameters(const Poco::Util::AbstractConfiguration & config);
};
}


@ -14,6 +14,7 @@
#include <Common/typeid_cast.h>
#include <Columns/ColumnString.h>
#include <Client/Connection.h>
#include <IO/ConnectionTimeouts.h>
namespace DB
@ -42,7 +43,7 @@ private:
"KILL", "QUERY", "SYNC", "ASYNC", "TEST", "BETWEEN", "TRUNCATE"
};
/// Words are fetched asynchonously.
/// Words are fetched asynchronously.
std::thread loading_thread;
std::atomic<bool> ready{false};
@ -71,7 +72,7 @@ private:
return word;
}
void loadImpl(Connection & connection, size_t suggestion_limit)
void loadImpl(Connection & connection, const ConnectionTimeouts & timeouts, size_t suggestion_limit)
{
std::stringstream query;
query << "SELECT DISTINCT arrayJoin(extractAll(name, '[\\\\w_]{2,}')) AS res FROM ("
@ -104,12 +105,12 @@ private:
query << ") WHERE notEmpty(res)";
fetch(connection, query.str());
fetch(connection, timeouts, query.str());
}
void fetch(Connection & connection, const std::string & query)
void fetch(Connection & connection, const ConnectionTimeouts & timeouts, const std::string & query)
{
connection.sendQuery(query);
connection.sendQuery(timeouts, query);
while (true)
{
@ -175,12 +176,11 @@ public:
connection_parameters.default_database,
connection_parameters.user,
connection_parameters.password,
connection_parameters.timeouts,
"client",
connection_parameters.compression,
connection_parameters.security);
loadImpl(connection, suggestion_limit);
loadImpl(connection, connection_parameters.timeouts, suggestion_limit);
}
catch (...)
{


@ -0,0 +1,3 @@
#pragma once
#cmakedefine HAVE_READPASSPHRASE


@ -0,0 +1,10 @@
# wget https://raw.githubusercontent.com/openssh/openssh-portable/master/openbsd-compat/readpassphrase.c
# wget https://raw.githubusercontent.com/openssh/openssh-portable/master/openbsd-compat/readpassphrase.h
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-unused-result -Wno-reserved-id-macro")
configure_file(includes.h.in ${CMAKE_CURRENT_BINARY_DIR}/include/includes.h)
add_library(readpassphrase ${CMAKE_CURRENT_SOURCE_DIR}/readpassphrase.c)
# . to allow #include <readpassphrase.h>
target_include_directories(readpassphrase PUBLIC . ${CMAKE_CURRENT_BINARY_DIR}/include ${CMAKE_CURRENT_BINARY_DIR}/../include)


@ -0,0 +1,9 @@
#pragma once
#cmakedefine HAVE_READPASSPHRASE
#if !defined(HAVE_READPASSPHRASE)
# ifndef _PATH_TTY
# define _PATH_TTY "/dev/tty"
# endif
#endif


@ -0,0 +1,211 @@
/* $OpenBSD: readpassphrase.c,v 1.26 2016/10/18 12:47:18 millert Exp $ */
/*
* Copyright (c) 2000-2002, 2007, 2010
* Todd C. Miller <Todd.Miller@courtesan.com>
*
* Permission to use, copy, modify, and distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*
* Sponsored in part by the Defense Advanced Research Projects
* Agency (DARPA) and Air Force Research Laboratory, Air Force
* Materiel Command, USAF, under agreement number F39502-99-1-0512.
*/
/* OPENBSD ORIGINAL: lib/libc/gen/readpassphrase.c */
#include "includes.h"
#ifndef HAVE_READPASSPHRASE
#include <termios.h>
#include <signal.h>
#include <ctype.h>
#include <fcntl.h>
#include <readpassphrase.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#ifndef TCSASOFT
/* If we don't have TCSASOFT define it so that ORing it it below is a no-op. */
# define TCSASOFT 0
#endif
/* SunOS 4.x which lacks _POSIX_VDISABLE, but has VDISABLE */
#if !defined(_POSIX_VDISABLE) && defined(VDISABLE)
# define _POSIX_VDISABLE VDISABLE
#endif
static volatile sig_atomic_t signo[_NSIG];
static void handler(int);
char *
readpassphrase(const char *prompt, char *buf, size_t bufsiz, int flags)
{
ssize_t nr;
int input, output, save_errno, i, need_restart;
char ch, *p, *end;
struct termios term, oterm;
struct sigaction sa, savealrm, saveint, savehup, savequit, saveterm;
struct sigaction savetstp, savettin, savettou, savepipe;
/* I suppose we could alloc on demand in this case (XXX). */
if (bufsiz == 0) {
errno = EINVAL;
return(NULL);
}
restart:
for (i = 0; i < _NSIG; i++)
signo[i] = 0;
nr = -1;
save_errno = 0;
need_restart = 0;
/*
* Read and write to /dev/tty if available. If not, read from
* stdin and write to stderr unless a tty is required.
*/
if ((flags & RPP_STDIN) ||
(input = output = open(_PATH_TTY, O_RDWR)) == -1) {
if (flags & RPP_REQUIRE_TTY) {
errno = ENOTTY;
return(NULL);
}
input = STDIN_FILENO;
output = STDERR_FILENO;
}
/*
* Turn off echo if possible.
* If we are using a tty but are not the foreground pgrp this will
* generate SIGTTOU, so do it *before* installing the signal handlers.
*/
if (input != STDIN_FILENO && tcgetattr(input, &oterm) == 0) {
memcpy(&term, &oterm, sizeof(term));
if (!(flags & RPP_ECHO_ON))
term.c_lflag &= ~(ECHO | ECHONL);
#ifdef VSTATUS
if (term.c_cc[VSTATUS] != _POSIX_VDISABLE)
term.c_cc[VSTATUS] = _POSIX_VDISABLE;
#endif
(void)tcsetattr(input, TCSAFLUSH|TCSASOFT, &term);
} else {
memset(&term, 0, sizeof(term));
term.c_lflag |= ECHO;
memset(&oterm, 0, sizeof(oterm));
oterm.c_lflag |= ECHO;
}
/*
* Catch signals that would otherwise cause the user to end
* up with echo turned off in the shell. Don't worry about
* things like SIGXCPU and SIGVTALRM for now.
*/
sigemptyset(&sa.sa_mask);
sa.sa_flags = 0; /* don't restart system calls */
sa.sa_handler = handler;
(void)sigaction(SIGALRM, &sa, &savealrm);
(void)sigaction(SIGHUP, &sa, &savehup);
(void)sigaction(SIGINT, &sa, &saveint);
(void)sigaction(SIGPIPE, &sa, &savepipe);
(void)sigaction(SIGQUIT, &sa, &savequit);
(void)sigaction(SIGTERM, &sa, &saveterm);
(void)sigaction(SIGTSTP, &sa, &savetstp);
(void)sigaction(SIGTTIN, &sa, &savettin);
(void)sigaction(SIGTTOU, &sa, &savettou);
if (!(flags & RPP_STDIN))
(void)write(output, prompt, strlen(prompt));
end = buf + bufsiz - 1;
p = buf;
while ((nr = read(input, &ch, 1)) == 1 && ch != '\n' && ch != '\r') {
if (p < end) {
if ((flags & RPP_SEVENBIT))
ch &= 0x7f;
if (isalpha((unsigned char)ch)) {
if ((flags & RPP_FORCELOWER))
ch = (char)tolower((unsigned char)ch);
if ((flags & RPP_FORCEUPPER))
ch = (char)toupper((unsigned char)ch);
}
*p++ = ch;
}
}
*p = '\0';
save_errno = errno;
if (!(term.c_lflag & ECHO))
(void)write(output, "\n", 1);
/* Restore old terminal settings and signals. */
if (memcmp(&term, &oterm, sizeof(term)) != 0) {
const int sigttou = signo[SIGTTOU];
/* Ignore SIGTTOU generated when we are not the fg pgrp. */
while (tcsetattr(input, TCSAFLUSH|TCSASOFT, &oterm) == -1 &&
errno == EINTR && !signo[SIGTTOU])
continue;
signo[SIGTTOU] = sigttou;
}
(void)sigaction(SIGALRM, &savealrm, NULL);
(void)sigaction(SIGHUP, &savehup, NULL);
(void)sigaction(SIGINT, &saveint, NULL);
(void)sigaction(SIGQUIT, &savequit, NULL);
(void)sigaction(SIGPIPE, &savepipe, NULL);
(void)sigaction(SIGTERM, &saveterm, NULL);
(void)sigaction(SIGTSTP, &savetstp, NULL);
(void)sigaction(SIGTTIN, &savettin, NULL);
(void)sigaction(SIGTTOU, &savettou, NULL);
if (input != STDIN_FILENO)
(void)close(input);
/*
* If we were interrupted by a signal, resend it to ourselves
* now that we have restored the signal handlers.
*/
for (i = 0; i < _NSIG; i++) {
if (signo[i]) {
kill(getpid(), i);
switch (i) {
case SIGTSTP:
case SIGTTIN:
case SIGTTOU:
need_restart = 1;
}
}
}
if (need_restart)
goto restart;
if (save_errno)
errno = save_errno;
return(nr == -1 ? NULL : buf);
}
//DEF_WEAK(readpassphrase);
#if 0
char *
getpass(const char *prompt)
{
static char buf[_PASSWORD_LEN + 1];
return(readpassphrase(prompt, buf, sizeof(buf), RPP_ECHO_OFF));
}
#endif
static void handler(int s)
{
signo[s] = 1;
}
#endif /* HAVE_READPASSPHRASE */


@ -0,0 +1,56 @@
// /* $OpenBSD: readpassphrase.h,v 1.5 2003/06/17 21:56:23 millert Exp $ */
/*
* Copyright (c) 2000, 2002 Todd C. Miller <Todd.Miller@courtesan.com>
*
* Permission to use, copy, modify, and distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*
* Sponsored in part by the Defense Advanced Research Projects
* Agency (DARPA) and Air Force Research Laboratory, Air Force
* Materiel Command, USAF, under agreement number F39502-99-1-0512.
*/
/* OPENBSD ORIGINAL: include/readpassphrase.h */
#pragma once
// #ifndef _READPASSPHRASE_H_
// #define _READPASSPHRASE_H_
//#include "includes.h"
#include "config_client.h"
#ifndef HAVE_READPASSPHRASE
# ifdef __cplusplus
extern "C" {
# endif
# define RPP_ECHO_OFF 0x00 /* Turn off echo (default). */
# define RPP_ECHO_ON 0x01 /* Leave echo on. */
# define RPP_REQUIRE_TTY 0x02 /* Fail if there is no tty. */
# define RPP_FORCELOWER 0x04 /* Force input to lower case. */
# define RPP_FORCEUPPER 0x08 /* Force input to upper case. */
# define RPP_SEVENBIT 0x10 /* Strip the high bit from input. */
# define RPP_STDIN 0x20 /* Read from stdin, not /dev/tty */
char * readpassphrase(const char *, char *, size_t, int);
# ifdef __cplusplus
}
# endif
#endif /* HAVE_READPASSPHRASE */
// #endif /* !_READPASSPHRASE_H_ */


@ -1,5 +1,5 @@
set(CLICKHOUSE_COPIER_SOURCES ${CMAKE_CURRENT_SOURCE_DIR}/ClusterCopier.cpp)
set(CLICKHOUSE_COPIER_LINK PRIVATE clickhouse_functions clickhouse_table_functions clickhouse_aggregate_functions PUBLIC daemon)
set(CLICKHOUSE_COPIER_LINK PRIVATE clickhouse_functions clickhouse_table_functions clickhouse_aggregate_functions clickhouse_dictionaries PUBLIC daemon)
set(CLICKHOUSE_COPIER_INCLUDE SYSTEM PRIVATE ${PCG_RANDOM_INCLUDE_DIR})
clickhouse_program_add(copier)


@ -16,7 +16,6 @@
#include <pcg_random.hpp>
#include <common/logger_useful.h>
#include <Common/ThreadPool.h>
#include <daemon/OwnPatternFormatter.h>
#include <Common/Exception.h>
#include <Common/ZooKeeper/ZooKeeper.h>
#include <Common/ZooKeeper/KeeperException.h>
@ -55,6 +54,7 @@
#include <DataStreams/AsynchronousBlockInputStream.h>
#include <DataStreams/copyData.h>
#include <DataStreams/NullBlockOutputStream.h>
#include <IO/ConnectionTimeouts.h>
#include <IO/Operators.h>
#include <IO/ReadBufferFromString.h>
#include <IO/ReadBufferFromFile.h>
@ -63,6 +63,7 @@
#include <AggregateFunctions/registerAggregateFunctions.h>
#include <Storages/registerStorages.h>
#include <Storages/StorageDistributed.h>
#include <Dictionaries/registerDictionaries.h>
#include <Databases/DatabaseMemory.h>
#include <Common/StatusFile.h>
@ -798,13 +799,13 @@ public:
}
void discoverShardPartitions(const TaskShardPtr & task_shard)
void discoverShardPartitions(const ConnectionTimeouts & timeouts, const TaskShardPtr & task_shard)
{
TaskTable & task_table = task_shard->task_table;
LOG_INFO(log, "Discover partitions of shard " << task_shard->getDescription());
auto get_partitions = [&] () { return getShardPartitions(*task_shard); };
auto get_partitions = [&] () { return getShardPartitions(timeouts, *task_shard); };
auto existing_partitions_names = retry(get_partitions, 60);
Strings filtered_partitions_names;
Strings missing_partitions;
@ -880,14 +881,14 @@ public:
}
/// Compute set of partitions, assume set of partitions aren't changed during the processing
void discoverTablePartitions(TaskTable & task_table, UInt64 num_threads = 0)
void discoverTablePartitions(const ConnectionTimeouts & timeouts, TaskTable & task_table, UInt64 num_threads = 0)
{
/// Fetch partitions list from a shard
{
ThreadPool thread_pool(num_threads ? num_threads : 2 * getNumberOfPhysicalCPUCores());
for (const TaskShardPtr & task_shard : task_table.all_shards)
thread_pool.schedule([this, task_shard]() { discoverShardPartitions(task_shard); });
thread_pool.schedule([this, timeouts, task_shard]() { discoverShardPartitions(timeouts, task_shard); });
LOG_DEBUG(log, "Waiting for " << thread_pool.active() << " setup jobs");
thread_pool.wait();
@ -955,7 +956,7 @@ public:
task_descprtion_current_version = version_to_update;
}
void process()
void process(const ConnectionTimeouts & timeouts)
{
for (TaskTable & task_table : task_cluster->table_tasks)
{
@ -969,7 +970,7 @@ public:
if (!task_table.has_enabled_partitions)
{
/// If there are no specified enabled_partitions, we must discover them manually
discoverTablePartitions(task_table);
discoverTablePartitions(timeouts, task_table);
/// After partitions of each shard are initialized, initialize cluster partitions
for (const TaskShardPtr & task_shard : task_table.all_shards)
@ -1009,7 +1010,7 @@ public:
bool table_is_done = false;
for (UInt64 num_table_tries = 0; num_table_tries < max_table_tries; ++num_table_tries)
{
if (tryProcessTable(task_table))
if (tryProcessTable(timeouts, task_table))
{
table_is_done = true;
break;
@ -1053,8 +1054,10 @@ protected:
return getWorkersPath() + "/" + host_id;
}
zkutil::EphemeralNodeHolder::Ptr createTaskWorkerNodeAndWaitIfNeed(const zkutil::ZooKeeperPtr & zookeeper,
const String & description, bool unprioritized)
zkutil::EphemeralNodeHolder::Ptr createTaskWorkerNodeAndWaitIfNeed(
const zkutil::ZooKeeperPtr & zookeeper,
const String & description,
bool unprioritized)
{
std::chrono::milliseconds current_sleep_time = default_sleep_time;
static constexpr std::chrono::milliseconds max_sleep_time(30000); // 30 sec
@ -1329,7 +1332,7 @@ protected:
static constexpr UInt64 max_table_tries = 1000;
static constexpr UInt64 max_shard_partition_tries = 600;
bool tryProcessTable(TaskTable & task_table)
bool tryProcessTable(const ConnectionTimeouts & timeouts, TaskTable & task_table)
{
/// A heuristic: if previous shard is already done, then check next one without sleeps due to max_workers constraint
bool previous_shard_is_instantly_finished = false;
@ -1360,7 +1363,7 @@ protected:
/// If not, did we check existence of that partition previously?
if (shard->checked_partitions.count(partition_name) == 0)
{
auto check_shard_has_partition = [&] () { return checkShardHasPartition(*shard, partition_name); };
auto check_shard_has_partition = [&] () { return checkShardHasPartition(timeouts, *shard, partition_name); };
bool has_partition = retry(check_shard_has_partition);
shard->checked_partitions.emplace(partition_name);
@ -1397,7 +1400,7 @@ protected:
bool was_error = false;
for (UInt64 try_num = 0; try_num < max_shard_partition_tries; ++try_num)
{
task_status = tryProcessPartitionTask(partition, is_unprioritized_task);
task_status = tryProcessPartitionTask(timeouts, partition, is_unprioritized_task);
/// Exit if success
if (task_status == PartitionTaskStatus::Finished)
@ -1483,13 +1486,13 @@ protected:
Error,
};
PartitionTaskStatus tryProcessPartitionTask(ShardPartition & task_partition, bool is_unprioritized_task)
PartitionTaskStatus tryProcessPartitionTask(const ConnectionTimeouts & timeouts, ShardPartition & task_partition, bool is_unprioritized_task)
{
PartitionTaskStatus res;
try
{
res = processPartitionTaskImpl(task_partition, is_unprioritized_task);
res = processPartitionTaskImpl(timeouts, task_partition, is_unprioritized_task);
}
catch (...)
{
@ -1510,7 +1513,7 @@ protected:
return res;
}
PartitionTaskStatus processPartitionTaskImpl(ShardPartition & task_partition, bool is_unprioritized_task)
PartitionTaskStatus processPartitionTaskImpl(const ConnectionTimeouts & timeouts, ShardPartition & task_partition, bool is_unprioritized_task)
{
TaskShard & task_shard = task_partition.task_shard;
TaskTable & task_table = task_shard.task_table;
@ -1611,7 +1614,7 @@ protected:
zookeeper->createAncestors(current_task_status_path);
/// We need to update table definitions for each partition, it could be changed after ALTER
createShardInternalTables(task_shard);
createShardInternalTables(timeouts, task_shard);
/// Check that destination partition is empty if we are first worker
/// NOTE: this check is incorrect if pull and push tables have different partition key!
@ -1828,23 +1831,25 @@ protected:
return typeid_cast<const ColumnString &>(*block.safeGetByPosition(0).column).getDataAt(0).toString();
}
ASTPtr getCreateTableForPullShard(TaskShard & task_shard)
ASTPtr getCreateTableForPullShard(const ConnectionTimeouts & timeouts, TaskShard & task_shard)
{
/// Fetch and parse (possibly) new definition
auto connection_entry = task_shard.info.pool->get(&task_cluster->settings_pull);
String create_query_pull_str = getRemoteCreateTable(task_shard.task_table.table_pull, *connection_entry,
&task_cluster->settings_pull);
auto connection_entry = task_shard.info.pool->get(timeouts, &task_cluster->settings_pull);
String create_query_pull_str = getRemoteCreateTable(
task_shard.task_table.table_pull,
*connection_entry,
&task_cluster->settings_pull);
ParserCreateQuery parser_create_query;
return parseQuery(parser_create_query, create_query_pull_str, 0);
}
void createShardInternalTables(TaskShard & task_shard, bool create_split = true)
void createShardInternalTables(const ConnectionTimeouts & timeouts, TaskShard & task_shard, bool create_split = true)
{
TaskTable & task_table = task_shard.task_table;
/// We need to update table definitions for each part, it could be changed after ALTER
task_shard.current_pull_table_create_query = getCreateTableForPullShard(task_shard);
task_shard.current_pull_table_create_query = getCreateTableForPullShard(timeouts, task_shard);
/// Create local Distributed tables:
/// a table fetching data from current shard and a table inserting data to the whole destination cluster
@ -1872,9 +1877,9 @@ protected:
}
std::set<String> getShardPartitions(TaskShard & task_shard)
std::set<String> getShardPartitions(const ConnectionTimeouts & timeouts, TaskShard & task_shard)
{
createShardInternalTables(task_shard, false);
createShardInternalTables(timeouts, task_shard, false);
TaskTable & task_table = task_shard.task_table;
@ -1914,9 +1919,9 @@ protected:
return res;
}
bool checkShardHasPartition(TaskShard & task_shard, const String & partition_quoted_name)
bool checkShardHasPartition(const ConnectionTimeouts & timeouts, TaskShard & task_shard, const String & partition_quoted_name)
{
createShardInternalTables(task_shard, false);
createShardInternalTables(timeouts, task_shard, false);
TaskTable & task_table = task_shard.task_table;
@ -1998,7 +2003,8 @@ protected:
Settings current_settings = settings ? *settings : task_cluster->settings_common;
current_settings.max_parallel_replicas = num_remote_replicas ? num_remote_replicas : 1;
auto connections = shard.pool->getMany(&current_settings, pool_mode);
auto timeouts = ConnectionTimeouts::getTCPTimeoutsWithFailover(current_settings).getSaturated(current_settings.max_execution_time);
auto connections = shard.pool->getMany(timeouts, &current_settings, pool_mode);
for (auto & connection : connections)
{
@ -2169,6 +2175,7 @@ void ClusterCopierApp::mainImpl()
registerAggregateFunctions();
registerTableFunctions();
registerStorages();
registerDictionaries();
static const std::string default_database = "_local";
context->addDatabase(default_database, std::make_shared<DatabaseMemory>(default_database));
@ -2186,7 +2193,7 @@ void ClusterCopierApp::mainImpl()
copier->uploadTaskDescription(task_path, task_file, config().getBool("task-upload-force", false));
copier->init();
copier->process();
copier->process(ConnectionTimeouts::getTCPTimeoutsWithoutFailover(context->getSettingsRef()));
}
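Throughout ClusterCopier the pattern is the same: timeouts are computed once from the effective settings and passed down explicitly. The getSaturated() call additionally clamps every network timeout to the query's max_execution_time, so no remote wait can outlive the query budget. A hedged sketch of that derivation, assuming ClickHouse's internal headers; makeQueryTimeouts is a hypothetical name:

#include <Core/Settings.h>
#include <IO/ConnectionTimeouts.h>

/// Hypothetical helper mirroring the copier's per-query timeout derivation:
/// start from the failover-aware TCP timeouts, then cap each component by
/// the query's overall execution-time budget.
DB::ConnectionTimeouts makeQueryTimeouts(const DB::Settings & settings)
{
    return DB::ConnectionTimeouts::getTCPTimeoutsWithFailover(settings)
        .getSaturated(settings.max_execution_time);
}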


@ -6,4 +6,4 @@ clickhouse_program_add(local)
if(NOT CLICKHOUSE_ONE_SHARED)
target_link_libraries(clickhouse-local-lib PRIVATE clickhouse-server-lib)
endif ()
endif()


@ -34,6 +34,7 @@
#include <Dictionaries/registerDictionaries.h>
#include <boost/program_options/options_description.hpp>
#include <boost/program_options.hpp>
#include <common/argsToConfig.h>
namespace DB
@ -59,11 +60,29 @@ void LocalServer::initialize(Poco::Util::Application & self)
{
Poco::Util::Application::initialize(self);
// Turn off server logging to stderr
if (!config().has("verbose"))
/// Load config files if they exist
if (config().has("config-file") || Poco::File("config.xml").exists())
{
Poco::Logger::root().setLevel("none");
Poco::Logger::root().setChannel(Poco::AutoPtr<Poco::NullChannel>(new Poco::NullChannel()));
const auto config_path = config().getString("config-file", "config.xml");
ConfigProcessor config_processor(config_path, false, true);
config_processor.setConfigPath(Poco::Path(config_path).makeParent().toString());
auto loaded_config = config_processor.loadConfig();
config_processor.savePreprocessedConfig(loaded_config, loaded_config.configuration->getString("path", "."));
config().add(loaded_config.configuration.duplicate(), PRIO_DEFAULT, false);
}
if (config().has("logger") || config().has("logger.level") || config().has("logger.log"))
{
buildLoggers(config(), logger());
}
else
{
// Turn off server logging to stderr
if (!config().has("verbose"))
{
Poco::Logger::root().setLevel("none");
Poco::Logger::root().setChannel(Poco::AutoPtr<Poco::NullChannel>(new Poco::NullChannel()));
}
}
}
@ -110,16 +129,6 @@ try
return Application::EXIT_OK;
}
/// Load config files if they exist
if (config().has("config-file") || Poco::File("config.xml").exists())
{
const auto config_path = config().getString("config-file", "config.xml");
ConfigProcessor config_processor(config_path, false, true);
config_processor.setConfigPath(Poco::Path(config_path).makeParent().toString());
auto loaded_config = config_processor.loadConfig();
config_processor.savePreprocessedConfig(loaded_config, loaded_config.configuration->getString("path", DBMS_DEFAULT_PATH));
config().add(loaded_config.configuration.duplicate(), PRIO_DEFAULT, false);
}
context = std::make_unique<Context>(Context::createGlobal());
context->setGlobalContext(*context);
@ -428,6 +437,8 @@ void LocalServer::init(int argc, char ** argv)
("stacktrace", "print stack traces of exceptions")
("echo", "print query before execution")
("verbose", "print query and other debugging info")
("logger.log", po::value<std::string>(), "Log file name")
("logger.level", po::value<std::string>(), "Log level")
("ignore-error", "do not stop processing if a query failed")
("version,V", "print version information and exit")
;
@ -481,8 +492,17 @@ void LocalServer::init(int argc, char ** argv)
config().setBool("echo", true);
if (options.count("verbose"))
config().setBool("verbose", true);
if (options.count("logger.log"))
config().setString("logger.log", options["logger.log"].as<std::string>());
if (options.count("logger.level"))
config().setString("logger.level", options["logger.level"].as<std::string>());
if (options.count("ignore-error"))
config().setBool("ignore-error", true);
std::vector<std::string> arguments;
for (int arg_num = 1; arg_num < argc; ++arg_num)
arguments.emplace_back(argv[arg_num]);
argsToConfig(arguments, config(), 100);
}
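argsToConfig() folds leftover command-line arguments of the form --section.key=value into the layered configuration at the given priority (100 at both call sites above), which is what makes options such as --logger.level visible through config(). A hedged usage sketch, assuming the helper's signature matches these call sites; absorbArgs is a hypothetical wrapper:

#include <common/argsToConfig.h>
#include <Poco/Util/LayeredConfiguration.h>
#include <string>
#include <vector>

/// Hypothetical wrapper: absorb free-form "--section.key=value" arguments
/// so that config().getString("logger.level") etc. can later see them.
void absorbArgs(Poco::Util::LayeredConfiguration & config, const std::vector<std::string> & arguments)
{
    argsToConfig(arguments, config, /* priority = */ 100);
}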
void LocalServer::applyCmdOptions()


@ -3,6 +3,7 @@
#include <Core/Settings.h>
#include <Poco/Util/Application.h>
#include <memory>
#include <loggers/Loggers.h>
namespace DB
@ -13,7 +14,7 @@ class Context;
/// Lightweight Application for clickhouse-local
/// No networking, no extra configs and working directories, no pid and status files, no dictionaries, no logging.
/// Quiet mode by default
class LocalServer : public Poco::Util::Application
class LocalServer : public Poco::Util::Application, public Loggers
{
public:
LocalServer();


@ -15,7 +15,7 @@
#endif
#if USE_TCMALLOC
#include <gperftools/malloc_extension.h> // Y_IGNORE
#include <gperftools/malloc_extension.h>
#endif
#include <Common/StringUtils/StringUtils.h>


@ -3,9 +3,9 @@
#if USE_POCO_SQLODBC || USE_POCO_DATAODBC
#if USE_POCO_SQLODBC
#include <Poco/SQL/ODBC/ODBCException.h> // Y_IGNORE
#include <Poco/SQL/ODBC/SessionImpl.h> // Y_IGNORE
#include <Poco/SQL/ODBC/Utility.h> // Y_IGNORE
#include <Poco/SQL/ODBC/ODBCException.h>
#include <Poco/SQL/ODBC/SessionImpl.h>
#include <Poco/SQL/ODBC/Utility.h>
#define POCO_SQL_ODBC_CLASS Poco::SQL::ODBC
#endif
#if USE_POCO_DATAODBC


@ -2,9 +2,9 @@
#if USE_POCO_SQLODBC || USE_POCO_DATAODBC
#if USE_POCO_SQLODBC
#include <Poco/SQL/ODBC/ODBCException.h> // Y_IGNORE
#include <Poco/SQL/ODBC/SessionImpl.h> // Y_IGNORE
#include <Poco/SQL/ODBC/Utility.h> // Y_IGNORE
#include <Poco/SQL/ODBC/ODBCException.h>
#include <Poco/SQL/ODBC/SessionImpl.h>
#include <Poco/SQL/ODBC/Utility.h>
#define POCO_SQL_ODBC_CLASS Poco::SQL::ODBC
#endif
#if USE_POCO_DATAODBC


@ -123,7 +123,7 @@ void ODBCBridge::initialize(Application & self)
config().setString("logger", "ODBCBridge");
buildLoggers(config());
buildLoggers(config(), logger());
log = &logger();
hostname = config().getString("listen-host", "localhost");
port = config().getUInt("http-port");


@ -2,9 +2,9 @@
#if USE_POCO_SQLODBC || USE_POCO_DATAODBC
#if USE_POCO_SQLODBC
#include <Poco/SQL/ODBC/ODBCException.h> // Y_IGNORE
#include <Poco/SQL/ODBC/SessionImpl.h> // Y_IGNORE
#include <Poco/SQL/ODBC/Utility.h> // Y_IGNORE
#include <Poco/SQL/ODBC/ODBCException.h>
#include <Poco/SQL/ODBC/SessionImpl.h>
#include <Poco/SQL/ODBC/Utility.h>
#define POCO_SQL_ODBC_CLASS Poco::SQL::ODBC
#endif
#if USE_POCO_DATAODBC


@ -8,7 +8,7 @@
#if USE_POCO_SQLODBC || USE_POCO_DATAODBC
#if USE_POCO_SQLODBC
#include <Poco/SQL/ODBC/Utility.h> // Y_IGNORE
#include <Poco/SQL/ODBC/Utility.h>
#endif
#if USE_POCO_DATAODBC
#include <Poco/Data/ODBC/Utility.h>


@ -1,7 +1,9 @@
#include "PerformanceTest.h"
#include <Core/Types.h>
#include <Common/CpuId.h>
#include <common/getMemoryAmount.h>
#include <IO/ConnectionTimeouts.h>
#include <IO/ReadBufferFromFile.h>
#include <IO/ReadHelpers.h>
#include <IO/WriteBufferFromFile.h>
@ -23,6 +25,7 @@ namespace
void waitQuery(Connection & connection)
{
bool finished = false;
while (true)
{
if (!connection.poll(1000000))
@ -49,12 +52,14 @@ namespace fs = boost::filesystem;
PerformanceTest::PerformanceTest(
const XMLConfigurationPtr & config_,
Connection & connection_,
const ConnectionTimeouts & timeouts_,
InterruptListener & interrupt_listener_,
const PerformanceTestInfo & test_info_,
Context & context_,
const std::vector<size_t> & queries_to_run_)
: config(config_)
, connection(connection_)
, timeouts(timeouts_)
, interrupt_listener(interrupt_listener_)
, test_info(test_info_)
, context(context_)
@ -71,6 +76,7 @@ bool PerformanceTest::checkPreconditions() const
Strings preconditions;
config->keys("preconditions", preconditions);
size_t table_precondition_index = 0;
size_t cpu_precondition_index = 0;
for (const std::string & precondition : preconditions)
{
@ -106,7 +112,7 @@ bool PerformanceTest::checkPreconditions() const
size_t exist = 0;
connection.sendQuery(query, "", QueryProcessingStage::Complete, &test_info.settings, nullptr, false);
connection.sendQuery(timeouts, query, "", QueryProcessingStage::Complete, &test_info.settings, nullptr, false);
while (true)
{
@ -136,6 +142,30 @@ bool PerformanceTest::checkPreconditions() const
return false;
}
}
if (precondition == "cpu")
{
std::string precondition_key = "preconditions.cpu[" + std::to_string(cpu_precondition_index++) + "]";
std::string flag_to_check = config->getString(precondition_key);
#define CHECK_CPU_PRECONDITION(OP) \
if (flag_to_check == #OP) \
{ \
if (!Cpu::CpuFlagsCache::have_##OP) \
{ \
LOG_WARNING(log, "CPU doesn't support " << #OP); \
return false; \
} \
} else
CPU_ID_ENUMERATE(CHECK_CPU_PRECONDITION)
{
LOG_WARNING(log, "CPU doesn't support " << flag_to_check);
return false;
}
#undef CHECK_CPU_PRECONDITION
}
}
return true;
@ -159,18 +189,10 @@ UInt64 PerformanceTest::calculateMaxExecTime() const
void PerformanceTest::prepare() const
{
for (const auto & query : test_info.create_queries)
for (const auto & query : test_info.create_and_fill_queries)
{
LOG_INFO(log, "Executing create query \"" << query << '\"');
connection.sendQuery(query, "", QueryProcessingStage::Complete, &test_info.settings, nullptr, false);
waitQuery(connection);
LOG_INFO(log, "Query finished");
}
for (const auto & query : test_info.fill_queries)
{
LOG_INFO(log, "Executing fill query \"" << query << '\"');
connection.sendQuery(query, "", QueryProcessingStage::Complete, &test_info.settings, nullptr, false);
LOG_INFO(log, "Executing create or fill query \"" << query << '\"');
connection.sendQuery(timeouts, query, "", QueryProcessingStage::Complete, &test_info.settings, nullptr, false);
waitQuery(connection);
LOG_INFO(log, "Query finished");
}
@ -182,7 +204,7 @@ void PerformanceTest::finish() const
for (const auto & query : test_info.drop_queries)
{
LOG_INFO(log, "Executing drop query \"" << query << '\"');
connection.sendQuery(query, "", QueryProcessingStage::Complete, &test_info.settings, nullptr, false);
connection.sendQuery(timeouts, query, "", QueryProcessingStage::Complete, &test_info.settings, nullptr, false);
waitQuery(connection);
LOG_INFO(log, "Query finished");
}
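Expanded by hand for one flag, the CHECK_CPU_PRECONDITION chain generated by CPU_ID_ENUMERATE is a cascade of else-ifs ending in a catch-all warning; roughly as follows (sse42 is an assumed example of an enumerated flag name, not literal macro output):

/// Hand-expanded sketch of the macro for a single flag.
if (flag_to_check == "sse42")
{
    if (!Cpu::CpuFlagsCache::have_sse42)
    {
        LOG_WARNING(log, "CPU doesn't support " << "sse42");
        return false;
    }
}
else /* ... one branch per enumerated flag ... */
{
    /// Fell through every known flag: the precondition names something unrecognized.
    LOG_WARNING(log, "CPU doesn't support " << flag_to_check);
    return false;
}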


@ -5,6 +5,7 @@
#include <common/logger_useful.h>
#include <Poco/Util/XMLConfiguration.h>
#include <IO/ConnectionTimeouts.h>
#include "PerformanceTestInfo.h"
namespace DB
@ -20,6 +21,7 @@ public:
PerformanceTest(
const XMLConfigurationPtr & config_,
Connection & connection_,
const ConnectionTimeouts & timeouts_,
InterruptListener & interrupt_listener_,
const PerformanceTestInfo & test_info_,
Context & context_,
@ -30,11 +32,6 @@ public:
std::vector<TestStats> execute();
void finish() const;
const PerformanceTestInfo & getTestInfo() const
{
return test_info;
}
bool checkSIGINT() const
{
return got_SIGINT;
@ -50,6 +47,7 @@ private:
private:
XMLConfigurationPtr config;
Connection & connection;
const ConnectionTimeouts & timeouts;
InterruptListener & interrupt_listener;
PerformanceTestInfo test_info;


@ -48,8 +48,8 @@ PerformanceTestInfo::PerformanceTestInfo(
: profiles_file(profiles_file_)
, settings(global_settings_)
{
test_name = config->getString("name");
path = config->getString("path");
test_name = fs::path(path).stem().string();
if (config->has("main_metric"))
{
Strings main_metrics;
@ -60,10 +60,10 @@ PerformanceTestInfo::PerformanceTestInfo(
applySettings(config);
extractQueries(config);
extractAuxiliaryQueries(config);
processSubstitutions(config);
getExecutionType(config);
getStopConditions(config);
extractAuxiliaryQueries(config);
}
void PerformanceTestInfo::applySettings(XMLConfigurationPtr config)
@ -153,13 +153,29 @@ void PerformanceTestInfo::processSubstitutions(XMLConfigurationPtr config)
ConfigurationPtr substitutions_view(config->createView("substitutions"));
constructSubstitutions(substitutions_view, substitutions);
auto queries_pre_format = queries;
auto create_and_fill_queries_preformat = create_and_fill_queries;
create_and_fill_queries.clear();
for (const auto & query : create_and_fill_queries_preformat)
{
auto formatted = formatQueries(query, substitutions);
create_and_fill_queries.insert(create_and_fill_queries.end(), formatted.begin(), formatted.end());
}
auto queries_preformat = queries;
queries.clear();
for (const auto & query : queries_pre_format)
for (const auto & query : queries_preformat)
{
auto formatted = formatQueries(query, substitutions);
queries.insert(queries.end(), formatted.begin(), formatted.end());
}
auto drop_queries_preformat = drop_queries;
drop_queries.clear();
for (const auto & query : drop_queries_preformat)
{
auto formatted = formatQueries(query, substitutions);
drop_queries.insert(drop_queries.end(), formatted.begin(), formatted.end());
}
}
}
@ -203,13 +219,20 @@ void PerformanceTestInfo::getStopConditions(XMLConfigurationPtr config)
void PerformanceTestInfo::extractAuxiliaryQueries(XMLConfigurationPtr config)
{
if (config->has("create_query"))
create_queries = getMultipleValuesFromConfig(*config, "", "create_query");
{
create_and_fill_queries = getMultipleValuesFromConfig(*config, "", "create_query");
}
if (config->has("fill_query"))
fill_queries = getMultipleValuesFromConfig(*config, "", "fill_query");
{
auto fill_queries = getMultipleValuesFromConfig(*config, "", "fill_query");
create_and_fill_queries.insert(create_and_fill_queries.end(), fill_queries.begin(), fill_queries.end());
}
if (config->has("drop_query"))
{
drop_queries = getMultipleValuesFromConfig(*config, "", "drop_query");
}
}
}
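With this change the test name is derived purely from the file path: boost::filesystem's stem() drops both the directory and the final extension. A standalone illustration (the path itself is invented):

#include <boost/filesystem.hpp>
#include <cassert>

namespace fs = boost::filesystem;

int main()
{
    /// Mirrors test_name = fs::path(path).stem().string():
    /// directory and trailing extension are stripped, leaving the bare name.
    assert(fs::path("/perf/tests/columns_hashing.xml").stem().string() == "columns_hashing");
}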


@ -42,8 +42,7 @@ public:
std::vector<TestStopConditions> stop_conditions_by_run;
Strings create_queries;
Strings fill_queries;
Strings create_and_fill_queries;
Strings drop_queries;
private:
@ -52,7 +51,6 @@ private:
void processSubstitutions(XMLConfigurationPtr config);
void getExecutionType(XMLConfigurationPtr config);
void getStopConditions(XMLConfigurationPtr config);
void getMetrics(XMLConfigurationPtr config);
void extractAuxiliaryQueries(XMLConfigurationPtr config);
};


@ -72,10 +72,11 @@ public:
Strings && tests_names_regexp_,
Strings && skip_names_regexp_,
const std::unordered_map<std::string, std::vector<size_t>> query_indexes_,
const ConnectionTimeouts & timeouts)
const ConnectionTimeouts & timeouts_)
: connection(host_, port_, default_database_, user_,
password_, timeouts, "performance-test", Protocol::Compression::Enable,
password_, "performance-test", Protocol::Compression::Enable,
secure_ ? Protocol::Secure::Enable : Protocol::Secure::Disable)
, timeouts(timeouts_)
, tests_tags(std::move(tests_tags_))
, tests_names(std::move(tests_names_))
, tests_names_regexp(std::move(tests_names_regexp_))
@ -100,7 +101,7 @@ public:
UInt64 version_minor;
UInt64 version_patch;
UInt64 version_revision;
connection.getServerVersion(name, version_major, version_minor, version_patch, version_revision);
connection.getServerVersion(timeouts, name, version_major, version_minor, version_patch, version_revision);
std::stringstream ss;
ss << version_major << "." << version_minor << "." << version_patch;
@ -115,6 +116,7 @@ public:
private:
Connection connection;
const ConnectionTimeouts & timeouts;
const Strings & tests_tags;
const Strings & tests_names;
@ -195,15 +197,14 @@ private:
{
PerformanceTestInfo info(test_config, profiles_file, global_context.getSettingsRef());
LOG_INFO(log, "Config for test '" << info.test_name << "' parsed");
PerformanceTest current(test_config, connection, interrupt_listener, info, global_context, query_indexes[info.path]);
PerformanceTest current(test_config, connection, timeouts, interrupt_listener, info, global_context, query_indexes[info.path]);
if (current.checkPreconditions())
{
LOG_INFO(log, "Preconditions for test '" << info.test_name << "' are fullfilled");
LOG_INFO(
log,
"Preparing for run, have " << info.create_queries.size() << " create queries and " << info.fill_queries.size()
<< " fill queries");
"Preparing for run, have " << info.create_and_fill_queries.size() << " create and fill queries");
current.prepare();
LOG_INFO(log, "Prepared");
LOG_INFO(log, "Running test '" << info.test_name << "'");
@ -370,7 +371,7 @@ try
Poco::Logger * log = &Poco::Logger::get("PerformanceTestSuite");
if (options.count("help"))
{
std::cout << "Usage: " << argv[0] << " [options] [test_file ...] [tests_folder]\n";
std::cout << "Usage: " << argv[0] << " [options]\n";
std::cout << desc << "\n";
return 0;
}


@ -1,4 +1,5 @@
#include "TestStats.h"
#include <algorithm>
namespace DB
{
@ -92,11 +93,10 @@ void TestStats::update_average_speed(
avg_speed_value /= number_of_info_batches;
if (avg_speed_first == 0)
{
avg_speed_first = avg_speed_value;
}
if (std::abs(avg_speed_value - avg_speed_first) >= precision)
auto [min, max] = std::minmax(avg_speed_value, avg_speed_first);
if (1 - min / max >= precision)
{
avg_speed_first = avg_speed_value;
avg_speed_watch.restart();
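The updated stop criterion compares the running average against the window's first value by relative rather than absolute difference: ordering the two with std::minmax and testing 1 - min/max >= precision restarts the averaging window whenever the speed drifts by at least precision (now 0.5%) in either direction. It also subsumes the removed avg_speed_first == 0 special case, since min/max is then 0 and the test fires. A standalone illustration:

#include <algorithm>
#include <cassert>

/// Relative-drift test matching the updated logic: true when the two values
/// differ by at least `precision` as a fraction of the larger one.
bool speedDrifted(double current_avg, double window_start_avg, double precision)
{
    auto [mn, mx] = std::minmax(current_avg, window_start_avg);
    return 1 - mn / mx >= precision;
}

int main()
{
    assert(speedDrifted(1000.0, 990.0, 0.005));   /// 1% apart: drifted
    assert(!speedDrifted(1000.0, 999.0, 0.005));  /// 0.1% apart: stable
}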


@ -40,11 +40,11 @@ struct TestStats
double avg_rows_speed_value = 0;
double avg_rows_speed_first = 0;
static inline double avg_rows_speed_precision = 0.001;
static inline double avg_rows_speed_precision = 0.005;
double avg_bytes_speed_value = 0;
double avg_bytes_speed_first = 0;
static inline double avg_bytes_speed_precision = 0.001;
static inline double avg_bytes_speed_precision = 0.005;
size_t number_of_rows_speed_info_batches = 0;
size_t number_of_bytes_speed_info_batches = 0;


@ -13,7 +13,7 @@ void checkFulfilledConditionsAndUpdate(
TestStats & statistics, TestStopConditions & stop_conditions,
InterruptListener & interrupt_listener)
{
statistics.add(progress.rows, progress.bytes);
statistics.add(progress.read_rows, progress.read_bytes);
stop_conditions.reportRowsRead(statistics.total_rows_read);
stop_conditions.reportBytesReadUncompressed(statistics.total_bytes_read);


@ -8,6 +8,8 @@ set(CLICKHOUSE_SERVER_SOURCES
${CMAKE_CURRENT_SOURCE_DIR}/RootRequestHandler.cpp
${CMAKE_CURRENT_SOURCE_DIR}/Server.cpp
${CMAKE_CURRENT_SOURCE_DIR}/TCPHandler.cpp
${CMAKE_CURRENT_SOURCE_DIR}/MySQLHandler.cpp
${CMAKE_CURRENT_SOURCE_DIR}/MySQLHandlerFactory.cpp
)
set(CLICKHOUSE_SERVER_LINK PRIVATE clickhouse_dictionaries clickhouse_common_io PUBLIC daemon PRIVATE clickhouse_storages_system clickhouse_functions clickhouse_aggregate_functions clickhouse_table_functions ${Poco_Net_LIBRARY})

View File

@ -312,11 +312,13 @@ void HTTPHandler::processQuery(
client_supports_http_compression = true;
http_response_compression_method = CompressionMethod::Zlib;
}
#if USE_BROTLI
else if (http_response_compression_methods == "br")
{
client_supports_http_compression = true;
http_response_compression_method = CompressionMethod::Brotli;
}
#endif
}
/// Client can pass a 'compress' flag in the query string. In this case the query result is
@ -453,30 +455,6 @@ void HTTPHandler::processQuery(
return false;
};
/// Used in case of POST request with form-data, but it isn't expected to be deleted after that scope.
std::string full_query;
/// Support for "external data for query processing".
if (startsWith(request.getContentType().data(), "multipart/form-data"))
{
ExternalTablesHandler handler(context, params);
params.load(request, istr, handler);
/// Skip unneeded parameters to avoid confusing them later with context settings or query parameters.
reserved_param_suffixes.emplace_back("_format");
reserved_param_suffixes.emplace_back("_types");
reserved_param_suffixes.emplace_back("_structure");
/// Params are of both form params POST and uri (GET params)
for (const auto & it : params)
if (it.first == "query")
full_query += it.second;
in = std::make_unique<ReadBufferFromString>(full_query);
}
else
in = std::make_unique<ConcatReadBuffer>(*in_param, *in_post_maybe_compressed);
/// Settings can be overridden in the query.
/// Some parameters (database, default_format, everything used in the code above) do not
/// belong to the Settings class.
@ -497,30 +475,63 @@ void HTTPHandler::processQuery(
settings.readonly = 2;
}
SettingsChanges settings_changes;
for (auto it = params.begin(); it != params.end(); ++it)
bool is_external_data = startsWith(request.getContentType().data(), "multipart/form-data");
if (is_external_data)
{
if (it->first == "database")
/// Skip unneeded parameters to avoid confusing them later with context settings or query parameters.
reserved_param_suffixes.reserve(3);
/// Note: this is ambiguous for settings such as `date_time_input_format` and `low_cardinality_allow_in_native_format`, whose names end with these suffixes.
reserved_param_suffixes.emplace_back("_format");
reserved_param_suffixes.emplace_back("_types");
reserved_param_suffixes.emplace_back("_structure");
}
SettingsChanges settings_changes;
for (const auto & [key, value] : params)
{
if (key == "database")
{
context.setCurrentDatabase(it->second);
context.setCurrentDatabase(value);
}
else if (it->first == "default_format")
else if (key == "default_format")
{
context.setDefaultFormat(it->second);
context.setDefaultFormat(value);
}
else if (param_could_be_skipped(it->first))
else if (param_could_be_skipped(key))
{
}
else
{
/// All other query parameters are treated as settings.
settings_changes.push_back({it->first, it->second});
settings_changes.push_back({key, value});
}
}
/// Settings should be applied for external data as well.
context.checkSettingsConstraints(settings_changes);
context.applySettingsChanges(settings_changes);
/// Used in case of POST request with form-data, but it isn't expected to be deleted after that scope.
std::string full_query;
/// Support for "external data for query processing".
if (is_external_data)
{
ExternalTablesHandler handler(context, params);
params.load(request, istr, handler);
/// Params are of both form params POST and uri (GET params)
for (const auto & it : params)
if (it.first == "query")
full_query += it.second;
in = std::make_unique<ReadBufferFromString>(full_query);
}
else
in = std::make_unique<ConcatReadBuffer>(*in_param, *in_post_maybe_compressed);
/// HTTP response compression is turned on only if the client signalled that they support it
/// (using Accept-Encoding header) and 'enable_http_compression' setting is turned on.
used_output.out->setCompression(client_supports_http_compression && settings.enable_http_compression);
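After this rewrite, every URL parameter that is neither `database`, `default_format`, nor skippable via param_could_be_skipped() is treated as a settings change and validated against the user's constraints through checkSettingsConstraints() before being applied. In practice this is what allows passing settings directly in the query string, e.g. a request like `http://localhost:8123/?query=SELECT+1&max_threads=2` (host and port here are illustrative). Note that the reserved `_format`/`_types`/`_structure` suffixes are now registered only for multipart/form-data requests, where they describe the external tables rather than settings.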

View File

@ -1,17 +1,16 @@
#include "InterserverIOHTTPHandler.h"
#include <Poco/Net/HTTPBasicCredentials.h>
#include <Poco/Net/HTTPServerRequest.h>
#include <Poco/Net/HTTPServerResponse.h>
#include <common/logger_useful.h>
#include <Common/HTMLForm.h>
#include <Common/setThreadName.h>
#include <Compression/CompressedWriteBuffer.h>
#include <IO/ReadBufferFromIStream.h>
#include <IO/WriteBufferFromHTTPServerResponse.h>
#include <Interpreters/InterserverIOHandler.h>
#include "InterserverIOHTTPHandler.h"
#include "IServer.h"
namespace DB
{
@ -50,7 +49,7 @@ std::pair<String, bool> InterserverIOHTTPHandler::checkAuthentication(Poco::Net:
return {"", true};
}
void InterserverIOHTTPHandler::processQuery(Poco::Net::HTTPServerRequest & request, Poco::Net::HTTPServerResponse & response)
void InterserverIOHTTPHandler::processQuery(Poco::Net::HTTPServerRequest & request, Poco::Net::HTTPServerResponse & response, Output & used_output)
{
HTMLForm params(request);
@ -61,24 +60,17 @@ void InterserverIOHTTPHandler::processQuery(Poco::Net::HTTPServerRequest & reque
ReadBufferFromIStream body(request.stream());
const auto & config = server.config();
unsigned keep_alive_timeout = config.getUInt("keep_alive_timeout", 10);
WriteBufferFromHTTPServerResponse out(request, response, keep_alive_timeout);
auto endpoint = server.context().getInterserverIOHandler().getEndpoint(endpoint_name);
if (compress)
{
CompressedWriteBuffer compressed_out(out);
CompressedWriteBuffer compressed_out(*used_output.out);
endpoint->processQuery(params, body, compressed_out, response);
}
else
{
endpoint->processQuery(params, body, out, response);
endpoint->processQuery(params, body, *used_output.out, response);
}
out.finalize();
}
@ -90,30 +82,30 @@ void InterserverIOHTTPHandler::handleRequest(Poco::Net::HTTPServerRequest & requ
if (request.getVersion() == Poco::Net::HTTPServerRequest::HTTP_1_1)
response.setChunkedTransferEncoding(true);
Output used_output;
const auto & config = server.config();
unsigned keep_alive_timeout = config.getUInt("keep_alive_timeout", 10);
used_output.out = std::make_shared<WriteBufferFromHTTPServerResponse>(request, response, keep_alive_timeout);
try
{
if (auto [msg, success] = checkAuthentication(request); success)
if (auto [message, success] = checkAuthentication(request); success)
{
processQuery(request, response);
processQuery(request, response, used_output);
LOG_INFO(log, "Done processing query");
}
else
{
response.setStatusAndReason(Poco::Net::HTTPServerResponse::HTTP_UNAUTHORIZED);
if (!response.sent())
response.send() << msg << std::endl;
writeString(message, *used_output.out);
LOG_WARNING(log, "Query processing failed request: '" << request.getURI() << "' authentification failed");
}
}
catch (Exception & e)
{
if (e.code() == ErrorCodes::TOO_MANY_SIMULTANEOUS_QUERIES)
{
if (!response.sent())
response.send();
return;
}
response.setStatusAndReason(Poco::Net::HTTPResponse::HTTP_INTERNAL_SERVER_ERROR);
@ -122,7 +114,7 @@ void InterserverIOHTTPHandler::handleRequest(Poco::Net::HTTPServerRequest & requ
std::string message = getCurrentExceptionMessage(is_real_error);
if (!response.sent())
response.send() << message << std::endl;
writeString(message, *used_output.out);
if (is_real_error)
LOG_ERROR(log, message);
@ -134,7 +126,8 @@ void InterserverIOHTTPHandler::handleRequest(Poco::Net::HTTPServerRequest & requ
response.setStatusAndReason(Poco::Net::HTTPResponse::HTTP_INTERNAL_SERVER_ERROR);
std::string message = getCurrentExceptionMessage(false);
if (!response.sent())
response.send() << message << std::endl;
writeString(message, *used_output.out);
LOG_ERROR(log, message);
}
}

View File

@ -1,12 +1,10 @@
#pragma once
#include <memory>
#include <Poco/Logger.h>
#include <Poco/Net/HTTPRequestHandler.h>
#include <Common/CurrentMetrics.h>
#include "IServer.h"
namespace CurrentMetrics
{
@ -16,6 +14,9 @@ namespace CurrentMetrics
namespace DB
{
class IServer;
class WriteBufferFromHTTPServerResponse;
class InterserverIOHTTPHandler : public Poco::Net::HTTPRequestHandler
{
public:
@ -28,12 +29,17 @@ public:
void handleRequest(Poco::Net::HTTPServerRequest & request, Poco::Net::HTTPServerResponse & response) override;
private:
struct Output
{
std::shared_ptr<WriteBufferFromHTTPServerResponse> out;
};
IServer & server;
Poco::Logger * log;
CurrentMetrics::Increment metric_increment{CurrentMetrics::InterserverConnection};
void processQuery(Poco::Net::HTTPServerRequest & request, Poco::Net::HTTPServerResponse & response);
void processQuery(Poco::Net::HTTPServerRequest & request, Poco::Net::HTTPServerResponse & response, Output & used_output);
std::pair<String, bool> checkAuthentication(Poco::Net::HTTPServerRequest & request) const;
};
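The motivation for the new Output struct: the response buffer is now created once in handleRequest(), before the try block, so the success path and every catch branch write through the same WriteBufferFromHTTPServerResponse instead of mixing it with raw response.send() calls, which conflict once the response has started. A condensed sketch of the pattern, with the Poco plumbing elided:

    struct Output
    {
        std::shared_ptr<WriteBufferFromHTTPServerResponse> out;
    };

    void handleRequest(Poco::Net::HTTPServerRequest & request, Poco::Net::HTTPServerResponse & response)
    {
        const unsigned keep_alive_timeout = 10; /// illustrative; read from config in the real code
        Output used_output;
        used_output.out = std::make_shared<WriteBufferFromHTTPServerResponse>(request, response, keep_alive_timeout);
        try
        {
            processQuery(request, response, used_output); /// writes the payload into used_output.out
        }
        catch (...)
        {
            /// The buffer is still valid here, so the error text reaches
            /// the client through the same stream the payload would use.
            writeString(getCurrentExceptionMessage(false), *used_output.out);
        }
    }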

View File

@ -0,0 +1,371 @@
#include <DataStreams/copyData.h>
#include <IO/ReadBufferFromString.h>
#include <IO/ReadBufferFromPocoSocket.h>
#include <IO/WriteBufferFromPocoSocket.h>
#include <Interpreters/executeQuery.h>
#include <Storages/IStorage.h>
#include <Core/MySQLProtocol.h>
#include <Core/NamesAndTypes.h>
#include <Columns/ColumnVector.h>
#include <Common/config_version.h>
#include <Common/NetException.h>
#include <Common/OpenSSLHelpers.h>
#include <Poco/Crypto/RSAKey.h>
#include <Poco/Crypto/CipherFactory.h>
#include <Poco/Net/SecureStreamSocket.h>
#include <Poco/Net/SSLManager.h>
#include "MySQLHandler.h"
#include <limits>
#include <ext/scope_guard.h>
namespace DB
{
using namespace MySQLProtocol;
using Poco::Net::SecureStreamSocket;
using Poco::Net::SSLManager;
namespace ErrorCodes
{
extern const int MYSQL_CLIENT_INSUFFICIENT_CAPABILITIES;
extern const int OPENSSL_ERROR;
}
MySQLHandler::MySQLHandler(IServer & server_, const Poco::Net::StreamSocket & socket_, RSA & public_key, RSA & private_key, bool ssl_enabled, size_t connection_id)
: Poco::Net::TCPServerConnection(socket_)
, server(server_)
, log(&Poco::Logger::get("MySQLHandler"))
, connection_context(server.context())
, connection_id(connection_id)
, public_key(public_key)
, private_key(private_key)
{
server_capability_flags = CLIENT_PROTOCOL_41 | CLIENT_SECURE_CONNECTION | CLIENT_PLUGIN_AUTH | CLIENT_PLUGIN_AUTH_LENENC_CLIENT_DATA | CLIENT_CONNECT_WITH_DB | CLIENT_DEPRECATE_EOF;
if (ssl_enabled)
server_capability_flags |= CLIENT_SSL;
}
void MySQLHandler::run()
{
connection_context = server.context();
connection_context.setSessionContext(connection_context);
connection_context.setDefaultFormat("MySQLWire");
in = std::make_shared<ReadBufferFromPocoSocket>(socket());
out = std::make_shared<WriteBufferFromPocoSocket>(socket());
packet_sender = std::make_shared<PacketSender>(*in, *out, connection_context.sequence_id);
try
{
String scramble = generateScramble();
/** Native authentication sends 20 bytes + '\0' character = 21 bytes.
* This plugin must do the same to stay consistent with historical behavior if it is set to operate as a default plugin.
* https://github.com/mysql/mysql-server/blob/8.0/sql/auth/sql_authentication.cc#L3994
*/
Handshake handshake(server_capability_flags, connection_id, VERSION_STRING + String("-") + VERSION_NAME, scramble + '\0');
packet_sender->sendPacket<Handshake>(handshake, true);
LOG_TRACE(log, "Sent handshake");
HandshakeResponse handshake_response = finishHandshake();
connection_context.client_capabilities = handshake_response.capability_flags;
if (handshake_response.max_packet_size)
connection_context.max_packet_size = handshake_response.max_packet_size;
if (!connection_context.max_packet_size)
connection_context.max_packet_size = MAX_PACKET_LENGTH;
LOG_DEBUG(log, "Capabilities: " << handshake_response.capability_flags
<< "\nmax_packet_size: "
<< handshake_response.max_packet_size
<< "\ncharacter_set: "
<< handshake_response.character_set
<< "\nuser: "
<< handshake_response.username
<< "\nauth_response length: "
<< handshake_response.auth_response.length()
<< "\nauth_response: "
<< handshake_response.auth_response
<< "\ndatabase: "
<< handshake_response.database
<< "\nauth_plugin_name: "
<< handshake_response.auth_plugin_name);
client_capability_flags = handshake_response.capability_flags;
if (!(client_capability_flags & CLIENT_PROTOCOL_41))
throw Exception("Required capability: CLIENT_PROTOCOL_41.", ErrorCodes::MYSQL_CLIENT_INSUFFICIENT_CAPABILITIES);
if (!(client_capability_flags & CLIENT_PLUGIN_AUTH))
throw Exception("Required capability: CLIENT_PLUGIN_AUTH.", ErrorCodes::MYSQL_CLIENT_INSUFFICIENT_CAPABILITIES);
authenticate(handshake_response, scramble);
OK_Packet ok_packet(0, handshake_response.capability_flags, 0, 0, 0);
packet_sender->sendPacket(ok_packet, true);
while (true)
{
packet_sender->resetSequenceId();
String payload = packet_sender->receivePacketPayload();
int command = payload[0];
LOG_DEBUG(log, "Received command: " << std::to_string(command) << ". Connection id: " << connection_id << ".");
try
{
switch (command)
{
case COM_QUIT:
return;
case COM_INIT_DB:
comInitDB(payload);
break;
case COM_QUERY:
comQuery(payload);
break;
case COM_FIELD_LIST:
comFieldList(payload);
break;
case COM_PING:
comPing();
break;
default:
throw Exception(Poco::format("Command %d is not implemented.", command), ErrorCodes::NOT_IMPLEMENTED);
}
}
catch (const NetException & exc)
{
log->log(exc);
throw;
}
catch (const Exception & exc)
{
log->log(exc);
packet_sender->sendPacket(ERR_Packet(exc.code(), "00000", exc.message()), true);
}
}
}
catch (Poco::Exception & exc)
{
log->log(exc);
}
}
/** Reads 3 bytes, finds out whether it is SSLRequest or HandshakeResponse packet, and starts a secure connection if it is SSLRequest.
* Reading is performed directly from the socket instead of through ReadBuffer, to avoid consuming part of the SSL handshake into a buffer:
* if that happened, it would be impossible to start the SSL connection using Poco. The SSLRequest packet payload is 32 bytes, thus we can read at most 36 bytes.
*/
MySQLProtocol::HandshakeResponse MySQLHandler::finishHandshake()
{
HandshakeResponse packet;
size_t packet_size = PACKET_HEADER_SIZE + SSL_REQUEST_PAYLOAD_SIZE;
/// Buffer for SSLRequest or part of HandshakeResponse.
char buf[packet_size];
size_t pos = 0;
/// Reads at least count and at most packet_size bytes.
auto read_bytes = [this, &buf, &pos, &packet_size](size_t count) -> void {
while (pos < count)
{
int ret = socket().receiveBytes(buf + pos, packet_size - pos);
if (ret == 0)
{
throw Exception("Cannot read all data. Bytes read: " + std::to_string(pos) + ". Bytes expected: 3.", ErrorCodes::CANNOT_READ_ALL_DATA);
}
pos += ret;
}
};
read_bytes(3); /// We can find out whether it is SSLRequest or HandshakeResponse by the first 3 bytes.
size_t payload_size = unalignedLoad<uint32_t>(buf) & 0xFFFFFFu;
LOG_TRACE(log, "payload size: " << payload_size);
if (payload_size == SSL_REQUEST_PAYLOAD_SIZE)
{
read_bytes(packet_size); /// Reading the rest of SSLRequest.
SSLRequest ssl_request;
ssl_request.readPayload(String(buf + PACKET_HEADER_SIZE, pos - PACKET_HEADER_SIZE));
connection_context.client_capabilities = ssl_request.capability_flags;
connection_context.max_packet_size = ssl_request.max_packet_size ? ssl_request.max_packet_size : MAX_PACKET_LENGTH;
secure_connection = true;
ss = std::make_shared<SecureStreamSocket>(SecureStreamSocket::attach(socket(), SSLManager::instance().defaultServerContext()));
in = std::make_shared<ReadBufferFromPocoSocket>(*ss);
out = std::make_shared<WriteBufferFromPocoSocket>(*ss);
connection_context.sequence_id = 2;
packet_sender = std::make_shared<PacketSender>(*in, *out, connection_context.sequence_id);
packet_sender->max_packet_size = connection_context.max_packet_size;
packet_sender->receivePacket(packet); /// Reading HandshakeResponse from secure socket.
}
else
{
/// Reading the rest of HandshakeResponse.
packet_size = PACKET_HEADER_SIZE + payload_size;
WriteBufferFromOwnString buf_for_handshake_response;
buf_for_handshake_response.write(buf, pos);
copyData(*packet_sender->in, buf_for_handshake_response, packet_size - pos);
packet.readPayload(buf_for_handshake_response.str().substr(PACKET_HEADER_SIZE));
packet_sender->sequence_id++;
}
return packet;
}
String MySQLHandler::generateScramble()
{
String scramble(MySQLProtocol::SCRAMBLE_LENGTH, 0);
Poco::RandomInputStream generator;
for (size_t i = 0; i < scramble.size(); i++)
{
generator >> scramble[i];
}
return scramble;
}
void MySQLHandler::authenticate(const HandshakeResponse & handshake_response, const String & scramble)
{
String auth_response;
AuthSwitchResponse response;
if (handshake_response.auth_plugin_name != Authentication::SHA256)
{
packet_sender->sendPacket(AuthSwitchRequest(Authentication::SHA256, scramble + '\0'), true);
if (in->eof())
throw Exception(
"Client doesn't support authentication method " + String(Authentication::SHA256) + " used by ClickHouse",
ErrorCodes::MYSQL_CLIENT_INSUFFICIENT_CAPABILITIES);
packet_sender->receivePacket(response);
auth_response = response.value;
LOG_TRACE(log, "Authentication method mismatch.");
}
else
{
auth_response = handshake_response.auth_response;
LOG_TRACE(log, "Authentication method match.");
}
if (auth_response == "\1")
{
LOG_TRACE(log, "Client requests public key.");
BIO * mem = BIO_new(BIO_s_mem());
SCOPE_EXIT(BIO_free(mem));
if (PEM_write_bio_RSA_PUBKEY(mem, &public_key) != 1)
{
throw Exception("Failed to write public key to memory. Error: " + getOpenSSLErrors(), ErrorCodes::OPENSSL_ERROR);
}
char * pem_buf = nullptr;
long pem_size = BIO_get_mem_data(mem, &pem_buf);
String pem(pem_buf, pem_size);
LOG_TRACE(log, "Key: " << pem);
AuthMoreData data(pem);
packet_sender->sendPacket(data, true);
packet_sender->receivePacket(response);
auth_response = response.value;
}
else
{
LOG_TRACE(log, "Client didn't request public key.");
}
String password;
/** Decrypt password, if it's not empty.
* The original intention was that the password is a string[NUL] but this never got enforced properly so now we have to accept that
* an empty packet is a blank password, thus the check for auth_response.empty() has to be made too.
* https://github.com/mysql/mysql-server/blob/8.0/sql/auth/sql_authentication.cc#L4017
*/
if (!secure_connection && !auth_response.empty() && auth_response != String("\0", 1))
{
LOG_TRACE(log, "Received nonempty password");
auto ciphertext = reinterpret_cast<unsigned char *>(auth_response.data());
unsigned char plaintext[RSA_size(&private_key)];
int plaintext_size = RSA_private_decrypt(auth_response.size(), ciphertext, plaintext, &private_key, RSA_PKCS1_OAEP_PADDING);
if (plaintext_size == -1)
{
throw Exception("Failed to decrypt auth data. Error: " + getOpenSSLErrors(), ErrorCodes::OPENSSL_ERROR);
}
password.resize(plaintext_size);
for (int i = 0; i < plaintext_size; i++)
{
password[i] = plaintext[i] ^ static_cast<unsigned char>(scramble[i % scramble.size()]);
}
}
else if (secure_connection)
{
password = auth_response;
}
else
{
LOG_TRACE(log, "Received empty password");
}
if (!password.empty())
{
password.pop_back(); /// terminating null byte
}
try
{
connection_context.setUser(handshake_response.username, password, socket().address(), "");
if (!handshake_response.database.empty())
    connection_context.setCurrentDatabase(handshake_response.database);
connection_context.setCurrentQueryId("");
LOG_ERROR(log, "Authentication for user " << handshake_response.username << " succeeded.");
}
catch (const Exception & exc)
{
LOG_ERROR(log, "Authentication for user " << handshake_response.username << " failed.");
packet_sender->sendPacket(ERR_Packet(exc.code(), "00000", exc.message()), true);
throw;
}
}
void MySQLHandler::comInitDB(const String & payload)
{
String database = payload.substr(1);
LOG_DEBUG(log, "Setting current database to " << database);
connection_context.setCurrentDatabase(database);
packet_sender->sendPacket(OK_Packet(0, client_capability_flags, 0, 0, 1), true);
}
void MySQLHandler::comFieldList(const String & payload)
{
ComFieldList packet;
packet.readPayload(payload);
String database = connection_context.getCurrentDatabase();
StoragePtr table_ptr = connection_context.getTable(database, packet.table);
for (const NameAndTypePair & column : table_ptr->getColumns().getAll())
{
ColumnDefinition column_definition(
database, packet.table, packet.table, column.name, column.name, CharacterSet::binary, 100, ColumnType::MYSQL_TYPE_STRING, 0, 0
);
packet_sender->sendPacket(column_definition);
}
packet_sender->sendPacket(OK_Packet(0xfe, client_capability_flags, 0, 0, 0), true);
}
void MySQLHandler::comPing()
{
packet_sender->sendPacket(OK_Packet(0x0, client_capability_flags, 0, 0, 0), true);
}
void MySQLHandler::comQuery(const String & payload)
{
bool with_output = false;
std::function<void(const String &)> set_content_type = [&with_output](const String &) -> void {
with_output = true;
};
String query = payload.substr(1);
// Translate query from MySQL to ClickHouse.
// This is a temporary workaround until ClickHouse supports the syntax "@@var_name".
if (query == "select @@version_comment limit 1") // MariaDB client starts session with that query
query = "select ''";
ReadBufferFromString buf(query);
executeQuery(buf, *out, true, connection_context, set_content_type, nullptr);
if (!with_output)
packet_sender->sendPacket(OK_Packet(0x00, client_capability_flags, 0, 0, 0), true);
}
}
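With the handler above wired to `mysql_port` (see the Server.cpp hunk further down), a stock MySQL or MariaDB command-line client can connect directly, e.g. `mysql --protocol tcp -h 127.0.0.1 -P 9004 -u default` (the port value is illustrative; it comes from the server config). COM_QUERY then routes ordinary queries such as `SELECT 1` through executeQuery() with the MySQLWire output format, while COM_PING and COM_INIT_DB answer with plain OK packets.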

View File

@ -0,0 +1,59 @@
#pragma once
#include <Poco/Net/TCPServerConnection.h>
#include <Poco/Net/SecureStreamSocket.h>
#include <Common/getFQDNOrHostName.h>
#include <Core/MySQLProtocol.h>
#include <openssl/rsa.h>
#include "IServer.h"
namespace DB
{
/// Handler for MySQL wire protocol connections. Allows connecting to ClickHouse with a MySQL client.
class MySQLHandler : public Poco::Net::TCPServerConnection
{
public:
MySQLHandler(IServer & server_, const Poco::Net::StreamSocket & socket_, RSA & public_key, RSA & private_key, bool ssl_enabled, size_t connection_id);
void run() final;
private:
/// Enables SSL, if client requested.
MySQLProtocol::HandshakeResponse finishHandshake();
void comQuery(const String & payload);
void comFieldList(const String & payload);
void comPing();
void comInitDB(const String & payload);
static String generateScramble();
void authenticate(const MySQLProtocol::HandshakeResponse &, const String & scramble);
IServer & server;
Poco::Logger * log;
Context connection_context;
std::shared_ptr<MySQLProtocol::PacketSender> packet_sender;
size_t connection_id = 0;
size_t server_capability_flags = 0;
size_t client_capability_flags = 0;
RSA & public_key;
RSA & private_key;
std::shared_ptr<ReadBuffer> in;
std::shared_ptr<WriteBuffer> out;
bool secure_connection = false;
std::shared_ptr<Poco::Net::SecureStreamSocket> ss;
};
}
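For reference, the wire framing that PacketSender and finishHandshake() rely on: every MySQL packet starts with a 4-byte header, a 3-byte little-endian payload length followed by a 1-byte sequence id, which is exactly what the `unalignedLoad<uint32_t>(buf) & 0xFFFFFFu` expression above extracts. A self-contained sketch of the header decoding (type and function names are illustrative):

    #include <cstdint>
    #include <cstring>

    struct MySQLPacketHeader
    {
        uint32_t payload_length; /// 3 bytes on the wire, at most 0xFFFFFF
        uint8_t sequence_id;     /// 4th byte, incremented per packet
    };

    /// Assumes a little-endian host, like the unalignedLoad call above.
    inline MySQLPacketHeader decodeHeader(const char * buf)
    {
        uint32_t word;
        std::memcpy(&word, buf, sizeof(word)); /// safe unaligned load
        return {word & 0xFFFFFFu, static_cast<uint8_t>(word >> 24)};
    }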

View File

@ -0,0 +1,124 @@
#include <Common/OpenSSLHelpers.h>
#include <Poco/Crypto/X509Certificate.h>
#include <Poco/Net/SSLManager.h>
#include <Poco/Net/TCPServerConnectionFactory.h>
#include <Poco/Util/Application.h>
#include <common/logger_useful.h>
#include <ext/scope_guard.h>
#include "IServer.h"
#include "MySQLHandler.h"
#include "MySQLHandlerFactory.h"
namespace DB
{
namespace ErrorCodes
{
extern const int CANNOT_OPEN_FILE;
extern const int NO_ELEMENTS_IN_CONFIG;
extern const int OPENSSL_ERROR;
extern const int SYSTEM_ERROR;
}
MySQLHandlerFactory::MySQLHandlerFactory(IServer & server_)
: server(server_)
, log(&Logger::get("MySQLHandlerFactory"))
{
try
{
Poco::Net::SSLManager::instance().defaultServerContext();
}
catch (...)
{
LOG_INFO(log, "Failed to create SSL context. SSL will be disabled. Error: " << getCurrentExceptionMessage(false));
ssl_enabled = false;
}
/// Reading rsa keys for SHA256 authentication plugin.
try
{
readRSAKeys();
}
catch (...)
{
LOG_WARNING(log, "Failed to read RSA keys. Error: " << getCurrentExceptionMessage(false));
generateRSAKeys();
}
}
void MySQLHandlerFactory::readRSAKeys()
{
const Poco::Util::LayeredConfiguration & config = Poco::Util::Application::instance().config();
String certificateFileProperty = "openSSL.server.certificateFile";
String privateKeyFileProperty = "openSSL.server.privateKeyFile";
if (!config.has(certificateFileProperty))
throw Exception("Certificate file is not set.", ErrorCodes::NO_ELEMENTS_IN_CONFIG);
if (!config.has(privateKeyFileProperty))
throw Exception("Private key file is not set.", ErrorCodes::NO_ELEMENTS_IN_CONFIG);
{
String certificateFile = config.getString(certificateFileProperty);
FILE * fp = fopen(certificateFile.data(), "r");
if (fp == nullptr)
throw Exception("Cannot open certificate file: " + certificateFile + ".", ErrorCodes::CANNOT_OPEN_FILE);
SCOPE_EXIT(fclose(fp));
X509 * x509 = PEM_read_X509(fp, nullptr, nullptr, nullptr);
SCOPE_EXIT(X509_free(x509));
if (x509 == nullptr)
throw Exception("Failed to read PEM certificate from " + certificateFile + ". Error: " + getOpenSSLErrors(), ErrorCodes::OPENSSL_ERROR);
EVP_PKEY * p = X509_get_pubkey(x509);
if (p == nullptr)
throw Exception("Failed to get RSA key from X509. Error: " + getOpenSSLErrors(), ErrorCodes::OPENSSL_ERROR);
SCOPE_EXIT(EVP_PKEY_free(p));
public_key.reset(EVP_PKEY_get1_RSA(p));
if (public_key.get() == nullptr)
throw Exception("Failed to get RSA key from ENV_PKEY. Error: " + getOpenSSLErrors(), ErrorCodes::OPENSSL_ERROR);
}
{
String privateKeyFile = config.getString(privateKeyFileProperty);
FILE * fp = fopen(privateKeyFile.data(), "r");
if (fp == nullptr)
throw Exception ("Cannot open private key file " + privateKeyFile + ".", ErrorCodes::CANNOT_OPEN_FILE);
SCOPE_EXIT(fclose(fp));
private_key.reset(PEM_read_RSAPrivateKey(fp, nullptr, nullptr, nullptr));
if (!private_key)
throw Exception("Failed to read RSA private key from " + privateKeyFile + ". Error: " + getOpenSSLErrors(), ErrorCodes::OPENSSL_ERROR);
}
}
void MySQLHandlerFactory::generateRSAKeys()
{
LOG_INFO(log, "Generating new RSA key.");
public_key.reset(RSA_new());
if (!public_key)
throw Exception("Failed to allocate RSA key. Error: " + getOpenSSLErrors(), ErrorCodes::OPENSSL_ERROR);
BIGNUM * e = BN_new();
if (!e)
throw Exception("Failed to allocate BIGNUM. Error: " + getOpenSSLErrors(), ErrorCodes::OPENSSL_ERROR);
SCOPE_EXIT(BN_free(e));
if (!BN_set_word(e, 65537) || !RSA_generate_key_ex(public_key.get(), 2048, e, nullptr))
throw Exception("Failed to generate RSA key. Error: " + getOpenSSLErrors(), ErrorCodes::OPENSSL_ERROR);
private_key.reset(RSAPrivateKey_dup(public_key.get()));
if (!private_key)
throw Exception("Failed to copy RSA key. Error: " + getOpenSSLErrors(), ErrorCodes::OPENSSL_ERROR);
}
Poco::Net::TCPServerConnection * MySQLHandlerFactory::createConnection(const Poco::Net::StreamSocket & socket)
{
size_t connection_id = last_connection_id++;
LOG_TRACE(log, "MySQL connection. Id: " << connection_id << ". Address: " << socket.peerAddress().toString());
return new MySQLHandler(server, socket, *public_key, *private_key, ssl_enabled, connection_id);
}
}
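readRSAKeys() above takes the key pair from the standard openSSL section of the server configuration, falling back to a freshly generated 2048-bit key (with the usual public exponent 65537) only if reading fails. An illustrative config fragment, with example paths:

    <openSSL>
        <server>
            <certificateFile>/etc/clickhouse-server/server.crt</certificateFile>
            <privateKeyFile>/etc/clickhouse-server/server.key</privateKeyFile>
        </server>
    </openSSL>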

View File

@ -0,0 +1,39 @@
#pragma once
#include <Poco/Net/TCPServerConnectionFactory.h>
#include <atomic>
#include <openssl/rsa.h>
#include "IServer.h"
namespace DB
{
class MySQLHandlerFactory : public Poco::Net::TCPServerConnectionFactory
{
private:
IServer & server;
Poco::Logger * log;
struct RSADeleter
{
void operator()(RSA * ptr) { RSA_free(ptr); }
};
using RSAPtr = std::unique_ptr<RSA, RSADeleter>;
RSAPtr public_key;
RSAPtr private_key;
bool ssl_enabled = true;
std::atomic<size_t> last_connection_id = 0;
public:
explicit MySQLHandlerFactory(IServer & server_);
void readRSAKeys();
void generateRSAKeys();
Poco::Net::TCPServerConnection * createConnection(const Poco::Net::StreamSocket & socket) override;
};
}

View File

@ -33,6 +33,7 @@
#include <IO/UseSSL.h>
#include <Interpreters/AsynchronousMetrics.h>
#include <Interpreters/DDLWorker.h>
#include <Interpreters/ExternalDictionaries.h>
#include <Interpreters/ProcessList.h>
#include <Interpreters/loadMetadata.h>
#include <Interpreters/DNSCacheUpdater.h>
@ -49,6 +50,7 @@
#include <Common/StatusFile.h>
#include "TCPHandlerFactory.h"
#include "Common/config_version.h"
#include "MySQLHandlerFactory.h"
#if defined(__linux__)
#include <Common/hasLinuxCapability.h>
@ -79,6 +81,7 @@ namespace ErrorCodes
extern const int SYSTEM_ERROR;
extern const int FAILED_TO_GETPWUID;
extern const int MISMATCHING_USERS_FOR_PROCESS_AND_DATA;
extern const int NETWORK_ERROR;
}
@ -308,7 +311,7 @@ int Server::main(const std::vector<std::string> & /*args*/)
/// Initialize DateLUT early, to not interfere with running time of first query.
LOG_DEBUG(log, "Initializing DateLUT.");
DateLUT::instance();
LOG_TRACE(log, "Initialized DateLUT with time zone `" << DateLUT::instance().getTimeZone() << "'.");
LOG_TRACE(log, "Initialized DateLUT with time zone '" << DateLUT::instance().getTimeZone() << "'.");
/// Directory with temporary data for processing of heavy queries.
{
@ -403,7 +406,7 @@ int Server::main(const std::vector<std::string> & /*args*/)
main_config_zk_changed_event,
[&](ConfigurationPtr config)
{
buildLoggers(*config);
buildLoggers(*config, logger());
global_context->setClustersConfig(config);
global_context->setMacros(std::make_unique<Macros>(*config, "macros"));
},
@ -587,12 +590,12 @@ int Server::main(const std::vector<std::string> & /*args*/)
return socket_address;
};
auto socket_bind_listen = [&](auto & socket, const std::string & host, UInt16 port, bool secure = 0)
auto socket_bind_listen = [&](auto & socket, const std::string & host, UInt16 port, [[maybe_unused]] bool secure = 0)
{
auto address = make_socket_address(host, port);
#if !defined(POCO_CLICKHOUSE_PATCH) || POCO_VERSION <= 0x02000000 // TODO: fill correct version
#if !defined(POCO_CLICKHOUSE_PATCH) || POCO_VERSION < 0x01090100
if (secure)
/// Bug in old poco, listen() after bind() with reusePort param will fail because have no implementation in SecureServerSocketImpl
/// Bug in old (<1.9.1) poco: listen() after bind() with the reusePort param will fail because it has no implementation in SecureServerSocketImpl
/// https://github.com/pocoproject/poco/pull/2257
socket.bind(address, /* reuseAddress = */ true);
else
@ -611,13 +614,15 @@ int Server::main(const std::vector<std::string> & /*args*/)
for (const auto & listen_host : listen_hosts)
{
/// For testing purposes, user may omit tcp_port or http_port or https_port in configuration file.
uint16_t listen_port = 0;
try
{
/// HTTP
if (config().has("http_port"))
{
Poco::Net::ServerSocket socket;
auto address = socket_bind_listen(socket, listen_host, config().getInt("http_port"));
listen_port = config().getInt("http_port");
auto address = socket_bind_listen(socket, listen_host, listen_port);
socket.setReceiveTimeout(settings.http_receive_timeout);
socket.setSendTimeout(settings.http_send_timeout);
servers.emplace_back(std::make_unique<Poco::Net::HTTPServer>(
@ -634,7 +639,8 @@ int Server::main(const std::vector<std::string> & /*args*/)
{
#if USE_POCO_NETSSL
Poco::Net::SecureServerSocket socket;
auto address = socket_bind_listen(socket, listen_host, config().getInt("https_port"), /* secure = */ true);
listen_port = config().getInt("https_port");
auto address = socket_bind_listen(socket, listen_host, listen_port, /* secure = */ true);
socket.setReceiveTimeout(settings.http_receive_timeout);
socket.setSendTimeout(settings.http_send_timeout);
servers.emplace_back(std::make_unique<Poco::Net::HTTPServer>(
@ -654,7 +660,8 @@ int Server::main(const std::vector<std::string> & /*args*/)
if (config().has("tcp_port"))
{
Poco::Net::ServerSocket socket;
auto address = socket_bind_listen(socket, listen_host, config().getInt("tcp_port"));
listen_port = config().getInt("tcp_port");
auto address = socket_bind_listen(socket, listen_host, listen_port);
socket.setReceiveTimeout(settings.receive_timeout);
socket.setSendTimeout(settings.send_timeout);
servers.emplace_back(std::make_unique<Poco::Net::TCPServer>(
@ -663,7 +670,7 @@ int Server::main(const std::vector<std::string> & /*args*/)
socket,
new Poco::Net::TCPServerParams));
LOG_INFO(log, "Listening tcp: " + address.toString());
LOG_INFO(log, "Listening for connections with native protocol (tcp): " + address.toString());
}
/// TCP with SSL
@ -671,7 +678,8 @@ int Server::main(const std::vector<std::string> & /*args*/)
{
#if USE_POCO_NETSSL
Poco::Net::SecureServerSocket socket;
auto address = socket_bind_listen(socket, listen_host, config().getInt("tcp_port_secure"), /* secure = */ true);
listen_port = config().getInt("tcp_port_secure");
auto address = socket_bind_listen(socket, listen_host, listen_port, /* secure = */ true);
socket.setReceiveTimeout(settings.receive_timeout);
socket.setSendTimeout(settings.send_timeout);
servers.emplace_back(std::make_unique<Poco::Net::TCPServer>(
@ -679,7 +687,7 @@ int Server::main(const std::vector<std::string> & /*args*/)
server_pool,
socket,
new Poco::Net::TCPServerParams));
LOG_INFO(log, "Listening tcp_secure: " + address.toString());
LOG_INFO(log, "Listening for connections with secure native protocol (tcp_secure): " + address.toString());
#else
throw Exception{"SSL support for TCP protocol is disabled because Poco library was built without NetSSL support.",
ErrorCodes::SUPPORT_IS_DISABLED};
@ -694,7 +702,8 @@ int Server::main(const std::vector<std::string> & /*args*/)
if (config().has("interserver_http_port"))
{
Poco::Net::ServerSocket socket;
auto address = socket_bind_listen(socket, listen_host, config().getInt("interserver_http_port"));
listen_port = config().getInt("interserver_http_port");
auto address = socket_bind_listen(socket, listen_host, listen_port);
socket.setReceiveTimeout(settings.http_receive_timeout);
socket.setSendTimeout(settings.http_send_timeout);
servers.emplace_back(std::make_unique<Poco::Net::HTTPServer>(
@ -703,14 +712,15 @@ int Server::main(const std::vector<std::string> & /*args*/)
socket,
http_params));
LOG_INFO(log, "Listening interserver http: " + address.toString());
LOG_INFO(log, "Listening for replica communication (interserver) http://" + address.toString());
}
if (config().has("interserver_https_port"))
{
#if USE_POCO_NETSSL
Poco::Net::SecureServerSocket socket;
auto address = socket_bind_listen(socket, listen_host, config().getInt("interserver_https_port"), /* secure = */ true);
listen_port = config().getInt("interserver_https_port");
auto address = socket_bind_listen(socket, listen_host, listen_port, /* secure = */ true);
socket.setReceiveTimeout(settings.http_receive_timeout);
socket.setSendTimeout(settings.http_send_timeout);
servers.emplace_back(std::make_unique<Poco::Net::HTTPServer>(
@ -719,23 +729,39 @@ int Server::main(const std::vector<std::string> & /*args*/)
socket,
http_params));
LOG_INFO(log, "Listening interserver https: " + address.toString());
LOG_INFO(log, "Listening for secure replica communication (interserver) https://" + address.toString());
#else
throw Exception{"SSL support for TCP protocol is disabled because Poco library was built without NetSSL support.",
ErrorCodes::SUPPORT_IS_DISABLED};
#endif
}
if (config().has("mysql_port"))
{
Poco::Net::ServerSocket socket;
auto address = socket_bind_listen(socket, listen_host, config().getInt("mysql_port"), /* secure = */ true);
socket.setReceiveTimeout(Poco::Timespan());
socket.setSendTimeout(settings.send_timeout);
servers.emplace_back(std::make_unique<Poco::Net::TCPServer>(
new MySQLHandlerFactory(*this),
server_pool,
socket,
new Poco::Net::TCPServerParams));
LOG_INFO(log, "Listening for MySQL compatibility protocol: " + address.toString());
}
}
catch (const Poco::Net::NetException & e)
catch (const Poco::Exception & e)
{
std::string message = "Listen [" + listen_host + "]:" + std::to_string(listen_port) + " failed: " + std::to_string(e.code()) + ": " + e.what() + ": " + e.message();
if (listen_try)
LOG_ERROR(log, "Listen [" << listen_host << "]: " << e.code() << ": " << e.what() << ": " << e.message()
LOG_ERROR(log, message
<< " If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to "
"specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration "
"file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> ."
" Example for disabled IPv4: <listen_host>::</listen_host>");
else
throw;
throw Exception{message, ErrorCodes::NETWORK_ERROR};
}
}
@ -807,7 +833,7 @@ int Server::main(const std::vector<std::string> & /*args*/)
if (!config().getBool("dictionaries_lazy_load", true))
{
global_context->tryCreateEmbeddedDictionaries();
global_context->tryCreateExternalDictionaries();
global_context->getExternalDictionaries().enableAlwaysLoadEverything(true);
}
}
catch (...)

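Enabling the new listener is then a configuration change; a minimal illustrative fragment (the port number is an example, not a mandated default):

    <listen_host>::</listen_host>
    <mysql_port>9004</mysql_port>

As the rewritten catch block shows, a failure to bind is now fatal (NETWORK_ERROR) unless listen_try is set, in which case it is only logged.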
View File

@ -1,8 +1,6 @@
#include <iomanip>
#include <ext/scope_guard.h>
#include <Poco/Net/NetException.h>
#include <daemon/OwnSplitChannel.h>
#include <Common/ClickHouseRevision.h>
#include <Common/CurrentThread.h>
#include <Common/Stopwatch.h>

View File

@ -1,8 +1,5 @@
<yandex>
<!-- <zookeeper>
<node>
<host>localhost</host>
<port>2181</port>
</node>
</zookeeper>-->
<zookeeper>
<implementation>testkeeper</implementation>
</zookeeper>
</yandex>

View File

@ -1,2 +1,5 @@
*.bin
*.mrk
*.txt
*.dat
*.idx

View File

@ -27,8 +27,8 @@ struct AggregateFunctionArgMinMaxData
};
/// Returns the first arg value found for the minimum/maximum value. Example: argMax(arg, value).
template <typename Data>
class AggregateFunctionArgMinMax final : public IAggregateFunctionDataHelper<Data, AggregateFunctionArgMinMax<Data>>
template <typename Data, bool AllocatesMemoryInArena>
class AggregateFunctionArgMinMax final : public IAggregateFunctionDataHelper<Data, AggregateFunctionArgMinMax<Data, AllocatesMemoryInArena>>
{
private:
const DataTypePtr & type_res;
@ -36,7 +36,7 @@ private:
public:
AggregateFunctionArgMinMax(const DataTypePtr & type_res, const DataTypePtr & type_val)
: IAggregateFunctionDataHelper<Data, AggregateFunctionArgMinMax<Data>>({type_res, type_val}, {}),
: IAggregateFunctionDataHelper<Data, AggregateFunctionArgMinMax<Data, AllocatesMemoryInArena>>({type_res, type_val}, {}),
type_res(this->argument_types[0]), type_val(this->argument_types[1])
{
if (!type_val->isComparable())
@ -75,6 +75,11 @@ public:
this->data(place).value.read(buf, *type_val, arena);
}
bool allocatesMemoryInArena() const override
{
return AllocatesMemoryInArena;
}
void insertResultInto(ConstAggregateDataPtr place, IColumn & to) const override
{
this->data(place).result.insertResultInto(to);

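The new AllocatesMemoryInArena template parameter lets each instantiation report truthfully whether its state stores data in the aggregation arena, which is the case for String arguments but not for fixed-size numeric ones. Usage is unchanged; for reference (table and column names are illustrative):

    SELECT argMax(name, salary) FROM employees; -- name from the row with the maximum salary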
View File

@ -1,10 +1,10 @@
#pragma once
#include <roaring.h>
#include <roaring/roaring.h>
#include <IO/ReadHelpers.h>
#include <IO/WriteHelpers.h>
#include <boost/noncopyable.hpp>
#include <roaring.hh>
#include <roaring/roaring.hh>
#include <Common/HashTable/SmallTable.h>
#include <Common/PODArray.h>
@ -308,16 +308,88 @@ public:
/**
* Check whether two bitmaps intersect.
* Intersection with an empty set is always 0 (consistent with hasAny).
*/
UInt8 rb_intersect(const RoaringBitmapWithSmallSet & r1)
UInt8 rb_intersect(const RoaringBitmapWithSmallSet & r1) const
{
if (isSmall())
toLarge();
roaring_bitmap_t * rb1 = r1.isSmall() ? r1.getNewRbFromSmall() : r1.getRb();
UInt8 is_true = roaring_bitmap_intersect(rb, rb1);
if (r1.isSmall())
roaring_bitmap_free(rb1);
return is_true;
{
if (r1.isSmall())
{
for (const auto & x : r1.small)
if (small.find(x.getValue()) != small.end())
return 1;
}
else
{
for (const auto & x : small)
if (roaring_bitmap_contains(r1.rb, x.getValue()))
return 1;
}
}
else if (r1.isSmall())
{
for (const auto & x : r1.small)
if (roaring_bitmap_contains(rb, x.getValue()))
return 1;
}
else if (roaring_bitmap_intersect(rb, r1.rb))
return 1;
return 0;
}
/**
* Check whether the argument is the subset of this set.
* Empty set is a subset of any other set (consistent with hasAll).
*/
UInt8 rb_is_subset(const RoaringBitmapWithSmallSet & r1) const
{
if (isSmall())
{
if (r1.isSmall())
{
for (const auto & x : r1.small)
if (small.find(x.getValue()) == small.end())
return 0;
}
else
{
UInt64 r1_size = r1.size();
if (r1_size > small.size())
return 0; // A bigger set cannot be a subset of ours.
// This is a rare case with a small number of elements on
// both sides: r1 was promoted to large for some reason and
// it is still not larger than our small set.
// If r1 is a subset of ours, our size must equal r1_size plus
// the number of our elements not found in r1; if this running
// sum ever exceeds our size, r1 is not a subset.
for (const auto & x : small)
if (!roaring_bitmap_contains(r1.rb, x.getValue()) && ++r1_size > small.size())
return 0;
}
}
else if (r1.isSmall())
{
for (const auto & x : r1.small)
if (!roaring_bitmap_contains(rb, x.getValue()))
return 0;
}
else if (!roaring_bitmap_is_subset(r1.rb, rb))
return 0;
return 1;
}
/**
* Check whether this bitmap contains the argument.
*/
UInt8 rb_contains(const UInt32 x) const
{
return isSmall() ? small.find(x) != small.end() :
roaring_bitmap_contains(rb, x);
}
/**

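The three predicates above back the SQL-level bitmap functions (the mapping follows the function names), and the small/large branching avoids promoting the small set to a roaring bitmap just to answer a membership question. For reference (results shown as comments):

    SELECT bitmapHasAny(bitmapBuild([1, 2, 3]), bitmapBuild([3, 4, 5])); -- 1 (rb_intersect)
    SELECT bitmapHasAll(bitmapBuild([1, 2, 3, 4]), bitmapBuild([2, 4])); -- 1 (rb_is_subset)
    SELECT bitmapContains(bitmapBuild([1, 2, 3]), 2);                    -- 1 (rb_contains)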
View File

@ -0,0 +1,484 @@
#include "AggregateFunctionMLMethod.h"
#include <IO/ReadHelpers.h>
#include <IO/WriteHelpers.h>
#include <Interpreters/castColumn.h>
#include <Columns/ColumnArray.h>
#include <Columns/ColumnTuple.h>
#include <Common/FieldVisitors.h>
#include <Common/typeid_cast.h>
#include "AggregateFunctionFactory.h"
#include "FactoryHelpers.h"
#include "Helpers.h"
namespace DB
{
namespace
{
using FuncLinearRegression = AggregateFunctionMLMethod<LinearModelData, NameLinearRegression>;
using FuncLogisticRegression = AggregateFunctionMLMethod<LinearModelData, NameLogisticRegression>;
template <class Method>
AggregateFunctionPtr
createAggregateFunctionMLMethod(const std::string & name, const DataTypes & argument_types, const Array & parameters)
{
if (parameters.size() > 4)
throw Exception(
"Aggregate function " + name
+ " requires at most four parameters: learning_rate, l2_regularization_coef, mini-batch size and weights_updater "
"method",
ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);
if (argument_types.size() < 2)
throw Exception(
"Aggregate function " + name + " requires at least two arguments: target and model's parameters",
ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);
for (size_t i = 0; i < argument_types.size(); ++i)
{
if (!isNativeNumber(argument_types[i]))
throw Exception(
"Argument " + std::to_string(i) + " of type " + argument_types[i]->getName()
+ " must be numeric for aggregate function " + name,
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
}
/// These default parameters were picked because they performed well in some tests,
/// though tuning them is usually required to achieve better results.
auto learning_rate = Float64(0.01);
auto l2_reg_coef = Float64(0.1);
UInt32 batch_size = 15;
std::string weights_updater_name = "SGD";
std::unique_ptr<IGradientComputer> gradient_computer;
if (!parameters.empty())
{
learning_rate = applyVisitor(FieldVisitorConvertToNumber<Float64>(), parameters[0]);
}
if (parameters.size() > 1)
{
l2_reg_coef = applyVisitor(FieldVisitorConvertToNumber<Float64>(), parameters[1]);
}
if (parameters.size() > 2)
{
batch_size = applyVisitor(FieldVisitorConvertToNumber<UInt32>(), parameters[2]);
}
if (parameters.size() > 3)
{
weights_updater_name = parameters[3].safeGet<String>();
if (weights_updater_name != "SGD" && weights_updater_name != "Momentum" && weights_updater_name != "Nesterov")
throw Exception("Invalid parameter for weights updater. The only supported are 'SGD', 'Momentum' and 'Nesterov'",
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
}
if (std::is_same<Method, FuncLinearRegression>::value)
{
gradient_computer = std::make_unique<LinearRegression>();
}
else if (std::is_same<Method, FuncLogisticRegression>::value)
{
gradient_computer = std::make_unique<LogisticRegression>();
}
else
{
throw Exception("Such gradient computer is not implemented yet", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
}
return std::make_shared<Method>(
argument_types.size() - 1,
std::move(gradient_computer),
weights_updater_name,
learning_rate,
l2_reg_coef,
batch_size,
argument_types,
parameters);
}
}
void registerAggregateFunctionMLMethod(AggregateFunctionFactory & factory)
{
factory.registerFunction("stochasticLinearRegression", createAggregateFunctionMLMethod<FuncLinearRegression>);
factory.registerFunction("stochasticLogisticRegression", createAggregateFunctionMLMethod<FuncLogisticRegression>);
}
LinearModelData::LinearModelData(
Float64 learning_rate,
Float64 l2_reg_coef,
UInt32 param_num,
UInt32 batch_capacity,
std::shared_ptr<DB::IGradientComputer> gradient_computer,
std::shared_ptr<DB::IWeightsUpdater> weights_updater)
: learning_rate(learning_rate)
, l2_reg_coef(l2_reg_coef)
, batch_capacity(batch_capacity)
, batch_size(0)
, gradient_computer(std::move(gradient_computer))
, weights_updater(std::move(weights_updater))
{
weights.resize(param_num, Float64{0.0});
gradient_batch.resize(param_num + 1, Float64{0.0});
}
void LinearModelData::update_state()
{
if (batch_size == 0)
return;
weights_updater->update(batch_size, weights, bias, gradient_batch);
batch_size = 0;
++iter_num;
gradient_batch.assign(gradient_batch.size(), Float64{0.0});
}
void LinearModelData::predict(
ColumnVector<Float64>::Container & container,
Block & block,
size_t offset,
size_t limit,
const ColumnNumbers & arguments,
const Context & context) const
{
gradient_computer->predict(container, block, offset, limit, arguments, weights, bias, context);
}
void LinearModelData::returnWeights(IColumn & to) const
{
size_t size = weights.size() + 1;
ColumnArray & arr_to = static_cast<ColumnArray &>(to);
ColumnArray::Offsets & offsets_to = arr_to.getOffsets();
size_t old_size = offsets_to.back();
offsets_to.push_back(old_size + size);
typename ColumnFloat64::Container & val_to
= static_cast<ColumnFloat64 &>(arr_to.getData()).getData();
val_to.reserve(old_size + size);
for (size_t i = 0; i + 1 < size; ++i)
val_to.push_back(weights[i]);
val_to.push_back(bias);
}
void LinearModelData::read(ReadBuffer & buf)
{
readBinary(bias, buf);
readBinary(weights, buf);
readBinary(iter_num, buf);
readBinary(gradient_batch, buf);
readBinary(batch_size, buf);
weights_updater->read(buf);
}
void LinearModelData::write(WriteBuffer & buf) const
{
writeBinary(bias, buf);
writeBinary(weights, buf);
writeBinary(iter_num, buf);
writeBinary(gradient_batch, buf);
writeBinary(batch_size, buf);
weights_updater->write(buf);
}
void LinearModelData::merge(const DB::LinearModelData & rhs)
{
if (iter_num == 0 && rhs.iter_num == 0)
return;
update_state();
/// can't update rhs state because it's constant
Float64 frac = (static_cast<Float64>(iter_num) * iter_num) / (iter_num * iter_num + rhs.iter_num * rhs.iter_num);
for (size_t i = 0; i < weights.size(); ++i)
{
weights[i] = weights[i] * frac + rhs.weights[i] * (1 - frac);
}
bias = bias * frac + rhs.bias * (1 - frac);
iter_num += rhs.iter_num;
weights_updater->merge(*rhs.weights_updater, frac, 1 - frac);
}
void LinearModelData::add(const IColumn ** columns, size_t row_num)
{
/// first column stores target; features start from (columns + 1)
Float64 target = (*columns[0]).getFloat64(row_num);
/// Pass columns + 1: the first column holds the target value, the rest are features.
weights_updater->add_to_batch(
gradient_batch, *gradient_computer, weights, bias, learning_rate, l2_reg_coef, target, columns + 1, row_num);
++batch_size;
if (batch_size == batch_capacity)
{
update_state();
}
}
void Nesterov::read(ReadBuffer & buf)
{
readBinary(accumulated_gradient, buf);
}
void Nesterov::write(WriteBuffer & buf) const
{
writeBinary(accumulated_gradient, buf);
}
void Nesterov::merge(const IWeightsUpdater & rhs, Float64 frac, Float64 rhs_frac)
{
auto & nesterov_rhs = static_cast<const Nesterov &>(rhs);
for (size_t i = 0; i < accumulated_gradient.size(); ++i)
{
accumulated_gradient[i] = accumulated_gradient[i] * frac + nesterov_rhs.accumulated_gradient[i] * rhs_frac;
}
}
void Nesterov::update(UInt32 batch_size, std::vector<Float64> & weights, Float64 & bias, const std::vector<Float64> & batch_gradient)
{
if (accumulated_gradient.empty())
{
accumulated_gradient.resize(batch_gradient.size(), Float64{0.0});
}
for (size_t i = 0; i < batch_gradient.size(); ++i)
{
accumulated_gradient[i] = accumulated_gradient[i] * alpha_ + batch_gradient[i] / batch_size;
}
for (size_t i = 0; i < weights.size(); ++i)
{
weights[i] += accumulated_gradient[i];
}
bias += accumulated_gradient[weights.size()];
}
void Nesterov::add_to_batch(
std::vector<Float64> & batch_gradient,
IGradientComputer & gradient_computer,
const std::vector<Float64> & weights,
Float64 bias,
Float64 learning_rate,
Float64 l2_reg_coef,
Float64 target,
const IColumn ** columns,
size_t row_num)
{
if (accumulated_gradient.empty())
{
accumulated_gradient.resize(batch_gradient.size(), Float64{0.0});
}
std::vector<Float64> shifted_weights(weights.size());
for (size_t i = 0; i != shifted_weights.size(); ++i)
{
shifted_weights[i] = weights[i] + accumulated_gradient[i] * alpha_;
}
auto shifted_bias = bias + accumulated_gradient[weights.size()] * alpha_;
gradient_computer.compute(batch_gradient, shifted_weights, shifted_bias, learning_rate, l2_reg_coef, target, columns, row_num);
}
void Momentum::read(ReadBuffer & buf)
{
readBinary(accumulated_gradient, buf);
}
void Momentum::write(WriteBuffer & buf) const
{
writeBinary(accumulated_gradient, buf);
}
void Momentum::merge(const IWeightsUpdater & rhs, Float64 frac, Float64 rhs_frac)
{
auto & momentum_rhs = static_cast<const Momentum &>(rhs);
for (size_t i = 0; i < accumulated_gradient.size(); ++i)
{
accumulated_gradient[i] = accumulated_gradient[i] * frac + momentum_rhs.accumulated_gradient[i] * rhs_frac;
}
}
void Momentum::update(UInt32 batch_size, std::vector<Float64> & weights, Float64 & bias, const std::vector<Float64> & batch_gradient)
{
/// batch_size is already checked to be greater than 0
if (accumulated_gradient.empty())
{
accumulated_gradient.resize(batch_gradient.size(), Float64{0.0});
}
for (size_t i = 0; i < batch_gradient.size(); ++i)
{
accumulated_gradient[i] = accumulated_gradient[i] * alpha_ + batch_gradient[i] / batch_size;
}
for (size_t i = 0; i < weights.size(); ++i)
{
weights[i] += accumulated_gradient[i];
}
bias += accumulated_gradient[weights.size()];
}
void StochasticGradientDescent::update(
UInt32 batch_size, std::vector<Float64> & weights, Float64 & bias, const std::vector<Float64> & batch_gradient)
{
/// batch_size is already checked to be greater than 0
for (size_t i = 0; i < weights.size(); ++i)
{
weights[i] += batch_gradient[i] / batch_size;
}
bias += batch_gradient[weights.size()] / batch_size;
}
void IWeightsUpdater::add_to_batch(
std::vector<Float64> & batch_gradient,
IGradientComputer & gradient_computer,
const std::vector<Float64> & weights,
Float64 bias,
Float64 learning_rate,
Float64 l2_reg_coef,
Float64 target,
const IColumn ** columns,
size_t row_num)
{
gradient_computer.compute(batch_gradient, weights, bias, learning_rate, l2_reg_coef, target, columns, row_num);
}
void LogisticRegression::predict(
ColumnVector<Float64>::Container & container,
Block & block,
size_t offset,
size_t limit,
const ColumnNumbers & arguments,
const std::vector<Float64> & weights,
Float64 bias,
const Context & /*context*/) const
{
size_t rows_num = block.rows();
if (offset > rows_num || offset + limit > rows_num)
throw Exception("Invalid offset and limit for LogisticRegression::predict. "
"Block has " + toString(rows_num) + " rows, but offset is " + toString(offset) +
" and limit is " + toString(limit), ErrorCodes::LOGICAL_ERROR);
std::vector<Float64> results(limit, bias);
for (size_t i = 1; i < arguments.size(); ++i)
{
const ColumnWithTypeAndName & cur_col = block.getByPosition(arguments[i]);
if (!isNativeNumber(cur_col.type))
throw Exception("Prediction arguments must have numeric type", ErrorCodes::BAD_ARGUMENTS);
auto & features_column = cur_col.column;
for (size_t row_num = 0; row_num < limit; ++row_num)
results[row_num] += weights[i - 1] * features_column->getFloat64(offset + row_num);
}
container.reserve(container.size() + limit);
for (size_t row_num = 0; row_num < limit; ++row_num)
container.emplace_back(1 / (1 + exp(-results[row_num])));
}
void LogisticRegression::compute(
std::vector<Float64> & batch_gradient,
const std::vector<Float64> & weights,
Float64 bias,
Float64 learning_rate,
Float64 l2_reg_coef,
Float64 target,
const IColumn ** columns,
size_t row_num)
{
Float64 derivative = bias;
for (size_t i = 0; i < weights.size(); ++i)
{
auto value = (*columns[i]).getFloat64(row_num);
derivative += weights[i] * value;
}
derivative *= target;
derivative = exp(derivative);
batch_gradient[weights.size()] += learning_rate * target / (derivative + 1);
for (size_t i = 0; i < weights.size(); ++i)
{
auto value = (*columns[i]).getFloat64(row_num);
batch_gradient[i] += learning_rate * target * value / (derivative + 1) - 2 * learning_rate * l2_reg_coef * weights[i];
}
}
void LinearRegression::predict(
ColumnVector<Float64>::Container & container,
Block & block,
size_t offset,
size_t limit,
const ColumnNumbers & arguments,
const std::vector<Float64> & weights,
Float64 bias,
const Context & /*context*/) const
{
if (weights.size() + 1 != arguments.size())
{
throw Exception("In predict function number of arguments differs from the size of weights vector", ErrorCodes::LOGICAL_ERROR);
}
size_t rows_num = block.rows();
if (offset > rows_num || offset + limit > rows_num)
throw Exception("Invalid offset and limit for LogisticRegression::predict. "
"Block has " + toString(rows_num) + " rows, but offset is " + toString(offset) +
" and limit is " + toString(limit), ErrorCodes::LOGICAL_ERROR);
std::vector<Float64> results(limit, bias);
for (size_t i = 1; i < arguments.size(); ++i)
{
const ColumnWithTypeAndName & cur_col = block.getByPosition(arguments[i]);
if (!isNativeNumber(cur_col.type))
throw Exception("Prediction arguments must have numeric type", ErrorCodes::BAD_ARGUMENTS);
auto features_column = cur_col.column;
if (!features_column)
throw Exception("Unexpectedly cannot dynamically cast features column " + std::to_string(i), ErrorCodes::LOGICAL_ERROR);
for (size_t row_num = 0; row_num < limit; ++row_num)
results[row_num] += weights[i - 1] * features_column->getFloat64(row_num + offset);
}
container.reserve(container.size() + limit);
for (size_t row_num = 0; row_num < limit; ++row_num)
container.emplace_back(results[row_num]);
}
void LinearRegression::compute(
std::vector<Float64> & batch_gradient,
const std::vector<Float64> & weights,
Float64 bias,
Float64 learning_rate,
Float64 l2_reg_coef,
Float64 target,
const IColumn ** columns,
size_t row_num)
{
Float64 derivative = (target - bias);
for (size_t i = 0; i < weights.size(); ++i)
{
auto value = (*columns[i]).getFloat64(row_num);
derivative -= weights[i] * value;
}
derivative *= (2 * learning_rate);
batch_gradient[weights.size()] += derivative;
for (size_t i = 0; i < weights.size(); ++i)
{
auto value = (*columns[i]).getFloat64(row_num);
batch_gradient[i] += derivative * value - 2 * learning_rate * l2_reg_coef * weights[i];
}
}
}
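The four optional parameters parsed above correspond positionally to the SQL call: learning rate, L2 regularization coefficient, mini-batch size, and weights updater. The intended workflow, per the ClickHouse documentation of these functions, is to train a state with the -State combinator and evaluate it with evalMLMethod (table and column names here are illustrative):

    CREATE TABLE model ENGINE = Memory AS
    SELECT stochasticLinearRegressionState(0.01, 0.1, 15, 'SGD')(target, f1, f2) AS state
    FROM train_data;

    SELECT evalMLMethod(state, f1, f2) AS prediction
    FROM model, test_data;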

View File

@ -0,0 +1,374 @@
#pragma once
#include <Columns/ColumnVector.h>
#include <Columns/ColumnsCommon.h>
#include <Columns/ColumnsNumber.h>
#include <Common/typeid_cast.h>
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypeArray.h>
#include "IAggregateFunction.h"
namespace DB
{
namespace ErrorCodes
{
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
extern const int BAD_ARGUMENTS;
extern const int BAD_CAST;
}
/**
GradientComputer class computes gradient according to its loss function
*/
class IGradientComputer
{
public:
IGradientComputer() {}
virtual ~IGradientComputer() = default;
/// Adds computed gradient in new point (weights, bias) to batch_gradient
virtual void compute(
std::vector<Float64> & batch_gradient,
const std::vector<Float64> & weights,
Float64 bias,
Float64 learning_rate,
Float64 l2_reg_coef,
Float64 target,
const IColumn ** columns,
size_t row_num) = 0;
virtual void predict(
ColumnVector<Float64>::Container & container,
Block & block,
size_t offset,
size_t limit,
const ColumnNumbers & arguments,
const std::vector<Float64> & weights,
Float64 bias,
const Context & context) const = 0;
};
class LinearRegression : public IGradientComputer
{
public:
LinearRegression() {}
void compute(
std::vector<Float64> & batch_gradient,
const std::vector<Float64> & weights,
Float64 bias,
Float64 learning_rate,
Float64 l2_reg_coef,
Float64 target,
const IColumn ** columns,
size_t row_num) override;
void predict(
ColumnVector<Float64>::Container & container,
Block & block,
size_t offset,
size_t limit,
const ColumnNumbers & arguments,
const std::vector<Float64> & weights,
Float64 bias,
const Context & context) const override;
};
class LogisticRegression : public IGradientComputer
{
public:
LogisticRegression() {}
void compute(
std::vector<Float64> & batch_gradient,
const std::vector<Float64> & weights,
Float64 bias,
Float64 learning_rate,
Float64 l2_reg_coef,
Float64 target,
const IColumn ** columns,
size_t row_num) override;
void predict(
ColumnVector<Float64>::Container & container,
Block & block,
size_t offset,
size_t limit,
const ColumnNumbers & arguments,
const std::vector<Float64> & weights,
Float64 bias,
const Context & context) const override;
};
/**
* IWeightsUpdater class defines the way to update current weights
* and uses GradientComputer class on each iteration
*/
class IWeightsUpdater
{
public:
virtual ~IWeightsUpdater() = default;
/// Calls GradientComputer to update current mini-batch
virtual void add_to_batch(
std::vector<Float64> & batch_gradient,
IGradientComputer & gradient_computer,
const std::vector<Float64> & weights,
Float64 bias,
Float64 learning_rate,
Float64 l2_reg_coef,
Float64 target,
const IColumn ** columns,
size_t row_num);
/// Updates current weights according to the gradient from the last mini-batch
virtual void update(UInt32 batch_size, std::vector<Float64> & weights, Float64 & bias, const std::vector<Float64> & gradient) = 0;
/// Used during the merge of two states
virtual void merge(const IWeightsUpdater &, Float64, Float64) {}
/// Used for serialization when necessary
virtual void write(WriteBuffer &) const {}
/// Used for serialization when necessary
virtual void read(ReadBuffer &) {}
};
class StochasticGradientDescent : public IWeightsUpdater
{
public:
void update(UInt32 batch_size, std::vector<Float64> & weights, Float64 & bias, const std::vector<Float64> & batch_gradient) override;
};
class Momentum : public IWeightsUpdater
{
public:
Momentum() {}
Momentum(Float64 alpha) : alpha_(alpha) {}
void update(UInt32 batch_size, std::vector<Float64> & weights, Float64 & bias, const std::vector<Float64> & batch_gradient) override;
void merge(const IWeightsUpdater & rhs, Float64 frac, Float64 rhs_frac) override;
void write(WriteBuffer & buf) const override;
void read(ReadBuffer & buf) override;
private:
Float64 alpha_{0.1};
std::vector<Float64> accumulated_gradient;
};
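Momentum::update is implemented elsewhere; a sketch of the classic momentum rule its members suggest (assumed semantics: v = alpha * v + g / batch_size, then weights += v, with the bias gradient in the last slot):

#include <cstddef>
#include <vector>

void momentumUpdate(unsigned batch_size, std::vector<double> & weights, double & bias,
                    std::vector<double> & accumulated_gradient,
                    const std::vector<double> & batch_gradient, double alpha = 0.1)
{
    if (accumulated_gradient.size() < batch_gradient.size())
        accumulated_gradient.resize(batch_gradient.size(), 0.0);
    for (size_t i = 0; i < batch_gradient.size(); ++i)
        accumulated_gradient[i] = alpha * accumulated_gradient[i] + batch_gradient[i] / batch_size;
    for (size_t i = 0; i < weights.size(); ++i)
        weights[i] += accumulated_gradient[i];
    bias += accumulated_gradient[weights.size()];
}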
class Nesterov : public IWeightsUpdater
{
public:
Nesterov() {}
Nesterov(Float64 alpha) : alpha_(alpha) {}
void add_to_batch(
std::vector<Float64> & batch_gradient,
IGradientComputer & gradient_computer,
const std::vector<Float64> & weights,
Float64 bias,
Float64 learning_rate,
Float64 l2_reg_coef,
Float64 target,
const IColumn ** columns,
size_t row_num) override;
void update(UInt32 batch_size, std::vector<Float64> & weights, Float64 & bias, const std::vector<Float64> & batch_gradient) override;
void merge(const IWeightsUpdater & rhs, Float64 frac, Float64 rhs_frac) override;
void write(WriteBuffer & buf) const override;
void read(ReadBuffer & buf) override;
private:
Float64 alpha_{0.1};
std::vector<Float64> accumulated_gradient;
};
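Unlike Momentum, Nesterov also overrides add_to_batch: the defining feature of Nesterov momentum is that the gradient is evaluated at the look-ahead point w + alpha * v rather than at the current weights. A small sketch of that shift (assumed semantics, hypothetical names):

#include <cstddef>
#include <vector>

/// Build the look-ahead point at which the gradient should be computed.
std::vector<double> nesterovLookahead(const std::vector<double> & weights,
                                      const std::vector<double> & accumulated_gradient,
                                      double alpha)
{
    std::vector<double> shifted(weights);
    for (size_t i = 0; i < shifted.size() && i < accumulated_gradient.size(); ++i)
        shifted[i] += alpha * accumulated_gradient[i];
    return shifted; /// pass this instead of `weights` to the gradient computer
}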
/** LinearModelData manages the current state of learning
*/
class LinearModelData
{
public:
LinearModelData() {}
LinearModelData(
Float64 learning_rate,
Float64 l2_reg_coef,
UInt32 param_num,
UInt32 batch_capacity,
std::shared_ptr<IGradientComputer> gradient_computer,
std::shared_ptr<IWeightsUpdater> weights_updater);
void add(const IColumn ** columns, size_t row_num);
void merge(const LinearModelData & rhs);
void write(WriteBuffer & buf) const;
void read(ReadBuffer & buf);
void predict(
ColumnVector<Float64>::Container & container,
Block & block,
size_t offset,
size_t limit,
const ColumnNumbers & arguments,
const Context & context) const;
void returnWeights(IColumn & to) const;
private:
std::vector<Float64> weights;
Float64 bias{0.0};
Float64 learning_rate;
Float64 l2_reg_coef;
UInt64 batch_capacity;
UInt64 iter_num = 0;
std::vector<Float64> gradient_batch;
UInt64 batch_size;
std::shared_ptr<IGradientComputer> gradient_computer;
std::shared_ptr<IWeightsUpdater> weights_updater;
/** Called when the current batch should be flushed and the weights updated
*/
void update_state();
};
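For orientation, a sketch of the mini-batch flow these members imply (hypothetical names; add() is assumed to accumulate into gradient_batch and trigger update_state() once batch_capacity rows have been seen):

#include <cstddef>
#include <functional>
#include <vector>

struct BatchAccumulator
{
    std::vector<double> gradient_batch;
    size_t batch_size = 0;
    size_t batch_capacity;

    explicit BatchAccumulator(size_t capacity) : batch_capacity(capacity) {}

    void add(const std::vector<double> & gradient,
             const std::function<void(size_t, std::vector<double> &)> & apply_update)
    {
        if (gradient_batch.size() < gradient.size())
            gradient_batch.resize(gradient.size(), 0.0);
        for (size_t i = 0; i < gradient.size(); ++i)
            gradient_batch[i] += gradient[i];
        if (++batch_size == batch_capacity)
        {
            apply_update(batch_size, gradient_batch); /// e.g. SGD / Momentum / Nesterov
            gradient_batch.assign(gradient_batch.size(), 0.0);
            batch_size = 0;
        }
    }
};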
template <
/// Implemented Machine Learning method
typename Data,
/// Name of the method
typename Name>
class AggregateFunctionMLMethod final : public IAggregateFunctionDataHelper<Data, AggregateFunctionMLMethod<Data, Name>>
{
public:
String getName() const override { return Name::name; }
explicit AggregateFunctionMLMethod(
UInt32 param_num,
std::unique_ptr<IGradientComputer> gradient_computer,
std::string weights_updater_name,
Float64 learning_rate,
Float64 l2_reg_coef,
UInt32 batch_size,
const DataTypes & arguments_types,
const Array & params)
: IAggregateFunctionDataHelper<Data, AggregateFunctionMLMethod<Data, Name>>(arguments_types, params)
, param_num(param_num)
, learning_rate(learning_rate)
, l2_reg_coef(l2_reg_coef)
, batch_size(batch_size)
, gradient_computer(std::move(gradient_computer))
, weights_updater_name(std::move(weights_updater_name))
{
}
/// This function is called when SELECT linearRegression(...) is called
DataTypePtr getReturnType() const override
{
return std::make_shared<DataTypeArray>(std::make_shared<DataTypeFloat64>());
}
/// Called from the evalMLMethod function so that predictValues is invoked with the correct return type
DataTypePtr getReturnTypeToPredict() const override
{
return std::make_shared<DataTypeNumber<Float64>>();
}
void create(AggregateDataPtr place) const override
{
std::shared_ptr<IWeightsUpdater> new_weights_updater;
if (weights_updater_name == "SGD")
new_weights_updater = std::make_shared<StochasticGradientDescent>();
else if (weights_updater_name == "Momentum")
new_weights_updater = std::make_shared<Momentum>();
else if (weights_updater_name == "Nesterov")
new_weights_updater = std::make_shared<Nesterov>();
else
throw Exception("Illegal name of weights updater (should have been checked earlier)", ErrorCodes::LOGICAL_ERROR);
new (place) Data(learning_rate, l2_reg_coef, param_num, batch_size, gradient_computer, new_weights_updater);
}
void add(AggregateDataPtr place, const IColumn ** columns, size_t row_num, Arena *) const override
{
this->data(place).add(columns, row_num);
}
void merge(AggregateDataPtr place, ConstAggregateDataPtr rhs, Arena *) const override { this->data(place).merge(this->data(rhs)); }
void serialize(ConstAggregateDataPtr place, WriteBuffer & buf) const override { this->data(place).write(buf); }
void deserialize(AggregateDataPtr place, ReadBuffer & buf, Arena *) const override { this->data(place).read(buf); }
void predictValues(
ConstAggregateDataPtr place,
IColumn & to,
Block & block,
size_t offset,
size_t limit,
const ColumnNumbers & arguments,
const Context & context) const override
{
if (arguments.size() != param_num + 1)
throw Exception(
"Predict got incorrect number of arguments. Got: " + std::to_string(arguments.size())
+ ". Required: " + std::to_string(param_num + 1),
ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);
/// This cast should be safe because the column type is derived from getReturnTypeToPredict.
auto * column = typeid_cast<ColumnFloat64 *>(&to);
if (!column)
throw Exception("Cast of column of predictions is incorrect. getReturnTypeToPredict must return same value as it is casted to",
ErrorCodes::BAD_CAST);
this->data(place).predict(column->getData(), block, offset, limit, arguments, context);
}
/** This function is called if the aggregate function is used without the State modifier in a query.
* Inserts all weights of the model into the column 'to', so the user may use this information if needed
*/
void insertResultInto(ConstAggregateDataPtr place, IColumn & to) const override
{
this->data(place).returnWeights(to);
}
const char * getHeaderFilePath() const override { return __FILE__; }
private:
UInt32 param_num;
Float64 learning_rate;
Float64 l2_reg_coef;
UInt32 batch_size;
std::shared_ptr<IGradientComputer> gradient_computer;
std::string weights_updater_name;
};
struct NameLinearRegression
{
static constexpr auto name = "stochasticLinearRegression";
};
struct NameLogisticRegression
{
static constexpr auto name = "stochasticLogisticRegression";
};
}
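The two Name structs above are what these classes are registered under in SQL. A hedged usage sketch, kept inside a C++ comment so all examples stay in one language; the parameter order (learning rate, l2 coefficient, batch size, updater name) is an assumption based on the constructor arguments above:

// CREATE TABLE model ENGINE = Memory AS
//     SELECT stochasticLinearRegressionState(0.1, 0.0, 5, 'SGD')(target, x1, x2) AS state
//     FROM training_data;
// -- apply the trained state via evalMLMethod (see predictValues above):
// SELECT evalMLMethod(state, x1, x2) FROM model, samples;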

View File

@ -61,10 +61,10 @@ public:
AggregateFunctionIntersectionsMax(AggregateFunctionIntersectionsKind kind_, const DataTypes & arguments)
: IAggregateFunctionDataHelper<MaxIntersectionsData<PointType>, AggregateFunctionIntersectionsMax<PointType>>(arguments, {}), kind(kind_)
{
if (!isNumber(arguments[0]))
if (!isNativeNumber(arguments[0]))
throw Exception{getName() + ": first argument must be represented by integer", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT};
if (!isNumber(arguments[1]))
if (!isNativeNumber(arguments[1]))
throw Exception{getName() + ": second argument must be represented by integer", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT};
if (!arguments[0]->equals(*arguments[1]))

View File

@ -672,15 +672,15 @@ struct AggregateFunctionAnyHeavyData : Data
};
template <typename Data>
class AggregateFunctionsSingleValue final : public IAggregateFunctionDataHelper<Data, AggregateFunctionsSingleValue<Data>>
template <typename Data, bool AllocatesMemoryInArena>
class AggregateFunctionsSingleValue final : public IAggregateFunctionDataHelper<Data, AggregateFunctionsSingleValue<Data, AllocatesMemoryInArena>>
{
private:
DataTypePtr & type;
public:
AggregateFunctionsSingleValue(const DataTypePtr & type)
: IAggregateFunctionDataHelper<Data, AggregateFunctionsSingleValue<Data>>({type}, {})
: IAggregateFunctionDataHelper<Data, AggregateFunctionsSingleValue<Data, AllocatesMemoryInArena>>({type}, {})
, type(this->argument_types[0])
{
if (StringRef(Data::name()) == StringRef("min")
@ -719,6 +719,11 @@ public:
this->data(place).read(buf, *type.get(), arena);
}
bool allocatesMemoryInArena() const override
{
return AllocatesMemoryInArena;
}
void insertResultInto(ConstAggregateDataPtr place, IColumn & to) const override
{
this->data(place).insertResultInto(to);

View File

@ -43,8 +43,12 @@ template <typename Value, bool FloatReturn> using FuncQuantilesTDigestWeighted =
template <template <typename, bool> class Function>
static constexpr bool supportDecimal()
{
return std::is_same_v<Function<Float32, false>, FuncQuantileExact<Float32, false>> ||
std::is_same_v<Function<Float32, false>, FuncQuantilesExact<Float32, false>>;
return std::is_same_v<Function<Float32, false>, FuncQuantile<Float32, false>> ||
std::is_same_v<Function<Float32, false>, FuncQuantiles<Float32, false>> ||
std::is_same_v<Function<Float32, false>, FuncQuantileExact<Float32, false>> ||
std::is_same_v<Function<Float32, false>, FuncQuantilesExact<Float32, false>> ||
std::is_same_v<Function<Float32, false>, FuncQuantileExactWeighted<Float32, false>> ||
std::is_same_v<Function<Float32, false>, FuncQuantilesExactWeighted<Float32, false>>;
}
@ -66,9 +70,9 @@ AggregateFunctionPtr createAggregateFunctionQuantile(const std::string & name, c
if constexpr (supportDecimal<Function>())
{
if (which.idx == TypeIndex::Decimal32) return std::make_shared<Function<Decimal32, true>>(argument_type, params);
if (which.idx == TypeIndex::Decimal64) return std::make_shared<Function<Decimal64, true>>(argument_type, params);
if (which.idx == TypeIndex::Decimal128) return std::make_shared<Function<Decimal128, true>>(argument_type, params);
if (which.idx == TypeIndex::Decimal32) return std::make_shared<Function<Decimal32, false>>(argument_type, params);
if (which.idx == TypeIndex::Decimal64) return std::make_shared<Function<Decimal64, false>>(argument_type, params);
if (which.idx == TypeIndex::Decimal128) return std::make_shared<Function<Decimal128, false>>(argument_type, params);
}
throw Exception("Illegal type " + argument_type->getName() + " of argument for aggregate function " + name,

View File

@ -1,6 +1,12 @@
#include <AggregateFunctions/Helpers.h>
#include <AggregateFunctions/AggregateFunctionFactory.h>
#include <AggregateFunctions/AggregateFunctionSequenceMatch.h>
#include <DataTypes/DataTypeDate.h>
#include <DataTypes/DataTypeDateTime.h>
#include <ext/range.h>
namespace DB
{
@ -12,32 +18,58 @@ namespace ErrorCodes
namespace
{
AggregateFunctionPtr createAggregateFunctionSequenceCount(const std::string & name, const DataTypes & argument_types, const Array & params)
template <template <typename, typename> class AggregateFunction, template <typename> class Data>
AggregateFunctionPtr createAggregateFunctionSequenceBase(const std::string & name, const DataTypes & argument_types, const Array & params)
{
if (params.size() != 1)
throw Exception{"Aggregate function " + name + " requires exactly one parameter.",
ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH};
String pattern = params.front().safeGet<std::string>();
return std::make_shared<AggregateFunctionSequenceCount>(argument_types, params, pattern);
}
const auto arg_count = argument_types.size();
AggregateFunctionPtr createAggregateFunctionSequenceMatch(const std::string & name, const DataTypes & argument_types, const Array & params)
{
if (params.size() != 1)
throw Exception{"Aggregate function " + name + " requires exactly one parameter.",
ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH};
if (arg_count < 3)
throw Exception{"Aggregate function " + name + " requires at least 3 arguments.",
ErrorCodes::TOO_FEW_ARGUMENTS_FOR_FUNCTION};
if (arg_count - 1 > max_events)
throw Exception{"Aggregate function " + name + " supports up to "
+ toString(max_events) + " event arguments.",
ErrorCodes::TOO_MANY_ARGUMENTS_FOR_FUNCTION};
const auto time_arg = argument_types.front().get();
for (const auto i : ext::range(1, arg_count))
{
const auto cond_arg = argument_types[i].get();
if (!isUInt8(cond_arg))
throw Exception{"Illegal type " + cond_arg->getName() + " of argument " + toString(i + 1)
+ " of aggregate function " + name + ", must be UInt8",
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT};
}
String pattern = params.front().safeGet<std::string>();
return std::make_shared<AggregateFunctionSequenceMatch>(argument_types, params, pattern);
AggregateFunctionPtr res(createWithUnsignedIntegerType<AggregateFunction, Data>(*argument_types[0], argument_types, params, pattern));
if (res)
return res;
WhichDataType which(argument_types.front().get());
if (which.isDateTime())
return std::make_shared<AggregateFunction<DataTypeDateTime::FieldType, Data<DataTypeDateTime::FieldType>>>(argument_types, params, pattern);
else if (which.isDate())
return std::make_shared<AggregateFunction<DataTypeDate::FieldType, Data<DataTypeDate::FieldType>>>(argument_types, params, pattern);
throw Exception{"Illegal type " + time_arg->getName() + " of first argument of aggregate function "
+ name + ", must be DateTime",
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT};
}
}
void registerAggregateFunctionsSequenceMatch(AggregateFunctionFactory & factory)
{
factory.registerFunction("sequenceMatch", createAggregateFunctionSequenceMatch);
factory.registerFunction("sequenceCount", createAggregateFunctionSequenceCount);
factory.registerFunction("sequenceMatch", createAggregateFunctionSequenceBase<AggregateFunctionSequenceMatch, AggregateFunctionSequenceMatchData>);
factory.registerFunction("sequenceCount", createAggregateFunctionSequenceBase<AggregateFunctionSequenceCount, AggregateFunctionSequenceMatchData>);
}
}

View File

@ -36,11 +36,12 @@ struct ComparePairFirst final
}
};
static constexpr auto max_events = 32;
template <typename T>
struct AggregateFunctionSequenceMatchData final
{
static constexpr auto max_events = 32;
using Timestamp = std::uint32_t;
using Timestamp = T;
using Events = std::bitset<max_events>;
using TimestampEvents = std::pair<Timestamp, Events>;
using Comparator = ComparePairFirst<std::less>;
@ -61,6 +62,9 @@ struct AggregateFunctionSequenceMatchData final
void merge(const AggregateFunctionSequenceMatchData & other)
{
if (other.events_list.empty())
return;
const auto size = events_list.size();
events_list.insert(std::begin(other.events_list), std::end(other.events_list));
@ -119,7 +123,7 @@ struct AggregateFunctionSequenceMatchData final
for (size_t i = 0; i < size; ++i)
{
std::uint32_t timestamp;
Timestamp timestamp;
readBinary(timestamp, buf);
UInt64 events;
@ -135,48 +139,23 @@ struct AggregateFunctionSequenceMatchData final
constexpr auto sequence_match_max_iterations = 1000000;
template <typename Derived>
class AggregateFunctionSequenceBase : public IAggregateFunctionDataHelper<AggregateFunctionSequenceMatchData, Derived>
template <typename T, typename Data, typename Derived>
class AggregateFunctionSequenceBase : public IAggregateFunctionDataHelper<Data, Derived>
{
public:
AggregateFunctionSequenceBase(const DataTypes & arguments, const Array & params, const String & pattern)
: IAggregateFunctionDataHelper<AggregateFunctionSequenceMatchData, Derived>(arguments, params)
: IAggregateFunctionDataHelper<Data, Derived>(arguments, params)
, pattern(pattern)
{
arg_count = arguments.size();
if (!sufficientArgs(arg_count))
throw Exception{"Aggregate function " + derived().getName() + " requires at least 3 arguments.",
ErrorCodes::TOO_FEW_ARGUMENTS_FOR_FUNCTION};
if (arg_count - 1 > AggregateFunctionSequenceMatchData::max_events)
throw Exception{"Aggregate function " + derived().getName() + " supports up to " +
toString(AggregateFunctionSequenceMatchData::max_events) + " event arguments.",
ErrorCodes::TOO_MANY_ARGUMENTS_FOR_FUNCTION};
const auto time_arg = arguments.front().get();
if (!WhichDataType(time_arg).isDateTime())
throw Exception{"Illegal type " + time_arg->getName() + " of first argument of aggregate function "
+ derived().getName() + ", must be DateTime",
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT};
for (const auto i : ext::range(1, arg_count))
{
const auto cond_arg = arguments[i].get();
if (!isUInt8(cond_arg))
throw Exception{"Illegal type " + cond_arg->getName() + " of argument " + toString(i + 1) +
" of aggregate function " + derived().getName() + ", must be UInt8",
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT};
}
parsePattern();
}
void add(AggregateDataPtr place, const IColumn ** columns, const size_t row_num, Arena *) const override
{
const auto timestamp = static_cast<const ColumnUInt32 *>(columns[0])->getData()[row_num];
const auto timestamp = static_cast<const ColumnVector<T> *>(columns[0])->getData()[row_num];
AggregateFunctionSequenceMatchData::Events events;
typename Data::Events events;
for (const auto i : ext::range(1, arg_count))
{
const auto event = static_cast<const ColumnUInt8 *>(columns[i])->getData()[row_num];
@ -218,17 +197,15 @@ private:
struct PatternAction final
{
PatternActionType type;
std::uint32_t extra;
std::uint64_t extra;
PatternAction() = default;
PatternAction(const PatternActionType type, const std::uint32_t extra = 0) : type{type}, extra{extra} {}
PatternAction(const PatternActionType type, const std::uint64_t extra = 0) : type{type}, extra{extra} {}
};
static constexpr size_t bytes_on_stack = 64;
using PatternActions = PODArray<PatternAction, bytes_on_stack, AllocatorWithStackMemory<Allocator<false>, bytes_on_stack>>;
static bool sufficientArgs(const size_t arg_count) { return arg_count >= 3; }
Derived & derived() { return static_cast<Derived &>(*this); }
void parsePattern()
@ -340,8 +317,8 @@ protected:
/// This algorithm runs in O(mn) (with m the number of DFA states and n the number
/// of events), with memory consumption and memory allocations in O(m). It means that
/// if n >> m (which is expected to be the case), this algorithm can be considered linear.
template <typename T>
bool dfaMatch(T & events_it, const T events_end) const
template <typename EventEntry>
bool dfaMatch(EventEntry & events_it, const EventEntry events_end) const
{
using ActiveStates = std::vector<bool>;
@ -396,8 +373,8 @@ protected:
return active_states.back();
}
template <typename T>
bool backtrackingMatch(T & events_it, const T events_end) const
template <typename EventEntry>
bool backtrackingMatch(EventEntry & events_it, const EventEntry events_end) const
{
const auto action_begin = std::begin(actions);
const auto action_end = std::end(actions);
@ -407,7 +384,7 @@ protected:
auto base_it = events_it;
/// an iterator to action plus an iterator to row in events list plus timestamp at the start of sequence
using backtrack_info = std::tuple<decltype(action_it), T, T>;
using backtrack_info = std::tuple<decltype(action_it), EventEntry, EventEntry>;
std::stack<backtrack_info> back_stack;
/// backtrack if possible
@ -458,7 +435,7 @@ protected:
}
else if (action_it->type == PatternActionType::TimeLessOrEqual)
{
if (events_it->first - base_it->first <= action_it->extra)
if (events_it->first <= base_it->first + action_it->extra)
{
/// condition satisfied, move onto next action
back_stack.emplace(action_it, events_it, base_it);
@ -470,7 +447,7 @@ protected:
}
else if (action_it->type == PatternActionType::TimeLess)
{
if (events_it->first - base_it->first < action_it->extra)
if (events_it->first < base_it->first + action_it->extra)
{
back_stack.emplace(action_it, events_it, base_it);
base_it = events_it;
@ -481,7 +458,7 @@ protected:
}
else if (action_it->type == PatternActionType::TimeGreaterOrEqual)
{
if (events_it->first - base_it->first >= action_it->extra)
if (events_it->first >= base_it->first + action_it->extra)
{
back_stack.emplace(action_it, events_it, base_it);
base_it = events_it;
@ -492,7 +469,7 @@ protected:
}
else if (action_it->type == PatternActionType::TimeGreater)
{
if (events_it->first - base_it->first > action_it->extra)
if (events_it->first > base_it->first + action_it->extra)
{
back_stack.emplace(action_it, events_it, base_it);
base_it = events_it;
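The four rewrites above all replace `events_it->first - base_it->first <op> extra` with `events_it->first <op> base_it->first + extra`. With unsigned timestamps (now a template parameter that may be as narrow as UInt16 for Date), the subtraction wraps around whenever the left operand is smaller; a minimal sketch:

#include <cstdint>
#include <iostream>

int main()
{
    std::uint32_t a = 5, b = 10;
    std::uint64_t extra = 3;
    std::cout << (a - b <= extra) << '\n'; /// 0: a - b wrapped around to 4294967291
    std::cout << (a <= b + extra) << '\n'; /// 1: the intended "within extra" check
    return 0;
}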
@ -575,14 +552,14 @@ private:
DFAStates dfa_states;
};
class AggregateFunctionSequenceMatch final : public AggregateFunctionSequenceBase<AggregateFunctionSequenceMatch>
template <typename T, typename Data>
class AggregateFunctionSequenceMatch final : public AggregateFunctionSequenceBase<T, Data, AggregateFunctionSequenceMatch<T, Data>>
{
public:
AggregateFunctionSequenceMatch(const DataTypes & arguments, const Array & params, const String & pattern)
: AggregateFunctionSequenceBase<AggregateFunctionSequenceMatch>(arguments, params, pattern) {}
: AggregateFunctionSequenceBase<T, Data, AggregateFunctionSequenceMatch<T, Data>>(arguments, params, pattern) {}
using AggregateFunctionSequenceBase<AggregateFunctionSequenceMatch>::AggregateFunctionSequenceBase;
using AggregateFunctionSequenceBase<T, Data, AggregateFunctionSequenceMatch<T, Data>>::AggregateFunctionSequenceBase;
String getName() const override { return "sequenceMatch"; }
@ -590,27 +567,27 @@ public:
void insertResultInto(ConstAggregateDataPtr place, IColumn & to) const override
{
const_cast<Data &>(data(place)).sort();
const_cast<Data &>(this->data(place)).sort();
const auto & data_ref = data(place);
const auto & data_ref = this->data(place);
const auto events_begin = std::begin(data_ref.events_list);
const auto events_end = std::end(data_ref.events_list);
auto events_it = events_begin;
bool match = pattern_has_time ? backtrackingMatch(events_it, events_end) : dfaMatch(events_it, events_end);
bool match = this->pattern_has_time ? this->backtrackingMatch(events_it, events_end) : this->dfaMatch(events_it, events_end);
static_cast<ColumnUInt8 &>(to).getData().push_back(match);
}
};
class AggregateFunctionSequenceCount final : public AggregateFunctionSequenceBase<AggregateFunctionSequenceCount>
template <typename T, typename Data>
class AggregateFunctionSequenceCount final : public AggregateFunctionSequenceBase<T, Data, AggregateFunctionSequenceCount<T, Data>>
{
public:
AggregateFunctionSequenceCount(const DataTypes & arguments, const Array & params, const String & pattern)
: AggregateFunctionSequenceBase<AggregateFunctionSequenceCount>(arguments, params, pattern) {}
: AggregateFunctionSequenceBase<T, Data, AggregateFunctionSequenceCount<T, Data>>(arguments, params, pattern) {}
using AggregateFunctionSequenceBase<AggregateFunctionSequenceCount>::AggregateFunctionSequenceBase;
using AggregateFunctionSequenceBase<T, Data, AggregateFunctionSequenceCount<T, Data>>::AggregateFunctionSequenceBase;
String getName() const override { return "sequenceCount"; }
@ -618,21 +595,21 @@ public:
void insertResultInto(ConstAggregateDataPtr place, IColumn & to) const override
{
const_cast<Data &>(data(place)).sort();
const_cast<Data &>(this->data(place)).sort();
static_cast<ColumnUInt64 &>(to).getData().push_back(count(place));
}
private:
UInt64 count(const ConstAggregateDataPtr & place) const
{
const auto & data_ref = data(place);
const auto & data_ref = this->data(place);
const auto events_begin = std::begin(data_ref.events_list);
const auto events_end = std::end(data_ref.events_list);
auto events_it = events_begin;
size_t count = 0;
while (events_it != events_end && backtrackingMatch(events_it, events_end))
while (events_it != events_end && this->backtrackingMatch(events_it, events_end))
++count;
return count;

View File

@ -1,8 +1,9 @@
#include <AggregateFunctions/AggregateFunctionLeastSqr.h>
#include <AggregateFunctions/AggregateFunctionSimpleLinearRegression.h>
#include <AggregateFunctions/AggregateFunctionFactory.h>
#include <AggregateFunctions/FactoryHelpers.h>
#include <Core/TypeListNumber.h>
namespace DB
{
@ -10,7 +11,7 @@ namespace DB
namespace
{
AggregateFunctionPtr createAggregateFunctionLeastSqr(
AggregateFunctionPtr createAggregateFunctionSimpleLinearRegression(
const String & name,
const DataTypes & arguments,
const Array & params
@ -20,16 +21,11 @@ AggregateFunctionPtr createAggregateFunctionLeastSqr(
assertBinary(name, arguments);
const IDataType * x_arg = arguments.front().get();
WhichDataType which_x {
x_arg
};
WhichDataType which_x = x_arg;
const IDataType * y_arg = arguments.back().get();
WhichDataType which_y = y_arg;
WhichDataType which_y {
y_arg
};
#define FOR_LEASTSQR_TYPES_2(M, T) \
M(T, UInt8) \
@ -55,7 +51,7 @@ AggregateFunctionPtr createAggregateFunctionLeastSqr(
FOR_LEASTSQR_TYPES_2(M, Float64)
#define DISPATCH(T1, T2) \
if (which_x.idx == TypeIndex::T1 && which_y.idx == TypeIndex::T2) \
return std::make_shared<AggregateFunctionLeastSqr<T1, T2>>( \
return std::make_shared<AggregateFunctionSimpleLinearRegression<T1, T2>>( \
arguments, \
params \
);
@ -77,9 +73,9 @@ AggregateFunctionPtr createAggregateFunctionLeastSqr(
}
void registerAggregateFunctionLeastSqr(AggregateFunctionFactory & factory)
void registerAggregateFunctionSimpleLinearRegression(AggregateFunctionFactory & factory)
{
factory.registerFunction("leastSqr", createAggregateFunctionLeastSqr);
factory.registerFunction("simpleLinearRegression", createAggregateFunctionSimpleLinearRegression);
}
}

View File

@ -19,7 +19,7 @@ namespace ErrorCodes
}
template <typename X, typename Y, typename Ret>
struct AggregateFunctionLeastSqrData final
struct AggregateFunctionSimpleLinearRegressionData final
{
size_t count = 0;
Ret sum_x = 0;
@ -36,7 +36,7 @@ struct AggregateFunctionLeastSqrData final
sum_xy += x * y;
}
void merge(const AggregateFunctionLeastSqrData & other)
void merge(const AggregateFunctionSimpleLinearRegressionData & other)
{
count += other.count;
sum_x += other.sum_x;
@ -85,19 +85,19 @@ struct AggregateFunctionLeastSqrData final
/// Calculates simple linear regression parameters.
/// The result is a tuple (k, b) for the equation y = k * x + b, solved by least-squares approximation.
template <typename X, typename Y, typename Ret = Float64>
class AggregateFunctionLeastSqr final : public IAggregateFunctionDataHelper<
AggregateFunctionLeastSqrData<X, Y, Ret>,
AggregateFunctionLeastSqr<X, Y, Ret>
class AggregateFunctionSimpleLinearRegression final : public IAggregateFunctionDataHelper<
AggregateFunctionSimpleLinearRegressionData<X, Y, Ret>,
AggregateFunctionSimpleLinearRegression<X, Y, Ret>
>
{
public:
AggregateFunctionLeastSqr(
AggregateFunctionSimpleLinearRegression(
const DataTypes & arguments,
const Array & params
):
IAggregateFunctionDataHelper<
AggregateFunctionLeastSqrData<X, Y, Ret>,
AggregateFunctionLeastSqr<X, Y, Ret>
AggregateFunctionSimpleLinearRegressionData<X, Y, Ret>,
AggregateFunctionSimpleLinearRegression<X, Y, Ret>
> {arguments, params}
{
// notice: arguments has been checked before
@ -105,7 +105,7 @@ public:
String getName() const override
{
return "leastSqr";
return "simpleLinearRegression";
}
const char * getHeaderFilePath() const override
@ -120,12 +120,8 @@ public:
Arena *
) const override
{
auto col_x {
static_cast<const ColumnVector<X> *>(columns[0])
};
auto col_y {
static_cast<const ColumnVector<Y> *>(columns[1])
};
auto col_x = static_cast<const ColumnVector<X> *>(columns[0]);
auto col_y = static_cast<const ColumnVector<Y> *>(columns[1]);
X x = col_x->getData()[row_num];
Y y = col_y->getData()[row_num];
@ -159,12 +155,14 @@ public:
DataTypePtr getReturnType() const override
{
DataTypes types {
DataTypes types
{
std::make_shared<DataTypeNumber<Ret>>(),
std::make_shared<DataTypeNumber<Ret>>(),
};
Strings names {
Strings names
{
"k",
"b",
};
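The closed-form solution behind these accumulated sums, as a self-contained sketch (the member names sum_y / sum_xx are assumptions for the parts of the Data struct not shown in this excerpt):

#include <initializer_list>
#include <iostream>

struct LeastSquares
{
    double n = 0, sum_x = 0, sum_y = 0, sum_xx = 0, sum_xy = 0;
    void add(double x, double y) { ++n; sum_x += x; sum_y += y; sum_xx += x * x; sum_xy += x * y; }
    double k() const { return (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x * sum_x); }
    double b() const { return (sum_y - k() * sum_x) / n; }
};

int main()
{
    LeastSquares ls;
    for (double x : {0.0, 1.0, 2.0, 3.0})
        ls.add(x, 2 * x + 1); /// points on the exact line y = 2x + 1
    std::cout << ls.k() << ' ' << ls.b() << '\n'; /// prints 2 1
    return 0;
}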

View File

@ -56,6 +56,10 @@ void registerAggregateFunctionsStatisticsSimple(AggregateFunctionFactory & facto
factory.registerFunction("varPop", createAggregateFunctionStatisticsUnary<AggregateFunctionVarPopSimple>);
factory.registerFunction("stddevSamp", createAggregateFunctionStatisticsUnary<AggregateFunctionStddevSampSimple>);
factory.registerFunction("stddevPop", createAggregateFunctionStatisticsUnary<AggregateFunctionStddevPopSimple>);
factory.registerFunction("skewSamp", createAggregateFunctionStatisticsUnary<AggregateFunctionSkewSampSimple>);
factory.registerFunction("skewPop", createAggregateFunctionStatisticsUnary<AggregateFunctionSkewPopSimple>);
factory.registerFunction("kurtSamp", createAggregateFunctionStatisticsUnary<AggregateFunctionKurtSampSimple>);
factory.registerFunction("kurtPop", createAggregateFunctionStatisticsUnary<AggregateFunctionKurtPopSimple>);
factory.registerFunction("covarSamp", createAggregateFunctionStatisticsBinary<AggregateFunctionCovarSampSimple>);
factory.registerFunction("covarPop", createAggregateFunctionStatisticsBinary<AggregateFunctionCovarPopSimple>);

View File

@ -32,30 +32,42 @@ namespace DB
namespace ErrorCodes
{
extern const int LOGICAL_ERROR;
extern const int DECIMAL_OVERFLOW;
}
template <typename T>
/**
Calculating univariate central moments
Levels:
level 2 (pop & samp): var, stddev
level 3: skewness
level 4: kurtosis
References:
https://en.wikipedia.org/wiki/Moment_(mathematics)
https://en.wikipedia.org/wiki/Skewness
https://en.wikipedia.org/wiki/Kurtosis
*/
template <typename T, size_t _level>
struct VarMoments
{
T m0{};
T m1{};
T m2{};
T m[_level + 1]{};
void add(T x)
{
++m0;
m1 += x;
m2 += x * x;
++m[0];
m[1] += x;
m[2] += x * x;
if constexpr (_level >= 3) m[3] += x * x * x;
if constexpr (_level >= 4) m[4] += x * x * x * x;
}
void merge(const VarMoments & rhs)
{
m0 += rhs.m0;
m1 += rhs.m1;
m2 += rhs.m2;
m[0] += rhs.m[0];
m[1] += rhs.m[1];
m[2] += rhs.m[2];
if constexpr (_level >= 3) m[3] += rhs.m[3];
if constexpr (_level >= 4) m[4] += rhs.m[4];
}
void write(WriteBuffer & buf) const
@ -70,45 +82,90 @@ struct VarMoments
T NO_SANITIZE_UNDEFINED getPopulation() const
{
return (m2 - m1 * m1 / m0) / m0;
return (m[2] - m[1] * m[1] / m[0]) / m[0];
}
T NO_SANITIZE_UNDEFINED getSample() const
{
if (m0 == 0)
if (m[0] == 0)
return std::numeric_limits<T>::quiet_NaN();
return (m2 - m1 * m1 / m0) / (m0 - 1);
return (m[2] - m[1] * m[1] / m[0]) / (m[0] - 1);
}
T get() const { throw Exception("Unexpected call", ErrorCodes::LOGICAL_ERROR); }
T NO_SANITIZE_UNDEFINED getMoment3() const
{
// with a single sample the central moment is exactly zero; return it directly to avoid rounding error
if (m[0] == 1)
return 0;
return (m[3]
- (3 * m[2]
- 2 * m[1] * m[1] / m[0]
) * m[1] / m[0]
) / m[0];
}
T NO_SANITIZE_UNDEFINED getMoment4() const
{
// with a single sample the central moment is exactly zero; return it directly to avoid rounding error
if (m[0] == 1)
return 0;
return (m[4]
- (4 * m[3]
- (6 * m[2]
- 3 * m[1] * m[1] / m[0]
) * m[1] / m[0]
) * m[1] / m[0]
) / m[0];
}
};
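The nested expressions in getMoment3/getMoment4 above are the raw-to-central moment identities, e.g. E[(x - mean)^3] = (m3 - 3 m2 m1 / m0 + 2 m1^3 / m0^2) / m0, factored so that division by m0 happens as late as possible. A quick numerical check (sketch):

#include <cmath>
#include <iostream>
#include <vector>

int main()
{
    std::vector<double> xs = {1, 2, 2, 3, 7};
    double m0 = 0, m1 = 0, m2 = 0, m3 = 0;
    for (double x : xs) { ++m0; m1 += x; m2 += x * x; m3 += x * x * x; }

    double nested = (m3 - (3 * m2 - 2 * m1 * m1 / m0) * m1 / m0) / m0;

    double mean = m1 / m0, direct = 0;
    for (double x : xs) direct += std::pow(x - mean, 3);
    direct /= m0;

    std::cout << nested << ' ' << direct << '\n'; /// both print the same third central moment
    return 0;
}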
template <typename T>
template <typename T, size_t _level>
struct VarMomentsDecimal
{
using NativeType = typename T::NativeType;
UInt64 m0{};
NativeType m1{};
NativeType m2{};
NativeType m[_level]{};
NativeType & getM(size_t i)
{
return m[i - 1];
}
const NativeType & getM(size_t i) const
{
return m[i - 1];
}
void add(NativeType x)
{
++m0;
m1 += x;
getM(1) += x;
NativeType tmp; /// scale' = 2 * scale
if (common::mulOverflow(x, x, tmp) || common::addOverflow(m2, tmp, m2))
NativeType tmp;
if (common::mulOverflow(x, x, tmp) || common::addOverflow(getM(2), tmp, getM(2)))
throw Exception("Decimal math overflow", ErrorCodes::DECIMAL_OVERFLOW);
if constexpr (_level >= 3)
if (common::mulOverflow(tmp, x, tmp) || common::addOverflow(getM(3), tmp, getM(3)))
throw Exception("Decimal math overflow", ErrorCodes::DECIMAL_OVERFLOW);
if constexpr (_level >= 4)
if (common::mulOverflow(tmp, x, tmp) || common::addOverflow(getM(4), tmp, getM(4)))
throw Exception("Decimal math overflow", ErrorCodes::DECIMAL_OVERFLOW);
}
void merge(const VarMomentsDecimal & rhs)
{
m0 += rhs.m0;
m1 += rhs.m1;
getM(1) += rhs.getM(1);
if (common::addOverflow(m2, rhs.m2, m2))
if (common::addOverflow(getM(2), rhs.getM(2), getM(2)))
throw Exception("Decimal math overflow", ErrorCodes::DECIMAL_OVERFLOW);
if constexpr (_level >= 3)
if (common::addOverflow(getM(3), rhs.getM(3), getM(3)))
throw Exception("Decimal math overflow", ErrorCodes::DECIMAL_OVERFLOW);
if constexpr (_level >= 4)
if (common::addOverflow(getM(4), rhs.getM(4), getM(4)))
throw Exception("Decimal math overflow", ErrorCodes::DECIMAL_OVERFLOW);
}
void write(WriteBuffer & buf) const { writePODBinary(*this, buf); }
@ -120,8 +177,8 @@ struct VarMomentsDecimal
return std::numeric_limits<Float64>::infinity();
NativeType tmp;
if (common::mulOverflow(m1, m1, tmp) ||
common::subOverflow(m2, NativeType(tmp / m0), tmp))
if (common::mulOverflow(getM(1), getM(1), tmp) ||
common::subOverflow(getM(2), NativeType(tmp / m0), tmp))
throw Exception("Decimal math overflow", ErrorCodes::DECIMAL_OVERFLOW);
return convertFromDecimal<DataTypeDecimal<T>, DataTypeNumber<Float64>>(tmp / m0, scale);
}
@ -134,15 +191,50 @@ struct VarMomentsDecimal
return std::numeric_limits<Float64>::infinity();
NativeType tmp;
if (common::mulOverflow(m1, m1, tmp) ||
common::subOverflow(m2, NativeType(tmp / m0), tmp))
if (common::mulOverflow(getM(1), getM(1), tmp) ||
common::subOverflow(getM(2), NativeType(tmp / m0), tmp))
throw Exception("Decimal math overflow", ErrorCodes::DECIMAL_OVERFLOW);
return convertFromDecimal<DataTypeDecimal<T>, DataTypeNumber<Float64>>(tmp / (m0 - 1), scale);
}
Float64 get() const { throw Exception("Unexpected call", ErrorCodes::LOGICAL_ERROR); }
Float64 getMoment3(UInt32 scale) const
{
if (m0 == 0)
return std::numeric_limits<Float64>::infinity();
NativeType tmp;
if (common::mulOverflow(2 * getM(1), getM(1), tmp) ||
common::subOverflow(3 * getM(2), NativeType(tmp / m0), tmp) ||
common::mulOverflow(tmp, getM(1), tmp) ||
common::subOverflow(getM(3), NativeType(tmp / m0), tmp))
throw Exception("Decimal math overflow", ErrorCodes::DECIMAL_OVERFLOW);
return convertFromDecimal<DataTypeDecimal<T>, DataTypeNumber<Float64>>(tmp / m0, scale);
}
Float64 getMoment4(UInt32 scale) const
{
if (m0 == 0)
return std::numeric_limits<Float64>::infinity();
NativeType tmp;
if (common::mulOverflow(3 * getM(1), getM(1), tmp) ||
common::subOverflow(6 * getM(2), NativeType(tmp / m0), tmp) ||
common::mulOverflow(tmp, getM(1), tmp) ||
common::subOverflow(4 * getM(3), NativeType(tmp / m0), tmp) ||
common::mulOverflow(tmp, getM(1), tmp) ||
common::subOverflow(getM(4), NativeType(tmp / m0), tmp))
throw Exception("Decimal math overflow", ErrorCodes::DECIMAL_OVERFLOW);
return convertFromDecimal<DataTypeDecimal<T>, DataTypeNumber<Float64>>(tmp / m0, scale);
}
};
/**
Calculating multivariate central moments
Levels:
level 2 (pop & samp): covar
References:
https://en.wikipedia.org/wiki/Moment_(mathematics)
*/
template <typename T>
struct CovarMoments
{
@ -188,8 +280,6 @@ struct CovarMoments
return std::numeric_limits<T>::quiet_NaN();
return (xy - x1 * y1 / m0) / (m0 - 1);
}
T get() const { throw Exception("Unexpected call", ErrorCodes::LOGICAL_ERROR); }
};
template <typename T>
@ -236,9 +326,6 @@ struct CorrMoments
{
return (m0 * xy - x1 * y1) / sqrt((m0 * x2 - x1 * x1) * (m0 * y2 - y1 * y1));
}
T getPopulation() const { throw Exception("Unexpected call", ErrorCodes::LOGICAL_ERROR); }
T getSample() const { throw Exception("Unexpected call", ErrorCodes::LOGICAL_ERROR); }
};
@ -246,18 +333,20 @@ enum class StatisticsFunctionKind
{
varPop, varSamp,
stddevPop, stddevSamp,
skewPop, skewSamp,
kurtPop, kurtSamp,
covarPop, covarSamp,
corr
};
template <typename T, StatisticsFunctionKind _kind>
template <typename T, StatisticsFunctionKind _kind, size_t _level>
struct StatFuncOneArg
{
using Type1 = T;
using Type2 = T;
using ResultType = std::conditional_t<std::is_same_v<T, Float32>, Float32, Float64>;
using Data = std::conditional_t<IsDecimalNumber<T>, VarMomentsDecimal<Decimal128>, VarMoments<ResultType>>;
using Data = std::conditional_t<IsDecimalNumber<T>, VarMomentsDecimal<Decimal128, _level>, VarMoments<ResultType, _level>>;
static constexpr StatisticsFunctionKind kind = _kind;
static constexpr UInt32 num_args = 1;
@ -300,16 +389,28 @@ public:
String getName() const override
{
switch (StatFunc::kind)
{
case StatisticsFunctionKind::varPop: return "varPop";
case StatisticsFunctionKind::varSamp: return "varSamp";
case StatisticsFunctionKind::stddevPop: return "stddevPop";
case StatisticsFunctionKind::stddevSamp: return "stddevSamp";
case StatisticsFunctionKind::covarPop: return "covarPop";
case StatisticsFunctionKind::covarSamp: return "covarSamp";
case StatisticsFunctionKind::corr: return "corr";
}
if constexpr (StatFunc::kind == StatisticsFunctionKind::varPop)
return "varPop";
if constexpr (StatFunc::kind == StatisticsFunctionKind::varSamp)
return "varSamp";
if constexpr (StatFunc::kind == StatisticsFunctionKind::stddevPop)
return "stddevPop";
if constexpr (StatFunc::kind == StatisticsFunctionKind::stddevSamp)
return "stddevSamp";
if constexpr (StatFunc::kind == StatisticsFunctionKind::skewPop)
return "skewPop";
if constexpr (StatFunc::kind == StatisticsFunctionKind::skewSamp)
return "skewSamp";
if constexpr (StatFunc::kind == StatisticsFunctionKind::kurtPop)
return "kurtPop";
if constexpr (StatFunc::kind == StatisticsFunctionKind::kurtSamp)
return "kurtSamp";
if constexpr (StatFunc::kind == StatisticsFunctionKind::covarPop)
return "covarPop";
if constexpr (StatFunc::kind == StatisticsFunctionKind::covarSamp)
return "covarSamp";
if constexpr (StatFunc::kind == StatisticsFunctionKind::corr)
return "corr";
__builtin_unreachable();
}
@ -351,28 +452,103 @@ public:
if constexpr (IsDecimalNumber<T1>)
{
switch (StatFunc::kind)
if constexpr (StatFunc::kind == StatisticsFunctionKind::varPop)
dst.push_back(data.getPopulation(src_scale * 2));
if constexpr (StatFunc::kind == StatisticsFunctionKind::varSamp)
dst.push_back(data.getSample(src_scale * 2));
if constexpr (StatFunc::kind == StatisticsFunctionKind::stddevPop)
dst.push_back(sqrt(data.getPopulation(src_scale * 2)));
if constexpr (StatFunc::kind == StatisticsFunctionKind::stddevSamp)
dst.push_back(sqrt(data.getSample(src_scale * 2)));
if constexpr (StatFunc::kind == StatisticsFunctionKind::skewPop)
{
case StatisticsFunctionKind::varPop: dst.push_back(data.getPopulation(src_scale * 2)); break;
case StatisticsFunctionKind::varSamp: dst.push_back(data.getSample(src_scale * 2)); break;
case StatisticsFunctionKind::stddevPop: dst.push_back(sqrt(data.getPopulation(src_scale * 2))); break;
case StatisticsFunctionKind::stddevSamp: dst.push_back(sqrt(data.getSample(src_scale * 2))); break;
default:
__builtin_unreachable();
Float64 var_value = data.getPopulation(src_scale * 2);
if (var_value > 0)
dst.push_back(data.getMoment3(src_scale * 3) / pow(var_value, 1.5));
else
dst.push_back(std::numeric_limits<Float64>::quiet_NaN());
}
if constexpr (StatFunc::kind == StatisticsFunctionKind::skewSamp)
{
Float64 var_value = data.getSample(src_scale * 2);
if (var_value > 0)
dst.push_back(data.getMoment3(src_scale * 3) / pow(var_value, 1.5));
else
dst.push_back(std::numeric_limits<Float64>::quiet_NaN());
}
if constexpr (StatFunc::kind == StatisticsFunctionKind::kurtPop)
{
Float64 var_value = data.getPopulation(src_scale * 2);
if (var_value > 0)
dst.push_back(data.getMoment4(src_scale * 4) / pow(var_value, 2));
else
dst.push_back(std::numeric_limits<Float64>::quiet_NaN());
}
if constexpr (StatFunc::kind == StatisticsFunctionKind::kurtSamp)
{
Float64 var_value = data.getSample(src_scale * 2);
if (var_value > 0)
dst.push_back(data.getMoment4(src_scale * 4) / pow(var_value, 2));
else
dst.push_back(std::numeric_limits<Float64>::quiet_NaN());
}
}
else
{
switch (StatFunc::kind)
if constexpr (StatFunc::kind == StatisticsFunctionKind::varPop)
dst.push_back(data.getPopulation());
if constexpr (StatFunc::kind == StatisticsFunctionKind::varSamp)
dst.push_back(data.getSample());
if constexpr (StatFunc::kind == StatisticsFunctionKind::stddevPop)
dst.push_back(sqrt(data.getPopulation()));
if constexpr (StatFunc::kind == StatisticsFunctionKind::stddevSamp)
dst.push_back(sqrt(data.getSample()));
if constexpr (StatFunc::kind == StatisticsFunctionKind::skewPop)
{
case StatisticsFunctionKind::varPop: dst.push_back(data.getPopulation()); break;
case StatisticsFunctionKind::varSamp: dst.push_back(data.getSample()); break;
case StatisticsFunctionKind::stddevPop: dst.push_back(sqrt(data.getPopulation())); break;
case StatisticsFunctionKind::stddevSamp: dst.push_back(sqrt(data.getSample())); break;
case StatisticsFunctionKind::covarPop: dst.push_back(data.getPopulation()); break;
case StatisticsFunctionKind::covarSamp: dst.push_back(data.getSample()); break;
case StatisticsFunctionKind::corr: dst.push_back(data.get()); break;
ResultType var_value = data.getPopulation();
if (var_value > 0)
dst.push_back(data.getMoment3() / pow(var_value, 1.5));
else
dst.push_back(std::numeric_limits<ResultType>::quiet_NaN());
}
if constexpr (StatFunc::kind == StatisticsFunctionKind::skewSamp)
{
ResultType var_value = data.getSample();
if (var_value > 0)
dst.push_back(data.getMoment3() / pow(var_value, 1.5));
else
dst.push_back(std::numeric_limits<ResultType>::quiet_NaN());
}
if constexpr (StatFunc::kind == StatisticsFunctionKind::kurtPop)
{
ResultType var_value = data.getPopulation();
if (var_value > 0)
dst.push_back(data.getMoment4() / pow(var_value, 2));
else
dst.push_back(std::numeric_limits<ResultType>::quiet_NaN());
}
if constexpr (StatFunc::kind == StatisticsFunctionKind::kurtSamp)
{
ResultType var_value = data.getSample();
if (var_value > 0)
dst.push_back(data.getMoment4() / pow(var_value, 2));
else
dst.push_back(std::numeric_limits<ResultType>::quiet_NaN());
}
if constexpr (StatFunc::kind == StatisticsFunctionKind::covarPop)
dst.push_back(data.getPopulation());
if constexpr (StatFunc::kind == StatisticsFunctionKind::covarSamp)
dst.push_back(data.getSample());
if constexpr (StatFunc::kind == StatisticsFunctionKind::corr)
dst.push_back(data.get());
}
}
@ -383,10 +559,14 @@ private:
};
template <typename T> using AggregateFunctionVarPopSimple = AggregateFunctionVarianceSimple<StatFuncOneArg<T, StatisticsFunctionKind::varPop>>;
template <typename T> using AggregateFunctionVarSampSimple = AggregateFunctionVarianceSimple<StatFuncOneArg<T, StatisticsFunctionKind::varSamp>>;
template <typename T> using AggregateFunctionStddevPopSimple = AggregateFunctionVarianceSimple<StatFuncOneArg<T, StatisticsFunctionKind::stddevPop>>;
template <typename T> using AggregateFunctionStddevSampSimple = AggregateFunctionVarianceSimple<StatFuncOneArg<T, StatisticsFunctionKind::stddevSamp>>;
template <typename T> using AggregateFunctionVarPopSimple = AggregateFunctionVarianceSimple<StatFuncOneArg<T, StatisticsFunctionKind::varPop, 2>>;
template <typename T> using AggregateFunctionVarSampSimple = AggregateFunctionVarianceSimple<StatFuncOneArg<T, StatisticsFunctionKind::varSamp, 2>>;
template <typename T> using AggregateFunctionStddevPopSimple = AggregateFunctionVarianceSimple<StatFuncOneArg<T, StatisticsFunctionKind::stddevPop, 2>>;
template <typename T> using AggregateFunctionStddevSampSimple = AggregateFunctionVarianceSimple<StatFuncOneArg<T, StatisticsFunctionKind::stddevSamp, 2>>;
template <typename T> using AggregateFunctionSkewPopSimple = AggregateFunctionVarianceSimple<StatFuncOneArg<T, StatisticsFunctionKind::skewPop, 3>>;
template <typename T> using AggregateFunctionSkewSampSimple = AggregateFunctionVarianceSimple<StatFuncOneArg<T, StatisticsFunctionKind::skewSamp, 3>>;
template <typename T> using AggregateFunctionKurtPopSimple = AggregateFunctionVarianceSimple<StatFuncOneArg<T, StatisticsFunctionKind::kurtPop, 4>>;
template <typename T> using AggregateFunctionKurtSampSimple = AggregateFunctionVarianceSimple<StatFuncOneArg<T, StatisticsFunctionKind::kurtSamp, 4>>;
template <typename T1, typename T2> using AggregateFunctionCovarPopSimple = AggregateFunctionVarianceSimple<StatFuncTwoArg<T1, T2, StatisticsFunctionKind::covarPop>>;
template <typename T1, typename T2> using AggregateFunctionCovarSampSimple = AggregateFunctionVarianceSimple<StatFuncTwoArg<T1, T2, StatisticsFunctionKind::covarSamp>>;
template <typename T1, typename T2> using AggregateFunctionCorrSimple = AggregateFunctionVarianceSimple<StatFuncTwoArg<T1, T2, StatisticsFunctionKind::corr>>;
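The dispatch above implements the textbook definitions: skewness = mu3 / var^1.5 and kurtosis = mu4 / var^2, computed from the central moments of the matching _level. A standalone check (sketch):

#include <cmath>
#include <iostream>
#include <vector>

int main()
{
    std::vector<double> xs = {1, 2, 2, 3, 7};
    double n = xs.size(), mean = 0;
    for (double x : xs) mean += x;
    mean /= n;

    double var = 0, mu3 = 0, mu4 = 0;
    for (double x : xs)
    {
        double d = x - mean;
        var += d * d; mu3 += d * d * d; mu4 += d * d * d * d;
    }
    var /= n; mu3 /= n; mu4 /= n;

    std::cout << "skewPop = " << mu3 / std::pow(var, 1.5) << '\n';
    std::cout << "kurtPop = " << mu4 / (var * var) << '\n';
    return 0;
}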

Some files were not shown because too many files have changed in this diff.