Merged with master.

Nikolai Kochetov 2019-02-01 11:36:57 +03:00
commit 6a729e59ba
118 changed files with 4005 additions and 2016 deletions



@@ -1,139 +1,113 @@
## ClickHouse release 19.1.6, 2019-01-24

### New Features

* Custom per column compression codecs for tables. [#3899](https://github.com/yandex/ClickHouse/pull/3899) [#4111](https://github.com/yandex/ClickHouse/pull/4111) ([alesapin](https://github.com/alesapin), [Winter Zhang](https://github.com/zhang2014), [Anatoly](https://github.com/Sindbag))
* Added compression codec `Delta`. [#4052](https://github.com/yandex/ClickHouse/pull/4052) ([alesapin](https://github.com/alesapin))
* Allow to `ALTER` compression codecs. [#4054](https://github.com/yandex/ClickHouse/pull/4054) ([alesapin](https://github.com/alesapin))
* Added functions `left`, `right`, `trim`, `ltrim`, `rtrim`, `timestampadd`, `timestampsub` for SQL standard compatibility. [#3826](https://github.com/yandex/ClickHouse/pull/3826) ([Ivan Blinkov](https://github.com/blinkov))
* Support for write in `HDFS` tables and the `hdfs` table function. [#4084](https://github.com/yandex/ClickHouse/pull/4084) ([alesapin](https://github.com/alesapin))
* Added functions to search for multiple constant strings from a big haystack: `multiPosition`, `multiSearch`, `firstMatch`, also with `-UTF8`, `-CaseInsensitive`, and `-CaseInsensitiveUTF8` variants. [#4053](https://github.com/yandex/ClickHouse/pull/4053) ([Danila Kutenin](https://github.com/danlark1))
* Pruning of unused shards if the `SELECT` query filters by sharding key (setting `distributed_optimize_skip_select_on_unused_shards`). [#3851](https://github.com/yandex/ClickHouse/pull/3851) ([Ivan](https://github.com/abyss7))
* Allow the `Kafka` engine to ignore some number of parsing errors per block. [#4094](https://github.com/yandex/ClickHouse/pull/4094) ([Ivan](https://github.com/abyss7))
* Added support for `CatBoost` multiclass models evaluation. Function `modelEvaluate` returns a tuple with per-class raw predictions for multiclass models. `libcatboostmodel.so` should be built with [#607](https://github.com/catboost/catboost/pull/607). [#3959](https://github.com/yandex/ClickHouse/pull/3959) ([KochetovNicolai](https://github.com/KochetovNicolai))
* Added functions `filesystemAvailable`, `filesystemFree`, `filesystemCapacity`. [#4097](https://github.com/yandex/ClickHouse/pull/4097) ([Boris Granveaud](https://github.com/bgranvea))
* Added hashing functions `xxHash64` and `xxHash32`. [#3905](https://github.com/yandex/ClickHouse/pull/3905) ([filimonov](https://github.com/filimonov))
* Added `gccMurmurHash` hashing function (GCC flavoured Murmur hash) which uses the same hash seed as [gcc](https://github.com/gcc-mirror/gcc/blob/41d6b10e96a1de98e90a7c0378437c3255814b16/libstdc%2B%2B-v3/include/bits/functional_hash.h#L191). [#4000](https://github.com/yandex/ClickHouse/pull/4000) ([sundyli](https://github.com/sundy-li))
* Added hashing functions `javaHash`, `hiveHash`. [#3811](https://github.com/yandex/ClickHouse/pull/3811) ([shangshujie365](https://github.com/shangshujie365))
* Added table function `remoteSecure`. The function works like `remote`, but uses a secure connection. [#4088](https://github.com/yandex/ClickHouse/pull/4088) ([proller](https://github.com/proller))

### Experimental features

* Added multiple JOINs emulation (`allow_experimental_multiple_joins_emulation` setting). [#3946](https://github.com/yandex/ClickHouse/pull/3946) ([Artem Zuikov](https://github.com/4ertus2))

### Bug Fixes

* Made the `compiled_expression_cache_size` setting limited by default to lower memory consumption. [#4041](https://github.com/yandex/ClickHouse/pull/4041) ([alesapin](https://github.com/alesapin))
* Fixed a bug that led to hangups in threads that perform ALTERs of Replicated tables and in the thread that updates configuration from ZooKeeper. [#2947](https://github.com/yandex/ClickHouse/issues/2947) [#3891](https://github.com/yandex/ClickHouse/issues/3891) [#3934](https://github.com/yandex/ClickHouse/pull/3934) ([Alex Zatelepin](https://github.com/ztlpn))
* Fixed a race condition when executing a distributed ALTER task. The race condition led to more than one replica trying to execute the task and all replicas except one failing with a ZooKeeper error. [#3904](https://github.com/yandex/ClickHouse/pull/3904) ([Alex Zatelepin](https://github.com/ztlpn))
* Fixed a bug when `from_zk` config elements weren't refreshed after a request to ZooKeeper timed out. [#2947](https://github.com/yandex/ClickHouse/issues/2947) [#3947](https://github.com/yandex/ClickHouse/pull/3947) ([Alex Zatelepin](https://github.com/ztlpn))
* Fixed a bug with a wrong prefix for IPv4 subnet masks. [#3945](https://github.com/yandex/ClickHouse/pull/3945) ([alesapin](https://github.com/alesapin))
* Fixed crash (`std::terminate`) in rare cases when a new thread cannot be created due to exhausted resources. [#3956](https://github.com/yandex/ClickHouse/pull/3956) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed a bug in `remote` table function execution when wrong restrictions were used in `getStructureOfRemoteTable`. [#4009](https://github.com/yandex/ClickHouse/pull/4009) ([alesapin](https://github.com/alesapin))
* Fixed a leak of netlink sockets. They were placed in a pool where they were never deleted and new sockets were created at the start of a new thread when all current sockets were in use. [#4017](https://github.com/yandex/ClickHouse/pull/4017) ([Alex Zatelepin](https://github.com/ztlpn))
* Fixed a bug with closing the `/proc/self/fd` directory earlier than all fds were read from `/proc` after forking the `odbc-bridge` subprocess. [#4120](https://github.com/yandex/ClickHouse/pull/4120) ([alesapin](https://github.com/alesapin))
* Fixed String to UInt monotonic conversion in case of usage of String in the primary key. [#3870](https://github.com/yandex/ClickHouse/pull/3870) ([Winter Zhang](https://github.com/zhang2014))
* Fixed an error in the calculation of integer conversion function monotonicity. [#3921](https://github.com/yandex/ClickHouse/pull/3921) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed segfault in `arrayEnumerateUniq`, `arrayEnumerateDense` functions in case of some invalid arguments. [#3909](https://github.com/yandex/ClickHouse/pull/3909) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed UB in StorageMerge. [#3910](https://github.com/yandex/ClickHouse/pull/3910) ([Amos Bird](https://github.com/amosbird))
* Fixed segfault in functions `addDays`, `subtractDays`. [#3913](https://github.com/yandex/ClickHouse/pull/3913) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed error: functions `round`, `floor`, `trunc`, `ceil` may return a bogus result when executed on an integer argument and a large negative scale. [#3914](https://github.com/yandex/ClickHouse/pull/3914) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed a bug induced by 'kill query sync' which leads to a core dump. [#3916](https://github.com/yandex/ClickHouse/pull/3916) ([muVulDeePecker](https://github.com/fancyqlx))
* Fixed a bug with a long delay after an empty replication queue. [#3928](https://github.com/yandex/ClickHouse/pull/3928) [#3932](https://github.com/yandex/ClickHouse/pull/3932) ([alesapin](https://github.com/alesapin))
* Fixed excessive memory usage in case of inserting into a table with a `LowCardinality` primary key. [#3955](https://github.com/yandex/ClickHouse/pull/3955) ([KochetovNicolai](https://github.com/KochetovNicolai))
* Fixed `LowCardinality` serialization for the `Native` format in case of empty arrays. [#3907](https://github.com/yandex/ClickHouse/issues/3907) [#4011](https://github.com/yandex/ClickHouse/pull/4011) ([KochetovNicolai](https://github.com/KochetovNicolai))
* Fixed incorrect result while using distinct by a single LowCardinality numeric column. [#3895](https://github.com/yandex/ClickHouse/issues/3895) [#4012](https://github.com/yandex/ClickHouse/pull/4012) ([KochetovNicolai](https://github.com/KochetovNicolai))
* Fixed specialized aggregation with a LowCardinality key (in case the `compile` setting is enabled). [#3886](https://github.com/yandex/ClickHouse/pull/3886) ([KochetovNicolai](https://github.com/KochetovNicolai))
* Fixed user and password forwarding for replicated tables queries. [#3957](https://github.com/yandex/ClickHouse/pull/3957) ([alesapin](https://github.com/alesapin)) ([小路](https://github.com/nicelulu))
* Fixed a very rare race condition that can happen when listing tables in the Dictionary database while reloading dictionaries. [#3970](https://github.com/yandex/ClickHouse/pull/3970) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed incorrect result when HAVING was used with ROLLUP or CUBE. [#3756](https://github.com/yandex/ClickHouse/issues/3756) [#3837](https://github.com/yandex/ClickHouse/pull/3837) ([Sam Chou](https://github.com/reflection))
* Fixed column aliases for queries with `JOIN ON` syntax and distributed tables. [#3980](https://github.com/yandex/ClickHouse/pull/3980) ([Winter Zhang](https://github.com/zhang2014))
* Fixed an error in the internal implementation of `quantileTDigest` (found by Artem Vakhrushev). This error never happens in ClickHouse and was relevant only for those who use the ClickHouse codebase as a library directly. [#3935](https://github.com/yandex/ClickHouse/pull/3935) ([alexey-milovidov](https://github.com/alexey-milovidov))

### Improvements

* Support for `IF NOT EXISTS` in `ALTER TABLE ADD COLUMN` statements along with `IF EXISTS` in `DROP/MODIFY/CLEAR/COMMENT COLUMN`. [#3900](https://github.com/yandex/ClickHouse/pull/3900) ([Boris Granveaud](https://github.com/bgranvea))
* Function `parseDateTimeBestEffort`: support for formats `DD.MM.YYYY`, `DD.MM.YY`, `DD-MM-YYYY`, `DD-Mon-YYYY`, `DD/Month/YYYY` and similar. [#3922](https://github.com/yandex/ClickHouse/pull/3922) ([alexey-milovidov](https://github.com/alexey-milovidov))
* `CapnProtoInputStream` now supports jagged structures. [#4063](https://github.com/yandex/ClickHouse/pull/4063) ([Odin Hultgren Van Der Horst](https://github.com/Miniwoffer))
* Usability improvement: added a check that the server process is started from the data directory's owner. Do not allow starting the server from root if the data belongs to a non-root user. [#3785](https://github.com/yandex/ClickHouse/pull/3785) ([sergey-v-galtsev](https://github.com/sergey-v-galtsev))
* Better logic of checking required columns during analysis of queries with JOINs. [#3930](https://github.com/yandex/ClickHouse/pull/3930) ([Artem Zuikov](https://github.com/4ertus2))
* Decreased the number of connections in case of a large number of Distributed tables in a single server. [#3726](https://github.com/yandex/ClickHouse/pull/3726) ([Winter Zhang](https://github.com/zhang2014))
* Supported totals row for `WITH TOTALS` query for the ODBC driver. [#3836](https://github.com/yandex/ClickHouse/pull/3836) ([Maksim Koritckiy](https://github.com/nightweb))
* Allowed to use `Enum`s as integers inside the if function. [#3875](https://github.com/yandex/ClickHouse/pull/3875) ([Ivan](https://github.com/abyss7))
* Added `low_cardinality_allow_in_native_format` setting. If disabled, do not use the `LowCardinality` type in the `Native` format. [#3879](https://github.com/yandex/ClickHouse/pull/3879) ([KochetovNicolai](https://github.com/KochetovNicolai))
* Removed some redundant objects from the compiled expressions cache to lower memory usage. [#4042](https://github.com/yandex/ClickHouse/pull/4042) ([alesapin](https://github.com/alesapin))
* Added a check that the `SET send_logs_level = 'value'` query accepts an appropriate value. [#3873](https://github.com/yandex/ClickHouse/pull/3873) ([Sabyanin Maxim](https://github.com/s-mx))
* Fixed data type check in type conversion functions. [#3896](https://github.com/yandex/ClickHouse/pull/3896) ([Winter Zhang](https://github.com/zhang2014))

### Performance Improvements

* Added a MergeTree setting `use_minimalistic_part_header_in_zookeeper`. If enabled, Replicated tables will store compact part metadata in a single part znode. This can dramatically reduce ZooKeeper snapshot size (especially if the tables have a lot of columns). Note that after enabling this setting you will not be able to downgrade to a version that doesn't support it. [#3960](https://github.com/yandex/ClickHouse/pull/3960) ([Alex Zatelepin](https://github.com/ztlpn))
* Added a DFA-based implementation for functions `sequenceMatch` and `sequenceCount` in case the pattern doesn't contain time. [#4004](https://github.com/yandex/ClickHouse/pull/4004) ([Léo Ercolanelli](https://github.com/ercolanelli-leo))
* Performance improvement for integer numbers serialization. [#3968](https://github.com/yandex/ClickHouse/pull/3968) ([Amos Bird](https://github.com/amosbird))
* Zero left padding of PODArray so that the -1 element is always valid and zeroed. It's used for branchless calculation of offsets (see the sketch after this changelog). [#3920](https://github.com/yandex/ClickHouse/pull/3920) ([Amos Bird](https://github.com/amosbird))
* Reverted the `jemalloc` version which led to performance degradation. [#4018](https://github.com/yandex/ClickHouse/pull/4018) ([alexey-milovidov](https://github.com/alexey-milovidov))

### Backward Incompatible Changes

* Removed undocumented feature `ALTER MODIFY PRIMARY KEY` because it was superseded by the `ALTER MODIFY ORDER BY` command. [#3887](https://github.com/yandex/ClickHouse/pull/3887) ([Alex Zatelepin](https://github.com/ztlpn))
* Removed function `shardByHash`. [#3833](https://github.com/yandex/ClickHouse/pull/3833) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Forbid using scalar subqueries with a result of type `AggregateFunction`. [#3865](https://github.com/yandex/ClickHouse/pull/3865) ([Ivan](https://github.com/abyss7))

### Build/Testing/Packaging Improvements

* Added support for PowerPC (`ppc64le`) build. [#4132](https://github.com/yandex/ClickHouse/pull/4132) ([Danila Kutenin](https://github.com/danlark1))
* Stateful functional tests are run on a publicly available dataset. [#3969](https://github.com/yandex/ClickHouse/pull/3969) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed an error when the server cannot start with the `bash: /usr/bin/clickhouse-extract-from-config: Operation not permitted` message within Docker or systemd-nspawn. [#4136](https://github.com/yandex/ClickHouse/pull/4136) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Updated the `rdkafka` library to v1.0.0-RC5. Used cppkafka instead of the raw C interface. [#4025](https://github.com/yandex/ClickHouse/pull/4025) ([Ivan](https://github.com/abyss7))
* Updated the `mariadb-client` library. Fixed one of the issues found by UBSan. [#3924](https://github.com/yandex/ClickHouse/pull/3924) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Some fixes for UBSan builds. [#3926](https://github.com/yandex/ClickHouse/pull/3926) [#3021](https://github.com/yandex/ClickHouse/pull/3021) [#3948](https://github.com/yandex/ClickHouse/pull/3948) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Added per-commit runs of tests with UBSan build.
* Added per-commit runs of PVS-Studio static analyzer.
* Fixed bugs found by PVS-Studio. [#4013](https://github.com/yandex/ClickHouse/pull/4013) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed glibc compatibility issues. [#4100](https://github.com/yandex/ClickHouse/pull/4100) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Moved Docker images to 18.10 and added a compatibility file for glibc >= 2.28. [#3965](https://github.com/yandex/ClickHouse/pull/3965) ([alesapin](https://github.com/alesapin))
* Added an env variable for users who don't want to chown directories in the server Docker image. [#3967](https://github.com/yandex/ClickHouse/pull/3967) ([alesapin](https://github.com/alesapin))
* Enabled most of the warnings from `-Weverything` in clang. Enabled `-Wpedantic`. [#3986](https://github.com/yandex/ClickHouse/pull/3986) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Added a few more warnings that are available only in clang 8. [#3993](https://github.com/yandex/ClickHouse/pull/3993) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Link to `libLLVM` rather than to individual LLVM libs when using shared linking. [#3989](https://github.com/yandex/ClickHouse/pull/3989) ([Orivej Desh](https://github.com/orivej))
* Added sanitizer variables for test images. [#4072](https://github.com/yandex/ClickHouse/pull/4072) ([alesapin](https://github.com/alesapin))
* The `clickhouse-server` debian package will recommend the `libcap2-bin` package to use the `setcap` tool for setting capabilities. This is optional. [#4093](https://github.com/yandex/ClickHouse/pull/4093) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Improved compilation time, fixed includes. [#3898](https://github.com/yandex/ClickHouse/pull/3898) ([proller](https://github.com/proller))
* Added performance tests for hash functions. [#3918](https://github.com/yandex/ClickHouse/pull/3918) ([filimonov](https://github.com/filimonov))
* Fixed cyclic library dependencies. [#3958](https://github.com/yandex/ClickHouse/pull/3958) ([proller](https://github.com/proller))
* Improved compilation with low available memory. [#4030](https://github.com/yandex/ClickHouse/pull/4030) ([proller](https://github.com/proller))
* Added a test script to reproduce performance degradation in `jemalloc`. [#4036](https://github.com/yandex/ClickHouse/pull/4036) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed misspellings in comments and string literals under `dbms`. [#4122](https://github.com/yandex/ClickHouse/pull/4122) ([maiha](https://github.com/maiha))
* Fixed typos in comments. [#4089](https://github.com/yandex/ClickHouse/pull/4089) ([Evgenii Pravda](https://github.com/kvinty))

## ClickHouse release 18.16.1, 2018-12-21
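The PODArray entry under Performance Improvements ([#3920](https://github.com/yandex/ClickHouse/pull/3920)) relies on left-padding an offsets array so that index -1 is always a readable zero. The following is a minimal standalone C++ sketch of that idea only, not the actual `PODArray` implementation; the `std::vector`-based storage and the sample values are illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

int main()
{
    // One extra zeroed slot in front of the offsets, so offsets[-1] is a valid
    // read. The size of row i is then offsets[i] - offsets[i - 1], with no
    // special case (and no branch) for i == 0.
    std::vector<uint64_t> storage = {0 /* padding slot */, 3, 5, 9};
    const uint64_t * offsets = storage.data() + 1;

    for (std::ptrdiff_t i = 0; i + 1 < std::ptrdiff_t(storage.size()); ++i)
        std::cout << "row " << i << " has " << offsets[i] - offsets[i - 1] << " elements\n";
}
```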


@@ -96,7 +96,7 @@ option (ENABLE_TESTS "Enables tests" ON)
 if (CMAKE_SYSTEM_PROCESSOR MATCHES "amd64|x86_64")
     option (USE_INTERNAL_MEMCPY "Use internal implementation of 'memcpy' function instead of provided by libc. Only for x86_64." ON)
-    if (OS_LINUX AND NOT UNBUNDLED AND MAKE_STATIC_LIBRARIES)
+    if (OS_LINUX AND NOT UNBUNDLED AND MAKE_STATIC_LIBRARIES AND CMAKE_VERSION VERSION_GREATER "3.9.0")
         option (GLIBC_COMPATIBILITY "Set to TRUE to enable compatibility with older glibc libraries. Only for x86_64, Linux. Implies USE_INTERNAL_MEMCPY." ON)
         if (GLIBC_COMPATIBILITY)
             message (STATUS "Some symbols from glibc will be replaced for compatibility")


@@ -5,13 +5,24 @@ if (NOT USE_INTERNAL_RE2_LIBRARY)
     find_path (RE2_INCLUDE_DIR NAMES re2/re2.h PATHS ${RE2_INCLUDE_PATHS})
 endif ()
+string(FIND ${CMAKE_CURRENT_BINARY_DIR} " " _have_space)
+if(_have_space GREATER 0)
+    message(WARNING "Using spaces in build path [${CMAKE_CURRENT_BINARY_DIR}] highly not recommended. Library re2st will be disabled.")
+    set (MISSING_INTERNAL_RE2_ST_LIBRARY 1)
+endif()
 if (RE2_LIBRARY AND RE2_INCLUDE_DIR)
     set (RE2_ST_LIBRARY ${RE2_LIBRARY})
-else ()
+elseif (NOT MISSING_INTERNAL_RE2_LIBRARY)
     set (USE_INTERNAL_RE2_LIBRARY 1)
     set (RE2_LIBRARY re2)
-    set (RE2_ST_LIBRARY re2_st)
-    set (USE_RE2_ST 1)
+    set (RE2_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/re2)
+    if (NOT MISSING_INTERNAL_RE2_ST_LIBRARY)
+        set (RE2_ST_LIBRARY re2_st)
+        set (USE_RE2_ST 1)
+    else ()
+        set (RE2_ST_LIBRARY ${RE2_LIBRARY})
+    endif ()
 endif ()
 message (STATUS "Using re2: ${RE2_INCLUDE_DIR} : ${RE2_LIBRARY}; ${RE2_ST_INCLUDE_DIR} : ${RE2_ST_LIBRARY}")


@@ -206,6 +206,8 @@ target_link_libraries (clickhouse_common_io
         ${CMAKE_DL_LIBS}
 )
+target_include_directories(clickhouse_common_io SYSTEM BEFORE PUBLIC ${RE2_INCLUDE_DIR})
 if(CPUID_LIBRARY)
     target_link_libraries(clickhouse_common_io PRIVATE ${CPUID_LIBRARY})
 endif()
@@ -235,9 +237,6 @@ target_link_libraries (dbms
         Threads::Threads
 )
-if (NOT USE_INTERNAL_RE2_LIBRARY)
-    target_include_directories (dbms SYSTEM BEFORE PRIVATE ${RE2_INCLUDE_DIR})
-endif ()
 if (NOT USE_INTERNAL_BOOST_LIBRARY)
     target_include_directories (clickhouse_common_io SYSTEM BEFORE PUBLIC ${Boost_INCLUDE_DIRS})
@@ -257,7 +256,6 @@ if (USE_POCO_SQLODBC)
     endif()
 endif()
-#if (Poco_Data_FOUND AND NOT USE_INTERNAL_POCO_LIBRARY)
 if (Poco_Data_FOUND)
     target_include_directories (clickhouse_common_io SYSTEM PRIVATE ${Poco_Data_INCLUDE_DIR})
     target_include_directories (dbms SYSTEM PRIVATE ${Poco_Data_INCLUDE_DIR})


@@ -28,11 +28,18 @@ add_subdirectory (copier)
 add_subdirectory (format)
 add_subdirectory (clang)
 add_subdirectory (obfuscator)
-add_subdirectory (odbc-bridge)
+if (ENABLE_CLICKHOUSE_ODBC_BRIDGE)
+    add_subdirectory (odbc-bridge)
+endif ()
 if (CLICKHOUSE_SPLIT_BINARY)
     set (CLICKHOUSE_ALL_TARGETS clickhouse-server clickhouse-client clickhouse-local clickhouse-benchmark clickhouse-performance-test
-        clickhouse-extract-from-config clickhouse-compressor clickhouse-format clickhouse-copier clickhouse-odbc-bridge)
+        clickhouse-extract-from-config clickhouse-compressor clickhouse-format clickhouse-copier)
+    if (ENABLE_CLICKHOUSE_ODBC_BRIDGE)
+        list (APPEND CLICKHOUSE_ALL_TARGETS clickhouse-odbc-bridge)
+    endif ()
     if (USE_EMBEDDED_COMPILER)
         list (APPEND CLICKHOUSE_ALL_TARGETS clickhouse-clang clickhouse-lld)
@@ -85,9 +92,6 @@ else ()
     if (USE_EMBEDDED_COMPILER)
         target_link_libraries (clickhouse PRIVATE clickhouse-compiler-lib)
     endif ()
-    if (ENABLE_CLICKHOUSE_ODBC_BRIDGE)
-        target_link_libraries (clickhouse PRIVATE clickhouse-odbc-bridge-lib)
-    endif()
     set (CLICKHOUSE_BUNDLE)
     if (ENABLE_CLICKHOUSE_SERVER)
@@ -135,15 +139,14 @@ else ()
         install (FILES ${CMAKE_CURRENT_BINARY_DIR}/clickhouse-format DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse)
         list(APPEND CLICKHOUSE_BUNDLE clickhouse-format)
     endif ()
-    if (ENABLE_CLICKHOUSE_COPIER)
+    if (ENABLE_CLICKHOUSE_OBFUSCATOR)
         add_custom_target (clickhouse-obfuscator ALL COMMAND ${CMAKE_COMMAND} -E create_symlink clickhouse clickhouse-obfuscator DEPENDS clickhouse)
         install (FILES ${CMAKE_CURRENT_BINARY_DIR}/clickhouse-obfuscator DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse)
         list(APPEND CLICKHOUSE_BUNDLE clickhouse-obfuscator)
     endif ()
     if (ENABLE_CLICKHOUSE_ODBC_BRIDGE)
-        add_custom_target (clickhouse-odbc-bridge ALL COMMAND ${CMAKE_COMMAND} -E create_symlink clickhouse clickhouse-odbc-bridge DEPENDS clickhouse)
-        install (FILES ${CMAKE_CURRENT_BINARY_DIR}/clickhouse-odbc-bridge DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse)
-        list(APPEND CLICKHOUSE_BUNDLE clickhouse-odbc-bridge)
+        # just to be able to run integration tests
+        add_custom_target (clickhouse-odbc-bridge-copy ALL COMMAND ${CMAKE_COMMAND} -E create_symlink ${CMAKE_CURRENT_BINARY_DIR}/odbc-bridge/clickhouse-odbc-bridge clickhouse-odbc-bridge DEPENDS clickhouse-odbc-bridge)
     endif ()


@@ -56,9 +56,6 @@ int mainEntryClickHouseClusterCopier(int argc, char ** argv);
 #if ENABLE_CLICKHOUSE_OBFUSCATOR || !defined(ENABLE_CLICKHOUSE_OBFUSCATOR)
 int mainEntryClickHouseObfuscator(int argc, char ** argv);
 #endif
-#if ENABLE_CLICKHOUSE_ODBC_BRIDGE || !defined(ENABLE_CLICKHOUSE_ODBC_BRIDGE)
-int mainEntryClickHouseODBCBridge(int argc, char ** argv);
-#endif
 #if USE_EMBEDDED_COMPILER
@@ -105,9 +102,6 @@ std::pair<const char *, MainFunc> clickhouse_applications[] =
 #if ENABLE_CLICKHOUSE_OBFUSCATOR || !defined(ENABLE_CLICKHOUSE_OBFUSCATOR)
     {"obfuscator", mainEntryClickHouseObfuscator},
 #endif
-#if ENABLE_CLICKHOUSE_ODBC_BRIDGE || !defined(ENABLE_CLICKHOUSE_ODBC_BRIDGE)
-    {"odbc-bridge", mainEntryClickHouseODBCBridge},
-#endif
 #if USE_EMBEDDED_COMPILER
     {"clang", mainEntryClickHouseClang},


@@ -9,7 +9,7 @@ add_library (clickhouse-odbc-bridge-lib ${LINK_MODE}
     validateODBCConnectionString.cpp
 )
-target_link_libraries (clickhouse-odbc-bridge-lib PRIVATE clickhouse_dictionaries daemon dbms clickhouse_common_io)
+target_link_libraries (clickhouse-odbc-bridge-lib PRIVATE daemon dbms clickhouse_common_io)
 target_include_directories (clickhouse-odbc-bridge-lib PUBLIC ${ClickHouse_SOURCE_DIR}/libs/libdaemon/include)
 if (USE_POCO_SQLODBC)
@@ -33,8 +33,11 @@ if (ENABLE_TESTS)
     add_subdirectory (tests)
 endif ()
-if (CLICKHOUSE_SPLIT_BINARY)
-    add_executable (clickhouse-odbc-bridge odbc-bridge.cpp)
-    target_link_libraries (clickhouse-odbc-bridge PRIVATE clickhouse-odbc-bridge-lib)
-    install (TARGETS clickhouse-odbc-bridge ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse)
-endif ()
+# clickhouse-odbc-bridge is always a separate binary.
+# Reason: it must not export symbols from SSL, mariadb-client, etc. to not break ABI compatibility with ODBC drivers.
+# For this reason, we disabling -rdynamic linker flag. But we do it in strange way:
+SET(CMAKE_SHARED_LIBRARY_LINK_CXX_FLAGS "")
+add_executable (clickhouse-odbc-bridge odbc-bridge.cpp)
+target_link_libraries (clickhouse-odbc-bridge PRIVATE clickhouse-odbc-bridge-lib)
+install (TARGETS clickhouse-odbc-bridge RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse)


@@ -1,4 +1,16 @@
-add_library (clickhouse-performance-test-lib ${LINK_MODE} PerformanceTest.cpp)
+add_library (clickhouse-performance-test-lib ${LINK_MODE}
+    JSONString.cpp
+    StopConditionsSet.cpp
+    TestStopConditions.cpp
+    TestStats.cpp
+    ConfigPreprocessor.cpp
+    PerformanceTest.cpp
+    PerformanceTestInfo.cpp
+    executeQuery.cpp
+    applySubstitutions.cpp
+    ReportBuilder.cpp
+    PerformanceTestSuite.cpp
+)
 target_link_libraries (clickhouse-performance-test-lib PRIVATE dbms clickhouse_common_io clickhouse_common_config ${Boost_PROGRAM_OPTIONS_LIBRARY})
 target_include_directories (clickhouse-performance-test-lib SYSTEM PRIVATE ${PCG_RANDOM_INCLUDE_DIR})


@@ -0,0 +1,90 @@
#include "ConfigPreprocessor.h"
#include <Core/Types.h>
#include <Poco/Path.h>
#include <regex>
namespace DB
{
std::vector<XMLConfigurationPtr> ConfigPreprocessor::processConfig(
const Strings & tests_tags,
const Strings & tests_names,
const Strings & tests_names_regexp,
const Strings & skip_tags,
const Strings & skip_names,
const Strings & skip_names_regexp) const
{
std::vector<XMLConfigurationPtr> result;
for (const auto & path : paths)
{
result.emplace_back(new XMLConfiguration(path));
result.back()->setString("path", Poco::Path(path).absolute().toString());
}
/// Leave tests:
removeConfigurationsIf(result, FilterType::Tag, tests_tags, true);
removeConfigurationsIf(result, FilterType::Name, tests_names, true);
removeConfigurationsIf(result, FilterType::Name_regexp, tests_names_regexp, true);
/// Skip tests
removeConfigurationsIf(result, FilterType::Tag, skip_tags, false);
removeConfigurationsIf(result, FilterType::Name, skip_names, false);
removeConfigurationsIf(result, FilterType::Name_regexp, skip_names_regexp, false);
return result;
}
void ConfigPreprocessor::removeConfigurationsIf(
std::vector<XMLConfigurationPtr> & configs,
ConfigPreprocessor::FilterType filter_type,
const Strings & values,
bool leave) const
{
auto checker = [&filter_type, &values, &leave] (XMLConfigurationPtr & config)
{
if (values.size() == 0)
return false;
bool remove_or_not = false;
if (filter_type == FilterType::Tag)
{
Strings tags_keys;
config->keys("tags", tags_keys);
Strings tags(tags_keys.size());
for (size_t i = 0; i != tags_keys.size(); ++i)
tags[i] = config->getString("tags.tag[" + std::to_string(i) + "]");
for (const std::string & config_tag : tags)
{
if (std::find(values.begin(), values.end(), config_tag) != values.end())
remove_or_not = true;
}
}
if (filter_type == FilterType::Name)
{
remove_or_not = (std::find(values.begin(), values.end(), config->getString("name", "")) != values.end());
}
if (filter_type == FilterType::Name_regexp)
{
std::string config_name = config->getString("name", "");
auto regex_checker = [&config_name](const std::string & name_regexp)
{
std::regex pattern(name_regexp);
return std::regex_search(config_name, pattern);
};
remove_or_not = config->has("name") ? (std::find_if(values.begin(), values.end(), regex_checker) != values.end()) : false;
}
if (leave)
remove_or_not = !remove_or_not;
return remove_or_not;
};
auto new_end = std::remove_if(configs.begin(), configs.end(), checker);
configs.erase(new_end, configs.end());
}
}
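`removeConfigurationsIf` above is the usual erase/`remove_if` filter with an inverting `leave` flag. A standalone sketch of the same idiom on plain strings (hypothetical data and function name, not part of this patch):

```cpp
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

// Drop elements matching any of the given values; when `leave` is true the
// predicate is inverted, so the matching elements are the ones kept instead.
void removeIf(std::vector<std::string> & items, const std::vector<std::string> & values, bool leave)
{
    auto checker = [&](const std::string & item)
    {
        if (values.empty())
            return false;
        bool matched = std::find(values.begin(), values.end(), item) != values.end();
        return leave ? !matched : matched;
    };
    items.erase(std::remove_if(items.begin(), items.end(), checker), items.end());
}

int main()
{
    std::vector<std::string> tests = {"hits", "visits", "arrays"};
    removeIf(tests, {"hits", "arrays"}, /*leave=*/ true);   // keep only the listed tests
    for (const auto & t : tests)
        std::cout << t << "\n";                              // prints: hits, arrays
}
```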


@@ -0,0 +1,50 @@
#pragma once

#include <Poco/DOM/Document.h>
#include <Poco/Util/XMLConfiguration.h>
#include <Core/Types.h>
#include <vector>
#include <string>

namespace DB
{

using XMLConfiguration = Poco::Util::XMLConfiguration;
using XMLConfigurationPtr = Poco::AutoPtr<XMLConfiguration>;
using XMLDocumentPtr = Poco::AutoPtr<Poco::XML::Document>;

class ConfigPreprocessor
{
public:
    ConfigPreprocessor(const Strings & paths_)
        : paths(paths_)
    {}

    std::vector<XMLConfigurationPtr> processConfig(
        const Strings & tests_tags,
        const Strings & tests_names,
        const Strings & tests_names_regexp,
        const Strings & skip_tags,
        const Strings & skip_names,
        const Strings & skip_names_regexp) const;

private:
    enum class FilterType
    {
        Tag,
        Name,
        Name_regexp
    };

    /// Removes configurations that has a given value.
    /// If leave is true, the logic is reversed.
    void removeConfigurationsIf(
        std::vector<XMLConfigurationPtr> & configs,
        FilterType filter_type,
        const Strings & values,
        bool leave = false) const;

    const Strings paths;
};
}
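A possible call site for the class above, shown only as a sketch: it assumes the ClickHouse source tree (the `ConfigPreprocessor.h` just listed), and the helper name `loadLongTests` and the filter values are made up for illustration.

```cpp
#include "ConfigPreprocessor.h"   // header shown above; assumes the ClickHouse source tree

// Hypothetical call site: load all test configs, keep only tests tagged
// "long", and skip the ones whose name matches a regexp.
std::vector<DB::XMLConfigurationPtr> loadLongTests(const DB::Strings & paths)
{
    DB::ConfigPreprocessor preprocessor(paths);
    return preprocessor.processConfig(
        /* tests_tags = */ {"long"},
        /* tests_names = */ {},
        /* tests_names_regexp = */ {},
        /* skip_tags = */ {},
        /* skip_names = */ {},
        /* skip_names_regexp = */ {"experimental.*"});
}
```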


@@ -0,0 +1,66 @@
#include "JSONString.h"
#include <regex>
#include <sstream>
namespace DB
{
namespace
{
std::string pad(size_t padding)
{
return std::string(padding * 4, ' ');
}
const std::regex NEW_LINE{"\n"};
}
void JSONString::set(const std::string & key, std::string value, bool wrap)
{
if (value.empty())
value = "null";
bool reserved = (value[0] == '[' || value[0] == '{' || value == "null");
if (!reserved && wrap)
value = '"' + std::regex_replace(value, NEW_LINE, "\\n") + '"';
content[key] = value;
}
void JSONString::set(const std::string & key, const std::vector<JSONString> & run_infos)
{
std::ostringstream value;
value << "[\n";
for (size_t i = 0; i < run_infos.size(); ++i)
{
value << pad(padding + 1) + run_infos[i].asString(padding + 2);
if (i != run_infos.size() - 1)
value << ',';
value << "\n";
}
value << pad(padding) << ']';
content[key] = value.str();
}
std::string JSONString::asString(size_t cur_padding) const
{
std::ostringstream repr;
repr << "{";
for (auto it = content.begin(); it != content.end(); ++it)
{
if (it != content.begin())
repr << ',';
/// construct "key": "value" string with padding
repr << "\n" << pad(cur_padding) << '"' << it->first << '"' << ": " << it->second;
}
repr << "\n" << pad(cur_padding - 1) << '}';
return repr.str();
}
}


@ -0,0 +1,40 @@
#pragma once
#include <Core/Types.h>
#include <sys/stat.h>
#include <type_traits>
#include <vector>
#include <map>
namespace DB
{
/// NOTE The code is totally wrong.
class JSONString
{
private:
std::map<std::string, std::string> content;
size_t padding;
public:
explicit JSONString(size_t padding_ = 1) : padding(padding_) {}
void set(const std::string & key, std::string value, bool wrap = true);
template <typename T>
std::enable_if_t<std::is_arithmetic_v<T>> set(const std::string key, T value)
{
set(key, std::to_string(value), /*wrap= */ false);
}
void set(const std::string & key, const std::vector<JSONString> & run_infos);
std::string asString() const
{
return asString(padding);
}
std::string asString(size_t cur_padding) const;
};
}
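A minimal usage sketch of JSONString (illustrative only, not part of this diff; the keys and values below are made up):

#include "JSONString.h"
#include <iostream>
#include <vector>

int main()
{
    DB::JSONString report;
    report.set("test_name", "example_test");
    report.set("num_runs", 2); /// arithmetic values are emitted without quotes

    std::vector<DB::JSONString> runs(2);
    runs[0].set("min_time", 12.5);
    runs[1].set("min_time", 13.1);
    report.set("runs", runs); /// serialized as a padded JSON array

    std::cout << report.asString() << std::endl; /// pretty-printed JSON object
    return 0;
}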

File diff suppressed because it is too large

View File

@ -0,0 +1,59 @@
#pragma once
#include <Client/Connection.h>
#include <Common/InterruptListener.h>
#include <common/logger_useful.h>
#include <Poco/Util/XMLConfiguration.h>
#include "PerformanceTestInfo.h"
namespace DB
{
using XMLConfiguration = Poco::Util::XMLConfiguration;
using XMLConfigurationPtr = Poco::AutoPtr<XMLConfiguration>;
using QueriesWithIndexes = std::vector<std::pair<std::string, size_t>>;
class PerformanceTest
{
public:
PerformanceTest(
const XMLConfigurationPtr & config_,
Connection & connection_,
InterruptListener & interrupt_listener_,
const PerformanceTestInfo & test_info_,
Context & context_);
bool checkPreconditions() const;
std::vector<TestStats> execute();
const PerformanceTestInfo & getTestInfo() const
{
return test_info;
}
bool checkSIGINT() const
{
return got_SIGINT;
}
private:
void runQueries(
const QueriesWithIndexes & queries_with_indexes,
std::vector<TestStats> & statistics_by_run);
UInt64 calculateMaxExecTime() const;
private:
XMLConfigurationPtr config;
Connection & connection;
InterruptListener & interrupt_listener;
PerformanceTestInfo test_info;
Context & context;
Poco::Logger * log;
bool got_SIGINT = false;
};
}

View File

@ -0,0 +1,272 @@
#include "PerformanceTestInfo.h"
#include <Common/getMultipleKeysFromConfig.h>
#include <IO/ReadBufferFromFile.h>
#include <IO/ReadHelpers.h>
#include <IO/WriteBufferFromFile.h>
#include <boost/filesystem.hpp>
#include "applySubstitutions.h"
#include <iostream>
namespace DB
{
namespace ErrorCodes
{
extern const int BAD_ARGUMENTS;
}
namespace
{
void extractSettings(
const XMLConfigurationPtr & config,
const std::string & key,
const Strings & settings_list,
std::map<std::string, std::string> & settings_to_apply)
{
for (const std::string & setup : settings_list)
{
if (setup == "profile")
continue;
std::string value = config->getString(key + "." + setup);
if (value.empty())
value = "true";
settings_to_apply[setup] = value;
}
}
void checkMetricsInput(const Strings & metrics, ExecutionType exec_type)
{
Strings loop_metrics = {
"min_time", "quantiles", "total_time",
"queries_per_second", "rows_per_second",
"bytes_per_second"};
Strings non_loop_metrics = {
"max_rows_per_second", "max_bytes_per_second",
"avg_rows_per_second", "avg_bytes_per_second"};
if (exec_type == ExecutionType::Loop)
{
for (const std::string & metric : metrics)
{
auto non_loop_pos =
std::find(non_loop_metrics.begin(), non_loop_metrics.end(), metric);
if (non_loop_pos != non_loop_metrics.end())
throw Exception("Wrong type of metric for loop execution type (" + metric + ")",
ErrorCodes::BAD_ARGUMENTS);
}
}
else
{
for (const std::string & metric : metrics)
{
auto loop_pos = std::find(loop_metrics.begin(), loop_metrics.end(), metric);
if (loop_pos != loop_metrics.end())
throw Exception(
"Wrong type of metric for non-loop execution type (" + metric + ")",
ErrorCodes::BAD_ARGUMENTS);
}
}
}
}
namespace fs = boost::filesystem;
PerformanceTestInfo::PerformanceTestInfo(
XMLConfigurationPtr config,
const std::string & profiles_file_)
: profiles_file(profiles_file_)
{
test_name = config->getString("name");
path = config->getString("path");
applySettings(config);
extractQueries(config);
processSubstitutions(config);
getExecutionType(config);
getStopConditions(config);
getMetrics(config);
}
void PerformanceTestInfo::applySettings(XMLConfigurationPtr config)
{
if (config->has("settings"))
{
std::map<std::string, std::string> settings_to_apply;
Strings config_settings;
config->keys("settings", config_settings);
auto settings_contain = [&config_settings] (const std::string & setting)
{
auto position = std::find(config_settings.begin(), config_settings.end(), setting);
return position != config_settings.end();
};
/// Preprocess configuration file
if (settings_contain("profile"))
{
if (!profiles_file.empty())
{
std::string profile_name = config->getString("settings.profile");
XMLConfigurationPtr profiles_config(new XMLConfiguration(profiles_file));
Strings profile_settings;
profiles_config->keys("profiles." + profile_name, profile_settings);
extractSettings(profiles_config, "profiles." + profile_name, profile_settings, settings_to_apply);
}
}
extractSettings(config, "settings", config_settings, settings_to_apply);
/// This macro goes through all settings in Settings.h and, if the test's
/// xml configuration contains a setting with the same name, applies its
/// value to the settings object.
std::map<std::string, std::string>::iterator it;
#define EXTRACT_SETTING(TYPE, NAME, DEFAULT, DESCRIPTION) \
it = settings_to_apply.find(#NAME); \
if (it != settings_to_apply.end()) \
settings.set(#NAME, settings_to_apply[#NAME]);
APPLY_FOR_SETTINGS(EXTRACT_SETTING)
#undef EXTRACT_SETTING
if (settings_contain("average_rows_speed_precision"))
TestStats::avg_rows_speed_precision =
config->getDouble("settings.average_rows_speed_precision");
if (settings_contain("average_bytes_speed_precision"))
TestStats::avg_bytes_speed_precision =
config->getDouble("settings.average_bytes_speed_precision");
}
}
void PerformanceTestInfo::extractQueries(XMLConfigurationPtr config)
{
if (config->has("query"))
queries = getMultipleValuesFromConfig(*config, "", "query");
if (config->has("query_file"))
{
const std::string filename = config->getString("query_file");
if (filename.empty())
throw Exception("Empty file name", ErrorCodes::BAD_ARGUMENTS);
bool tsv = fs::path(filename).extension().string() == ".tsv";
ReadBufferFromFile query_file(filename);
std::string query;
if (tsv)
{
while (!query_file.eof())
{
readEscapedString(query, query_file);
assertChar('\n', query_file);
queries.push_back(query);
}
}
else
{
readStringUntilEOF(query, query_file);
queries.push_back(query);
}
}
if (queries.empty())
throw Exception("Did not find any query to execute: " + test_name,
ErrorCodes::BAD_ARGUMENTS);
}
void PerformanceTestInfo::processSubstitutions(XMLConfigurationPtr config)
{
if (config->has("substitutions"))
{
/// Make "subconfig" of inner xml block
ConfigurationPtr substitutions_view(config->createView("substitutions"));
constructSubstitutions(substitutions_view, substitutions);
auto queries_pre_format = queries;
queries.clear();
for (const auto & query : queries_pre_format)
{
auto formatted = formatQueries(query, substitutions);
queries.insert(queries.end(), formatted.begin(), formatted.end());
}
}
}
void PerformanceTestInfo::getExecutionType(XMLConfigurationPtr config)
{
if (!config->has("type"))
throw Exception("Missing type property in config: " + test_name,
ErrorCodes::BAD_ARGUMENTS);
std::string config_exec_type = config->getString("type");
if (config_exec_type == "loop")
exec_type = ExecutionType::Loop;
else if (config_exec_type == "once")
exec_type = ExecutionType::Once;
else
throw Exception("Unknown type " + config_exec_type + " in :" + test_name,
ErrorCodes::BAD_ARGUMENTS);
}
void PerformanceTestInfo::getStopConditions(XMLConfigurationPtr config)
{
TestStopConditions stop_conditions_template;
if (config->has("stop_conditions"))
{
ConfigurationPtr stop_conditions_config(config->createView("stop_conditions"));
stop_conditions_template.loadFromConfig(stop_conditions_config);
}
if (stop_conditions_template.empty())
throw Exception("No termination conditions were found in config",
ErrorCodes::BAD_ARGUMENTS);
times_to_run = config->getUInt("times_to_run", 1);
for (size_t i = 0; i < times_to_run * queries.size(); ++i)
stop_conditions_by_run.push_back(stop_conditions_template);
}
void PerformanceTestInfo::getMetrics(XMLConfigurationPtr config)
{
ConfigurationPtr metrics_view(config->createView("metrics"));
metrics_view->keys(metrics);
if (config->has("main_metric"))
{
Strings main_metrics;
config->keys("main_metric", main_metrics);
if (main_metrics.size())
main_metric = main_metrics[0];
}
if (!main_metric.empty())
{
if (std::find(metrics.begin(), metrics.end(), main_metric) == metrics.end())
metrics.push_back(main_metric);
}
else
{
if (metrics.empty())
throw Exception("You shoud specify at least one metric",
ErrorCodes::BAD_ARGUMENTS);
main_metric = metrics[0];
}
if (metrics.size() > 0)
checkMetricsInput(metrics, exec_type);
}
}
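The EXTRACT_SETTING block in applySettings above is an X-macro: APPLY_FOR_SETTINGS(M) expands M once per setting declared in Settings.h, so one list drives both the declaration and the lookup. A standalone sketch of the pattern with made-up setting names (illustrative only, not part of this diff):

#include <iostream>
#include <map>
#include <string>

/// Hypothetical setting list; the real one lives in Settings.h.
#define APPLY_FOR_EXAMPLE_SETTINGS(M) \
    M(max_threads)                    \
    M(max_memory_usage)

int main()
{
    std::map<std::string, std::string> settings_to_apply = {{"max_threads", "8"}};

#define EXTRACT_SETTING(NAME) \
    if (auto it = settings_to_apply.find(#NAME); it != settings_to_apply.end()) \
        std::cout << "set " << #NAME << " = " << it->second << '\n';
    APPLY_FOR_EXAMPLE_SETTINGS(EXTRACT_SETTING)
#undef EXTRACT_SETTING

    return 0;
}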

View File

@ -0,0 +1,55 @@
#pragma once
#include <string>
#include <vector>
#include <map>
#include <Interpreters/Settings.h>
#include <Poco/Util/XMLConfiguration.h>
#include <Poco/AutoPtr.h>
#include "StopConditionsSet.h"
#include "TestStopConditions.h"
#include "TestStats.h"
namespace DB
{
enum class ExecutionType
{
Loop,
Once
};
using XMLConfiguration = Poco::Util::XMLConfiguration;
using XMLConfigurationPtr = Poco::AutoPtr<XMLConfiguration>;
using StringToVector = std::map<std::string, Strings>;
/// Class containing all info to run performance test
class PerformanceTestInfo
{
public:
PerformanceTestInfo(XMLConfigurationPtr config, const std::string & profiles_file_);
std::string test_name;
std::string path;
std::string main_metric;
Strings queries;
Strings metrics;
Settings settings;
ExecutionType exec_type;
StringToVector substitutions;
size_t times_to_run;
std::string profiles_file;
std::vector<TestStopConditions> stop_conditions_by_run;
private:
void applySettings(XMLConfigurationPtr config);
void extractQueries(XMLConfigurationPtr config);
void processSubstitutions(XMLConfigurationPtr config);
void getExecutionType(XMLConfigurationPtr config);
void getStopConditions(XMLConfigurationPtr config);
void getMetrics(XMLConfigurationPtr config);
};
}

View File

@ -0,0 +1,382 @@
#include <algorithm>
#include <iostream>
#include <limits>
#include <regex>
#include <thread>
#include <memory>
#include <port/unistd.h>
#include <sys/stat.h>
#include <boost/filesystem.hpp>
#include <boost/program_options.hpp>
#include <Poco/Util/XMLConfiguration.h>
#include <Poco/Logger.h>
#include <Poco/ConsoleChannel.h>
#include <Poco/FormattingChannel.h>
#include <Poco/PatternFormatter.h>
#include <common/logger_useful.h>
#include <Client/Connection.h>
#include <Core/Types.h>
#include <Interpreters/Context.h>
#include <IO/ConnectionTimeouts.h>
#include <IO/UseSSL.h>
#include <Interpreters/Settings.h>
#include <Poco/AutoPtr.h>
#include <Common/Exception.h>
#include <Common/InterruptListener.h>
#include "TestStopConditions.h"
#include "TestStats.h"
#include "ConfigPreprocessor.h"
#include "PerformanceTest.h"
#include "ReportBuilder.h"
namespace fs = boost::filesystem;
namespace po = boost::program_options;
namespace DB
{
namespace ErrorCodes
{
extern const int BAD_ARGUMENTS;
extern const int FILE_DOESNT_EXIST;
}
/** Tests launcher for ClickHouse.
* The tool walks through the given or default folder, finds files with
* test descriptions and launches them.
*/
class PerformanceTestSuite
{
public:
PerformanceTestSuite(const std::string & host_,
const UInt16 port_,
const bool secure_,
const std::string & default_database_,
const std::string & user_,
const std::string & password_,
const bool lite_output_,
const std::string & profiles_file_,
Strings && input_files_,
Strings && tests_tags_,
Strings && skip_tags_,
Strings && tests_names_,
Strings && skip_names_,
Strings && tests_names_regexp_,
Strings && skip_names_regexp_,
const ConnectionTimeouts & timeouts)
: connection(host_, port_, default_database_, user_,
password_, timeouts, "performance-test", Protocol::Compression::Enable,
secure_ ? Protocol::Secure::Enable : Protocol::Secure::Disable)
, tests_tags(std::move(tests_tags_))
, tests_names(std::move(tests_names_))
, tests_names_regexp(std::move(tests_names_regexp_))
, skip_tags(std::move(skip_tags_))
, skip_names(std::move(skip_names_))
, skip_names_regexp(std::move(skip_names_regexp_))
, lite_output(lite_output_)
, profiles_file(profiles_file_)
, input_files(input_files_)
, log(&Poco::Logger::get("PerformanceTestSuite"))
{
if (input_files.size() < 1)
throw Exception("No tests were specified", ErrorCodes::BAD_ARGUMENTS);
}
/// This functionality seems strange.
//void initialize(Poco::Util::Application & self [[maybe_unused]])
//{
// std::string home_path;
// const char * home_path_cstr = getenv("HOME");
// if (home_path_cstr)
// home_path = home_path_cstr;
// configReadClient(Poco::Util::Application::instance().config(), home_path);
//}
int run()
{
std::string name;
UInt64 version_major;
UInt64 version_minor;
UInt64 version_patch;
UInt64 version_revision;
connection.getServerVersion(name, version_major, version_minor, version_patch, version_revision);
std::stringstream ss;
ss << version_major << "." << version_minor << "." << version_patch;
server_version = ss.str();
report_builder = std::make_shared<ReportBuilder>(server_version);
processTestsConfigurations(input_files);
return 0;
}
private:
Connection connection;
const Strings & tests_tags;
const Strings & tests_names;
const Strings & tests_names_regexp;
const Strings & skip_tags;
const Strings & skip_names;
const Strings & skip_names_regexp;
Context global_context = Context::createGlobal();
std::shared_ptr<ReportBuilder> report_builder;
std::string server_version;
InterruptListener interrupt_listener;
using XMLConfiguration = Poco::Util::XMLConfiguration;
using XMLConfigurationPtr = Poco::AutoPtr<XMLConfiguration>;
bool lite_output;
std::string profiles_file;
Strings input_files;
std::vector<XMLConfigurationPtr> tests_configurations;
Poco::Logger * log;
void processTestsConfigurations(const Strings & paths)
{
LOG_INFO(log, "Preparing test configurations");
ConfigPreprocessor config_prep(paths);
tests_configurations = config_prep.processConfig(
tests_tags,
tests_names,
tests_names_regexp,
skip_tags,
skip_names,
skip_names_regexp);
LOG_INFO(log, "Test configurations prepared");
if (tests_configurations.size())
{
Strings outputs;
for (auto & test_config : tests_configurations)
{
auto [output, signal] = runTest(test_config);
if (lite_output)
std::cout << output;
else
outputs.push_back(output);
if (signal)
break;
}
if (!lite_output && outputs.size())
{
std::cout << "[" << std::endl;
for (size_t i = 0; i != outputs.size(); ++i)
{
std::cout << outputs[i];
if (i != outputs.size() - 1)
std::cout << ",";
std::cout << std::endl;
}
std::cout << "]" << std::endl;
}
}
}
std::pair<std::string, bool> runTest(XMLConfigurationPtr & test_config)
{
PerformanceTestInfo info(test_config, profiles_file);
LOG_INFO(log, "Config for test '" << info.test_name << "' parsed");
PerformanceTest current(test_config, connection, interrupt_listener, info, global_context);
current.checkPreconditions();
LOG_INFO(log, "Preconditions for test '" << info.test_name << "' are fullfilled");
LOG_INFO(log, "Running test '" << info.test_name << "'");
auto result = current.execute();
LOG_INFO(log, "Test '" << info.test_name << "' finished");
if (lite_output)
return {report_builder->buildCompactReport(info, result), current.checkSIGINT()};
else
return {report_builder->buildFullReport(info, result), current.checkSIGINT()};
}
};
}
static void getFilesFromDir(const fs::path & dir, std::vector<std::string> & input_files, const bool recursive = false)
{
Poco::Logger * log = &Poco::Logger::get("PerformanceTestSuite");
if (dir.extension().string() == ".xml")
LOG_WARNING(log, "'" + dir.string() + "' is a directory, but has .xml extension");
fs::directory_iterator end;
for (fs::directory_iterator it(dir); it != end; ++it)
{
const fs::path file = (*it);
if (recursive && fs::is_directory(file))
getFilesFromDir(file, input_files, recursive);
else if (!fs::is_directory(file) && file.extension().string() == ".xml")
input_files.push_back(file.string());
}
}
static std::vector<std::string> getInputFiles(const po::variables_map & options, Poco::Logger * log)
{
std::vector<std::string> input_files;
bool recursive = options.count("recursive");
if (!options.count("input-files"))
{
LOG_INFO(log, "Trying to find test scenario files in the current folder...");
fs::path curr_dir(".");
getFilesFromDir(curr_dir, input_files, recursive);
if (input_files.empty())
throw DB::Exception("Did not find any xml files", DB::ErrorCodes::BAD_ARGUMENTS);
else
LOG_INFO(log, "Found " << input_files.size() << " files");
}
else
{
input_files = options["input-files"].as<std::vector<std::string>>();
LOG_INFO(log, "Found " + std::to_string(input_files.size()) + " input files");
std::vector<std::string> collected_files;
for (const std::string & filename : input_files)
{
fs::path file(filename);
if (!fs::exists(file))
throw DB::Exception("File '" + filename + "' does not exist", DB::ErrorCodes::FILE_DOESNT_EXIST);
if (fs::is_directory(file))
{
getFilesFromDir(file, collected_files, recursive);
}
else
{
if (file.extension().string() != ".xml")
throw DB::Exception("File '" + filename + "' does not have .xml extension", DB::ErrorCodes::BAD_ARGUMENTS);
collected_files.push_back(filename);
}
}
input_files = std::move(collected_files);
}
std::sort(input_files.begin(), input_files.end());
return input_files;
}
int mainEntryClickHousePerformanceTest(int argc, char ** argv)
try
{
using po::value;
using Strings = DB::Strings;
po::options_description desc("Allowed options");
desc.add_options()
("help", "produce help message")
("lite", "use lite version of output")
("profiles-file", value<std::string>()->default_value(""), "Specify a file with global profiles")
("host,h", value<std::string>()->default_value("localhost"), "")
("port", value<UInt16>()->default_value(9000), "")
("secure,s", "Use TLS connection")
("database", value<std::string>()->default_value("default"), "")
("user", value<std::string>()->default_value("default"), "")
("password", value<std::string>()->default_value(""), "")
("log-level", value<std::string>()->default_value("information"), "Set log level")
("tags", value<Strings>()->multitoken(), "Run only tests with tag")
("skip-tags", value<Strings>()->multitoken(), "Do not run tests with tag")
("names", value<Strings>()->multitoken(), "Run tests with specific name")
("skip-names", value<Strings>()->multitoken(), "Do not run tests with name")
("names-regexp", value<Strings>()->multitoken(), "Run tests with names matching regexp")
("skip-names-regexp", value<Strings>()->multitoken(), "Do not run tests with names matching regexp")
("recursive,r", "Recurse in directories to find all xml's");
/// These options will not be displayed in --help
po::options_description hidden("Hidden options");
hidden.add_options()
("input-files", value<std::vector<std::string>>(), "");
/// They are still valid, though, and must be given without a name (as positional arguments)
po::positional_options_description positional;
positional.add("input-files", -1);
po::options_description cmdline_options;
cmdline_options.add(desc).add(hidden);
po::variables_map options;
po::store(
po::command_line_parser(argc, argv).
options(cmdline_options).positional(positional).run(), options);
po::notify(options);
Poco::AutoPtr<Poco::PatternFormatter> formatter(new Poco::PatternFormatter("%Y.%m.%d %H:%M:%S.%F <%p> %s: %t"));
Poco::AutoPtr<Poco::ConsoleChannel> console_chanel(new Poco::ConsoleChannel);
Poco::AutoPtr<Poco::FormattingChannel> channel(new Poco::FormattingChannel(formatter, console_chanel));
Poco::Logger::root().setLevel(options["log-level"].as<std::string>());
Poco::Logger::root().setChannel(channel);
Poco::Logger * log = &Poco::Logger::get("PerformanceTestSuite");
if (options.count("help"))
{
std::cout << "Usage: " << argv[0] << " [options] [test_file ...] [tests_folder]\n";
std::cout << desc << "\n";
return 0;
}
Strings input_files = getInputFiles(options, log);
Strings tests_tags = options.count("tags") ? options["tags"].as<Strings>() : Strings({});
Strings skip_tags = options.count("skip-tags") ? options["skip-tags"].as<Strings>() : Strings({});
Strings tests_names = options.count("names") ? options["names"].as<Strings>() : Strings({});
Strings skip_names = options.count("skip-names") ? options["skip-names"].as<Strings>() : Strings({});
Strings tests_names_regexp = options.count("names-regexp") ? options["names-regexp"].as<Strings>() : Strings({});
Strings skip_names_regexp = options.count("skip-names-regexp") ? options["skip-names-regexp"].as<Strings>() : Strings({});
auto timeouts = DB::ConnectionTimeouts::getTCPTimeoutsWithoutFailover(DB::Settings());
DB::UseSSL use_ssl;
DB::PerformanceTestSuite performance_test_suite(
options["host"].as<std::string>(),
options["port"].as<UInt16>(),
options.count("secure"),
options["database"].as<std::string>(),
options["user"].as<std::string>(),
options["password"].as<std::string>(),
options.count("lite") > 0,
options["profiles-file"].as<std::string>(),
std::move(input_files),
std::move(tests_tags),
std::move(skip_tags),
std::move(tests_names),
std::move(skip_names),
std::move(tests_names_regexp),
std::move(skip_names_regexp),
timeouts);
return performance_test_suite.run();
}
catch (...)
{
std::cout << DB::getCurrentExceptionMessage(/*with stacktrace = */ true) << std::endl;
int code = DB::getCurrentExceptionCode();
return code ? code : 1;
}

View File

@ -0,0 +1,196 @@
#include "ReportBuilder.h"
#include <algorithm>
#include <regex>
#include <sstream>
#include <thread>
#include <Common/getNumberOfPhysicalCPUCores.h>
#include <Common/getFQDNOrHostName.h>
#include <common/getMemoryAmount.h>
#include "JSONString.h"
namespace DB
{
namespace
{
const std::regex QUOTE_REGEX{"\""};
}
ReportBuilder::ReportBuilder(const std::string & server_version_)
: server_version(server_version_)
, hostname(getFQDNOrHostName())
, num_cores(getNumberOfPhysicalCPUCores())
, num_threads(std::thread::hardware_concurrency())
, ram(getMemoryAmount())
{
}
std::string ReportBuilder::getCurrentTime() const
{
return DateLUT::instance().timeToString(time(nullptr));
}
std::string ReportBuilder::buildFullReport(
const PerformanceTestInfo & test_info,
std::vector<TestStats> & stats) const
{
JSONString json_output;
json_output.set("hostname", hostname);
json_output.set("num_cores", num_cores);
json_output.set("num_threads", num_threads);
json_output.set("ram", ram);
json_output.set("server_version", server_version);
json_output.set("time", getCurrentTime());
json_output.set("test_name", test_info.test_name);
json_output.set("path", test_info.path);
json_output.set("main_metric", test_info.main_metric);
auto has_metric = [&test_info] (const std::string & metric_name)
{
return std::find(test_info.metrics.begin(),
test_info.metrics.end(), metric_name) != test_info.metrics.end();
};
if (test_info.substitutions.size())
{
JSONString json_parameters(2); /// here, 2 is the indentation level (4 spaces per level)
for (auto it = test_info.substitutions.begin(); it != test_info.substitutions.end(); ++it)
{
std::string parameter = it->first;
Strings values = it->second;
std::ostringstream array_string;
array_string << "[";
for (size_t i = 0; i != values.size(); ++i)
{
array_string << '"' << std::regex_replace(values[i], QUOTE_REGEX, "\\\"") << '"';
if (i != values.size() - 1)
{
array_string << ", ";
}
}
array_string << ']';
json_parameters.set(parameter, array_string.str());
}
json_output.set("parameters", json_parameters.asString());
}
std::vector<JSONString> run_infos;
for (size_t query_index = 0; query_index < test_info.queries.size(); ++query_index)
{
for (size_t number_of_launch = 0; number_of_launch < test_info.times_to_run; ++number_of_launch)
{
size_t stat_index = number_of_launch * test_info.queries.size() + query_index;
TestStats & statistics = stats[stat_index];
if (!statistics.ready)
continue;
JSONString runJSON;
auto query = std::regex_replace(test_info.queries[query_index], QUOTE_REGEX, "\\\"");
runJSON.set("query", query);
if (!statistics.exception.empty())
runJSON.set("exception", statistics.exception);
if (test_info.exec_type == ExecutionType::Loop)
{
/// in seconds
if (has_metric("min_time"))
runJSON.set("min_time", statistics.min_time / double(1000));
if (has_metric("quantiles"))
{
JSONString quantiles(4); /// here, 4 is the indentation level (4 spaces per level)
for (double percent = 10; percent <= 90; percent += 10)
{
std::string quantile_key = std::to_string(percent / 100.0);
while (quantile_key.back() == '0')
quantile_key.pop_back();
quantiles.set(quantile_key,
statistics.sampler.quantileInterpolated(percent / 100.0));
}
quantiles.set("0.95",
statistics.sampler.quantileInterpolated(95 / 100.0));
quantiles.set("0.99",
statistics.sampler.quantileInterpolated(99 / 100.0));
quantiles.set("0.999",
statistics.sampler.quantileInterpolated(99.9 / 100.0));
quantiles.set("0.9999",
statistics.sampler.quantileInterpolated(99.99 / 100.0));
runJSON.set("quantiles", quantiles.asString());
}
if (has_metric("total_time"))
runJSON.set("total_time", statistics.total_time);
if (has_metric("queries_per_second"))
runJSON.set("queries_per_second",
double(statistics.queries) / statistics.total_time);
if (has_metric("rows_per_second"))
runJSON.set("rows_per_second",
double(statistics.total_rows_read) / statistics.total_time);
if (has_metric("bytes_per_second"))
runJSON.set("bytes_per_second",
double(statistics.total_bytes_read) / statistics.total_time);
}
else
{
if (has_metric("max_rows_per_second"))
runJSON.set("max_rows_per_second", statistics.max_rows_speed);
if (has_metric("max_bytes_per_second"))
runJSON.set("max_bytes_per_second", statistics.max_bytes_speed);
if (has_metric("avg_rows_per_second"))
runJSON.set("avg_rows_per_second", statistics.avg_rows_speed_value);
if (has_metric("avg_bytes_per_second"))
runJSON.set("avg_bytes_per_second", statistics.avg_bytes_speed_value);
}
run_infos.push_back(runJSON);
}
}
json_output.set("runs", run_infos);
return json_output.asString();
}
std::string ReportBuilder::buildCompactReport(
const PerformanceTestInfo & test_info,
std::vector<TestStats> & stats) const
{
std::ostringstream output;
for (size_t query_index = 0; query_index < test_info.queries.size(); ++query_index)
{
for (size_t number_of_launch = 0; number_of_launch < test_info.times_to_run; ++number_of_launch)
{
if (test_info.queries.size() > 1)
output << "query \"" << test_info.queries[query_index] << "\", ";
output << "run " << std::to_string(number_of_launch + 1) << ": ";
output << test_info.main_metric << " = ";
size_t index = number_of_launch * test_info.queries.size() + query_index;
output << stats[index].getStatisticByName(test_info.main_metric);
output << "\n";
}
}
return output.str();
}
}

View File

@ -0,0 +1,32 @@
#pragma once
#include "PerformanceTestInfo.h"
#include <vector>
#include <string>
namespace DB
{
class ReportBuilder
{
public:
explicit ReportBuilder(const std::string & server_version_);
std::string buildFullReport(
const PerformanceTestInfo & test_info,
std::vector<TestStats> & stats) const;
std::string buildCompactReport(
const PerformanceTestInfo & test_info,
std::vector<TestStats> & stats) const;
private:
std::string server_version;
std::string hostname;
size_t num_cores;
size_t num_threads;
size_t ram;
private:
std::string getCurrentTime() const;
};
}

View File

@ -0,0 +1,63 @@
#include "StopConditionsSet.h"
#include <Common/Exception.h>
namespace DB
{
namespace ErrorCodes
{
extern const int LOGICAL_ERROR;
}
void StopConditionsSet::loadFromConfig(const ConfigurationPtr & stop_conditions_view)
{
Strings keys;
stop_conditions_view->keys(keys);
for (const std::string & key : keys)
{
if (key == "total_time_ms")
total_time_ms.value = stop_conditions_view->getUInt64(key);
else if (key == "rows_read")
rows_read.value = stop_conditions_view->getUInt64(key);
else if (key == "bytes_read_uncompressed")
bytes_read_uncompressed.value = stop_conditions_view->getUInt64(key);
else if (key == "iterations")
iterations.value = stop_conditions_view->getUInt64(key);
else if (key == "min_time_not_changing_for_ms")
min_time_not_changing_for_ms.value = stop_conditions_view->getUInt64(key);
else if (key == "max_speed_not_changing_for_ms")
max_speed_not_changing_for_ms.value = stop_conditions_view->getUInt64(key);
else if (key == "average_speed_not_changing_for_ms")
average_speed_not_changing_for_ms.value = stop_conditions_view->getUInt64(key);
else
throw Exception("Met unkown stop condition: " + key, ErrorCodes::LOGICAL_ERROR);
}
++initialized_count;
}
void StopConditionsSet::reset()
{
total_time_ms.fulfilled = false;
rows_read.fulfilled = false;
bytes_read_uncompressed.fulfilled = false;
iterations.fulfilled = false;
min_time_not_changing_for_ms.fulfilled = false;
max_speed_not_changing_for_ms.fulfilled = false;
average_speed_not_changing_for_ms.fulfilled = false;
fulfilled_count = 0;
}
void StopConditionsSet::report(UInt64 value, StopConditionsSet::StopCondition & condition)
{
if (condition.value && !condition.fulfilled && value >= condition.value)
{
condition.fulfilled = true;
++fulfilled_count;
}
}
}

View File

@ -0,0 +1,39 @@
#pragma once
#include <Core/Types.h>
#include <Poco/Util/XMLConfiguration.h>
namespace DB
{
using ConfigurationPtr = Poco::AutoPtr<Poco::Util::AbstractConfiguration>;
/// A set of supported stop conditions.
struct StopConditionsSet
{
void loadFromConfig(const ConfigurationPtr & stop_conditions_view);
void reset();
/// Note: only conditions with UInt64 minimal thresholds are supported,
/// i.e. a condition is fulfilled once the reported value reaches or exceeds its threshold.
struct StopCondition
{
UInt64 value = 0;
bool fulfilled = false;
};
void report(UInt64 value, StopCondition & condition);
StopCondition total_time_ms;
StopCondition rows_read;
StopCondition bytes_read_uncompressed;
StopCondition iterations;
StopCondition min_time_not_changing_for_ms;
StopCondition max_speed_not_changing_for_ms;
StopCondition average_speed_not_changing_for_ms;
size_t initialized_count = 0;
size_t fulfilled_count = 0;
};
}

View File

@ -0,0 +1,165 @@
#include "TestStats.h"
namespace DB
{
namespace
{
const std::string FOUR_SPACES = "    ";
}
std::string TestStats::getStatisticByName(const std::string & statistic_name)
{
if (statistic_name == "min_time")
return std::to_string(min_time) + "ms";
if (statistic_name == "quantiles")
{
std::string result = "\n";
for (double percent = 10; percent <= 90; percent += 10)
{
result += FOUR_SPACES + std::to_string((percent / 100));
result += ": " + std::to_string(sampler.quantileInterpolated(percent / 100.0));
result += "\n";
}
result += FOUR_SPACES + "0.95: " + std::to_string(sampler.quantileInterpolated(95 / 100.0)) + "\n";
result += FOUR_SPACES + "0.99: " + std::to_string(sampler.quantileInterpolated(99 / 100.0)) + "\n";
result += FOUR_SPACES + "0.999: " + std::to_string(sampler.quantileInterpolated(99.9 / 100.)) + "\n";
result += FOUR_SPACES + "0.9999: " + std::to_string(sampler.quantileInterpolated(99.99 / 100.));
return result;
}
if (statistic_name == "total_time")
return std::to_string(total_time) + "s";
if (statistic_name == "queries_per_second")
return std::to_string(queries / total_time);
if (statistic_name == "rows_per_second")
return std::to_string(total_rows_read / total_time);
if (statistic_name == "bytes_per_second")
return std::to_string(total_bytes_read / total_time);
if (statistic_name == "max_rows_per_second")
return std::to_string(max_rows_speed);
if (statistic_name == "max_bytes_per_second")
return std::to_string(max_bytes_speed);
if (statistic_name == "avg_rows_per_second")
return std::to_string(avg_rows_speed_value);
if (statistic_name == "avg_bytes_per_second")
return std::to_string(avg_bytes_speed_value);
return "";
}
void TestStats::update_min_time(UInt64 min_time_candidate)
{
if (min_time_candidate < min_time)
{
min_time = min_time_candidate;
min_time_watch.restart();
}
}
void TestStats::update_max_speed(
size_t max_speed_candidate,
Stopwatch & max_speed_watch,
UInt64 & max_speed)
{
if (max_speed_candidate > max_speed)
{
max_speed = max_speed_candidate;
max_speed_watch.restart();
}
}
void TestStats::update_average_speed(
double new_speed_info,
Stopwatch & avg_speed_watch,
size_t & number_of_info_batches,
double precision,
double & avg_speed_first,
double & avg_speed_value)
{
avg_speed_value = ((avg_speed_value * number_of_info_batches) + new_speed_info);
++number_of_info_batches;
avg_speed_value /= number_of_info_batches;
if (avg_speed_first == 0)
{
avg_speed_first = avg_speed_value;
}
if (std::abs(avg_speed_value - avg_speed_first) >= precision)
{
avg_speed_first = avg_speed_value;
avg_speed_watch.restart();
}
}
void TestStats::add(size_t rows_read_inc, size_t bytes_read_inc)
{
total_rows_read += rows_read_inc;
total_bytes_read += bytes_read_inc;
last_query_rows_read += rows_read_inc;
last_query_bytes_read += bytes_read_inc;
double new_rows_speed = last_query_rows_read / watch_per_query.elapsedSeconds();
double new_bytes_speed = last_query_bytes_read / watch_per_query.elapsedSeconds();
/// Update rows speed
update_max_speed(new_rows_speed, max_rows_speed_watch, max_rows_speed);
update_average_speed(new_rows_speed,
avg_rows_speed_watch,
number_of_rows_speed_info_batches,
avg_rows_speed_precision,
avg_rows_speed_first,
avg_rows_speed_value);
/// Update bytes speed
update_max_speed(new_bytes_speed, max_bytes_speed_watch, max_bytes_speed);
update_average_speed(new_bytes_speed,
avg_bytes_speed_watch,
number_of_bytes_speed_info_batches,
avg_bytes_speed_precision,
avg_bytes_speed_first,
avg_bytes_speed_value);
}
void TestStats::updateQueryInfo()
{
++queries;
sampler.insert(watch_per_query.elapsedSeconds());
update_min_time(watch_per_query.elapsed() / (1000 * 1000)); /// ns to ms
}
TestStats::TestStats()
{
watch.reset();
watch_per_query.reset();
min_time_watch.reset();
max_rows_speed_watch.reset();
max_bytes_speed_watch.reset();
avg_rows_speed_watch.reset();
avg_bytes_speed_watch.reset();
}
void TestStats::startWatches()
{
watch.start();
watch_per_query.start();
min_time_watch.start();
max_rows_speed_watch.start();
max_bytes_speed_watch.start();
avg_rows_speed_watch.start();
avg_bytes_speed_watch.start();
}
}

View File

@ -0,0 +1,87 @@
#pragma once
#include <Core/Types.h>
#include <limits>
#include <Common/Stopwatch.h>
#include <AggregateFunctions/ReservoirSampler.h>
namespace DB
{
struct TestStats
{
TestStats();
Stopwatch watch;
Stopwatch watch_per_query;
Stopwatch min_time_watch;
Stopwatch max_rows_speed_watch;
Stopwatch max_bytes_speed_watch;
Stopwatch avg_rows_speed_watch;
Stopwatch avg_bytes_speed_watch;
bool last_query_was_cancelled = false;
size_t queries = 0;
size_t total_rows_read = 0;
size_t total_bytes_read = 0;
size_t last_query_rows_read = 0;
size_t last_query_bytes_read = 0;
using Sampler = ReservoirSampler<double>;
Sampler sampler{1 << 16};
/// min_time in ms
UInt64 min_time = std::numeric_limits<UInt64>::max();
double total_time = 0;
UInt64 max_rows_speed = 0;
UInt64 max_bytes_speed = 0;
double avg_rows_speed_value = 0;
double avg_rows_speed_first = 0;
static inline double avg_rows_speed_precision = 0.001;
double avg_bytes_speed_value = 0;
double avg_bytes_speed_first = 0;
static inline double avg_bytes_speed_precision = 0.001;
size_t number_of_rows_speed_info_batches = 0;
size_t number_of_bytes_speed_info_batches = 0;
bool ready = false; /// whether the query completed without being interrupted by SIGINT
std::string exception;
/// Hack: this field isn't actually required for statistics
bool got_SIGINT = false;
std::string getStatisticByName(const std::string & statistic_name);
void update_min_time(UInt64 min_time_candidate);
void update_average_speed(
double new_speed_info,
Stopwatch & avg_speed_watch,
size_t & number_of_info_batches,
double precision,
double & avg_speed_first,
double & avg_speed_value);
void update_max_speed(
size_t max_speed_candidate,
Stopwatch & max_speed_watch,
UInt64 & max_speed);
void add(size_t rows_read_inc, size_t bytes_read_inc);
void updateQueryInfo();
void setTotalTime()
{
total_time = watch.elapsedSeconds();
}
void startWatches();
};
}
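A minimal sketch of how the statistics are fed and read back by name (illustrative only, not part of this diff; the numbers are made up):

#include "TestStats.h"
#include <iostream>

int main()
{
    DB::TestStats stats;
    stats.startWatches();

    /// Pretend two progress packets arrived for the current query.
    stats.add(/* rows_read_inc = */ 1000, /* bytes_read_inc = */ 8000);
    stats.add(/* rows_read_inc = */ 2000, /* bytes_read_inc = */ 16000);

    stats.updateQueryInfo(); /// one query finished
    stats.setTotalTime();

    std::cout << stats.getStatisticByName("queries_per_second") << '\n';
    std::cout << stats.getStatisticByName("rows_per_second") << '\n';
    return 0;
}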

View File

@ -0,0 +1,38 @@
#include "TestStopConditions.h"
namespace DB
{
void TestStopConditions::loadFromConfig(ConfigurationPtr & stop_conditions_config)
{
if (stop_conditions_config->has("all_of"))
{
ConfigurationPtr config_all_of(stop_conditions_config->createView("all_of"));
conditions_all_of.loadFromConfig(config_all_of);
}
if (stop_conditions_config->has("any_of"))
{
ConfigurationPtr config_any_of(stop_conditions_config->createView("any_of"));
conditions_any_of.loadFromConfig(config_any_of);
}
}
bool TestStopConditions::areFulfilled() const
{
return (conditions_all_of.initialized_count && conditions_all_of.fulfilled_count >= conditions_all_of.initialized_count)
|| (conditions_any_of.initialized_count && conditions_any_of.fulfilled_count);
}
UInt64 TestStopConditions::getMaxExecTime() const
{
UInt64 all_of_time = conditions_all_of.total_time_ms.value;
if (all_of_time == 0 && conditions_all_of.initialized_count != 0) /// max time is not set in all conditions
return 0;
else if (all_of_time != 0 && conditions_all_of.initialized_count > 1) /// max time is set, but we have other conditions
return 0;
UInt64 any_of_time = conditions_any_of.total_time_ms.value;
return std::max(all_of_time, any_of_time);
}
}

View File

@ -0,0 +1,57 @@
#pragma once
#include "StopConditionsSet.h"
#include <Poco/Util/XMLConfiguration.h>
namespace DB
{
/// Stop conditions for a test run. The running test will be terminated when either of two conditions is met:
/// 1. All conditions marked 'all_of' are fulfilled
/// or
/// 2. Any condition marked 'any_of' is fulfilled
using ConfigurationPtr = Poco::AutoPtr<Poco::Util::AbstractConfiguration>;
class TestStopConditions
{
public:
void loadFromConfig(ConfigurationPtr & stop_conditions_config);
inline bool empty() const
{
return !conditions_all_of.initialized_count && !conditions_any_of.initialized_count;
}
#define DEFINE_REPORT_FUNC(FUNC_NAME, CONDITION) \
void FUNC_NAME(UInt64 value) \
{ \
conditions_all_of.report(value, conditions_all_of.CONDITION); \
conditions_any_of.report(value, conditions_any_of.CONDITION); \
}
DEFINE_REPORT_FUNC(reportTotalTime, total_time_ms)
DEFINE_REPORT_FUNC(reportRowsRead, rows_read)
DEFINE_REPORT_FUNC(reportBytesReadUncompressed, bytes_read_uncompressed)
DEFINE_REPORT_FUNC(reportIterations, iterations)
DEFINE_REPORT_FUNC(reportMinTimeNotChangingFor, min_time_not_changing_for_ms)
DEFINE_REPORT_FUNC(reportMaxSpeedNotChangingFor, max_speed_not_changing_for_ms)
DEFINE_REPORT_FUNC(reportAverageSpeedNotChangingFor, average_speed_not_changing_for_ms)
#undef DEFINE_REPORT_FUNC
bool areFulfilled() const;
void reset()
{
conditions_all_of.reset();
conditions_any_of.reset();
}
/// Return max exec time for these conditions
/// Return zero if max time cannot be determined
UInt64 getMaxExecTime() const;
private:
StopConditionsSet conditions_all_of;
StopConditionsSet conditions_any_of;
};
}
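A minimal sketch of the all_of / any_of semantics (illustrative only, not part of this diff): it loads a hand-written stop_conditions block from a string and reports values against it. The thresholds and XML content below are made up.

#include "TestStopConditions.h"
#include <Poco/Util/XMLConfiguration.h>
#include <iostream>
#include <sstream>

int main()
{
    std::istringstream xml(
        "<stop_conditions>"
        "<all_of><iterations>100</iterations><total_time_ms>5000</total_time_ms></all_of>"
        "<any_of><total_time_ms>60000</total_time_ms></any_of>"
        "</stop_conditions>");

    DB::ConfigurationPtr config(new Poco::Util::XMLConfiguration(xml));
    DB::TestStopConditions conditions;
    conditions.loadFromConfig(config);

    /// Report progress: both all_of thresholds are reached, the any_of one is not.
    conditions.reportIterations(100);
    conditions.reportTotalTime(5000);

    std::cout << conditions.areFulfilled() << std::endl; /// prints 1
    return 0;
}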

View File

@ -0,0 +1,82 @@
#include "applySubstitutions.h"
#include <algorithm>
#include <vector>
namespace DB
{
void constructSubstitutions(ConfigurationPtr & substitutions_view, StringToVector & out_substitutions)
{
Strings xml_substitutions;
substitutions_view->keys(xml_substitutions);
for (size_t i = 0; i != xml_substitutions.size(); ++i)
{
const ConfigurationPtr xml_substitution(substitutions_view->createView("substitution[" + std::to_string(i) + "]"));
/// Property values for substitution will be stored in a vector
/// accessible by property name
Strings xml_values;
xml_substitution->keys("values", xml_values);
std::string name = xml_substitution->getString("name");
for (size_t j = 0; j != xml_values.size(); ++j)
{
out_substitutions[name].push_back(xml_substitution->getString("values.value[" + std::to_string(j) + "]"));
}
}
}
/// Recursive function which goes through all substitution blocks in the xml
/// and replaces each {name} placeholder with its values
void runThroughAllOptionsAndPush(StringToVector::iterator substitutions_left,
StringToVector::iterator substitutions_right,
const std::string & template_query,
Strings & out_queries)
{
if (substitutions_left == substitutions_right)
{
out_queries.push_back(template_query); /// completely substituted query
return;
}
std::string substitution_mask = "{" + substitutions_left->first + "}";
if (template_query.find(substitution_mask) == std::string::npos) /// nothing to substitute here
{
runThroughAllOptionsAndPush(std::next(substitutions_left), substitutions_right, template_query, out_queries);
return;
}
for (const std::string & value : substitutions_left->second)
{
/// Copy query string for each unique permutation
std::string query = template_query;
size_t substr_pos = 0;
while (substr_pos != std::string::npos)
{
substr_pos = query.find(substitution_mask);
if (substr_pos != std::string::npos)
query.replace(substr_pos, substitution_mask.length(), value);
}
runThroughAllOptionsAndPush(std::next(substitutions_left), substitutions_right, query, out_queries);
}
}
Strings formatQueries(const std::string & query, StringToVector substitutions_to_generate)
{
Strings queries_res;
runThroughAllOptionsAndPush(
substitutions_to_generate.begin(),
substitutions_to_generate.end(),
query,
queries_res);
return queries_res;
}
}
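A minimal sketch of the substitution expansion (illustrative only, not part of this diff; the table names and values are made up):

#include "applySubstitutions.h"
#include <iostream>

int main()
{
    DB::StringToVector substitutions;
    substitutions["table"] = {"hits_10m", "hits_100m"};
    substitutions["limit"] = {"10", "100"};

    /// Every {name} placeholder is replaced by each of its values,
    /// so a single template yields the full cross product of queries.
    DB::Strings queries = DB::formatQueries(
        "SELECT count() FROM {table} LIMIT {limit}", substitutions);

    for (const auto & query : queries)
        std::cout << query << '\n'; /// prints 4 expanded queries
    return 0;
}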

View File

@ -0,0 +1,19 @@
#pragma once
#include <Poco/Util/XMLConfiguration.h>
#include <Core/Types.h>
#include <vector>
#include <string>
#include <map>
namespace DB
{
using StringToVector = std::map<std::string, Strings>;
using ConfigurationPtr = Poco::AutoPtr<Poco::Util::AbstractConfiguration>;
void constructSubstitutions(ConfigurationPtr & substitutions_view, StringToVector & out_substitutions);
Strings formatQueries(const std::string & query, StringToVector substitutions_to_generate);
}

View File

@ -0,0 +1,73 @@
#include "executeQuery.h"
#include <IO/Progress.h>
#include <DataStreams/RemoteBlockInputStream.h>
#include <Core/Block.h>
namespace DB
{
namespace
{
void checkFulfilledConditionsAndUpdate(
const Progress & progress, RemoteBlockInputStream & stream,
TestStats & statistics, TestStopConditions & stop_conditions,
InterruptListener & interrupt_listener)
{
statistics.add(progress.rows, progress.bytes);
stop_conditions.reportRowsRead(statistics.total_rows_read);
stop_conditions.reportBytesReadUncompressed(statistics.total_bytes_read);
stop_conditions.reportTotalTime(statistics.watch.elapsed() / (1000 * 1000));
stop_conditions.reportMinTimeNotChangingFor(statistics.min_time_watch.elapsed() / (1000 * 1000));
stop_conditions.reportMaxSpeedNotChangingFor(statistics.max_rows_speed_watch.elapsed() / (1000 * 1000));
stop_conditions.reportAverageSpeedNotChangingFor(statistics.avg_rows_speed_watch.elapsed() / (1000 * 1000));
if (stop_conditions.areFulfilled())
{
statistics.last_query_was_cancelled = true;
stream.cancel(false);
}
if (interrupt_listener.check())
{
statistics.got_SIGINT = true;
statistics.last_query_was_cancelled = true;
stream.cancel(false);
}
}
}
void executeQuery(
Connection & connection,
const std::string & query,
TestStats & statistics,
TestStopConditions & stop_conditions,
InterruptListener & interrupt_listener,
Context & context)
{
statistics.watch_per_query.restart();
statistics.last_query_was_cancelled = false;
statistics.last_query_rows_read = 0;
statistics.last_query_bytes_read = 0;
Settings settings;
RemoteBlockInputStream stream(connection, query, {}, context, &settings);
stream.setProgressCallback(
[&](const Progress & value)
{
checkFulfilledConditionsAndUpdate(
value, stream, statistics,
stop_conditions, interrupt_listener);
});
stream.readPrefix();
while (Block block = stream.read());
stream.readSuffix();
if (!statistics.last_query_was_cancelled)
statistics.updateQueryInfo();
statistics.setTotalTime();
}
}

View File

@ -0,0 +1,18 @@
#pragma once
#include <string>
#include "TestStats.h"
#include "TestStopConditions.h"
#include <Common/InterruptListener.h>
#include <Interpreters/Context.h>
#include <Client/Connection.h>
namespace DB
{
void executeQuery(
Connection & connection,
const std::string & query,
TestStats & statistics,
TestStopConditions & stop_conditions,
InterruptListener & interrupt_listener,
Context & context);
}

View File

@ -1,16 +1,8 @@
<yandex> <yandex>
<zookeeper> <!-- <zookeeper>
<node> <node>
<host>localhost</host> <host>localhost</host>
<port>2181</port> <port>2181</port>
</node> </node>
<node> </zookeeper>-->
<host>yandex.ru</host>
<port>2181</port>
</node>
<node>
<host>111.0.1.2</host>
<port>2181</port>
</node>
</zookeeper>
</yandex> </yandex>

View File

@ -1,9 +1,9 @@
#include "SharedLibrary.h" #include "SharedLibrary.h"
#include <string> #include <string>
#include <dlfcn.h>
#include <boost/core/noncopyable.hpp> #include <boost/core/noncopyable.hpp>
#include "Exception.h" #include "Exception.h"
namespace DB namespace DB
{ {
namespace ErrorCodes namespace ErrorCodes
@ -12,9 +12,9 @@ namespace ErrorCodes
extern const int CANNOT_DLSYM; extern const int CANNOT_DLSYM;
} }
SharedLibrary::SharedLibrary(const std::string & path) SharedLibrary::SharedLibrary(const std::string & path, int flags)
{ {
handle = dlopen(path.c_str(), RTLD_LAZY); handle = dlopen(path.c_str(), flags);
if (!handle) if (!handle)
throw Exception(std::string("Cannot dlopen: ") + dlerror(), ErrorCodes::CANNOT_DLOPEN); throw Exception(std::string("Cannot dlopen: ") + dlerror(), ErrorCodes::CANNOT_DLOPEN);
} }

View File

@ -1,5 +1,6 @@
#pragma once #pragma once
#include <dlfcn.h>
#include <memory> #include <memory>
#include <string> #include <string>
#include <boost/noncopyable.hpp> #include <boost/noncopyable.hpp>
@ -8,12 +9,12 @@
namespace DB namespace DB
{ {
/** Allows you to open a dynamic library and get a pointer to a function from it. /** Allows you to open a dynamic library and get a pointer to a function from it.
*/ */
class SharedLibrary : private boost::noncopyable class SharedLibrary : private boost::noncopyable
{ {
public: public:
explicit SharedLibrary(const std::string & path); explicit SharedLibrary(const std::string & path, int flags = RTLD_LAZY);
~SharedLibrary(); ~SharedLibrary();

View File

@ -262,13 +262,7 @@ struct ODBCBridgeMixin
std::vector<std::string> cmd_args; std::vector<std::string> cmd_args;
path.setFileName( path.setFileName("clickhouse-odbc-bridge");
#if CLICKHOUSE_SPLIT_BINARY
"clickhouse-odbc-bridge"
#else
"clickhouse"
#endif
);
std::stringstream command; std::stringstream command;

View File

@ -20,7 +20,7 @@ void CachedCompressedReadBuffer::initInput()
if (!file_in) if (!file_in)
{ {
file_in = createReadBufferFromFileBase(path, estimated_size, aio_threshold, buf_size); file_in = createReadBufferFromFileBase(path, estimated_size, aio_threshold, buf_size);
compressed_in = &*file_in; compressed_in = file_in.get();
if (profile_callback) if (profile_callback)
file_in->setProfileCallback(profile_callback, clock_type); file_in->setProfileCallback(profile_callback, clock_type);
@ -30,11 +30,12 @@ void CachedCompressedReadBuffer::initInput()
bool CachedCompressedReadBuffer::nextImpl() bool CachedCompressedReadBuffer::nextImpl()
{ {
/// Let's check for the presence of a decompressed block in the cache, grab the ownership of this block, if it exists. /// Let's check for the presence of a decompressed block in the cache, grab the ownership of this block, if it exists.
UInt128 key = cache->hash(path, file_pos); UInt128 key = cache->hash(path, file_pos);
owned_cell = cache->get(key); owned_cell = cache->get(key);
if (!owned_cell) if (!owned_cell || !codec)
{ {
/// If not, read it from the file. /// If not, read it from the file.
initInput(); initInput();
@ -42,7 +43,6 @@ bool CachedCompressedReadBuffer::nextImpl()
owned_cell = std::make_shared<UncompressedCacheCell>(); owned_cell = std::make_shared<UncompressedCacheCell>();
size_t size_decompressed; size_t size_decompressed;
size_t size_compressed_without_checksum; size_t size_compressed_without_checksum;
owned_cell->compressed_size = readCompressedData(size_decompressed, size_compressed_without_checksum); owned_cell->compressed_size = readCompressedData(size_decompressed, size_compressed_without_checksum);
@ -50,7 +50,7 @@ bool CachedCompressedReadBuffer::nextImpl()
if (owned_cell->compressed_size) if (owned_cell->compressed_size)
{ {
owned_cell->data.resize(size_decompressed + codec->getAdditionalSizeAtTheEndOfBuffer()); owned_cell->data.resize(size_decompressed + codec->getAdditionalSizeAtTheEndOfBuffer());
decompress(owned_cell->data.data(), size_decompressed, owned_cell->compressed_size); decompress(owned_cell->data.data(), size_decompressed, size_compressed_without_checksum);
/// Put data into cache. /// Put data into cache.
cache->set(key, owned_cell); cache->set(key, owned_cell);

View File

@ -23,7 +23,7 @@ bool CompressedReadBufferFromFile::nextImpl()
if (!size_compressed) if (!size_compressed)
return false; return false;
memory.resize(size_decompressed + LZ4::ADDITIONAL_BYTES_AT_END_OF_BUFFER); memory.resize(size_decompressed + codec->getAdditionalSizeAtTheEndOfBuffer());
working_buffer = Buffer(memory.data(), &memory[size_decompressed]); working_buffer = Buffer(memory.data(), &memory[size_decompressed]);
decompress(working_buffer.begin(), size_decompressed, size_compressed_without_checksum); decompress(working_buffer.begin(), size_decompressed, size_compressed_without_checksum);
@ -91,7 +91,7 @@ size_t CompressedReadBufferFromFile::readBig(char * to, size_t n)
return bytes_read; return bytes_read;
/// If the decompressed block fits entirely where it needs to be copied. /// If the decompressed block fits entirely where it needs to be copied.
if (size_decompressed + LZ4::ADDITIONAL_BYTES_AT_END_OF_BUFFER <= n - bytes_read) if (size_decompressed + codec->getAdditionalSizeAtTheEndOfBuffer() <= n - bytes_read)
{ {
decompress(to + bytes_read, size_decompressed, size_compressed_without_checksum); decompress(to + bytes_read, size_decompressed, size_compressed_without_checksum);
bytes_read += size_decompressed; bytes_read += size_decompressed;
@ -101,7 +101,7 @@ size_t CompressedReadBufferFromFile::readBig(char * to, size_t n)
{ {
size_compressed = new_size_compressed; size_compressed = new_size_compressed;
bytes += offset(); bytes += offset();
memory.resize(size_decompressed + LZ4::ADDITIONAL_BYTES_AT_END_OF_BUFFER); memory.resize(size_decompressed + codec->getAdditionalSizeAtTheEndOfBuffer());
working_buffer = Buffer(memory.data(), &memory[size_decompressed]); working_buffer = Buffer(memory.data(), &memory[size_decompressed]);
pos = working_buffer.begin(); pos = working_buffer.begin();

View File

@ -48,10 +48,24 @@ ColumnPtr recursiveRemoveLowCardinality(const ColumnPtr & column)
return column; return column;
if (const auto * column_array = typeid_cast<const ColumnArray *>(column.get())) if (const auto * column_array = typeid_cast<const ColumnArray *>(column.get()))
return ColumnArray::create(recursiveRemoveLowCardinality(column_array->getDataPtr()), column_array->getOffsetsPtr()); {
auto & data = column_array->getDataPtr();
auto data_no_lc = recursiveRemoveLowCardinality(data);
if (data.get() == data_no_lc.get())
return column;
return ColumnArray::create(data_no_lc, column_array->getOffsetsPtr());
}
if (const auto * column_const = typeid_cast<const ColumnConst *>(column.get())) if (const auto * column_const = typeid_cast<const ColumnConst *>(column.get()))
return ColumnConst::create(recursiveRemoveLowCardinality(column_const->getDataColumnPtr()), column_const->size()); {
auto & nested = column_const->getDataColumnPtr();
auto nested_no_lc = recursiveRemoveLowCardinality(nested);
if (nested.get() == nested_no_lc.get())
return column;
return ColumnConst::create(nested_no_lc, column_const->size());
}
if (const auto * column_tuple = typeid_cast<const ColumnTuple *>(column.get())) if (const auto * column_tuple = typeid_cast<const ColumnTuple *>(column.get()))
{ {
@ -76,8 +90,14 @@ ColumnPtr recursiveLowCardinalityConversion(const ColumnPtr & column, const Data
return column; return column;
if (const auto * column_const = typeid_cast<const ColumnConst *>(column.get())) if (const auto * column_const = typeid_cast<const ColumnConst *>(column.get()))
return ColumnConst::create(recursiveLowCardinalityConversion(column_const->getDataColumnPtr(), from_type, to_type), {
column_const->size()); auto & nested = column_const->getDataColumnPtr();
auto nested_no_lc = recursiveLowCardinalityConversion(nested, from_type, to_type);
if (nested.get() == nested_no_lc.get())
return column;
return ColumnConst::create(nested_no_lc, column_const->size());
}
if (const auto * low_cardinality_type = typeid_cast<const DataTypeLowCardinality *>(from_type.get())) if (const auto * low_cardinality_type = typeid_cast<const DataTypeLowCardinality *>(from_type.get()))
{ {
@ -125,11 +145,23 @@ ColumnPtr recursiveLowCardinalityConversion(const ColumnPtr & column, const Data
Columns columns = column_tuple->getColumns(); Columns columns = column_tuple->getColumns();
auto & from_elements = from_tuple_type->getElements(); auto & from_elements = from_tuple_type->getElements();
auto & to_elements = to_tuple_type->getElements(); auto & to_elements = to_tuple_type->getElements();
bool has_converted = false;
for (size_t i = 0; i < columns.size(); ++i) for (size_t i = 0; i < columns.size(); ++i)
{ {
auto & element = columns[i]; auto & element = columns[i];
element = recursiveLowCardinalityConversion(element, from_elements.at(i), to_elements.at(i)); auto element_no_lc = recursiveLowCardinalityConversion(element, from_elements.at(i), to_elements.at(i));
if (element.get() != element_no_lc.get())
{
element = element_no_lc;
has_converted = true;
}
} }
if (!has_converted)
return column;
return ColumnTuple::create(columns); return ColumnTuple::create(columns);
} }
} }

View File

@ -70,7 +70,7 @@ ClickHouseDictionarySource::ClickHouseDictionarySource(
, query_builder{dict_struct, db, table, where, IdentifierQuotingStyle::Backticks} , query_builder{dict_struct, db, table, where, IdentifierQuotingStyle::Backticks}
, sample_block{sample_block} , sample_block{sample_block}
, context(context) , context(context)
, is_local{isLocalAddress({host, port}, config.getInt("tcp_port", 0))} , is_local{isLocalAddress({host, port}, context.getTCPPort())}
, pool{is_local ? nullptr : createPool(host, port, secure, db, user, password, context)} , pool{is_local ? nullptr : createPool(host, port, secure, db, user, password, context)}
, load_all_query{query_builder.composeLoadAllQuery()} , load_all_query{query_builder.composeLoadAllQuery()}
{ {

View File

@ -135,7 +135,11 @@ LibraryDictionarySource::LibraryDictionarySource(
"LibraryDictionarySource: Can't load lib " + toString() + ": " + Poco::File(path).path() + " - File doesn't exist", "LibraryDictionarySource: Can't load lib " + toString() + ": " + Poco::File(path).path() + " - File doesn't exist",
ErrorCodes::FILE_DOESNT_EXIST); ErrorCodes::FILE_DOESNT_EXIST);
description.init(sample_block); description.init(sample_block);
library = std::make_shared<SharedLibrary>(path); library = std::make_shared<SharedLibrary>(path, RTLD_LAZY
#if defined(RTLD_DEEPBIND) // Does not exist in FreeBSD
| RTLD_DEEPBIND
#endif
);
settings = std::make_shared<CStringsHolder>(getLibSettings(config, config_prefix + lib_config_settings)); settings = std::make_shared<CStringsHolder>(getLibSettings(config, config_prefix + lib_config_settings));
if (auto libNew = library->tryGet<decltype(lib_data) (*)(decltype(&settings->strings), decltype(&ClickHouseLibrary::log))>( if (auto libNew = library->tryGet<decltype(lib_data) (*)(decltype(&settings->strings), decltype(&ClickHouseLibrary::log))>(
"ClickHouseDictionary_v3_libNew")) "ClickHouseDictionary_v3_libNew"))

View File

@@ -718,11 +718,12 @@ bool Aggregator::executeOnBlock(const Block & block, AggregatedDataVariants & re
         materialized_columns.push_back(block.safeGetByPosition(params.keys[i]).column->convertToFullColumnIfConst());
         key_columns[i] = materialized_columns.back().get();

-        if (const auto * low_cardinality_column = typeid_cast<const ColumnLowCardinality *>(key_columns[i]))
+        if (!result.isLowCardinality())
         {
-            if (!result.isLowCardinality())
+            auto column_no_lc = recursiveRemoveLowCardinality(key_columns[i]->getPtr());
+            if (column_no_lc.get() != key_columns[i])
             {
-                materialized_columns.push_back(low_cardinality_column->convertToFullColumn());
+                materialized_columns.emplace_back(std::move(column_no_lc));
                 key_columns[i] = materialized_columns.back().get();
             }
         }
@@ -738,9 +739,10 @@ bool Aggregator::executeOnBlock(const Block & block, AggregatedDataVariants & re
             materialized_columns.push_back(block.safeGetByPosition(params.aggregates[i].arguments[j]).column->convertToFullColumnIfConst());
             aggregate_columns[i][j] = materialized_columns.back().get();

-            if (auto * col_low_cardinality = typeid_cast<const ColumnLowCardinality *>(aggregate_columns[i][j]))
+            auto column_no_lc = recursiveRemoveLowCardinality(aggregate_columns[i][j]->getPtr());
+            if (column_no_lc.get() != aggregate_columns[i][j])
             {
-                materialized_columns.push_back(col_low_cardinality->convertToFullColumn());
+                materialized_columns.emplace_back(std::move(column_no_lc));
                 aggregate_columns[i][j] = materialized_columns.back().get();
             }
         }

View File

@@ -16,8 +16,7 @@ namespace DB
 ExpressionActionsPtr AnalyzedJoin::createJoinedBlockActions(
     const JoinedColumnsList & columns_added_by_join,
     const ASTSelectQuery * select_query_with_join,
-    const Context & context,
-    NameSet & required_columns_from_joined_table) const
+    const Context & context) const
 {
     if (!select_query_with_join)
         return nullptr;
@@ -48,8 +47,14 @@ ExpressionActionsPtr AnalyzedJoin::createJoinedBlockActions(
     ASTPtr query = expression_list;
     auto syntax_result = SyntaxAnalyzer(context).analyze(query, source_column_names, required_columns);
-    ExpressionAnalyzer analyzer(query, syntax_result, context, {}, required_columns);
-    auto joined_block_actions = analyzer.getActions(false);
+    ExpressionAnalyzer analyzer(query, syntax_result, context, {}, required_columns_set);
+    return analyzer.getActions(false);
+}
+
+NameSet AnalyzedJoin::getRequiredColumnsFromJoinedTable(const JoinedColumnsList & columns_added_by_join,
+                                                        const ExpressionActionsPtr & joined_block_actions) const
+{
+    NameSet required_columns_from_joined_table;

     auto required_action_columns = joined_block_actions->getRequiredColumns();
     required_columns_from_joined_table.insert(required_action_columns.begin(), required_action_columns.end());
@@ -63,7 +68,7 @@ ExpressionActionsPtr AnalyzedJoin::createJoinedBlockActions(
         if (!sample.has(column.name_and_type.name))
             required_columns_from_joined_table.insert(column.name_and_type.name);

-    return joined_block_actions;
+    return required_columns_from_joined_table;
 }

 const JoinedColumnsList & AnalyzedJoin::getColumnsFromJoinedTable(

View File

@@ -64,9 +64,11 @@ struct AnalyzedJoin
     ExpressionActionsPtr createJoinedBlockActions(
         const JoinedColumnsList & columns_added_by_join, /// Subset of available_joined_columns.
        const ASTSelectQuery * select_query_with_join,
-        const Context & context,
-        NameSet & required_columns_from_joined_table /// Columns which will be used in query from joined table.
-    ) const;
+        const Context & context) const;
+
+    /// Columns which will be used in query from joined table.
+    NameSet getRequiredColumnsFromJoinedTable(const JoinedColumnsList & columns_added_by_join,
+                                              const ExpressionActionsPtr & joined_block_actions) const;

     const JoinedColumnsList & getColumnsFromJoinedTable(const NameSet & source_columns,
                                                         const Context & context,

View File

@@ -160,15 +160,13 @@ ExpressionAction ExpressionAction::arrayJoin(const NameSet & array_joined_column
 ExpressionAction ExpressionAction::ordinaryJoin(
     std::shared_ptr<const Join> join_,
     const Names & join_key_names_left,
-    const NamesAndTypesList & columns_added_by_join_,
-    const NameSet & columns_added_by_join_from_right_keys_)
+    const NamesAndTypesList & columns_added_by_join_)
 {
     ExpressionAction a;
     a.type = JOIN;
     a.join = std::move(join_);
     a.join_key_names_left = join_key_names_left;
     a.columns_added_by_join = columns_added_by_join_;
-    a.columns_added_by_join_from_right_keys = columns_added_by_join_from_right_keys_;
     return a;
 }
@@ -463,7 +461,7 @@ void ExpressionAction::execute(Block & block, bool dry_run) const
         case JOIN:
         {
-            join->joinBlock(block, join_key_names_left, columns_added_by_join_from_right_keys);
+            join->joinBlock(block, join_key_names_left, columns_added_by_join);
             break;
         }
@@ -1115,7 +1113,8 @@ BlockInputStreamPtr ExpressionActions::createStreamWithNonJoinedDataIfFullOrRigh
 {
     for (const auto & action : actions)
         if (action.join && (action.join->getKind() == ASTTableJoin::Kind::Full || action.join->getKind() == ASTTableJoin::Kind::Right))
-            return action.join->createStreamWithNonJoinedRows(source_header, action.join_key_names_left, max_block_size);
+            return action.join->createStreamWithNonJoinedRows(
+                source_header, action.join_key_names_left, action.columns_added_by_join, max_block_size);

     return {};
 }

View File

@@ -109,7 +109,6 @@ public:
     std::shared_ptr<const Join> join;
     Names join_key_names_left;
     NamesAndTypesList columns_added_by_join;
-    NameSet columns_added_by_join_from_right_keys;

     /// For PROJECT.
     NamesWithAliases projection;
@@ -126,7 +125,7 @@ public:
     static ExpressionAction addAliases(const NamesWithAliases & aliased_columns_);
     static ExpressionAction arrayJoin(const NameSet & array_joined_columns, bool array_join_is_left, const Context & context);
     static ExpressionAction ordinaryJoin(std::shared_ptr<const Join> join_, const Names & join_key_names_left,
-                                         const NamesAndTypesList & columns_added_by_join_, const NameSet & columns_added_by_join_from_right_keys_);
+                                         const NamesAndTypesList & columns_added_by_join_);

     /// Which columns necessary to perform this action.
     Names getNeededColumns() const;

View File

@@ -83,7 +83,7 @@ ExpressionAnalyzer::ExpressionAnalyzer(
     const SyntaxAnalyzerResultPtr & syntax_analyzer_result_,
     const Context & context_,
     const NamesAndTypesList & additional_source_columns,
-    const Names & required_result_columns_,
+    const NameSet & required_result_columns_,
     size_t subquery_depth_,
     bool do_global_,
     const SubqueriesForSets & subqueries_for_sets_)
@@ -504,13 +504,12 @@ void ExpressionAnalyzer::addJoinAction(ExpressionActionsPtr & actions, bool only
         columns_added_by_join_list.push_back(joined_column.name_and_type);

     if (only_types)
-        actions->add(ExpressionAction::ordinaryJoin(nullptr, analyzedJoin().key_names_left,
-            columns_added_by_join_list, columns_added_by_join_from_right_keys));
+        actions->add(ExpressionAction::ordinaryJoin(nullptr, analyzedJoin().key_names_left, columns_added_by_join_list));
     else
         for (auto & subquery_for_set : subqueries_for_sets)
             if (subquery_for_set.second.join)
                 actions->add(ExpressionAction::ordinaryJoin(subquery_for_set.second.join, analyzedJoin().key_names_left,
-                    columns_added_by_join_list, columns_added_by_join_from_right_keys));
+                    columns_added_by_join_list));
 }

 bool ExpressionAnalyzer::appendJoin(ExpressionActionsChain & chain, bool only_types)
@@ -851,8 +850,7 @@ void ExpressionAnalyzer::appendProjectResult(ExpressionActionsChain & chain) con
     for (size_t i = 0; i < asts.size(); ++i)
     {
         String result_name = asts[i]->getAliasOrColumnName();
-        if (required_result_columns.empty()
-            || std::find(required_result_columns.begin(), required_result_columns.end(), result_name) != required_result_columns.end())
+        if (required_result_columns.empty() || required_result_columns.count(result_name))
         {
             result_columns.emplace_back(asts[i]->getColumnName(), result_name);
             step.required_output.push_back(result_columns.back().second);
@@ -1003,10 +1001,6 @@ void ExpressionAnalyzer::collectUsedColumns()
     for (const auto & name : source_columns)
         avaliable_columns.insert(name.name);

-    NameSet right_keys;
-    for (const auto & right_key_name : analyzed_join.key_names_right)
-        right_keys.insert(right_key_name);
-
     /** You also need to ignore the identifiers of the columns that are obtained by JOIN.
       * (Do not assume that they are required for reading from the "left" table).
       */
@@ -1018,10 +1012,6 @@ void ExpressionAnalyzer::collectUsedColumns()
         {
             columns_added_by_join.push_back(joined_column);
             required.erase(name);
-
-            /// Some columns from right join key may be used in query. This columns will be appended to block during join.
-            if (right_keys.count(name))
-                columns_added_by_join_from_right_keys.insert(name);
         }
     }
@@ -1057,8 +1047,6 @@ void ExpressionAnalyzer::collectUsedColumns()
             if (cropped_name == name)
             {
                 columns_added_by_join.push_back(joined_column);
-                if (right_keys.count(name))
-                    columns_added_by_join_from_right_keys.insert(name);
                 collated = true;
                 break;
             }
@@ -1072,9 +1060,8 @@ void ExpressionAnalyzer::collectUsedColumns()
             required.swap(fixed_required);
         }

-        /// @note required_columns_from_joined_table is output
-        joined_block_actions = analyzed_join.createJoinedBlockActions(
-            columns_added_by_join, select_query, context, required_columns_from_joined_table);
+        joined_block_actions = analyzed_join.createJoinedBlockActions(columns_added_by_join, select_query, context);
+        required_columns_from_joined_table = analyzed_join.getRequiredColumnsFromJoinedTable(columns_added_by_join, joined_block_actions);
     }

     if (columns_context.has_array_join)

View File

@@ -43,7 +43,7 @@ struct ExpressionAnalyzerData
     NamesAndTypesList source_columns;

     /// If non-empty, ignore all expressions in not from this list.
-    Names required_result_columns;
+    NameSet required_result_columns;

     SubqueriesForSets subqueries_for_sets;
     PreparedSets prepared_sets;
@@ -73,13 +73,9 @@ struct ExpressionAnalyzerData
     /// Columns which will be used in query from joined table. Duplicate names are qualified.
     NameSet required_columns_from_joined_table;

-    /// Such columns will be copied from left join keys during join.
-    /// Example: select right from tab1 join tab2 on left + 1 = right
-    NameSet columns_added_by_join_from_right_keys;
-
 protected:
     ExpressionAnalyzerData(const NamesAndTypesList & source_columns_,
-                           const Names & required_result_columns_,
+                           const NameSet & required_result_columns_,
                            const SubqueriesForSets & subqueries_for_sets_)
         : source_columns(source_columns_),
           required_result_columns(required_result_columns_),
@@ -136,7 +132,7 @@ public:
         const SyntaxAnalyzerResultPtr & syntax_analyzer_result_,
         const Context & context_,
         const NamesAndTypesList & additional_source_columns = {},
-        const Names & required_result_columns_ = {},
+        const NameSet & required_result_columns_ = {},
         size_t subquery_depth_ = 0,
         bool do_global_ = false,
         const SubqueriesForSets & subqueries_for_set_ = {});

View File

@@ -222,9 +222,9 @@ void ExternalLoader::reloadAndUpdate(bool throw_on_error)
             }
             else
             {
-                tryLogCurrentException(log, "Cannot update " + object_name + " '" + name + "', leaving old version");
+                tryLogException(exception, log, "Cannot update " + object_name + " '" + name + "', leaving old version");
                 if (throw_on_error)
-                    throw;
+                    std::rethrow_exception(exception);
             }
         }
     }

View File

@@ -195,7 +195,8 @@ InterpreterSelectQuery::InterpreterSelectQuery(
     syntax_analyzer_result = SyntaxAnalyzer(context, subquery_depth).analyze(
         query_ptr, source_header.getNamesAndTypesList(), required_result_column_names, storage);
     query_analyzer = std::make_unique<ExpressionAnalyzer>(
-        query_ptr, syntax_analyzer_result, context, NamesAndTypesList(), required_result_column_names, subquery_depth, !only_analyze);
+        query_ptr, syntax_analyzer_result, context, NamesAndTypesList(),
+        NameSet(required_result_column_names.begin(), required_result_column_names.end()), subquery_depth, !only_analyze);

     if (!only_analyze)
     {

View File

@@ -32,6 +32,23 @@ namespace ErrorCodes
     extern const int ILLEGAL_COLUMN;
 }

+static NameSet requiredRightKeys(const Names & key_names, const NamesAndTypesList & columns_added_by_join)
+{
+    NameSet required;
+    NameSet right_keys;
+    for (const auto & name : key_names)
+        right_keys.insert(name);
+
+    for (const auto & column : columns_added_by_join)
+    {
+        if (right_keys.count(column.name))
+            required.insert(column.name);
+    }
+
+    return required;
+}
+
 Join::Join(const Names & key_names_right_, bool use_nulls_, const SizeLimits & limits,
     ASTTableJoin::Kind kind_, ASTTableJoin::Strictness strictness_, bool any_take_last_row_)
@@ -273,12 +290,18 @@ void Join::setSampleBlock(const Block & block)
     size_t keys_size = key_names_right.size();
     ColumnRawPtrs key_columns(keys_size);
-    Columns materialized_columns(keys_size);
+    Columns materialized_columns;

     for (size_t i = 0; i < keys_size; ++i)
     {
-        materialized_columns[i] = recursiveRemoveLowCardinality(block.getByName(key_names_right[i]).column);
-        key_columns[i] = materialized_columns[i].get();
+        auto & column = block.getByName(key_names_right[i]).column;
+        key_columns[i] = column.get();
+
+        auto column_no_lc = recursiveRemoveLowCardinality(column);
+        if (column.get() != column_no_lc.get())
+        {
+            materialized_columns.emplace_back(std::move(column_no_lc));
+            key_columns[i] = materialized_columns[i].get();
+        }

         /// We will join only keys, where all components are not NULL.
         if (key_columns[i]->isColumnNullable())
@@ -511,19 +534,19 @@ namespace
     struct Adder<true, ASTTableJoin::Strictness::Any, Map>
     {
         static void addFound(const typename Map::mapped_type & mapped, size_t num_columns_to_add, MutableColumns & added_columns,
-            size_t i, IColumn::Filter * filter, IColumn::Offset & /*current_offset*/, IColumn::Offsets * /*offsets*/,
+            size_t i, IColumn::Filter & filter, IColumn::Offset & /*current_offset*/, IColumn::Offsets * /*offsets*/,
             const std::vector<size_t> & right_indexes)
         {
-            (*filter)[i] = 1;
+            filter[i] = 1;

             for (size_t j = 0; j < num_columns_to_add; ++j)
                 added_columns[j]->insertFrom(*mapped.block->getByPosition(right_indexes[j]).column, mapped.row_num);
         }

         static void addNotFound(size_t num_columns_to_add, MutableColumns & added_columns,
-            size_t i, IColumn::Filter * filter, IColumn::Offset & /*current_offset*/, IColumn::Offsets * /*offsets*/)
+            size_t i, IColumn::Filter & filter, IColumn::Offset & /*current_offset*/, IColumn::Offsets * /*offsets*/)
         {
-            (*filter)[i] = 0;
+            filter[i] = 0;

             for (size_t j = 0; j < num_columns_to_add; ++j)
                 added_columns[j]->insertDefault();
@@ -534,19 +557,19 @@ namespace
     struct Adder<false, ASTTableJoin::Strictness::Any, Map>
    {
         static void addFound(const typename Map::mapped_type & mapped, size_t num_columns_to_add, MutableColumns & added_columns,
-            size_t i, IColumn::Filter * filter, IColumn::Offset & /*current_offset*/, IColumn::Offsets * /*offsets*/,
+            size_t i, IColumn::Filter & filter, IColumn::Offset & /*current_offset*/, IColumn::Offsets * /*offsets*/,
             const std::vector<size_t> & right_indexes)
         {
-            (*filter)[i] = 1;
+            filter[i] = 1;

             for (size_t j = 0; j < num_columns_to_add; ++j)
                 added_columns[j]->insertFrom(*mapped.block->getByPosition(right_indexes[j]).column, mapped.row_num);
         }

         static void addNotFound(size_t /*num_columns_to_add*/, MutableColumns & /*added_columns*/,
-            size_t i, IColumn::Filter * filter, IColumn::Offset & /*current_offset*/, IColumn::Offsets * /*offsets*/)
+            size_t i, IColumn::Filter & filter, IColumn::Offset & /*current_offset*/, IColumn::Offsets * /*offsets*/)
         {
-            (*filter)[i] = 0;
+            filter[i] = 0;
         }
     };
@@ -554,10 +577,10 @@ namespace
     struct Adder<fill_left, ASTTableJoin::Strictness::All, Map>
     {
         static void addFound(const typename Map::mapped_type & mapped, size_t num_columns_to_add, MutableColumns & added_columns,
-            size_t i, IColumn::Filter * filter, IColumn::Offset & current_offset, IColumn::Offsets * offsets,
+            size_t i, IColumn::Filter & filter, IColumn::Offset & current_offset, IColumn::Offsets * offsets,
             const std::vector<size_t> & right_indexes)
         {
-            (*filter)[i] = 1;
+            filter[i] = 1;

             size_t rows_joined = 0;
             for (auto current = &static_cast<const typename Map::mapped_type::Base_t &>(mapped); current != nullptr; current = current->next)
@@ -573,9 +596,9 @@ namespace
         }

         static void addNotFound(size_t num_columns_to_add, MutableColumns & added_columns,
-            size_t i, IColumn::Filter * filter, IColumn::Offset & current_offset, IColumn::Offsets * offsets)
+            size_t i, IColumn::Filter & filter, IColumn::Offset & current_offset, IColumn::Offsets * offsets)
         {
-            (*filter)[i] = 0;
+            filter[i] = 0;

             if (!fill_left)
             {
@@ -595,10 +618,11 @@ namespace
     template <ASTTableJoin::Kind KIND, ASTTableJoin::Strictness STRICTNESS, typename KeyGetter, typename Map, bool has_null_map>
     void NO_INLINE joinBlockImplTypeCase(
         const Map & map, size_t rows, const ColumnRawPtrs & key_columns, const Sizes & key_sizes,
-        MutableColumns & added_columns, ConstNullMapPtr null_map, std::unique_ptr<IColumn::Filter> & filter,
-        IColumn::Offset & current_offset, std::unique_ptr<IColumn::Offsets> & offsets_to_replicate,
+        MutableColumns & added_columns, ConstNullMapPtr null_map, IColumn::Filter & filter,
+        std::unique_ptr<IColumn::Offsets> & offsets_to_replicate,
         const std::vector<size_t> & right_indexes)
     {
+        IColumn::Offset current_offset = 0;
         size_t num_columns_to_add = right_indexes.size();

         Arena pool;
@@ -609,7 +633,7 @@ namespace
             if (has_null_map && (*null_map)[i])
             {
                 Adder<Join::KindTrait<KIND>::fill_left, STRICTNESS, Map>::addNotFound(
-                    num_columns_to_add, added_columns, i, filter.get(), current_offset, offsets_to_replicate.get());
+                    num_columns_to_add, added_columns, i, filter, current_offset, offsets_to_replicate.get());
             }
             else
             {
@@ -620,30 +644,40 @@ namespace
                     auto & mapped = find_result.getMapped();
                     mapped.setUsed();
                     Adder<Join::KindTrait<KIND>::fill_left, STRICTNESS, Map>::addFound(
-                        mapped, num_columns_to_add, added_columns, i, filter.get(), current_offset, offsets_to_replicate.get(), right_indexes);
+                        mapped, num_columns_to_add, added_columns, i, filter, current_offset, offsets_to_replicate.get(), right_indexes);
                 }
                 else
                     Adder<Join::KindTrait<KIND>::fill_left, STRICTNESS, Map>::addNotFound(
-                        num_columns_to_add, added_columns, i, filter.get(), current_offset, offsets_to_replicate.get());
+                        num_columns_to_add, added_columns, i, filter, current_offset, offsets_to_replicate.get());
             }
         }
     }
+    using BlockFilterData = std::pair<
+        std::unique_ptr<IColumn::Filter>,
+        std::unique_ptr<IColumn::Offsets>>;
+
     template <ASTTableJoin::Kind KIND, ASTTableJoin::Strictness STRICTNESS, typename KeyGetter, typename Map>
-    void joinBlockImplType(
+    BlockFilterData joinBlockImplType(
         const Map & map, size_t rows, const ColumnRawPtrs & key_columns, const Sizes & key_sizes,
-        MutableColumns & added_columns, ConstNullMapPtr null_map, std::unique_ptr<IColumn::Filter> & filter,
-        IColumn::Offset & current_offset, std::unique_ptr<IColumn::Offsets> & offsets_to_replicate,
-        const std::vector<size_t> & right_indexes)
+        MutableColumns & added_columns, ConstNullMapPtr null_map, const std::vector<size_t> & right_indexes)
     {
+        std::unique_ptr<IColumn::Filter> filter = std::make_unique<IColumn::Filter>(rows);
+        std::unique_ptr<IColumn::Offsets> offsets_to_replicate;
+        if (STRICTNESS == ASTTableJoin::Strictness::All)
+            offsets_to_replicate = std::make_unique<IColumn::Offsets>(rows);
+
         if (null_map)
             joinBlockImplTypeCase<KIND, STRICTNESS, KeyGetter, Map, true>(
-                map, rows, key_columns, key_sizes, added_columns, null_map, filter,
-                current_offset, offsets_to_replicate, right_indexes);
+                map, rows, key_columns, key_sizes, added_columns, null_map, *filter,
+                offsets_to_replicate, right_indexes);
         else
             joinBlockImplTypeCase<KIND, STRICTNESS, KeyGetter, Map, false>(
-                map, rows, key_columns, key_sizes, added_columns, null_map, filter,
-                current_offset, offsets_to_replicate, right_indexes);
+                map, rows, key_columns, key_sizes, added_columns, null_map, *filter,
+                offsets_to_replicate, right_indexes);
+
+        return {std::move(filter), std::move(offsets_to_replicate)};
     }
 }
@@ -652,7 +686,7 @@ template <ASTTableJoin::Kind KIND, ASTTableJoin::Strictness STRICTNESS, typename
 void Join::joinBlockImpl(
     Block & block,
     const Names & key_names_left,
-    const NameSet & needed_key_names_right,
+    const NamesAndTypesList & columns_added_by_join,
     const Block & block_with_columns_to_add,
     const Maps & maps_) const
 {
@@ -729,27 +763,16 @@ void Join::joinBlockImpl(
         }
     }

-    size_t rows = block.rows();
-
     std::unique_ptr<IColumn::Filter> filter;
-
-    bool filter_left_keys = (kind == ASTTableJoin::Kind::Inner || kind == ASTTableJoin::Kind::Right) && strictness == ASTTableJoin::Strictness::Any;
-    filter = std::make_unique<IColumn::Filter>(rows);
-
-    /// Used with ALL ... JOIN
-    IColumn::Offset current_offset = 0;
     std::unique_ptr<IColumn::Offsets> offsets_to_replicate;
-    if (strictness == ASTTableJoin::Strictness::All)
-        offsets_to_replicate = std::make_unique<IColumn::Offsets>(rows);

     switch (type)
     {
     #define M(TYPE) \
         case Join::Type::TYPE: \
-            joinBlockImplType<KIND, STRICTNESS, typename KeyGetterForType<Join::Type::TYPE, const std::remove_reference_t<decltype(*maps_.TYPE)>>::Type>(\
-                *maps_.TYPE, rows, key_columns, key_sizes, added_columns, null_map, \
-                filter, current_offset, offsets_to_replicate, right_indexes); \
+            std::tie(filter, offsets_to_replicate) = \
+                joinBlockImplType<KIND, STRICTNESS, typename KeyGetterForType<Join::Type::TYPE, const std::remove_reference_t<decltype(*maps_.TYPE)>>::Type>(\
+                *maps_.TYPE, block.rows(), key_columns, key_sizes, added_columns, null_map, right_indexes); \
             break;

     APPLY_FOR_JOIN_VARIANTS(M)
 #undef M
@@ -762,47 +785,96 @@ void Join::joinBlockImpl(
     for (size_t i = 0; i < added_columns_size; ++i)
         block.insert(ColumnWithTypeAndName(std::move(added_columns[i]), added_type_name[i].first, added_type_name[i].second));

-    /// If ANY INNER | RIGHT JOIN - filter all the columns except the new ones.
-    if (filter_left_keys)
-        for (size_t i = 0; i < existing_columns; ++i)
-            block.safeGetByPosition(i).column = block.safeGetByPosition(i).column->filter(*filter, -1);
-
-    ColumnUInt64::Ptr mapping;
-
-    /// Add join key columns from right block if they has different name.
-    for (size_t i = 0; i < key_names_right.size(); ++i)
-    {
-        auto & right_name = key_names_right[i];
-        auto & left_name = key_names_left[i];
-
-        if (needed_key_names_right.count(right_name) && !block.has(right_name))
-        {
-            const auto & col = block.getByName(left_name);
-            auto column = col.column;
-            if (!filter_left_keys)
-            {
-                if (!mapping)
-                {
-                    auto mut_mapping = ColumnUInt64::create(column->size());
-                    auto & data = mut_mapping->getData();
-                    size_t size = column->size();
-                    for (size_t j = 0; j < size; ++j)
-                        data[j] = (*filter)[j] ? j : size;
-
-                    mapping = std::move(mut_mapping);
-                }
-
-                auto mut_column = (*std::move(column)).mutate();
-                mut_column->insertDefault();
-                column = mut_column->index(*mapping, 0);
-            }
-            block.insert({column, col.type, right_name});
-        }
-    }
-
-    /// If ALL ... JOIN - we replicate all the columns except the new ones.
-    if (offsets_to_replicate)
-    {
+    if (!filter)
+        throw Exception("No data to filter columns", ErrorCodes::LOGICAL_ERROR);
+
+    NameSet needed_key_names_right = requiredRightKeys(key_names_right, columns_added_by_join);
+
+    if (strictness == ASTTableJoin::Strictness::Any)
+    {
+        if (kind == ASTTableJoin::Kind::Inner || kind == ASTTableJoin::Kind::Right)
+        {
+            /// If ANY INNER | RIGHT JOIN - filter all the columns except the new ones.
+            for (size_t i = 0; i < existing_columns; ++i)
+                block.safeGetByPosition(i).column = block.safeGetByPosition(i).column->filter(*filter, -1);
+
+            /// Add join key columns from right block if they has different name.
+            for (size_t i = 0; i < key_names_right.size(); ++i)
+            {
+                auto & right_name = key_names_right[i];
+                auto & left_name = key_names_left[i];
+
+                if (needed_key_names_right.count(right_name) && !block.has(right_name))
+                {
+                    const auto & col = block.getByName(left_name);
+                    block.insert({col.column, col.type, right_name});
+                }
+            }
+        }
+        else
+        {
+            /// Add join key columns from right block if they has different name.
+            for (size_t i = 0; i < key_names_right.size(); ++i)
+            {
+                auto & right_name = key_names_right[i];
+                auto & left_name = key_names_left[i];
+
+                if (needed_key_names_right.count(right_name) && !block.has(right_name))
+                {
+                    const auto & col = block.getByName(left_name);
+                    auto & column = col.column;
+                    MutableColumnPtr mut_column = column->cloneEmpty();
+
+                    for (size_t col_no = 0; col_no < filter->size(); ++col_no)
+                    {
+                        if ((*filter)[col_no])
+                            mut_column->insertFrom(*column, col_no);
+                        else
+                            mut_column->insertDefault();
+                    }
+
+                    block.insert({std::move(mut_column), col.type, right_name});
+                }
+            }
+        }
+    }
+    else
+    {
+        if (!offsets_to_replicate)
+            throw Exception("No data to filter columns", ErrorCodes::LOGICAL_ERROR);
+
+        /// Add join key columns from right block if they has different name.
+        for (size_t i = 0; i < key_names_right.size(); ++i)
+        {
+            auto & right_name = key_names_right[i];
+            auto & left_name = key_names_left[i];
+
+            if (needed_key_names_right.count(right_name) && !block.has(right_name))
+            {
+                const auto & col = block.getByName(left_name);
+                auto & column = col.column;
+                MutableColumnPtr mut_column = column->cloneEmpty();
+
+                size_t last_offset = 0;
+                for (size_t col_no = 0; col_no < column->size(); ++col_no)
+                {
+                    if (size_t to_insert = (*offsets_to_replicate)[col_no] - last_offset)
+                    {
+                        if (!(*filter)[col_no])
+                            mut_column->insertDefault();
+                        else
+                            for (size_t dup = 0; dup < to_insert; ++dup)
+                                mut_column->insertFrom(*column, col_no);
+                    }
+                    last_offset = (*offsets_to_replicate)[col_no];
+                }
+
+                block.insert({std::move(mut_column), col.type, right_name});
+            }
+        }
+
+        /// If ALL ... JOIN - we replicate all the columns except the new ones.
         for (size_t i = 0; i < existing_columns; ++i)
             block.safeGetByPosition(i).column = block.safeGetByPosition(i).column->replicate(*offsets_to_replicate);
     }
@@ -936,7 +1008,7 @@ void Join::joinGet(Block & block, const String & column_name) const
 }

-void Join::joinBlock(Block & block, const Names & key_names_left, const NameSet & needed_key_names_right) const
+void Join::joinBlock(Block & block, const Names & key_names_left, const NamesAndTypesList & columns_added_by_join) const
 {
     //    std::cerr << "joinBlock: " << block.dumpStructure() << "\n";
@@ -946,7 +1018,7 @@ void Join::joinBlock(Block & block, const Names & key_names_left, const NameSet
     if (dispatch([&](auto kind_, auto strictness_, auto & map)
         {
-            joinBlockImpl<kind_, strictness_>(block, key_names_left, needed_key_names_right, sample_block_with_columns_to_add, map);
+            joinBlockImpl<kind_, strictness_>(block, key_names_left, columns_added_by_join, sample_block_with_columns_to_add, map);
         }))
     {
         /// Joined
@@ -992,14 +1064,12 @@ struct AdderNonJoined;
 template <typename Mapped>
 struct AdderNonJoined<ASTTableJoin::Strictness::Any, Mapped>
 {
-    static void add(const Mapped & mapped, size_t & rows_added,
-        size_t num_columns_left, MutableColumns & columns_left,
-        size_t num_columns_right, MutableColumns & columns_right)
+    static void add(const Mapped & mapped, size_t & rows_added, MutableColumns & columns_left, MutableColumns & columns_right)
     {
-        for (size_t j = 0; j < num_columns_left; ++j)
+        for (size_t j = 0; j < columns_left.size(); ++j)
             columns_left[j]->insertDefault();

-        for (size_t j = 0; j < num_columns_right; ++j)
+        for (size_t j = 0; j < columns_right.size(); ++j)
             columns_right[j]->insertFrom(*mapped.block->getByPosition(j).column.get(), mapped.row_num);

         ++rows_added;
@@ -1009,16 +1079,14 @@ struct AdderNonJoined<ASTTableJoin::Strictness::Any, Mapped>
 template <typename Mapped>
 struct AdderNonJoined<ASTTableJoin::Strictness::All, Mapped>
 {
-    static void add(const Mapped & mapped, size_t & rows_added,
-        size_t num_columns_left, MutableColumns & columns_left,
-        size_t num_columns_right, MutableColumns & columns_right)
+    static void add(const Mapped & mapped, size_t & rows_added, MutableColumns & columns_left, MutableColumns & columns_right)
     {
         for (auto current = &static_cast<const typename Mapped::Base_t &>(mapped); current != nullptr; current = current->next)
         {
-            for (size_t j = 0; j < num_columns_left; ++j)
+            for (size_t j = 0; j < columns_left.size(); ++j)
                 columns_left[j]->insertDefault();

-            for (size_t j = 0; j < num_columns_right; ++j)
+            for (size_t j = 0; j < columns_right.size(); ++j)
                 columns_right[j]->insertFrom(*current->block->getByPosition(j).column.get(), current->row_num);

             ++rows_added;
@@ -1031,61 +1099,61 @@ struct AdderNonJoined<ASTTableJoin::Strictness::All, Mapped>
 class NonJoinedBlockInputStream : public IBlockInputStream
 {
 public:
-    NonJoinedBlockInputStream(const Join & parent_, const Block & left_sample_block, const Names & key_names_left, size_t max_block_size_)
+    NonJoinedBlockInputStream(const Join & parent_, const Block & left_sample_block, const Names & key_names_left,
+                              const NamesAndTypesList & columns_added_by_join, size_t max_block_size_)
         : parent(parent_), max_block_size(max_block_size_)
     {
         /** left_sample_block contains keys and "left" columns.
           * result_sample_block - keys, "left" columns, and "right" columns.
           */

+        std::unordered_map<String, String> key_renames;
+        makeResultSampleBlock(left_sample_block, key_names_left, columns_added_by_join, key_renames);
+
+        const Block & right_sample_block = parent.sample_block_with_columns_to_add;
+
         size_t num_keys = key_names_left.size();
         size_t num_columns_left = left_sample_block.columns() - num_keys;
-        size_t num_columns_right = parent.sample_block_with_columns_to_add.columns();
-
-        result_sample_block = materializeBlock(left_sample_block);
-
-        /// Add columns from the right-side table to the block.
-        for (size_t i = 0; i < num_columns_right; ++i)
-        {
-            const ColumnWithTypeAndName & src_column = parent.sample_block_with_columns_to_add.getByPosition(i);
-            result_sample_block.insert(src_column.cloneEmpty());
-        }
+        size_t num_columns_right = right_sample_block.columns();

         column_indices_left.reserve(num_columns_left);
         column_indices_keys_and_right.reserve(num_keys + num_columns_right);
-        std::vector<bool> is_key_column_in_left_block(num_keys + num_columns_left, false);
+
+        std::vector<bool> is_left_key(left_sample_block.columns(), false);

         for (const std::string & key : key_names_left)
         {
             size_t key_pos = left_sample_block.getPositionByName(key);
-            is_key_column_in_left_block[key_pos] = true;
+            is_left_key[key_pos] = true;
             /// Here we establish the mapping between key columns of the left- and right-side tables.
             /// key_pos index is inserted in the position corresponding to key column in parent.blocks
             /// (saved blocks of the right-side table) and points to the same key column
             /// in the left_sample_block and thus in the result_sample_block.
             column_indices_keys_and_right.push_back(key_pos);
+
+            auto it = key_renames.find(key);
+            if (it != key_renames.end())
+                key_renames_indices[key_pos] = result_sample_block.getPositionByName(it->second);
         }

-        for (size_t i = 0; i < num_keys + num_columns_left; ++i)
-        {
-            if (!is_key_column_in_left_block[i])
-                column_indices_left.push_back(i);
-        }
+        size_t num_src_columns = left_sample_block.columns() + right_sample_block.columns();

-        for (size_t i = 0; i < num_columns_right; ++i)
-            column_indices_keys_and_right.push_back(num_keys + num_columns_left + i);
-
-        /// If use_nulls, convert left columns to Nullable.
-        if (parent.use_nulls)
+        for (size_t i = 0; i < result_sample_block.columns(); ++i)
         {
-            for (size_t i = 0; i < num_columns_left; ++i)
+            if (i < left_sample_block.columns())
             {
-                convertColumnToNullable(result_sample_block.getByPosition(column_indices_left[i]));
-            }
-        }
-
-        columns_left.resize(num_columns_left);
-        columns_keys_and_right.resize(num_keys + num_columns_right);
+                if (!is_left_key[i])
+                {
+                    column_indices_left.emplace_back(i);
+
+                    /// If use_nulls, convert left columns to Nullable.
+                    if (parent.use_nulls)
+                        convertColumnToNullable(result_sample_block.getByPosition(i));
+                }
+            }
+            else if (i < num_src_columns)
+                column_indices_keys_and_right.emplace_back(i);
+        }
     }

     String getName() const override { return "NonJoined"; }
@@ -1117,31 +1185,49 @@ private:
     /// Indices of key columns in result_sample_block or columns that come from the right-side table.
     /// Order is significant: it is the same as the order of columns in the blocks of the right-side table that are saved in parent.blocks.
     ColumnNumbers column_indices_keys_and_right;
-    /// Columns of the current output block corresponding to column_indices_left.
-    MutableColumns columns_left;
-    /// Columns of the current output block corresponding to column_indices_keys_and_right.
-    MutableColumns columns_keys_and_right;
+
+    std::unordered_map<size_t, size_t> key_renames_indices;

     std::unique_ptr<void, std::function<void(void *)>> position;    /// type erasure

+    void makeResultSampleBlock(const Block & left_sample_block, const Names & key_names_left,
+                               const NamesAndTypesList & columns_added_by_join, std::unordered_map<String, String> & key_renames)
+    {
+        const Block & right_sample_block = parent.sample_block_with_columns_to_add;
+
+        result_sample_block = materializeBlock(left_sample_block);
+
+        /// Add columns from the right-side table to the block.
+        for (size_t i = 0; i < right_sample_block.columns(); ++i)
+        {
+            const ColumnWithTypeAndName & src_column = right_sample_block.getByPosition(i);
+            result_sample_block.insert(src_column.cloneEmpty());
+        }
+
+        const auto & key_names_right = parent.key_names_right;
+        NameSet needed_key_names_right = requiredRightKeys(key_names_right, columns_added_by_join);
+
+        /// Add join key columns from right block if they has different name.
+        for (size_t i = 0; i < key_names_right.size(); ++i)
+        {
+            auto & right_name = key_names_right[i];
+            auto & left_name = key_names_left[i];
+
+            if (needed_key_names_right.count(right_name) && !result_sample_block.has(right_name))
+            {
+                const auto & col = result_sample_block.getByName(left_name);
+                result_sample_block.insert({col.column, col.type, right_name});
+
+                key_renames[left_name] = right_name;
+            }
+        }
+    }
+
     template <ASTTableJoin::Strictness STRICTNESS, typename Maps>
     Block createBlock(const Maps & maps)
     {
-        size_t num_columns_left = column_indices_left.size();
-        size_t num_columns_right = column_indices_keys_and_right.size();
-
-        for (size_t i = 0; i < num_columns_left; ++i)
-        {
-            const auto & src_col = result_sample_block.safeGetByPosition(column_indices_left[i]);
-            columns_left[i] = src_col.type->createColumn();
-        }
-
-        for (size_t i = 0; i < num_columns_right; ++i)
-        {
-            const auto & src_col = result_sample_block.safeGetByPosition(column_indices_keys_and_right[i]);
-            columns_keys_and_right[i] = src_col.type->createColumn();
-        }
+        MutableColumns columns_left = columnsForIndex(result_sample_block, column_indices_left);
+        MutableColumns columns_keys_and_right = columnsForIndex(result_sample_block, column_indices_keys_and_right);

         size_t rows_added = 0;
@@ -1149,7 +1235,7 @@ private:
         {
         #define M(TYPE) \
             case Join::Type::TYPE: \
-                rows_added = fillColumns<STRICTNESS>(*maps.TYPE); \
+                rows_added = fillColumns<STRICTNESS>(*maps.TYPE, columns_left, columns_keys_and_right); \
                 break;

         APPLY_FOR_JOIN_VARIANTS(M)
         #undef M
@@ -1162,21 +1248,56 @@ private:
             return {};

         Block res = result_sample_block.cloneEmpty();
-        for (size_t i = 0; i < num_columns_left; ++i)
+
+        for (size_t i = 0; i < columns_left.size(); ++i)
             res.getByPosition(column_indices_left[i]).column = std::move(columns_left[i]);
-        for (size_t i = 0; i < num_columns_right; ++i)
-            res.getByPosition(column_indices_keys_and_right[i]).column = std::move(columns_keys_and_right[i]);
+
+        if (key_renames_indices.empty())
+        {
+            for (size_t i = 0; i < columns_keys_and_right.size(); ++i)
+                res.getByPosition(column_indices_keys_and_right[i]).column = std::move(columns_keys_and_right[i]);
+        }
+        else
+        {
+            for (size_t i = 0; i < columns_keys_and_right.size(); ++i)
+            {
+                size_t key_idx = column_indices_keys_and_right[i];
+
+                auto it = key_renames_indices.find(key_idx);
+                if (it != key_renames_indices.end())
+                {
+                    auto & key_column = res.getByPosition(key_idx).column;
+                    if (key_column->empty())
+                        key_column = key_column->cloneResized(columns_keys_and_right[i]->size());
+                    res.getByPosition(it->second).column = std::move(columns_keys_and_right[i]);
+                }
+                else
+                    res.getByPosition(key_idx).column = std::move(columns_keys_and_right[i]);
+            }
+        }
+
         return res;
     }

+    static MutableColumns columnsForIndex(const Block & block, const ColumnNumbers & indices)
+    {
+        size_t num_columns = indices.size();
+
+        MutableColumns columns;
+        columns.resize(num_columns);
+
+        for (size_t i = 0; i < num_columns; ++i)
+        {
+            const auto & src_col = block.safeGetByPosition(indices[i]);
+            columns[i] = src_col.type->createColumn();
+        }
+
+        return columns;
+    }
+
     template <ASTTableJoin::Strictness STRICTNESS, typename Map>
-    size_t fillColumns(const Map & map)
+    size_t fillColumns(const Map & map, MutableColumns & columns_left, MutableColumns & columns_keys_and_right)
     {
-        size_t num_columns_left = column_indices_left.size();
-        size_t num_columns_right = column_indices_keys_and_right.size();
-
         size_t rows_added = 0;

         if (!position)
@@ -1192,7 +1313,7 @@ private:
             if (it->second.getUsed())
                 continue;

-            AdderNonJoined<STRICTNESS, typename Map::mapped_type>::add(it->second, rows_added, num_columns_left, columns_left, num_columns_right, columns_keys_and_right);
+            AdderNonJoined<STRICTNESS, typename Map::mapped_type>::add(it->second, rows_added, columns_left, columns_keys_and_right);

             if (rows_added >= max_block_size)
             {
@@ -1206,9 +1327,10 @@ private:
 };

-BlockInputStreamPtr Join::createStreamWithNonJoinedRows(const Block & left_sample_block, const Names & key_names_left, size_t max_block_size) const
+BlockInputStreamPtr Join::createStreamWithNonJoinedRows(const Block & left_sample_block, const Names & key_names_left,
+    const NamesAndTypesList & columns_added_by_join, size_t max_block_size) const
 {
-    return std::make_shared<NonJoinedBlockInputStream>(*this, left_sample_block, key_names_left, max_block_size);
+    return std::make_shared<NonJoinedBlockInputStream>(*this, left_sample_block, key_names_left, columns_added_by_join, max_block_size);
 }

View File

@@ -241,7 +241,7 @@ public:
     /** Join data from the map (that was previously built by calls to insertFromBlock) to the block with data from "left" table.
       * Could be called from different threads in parallel.
      */
-    void joinBlock(Block & block, const Names & key_names_left, const NameSet & needed_key_names_right) const;
+    void joinBlock(Block & block, const Names & key_names_left, const NamesAndTypesList & columns_added_by_join) const;

     /// Infer the return type for joinGet function
     DataTypePtr joinGetReturnType(const String & column_name) const;
@@ -261,7 +261,8 @@ public:
      * Use only after all calls to joinBlock was done.
      * left_sample_block is passed without account of 'use_nulls' setting (columns will be converted to Nullable inside).
      */
-    BlockInputStreamPtr createStreamWithNonJoinedRows(const Block & left_sample_block, const Names & key_names_left, size_t max_block_size) const;
+    BlockInputStreamPtr createStreamWithNonJoinedRows(const Block & left_sample_block, const Names & key_names_left,
+        const NamesAndTypesList & columns_added_by_join, size_t max_block_size) const;

     /// Number of keys in all built JOIN maps.
     size_t getTotalRowCount() const;
@@ -511,7 +512,7 @@ private:
     void joinBlockImpl(
         Block & block,
         const Names & key_names_left,
-        const NameSet & needed_key_names_right,
+        const NamesAndTypesList & columns_added_by_join,
         const Block & block_with_columns_to_add,
         const Maps & maps) const;

View File

@@ -66,15 +66,20 @@ void MergeTreeDataPart::MinMaxIndex::load(const MergeTreeData & data, const Stri
 }

 void MergeTreeDataPart::MinMaxIndex::store(const MergeTreeData & data, const String & part_path, Checksums & out_checksums) const
+{
+    store(data.minmax_idx_columns, data.minmax_idx_column_types, part_path, out_checksums);
+}
+
+void MergeTreeDataPart::MinMaxIndex::store(const Names & column_names, const DataTypes & data_types, const String & part_path, Checksums & out_checksums) const
 {
     if (!initialized)
         throw Exception("Attempt to store uninitialized MinMax index for part " + part_path + ". This is a bug.",
             ErrorCodes::LOGICAL_ERROR);

-    for (size_t i = 0; i < data.minmax_idx_columns.size(); ++i)
+    for (size_t i = 0; i < column_names.size(); ++i)
     {
-        String file_name = "minmax_" + escapeForFileName(data.minmax_idx_columns[i]) + ".idx";
-        const DataTypePtr & type = data.minmax_idx_column_types[i];
+        String file_name = "minmax_" + escapeForFileName(column_names[i]) + ".idx";
+        const DataTypePtr & type = data_types.at(i);

         WriteBufferFromFile out(part_path + file_name);
         HashingWriteBuffer out_hashing(out);
@@ -517,7 +522,7 @@ void MergeTreeDataPart::loadPartitionAndMinMaxIndex()
         minmax_idx.load(storage, full_path);
     }

-    String calculated_partition_id = partition.getID(storage);
+    String calculated_partition_id = partition.getID(storage.partition_key_sample);
     if (calculated_partition_id != info.partition_id)
         throw Exception(
             "While loading part " + getFullPath() + ": calculated partition ID: " + calculated_partition_id

View File

@@ -200,6 +200,7 @@ struct MergeTreeDataPart
         void load(const MergeTreeData & storage, const String & part_path);
         void store(const MergeTreeData & storage, const String & part_path, Checksums & checksums) const;
+        void store(const Names & column_names, const DataTypes & data_types, const String & part_path, Checksums & checksums) const;

         void update(const Block & block, const Names & column_names);
         void merge(const MinMaxIndex & other);

View File

@@ -141,7 +141,7 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataWriter::writeTempPart(BlockWithPa
     MergeTreePartition partition(std::move(block_with_partition.partition));

-    MergeTreePartInfo new_part_info(partition.getID(data), temp_index, temp_index, 0);
+    MergeTreePartInfo new_part_info(partition.getID(data.partition_key_sample), temp_index, temp_index, 0);
     String part_name;
     if (data.format_version < MERGE_TREE_DATA_MIN_FORMAT_VERSION_WITH_CUSTOM_PARTITIONING)
     {

View File

@@ -10,6 +10,7 @@
 #include <Common/SipHash.h>
 #include <Common/typeid_cast.h>
 #include <Common/hex.h>
+#include <Core/Block.h>

 #include <Poco/File.h>
@@ -21,11 +22,16 @@ static ReadBufferFromFile openForReading(const String & path)
     return ReadBufferFromFile(path, std::min(static_cast<Poco::File::FileSize>(DBMS_DEFAULT_BUFFER_SIZE), Poco::File(path).getSize()));
 }

-/// NOTE: This ID is used to create part names which are then persisted in ZK and as directory names on the file system.
-/// So if you want to change this method, be sure to guarantee compatibility with existing table data.
 String MergeTreePartition::getID(const MergeTreeData & storage) const
 {
-    if (value.size() != storage.partition_key_sample.columns())
+    return getID(storage.partition_key_sample);
+}
+
+/// NOTE: This ID is used to create part names which are then persisted in ZK and as directory names on the file system.
+/// So if you want to change this method, be sure to guarantee compatibility with existing table data.
+String MergeTreePartition::getID(const Block & partition_key_sample) const
+{
+    if (value.size() != partition_key_sample.columns())
         throw Exception("Invalid partition key size: " + toString(value.size()), ErrorCodes::LOGICAL_ERROR);

     if (value.empty())
@@ -53,7 +59,7 @@ String MergeTreePartition::getID(const MergeTreeData & storage) const
         if (i > 0)
             result += '-';

-        if (typeid_cast<const DataTypeDate *>(storage.partition_key_sample.getByPosition(i).type.get()))
+        if (typeid_cast<const DataTypeDate *>(partition_key_sample.getByPosition(i).type.get()))
             result += toString(DateLUT::instance().toNumYYYYMMDD(DayNum(value[i].safeGet<UInt64>())));
         else
             result += applyVisitor(to_string_visitor, value[i]);
@@ -126,13 +132,18 @@ void MergeTreePartition::load(const MergeTreeData & storage, const String & part
 void MergeTreePartition::store(const MergeTreeData & storage, const String & part_path, MergeTreeDataPartChecksums & checksums) const
 {
-    if (!storage.partition_key_expr)
+    store(storage.partition_key_sample, part_path, checksums);
+}
+
+void MergeTreePartition::store(const Block & partition_key_sample, const String & part_path, MergeTreeDataPartChecksums & checksums) const
+{
+    if (!partition_key_sample)
         return;

     WriteBufferFromFile out(part_path + "partition.dat");
     HashingWriteBuffer out_hashing(out);
     for (size_t i = 0; i < value.size(); ++i)
-        storage.partition_key_sample.getByPosition(i).type->serializeBinary(value[i], out_hashing);
+        partition_key_sample.getByPosition(i).type->serializeBinary(value[i], out_hashing);
     out_hashing.next();
     checksums.files["partition.dat"].file_size = out_hashing.count();
     checksums.files["partition.dat"].file_hash = out_hashing.getHash();

View File

@@ -7,6 +7,7 @@
 namespace DB
 {

+class Block;
 class MergeTreeData;
 struct FormatSettings;
 struct MergeTreeDataPartChecksums;
@@ -25,11 +26,13 @@ public:
     explicit MergeTreePartition(UInt32 yyyymm) : value(1, yyyymm) {}

     String getID(const MergeTreeData & storage) const;
+    String getID(const Block & partition_key_sample) const;

     void serializeText(const MergeTreeData & storage, WriteBuffer & out, const FormatSettings & format_settings) const;

     void load(const MergeTreeData & storage, const String & part_path);
     void store(const MergeTreeData & storage, const String & part_path, MergeTreeDataPartChecksums & checksums) const;
+    void store(const Block & partition_key_sample, const String & part_path, MergeTreeDataPartChecksums & checksums) const;

     void assign(const MergeTreePartition & other) { value.assign(other.value); }
 };

View File

@ -125,6 +125,7 @@ if [ -n "$*" ]; then
else else
TEST_RUN=${TEST_RUN=1} TEST_RUN=${TEST_RUN=1}
TEST_PERF=${TEST_PERF=1} TEST_PERF=${TEST_PERF=1}
TEST_DICT=${TEST_DICT=1}
CLICKHOUSE_CLIENT_QUERY="${CLICKHOUSE_CLIENT} --config ${CLICKHOUSE_CONFIG_CLIENT} --port $CLICKHOUSE_PORT_TCP -m -n -q" CLICKHOUSE_CLIENT_QUERY="${CLICKHOUSE_CLIENT} --config ${CLICKHOUSE_CONFIG_CLIENT} --port $CLICKHOUSE_PORT_TCP -m -n -q"
$CLICKHOUSE_CLIENT_QUERY 'SELECT * from system.build_options; SELECT * FROM system.clusters;' $CLICKHOUSE_CLIENT_QUERY 'SELECT * from system.build_options; SELECT * FROM system.clusters;'
CLICKHOUSE_TEST="env PATH=$PATH:$BIN_DIR ${TEST_DIR}clickhouse-test --binary ${BIN_DIR}${CLICKHOUSE_BINARY} --configclient $CLICKHOUSE_CONFIG_CLIENT --configserver $CLICKHOUSE_CONFIG --tmp $DATA_DIR/tmp --queries $QUERIES_DIR $TEST_OPT0 $TEST_OPT" CLICKHOUSE_TEST="env PATH=$PATH:$BIN_DIR ${TEST_DIR}clickhouse-test --binary ${BIN_DIR}${CLICKHOUSE_BINARY} --configclient $CLICKHOUSE_CONFIG_CLIENT --configserver $CLICKHOUSE_CONFIG --tmp $DATA_DIR/tmp --queries $QUERIES_DIR $TEST_OPT0 $TEST_OPT"
@ -139,6 +140,7 @@ else
fi fi
( [ "$TEST_RUN" ] && $CLICKHOUSE_TEST ) || ${TEST_TRUE:=false} ( [ "$TEST_RUN" ] && $CLICKHOUSE_TEST ) || ${TEST_TRUE:=false}
( [ "$TEST_PERF" ] && $CLICKHOUSE_PERFORMANCE_TEST $* ) || true ( [ "$TEST_PERF" ] && $CLICKHOUSE_PERFORMANCE_TEST $* ) || true
( [ "$TEST_DICT" ] && mkdir -p $DATA_DIR/etc/dictionaries/ && cd $CUR_DIR/external_dictionaries && python generate_and_test.py --port=$CLICKHOUSE_PORT_TCP --client=$CLICKHOUSE_CLIENT --source=$CUR_DIR/external_dictionaries/source.tsv --reference=$CUR_DIR/external_dictionaries/reference --generated=$DATA_DIR/etc/dictionaries/ --no_mysql --no_mongo ) || true
$CLICKHOUSE_CLIENT_QUERY "SELECT event, value FROM system.events; SELECT metric, value FROM system.metrics; SELECT metric, value FROM system.asynchronous_metrics;" $CLICKHOUSE_CLIENT_QUERY "SELECT event, value FROM system.events; SELECT metric, value FROM system.metrics; SELECT metric, value FROM system.asynchronous_metrics;"
$CLICKHOUSE_CLIENT_QUERY "SELECT 'Still alive'" $CLICKHOUSE_CLIENT_QUERY "SELECT 'Still alive'"
fi fi

View File

@ -394,8 +394,8 @@ def generate_dictionaries(args):
</source> </source>
<lifetime> <lifetime>
<min>0</min> <min>5</min>
<max>0</max> <max>15</max>
</lifetime> </lifetime>
<layout> <layout>

View File

@ -0,0 +1,198 @@
Building ClickHouse is supported on Linux, FreeBSD, and Mac OS X.
# If You Use Windows
If you use Windows, you need to create a virtual machine with Ubuntu. To work with a virtual machine, install VirtualBox. You can download Ubuntu from https://www.ubuntu.com/#download Create a virtual machine from the downloaded image. Allocate at least 4 GB of RAM for it. To launch a terminal in Ubuntu, find the application containing the word terminal in its name (gnome-terminal, konsole and so on) in the menu, or just press Ctrl+Alt+T.
# Creating a Repository on GitHub
To work with the ClickHouse repository, you need a GitHub account. You probably already have one.
If you do not have an account, register at https://github.com/. Create SSH keys if you do not have them, and upload the public keys to GitHub. They are required for sending your changes. You can use the same SSH keys as for other SSH servers; most likely you already have them.
Create a fork of the ClickHouse repository. To do this, open https://github.com/yandex/ClickHouse and click the "fork" button in the upper right corner. You will get a full copy of the ClickHouse repository in your account, called a "fork". The development process consists of committing the needed changes to your fork of the repository and then creating a "pull request" for these changes to be accepted into the main repository.
To work with git repositories, install `git`.
On Ubuntu, run the following in a terminal:
```
sudo apt update
sudo apt install git
```
A brief Git manual: https://services.github.com/on-demand/downloads/github-git-cheat-sheet.pdf
A detailed Git manual: https://git-scm.com/book/ru/v2
# Cloning the Repository to Your Working Machine
Next, you need to download the sources to your working machine. This is called "cloning the repository", because it creates a local copy of the repository on your machine that you will work with.
Run in a terminal:
```
git clone --recursive git@github.com:yandex/ClickHouse.git
cd ClickHouse
```
Replace *yandex* with your GitHub account name.
This command creates the ClickHouse directory containing a working copy of the project.
The path to the working copy must not contain spaces in directory names; this can cause problems with the build system.
Note that the ClickHouse repository uses submodules. That is the name for references to additional repositories (for example, external libraries the project depends on). It means that when cloning the repository, you should pass the `--recursive` flag, as in the example above. If the repository was cloned without submodules, download them by running:
```
git submodule init
git submodule update
```
You can check whether the submodules are present with the `git submodule status` command.
# Build System
ClickHouse uses CMake and Ninja for building.
CMake is a meta build system that generates build tasks.
Ninja is the build system that runs those tasks.
To install them on Ubuntu, Debian or Mint, run `sudo apt install cmake ninja-build`.
To install them on CentOS or RedHat, run `sudo yum install cmake ninja-build`.
If you use Arch or Gentoo, you already know how to install CMake yourself.
To install CMake and Ninja on Mac OS X, first install Homebrew and then install everything else with it:
```
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
brew install cmake ninja
```
# Optional External Libraries
ClickHouse uses several external libraries for the build. Most of them do not need to be installed separately: they are built together with ClickHouse from the sources located in the submodules. You can look through the set of these libraries in the contrib directory.
A couple of libraries are not built from sources but are taken from the system: ICU and Readline. It is recommended to install them.
Ubuntu: `sudo apt install libicu-dev libreadline-dev`
Mac OS X: `brew install icu4c readline`
However, these libraries are optional, and ClickHouse can be built without them. ICU is used for `COLLATE` support in `ORDER BY` (for example, for sorting according to the Turkish alphabet). Readline provides more convenient command input in the interactive mode of clickhouse-client.
# C++ Compiler
GCC starting from version 7 and Clang starting from version 7 are supported as the C++ compiler.
Official Yandex builds currently use GCC, because it generates slightly better machine code (up to several percent faster on average, according to our benchmarks). Clang is usually more convenient for development. Our continuous integration environment checks about a dozen build variants.
To install GCC on Ubuntu, run: `sudo apt install gcc g++`.
Check the gcc version: `gcc --version`. If it is below 7, follow the instructions here: https://clickhouse.yandex/docs/en/development/build/#install-gcc-7
To install GCC on Mac OS X, run `brew install gcc`.
If you decided to use Clang, you can also install `libc++` and `lld`, if you know what they are. Installing `ccache` is also recommended.
# Build Process
Now you are ready to build ClickHouse. It is recommended to create a separate build directory inside the ClickHouse directory to keep the build output:
```
mkdir build
cd build
```
You can have several different directories (build_release, build_debug and so on) for different build variants.
While inside the build directory, configure the build by running CMake.
Before the first run, you need to set environment variables that select the compiler (gcc version 7 in this example).
```
export CC=gcc-7 CXX=g++-7
cmake ..
```
The CC variable selects the compiler for C (short for C Compiler), and the CXX variable selects the compiler for C++ (the X is like a plus sign turned sideways to make it a letter).
For a faster build, you can use the debug variant, a build without optimizations. To do this, pass the `-D CMAKE_BUILD_TYPE=Debug` parameter:
```
cmake -D CMAKE_BUILD_TYPE=Debug ..
```
You can change the build variant by running this command in the build directory.
Run ninja to build:
```
ninja clickhouse-server clickhouse-client
```
This example builds only the programs you will need first.
If you want to build all the programs (utilities and tests), run ninja without parameters:
```
ninja
```
A full build requires about 30 GB of free disk space, or 15 GB for building only the main programs.
If the machine does not have much RAM, limit the number of parallel build jobs with the `-j` parameter:
```
ninja -j 1 clickhouse-server clickhouse-client
```
On machines with 4 GB of RAM, it is recommended to use the value 1; with up to 8 GB of RAM, use 2.
If you get the message `ninja: error: loading 'build.ninja': No such file or directory`, the build configuration step has failed and you need to inspect the error message printed above.
If the build starts successfully, you will see the build progress: the number of processed tasks and the total number of tasks.
During the build, `libprotobuf WARNING` messages about protobuf files in the libhdfs2 library may appear. They do not matter.
On a successful build you get the executable file `ClickHouse/build/dbms/programs/clickhouse`:
```
ls -l dbms/programs/clickhouse
```
# Running the Built Version of ClickHouse
To run the server as the current user, with logs printed to the terminal and with the sample configuration files located in the sources, go to the `ClickHouse/dbms/programs/server/` directory (it is outside of the build directory) and run:
```
../../../build/dbms/programs/clickhouse server
```
In this case, ClickHouse uses the configuration files located in the current directory. You can run `clickhouse server` from any directory by passing the path to a configuration file as the `--config-file` command-line argument.
To connect to ClickHouse with clickhouse-client, open another terminal, go to the `ClickHouse/build/dbms/programs/` directory and run `clickhouse client`.
You can replace the production ClickHouse binary installed on your system with the one you built. To do this, install ClickHouse on your machine following the instructions from the official website, and then run:
```
sudo service clickhouse-server stop
sudo cp ClickHouse/build/dbms/programs/clickhouse /usr/bin/
sudo service clickhouse-server start
```
You can also run your build of ClickHouse with the configuration file of the system-wide ClickHouse installation:
```
sudo service clickhouse-server stop
sudo -u clickhouse ClickHouse/build/dbms/programs/clickhouse server --config-file /etc/clickhouse-server/config.xml
```
# Development Environment
If you do not know which IDE to use, CLion is recommended. CLion is commercial software, but it offers a free trial period. It is also free for students. CLion can be used both on Linux and on Mac OS X.
You can also use KDevelop or QTCreator as an IDE. KDevelop is very convenient but unstable. If KDevelop crashes shortly after opening a project, click the "Stop All" button as soon as it has opened the list of project files. After that, KDevelop should be fine to work with.
As simple code editors, you can use Sublime Text, Visual Studio Code, or Kate (all of them are available on Linux).
Just in case, note that CLion creates its own build directory, selects the debug build type by default, uses its bundled version of CMake for configuration instead of the one you installed, and runs build tasks with make instead of ninja. This is normal behaviour, just keep it in mind to avoid confusion.
# Writing Code
A description of the ClickHouse architecture: https://clickhouse.yandex/docs/ru/development/architecture/
The code style guide: https://clickhouse.yandex/docs/ru/development/style/

View File

@ -43,6 +43,17 @@ def subprocess_call(args):
# print('run:', ' ' . join(args)) # print('run:', ' ' . join(args))
subprocess.call(args) subprocess.call(args)
def get_odbc_bridge_path():
path = os.environ.get('CLICKHOUSE_TESTS_ODBC_BRIDGE_BIN_PATH')
if path is None:
server_path = os.environ.get('CLICKHOUSE_TESTS_SERVER_BIN_PATH')
if server_path is not None:
return os.path.join(os.path.dirname(server_path), 'clickhouse-odbc-bridge')
else:
return '/usr/bin/clickhouse-odbc-bridge'
return path
class ClickHouseCluster: class ClickHouseCluster:
"""ClickHouse cluster with several instances and (possibly) ZooKeeper. """ClickHouse cluster with several instances and (possibly) ZooKeeper.
@ -53,12 +64,13 @@ class ClickHouseCluster:
""" """
def __init__(self, base_path, name=None, base_configs_dir=None, server_bin_path=None, client_bin_path=None, def __init__(self, base_path, name=None, base_configs_dir=None, server_bin_path=None, client_bin_path=None,
zookeeper_config_path=None, custom_dockerd_host=None): odbc_bridge_bin_path=None, zookeeper_config_path=None, custom_dockerd_host=None):
self.base_dir = p.dirname(base_path) self.base_dir = p.dirname(base_path)
self.name = name if name is not None else '' self.name = name if name is not None else ''
self.base_configs_dir = base_configs_dir or os.environ.get('CLICKHOUSE_TESTS_BASE_CONFIG_DIR', '/etc/clickhouse-server/') self.base_configs_dir = base_configs_dir or os.environ.get('CLICKHOUSE_TESTS_BASE_CONFIG_DIR', '/etc/clickhouse-server/')
self.server_bin_path = p.realpath(server_bin_path or os.environ.get('CLICKHOUSE_TESTS_SERVER_BIN_PATH', '/usr/bin/clickhouse')) self.server_bin_path = p.realpath(server_bin_path or os.environ.get('CLICKHOUSE_TESTS_SERVER_BIN_PATH', '/usr/bin/clickhouse'))
self.odbc_bridge_bin_path = p.realpath(odbc_bridge_bin_path or get_odbc_bridge_path())
self.client_bin_path = p.realpath(client_bin_path or os.environ.get('CLICKHOUSE_TESTS_CLIENT_BIN_PATH', '/usr/bin/clickhouse-client')) self.client_bin_path = p.realpath(client_bin_path or os.environ.get('CLICKHOUSE_TESTS_CLIENT_BIN_PATH', '/usr/bin/clickhouse-client'))
self.zookeeper_config_path = p.join(self.base_dir, zookeeper_config_path) if zookeeper_config_path else p.join(HELPERS_DIR, 'zookeeper_config.xml') self.zookeeper_config_path = p.join(self.base_dir, zookeeper_config_path) if zookeeper_config_path else p.join(HELPERS_DIR, 'zookeeper_config.xml')
@ -116,8 +128,8 @@ class ClickHouseCluster:
instance = ClickHouseInstance( instance = ClickHouseInstance(
self, self.base_dir, name, config_dir, main_configs, user_configs, macros, with_zookeeper, self, self.base_dir, name, config_dir, main_configs, user_configs, macros, with_zookeeper,
self.zookeeper_config_path, with_mysql, with_kafka, self.base_configs_dir, self.server_bin_path, self.zookeeper_config_path, with_mysql, with_kafka, self.base_configs_dir, self.server_bin_path,
clickhouse_path_dir, with_odbc_drivers, hostname=hostname, env_variables=env_variables, image=image, self.odbc_bridge_bin_path, clickhouse_path_dir, with_odbc_drivers, hostname=hostname,
stay_alive=stay_alive, ipv4_address=ipv4_address, ipv6_address=ipv6_address) env_variables=env_variables, image=image, stay_alive=stay_alive, ipv4_address=ipv4_address, ipv6_address=ipv6_address)
self.instances[name] = instance self.instances[name] = instance
self.base_cmd.extend(['--file', instance.docker_compose_path]) self.base_cmd.extend(['--file', instance.docker_compose_path])
@ -340,6 +352,7 @@ services:
hostname: {hostname} hostname: {hostname}
volumes: volumes:
- {binary_path}:/usr/bin/clickhouse:ro - {binary_path}:/usr/bin/clickhouse:ro
- {odbc_bridge_bin_path}:/usr/bin/clickhouse-odbc-bridge:ro
- {configs_dir}:/etc/clickhouse-server/ - {configs_dir}:/etc/clickhouse-server/
- {db_dir}:/var/lib/clickhouse/ - {db_dir}:/var/lib/clickhouse/
- {logs_dir}:/var/log/clickhouse-server/ - {logs_dir}:/var/log/clickhouse-server/
@ -372,7 +385,7 @@ class ClickHouseInstance:
def __init__( def __init__(
self, cluster, base_path, name, custom_config_dir, custom_main_configs, custom_user_configs, macros, self, cluster, base_path, name, custom_config_dir, custom_main_configs, custom_user_configs, macros,
with_zookeeper, zookeeper_config_path, with_mysql, with_kafka, base_configs_dir, server_bin_path, with_zookeeper, zookeeper_config_path, with_mysql, with_kafka, base_configs_dir, server_bin_path, odbc_bridge_bin_path,
clickhouse_path_dir, with_odbc_drivers, hostname=None, env_variables={}, image="yandex/clickhouse-integration-test", clickhouse_path_dir, with_odbc_drivers, hostname=None, env_variables={}, image="yandex/clickhouse-integration-test",
stay_alive=False, ipv4_address=None, ipv6_address=None): stay_alive=False, ipv4_address=None, ipv6_address=None):
@ -392,6 +405,7 @@ class ClickHouseInstance:
self.base_configs_dir = base_configs_dir self.base_configs_dir = base_configs_dir
self.server_bin_path = server_bin_path self.server_bin_path = server_bin_path
self.odbc_bridge_bin_path = odbc_bridge_bin_path
self.with_mysql = with_mysql self.with_mysql = with_mysql
self.with_kafka = with_kafka self.with_kafka = with_kafka
@ -649,6 +663,7 @@ class ClickHouseInstance:
name=self.name, name=self.name,
hostname=self.hostname, hostname=self.hostname,
binary_path=self.server_bin_path, binary_path=self.server_bin_path,
odbc_bridge_bin_path=self.odbc_bridge_bin_path,
configs_dir=configs_dir, configs_dir=configs_dir,
config_d_dir=config_d_dir, config_d_dir=config_d_dir,
db_dir=db_dir, db_dir=db_dir,

View File

@ -18,7 +18,8 @@ RUN apt-get update && env DEBIAN_FRONTEND=noninteractive apt-get install --yes -
python-pip \ python-pip \
tzdata \ tzdata \
libreadline-dev \ libreadline-dev \
libicu-dev libicu-dev \
curl
ENV TZ=Europe/Moscow ENV TZ=Europe/Moscow
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

View File

@ -9,5 +9,6 @@ echo "Start tests"
export CLICKHOUSE_TESTS_SERVER_BIN_PATH=/clickhouse export CLICKHOUSE_TESTS_SERVER_BIN_PATH=/clickhouse
export CLICKHOUSE_TESTS_CLIENT_BIN_PATH=/clickhouse export CLICKHOUSE_TESTS_CLIENT_BIN_PATH=/clickhouse
export CLICKHOUSE_TESTS_BASE_CONFIG_DIR=/clickhouse-config export CLICKHOUSE_TESTS_BASE_CONFIG_DIR=/clickhouse-config
export CLICKHOUSE_ODBC_BRIDGE_BINARY_PATH=/clickhouse-odbc-bridge
cd /ClickHouse/dbms/tests/integration && pytest $PYTEST_OPTS cd /ClickHouse/dbms/tests/integration && pytest $PYTEST_OPTS

View File

@ -51,6 +51,11 @@ if __name__ == "__main__":
default=os.environ.get("CLICKHOUSE_TESTS_SERVER_BIN_PATH", os.environ.get("CLICKHOUSE_TESTS_CLIENT_BIN_PATH", "/usr/bin/clickhouse")), default=os.environ.get("CLICKHOUSE_TESTS_SERVER_BIN_PATH", os.environ.get("CLICKHOUSE_TESTS_CLIENT_BIN_PATH", "/usr/bin/clickhouse")),
help="Path to clickhouse binary") help="Path to clickhouse binary")
parser.add_argument(
"--bridge-binary",
default=os.environ.get("CLICKHOUSE_TESTS_ODBC_BRIDGE_BIN_PATH", "/usr/bin/clickhouse-odbc-bridge"),
help="Path to clickhouse-odbc-bridge binary")
parser.add_argument( parser.add_argument(
"--configs-dir", "--configs-dir",
default=os.environ.get("CLICKHOUSE_TESTS_BASE_CONFIG_DIR", os.path.join(DEFAULT_CLICKHOUSE_ROOT, "dbms/programs/server")), default=os.environ.get("CLICKHOUSE_TESTS_BASE_CONFIG_DIR", os.path.join(DEFAULT_CLICKHOUSE_ROOT, "dbms/programs/server")),
@ -77,10 +82,11 @@ if __name__ == "__main__":
if not args.disable_net_host: if not args.disable_net_host:
net = "--net=host" net = "--net=host"
cmd = "docker run {net} --name {name} --user={user} --privileged --volume={bin}:/clickhouse \ cmd = "docker run {net} --name {name} --user={user} --privileged --volume={bridge_bin}:/clickhouse-odbc-bridge --volume={bin}:/clickhouse \
--volume={cfg}:/clickhouse-config --volume={pth}:/ClickHouse -e PYTEST_OPTS='{opts}' {img} ".format( --volume={cfg}:/clickhouse-config --volume={pth}:/ClickHouse -e PYTEST_OPTS='{opts}' {img} ".format(
net=net, net=net,
bin=args.binary, bin=args.binary,
bridge_bin=args.bridge_binary,
cfg=args.configs_dir, cfg=args.configs_dir,
pth=args.clickhouse_root, pth=args.clickhouse_root,
opts=' '.join(args.pytest_args), opts=' '.join(args.pytest_args),

View File

@ -0,0 +1,24 @@
<?xml version="1.0"?>
<yandex>
<profiles>
<default>
<use_uncompressed_cache>1</use_uncompressed_cache>
</default>
</profiles>
<users>
<default>
<password></password>
<networks incl="networks" replace="replace">
<ip>::/0</ip>
</networks>
<profile>default</profile>
<quota>default</quota>
</default>
</users>
<quotas>
<default>
</default>
</quotas>
</yandex>

View File

@ -10,6 +10,8 @@ cluster = ClickHouseCluster(__file__)
node1 = cluster.add_instance('node1', main_configs=['configs/zstd_compression_by_default.xml']) node1 = cluster.add_instance('node1', main_configs=['configs/zstd_compression_by_default.xml'])
node2 = cluster.add_instance('node2', main_configs=['configs/lz4hc_compression_by_default.xml']) node2 = cluster.add_instance('node2', main_configs=['configs/lz4hc_compression_by_default.xml'])
node3 = cluster.add_instance('node3', main_configs=['configs/custom_compression_by_default.xml']) node3 = cluster.add_instance('node3', main_configs=['configs/custom_compression_by_default.xml'])
node4 = cluster.add_instance('node4', user_configs=['configs/enable_uncompressed_cache.xml'])
node5 = cluster.add_instance('node5', main_configs=['configs/zstd_compression_by_default.xml'], user_configs=['configs/enable_uncompressed_cache.xml'])
@pytest.fixture(scope="module") @pytest.fixture(scope="module")
def start_cluster(): def start_cluster():
@ -68,3 +70,34 @@ def test_preconfigured_custom_codec(start_cluster):
node3.query("OPTIMIZE TABLE compression_codec_multiple_with_key FINAL") node3.query("OPTIMIZE TABLE compression_codec_multiple_with_key FINAL")
assert node3.query("SELECT COUNT(*) from compression_codec_multiple_with_key WHERE length(data) = 10000") == "11\n" assert node3.query("SELECT COUNT(*) from compression_codec_multiple_with_key WHERE length(data) = 10000") == "11\n"
def test_uncompressed_cache_custom_codec(start_cluster):
node4.query("""
CREATE TABLE compression_codec_multiple_with_key (
somedate Date CODEC(ZSTD, ZSTD, ZSTD(12), LZ4HC(12)),
id UInt64 CODEC(LZ4, ZSTD, NONE, LZ4HC),
data String,
somecolumn Float64 CODEC(ZSTD(2), LZ4HC, NONE, NONE, NONE, LZ4HC(5))
) ENGINE = MergeTree() PARTITION BY somedate ORDER BY id SETTINGS index_granularity = 2;
""")
node4.query("INSERT INTO compression_codec_multiple_with_key VALUES(toDate('2018-10-12'), 100000, '{}', 88.88)".format(''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(10000))))
# two equal requests one by one, to get into UncompressedCache for the first block
assert node4.query("SELECT max(length(data)) from compression_codec_multiple_with_key GROUP BY data ORDER BY max(length(data)) DESC LIMIT 1") == "10000\n"
assert node4.query("SELECT max(length(data)) from compression_codec_multiple_with_key GROUP BY data ORDER BY max(length(data)) DESC LIMIT 1") == "10000\n"
def test_uncompressed_cache_plus_zstd_codec(start_cluster):
node5.query("""
CREATE TABLE compression_codec_multiple_with_key (
somedate Date CODEC(ZSTD, ZSTD, ZSTD(12), LZ4HC(12)),
id UInt64 CODEC(LZ4, ZSTD, NONE, LZ4HC),
data String,
somecolumn Float64 CODEC(ZSTD(2), LZ4HC, NONE, NONE, NONE, LZ4HC(5))
) ENGINE = MergeTree() PARTITION BY somedate ORDER BY id SETTINGS index_granularity = 2;
""")
node5.query("INSERT INTO compression_codec_multiple_with_key VALUES(toDate('2018-10-12'), 100000, '{}', 88.88)".format('a' * 10000))
assert node5.query("SELECT max(length(data)) from compression_codec_multiple_with_key GROUP BY data ORDER BY max(length(data)) DESC LIMIT 1") == "10000\n"

View File

@ -1,12 +1,18 @@
<?xml version="1.0"?> <?xml version="1.0"?>
<yandex> <yandex>
<logger> <logger>
<level>trace</level> <level>trace</level>
<log>/var/log/clickhouse-server/clickhouse-server.log</log> <log>/var/log/clickhouse-server/clickhouse-server.log</log>
<errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog> <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
<size>1000M</size> <log>/var/log/clickhouse-server/clickhouse-server.log</log>
<count>10</count> <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
</logger> <odbc_bridge_log>/var/log/clickhouse-server/clickhouse-odbc-bridge.log</odbc_bridge_log>
<odbc_bridge_errlog>/var/log/clickhouse-server/clickhouse-odbc-bridge.err.log</odbc_bridge_errlog>
<odbc_bridge_level>trace</odbc_bridge_level>
<size>1000M</size>
<count>10</count>
</logger>
<tcp_port>9000</tcp_port> <tcp_port>9000</tcp_port>
<listen_host>127.0.0.1</listen_host> <listen_host>127.0.0.1</listen_host>

View File

@ -92,10 +92,10 @@ CREATE TABLE {}(id UInt32, name String, age UInt32, money UInt32) ENGINE = MySQL
node1.query("INSERT INTO {}(id, name, money) select number, concat('name_', toString(number)), 3 from numbers(100) ".format(table_name)) node1.query("INSERT INTO {}(id, name, money) select number, concat('name_', toString(number)), 3 from numbers(100) ".format(table_name))
# actually, I don't know, what wrong with that connection string, but libmyodbc always falls into segfault assert node1.query("SELECT count(*) FROM odbc('DSN={}', '{}')".format(mysql_setup["DSN"], table_name)) == '100\n'
node1.query("SELECT * FROM odbc('DSN={}', '{}')".format(mysql_setup["DSN"], table_name), ignore_error=True)
    # server still works after segfault    # previously this test failed with a segfault
# just to be sure :)
assert node1.query("select 1") == "1\n" assert node1.query("select 1") == "1\n"
conn.close() conn.close()

View File

@ -1,10 +1,10 @@
0 0 0 0 0 0
0 1 1 0 0 1
1 2 2 1 1 2
1 3 3 1 1 3
2 4 4 2 2 4
2 0 5 2 2 5
3 0 6 3 3 6
3 0 7 3 3 7
4 0 8 4 4 8
4 0 9 4 4 9

View File

@ -0,0 +1,118 @@
inner
1 A 1 a
1 A 1 b
2 B 2 c
2 C 2 c
3 D 3 d
3 D 3 e
4 E 4 f
4 F 4 f
9 I 9 i
inner subs
1 A 1 a
1 A 1 b
2 B 2 c
2 C 2 c
3 D 3 d
3 D 3 e
4 E 4 f
4 F 4 f
9 I 9 i
inner expr
1 A 1 a
1 A 1 b
2 B 2 c
2 C 2 c
3 D 3 d
3 D 3 e
4 E 4 f
4 F 4 f
9 I 9 i
left
1 A 1 a
1 A 1 b
2 B 2 c
2 C 2 c
3 D 3 d
3 D 3 e
4 E 4 f
4 F 4 f
5 G 0
8 H 0
9 I 9 i
left subs
1 A 1 a
1 A 1 b
2 B 2 c
2 C 2 c
3 D 3 d
3 D 3 e
4 E 4 f
4 F 4 f
5 G 0
8 H 0
9 I 9 i
left expr
1 A 1 a
1 A 1 b
2 B 2 c
2 C 2 c
3 D 3 d
3 D 3 e
4 E 4 f
4 F 4 f
5 G 0
8 H 0
9 I 9 i
right
0 6 g
0 7 h
1 A 1 a
1 A 1 b
2 B 2 c
2 C 2 c
3 D 3 d
3 D 3 e
4 E 4 f
4 F 4 f
9 I 9 i
right subs
0 6 g
0 7 h
1 A 1 a
1 A 1 b
2 B 2 c
2 C 2 c
3 D 3 d
3 D 3 e
4 E 4 f
4 F 4 f
9 I 9 i
full
0 6 g
0 7 h
1 A 1 a
1 A 1 b
2 B 2 c
2 C 2 c
3 D 3 d
3 D 3 e
4 E 4 f
4 F 4 f
5 G 0
8 H 0
9 I 9 i
full subs
0 6 g
0 7 h
1 A 1 a
1 A 1 b
2 B 2 c
2 C 2 c
3 D 3 d
3 D 3 e
4 E 4 f
4 F 4 f
5 G 0
8 H 0
9 I 9 i

View File

@ -0,0 +1,40 @@
use test;
drop table if exists X;
drop table if exists Y;
create table X (id Int32, x_name String) engine Memory;
create table Y (id Int32, y_name String) engine Memory;
insert into X (id, x_name) values (1, 'A'), (2, 'B'), (2, 'C'), (3, 'D'), (4, 'E'), (4, 'F'), (5, 'G'), (8, 'H'), (9, 'I');
insert into Y (id, y_name) values (1, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (3, 'e'), (4, 'f'), (6, 'g'), (7, 'h'), (9, 'i');
select 'inner';
select X.*, Y.* from X inner join Y on X.id = Y.id;
select 'inner subs';
select s.*, j.* from (select * from X) as s inner join (select * from Y) as j on s.id = j.id;
select 'inner expr';
select X.*, Y.* from X inner join Y on (X.id + 1) = (Y.id + 1);
select 'left';
select X.*, Y.* from X left join Y on X.id = Y.id;
select 'left subs';
select s.*, j.* from (select * from X) as s left join (select * from Y) as j on s.id = j.id;
select 'left expr';
select X.*, Y.* from X left join Y on (X.id + 1) = (Y.id + 1);
select 'right';
select X.*, Y.* from X right join Y on X.id = Y.id order by id;
select 'right subs';
select s.*, j.* from (select * from X) as s right join (select * from Y) as j on s.id = j.id order by id;
--select 'right expr';
--select X.*, Y.* from X right join Y on (X.id + 1) = (Y.id + 1) order by id;
select 'full';
select X.*, Y.* from X full join Y on X.id = Y.id order by id;
select 'full subs';
select s.*, j.* from (select * from X) as s full join (select * from Y) as j on s.id = j.id order by id;
--select 'full expr';
--select X.*, Y.* from X full join Y on (X.id + 1) = (Y.id + 1) order by id;
drop table X;
drop table Y;

View File

@ -0,0 +1,96 @@
inner
1 A 1 a
1 A 1 b
2 B 2 c
2 C 2 c
3 D 3 d
3 D 3 e
4 E 4 f
4 F 4 f
9 I 9 i
inner subs
1 A 1 a
1 A 1 b
2 B 2 c
2 C 2 c
3 D 3 d
3 D 3 e
4 E 4 f
4 F 4 f
9 I 9 i
left
1 A 1 a
1 A 1 b
2 B 2 c
2 C 2 c
3 D 3 d
3 D 3 e
4 E 4 f
4 F 4 f
5 G 0
8 H 0
9 I 9 i
left subs
1 A 1 a
1 A 1 b
2 B 2 c
2 C 2 c
3 D 3 d
3 D 3 e
4 E 4 f
4 F 4 f
5 G 0
8 H 0
9 I 9 i
right
0 6 g
0 7 h
1 A 1 a
1 A 1 b
2 B 2 c
2 C 2 c
3 D 3 d
3 D 3 e
4 E 4 f
4 F 4 f
9 I 9 i
right subs
0 6 g
0 7 h
1 A 1 a
1 A 1 b
2 B 2 c
2 C 2 c
3 D 3 d
3 D 3 e
4 E 4 f
4 F 4 f
9 I 9 i
full
0 6 g
0 7 h
1 A 1 a
1 A 1 b
2 B 2 c
2 C 2 c
3 D 3 d
3 D 3 e
4 E 4 f
4 F 4 f
5 G 0
8 H 0
9 I 9 i
full subs
0 6 g
0 7 h
1 A 1 a
1 A 1 b
2 B 2 c
2 C 2 c
3 D 3 d
3 D 3 e
4 E 4 f
4 F 4 f
5 G 0
8 H 0
9 I 9 i

View File

@ -0,0 +1,32 @@
use test;
drop table if exists X;
drop table if exists Y;
create table X (id Int32, x_name String) engine Memory;
create table Y (id Int32, y_name String) engine Memory;
insert into X (id, x_name) values (1, 'A'), (2, 'B'), (2, 'C'), (3, 'D'), (4, 'E'), (4, 'F'), (5, 'G'), (8, 'H'), (9, 'I');
insert into Y (id, y_name) values (1, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (3, 'e'), (4, 'f'), (6, 'g'), (7, 'h'), (9, 'i');
select 'inner';
select X.*, Y.* from X inner join Y using id;
select 'inner subs';
select s.*, j.* from (select * from X) as s inner join (select * from Y) as j using id;
select 'left';
select X.*, Y.* from X left join Y using id;
select 'left subs';
select s.*, j.* from (select * from X) as s left join (select * from Y) as j using id;
select 'right';
select X.*, Y.* from X right join Y using id order by id;
select 'right subs';
select s.*, j.* from (select * from X) as s right join (select * from Y) as j using id order by id;
select 'full';
select X.*, Y.* from X full join Y using id order by id;
select 'full subs';
select s.*, j.* from (select * from X) as s full join (select * from Y) as j using id order by id;
drop table X;
drop table Y;

View File

@ -21,6 +21,8 @@
└──────────┴──────┘ └──────────┴──────┘
one one
system one system one
system one
test one test one
2 2
2 2
2

View File

@ -58,10 +58,10 @@ SELECT t.name --, db.name
FROM (SELECT name, database FROM system.tables WHERE name = 'one') AS t FROM (SELECT name, database FROM system.tables WHERE name = 'one') AS t
JOIN (SELECT name FROM system.databases WHERE name = 'system') AS db ON t.database = db.name; JOIN (SELECT name FROM system.databases WHERE name = 'system') AS db ON t.database = db.name;
--SELECT db.name, t.name SELECT db.name, t.name
-- FROM system.tables AS t FROM system.tables AS t
-- JOIN (SELECT * FROM system.databases WHERE name = 'system') AS db ON t.database = db.name JOIN (SELECT * FROM system.databases WHERE name = 'system') AS db ON t.database = db.name
-- WHERE t.name = 'one'; WHERE t.name = 'one';
SELECT database, t.name SELECT database, t.name
FROM system.tables AS t FROM system.tables AS t
@ -72,10 +72,10 @@ SELECT count(t.database)
FROM (SELECT * FROM system.tables WHERE name = 'one') AS t FROM (SELECT * FROM system.tables WHERE name = 'one') AS t
JOIN system.databases AS db ON t.database = db.name; JOIN system.databases AS db ON t.database = db.name;
--SELECT count(db.name) SELECT count(db.name)
-- FROM system.tables AS t FROM system.tables AS t
-- JOIN system.databases AS db ON t.database = db.name JOIN system.databases AS db ON t.database = db.name
-- WHERE t.name = 'one'; WHERE t.name = 'one';
SELECT count() SELECT count()
FROM system.tables AS t FROM system.tables AS t

View File

@ -1,3 +1,7 @@
1 1 1 2 1 1 1 2
1 2 1 2 1 2 1 2
2 3 0 0 2 3 0 0
-
1 1 1 2
1 2 1 2
2 3 0 0

View File

@ -8,7 +8,8 @@ INSERT INTO test.a1 VALUES (1, 1), (1, 2), (2, 3);
INSERT INTO test.a2 VALUES (1, 2), (1, 3), (1, 4); INSERT INTO test.a2 VALUES (1, 2), (1, 3), (1, 4);
SELECT * FROM test.a1 as a left JOIN test.a2 as b on a.a=b.a ORDER BY b SETTINGS join_default_strictness='ANY'; SELECT * FROM test.a1 as a left JOIN test.a2 as b on a.a=b.a ORDER BY b SETTINGS join_default_strictness='ANY';
SELECT '-';
SELECT a1.*, a2.* FROM test.a1 ANY LEFT JOIN test.a2 USING a ORDER BY b;
DROP TABLE IF EXISTS test.a1; DROP TABLE IF EXISTS test.a1;
DROP TABLE IF EXISTS test.a2; DROP TABLE IF EXISTS test.a2;

View File

@ -0,0 +1 @@
2019-01-14 1 ['aaa','aaa','bbb','ccc']

View File

@ -0,0 +1,33 @@
SET allow_experimental_low_cardinality_type = 1;
DROP TABLE IF EXISTS test.table1;
DROP TABLE IF EXISTS test.table2;
CREATE TABLE test.table1
(
dt Date,
id Int32,
arr Array(LowCardinality(String))
) ENGINE = MergeTree PARTITION BY toMonday(dt)
ORDER BY (dt, id) SETTINGS index_granularity = 8192;
CREATE TABLE test.table2
(
dt Date,
id Int32,
arr Array(LowCardinality(String))
) ENGINE = MergeTree PARTITION BY toMonday(dt)
ORDER BY (dt, id) SETTINGS index_granularity = 8192;
insert into test.table1 (dt, id, arr) values ('2019-01-14', 1, ['aaa']);
insert into test.table2 (dt, id, arr) values ('2019-01-14', 1, ['aaa','bbb','ccc']);
select dt, id, arraySort(groupArrayArray(arr))
from (
select dt, id, arr from test.table1
where dt = '2019-01-14' and id = 1
UNION ALL
select dt, id, arr from test.table2
where dt = '2019-01-14' and id = 1
)
group by dt, id;

View File

@ -110,7 +110,7 @@
<table>query_log</table> <table>query_log</table>
<flush_interval_milliseconds>7500</flush_interval_milliseconds> <flush_interval_milliseconds>7500</flush_interval_milliseconds>
</query_log> </query_log>
<dictionaries_config>*_dictionary.xml</dictionaries_config> <dictionaries_config>dictionaries/dictionary_*.xml</dictionaries_config>
<compression incl="clickhouse_compression"> <compression incl="clickhouse_compression">
</compression> </compression>
<distributed_ddl> <distributed_ddl>

View File

@ -8,22 +8,22 @@
# Short-Description: Yandex clickhouse-server daemon # Short-Description: Yandex clickhouse-server daemon
### END INIT INFO ### END INIT INFO
CLICKHOUSE_USER=clickhouse CLICKHOUSE_USER=clickhouse
CLICKHOUSE_GROUP=${CLICKHOUSE_USER} CLICKHOUSE_GROUP=${CLICKHOUSE_USER}
SHELL=/bin/bash SHELL=/bin/bash
PROGRAM=clickhouse-server PROGRAM=clickhouse-server
GENERIC_PROGRAM=clickhouse CLICKHOUSE_GENERIC_PROGRAM=clickhouse
CLICKHOUSE_PROGRAM_ENV="" CLICKHOUSE_PROGRAM_ENV=""
EXTRACT_FROM_CONFIG=${GENERIC_PROGRAM}-extract-from-config EXTRACT_FROM_CONFIG=${CLICKHOUSE_GENERIC_PROGRAM}-extract-from-config
SYSCONFDIR=/etc/$PROGRAM CLICKHOUSE_CONFDIR=/etc/$PROGRAM
CLICKHOUSE_LOGDIR=/var/log/clickhouse-server CLICKHOUSE_LOGDIR=/var/log/clickhouse-server
CLICKHOUSE_LOGDIR_USER=root CLICKHOUSE_LOGDIR_USER=root
CLICKHOUSE_DATADIR_OLD=/opt/clickhouse CLICKHOUSE_DATADIR_OLD=/opt/clickhouse
CLICKHOUSE_DATADIR=/var/lib/clickhouse
LOCALSTATEDIR=/var/lock LOCALSTATEDIR=/var/lock
BINDIR=/usr/bin CLICKHOUSE_BINDIR=/usr/bin
CLICKHOUSE_CRONFILE=/etc/cron.d/clickhouse-server CLICKHOUSE_CRONFILE=/etc/cron.d/clickhouse-server
CLICKHOUSE_CONFIG=$SYSCONFDIR/config.xml CLICKHOUSE_CONFIG=$CLICKHOUSE_CONFDIR/config.xml
LOCKFILE=$LOCALSTATEDIR/$PROGRAM LOCKFILE=$LOCALSTATEDIR/$PROGRAM
RETVAL=0 RETVAL=0
@ -92,22 +92,22 @@ die()
# Check that configuration file is Ok. # Check that configuration file is Ok.
check_config() check_config()
{ {
if [ -x "$BINDIR/$EXTRACT_FROM_CONFIG" ]; then if [ -x "$CLICKHOUSE_BINDIR/$EXTRACT_FROM_CONFIG" ]; then
su -s $SHELL ${CLICKHOUSE_USER} -c "$BINDIR/$EXTRACT_FROM_CONFIG --config-file=\"$CLICKHOUSE_CONFIG\" --key=path" >/dev/null || die "Configuration file ${CLICKHOUSE_CONFIG} doesn't parse successfully. Won't restart server. You may use forcerestart if you are sure."; su -s $SHELL ${CLICKHOUSE_USER} -c "$CLICKHOUSE_BINDIR/$EXTRACT_FROM_CONFIG --config-file=\"$CLICKHOUSE_CONFIG\" --key=path" >/dev/null || die "Configuration file ${CLICKHOUSE_CONFIG} doesn't parse successfully. Won't restart server. You may use forcerestart if you are sure.";
fi fi
} }
initdb() initdb()
{ {
if [ -x "$BINDIR/$EXTRACT_FROM_CONFIG" ]; then if [ -x "$CLICKHOUSE_BINDIR/$EXTRACT_FROM_CONFIG" ]; then
CLICKHOUSE_DATADIR_FROM_CONFIG=$(su -s $SHELL ${CLICKHOUSE_USER} -c "$BINDIR/$EXTRACT_FROM_CONFIG --config-file=\"$CLICKHOUSE_CONFIG\" --key=path") CLICKHOUSE_DATADIR_FROM_CONFIG=$(su -s $SHELL ${CLICKHOUSE_USER} -c "$CLICKHOUSE_BINDIR/$EXTRACT_FROM_CONFIG --config-file=\"$CLICKHOUSE_CONFIG\" --key=path")
if [ "(" "$?" -ne "0" ")" -o "(" -z "${CLICKHOUSE_DATADIR_FROM_CONFIG}" ")" ]; then if [ "(" "$?" -ne "0" ")" -o "(" -z "${CLICKHOUSE_DATADIR_FROM_CONFIG}" ")" ]; then
die "Cannot obtain value of path from config file: ${CLICKHOUSE_CONFIG}"; die "Cannot obtain value of path from config file: ${CLICKHOUSE_CONFIG}";
fi fi
echo "Path to data directory in ${CLICKHOUSE_CONFIG}: ${CLICKHOUSE_DATADIR_FROM_CONFIG}" echo "Path to data directory in ${CLICKHOUSE_CONFIG}: ${CLICKHOUSE_DATADIR_FROM_CONFIG}"
else else
CLICKHOUSE_DATADIR_FROM_CONFIG="/var/lib/clickhouse" CLICKHOUSE_DATADIR_FROM_CONFIG=$CLICKHOUSE_DATADIR
fi fi
if ! getent group ${CLICKHOUSE_USER} >/dev/null; then if ! getent group ${CLICKHOUSE_USER} >/dev/null; then
@ -148,7 +148,7 @@ initdb()
start() start()
{ {
[ -x $BINDIR/$PROGRAM ] || exit 0 [ -x $CLICKHOUSE_BINDIR/$PROGRAM ] || exit 0
local EXIT_STATUS local EXIT_STATUS
EXIT_STATUS=0 EXIT_STATUS=0
@ -165,7 +165,7 @@ start()
if ! is_running; then if ! is_running; then
# Lock should not be held while running child process, so we release the lock. Note: obviously, there is race condition. # Lock should not be held while running child process, so we release the lock. Note: obviously, there is race condition.
# But clickhouse-server has protection from simultaneous runs with same data directory. # But clickhouse-server has protection from simultaneous runs with same data directory.
su -s $SHELL ${CLICKHOUSE_USER} -c "$FLOCK -u 9; $CLICKHOUSE_PROGRAM_ENV exec -a \"$PROGRAM\" \"$BINDIR/$PROGRAM\" --daemon --pid-file=\"$CLICKHOUSE_PIDFILE\" --config-file=\"$CLICKHOUSE_CONFIG\"" su -s $SHELL ${CLICKHOUSE_USER} -c "$FLOCK -u 9; $CLICKHOUSE_PROGRAM_ENV exec -a \"$PROGRAM\" \"$CLICKHOUSE_BINDIR/$PROGRAM\" --daemon --pid-file=\"$CLICKHOUSE_PIDFILE\" --config-file=\"$CLICKHOUSE_CONFIG\""
EXIT_STATUS=$? EXIT_STATUS=$?
if [ $EXIT_STATUS -ne 0 ]; then if [ $EXIT_STATUS -ne 0 ]; then
break break

View File

@ -8,6 +8,9 @@ CLICKHOUSE_DATADIR=${CLICKHOUSE_DATADIR=/var/lib/clickhouse}
CLICKHOUSE_LOGDIR=${CLICKHOUSE_LOGDIR=/var/log/clickhouse-server} CLICKHOUSE_LOGDIR=${CLICKHOUSE_LOGDIR=/var/log/clickhouse-server}
CLICKHOUSE_BINDIR=${CLICKHOUSE_BINDIR=/usr/bin} CLICKHOUSE_BINDIR=${CLICKHOUSE_BINDIR=/usr/bin}
CLICKHOUSE_GENERIC_PROGRAM=${CLICKHOUSE_GENERIC_PROGRAM=clickhouse} CLICKHOUSE_GENERIC_PROGRAM=${CLICKHOUSE_GENERIC_PROGRAM=clickhouse}
EXTRACT_FROM_CONFIG=${CLICKHOUSE_GENERIC_PROGRAM}-extract-from-config
CLICKHOUSE_CONFIG=$CLICKHOUSE_CONFDIR/config.xml
OS=${OS=`lsb_release -is 2>/dev/null || uname -s ||:`} OS=${OS=`lsb_release -is 2>/dev/null || uname -s ||:`}
@ -68,18 +71,23 @@ Please fix this and reinstall this package." >&2
exit 1 exit 1
fi fi
if [ -x "$CLICKHOUSE_BINDIR/$EXTRACT_FROM_CONFIG" ]; then
CLICKHOUSE_DATADIR_FROM_CONFIG=$(su -s $SHELL ${CLICKHOUSE_USER} -c "$CLICKHOUSE_BINDIR/$EXTRACT_FROM_CONFIG --config-file=\"$CLICKHOUSE_CONFIG\" --key=path")
echo "Path to data directory in ${CLICKHOUSE_CONFIG}: ${CLICKHOUSE_DATADIR_FROM_CONFIG}"
fi
CLICKHOUSE_DATADIR_FROM_CONFIG=${CLICKHOUSE_DATADIR_FROM_CONFIG=$CLICKHOUSE_DATADIR}
if [ ! -d ${CLICKHOUSE_DATADIR} ]; then if [ ! -d ${CLICKHOUSE_DATADIR_FROM_CONFIG} ]; then
mkdir -p ${CLICKHOUSE_DATADIR} mkdir -p ${CLICKHOUSE_DATADIR_FROM_CONFIG}
chown ${CLICKHOUSE_USER}:${CLICKHOUSE_GROUP} ${CLICKHOUSE_DATADIR} chown ${CLICKHOUSE_USER}:${CLICKHOUSE_GROUP} ${CLICKHOUSE_DATADIR_FROM_CONFIG}
chmod 700 ${CLICKHOUSE_DATADIR} chmod 700 ${CLICKHOUSE_DATADIR_FROM_CONFIG}
fi fi
if [ -d ${CLICKHOUSE_CONFDIR} ]; then if [ -d ${CLICKHOUSE_CONFDIR} ]; then
rm -fv ${CLICKHOUSE_CONFDIR}/*-preprocessed.xml ||: rm -fv ${CLICKHOUSE_CONFDIR}/*-preprocessed.xml ||:
fi fi
[ -e ${CLICKHOUSE_CONFDIR}/preprocessed ] || ln -s ${CLICKHOUSE_DATADIR}/preprocessed_configs ${CLICKHOUSE_CONFDIR}/preprocessed ||: [ -e ${CLICKHOUSE_CONFDIR}/preprocessed ] || ln -s ${CLICKHOUSE_DATADIR_FROM_CONFIG}/preprocessed_configs ${CLICKHOUSE_CONFDIR}/preprocessed ||:
if [ ! -d ${CLICKHOUSE_LOGDIR} ]; then if [ ! -d ${CLICKHOUSE_LOGDIR} ]; then
mkdir -p ${CLICKHOUSE_LOGDIR} mkdir -p ${CLICKHOUSE_LOGDIR}
@ -108,7 +116,7 @@ Please fix this and reinstall this package." >&2
|| echo "Cannot set 'net_admin' or 'ipc_lock' capability for clickhouse binary. This is optional. Taskstats accounting will be disabled. To enable taskstats accounting you may add the required capability later manually." || echo "Cannot set 'net_admin' or 'ipc_lock' capability for clickhouse binary. This is optional. Taskstats accounting will be disabled. To enable taskstats accounting you may add the required capability later manually."
# Clean old dynamic compilation results # Clean old dynamic compilation results
if [ -d "${CLICKHOUSE_DATADIR}/build" ]; then if [ -d "${CLICKHOUSE_DATADIR_FROM_CONFIG}/build" ]; then
rm -f ${CLICKHOUSE_DATADIR}/build/*.cpp ${CLICKHOUSE_DATADIR}/build/*.so ||: rm -f ${CLICKHOUSE_DATADIR_FROM_CONFIG}/build/*.cpp ${CLICKHOUSE_DATADIR_FROM_CONFIG}/build/*.so ||:
fi fi
fi fi

View File

@ -49,7 +49,7 @@ if [ "${TEST_CONNECT}" ]; then
echo "<yandex><tcp_port>${CLICKHOUSE_PORT_TCP}</tcp_port><tcp_port_secure>${CLICKHOUSE_PORT_TCP_SECURE}</tcp_port_secure>${CLICKHOUSE_SSL_CONFIG}</yandex>" > /etc/clickhouse-client/config.xml echo "<yandex><tcp_port>${CLICKHOUSE_PORT_TCP}</tcp_port><tcp_port_secure>${CLICKHOUSE_PORT_TCP_SECURE}</tcp_port_secure>${CLICKHOUSE_SSL_CONFIG}</yandex>" > /etc/clickhouse-client/config.xml
openssl dhparam -out /etc/clickhouse-server/dhparam.pem 256 openssl dhparam -out /etc/clickhouse-server/dhparam.pem 256
openssl req -subj "/CN=localhost" -new -newkey rsa:2048 -days 365 -nodes -x509 -keyout /etc/clickhouse-server/server.key -out /etc/clickhouse-server/server.crt openssl req -subj "/CN=localhost" -new -newkey rsa:2048 -days 365 -nodes -x509 -keyout /etc/clickhouse-server/server.key -out /etc/clickhouse-server/server.crt
chmod a+r /etc/clickhouse-server/* /etc/clickhouse-client/* ||: chmod -f a+r /etc/clickhouse-server/* /etc/clickhouse-client/* ||:
CLIENT_ADD+="--secure --port ${CLICKHOUSE_PORT_TCP_SECURE}" CLIENT_ADD+="--secure --port ${CLICKHOUSE_PORT_TCP_SECURE}"
else else
CLIENT_ADD+="--port ${CLICKHOUSE_PORT_TCP}" CLIENT_ADD+="--port ${CLICKHOUSE_PORT_TCP}"
@ -68,6 +68,7 @@ if [ "${TEST_CONNECT}" ]; then
service clickhouse-server start service clickhouse-server start
sleep ${TEST_SERVER_STARTUP_WAIT:=5} sleep ${TEST_SERVER_STARTUP_WAIT:=5}
service clickhouse-server status
# TODO: remove me or make only on error: # TODO: remove me or make only on error:
tail -n100 /var/log/clickhouse-server/*.log ||: tail -n100 /var/log/clickhouse-server/*.log ||:

View File

@ -1,7 +1,6 @@
# How to Build ClickHouse on Mac OS X # How to Build ClickHouse on Mac OS X
Build should work on Mac OS X 10.12. If you're using earlier version, you can try to build ClickHouse using Gentoo Prefix and clang sl in this instruction. Build should work on Mac OS X 10.12.
With appropriate changes, it should also work on any other Linux distribution.
## Install Homebrew ## Install Homebrew
@ -12,7 +11,7 @@ With appropriate changes, it should also work on any other Linux distribution.
## Install Required Compilers, Tools, and Libraries ## Install Required Compilers, Tools, and Libraries
```bash ```bash
brew install cmake ninja gcc icu4c mariadb-connector-c openssl libtool gettext readline brew install cmake ninja gcc icu4c openssl libtool gettext readline
``` ```
## Checkout ClickHouse Sources ## Checkout ClickHouse Sources

View File

@ -323,7 +323,7 @@ Outputs data as separate JSON objects for each row (newline delimited JSON).
Unlike the JSON format, there is no substitution of invalid UTF-8 sequences. Any set of bytes can be output in the rows. This is necessary so that data can be formatted without losing any information. Values are escaped in the same way as for JSON. Unlike the JSON format, there is no substitution of invalid UTF-8 sequences. Any set of bytes can be output in the rows. This is necessary so that data can be formatted without losing any information. Values are escaped in the same way as for JSON.
For parsing, any order is supported for the values of different columns. It is acceptable for some values to be omitted; they are treated as equal to their default values. In this case, zeros and blank rows are used as default values. Complex values that could be specified in the table are not supported as defaults. Whitespace between elements is ignored. If a comma is placed after the objects, it is ignored. Objects don't necessarily have to be separated by new lines. For parsing, any order is supported for the values of different columns. It is acceptable for some values to be omitted; they are treated as equal to their default values. In this case, zeros and blank rows are used as default values. Complex values that could be specified in the table are not supported as defaults, but they can be turned on with the option `insert_sample_with_metadata=1`. Whitespace between elements is ignored. If a comma is placed after the objects, it is ignored. Objects don't necessarily have to be separated by new lines.
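As an illustration, here is a minimal sketch of parsing with an omitted field; the table name and data are hypothetical:

```sql
CREATE TABLE visits (user String, age UInt8) ENGINE = Memory;

-- The second object omits "age", so it is parsed as the default value for its type (0).
-- Whitespace between objects and a comma after an object are ignored.
INSERT INTO visits FORMAT JSONEachRow {"user": "alice", "age": 25} {"user": "bob"}
```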
## Native {#native} ## Native {#native}

View File

@ -22,6 +22,7 @@
- Configuration management - Configuration management
- [puppet](https://puppet.com) - [puppet](https://puppet.com)
- [innogames/clickhouse](https://forge.puppet.com/innogames/clickhouse) - [innogames/clickhouse](https://forge.puppet.com/innogames/clickhouse)
- [mfedotov/clickhouse](https://forge.puppet.com/mfedotov/clickhouse)
- Monitoring - Monitoring
- [Graphite](https://graphiteapp.org) - [Graphite](https://graphiteapp.org)
- [graphouse](https://github.com/yandex/graphouse) - [graphouse](https://github.com/yandex/graphouse)

View File

@ -262,12 +262,12 @@ Useful for breaking away from a specific network interface.
## keep_alive_timeout ## keep_alive_timeout
The number of seconds that ClickHouse waits for incoming requests before closing the connection. Defaults to 10 seconds The number of seconds that ClickHouse waits for incoming requests before closing the connection. Defaults to 3 seconds.
**Example** **Example**
```xml ```xml
<keep_alive_timeout>10</keep_alive_timeout> <keep_alive_timeout>3</keep_alive_timeout>
``` ```
@ -326,8 +326,7 @@ Keys:
- user_syslog — Required setting if you want to write to the syslog. - user_syslog — Required setting if you want to write to the syslog.
- address — The host[:port] of syslogd. If omitted, the local daemon is used. - address — The host[:port] of syslogd. If omitted, the local daemon is used.
- hostname — Optional. The name of the host that logs are sent from. - hostname — Optional. The name of the host that logs are sent from.
- facility — [The syslog facility keyword](https://en.wikipedia.org/wiki/Syslog#Facility) - facility — [The syslog facility keyword](https://en.wikipedia.org/wiki/Syslog#Facility) in uppercase letters with the "LOG_" prefix: (``LOG_USER``, ``LOG_DAEMON``, ``LOG_LOCAL3``, and so on).
in uppercase letters with the "LOG_" prefix: (``LOG_USER``, ``LOG_DAEMON``, ``LOG_LOCAL3``, and so on).
Default value: ``LOG_USER`` if ``address`` is specified, ``LOG_DAEMON otherwise.`` Default value: ``LOG_USER`` if ``address`` is specified, ``LOG_DAEMON otherwise.``
- format Message format. Possible values: ``bsd`` and ``syslog.`` - format Message format. Possible values: ``bsd`` and ``syslog.``

View File

@ -144,7 +144,7 @@ At this time, it isn't checked during parsing, but only after parsing the query.
## max_ast_elements ## max_ast_elements
Maximum number of elements in a query syntactic tree. If exceeded, an exception is thrown. Maximum number of elements in a query syntactic tree. If exceeded, an exception is thrown.
In the same way as the previous setting, it is checked only after parsing the query. By default, 10,000. In the same way as the previous setting, it is checked only after parsing the query. By default, 50,000.
## max_rows_in_set ## max_rows_in_set

View File

@ -81,6 +81,9 @@ If an error occurred while reading rows but the error counter is still less than
If `input_format_allow_errors_ratio` is exceeded, ClickHouse throws an exception. If `input_format_allow_errors_ratio` is exceeded, ClickHouse throws an exception.
## insert_sample_with_metadata
For INSERT queries, specifies whether the server should send metadata about column defaults to the client. The client uses this metadata to calculate default expressions. Disabled by default.
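A minimal usage sketch (the table is hypothetical; the `ts` column has a non-trivial DEFAULT expression that is only filled in when the setting is enabled):

```sql
CREATE TABLE events (id UInt64, ts DateTime DEFAULT now()) ENGINE = Memory;

SET insert_sample_with_metadata = 1;

-- "ts" is omitted below; with the setting enabled, the server-side DEFAULT now()
-- is applied instead of the zero value of the DateTime type.
INSERT INTO events FORMAT JSONEachRow {"id": 1}
```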
## join_default_strictness ## join_default_strictness
@ -108,7 +111,7 @@ Blocks the size of `max_block_size` are not always loaded from the table. If it
Used for the same purpose as `max_block_size`, but it sets the recommended block size in bytes by adapting it to the number of rows in the block. Used for the same purpose as `max_block_size`, but it sets the recommended block size in bytes by adapting it to the number of rows in the block.
However, the block size cannot be more than `max_block_size` rows. However, the block size cannot be more than `max_block_size` rows.
Disabled by default (set to 0). It only works when reading from MergeTree engines. By default: 1,000,000. It only works when reading from MergeTree engines.
## merge_tree_uniform_read_distribution {#setting-merge_tree_uniform_read_distribution} ## merge_tree_uniform_read_distribution {#setting-merge_tree_uniform_read_distribution}
@ -189,7 +192,7 @@ Disables lagging replicas for distributed queries. See "[Replication](../../oper
Sets the time in seconds. If a replica lags more than the set value, this replica is not used. Sets the time in seconds. If a replica lags more than the set value, this replica is not used.
Default value: 0 (off). Default value: 300.
Used when performing `SELECT` from a distributed table that points to replicated tables. Used when performing `SELECT` from a distributed table that points to replicated tables.
@ -202,7 +205,7 @@ The maximum number of query processing threads
This parameter applies to threads that perform the same stages of the query processing pipeline in parallel. This parameter applies to threads that perform the same stages of the query processing pipeline in parallel.
For example, if reading from a table, evaluating expressions with functions, filtering with WHERE and pre-aggregating for GROUP BY can all be done in parallel using at least 'max_threads' number of threads, then 'max_threads' are used. For example, if reading from a table, evaluating expressions with functions, filtering with WHERE and pre-aggregating for GROUP BY can all be done in parallel using at least 'max_threads' number of threads, then 'max_threads' are used.
By default, 8. By default, 2.
If less than one SELECT query is normally run on a server at a time, set this parameter to a value slightly less than the actual number of processor cores. If less than one SELECT query is normally run on a server at a time, set this parameter to a value slightly less than the actual number of processor cores.
@ -243,11 +246,7 @@ The interval in microseconds for checking whether request execution has been can
By default, 100,000 (check for canceling and send progress ten times per second). By default, 100,000 (check for canceling and send progress ten times per second).
## connect_timeout ## connect_timeout, receive_timeout, send_timeout
## receive_timeout
## send_timeout
Timeouts in seconds on the socket used for communicating with the client. Timeouts in seconds on the socket used for communicating with the client.
@ -263,7 +262,7 @@ By default, 10.
The maximum number of simultaneous connections with remote servers for distributed processing of a single query to a single Distributed table. We recommend setting a value no less than the number of servers in the cluster. The maximum number of simultaneous connections with remote servers for distributed processing of a single query to a single Distributed table. We recommend setting a value no less than the number of servers in the cluster.
By default, 100. By default, 1024.
The following parameters are only used when creating Distributed tables (and when launching a server), so there is no reason to change them at runtime. The following parameters are only used when creating Distributed tables (and when launching a server), so there is no reason to change them at runtime.
@ -271,7 +270,7 @@ The following parameters are only used when creating Distributed tables (and whe
The maximum number of simultaneous connections with remote servers for distributed processing of all queries to a single Distributed table. We recommend setting a value no less than the number of servers in the cluster. The maximum number of simultaneous connections with remote servers for distributed processing of all queries to a single Distributed table. We recommend setting a value no less than the number of servers in the cluster.
By default, 128. By default, 1024.
## connect_timeout_with_failover_ms ## connect_timeout_with_failover_ms
@ -291,10 +290,9 @@ By default, 3.
Whether to count extreme values (the minimums and maximums in columns of a query result). Accepts 0 or 1. By default, 0 (disabled). Whether to count extreme values (the minimums and maximums in columns of a query result). Accepts 0 or 1. By default, 0 (disabled).
For more information, see the section "Extreme values". For more information, see the section "Extreme values".
## use_uncompressed_cache {#setting-use_uncompressed_cache} ## use_uncompressed_cache {#setting-use_uncompressed_cache}
Whether to use a cache of uncompressed blocks. Accepts 0 or 1. By default, 0 (disabled). Whether to use a cache of uncompressed blocks. Accepts 0 or 1. By default, 1 (enabled).
The uncompressed cache (only for tables in the MergeTree family) allows significantly reducing latency and increasing throughput when working with a large number of short queries. Enable this setting for users who send frequent short requests. Also pay attention to the [uncompressed_cache_size](../server_settings/settings.md#server-settings-uncompressed_cache_size) configuration parameter (only set in the config file) the size of uncompressed cache blocks. By default, it is 8 GiB. The uncompressed cache is filled in as needed; the least-used data is automatically deleted. The uncompressed cache (only for tables in the MergeTree family) allows significantly reducing latency and increasing throughput when working with a large number of short queries. Enable this setting for users who send frequent short requests. Also pay attention to the [uncompressed_cache_size](../server_settings/settings.md#server-settings-uncompressed_cache_size) configuration parameter (only set in the config file) the size of uncompressed cache blocks. By default, it is 8 GiB. The uncompressed cache is filled in as needed; the least-used data is automatically deleted.
For queries that read at least a somewhat large volume of data (one million rows or more), the uncompressed cache is disabled automatically in order to save space for truly small queries. So you can keep the 'use_uncompressed_cache' setting always set to 1. For queries that read at least a somewhat large volume of data (one million rows or more), the uncompressed cache is disabled automatically in order to save space for truly small queries. So you can keep the 'use_uncompressed_cache' setting always set to 1.
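A small sketch of enabling the cache for a session (the table and filter here are hypothetical; the cache only pays off for repeated short MergeTree reads, as described above):

```sql
SET use_uncompressed_cache = 1;

-- Running the same short query twice: the second run can read decompressed
-- blocks from the uncompressed cache instead of decompressing them again.
SELECT count() FROM hits WHERE CounterID = 101500;
SELECT count() FROM hits WHERE CounterID = 101500;
```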
@ -355,16 +353,9 @@ See the section "WITH TOTALS modifier".
## totals_auto_threshold ## totals_auto_threshold
The threshold for ` totals_mode = 'auto'`. The threshold for `totals_mode = 'auto'`.
See the section "WITH TOTALS modifier". See the section "WITH TOTALS modifier".
## default_sample
Floating-point number from 0 to 1. By default, 1.
Allows you to set the default sampling ratio for all SELECT queries.
(For tables that do not support sampling, it throws an exception.)
If set to 1, sampling is not performed by default.
## max_parallel_replicas
The maximum number of replicas for each shard when executing a query.
The character interpreted as a delimiter in the CSV data. By default, the delimiter is `,`.
## join_use_nulls
Affects the behavior of [JOIN](../../query_language/select.md).
With `join_use_nulls=1`, `JOIN` behaves like in standard SQL, i.e. if empty cells appear when merging, the type of the corresponding field is converted to [Nullable](../../data_types/nullable.md#data_type-nullable), and empty cells are filled with [NULL](../../query_language/syntax.md).
## insert_quorum
Enables quorum writes.

A special function. See the section ["ArrayJoin function"](array_join.md#functions_arrayjoin).
## arrayDifference(arr)
Takes an array and returns an array of differences between adjacent elements; the first element of the result is 0. For example:
```sql
SELECT arrayDifference([1, 2, 3, 4])
```
```
┌─arrayDifference([1, 2, 3, 4])─┐
│ [0,1,1,1] │
└───────────────────────────────┘
```
## arrayDistinct(arr)
Takes an array and returns an array containing only the distinct elements. For example:
```sql
SELECT arrayDistinct([1, 2, 2, 3, 1])
```
```
┌─arrayDistinct([1, 2, 2, 3, 1])─┐
│ [1,2,3]                        │
└────────────────────────────────┘
```
## arrayEnumerateDense(arr)
Returns an array of the same size as the source array, where each element is replaced by the number of the distinct value it corresponds to, numbered in order of first appearance (see the example below).
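A minimal query sketch; the result assumes the dense numbering described above:

```sql
SELECT arrayEnumerateDense([10, 20, 10, 30]) AS res
```
```
┌─res───────┐
│ [1,2,1,3] │
└───────────┘
```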
## arrayIntersect(arr)
Takes multiple arrays and returns an array with the elements that are present in all source arrays. For example:
```sql
SELECT
arrayIntersect([1, 2], [1, 3], [2, 3]) AS no_intersect,
arrayIntersect([1, 2], [1, 3], [1, 4]) AS intersect
```
```
┌─no_intersect─┬─intersect─┐
│ [] │ [1] │
└──────────────┴───────────┘
```
## arrayReduce(agg_func, arr1, ...)
Applies an aggregate function to array elements and returns its result. If the aggregate function takes multiple arguments, this function can be applied to multiple arrays of the same size.
arrayReduce('agg_func', arr1, ...) applies the aggregate function `agg_func` to the arrays `arr1, ...`. If multiple arrays are passed, the elements at corresponding positions are passed to the aggregate function as separate arguments. For example: SELECT arrayReduce('max', [1,2,3]) = 3
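A sketch of the multi-array case, using the `sumIf` combinator so that the second array supplies the condition argument:

```sql
SELECT
    arrayReduce('sumIf', [1, 2, 3], [0, 1, 1]) AS conditional_sum, -- 2 + 3 = 5
    arrayReduce('avg', [1, 2, 3]) AS avg_value                     -- 2
```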
## arrayReverse(arr)
Returns an array of the same size as the source array, containing the elements of the source array in reverse order.
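For example:

```sql
SELECT arrayReverse([1, 2, 3]) AS res
```
```
┌─res─────┐
│ [3,2,1] │
└─────────┘
```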
[Original article](https://clickhouse.yandex/docs/en/query_language/functions/array_functions/) <!--hide-->

## bitShiftRight(a, b)
## bitRotateLeft(a, b)
## bitRotateRight(a, b)
## bitTest(a, b)
## bitTestAll(a, b)
## bitTestAny(a, b)
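A combined usage sketch for the `bitTest*` family, assuming bit positions are counted from the least significant bit starting at 0 (so for `5 = 0b101`, bits 0 and 2 are set):

```sql
SELECT
    bitTest(5, 0) AS bit0,          -- 1: bit 0 of 0b101 is set
    bitTest(5, 1) AS bit1,          -- 0: bit 1 is not set
    bitTestAll(5, 0, 2) AS all_set, -- 1: bits 0 and 2 are both set
    bitTestAny(5, 1, 3) AS any_set  -- 0: neither bit 1 nor bit 3 is set
```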
[Original article](https://clickhouse.yandex/docs/en/query_language/functions/bit_functions/) <!--hide-->

Only time zones that differ from UTC by a whole number of hours are supported.
## toTimeZone
Converts a date with time to the specified time zone.
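For example (Asia/Yekaterinburg is UTC+5, so only the rendered value changes, not the underlying moment in time):

```sql
SELECT
    toDateTime('2019-01-01 00:00:00', 'UTC') AS time_utc,
    toTimeZone(time_utc, 'Asia/Yekaterinburg') AS time_yekat
```
```
┌────────────time_utc─┬──────────time_yekat─┐
│ 2019-01-01 00:00:00 │ 2019-01-01 05:00:00 │
└─────────────────────┴─────────────────────┘
```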
## toYear
Converts a date or date with time to a UInt16 number containing the year number (AD).
## toQuarter
Converts a date or date with time to a UInt8 number containing the quarter number.
## toMonth
Converts a date or date with time to a UInt8 number containing the month number (1-12).
## toDayOfYear
Converts a date or date with time to a UInt8 number containing the number of the day of the year (1-366).
## toDayOfMonth
Converts a date or date with time to a UInt8 number containing the number of the day of the month (1-31).
## toDayOfWeek
Converts a date with time to a UInt8 number containing the number of the second in the minute (0-59).
Leap seconds are not accounted for.
## toUnixTimestamp
Converts a date with time to a unix timestamp.
## toMonday
Rounds down a date or date with time to the nearest Monday.
Returns the date.
## toStartOfISOYear
Rounds down a date or date with time to the first day of the ISO year.
Returns the date.
## toStartOfMonth
Rounds down a date or date with time to the first day of the month.
Converts a date with time or date to the number of the year, starting from a certain fixed point in the past.
## toRelativeQuarterNum
Converts a date with time or date to the number of the quarter, starting from a certain fixed point in the past.
## toRelativeMonthNum
Converts a date with time or date to the number of the month, starting from a certain fixed point in the past.
Converts a date with time or date to the number of the second, starting from a certain fixed point in the past.
## toISOYear
Converts a date or date with time to a UInt16 number containing the ISO Year number.
## toISOWeek
Converts a date or date with time to a UInt8 number containing the ISO Week number.
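For example, 2017-01-01 is a Sunday and falls into the last ISO week of 2016:

```sql
SELECT
    toISOYear(toDate('2017-01-01')) AS iso_year,
    toISOWeek(toDate('2017-01-01')) AS iso_week
```
```
┌─iso_year─┬─iso_week─┐
│     2016 │       52 │
└──────────┴──────────┘
```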
## now
Accepts zero arguments and returns the current time at one of the moments of request execution.
Rounds the time to the half hour.
This function is specific to Yandex.Metrica, since half an hour is the minimum amount of time for breaking a session into two sessions if a tracking tag shows a single user's consecutive pageviews that differ in time by strictly more than this amount. This means that tuples (the tag ID, user ID, and time slot) can be used to search for pageviews that are included in the corresponding session.
## toYYYYMM
Converts a date or date with time to a UInt32 number containing the year and month number (YYYY * 100 + MM).
## toYYYYMMDD
Converts a date or date with time to a UInt32 number containing the year, month, and day of month (YYYY * 10000 + MM * 100 + DD).
## toYYYYMMDDhhmmss
Converts a date or date with time to a UInt64 number containing the year, month, day, hour, minute, and second (YYYY * 10000000000 + MM * 100000000 + DD * 1000000 + hh * 10000 + mm * 100 + ss).
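For example:

```sql
SELECT
    toYYYYMM(toDateTime('2019-01-24 12:30:45')) AS yyyymm,
    toYYYYMMDD(toDateTime('2019-01-24 12:30:45')) AS yyyymmdd,
    toYYYYMMDDhhmmss(toDateTime('2019-01-24 12:30:45')) AS full
```
```
┌─yyyymm─┬─yyyymmdd─┬───────────full─┐
│ 201901 │ 20190124 │ 20190124123045 │
└────────┴──────────┴────────────────┘
```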
## addYears, addMonths, addWeeks, addDays, addHours, addMinutes, addSeconds, addQuarters
These functions add a Date/DateTime interval to a Date/DateTime value and return the resulting Date/DateTime. For example:
```sql
WITH
toDate('2018-01-01') AS date,
toDateTime('2018-01-01 00:00:00') AS date_time
SELECT
addYears(date, 1) AS add_years_with_date,
addYears(date_time, 1) AS add_years_with_date_time
```
```
┌─add_years_with_date─┬─add_years_with_date_time─┐
│ 2019-01-01 │ 2019-01-01 00:00:00 │
└─────────────────────┴──────────────────────────┘
```
## subtractYears, subtractMonths, subtractWeeks, subtractDays, subtractHours, subtractMinutes, subtractSeconds, subtractQuarters
These functions subtract a Date/DateTime interval from a Date/DateTime value and return the resulting Date/DateTime. For example:
```sql
WITH
toDate('2019-01-01') AS date,
toDateTime('2019-01-01 00:00:00') AS date_time
SELECT
subtractYears(date, 1) AS subtract_years_with_date,
subtractYears(date_time, 1) AS subtract_years_with_date_time
```
```
┌─subtract_years_with_date─┬─subtract_years_with_date_time─┐
│ 2018-01-01 │ 2018-01-01 00:00:00 │
└──────────────────────────┴───────────────────────────────┘
```
## dateDiff('unit', t1, t2, \[timezone\])
Returns the difference between two times. `t1` and `t2` can be Date or DateTime. If a timezone is specified, it is applied to both arguments; otherwise the timezones of `t1` and `t2` are used. If those timezones differ, the result is unspecified.
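A minimal sketch; here the unit is 'day' and the result is t2 minus t1:

```sql
SELECT dateDiff('day', toDate('2019-01-01'), toDate('2019-02-01')) AS days
```
```
┌─days─┐
│   31 │
└──────┘
```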
## timeSlots(StartTime, Duration\[, Size\])
For a time interval starting at 'StartTime' and continuing for 'Duration' seconds, it returns an array of moments in time, consisting of points from this interval rounded down to the 'Size' in seconds. 'Size' is an optional parameter: a constant UInt32, set to 1800 by default.

## dictGetTOrDefault {#ext_dict_functions_dictGetTOrDefault}
`dictGetTOrDefault('dict_name', 'attr_name', id, default)`
The same as the `dictGetT` functions, but the default value is taken from the function's last argument.

`URLHash(s, N)` Calculates a hash from a string up to the N level in the URL hierarchy, without one of the trailing symbols `/`,`?` or `#` at the end, if present.
Levels are the same as in URLHierarchy. This function is specific to Yandex.Metrica.
## farmHash64
Calculates FarmHash64 from a string.
Accepts a String-type argument. Returns UInt64.
For more information, see the link: [FarmHash64](https://github.com/google/farmhash)
## javaHash
Calculates JavaHash from a string.
Accepts a String-type argument. Returns Int32.
For more information, see the link: [JavaHash](http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/478a4add975b/src/share/classes/java/lang/String.java#l1452)
## hiveHash
Calculates HiveHash from a string.
Accepts a String-type argument. Returns Int32.
Same as for [JavaHash](./hash_functions.md#javaHash), except that the return value is never negative.
## metroHash64
Calculates MetroHash from a string.
Accepts a String-type argument. Returns UInt64.
For more information, see the link: [MetroHash64](http://www.jandrewrogers.com/2015/05/27/metrohash/)
## jumpConsistentHash
Calculates JumpConsistentHash from a UInt64.
Accepts a UInt64-type argument. Returns Int32.
For more information, see the link: [JumpConsistentHash](https://arxiv.org/pdf/1406.2294.pdf)
## murmurHash2_32, murmurHash2_64
Calculates MurmurHash2 from a string.
Accepts a String-type argument. Returns UInt64 or UInt32.
For more information, see the link: [MurmurHash2](https://github.com/aappleby/smhasher)
## murmurHash3_32, murmurHash3_64, murmurHash3_128
Calculates MurmurHash3 from a string.
Accepts a String-type argument. Returns UInt64, UInt32, or FixedString(16).
For more information, see the link: [MurmurHash3](https://github.com/aappleby/smhasher)
## xxHash32, xxHash64
Calculates xxHash from a string.
Accepts a String-type argument. Returns UInt64 or UInt32.
For more information, see the link: [xxHash](http://cyan4973.github.io/xxHash/)
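A usage sketch for the string hash functions above (the concrete hash values are omitted here; they are deterministic for a given input and depend only on the algorithm):

```sql
SELECT
    xxHash32('ClickHouse') AS h32,
    xxHash64('ClickHouse') AS h64,
    farmHash64('ClickHouse') AS farm
```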
[Original article](https://clickhouse.yandex/docs/en/query_language/functions/hash_functions/) <!--hide-->

└──────────────┘
```
### arrayCumSumNonNegative(arr)
Same as arrayCumSum: returns an array of partial sums of the elements in the source array (a running sum). Unlike arrayCumSum, whenever the running sum drops below zero, it is replaced with zero and the subsequent calculation continues from zero. For example:
``` sql
SELECT arrayCumSumNonNegative([1, 1, -4, 1]) AS res
```
```
┌─res───────┐
│ [1,2,0,1] │
└───────────┘
```
### arraySort(\[func,\] arr1, ...)
Returns an array as result of sorting the elements of `arr1` in ascending order. If the `func` function is specified, sorting order is determined by the result of the function `func` applied to the elements of array (arrays).

The reverse function of IPv6NumToString. If the IPv6 address has an invalid format, it returns a string of null bytes.
HEX can be uppercase or lowercase.
## IPv4ToIPv6(x)
Takes a UInt32 number. Interprets it as an IPv4 address in big endian. Returns a FixedString(16) value containing the IPv6 address in binary format. Examples:
``` sql
SELECT IPv6NumToString(IPv4ToIPv6(IPv4StringToNum('192.168.0.1'))) AS addr
```
```
┌─addr───────────────┐
│ ::ffff:192.168.0.1 │
└────────────────────┘
```
## cutIPv6(x, bytesToCutForIPv6, bytesToCutForIPv4)
Accepts a FixedString(16) value containing an IPv6 address in binary format. Returns a string with the address in text format, with the specified number of low-order bytes removed: bytesToCutForIPv6 for IPv6 addresses and bytesToCutForIPv4 for IPv4-mapped addresses. For example:
```sql
WITH
IPv6StringToNum('2001:0DB8:AC10:FE01:FEED:BABE:CAFE:F00D') AS ipv6,
IPv4ToIPv6(IPv4StringToNum('192.168.0.1')) AS ipv4
SELECT
cutIPv6(ipv6, 2, 0),
cutIPv6(ipv4, 0, 2)
```
```
┌─cutIPv6(ipv6, 2, 0)─────────────────┬─cutIPv6(ipv4, 0, 2)─┐
│ 2001:db8:ac10:fe01:feed:babe:cafe:0 │ ::ffff:192.168.0.0 │
└─────────────────────────────────────┴─────────────────────┘
```
[Original article](https://clickhouse.yandex/docs/en/query_language/functions/ip_address_functions/) <!--hide-->

Accepts a numeric argument and returns a Float64 number close to the exponent of the argument.
## log(x), ln(x)
Accepts a numeric argument and returns a Float64 number close to the natural logarithm of the argument.
The arc tangent.
## pow(x, y), power(x, y)
Takes two numeric arguments x and y. Returns a Float64 number close to x to the power of y.
## intExp2
Accepts a numeric argument and returns a UInt64 number close to 2 to the power of x.
## intExp10
Accepts a numeric argument and returns a UInt64 number close to 10 to the power of x.
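For example:

```sql
SELECT intExp2(10) AS two_pow_10, intExp10(3) AS ten_pow_3
```
```
┌─two_pow_10─┬─ten_pow_3─┐
│       1024 │      1000 │
└────────────┴───────────┘
```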
[Original article](https://clickhouse.yandex/docs/en/query_language/functions/math_functions/) <!--hide-->

Sleeps 'seconds' seconds on each data block. You can specify an integer or a floating-point number.
## sleepEachRow(seconds)
Sleeps 'seconds' seconds on each row. You can specify an integer or a floating-point number.
## currentDatabase()
Returns the name of the current database.
Returns the version of the server as a string.
## timezone()
Returns the timezone of the server.
## blockNumber
Returns the sequence number of the data block where the row is located.
## rowNumberInBlock
Returns the ordinal number of the row in its data block. The numbering starts over for each data block.
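A sketch of how these can be inspected; the actual block numbers and sizes depend on how the server forms blocks, so the output is not shown:

```sql
SELECT
    blockNumber() AS block,
    rowNumberInBlock() AS row_in_block,
    number
FROM system.numbers
LIMIT 5
```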
## rowNumberInAllBlocks()
Returns the ordinal number of the row in the data block. This function only considers the affected data blocks.
└─────────┴─────────────────────┴───────┘
```
## runningDifferenceStartingWithFirstValue
Same as [runningDifference](./other_functions.md#runningDifference), but the first row returns the value of that row itself, and each subsequent row returns the difference from the previous row.
## MACNumToString(num)
Accepts a UInt64 number. Interprets it as a MAC address in big endian. Returns a string containing the corresponding MAC address in the format AA:BB:CC:DD:EE:FF (colon-separated numbers in hexadecimal form).
└───────────────────────────────┘
```
## filesystemAvailable
Returns the amount of free space remaining on the disk, in bytes. The disk is the one holding the server's configured data path.
## filesystemCapacity
Returns the total capacity of the disk, in bytes. The disk is the one holding the server's configured data path.
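A usage sketch; the output depends on the server's disk, so it is not shown. `formatReadableSize` just renders the byte counts in a human-readable form:

```sql
SELECT
    formatReadableSize(filesystemAvailable()) AS free,
    formatReadableSize(filesystemCapacity()) AS total
```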
## finalizeAggregation
Takes a state of an aggregate function and returns the result of aggregation (the finalized state).
## runningAccumulate
Takes states of an aggregate function and returns a column in which each row contains the result of accumulating these states over the rows of the block, from the first row to the current one.
For example, it can take the state of an aggregate function (such as runningAccumulate(uniqState(UserID))) and, for each row of the block, return the result of merging the states of all previous rows and the current row.
So the result of the function depends on how the data is partitioned into blocks and on the order of data within a block.
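A minimal sketch; assuming the five rows of the subquery arrive in a single block, the running total accumulates as 0, 1, 3, 6, 10:

```sql
SELECT k, runningAccumulate(sum_state) AS running_total
FROM
(
    SELECT
        number AS k,
        sumState(number) AS sum_state
    FROM (SELECT number FROM system.numbers LIMIT 5)
    GROUP BY k
    ORDER BY k
)
```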
## joinGet('join_storage_table_name', 'get_column', join_key)
Gets data from a table of type Join using the specified join key.
## modelEvaluate(model_name, ...)
Evaluates an external model.
Accepts a model name and model arguments. Returns Float64.
## throwIf(x)
Throws an exception if the argument is non-zero.
[Original article](https://clickhouse.yandex/docs/en/query_language/functions/other_functions/) <!--hide-->

Returns a pseudo-random UInt64 number, evenly distributed among all UInt64-type numbers.
Uses a linear congruential generator.
## randConstant
Returns a pseudo-random UInt32 number. The value is the same for all rows within one data block, but may differ between blocks.
[Original article](https://clickhouse.yandex/docs/en/query_language/functions/random_functions/) <!--hide-->

For integer arguments, it makes sense to round with a negative 'N' value (for non-negative 'N', the function doesn't do anything).
If rounding causes overflow (for example, floor(-128, -1)), an implementation-specific result is returned.
## ceil(x\[, N\]), ceiling(x\[, N\])
Returns the smallest round number that is greater than or equal to 'x'. In every other way, it is the same as the 'floor' function (see above).
Accepts a number. If the number is less than 18, it returns 0. Otherwise, it rounds the number down to a number from the set: 18, 25, 35, 45, 55. This function is specific to Yandex.Metrica and used for implementing the report on user age.
## roundDown(num, arr)
Accepts a number and rounds it down to an element of the specified array. If the value is less than the lowest bound, the lowest bound is returned.
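For example:

```sql
SELECT
    roundDown(2.7, [1, 2, 3, 4]) AS inside_range,
    roundDown(0.5, [1, 2, 3, 4]) AS below_lowest
```
```
┌─inside_range─┬─below_lowest─┐
│            2 │            1 │
└──────────────┴──────────────┘
```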
[Original article](https://clickhouse.yandex/docs/en/query_language/functions/rounding_functions/) <!--hide-->

Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it returns some result (it doesn't throw an exception).
The result type is UInt64.
## char_length, CHAR_LENGTH
Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it returns some result (it doesn't throw an exception).
The result type is UInt64.
## character_length, CHARACTER_LENGTH
Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it returns some result (it doesn't throw an exception).
The result type is UInt64.
## lower, lcase
Converts ASCII Latin symbols in a string to lowercase.
## upper, ucase
Converts ASCII Latin symbols in a string to uppercase.
Concatenates the strings listed in the arguments, without a separator.
## concatAssumeInjective(s1, s2, ...)
Same as [concat](./string_functions.md#concat-s1-s2), but you must ensure that `concat(s1, s2, s3) -> s4` is injective; this property is used for optimization of GROUP BY.
## substring(s, offset, length), mid(s, offset, length), substr(s, offset, length)
Returns a substring starting with the byte from the 'offset' index that is 'length' bytes long. Character indexing starts from one (as in standard SQL). The 'offset' and 'length' arguments must be constants.
## tryBase64Decode(s)
Similar to base64Decode, but in case of error an empty string would be returned.
## endsWith(s, suffix)
Returns 1 if the string ends with the specified suffix, otherwise it returns 0.
## startsWith(s, prefix)
Returns 1 if the string starts with the specified prefix, otherwise it returns 0.
## trimLeft(s)
Returns a string with whitespace characters removed from the left side.
## trimRight(s)
Returns a string with whitespace characters removed from the right side.
## trimBoth(s)
Returns a string with whitespace characters removed from both sides.
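A combined example for the functions above:

```sql
SELECT
    startsWith('ClickHouse', 'Click') AS starts,
    endsWith('ClickHouse', 'House') AS ends,
    trimBoth('  ClickHouse  ') AS trimmed
```
```
┌─starts─┬─ends─┬─trimmed────┐
│      1 │    1 │ ClickHouse │
└────────┴──────┴────────────┘
```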
[Original article](https://clickhouse.yandex/docs/en/query_language/functions/string_functions/) <!--hide-->

Replaces the first occurrence, if it exists, of the 'pattern' substring in 'haystack' with the 'replacement' substring.
Hereafter, 'pattern' and 'replacement' must be constants.
## replaceAll(haystack, pattern, replacement), replace(haystack, pattern, replacement)
Replaces all occurrences of the 'pattern' substring in 'haystack' with the 'replacement' substring.
```
## regexpQuoteMeta(s)
The function adds a backslash before some predefined characters in the string.
Predefined characters: '0', '\\', '|', '(', ')', '^', '$', '.', '[', ']', '?', '*', '+', '{', ':', '-'.
This implementation slightly differs from re2::RE2::QuoteMeta. It escapes zero byte as \0 instead of \x00 and it escapes only required characters.
For more information, see the link: [RE2](https://github.com/google/re2/blob/master/re2/re2.cc#L473)
[Original article](https://clickhouse.yandex/docs/en/query_language/functions/string_replace_functions/) <!--hide-->
