diff --git a/CHANGELOG.draft.md b/CHANGELOG.draft.md deleted file mode 100644 index 8b137891791..00000000000 --- a/CHANGELOG.draft.md +++ /dev/null @@ -1 +0,0 @@ - diff --git a/CHANGELOG.md b/CHANGELOG.md index b090e541101..72071111672 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,139 +1,113 @@ ## ClickHouse release 19.1.6, 2019-01-24 -### Backward Incompatible Change -* Removed `ALTER MODIFY PRIMARY KEY` command because it was superseded by the `ALTER MODIFY ORDER BY` command. [#3887](https://github.com/yandex/ClickHouse/pull/3887) ([ztlpn](https://github.com/ztlpn)) - ### New Features -* Add ability to choose per column codecs for storage log and tiny log. [#4111](https://github.com/yandex/ClickHouse/pull/4111) ([alesapin](https://github.com/alesapin)) -* Added functions `filesystemAvailable`, `filesystemFree`, `filesystemCapacity`. [#4097](https://github.com/yandex/ClickHouse/pull/4097) ([bgranvea](https://github.com/bgranvea)) -* Add custom compression codecs. [#3899](https://github.com/yandex/ClickHouse/pull/3899) ([alesapin](https://github.com/alesapin)) + +* Custom per-column compression codecs for tables (see the example below). [#3899](https://github.com/yandex/ClickHouse/pull/3899) [#4111](https://github.com/yandex/ClickHouse/pull/4111) ([alesapin](https://github.com/alesapin), [Winter Zhang](https://github.com/zhang2014), [Anatoly](https://github.com/Sindbag)) +* Added compression codec `Delta`. [#4052](https://github.com/yandex/ClickHouse/pull/4052) ([alesapin](https://github.com/alesapin)) +* Allow changing compression codecs via `ALTER`. [#4054](https://github.com/yandex/ClickHouse/pull/4054) ([alesapin](https://github.com/alesapin)) +* Added functions `left`, `right`, `trim`, `ltrim`, `rtrim`, `timestampadd`, `timestampsub` for SQL standard compatibility. [#3826](https://github.com/yandex/ClickHouse/pull/3826) ([Ivan Blinkov](https://github.com/blinkov)) +* Support for writing to `HDFS` tables and the `hdfs` table function. [#4084](https://github.com/yandex/ClickHouse/pull/4084) ([alesapin](https://github.com/alesapin)) +* Added functions to search for multiple constant strings in a big haystack: `multiPosition`, `multiSearch`, `firstMatch`, also with `-UTF8`, `-CaseInsensitive`, and `-CaseInsensitiveUTF8` variants. [#4053](https://github.com/yandex/ClickHouse/pull/4053) ([Danila Kutenin](https://github.com/danlark1)) +* Pruning of unused shards if the `SELECT` query filters by the sharding key (setting `distributed_optimize_skip_select_on_unused_shards`). [#3851](https://github.com/yandex/ClickHouse/pull/3851) ([Ivan](https://github.com/abyss7)) +* Allow the `Kafka` engine to ignore some number of parsing errors per block. [#4094](https://github.com/yandex/ClickHouse/pull/4094) ([Ivan](https://github.com/abyss7)) +* Added support for evaluation of `CatBoost` multiclass models. Function `modelEvaluate` returns a tuple with per-class raw predictions for multiclass models. `libcatboostmodel.so` should be built with [#607](https://github.com/catboost/catboost/pull/607). [#3959](https://github.com/yandex/ClickHouse/pull/3959) ([KochetovNicolai](https://github.com/KochetovNicolai)) +* Added functions `filesystemAvailable`, `filesystemFree`, `filesystemCapacity`. [#4097](https://github.com/yandex/ClickHouse/pull/4097) ([Boris Granveaud](https://github.com/bgranvea)) * Added hashing functions `xxHash64` and `xxHash32`. [#3905](https://github.com/yandex/ClickHouse/pull/3905) ([filimonov](https://github.com/filimonov)) -* Added multiple joins emulation (very experimental). 
[#3946](https://github.com/yandex/ClickHouse/pull/3946) ([4ertus2](https://github.com/4ertus2)) -* Added support for CatBoost multiclass models evaluation. Function `modelEvaluate` returns tuple with per-class raw predictions for multiclass models. `libcatboostmodel.so` should be built with [#607](https://github.com/catboost/catboost/pull/607). [#3959](https://github.com/yandex/ClickHouse/pull/3959) ([KochetovNicolai](https://github.com/KochetovNicolai)) -* Added gccHash function which uses the same hash seed as [gcc](https://github.com/gcc-mirror/gcc/blob/41d6b10e96a1de98e90a7c0378437c3255814b16/libstdc%2B%2B-v3/include/bits/functional_hash.h#L191) [#4000](https://github.com/yandex/ClickHouse/pull/4000) ([sundy-li](https://github.com/sundy-li)) -* Added compression codec delta. [#4052](https://github.com/yandex/ClickHouse/pull/4052) ([alesapin](https://github.com/alesapin)) -* Added multi searcher to search from multiple constant strings from big haystack. Added functions (`multiPosition`, `multiSearch` ,`firstMatch`) * (` `, `UTF8`, `CaseInsensitive`, `CaseInsensitiveUTF8`) [#4053](https://github.com/yandex/ClickHouse/pull/4053) ([danlark1](https://github.com/danlark1)) -* Added ability to alter compression codecs. [#4054](https://github.com/yandex/ClickHouse/pull/4054) ([alesapin](https://github.com/alesapin)) -* Add ability to write data into HDFS and small refactoring. [#4084](https://github.com/yandex/ClickHouse/pull/4084) ([alesapin](https://github.com/alesapin)) -* Removed some redundant objects from compiled expressions cache (optimization). [#4042](https://github.com/yandex/ClickHouse/pull/4042) ([alesapin](https://github.com/alesapin)) -* Added functions `JavaHash`, `HiveHash`. [#3811](https://github.com/yandex/ClickHouse/pull/3811) ([shangshujie365](https://github.com/shangshujie365)) -* Added functions `left`, `right`, `trim`, `ltrim`, `rtrim`, `timestampadd`, `timestampsub`. [#3826](https://github.com/yandex/ClickHouse/pull/3826) ([blinkov](https://github.com/blinkov)) -* Added function `remoteSecure`. Function works as `remote`, but uses secure connection. [#4088](https://github.com/yandex/ClickHouse/pull/4088) ([proller](https://github.com/proller)) +* Added `gccMurmurHash` hashing function (GCC-flavoured Murmur hash) which uses the same hash seed as [gcc](https://github.com/gcc-mirror/gcc/blob/41d6b10e96a1de98e90a7c0378437c3255814b16/libstdc%2B%2B-v3/include/bits/functional_hash.h#L191). [#4000](https://github.com/yandex/ClickHouse/pull/4000) ([sundyli](https://github.com/sundy-li)) +* Added hashing functions `javaHash`, `hiveHash`. [#3811](https://github.com/yandex/ClickHouse/pull/3811) ([shangshujie365](https://github.com/shangshujie365)) +* Added table function `remoteSecure`. Works as `remote`, but uses a secure connection. [#4088](https://github.com/yandex/ClickHouse/pull/4088) ([proller](https://github.com/proller)) + + +### Experimental Features + +* Added multiple JOINs emulation (`allow_experimental_multiple_joins_emulation` setting). [#3946](https://github.com/yandex/ClickHouse/pull/3946) ([Artem Zuikov](https://github.com/4ertus2))
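To make the codec-related entries above concrete, here is a minimal sketch of the new syntax. The table and column names are illustrative only (not taken from the release notes), and the exact set of available codecs and their arguments should be checked against the documentation for this release:

```sql
-- Per-column compression codecs, including the new Delta codec.
-- Codecs can be chained into a pipeline: Delta preprocesses the data,
-- LZ4 then compresses the result.
CREATE TABLE codec_demo
(
    dt DateTime,
    value UInt64 CODEC(Delta, LZ4)
)
ENGINE = MergeTree
ORDER BY dt;

-- Changing a column's codec afterwards, per the new ALTER support:
ALTER TABLE codec_demo MODIFY COLUMN value UInt64 CODEC(ZSTD);
```

The `Delta` codec stores differences between consecutive values, which typically makes monotonic sequences far more compressible by the general-purpose codec that follows it.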
+ + +### Bug Fixes + +* Make the `compiled_expression_cache_size` setting limited by default to lower memory consumption. [#4041](https://github.com/yandex/ClickHouse/pull/4041) ([alesapin](https://github.com/alesapin)) +* Fix a bug that led to hangups in threads that perform ALTERs of Replicated tables and in the thread that updates configuration from ZooKeeper. [#2947](https://github.com/yandex/ClickHouse/issues/2947) [#3891](https://github.com/yandex/ClickHouse/issues/3891) [#3934](https://github.com/yandex/ClickHouse/pull/3934) ([Alex Zatelepin](https://github.com/ztlpn)) +* Fixed a race condition when executing a distributed ALTER task. The race condition led to more than one replica trying to execute the task and all replicas except one failing with a ZooKeeper error. [#3904](https://github.com/yandex/ClickHouse/pull/3904) ([Alex Zatelepin](https://github.com/ztlpn)) +* Fix a bug when `from_zk` config elements weren't refreshed after a request to ZooKeeper timed out. [#2947](https://github.com/yandex/ClickHouse/issues/2947) [#3947](https://github.com/yandex/ClickHouse/pull/3947) ([Alex Zatelepin](https://github.com/ztlpn)) +* Fix bug with wrong prefix for IPv4 subnet masks. [#3945](https://github.com/yandex/ClickHouse/pull/3945) ([alesapin](https://github.com/alesapin)) +* Fixed crash (`std::terminate`) in rare cases when a new thread cannot be created due to exhausted resources. [#3956](https://github.com/yandex/ClickHouse/pull/3956) ([alexey-milovidov](https://github.com/alexey-milovidov)) +* Fixed a bug in `remote` table function execution when wrong restrictions were used in `getStructureOfRemoteTable`. [#4009](https://github.com/yandex/ClickHouse/pull/4009) ([alesapin](https://github.com/alesapin)) +* Fix a leak of netlink sockets. They were placed in a pool where they were never deleted and new sockets were created at the start of a new thread when all current sockets were in use. [#4017](https://github.com/yandex/ClickHouse/pull/4017) ([Alex Zatelepin](https://github.com/ztlpn)) +* Fix bug with closing the `/proc/self/fd` directory before all fds were read from `/proc` after forking the `odbc-bridge` subprocess. [#4120](https://github.com/yandex/ClickHouse/pull/4120) ([alesapin](https://github.com/alesapin)) +* Fixed String-to-UInt monotonic conversion in case a String is used in the primary key. [#3870](https://github.com/yandex/ClickHouse/pull/3870) ([Winter Zhang](https://github.com/zhang2014)) +* Fixed error in calculation of integer conversion function monotonicity. [#3921](https://github.com/yandex/ClickHouse/pull/3921) ([alexey-milovidov](https://github.com/alexey-milovidov)) +* Fixed segfault in `arrayEnumerateUniq`, `arrayEnumerateDense` functions in case of some invalid arguments. [#3909](https://github.com/yandex/ClickHouse/pull/3909) ([alexey-milovidov](https://github.com/alexey-milovidov)) +* Fix UB in StorageMerge. [#3910](https://github.com/yandex/ClickHouse/pull/3910) ([Amos Bird](https://github.com/amosbird)) +* Fixed segfault in functions `addDays`, `subtractDays`. [#3913](https://github.com/yandex/ClickHouse/pull/3913) ([alexey-milovidov](https://github.com/alexey-milovidov)) +* Fixed error: functions `round`, `floor`, `trunc`, `ceil` may return a bogus result when executed on an integer argument with a large negative scale. [#3914](https://github.com/yandex/ClickHouse/pull/3914) ([alexey-milovidov](https://github.com/alexey-milovidov)) +* Fixed a bug introduced by `KILL QUERY SYNC` which led to a core dump. [#3916](https://github.com/yandex/ClickHouse/pull/3916) ([muVulDeePecker](https://github.com/fancyqlx)) +* Fix bug with long delay after an empty replication queue. [#3928](https://github.com/yandex/ClickHouse/pull/3928) [#3932](https://github.com/yandex/ClickHouse/pull/3932) ([alesapin](https://github.com/alesapin)) +* Fixed excessive memory usage in case of inserting into a table with a `LowCardinality` primary key. 
[#3955](https://github.com/yandex/ClickHouse/pull/3955) ([KochetovNicolai](https://github.com/KochetovNicolai)) +* Fixed `LowCardinality` serialization for `Native` format in case of empty arrays. [#3907](https://github.com/yandex/ClickHouse/issues/3907) [#4011](https://github.com/yandex/ClickHouse/pull/4011) ([KochetovNicolai](https://github.com/KochetovNicolai)) +* Fixed incorrect result when using DISTINCT on a single LowCardinality numeric column. [#3895](https://github.com/yandex/ClickHouse/issues/3895) [#4012](https://github.com/yandex/ClickHouse/pull/4012) ([KochetovNicolai](https://github.com/KochetovNicolai)) +* Fixed specialized aggregation with a LowCardinality key (in case the `compile` setting is enabled). [#3886](https://github.com/yandex/ClickHouse/pull/3886) ([KochetovNicolai](https://github.com/KochetovNicolai)) +* Fixed user and password forwarding for replicated table queries. [#3957](https://github.com/yandex/ClickHouse/pull/3957) ([alesapin](https://github.com/alesapin)) ([小路](https://github.com/nicelulu)) +* Fixed a very rare race condition that can happen when listing tables in a Dictionary database while reloading dictionaries. [#3970](https://github.com/yandex/ClickHouse/pull/3970) ([alexey-milovidov](https://github.com/alexey-milovidov)) +* Fixed incorrect result when HAVING was used with ROLLUP or CUBE. [#3756](https://github.com/yandex/ClickHouse/issues/3756) [#3837](https://github.com/yandex/ClickHouse/pull/3837) ([Sam Chou](https://github.com/reflection)) +* Fixed column aliases for queries with `JOIN ON` syntax and distributed tables. [#3980](https://github.com/yandex/ClickHouse/pull/3980) ([Winter Zhang](https://github.com/zhang2014)) +* Fixed error in internal implementation of `quantileTDigest` (found by Artem Vakhrushev). This error never happens in ClickHouse and was relevant only for those who use the ClickHouse codebase as a library directly. [#3935](https://github.com/yandex/ClickHouse/pull/3935) ([alexey-milovidov](https://github.com/alexey-milovidov)) ### Improvements -* Support for IF NOT EXISTS in ALTER TABLE ADD COLUMN statements, and for IF EXISTS in DROP/MODIFY/CLEAR/COMMENT COLUMN. [#3900](https://github.com/yandex/ClickHouse/pull/3900) ([bgranvea](https://github.com/bgranvea)) + +* Support for `IF NOT EXISTS` in `ALTER TABLE ADD COLUMN` statements along with `IF EXISTS` in `DROP/MODIFY/CLEAR/COMMENT COLUMN`. [#3900](https://github.com/yandex/ClickHouse/pull/3900) ([Boris Granveaud](https://github.com/bgranvea)) * Function `parseDateTimeBestEffort`: support for formats `DD.MM.YYYY`, `DD.MM.YY`, `DD-MM-YYYY`, `DD-Mon-YYYY`, `DD/Month/YYYY` and similar (see the example after this section). [#3922](https://github.com/yandex/ClickHouse/pull/3922) ([alexey-milovidov](https://github.com/alexey-milovidov)) -* Add a MergeTree setting `use_minimalistic_part_header_in_zookeeper`. If enabled, Replicated tables will store compact part metadata in a single part znode. This can dramatically reduce ZooKeeper snapshot size (especially if the tables have a lot of columns). Note that after enabling this setting you will not be able to downgrade to a version that doesn't support it. [#3960](https://github.com/yandex/ClickHouse/pull/3960) ([ztlpn](https://github.com/ztlpn)) -* Add an DFA-based implementation for functions `sequenceMatch` and `sequenceCount` in case pattern doesn't contain time. 
[#4004](https://github.com/yandex/ClickHouse/pull/4004) ([ercolanelli-leo](https://github.com/ercolanelli-leo)) -* Changed the way CapnProtoInputStream creates actions in such a way that it now support structures that are jagged. [#4063](https://github.com/yandex/ClickHouse/pull/4063) ([Miniwoffer](https://github.com/Miniwoffer)) -* Better way to collect columns, tables and joins from AST when checking required columns. [#3930](https://github.com/yandex/ClickHouse/pull/3930) ([4ertus2](https://github.com/4ertus2)) -* Zero left padding PODArray so that -1 element is always valid and zeroed. It's used for branchless Offset access. [#3920](https://github.com/yandex/ClickHouse/pull/3920) ([amosbird](https://github.com/amosbird)) -* Performance improvement for int serialization. [#3968](https://github.com/yandex/ClickHouse/pull/3968) ([amosbird](https://github.com/amosbird)) -* Moved debian/ specific entries to debian/.gitignore [#4106](https://github.com/yandex/ClickHouse/pull/4106) ([gerasiov](https://github.com/gerasiov)) -* Decreased the number of connections in case of large number of Distributed tables in a single server. [#3726](https://github.com/yandex/ClickHouse/pull/3726) ([zhang2014](https://github.com/zhang2014)) -* Supported totals row for `WITH TOTALS` query for ODBC driver (ODBCDriver2 format). [#3836](https://github.com/yandex/ClickHouse/pull/3836) ([nightweb](https://github.com/nightweb)) -* Better constant expression folding. Possibility to skip unused shards if SELECT query filters by sharding_key (setting `distributed_optimize_skip_select_on_unused_shards`). [#3851](https://github.com/yandex/ClickHouse/pull/3851) ([abyss7](https://github.com/abyss7)) -* Do not log from odbc-bridge when there is no console. [#3857](https://github.com/yandex/ClickHouse/pull/3857) ([alesapin](https://github.com/alesapin)) -* Forbid using aggregate functions inside scalar subqueries. [#3865](https://github.com/yandex/ClickHouse/pull/3865) ([abyss7](https://github.com/abyss7)) -* Added ability to use Enums as integers inside if function. [#3875](https://github.com/yandex/ClickHouse/pull/3875) ([abyss7](https://github.com/abyss7)) -* Added `low_cardinality_allow_in_native_format` setting. If disabled, do not use `LowCadrinality` type in native format. [#3879](https://github.com/yandex/ClickHouse/pull/3879) ([KochetovNicolai](https://github.com/KochetovNicolai)) -* Removed duplicate code. [#3915](https://github.com/yandex/ClickHouse/pull/3915) ([sergey-v-galtsev](https://github.com/sergey-v-galtsev)) -* Minor improvements in StorageKafka. [#3919](https://github.com/yandex/ClickHouse/pull/3919) ([alexey-milovidov](https://github.com/alexey-milovidov)) -* Automatically disable logs in negative tests. [#3940](https://github.com/yandex/ClickHouse/pull/3940) ([4ertus2](https://github.com/4ertus2)) -* Refactored SyntaxAnalyzer. [#4014](https://github.com/yandex/ClickHouse/pull/4014) ([4ertus2](https://github.com/4ertus2)) -* Reverted jemalloc patch which lead to performance degradation. [#4018](https://github.com/yandex/ClickHouse/pull/4018) ([alexey-milovidov](https://github.com/alexey-milovidov)) -* Refactored QueryNormalizer. Unified column sources for ASTIdentifier and ASTQualifiedAsterisk (were different), removed column duplicates for ASTQualifiedAsterisk sources, cleared asterisks replacement. [#4031](https://github.com/yandex/ClickHouse/pull/4031) ([4ertus2](https://github.com/4ertus2)) -* Refactored code with ASTIdentifier. 
[#4056](https://github.com/yandex/ClickHouse/pull/4056) [#4077](https://github.com/yandex/ClickHouse/pull/4077) [#4087](https://github.com/yandex/ClickHouse/pull/4087) ([4ertus2](https://github.com/4ertus2)) -* Improve error message in `clickhouse-test` script when no ClickHouse binary was found. [#4130](https://github.com/yandex/ClickHouse/pull/4130) ([Miniwoffer](https://github.com/Miniwoffer)) -* Rewrited code to calculate integer conversion function monotonicity. [#3921](https://github.com/yandex/ClickHouse/pull/3921) ([alexey-milovidov](https://github.com/alexey-milovidov)) -* Fixed typos in comments. [#4089](https://github.com/yandex/ClickHouse/pull/4089) ([kvinty](https://github.com/kvinty)) +* `CapnProtoInputStream` now supports jagged structures. [#4063](https://github.com/yandex/ClickHouse/pull/4063) ([Odin Hultgren Van Der Horst](https://github.com/Miniwoffer)) +* Usability improvement: added a check that the server process is started by the data directory's owner. Do not allow starting the server from root if the data belongs to a non-root user. [#3785](https://github.com/yandex/ClickHouse/pull/3785) ([sergey-v-galtsev](https://github.com/sergey-v-galtsev)) +* Better logic of checking required columns during analysis of queries with JOINs. [#3930](https://github.com/yandex/ClickHouse/pull/3930) ([Artem Zuikov](https://github.com/4ertus2)) +* Decreased the number of connections in case of a large number of Distributed tables in a single server. [#3726](https://github.com/yandex/ClickHouse/pull/3726) ([Winter Zhang](https://github.com/zhang2014)) +* Supported totals row for `WITH TOTALS` query for ODBC driver. [#3836](https://github.com/yandex/ClickHouse/pull/3836) ([Maksim Koritckiy](https://github.com/nightweb)) +* Allowed to use `Enum`s as integers inside the `if` function. [#3875](https://github.com/yandex/ClickHouse/pull/3875) ([Ivan](https://github.com/abyss7)) +* Added `low_cardinality_allow_in_native_format` setting. If disabled, do not use the `LowCardinality` type in the `Native` format. [#3879](https://github.com/yandex/ClickHouse/pull/3879) ([KochetovNicolai](https://github.com/KochetovNicolai)) +* Removed some redundant objects from compiled expressions cache to lower memory usage. [#4042](https://github.com/yandex/ClickHouse/pull/4042) ([alesapin](https://github.com/alesapin)) +* Added a check that the `SET send_logs_level = 'value'` query accepts an appropriate value. [#3873](https://github.com/yandex/ClickHouse/pull/3873) ([Sabyanin Maxim](https://github.com/s-mx)) +* Fixed data type check in type conversion functions. [#3896](https://github.com/yandex/ClickHouse/pull/3896) ([Winter Zhang](https://github.com/zhang2014))
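As a quick illustration of the newly supported `parseDateTimeBestEffort` formats listed above (the literal values are examples only):

```sql
SELECT
    parseDateTimeBestEffort('24.01.2019'),       -- DD.MM.YYYY
    parseDateTimeBestEffort('24-Jan-2019'),      -- DD-Mon-YYYY
    parseDateTimeBestEffort('24/January/2019');  -- DD/Month/YYYY
```

All three calls resolve to the same `DateTime` value.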
+ +### Performance Improvements + +* Add a MergeTree setting `use_minimalistic_part_header_in_zookeeper`. If enabled, Replicated tables will store compact part metadata in a single part znode. This can dramatically reduce ZooKeeper snapshot size (especially if the tables have a lot of columns). Note that after enabling this setting you will not be able to downgrade to a version that doesn't support it. See the example below. [#3960](https://github.com/yandex/ClickHouse/pull/3960) ([Alex Zatelepin](https://github.com/ztlpn)) +* Add a DFA-based implementation for functions `sequenceMatch` and `sequenceCount` in case the pattern doesn't contain time. [#4004](https://github.com/yandex/ClickHouse/pull/4004) ([Léo Ercolanelli](https://github.com/ercolanelli-leo)) +* Performance improvement for integer numbers serialization. [#3968](https://github.com/yandex/ClickHouse/pull/3968) ([Amos Bird](https://github.com/amosbird)) +* Zero left padding of PODArray so that the -1 element is always valid and zeroed. It's used for branchless calculation of offsets. [#3920](https://github.com/yandex/ClickHouse/pull/3920) ([Amos Bird](https://github.com/amosbird)) +* Reverted the `jemalloc` version which led to performance degradation. [#4018](https://github.com/yandex/ClickHouse/pull/4018) ([alexey-milovidov](https://github.com/alexey-milovidov)) + +### Backward Incompatible Changes + +* Removed undocumented feature `ALTER MODIFY PRIMARY KEY` because it was superseded by the `ALTER MODIFY ORDER BY` command. [#3887](https://github.com/yandex/ClickHouse/pull/3887) ([Alex Zatelepin](https://github.com/ztlpn)) +* Removed function `shardByHash`. [#3833](https://github.com/yandex/ClickHouse/pull/3833) ([alexey-milovidov](https://github.com/alexey-milovidov)) +* Forbid using scalar subqueries with a result of type `AggregateFunction`. [#3865](https://github.com/yandex/ClickHouse/pull/3865) ([Ivan](https://github.com/abyss7))
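The `use_minimalistic_part_header_in_zookeeper` setting mentioned under Performance Improvements can be enabled per table. A sketch, assuming an otherwise ordinary replicated table (the schema and ZooKeeper path are illustrative, and the no-downgrade caveat above applies):

```sql
CREATE TABLE hits_replica
(
    EventDate Date,
    UserID UInt64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/hits', '{replica}')
ORDER BY (EventDate, UserID)
SETTINGS use_minimalistic_part_header_in_zookeeper = 1;
```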
### Build/Testing/Packaging Improvements -* Added minimal support for powerpc build. [#4132](https://github.com/yandex/ClickHouse/pull/4132) ([danlark1](https://github.com/danlark1)) -* Fixed error when the server cannot start with the `bash: /usr/bin/clickhouse-extract-from-config: Operation not permitted` message within Docker or systemd-nspawn. [#4136](https://github.com/yandex/ClickHouse/pull/4136) ([alexey-milovidov](https://github.com/alexey-milovidov)) -* Updated `mariadb-client` library. Fixed one of issues found by UBSan. [#3924](https://github.com/yandex/ClickHouse/pull/3924) ([alexey-milovidov](https://github.com/alexey-milovidov)) -* Some fixes for UBSan builds. [#3926](https://github.com/yandex/ClickHouse/pull/3926) [#3948](https://github.com/yandex/ClickHouse/pull/3948) ([alexey-milovidov](https://github.com/alexey-milovidov)) -* Move docker images to 18.10 and add compatibility file for glibc >= 2.28 [#3965](https://github.com/yandex/ClickHouse/pull/3965) ([alesapin](https://github.com/alesapin)) -* Add env variable if user don't want to chown directories in server docker image. [#3967](https://github.com/yandex/ClickHouse/pull/3967) ([alesapin](https://github.com/alesapin)) + +* Added support for PowerPC (`ppc64le`) build. [#4132](https://github.com/yandex/ClickHouse/pull/4132) ([Danila Kutenin](https://github.com/danlark1)) * Stateful functional tests are run on a publicly available dataset. [#3969](https://github.com/yandex/ClickHouse/pull/3969) ([alexey-milovidov](https://github.com/alexey-milovidov)) -* Enabled most of the warnings from `-Weverything` in clang. Enabled `-Wpedantic`. [#3986](https://github.com/yandex/ClickHouse/pull/3986) ([alexey-milovidov](https://github.com/alexey-milovidov)) -* Link to libLLVM rather than to individual LLVM libs when USE_STATIC_LIBRARIES is off. [#3989](https://github.com/yandex/ClickHouse/pull/3989) ([orivej](https://github.com/orivej)) -* Added a few more warnings that are available only in clang 8. [#3993](https://github.com/yandex/ClickHouse/pull/3993) ([alexey-milovidov](https://github.com/alexey-milovidov)) +* Fixed error when the server cannot start with the `bash: /usr/bin/clickhouse-extract-from-config: Operation not permitted` message within Docker or systemd-nspawn. [#4136](https://github.com/yandex/ClickHouse/pull/4136) ([alexey-milovidov](https://github.com/alexey-milovidov)) +* Updated `rdkafka` library to v1.0.0-RC5. Used cppkafka instead of the raw C interface. [#4025](https://github.com/yandex/ClickHouse/pull/4025) ([Ivan](https://github.com/abyss7)) +* Updated `mariadb-client` library. Fixed one of the issues found by UBSan. [#3924](https://github.com/yandex/ClickHouse/pull/3924) ([alexey-milovidov](https://github.com/alexey-milovidov)) +* Some fixes for UBSan builds. [#3926](https://github.com/yandex/ClickHouse/pull/3926) [#3021](https://github.com/yandex/ClickHouse/pull/3021) [#3948](https://github.com/yandex/ClickHouse/pull/3948) ([alexey-milovidov](https://github.com/alexey-milovidov)) +* Added per-commit runs of tests with UBSan build. +* Added per-commit runs of PVS-Studio static analyzer. * Fixed bugs found by PVS-Studio. [#4013](https://github.com/yandex/ClickHouse/pull/4013) ([alexey-milovidov](https://github.com/alexey-milovidov)) +* Fixed glibc compatibility issues. [#4100](https://github.com/yandex/ClickHouse/pull/4100) ([alexey-milovidov](https://github.com/alexey-milovidov)) +* Move Docker images to 18.10 and add a compatibility file for glibc >= 2.28. [#3965](https://github.com/yandex/ClickHouse/pull/3965) ([alesapin](https://github.com/alesapin)) +* Added an env variable for the case when the user doesn't want to chown directories in the server Docker image. [#3967](https://github.com/yandex/ClickHouse/pull/3967) ([alesapin](https://github.com/alesapin)) +* Enabled most of the warnings from `-Weverything` in clang. Enabled `-Wpedantic`. [#3986](https://github.com/yandex/ClickHouse/pull/3986) ([alexey-milovidov](https://github.com/alexey-milovidov)) +* Added a few more warnings that are available only in clang 8. [#3993](https://github.com/yandex/ClickHouse/pull/3993) ([alexey-milovidov](https://github.com/alexey-milovidov)) +* Link to `libLLVM` rather than to individual LLVM libs when using shared linking. [#3989](https://github.com/yandex/ClickHouse/pull/3989) ([Orivej Desh](https://github.com/orivej)) * Added sanitizer variables for test images. [#4072](https://github.com/yandex/ClickHouse/pull/4072) ([alesapin](https://github.com/alesapin)) +* `clickhouse-server` Debian package will recommend the `libcap2-bin` package to use the `setcap` tool for setting capabilities. This is optional. [#4093](https://github.com/yandex/ClickHouse/pull/4093) ([alexey-milovidov](https://github.com/alexey-milovidov)) * Improved compilation time, fixed includes. [#3898](https://github.com/yandex/ClickHouse/pull/3898) ([proller](https://github.com/proller)) * Added performance tests for hash functions. [#3918](https://github.com/yandex/ClickHouse/pull/3918) ([filimonov](https://github.com/filimonov)) * Fixed cyclic library dependencies. [#3958](https://github.com/yandex/ClickHouse/pull/3958) ([proller](https://github.com/proller)) * Improved compilation with low available memory. [#4030](https://github.com/yandex/ClickHouse/pull/4030) ([proller](https://github.com/proller)) +* Added test script to reproduce performance degradation in `jemalloc`. [#4036](https://github.com/yandex/ClickHouse/pull/4036) ([alexey-milovidov](https://github.com/alexey-milovidov)) +* Fixed misspellings in comments and string literals under `dbms`. [#4122](https://github.com/yandex/ClickHouse/pull/4122) ([maiha](https://github.com/maiha)) +* Fixed typos in comments. 
[#4089](https://github.com/yandex/ClickHouse/pull/4089) ([Evgenii Pravda](https://github.com/kvinty)) -### Bug Fixes -* Fix bug when in remote table function execution when wrong restrictions were used for in `getStructureOfRemoteTable`. [#4009](https://github.com/yandex/ClickHouse/pull/4009) ([alesapin](https://github.com/alesapin)) -* Fix a leak of netlink sockets. They were placed in a pool where they were never deleted and new sockets were created at the start of a new thread when all current sockets were in use. [#4017](https://github.com/yandex/ClickHouse/pull/4017) ([ztlpn](https://github.com/ztlpn)) -* Regression in master. Fix "Unknown identifier" error in case column names appear in lambdas. [#4115](https://github.com/yandex/ClickHouse/pull/4115) ([4ertus2](https://github.com/4ertus2)) -* Fix bug with closing /proc/self/fd earlier than all fds were read from /proc. [#4120](https://github.com/yandex/ClickHouse/pull/4120) ([alesapin](https://github.com/alesapin)) -* Fixed misspells in **comments** and **string literals** under `dbms`. [#4122](https://github.com/yandex/ClickHouse/pull/4122) ([maiha](https://github.com/maiha)) -* Fixed String to UInt monotonic conversion in case of usage String in primary key. [#3870](https://github.com/yandex/ClickHouse/pull/3870) ([zhang2014](https://github.com/zhang2014)) -* Add checking that 'SET send_logs_level = value' query accept appropriate value. [#3873](https://github.com/yandex/ClickHouse/pull/3873) ([s-mx](https://github.com/s-mx)) -* Fixed a race condition when executing a distributed ALTER task. The race condition led to more than one replica trying to execute the task and all replicas except one failing with a ZooKeeper error. [#3904](https://github.com/yandex/ClickHouse/pull/3904) ([ztlpn](https://github.com/ztlpn)) -* Fixed segfault in `arrayEnumerateUniq`, `arrayEnumerateDense` functions in case of some invalid arguments. [#3909](https://github.com/yandex/ClickHouse/pull/3909) ([alexey-milovidov](https://github.com/alexey-milovidov)) -* Fix UB in StorageMerge. [#3910](https://github.com/yandex/ClickHouse/pull/3910) ([amosbird](https://github.com/amosbird)) -* Fixed segfault in functions `addDays`, `subtractDays`. [#3913](https://github.com/yandex/ClickHouse/pull/3913) ([alexey-milovidov](https://github.com/alexey-milovidov)) -* Fixed error: functions `round`, `floor`, `trunc`, `ceil` may return bogus result when executed on integer argument and large negative scale. [#3914](https://github.com/yandex/ClickHouse/pull/3914) ([alexey-milovidov](https://github.com/alexey-milovidov)) -* Fixed a bug introduced by 'kill query sync' which leads to a core dump. [#3916](https://github.com/yandex/ClickHouse/pull/3916) ([fancyqlx](https://github.com/fancyqlx)) -* Fix bug with long delay after empty replication queue. [#3928](https://github.com/yandex/ClickHouse/pull/3928) ([alesapin](https://github.com/alesapin)) -* Don't do exponential backoff when there is nothing to do for task. [#3932](https://github.com/yandex/ClickHouse/pull/3932) ([alesapin](https://github.com/alesapin)) -* Fix a bug that led to hangups in threads that perform ALTERs of Replicated tables and in the thread that updates configuration from ZooKeeper. #2947 #3891 [#3934](https://github.com/yandex/ClickHouse/pull/3934) ([ztlpn](https://github.com/ztlpn)) -* Fixed error in internal implementation of `quantileTDigest` (found by Artem Vakhrushev). This error never happens in ClickHouse and was relevant only for those who use ClickHouse codebase as a library directly. 
[#3935](https://github.com/yandex/ClickHouse/pull/3935) ([alexey-milovidov](https://github.com/alexey-milovidov)) -* Fix bug with wrong prefix for ipv4 subnet masks. [#3945](https://github.com/yandex/ClickHouse/pull/3945) ([alesapin](https://github.com/alesapin)) -* Fix a bug when `from_zk` config elements weren't refreshed after a request to ZooKeeper timed out. #2947 [#3947](https://github.com/yandex/ClickHouse/pull/3947) ([ztlpn](https://github.com/ztlpn)) -* Fixed dictionary copying at LowCardinality::cloneEmpty() method which lead to excessive memory usage in case of inserting into table with LowCardinality primary key. [#3955](https://github.com/yandex/ClickHouse/pull/3955) ([KochetovNicolai](https://github.com/KochetovNicolai)) -* Fixed crash (`std::terminate`) in rare cases when a new thread cannot be created due to exhausted resources. [#3956](https://github.com/yandex/ClickHouse/pull/3956) ([alexey-milovidov](https://github.com/alexey-milovidov)) -* Fix user and password forwarding for replicated tables queries. [#3957](https://github.com/yandex/ClickHouse/pull/3957) ([alesapin](https://github.com/alesapin)) -* Fixed very rare race condition that can happen when listing tables in Dictionary database while reloading dictionaries. [#3970](https://github.com/yandex/ClickHouse/pull/3970) ([alexey-milovidov](https://github.com/alexey-milovidov)) -* Fixed LowCardinality serialization for Native format in case of empty arrays. #3907 [#4011](https://github.com/yandex/ClickHouse/pull/4011) ([KochetovNicolai](https://github.com/KochetovNicolai)) -* Fixed incorrect result while using distinct by single LowCardinality numeric column. #3895 [#4012](https://github.com/yandex/ClickHouse/pull/4012) ([KochetovNicolai](https://github.com/KochetovNicolai)) -* Make compiled_expression_cache_size setting limited by default. [#4041](https://github.com/yandex/ClickHouse/pull/4041) ([alesapin](https://github.com/alesapin)) -* Fix ubsan bug in compression codecs. [#4069](https://github.com/yandex/ClickHouse/pull/4069) ([alesapin](https://github.com/alesapin)) -* Allow Kafka Engine to ignore some number of parsing errors per block. [#4094](https://github.com/yandex/ClickHouse/pull/4094) ([abyss7](https://github.com/abyss7)) -* Fixed glibc compatibility issues. [#4100](https://github.com/yandex/ClickHouse/pull/4100) ([alexey-milovidov](https://github.com/alexey-milovidov)) -* Fixed issues found by PVS-Studio. [#4103](https://github.com/yandex/ClickHouse/pull/4103) ([alexey-milovidov](https://github.com/alexey-milovidov)) -* Fix a way how to collect array join columns. [#4121](https://github.com/yandex/ClickHouse/pull/4121) ([4ertus2](https://github.com/4ertus2)) -* Fixed incorrect result when HAVING was used with ROLLUP or CUBE. [#3756](https://github.com/yandex/ClickHouse/issues/3756) [#3837](https://github.com/yandex/ClickHouse/pull/3837) ([reflection](https://github.com/reflection)) -* Fixed specialized aggregation with LowCardinality key (in case when `compile` setting is enabled). [#3886](https://github.com/yandex/ClickHouse/pull/3886) ([KochetovNicolai](https://github.com/KochetovNicolai)) -* Fixed data type check in type conversion functions. [#3896](https://github.com/yandex/ClickHouse/pull/3896) ([zhang2014](https://github.com/zhang2014)) -* Fixed column aliases for query with `JOIN ON` syntax and distributed tables. [#3980](https://github.com/yandex/ClickHouse/pull/3980) ([zhang2014](https://github.com/zhang2014)) -* Fixed issues detected by UBSan. 
[#3021](https://github.com/yandex/ClickHouse/pull/3021) ([alexey-milovidov](https://github.com/alexey-milovidov)) - -### Doc fixes -* Translated table engines related part to Chinese. [#3844](https://github.com/yandex/ClickHouse/pull/3844) ([lamber-ken](https://github.com/lamber-ken)) -* Fixed `toStartOfFiveMinute` description. [#4096](https://github.com/yandex/ClickHouse/pull/4096) ([cheesedosa](https://github.com/cheesedosa)) -* Added description for client `--secure` argument. [#3961](https://github.com/yandex/ClickHouse/pull/3961) ([vicdashkov](https://github.com/vicdashkov)) -* Added descriptions for settings `merge_tree_uniform_read_distribution`, `merge_tree_min_rows_for_concurrent_read`, `merge_tree_min_rows_for_seek`, `merge_tree_coarse_index_granularity`, `merge_tree_max_rows_to_use_cache` [#4024](https://github.com/yandex/ClickHouse/pull/4024) ([BayoNet](https://github.com/BayoNet)) -* Minor doc fixes. [#4098](https://github.com/yandex/ClickHouse/pull/4098) ([blinkov](https://github.com/blinkov)) -* Updated example for zookeeper config setting. [#3883](https://github.com/yandex/ClickHouse/pull/3883) [#3894](https://github.com/yandex/ClickHouse/pull/3894) ([ogorbacheva](https://github.com/ogorbacheva)) -* Updated info about escaping in formats Vertical, Pretty and VerticalRaw. [#4118](https://github.com/yandex/ClickHouse/pull/4118) ([ogorbacheva](https://github.com/ogorbacheva)) -* Adding description of the functions for working with UUID. [#4059](https://github.com/yandex/ClickHouse/pull/4059) ([ogorbacheva](https://github.com/ogorbacheva)) -* Add the description of the CHECK TABLE query. [#3881](https://github.com/yandex/ClickHouse/pull/3881) [#4043](https://github.com/yandex/ClickHouse/pull/4043) ([ogorbacheva](https://github.com/ogorbacheva)) -* Add `zh/tests` doc translate to Chinese. [#4034](https://github.com/yandex/ClickHouse/pull/4034) ([sundy-li](https://github.com/sundy-li)) -* Added documentation about functions `multiPosition`, `firstMatch`, `multiSearch`. [#4123](https://github.com/yandex/ClickHouse/pull/4123) ([danlark1](https://github.com/danlark1)) -* Add puppet module to the list of the third party libraries. [#3862](https://github.com/yandex/ClickHouse/pull/3862) ([Felixoid](https://github.com/Felixoid)) -* Fixed typo in the English version of Creating a Table example [#3872](https://github.com/yandex/ClickHouse/pull/3872) ([areldar](https://github.com/areldar)) -* Mention about nagios plugin for ClickHouse [#3878](https://github.com/yandex/ClickHouse/pull/3878) ([lisuml](https://github.com/lisuml)) -* Update of query language syntax description. [#4065](https://github.com/yandex/ClickHouse/pull/4065) ([BayoNet](https://github.com/BayoNet)) -* Added documentation for per-column compression codecs. [#4073](https://github.com/yandex/ClickHouse/pull/4073) ([alex-krash](https://github.com/alex-krash)) -* Updated articles about CollapsingMergeTree, GraphiteMergeTree, Replicated*MergeTree, `CREATE TABLE` query [#4085](https://github.com/yandex/ClickHouse/pull/4085) ([BayoNet](https://github.com/BayoNet)) -* Other minor improvements. 
[#3897](https://github.com/yandex/ClickHouse/pull/3897) [#3923](https://github.com/yandex/ClickHouse/pull/3923) [#4066](https://github.com/yandex/ClickHouse/pull/4066) [#3860](https://github.com/yandex/ClickHouse/pull/3860) [#3906](https://github.com/yandex/ClickHouse/pull/3906) [#3936](https://github.com/yandex/ClickHouse/pull/3936) [#3975](https://github.com/yandex/ClickHouse/pull/3975) ([ogorbacheva](https://github.com/ogorbacheva)) ([ogorbacheva](https://github.com/ogorbacheva)) ([ogorbacheva](https://github.com/ogorbacheva)) ([blinkov](https://github.com/blinkov)) ([blinkov](https://github.com/blinkov)) ([sdk2](https://github.com/sdk2)) ([blinkov](https://github.com/blinkov)) - -### Other -* Updated librdkafka to v1.0.0-RC5. Used cppkafka instead of raw C interface. [#4025](https://github.com/yandex/ClickHouse/pull/4025) ([abyss7](https://github.com/abyss7)) -* Fixed `hidden` on page title [#4033](https://github.com/yandex/ClickHouse/pull/4033) ([xboston](https://github.com/xboston)) -* Updated year in copyright to 2019. [#4039](https://github.com/yandex/ClickHouse/pull/4039) ([xboston](https://github.com/xboston)) -* Added check that server process is started from the data directory's owner. Do not start server from root. [#3785](https://github.com/yandex/ClickHouse/pull/3785) ([sergey-v-galtsev](https://github.com/sergey-v-galtsev)) -* Removed function `shardByHash`. [#3833](https://github.com/yandex/ClickHouse/pull/3833) ([alexey-milovidov](https://github.com/alexey-milovidov)) -* Fixed typo in ClusterCopier. [#3854](https://github.com/yandex/ClickHouse/pull/3854) ([dqminh](https://github.com/dqminh)) -* Minor grammar fixes. [#3855](https://github.com/yandex/ClickHouse/pull/3855) ([intgr](https://github.com/intgr)) -* Added test script to reproduce performance degradation in jemalloc. 
[#4036](https://github.com/yandex/ClickHouse/pull/4036) ([alexey-milovidov](https://github.com/alexey-milovidov)) ## ClickHouse release 18.16.1, 2018-12-21 diff --git a/CMakeLists.txt b/CMakeLists.txt index 98c3643f055..25f92d0db7c 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -3,6 +3,21 @@ cmake_minimum_required (VERSION 3.3) set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_CURRENT_SOURCE_DIR}/cmake/Modules/") +option(ENABLE_IPO "Enable inter-procedural optimization (aka LTO)" OFF) # need cmake 3.9+ +if(ENABLE_IPO) + cmake_policy(SET CMP0069 NEW) + include(CheckIPOSupported) + check_ipo_supported(RESULT IPO_SUPPORTED OUTPUT IPO_NOT_SUPPORTED) + if(IPO_SUPPORTED) + message(STATUS "IPO/LTO is supported, enabling") + set(CMAKE_INTERPROCEDURAL_OPTIMIZATION TRUE) + else() + message(STATUS "IPO/LTO is not supported: <${IPO_NOT_SUPPORTED}>") + endif() +else() + message(STATUS "IPO/LTO not enabled.") +endif() + if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU") # Require at least gcc 7 if (CMAKE_CXX_COMPILER_VERSION VERSION_LESS 7 AND NOT CMAKE_VERSION VERSION_LESS 2.8.9) @@ -120,7 +135,9 @@ else() message(STATUS "Disabling compiler -pipe option (have only ${AVAILABLE_PHYSICAL_MEMORY} mb of memory)") endif() -include (cmake/test_cpu.cmake) +if(NOT DISABLE_CPU_OPTIMIZE) + include(cmake/test_cpu.cmake) +endif() if(NOT COMPILER_CLANG) # clang: error: the clang compiler does not support '-march=native' option(ARCH_NATIVE "Enable -march=native compiler flag" ${ARCH_ARM}) @@ -229,7 +246,10 @@ include (cmake/find_re2.cmake) include (cmake/find_rdkafka.cmake) include (cmake/find_capnp.cmake) include (cmake/find_llvm.cmake) -include (cmake/find_cpuid.cmake) +include (cmake/find_cpuid.cmake) # Freebsd, bundled +if (NOT USE_CPUID) + include (cmake/find_cpuinfo.cmake) # Debian +endif() include (cmake/find_libgsasl.cmake) include (cmake/find_libxml2.cmake) include (cmake/find_protobuf.cmake) diff --git a/cmake/Modules/Findmetrohash.cmake b/cmake/Modules/Findmetrohash.cmake index 9efc1ed2db8..c51665795bd 100644 --- a/cmake/Modules/Findmetrohash.cmake +++ b/cmake/Modules/Findmetrohash.cmake @@ -28,7 +28,7 @@ find_library(METROHASH_LIBRARIES find_path(METROHASH_INCLUDE_DIR NAMES metrohash.h - PATHS ${METROHASH_ROOT_DIR}/include ${METROHASH_INCLUDE_PATHS} + PATHS ${METROHASH_ROOT_DIR}/include PATH_SUFFIXES metrohash ${METROHASH_INCLUDE_PATHS} ) include(FindPackageHandleStandardArgs) diff --git a/cmake/find_cpuid.cmake b/cmake/find_cpuid.cmake index cda88433a1c..bc88626405d 100644 --- a/cmake/find_cpuid.cmake +++ b/cmake/find_cpuid.cmake @@ -2,11 +2,11 @@ if (NOT ARCH_ARM) option (USE_INTERNAL_CPUID_LIBRARY "Set to FALSE to use system cpuid library instead of bundled" ${NOT_UNBUNDLED}) endif () -#if (USE_INTERNAL_CPUID_LIBRARY AND NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/libcpuid/include/cpuid/libcpuid.h") -# message (WARNING "submodule contrib/libcpuid is missing. to fix try run: \n git submodule update --init --recursive") -# set (USE_INTERNAL_CPUID_LIBRARY 0) -# set (MISSING_INTERNAL_CPUID_LIBRARY 1) -#endif () +if (USE_INTERNAL_CPUID_LIBRARY AND NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/libcpuid/CMakeLists.txt") + message (WARNING "submodule contrib/libcpuid is missing. 
to fix try run: \n git submodule update --init --recursive") + set (USE_INTERNAL_CPUID_LIBRARY 0) + set (MISSING_INTERNAL_CPUID_LIBRARY 1) +endif () if (NOT USE_INTERNAL_CPUID_LIBRARY) find_library (CPUID_LIBRARY cpuid) @@ -20,10 +20,12 @@ if (CPUID_LIBRARY AND CPUID_INCLUDE_DIR) add_definitions(-DHAVE_STDINT_H) # TODO: make virtual target cpuid:cpuid with COMPILE_DEFINITIONS property endif () + set (USE_CPUID 1) elseif (NOT MISSING_INTERNAL_CPUID_LIBRARY) set (CPUID_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/libcpuid/include) set (USE_INTERNAL_CPUID_LIBRARY 1) set (CPUID_LIBRARY cpuid) + set (USE_CPUID 1) endif () -message (STATUS "Using cpuid: ${CPUID_INCLUDE_DIR} : ${CPUID_LIBRARY}") +message (STATUS "Using cpuid=${USE_CPUID}: ${CPUID_INCLUDE_DIR} : ${CPUID_LIBRARY}") diff --git a/cmake/find_cpuinfo.cmake b/cmake/find_cpuinfo.cmake new file mode 100644 index 00000000000..c12050c4396 --- /dev/null +++ b/cmake/find_cpuinfo.cmake @@ -0,0 +1,17 @@ +option(USE_INTERNAL_CPUINFO_LIBRARY "Set to FALSE to use system cpuinfo library instead of bundled" ${NOT_UNBUNDLED}) + +if(NOT USE_INTERNAL_CPUINFO_LIBRARY) + find_library(CPUINFO_LIBRARY cpuinfo) + find_path(CPUINFO_INCLUDE_DIR NAMES cpuinfo.h PATHS ${CPUINFO_INCLUDE_PATHS}) +endif() + +if(CPUINFO_LIBRARY AND CPUINFO_INCLUDE_DIR) + set(USE_CPUINFO 1) +elseif(NOT MISSING_INTERNAL_CPUINFO_LIBRARY) + set(CPUINFO_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/libcpuinfo/include) + set(USE_INTERNAL_CPUINFO_LIBRARY 1) + set(CPUINFO_LIBRARY cpuinfo) + set(USE_CPUINFO 1) +endif() + +message(STATUS "Using cpuinfo=${USE_CPUINFO}: ${CPUINFO_INCLUDE_DIR} : ${CPUINFO_LIBRARY}") diff --git a/cmake/find_gtest.cmake b/cmake/find_gtest.cmake index 19562e8d2ea..fa7b4f4828a 100644 --- a/cmake/find_gtest.cmake +++ b/cmake/find_gtest.cmake @@ -8,18 +8,22 @@ if (NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/googletest/googletest/CMakeList set (MISSING_INTERNAL_GTEST_LIBRARY 1) endif () -if (NOT USE_INTERNAL_GTEST_LIBRARY) - find_package (GTest) -endif () -if (NOT GTEST_INCLUDE_DIRS AND NOT MISSING_INTERNAL_GTEST_LIBRARY) +if(NOT USE_INTERNAL_GTEST_LIBRARY) + # TODO: autodetect of GTEST_SRC_DIR by EXISTS /usr/src/googletest/CMakeLists.txt + if(NOT GTEST_SRC_DIR) + find_package(GTest) + endif() +endif() + +if (NOT GTEST_SRC_DIR AND NOT GTEST_INCLUDE_DIRS AND NOT MISSING_INTERNAL_GTEST_LIBRARY) set (USE_INTERNAL_GTEST_LIBRARY 1) set (GTEST_MAIN_LIBRARIES gtest_main) set (GTEST_INCLUDE_DIRS ${ClickHouse_SOURCE_DIR}/contrib/googletest/googletest) endif () -if(GTEST_INCLUDE_DIRS AND GTEST_MAIN_LIBRARIES) +if((GTEST_INCLUDE_DIRS AND GTEST_MAIN_LIBRARIES) OR GTEST_SRC_DIR) set(USE_GTEST 1) endif() -message (STATUS "Using gtest=${USE_GTEST}: ${GTEST_INCLUDE_DIRS} : ${GTEST_MAIN_LIBRARIES}") +message (STATUS "Using gtest=${USE_GTEST}: ${GTEST_INCLUDE_DIRS} : ${GTEST_MAIN_LIBRARIES} : ${GTEST_SRC_DIR}") diff --git a/cmake/lib_name.cmake b/cmake/lib_name.cmake index 5c919b263e6..847efb15fc5 100644 --- a/cmake/lib_name.cmake +++ b/cmake/lib_name.cmake @@ -2,4 +2,5 @@ set(DIVIDE_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/libdivide) set(COMMON_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/libs/libcommon/include ${ClickHouse_BINARY_DIR}/libs/libcommon/include) set(DBMS_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/dbms/src ${ClickHouse_BINARY_DIR}/dbms/src) set(DOUBLE_CONVERSION_CONTRIB_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/double-conversion) +set(METROHASH_CONTRIB_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/libmetrohash/src) set(PCG_RANDOM_INCLUDE_DIR 
${ClickHouse_SOURCE_DIR}/contrib/libpcg-random/include) diff --git a/contrib/CMakeLists.txt b/contrib/CMakeLists.txt index 0c4b6c15287..fcc2cc75817 100644 --- a/contrib/CMakeLists.txt +++ b/contrib/CMakeLists.txt @@ -107,6 +107,11 @@ if (USE_INTERNAL_SSL_LIBRARY) if (NOT MAKE_STATIC_LIBRARIES) set (BUILD_SHARED 1) endif () + + # By default, ${CMAKE_INSTALL_PREFIX}/etc/ssl is selected - that is not what we need. + # We need to use system wide ssl directory. + set (OPENSSLDIR "/etc/ssl") + set (LIBRESSL_SKIP_INSTALL 1 CACHE INTERNAL "") add_subdirectory (ssl) target_include_directories(${OPENSSL_CRYPTO_LIBRARY} SYSTEM PUBLIC ${OPENSSL_INCLUDE_DIR}) @@ -166,13 +171,16 @@ if (USE_INTERNAL_POCO_LIBRARY) endif () endif () -if (USE_INTERNAL_GTEST_LIBRARY) +if(USE_INTERNAL_GTEST_LIBRARY) # Google Test from sources add_subdirectory(${ClickHouse_SOURCE_DIR}/contrib/googletest/googletest ${CMAKE_CURRENT_BINARY_DIR}/googletest) # avoid problems with target_compile_definitions (gtest INTERFACE GTEST_HAS_POSIX_RE=0) target_include_directories (gtest SYSTEM INTERFACE ${ClickHouse_SOURCE_DIR}/contrib/googletest/include) -endif () +elseif(GTEST_SRC_DIR) + add_subdirectory(${GTEST_SRC_DIR}/googletest ${CMAKE_CURRENT_BINARY_DIR}/googletest) + target_compile_definitions(gtest INTERFACE GTEST_HAS_POSIX_RE=0) +endif() if (USE_INTERNAL_LLVM_LIBRARY) file(GENERATE OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/empty.cpp CONTENT " ") diff --git a/contrib/libmetrohash/CMakeLists.txt b/contrib/libmetrohash/CMakeLists.txt index 2bd5628d0f8..d71a5432715 100644 --- a/contrib/libmetrohash/CMakeLists.txt +++ b/contrib/libmetrohash/CMakeLists.txt @@ -1,5 +1,5 @@ if (HAVE_SSE42) # Not used. Pretty easy to port. - set (SOURCES_SSE42_ONLY src/metrohash128crc.cpp) + set (SOURCES_SSE42_ONLY src/metrohash128crc.cpp src/metrohash128crc.h) endif () add_library(metrohash diff --git a/contrib/libmetrohash/LICENSE b/contrib/libmetrohash/LICENSE index 0765a504e62..261eeb9e9f8 100644 --- a/contrib/libmetrohash/LICENSE +++ b/contrib/libmetrohash/LICENSE @@ -1,22 +1,201 @@ -The MIT License (MIT) + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ -Copyright (c) 2015 J. Andrew Rogers + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION -Permission is hereby granted, free of charge, to any person obtaining a copy -of this software and associated documentation files (the "Software"), to deal -in the Software without restriction, including without limitation the rights -to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -copies of the Software, and to permit persons to whom the Software is -furnished to do so, subject to the following conditions: + 1. Definitions. -The above copyright notice and this permission notice shall be included in all -copies or substantial portions of the Software. + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. -THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE -SOFTWARE. 
+ "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. 
This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
diff --git a/contrib/libmetrohash/README.md b/contrib/libmetrohash/README.md index a8851cdb2d8..2ac16b1437c 100644 --- a/contrib/libmetrohash/README.md +++ b/contrib/libmetrohash/README.md @@ -5,12 +5,44 @@ MetroHash is a set of state-of-the-art hash functions for *non-cryptographic* use cases. * Fastest general-purpose functions for bulk hashing. * Fastest general-purpose functions for small, variable length keys. * Robust statistical bias profile, similar to the MD5 cryptographic hash. +* Hashes can be constructed incrementally (**new**) * 64-bit, 128-bit, and 128-bit CRC variants currently available. * Optimized for modern x86-64 microarchitectures. * Elegant, compact, readable functions. You can read more about the design and history [here](http://www.jandrewrogers.com/2015/05/27/metrohash/). +## News + +### 23 October 2018 + +The project has been re-licensed under Apache License v2.0. The purpose of this license change is consistency with the imminent release of MetroHash v2.0, which is also licensed under the Apache license. + +### 27 July 2015 + +Two new 64-bit and 128-bit algorithms add the ability to construct hashes incrementally. In addition to supporting incremental construction, the algorithms are slightly superior to the prior versions. + +A big change is that these new algorithms are implemented as C++ classes that support both incremental and stateless hashing. These classes also have a static method for verifying the implementation against the test vectors built into the classes. Implementations are now fully contained by their respective headers e.g. "metrohash128.h". + +*Note: an incremental version of the 128-bit CRC version is on its way but is not included in this push.* + +**Usage Example For Stateless Hashing** + +`MetroHash128::Hash(key, key_length, hash_ptr, seed)` + +**Usage Example For Incremental Hashing** + +`MetroHash128 hasher;` +`hasher.Update(partial_key, partial_key_length);` +`...` +`hasher.Update(partial_key, partial_key_length);` +`hasher.Finalize(hash_ptr);` + +An `Initialize(seed)` method allows the hasher objects to be reused. + + +### 27 May 2015 + Six hash functions have been included in the initial release: * 64-bit hash functions, "metrohash64_1" and "metrohash64_2" diff --git a/contrib/libmetrohash/VERSION b/contrib/libmetrohash/VERSION index 211ea847416..43012d2e31c 100644 --- a/contrib/libmetrohash/VERSION +++ b/contrib/libmetrohash/VERSION @@ -1,7 +1,4 @@ -origin: git@github.com:jandrewrogers/MetroHash.git -commit d9dee18a54a8a6766e24c1950b814ac7ca9d1a89 -Merge: 761e8a4 3d06b24 +origin: https://github.com/jandrewrogers/MetroHash.git +commit 690a521d9beb2e1050cc8f273fdabc13b31bf8f6 tag: v1.1.3 Author: J. Andrew Rogers -Date: Sat Jun 6 16:12:06 2015 -0700 - - modified README +Date: Tue Oct 23 09:49:53 2018 -0700 diff --git a/contrib/libmetrohash/src/metrohash.h b/contrib/libmetrohash/src/metrohash.h index 0d9b76c99cf..ffab03216b7 100644 --- a/contrib/libmetrohash/src/metrohash.h +++ b/contrib/libmetrohash/src/metrohash.h @@ -1,73 +1,24 @@ // metrohash.h // -// The MIT License (MIT) +// Copyright 2015-2018 J. Andrew Rogers // -// Copyright (c) 2015 J. Andrew Rogers +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at // -// Permission is hereby granted, free of charge, to any person obtaining a copy -// of this software and associated documentation files (the "Software"), to deal -// in the Software without restriction, including without limitation the rights -// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -// copies of the Software, and to permit persons to whom the Software is -// furnished to do so, subject to the following conditions: -// -// The above copyright notice and this permission notice shall be included in all -// copies or substantial portions of the Software. -// -// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE -// SOFTWARE. +// http://www.apache.org/licenses/LICENSE-2.0 // +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. #ifndef METROHASH_METROHASH_H #define METROHASH_METROHASH_H -#include -#include - -// MetroHash 64-bit hash functions -void metrohash64_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out); -void metrohash64_2(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out); - -// MetroHash 128-bit hash functions -void metrohash128_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out); -void metrohash128_2(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out); - -// MetroHash 128-bit hash functions using CRC instruction -void metrohash128crc_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out); -void metrohash128crc_2(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out); - - -/* rotate right idiom recognized by compiler*/ -inline static uint64_t rotate_right(uint64_t v, unsigned k) -{ - return (v >> k) | (v << (64 - k)); -} - -// unaligned reads, fast and safe on Nehalem and later microarchitectures -inline static uint64_t read_u64(const void * const ptr) -{ - return static_cast(*reinterpret_cast(ptr)); -} - -inline static uint64_t read_u32(const void * const ptr) -{ - return static_cast(*reinterpret_cast(ptr)); -} - -inline static uint64_t read_u16(const void * const ptr) -{ - return static_cast(*reinterpret_cast(ptr)); -} - -inline static uint64_t read_u8 (const void * const ptr) -{ - return static_cast(*reinterpret_cast(ptr)); -} - +#include "metrohash64.h" +#include "metrohash128.h" +#include "metrohash128crc.h" #endif // #ifndef METROHASH_METROHASH_H diff --git a/contrib/libmetrohash/src/metrohash128.cpp b/contrib/libmetrohash/src/metrohash128.cpp index 6370412046e..5c143db9cbe 100644 --- a/contrib/libmetrohash/src/metrohash128.cpp +++ b/contrib/libmetrohash/src/metrohash128.cpp @@ -1,29 +1,260 @@ // metrohash128.cpp // -// The MIT License (MIT) +// Copyright 2015-2018 J. Andrew Rogers // -// Copyright (c) 2015 J. 
Andrew Rogers +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at // -// Permission is hereby granted, free of charge, to any person obtaining a copy -// of this software and associated documentation files (the "Software"), to deal -// in the Software without restriction, including without limitation the rights -// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -// copies of the Software, and to permit persons to whom the Software is -// furnished to do so, subject to the following conditions: -// -// The above copyright notice and this permission notice shall be included in all -// copies or substantial portions of the Software. -// -// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE -// SOFTWARE. +// http://www.apache.org/licenses/LICENSE-2.0 // +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include +#include "platform.h" +#include "metrohash128.h" + +const char * MetroHash128::test_string = "012345678901234567890123456789012345678901234567890123456789012"; + +const uint8_t MetroHash128::test_seed_0[16] = { + 0xC7, 0x7C, 0xE2, 0xBF, 0xA4, 0xED, 0x9F, 0x9B, + 0x05, 0x48, 0xB2, 0xAC, 0x50, 0x74, 0xA2, 0x97 + }; + +const uint8_t MetroHash128::test_seed_1[16] = { + 0x45, 0xA3, 0xCD, 0xB8, 0x38, 0x19, 0x9D, 0x7F, + 0xBD, 0xD6, 0x8D, 0x86, 0x7A, 0x14, 0xEC, 0xEF + }; + + + +MetroHash128::MetroHash128(const uint64_t seed) +{ + Initialize(seed); +} + + +void MetroHash128::Initialize(const uint64_t seed) +{ + // initialize internal hash registers + state.v[0] = (static_cast(seed) - k0) * k3; + state.v[1] = (static_cast(seed) + k1) * k2; + state.v[2] = (static_cast(seed) + k0) * k2; + state.v[3] = (static_cast(seed) - k1) * k3; + + // initialize total length of input + bytes = 0; +} + + +void MetroHash128::Update(const uint8_t * const buffer, const uint64_t length) +{ + const uint8_t * ptr = reinterpret_cast(buffer); + const uint8_t * const end = ptr + length; + + // input buffer may be partially filled + if (bytes % 32) + { + uint64_t fill = 32 - (bytes % 32); + if (fill > length) + fill = length; + + memcpy(input.b + (bytes % 32), ptr, static_cast(fill)); + ptr += fill; + bytes += fill; + + // input buffer is still partially filled + if ((bytes % 32) != 0) return; + + // process full input buffer + state.v[0] += read_u64(&input.b[ 0]) * k0; state.v[0] = rotate_right(state.v[0],29) + state.v[2]; + state.v[1] += read_u64(&input.b[ 8]) * k1; state.v[1] = rotate_right(state.v[1],29) + state.v[3]; + state.v[2] += read_u64(&input.b[16]) * k2; state.v[2] = rotate_right(state.v[2],29) + state.v[0]; + state.v[3] += read_u64(&input.b[24]) * k3; state.v[3] = rotate_right(state.v[3],29) + state.v[1]; + } + + // bulk update + bytes += (end - ptr); + while (ptr <= 
(end - 32)) + { + // process directly from the source, bypassing the input buffer + state.v[0] += read_u64(ptr) * k0; ptr += 8; state.v[0] = rotate_right(state.v[0],29) + state.v[2]; + state.v[1] += read_u64(ptr) * k1; ptr += 8; state.v[1] = rotate_right(state.v[1],29) + state.v[3]; + state.v[2] += read_u64(ptr) * k2; ptr += 8; state.v[2] = rotate_right(state.v[2],29) + state.v[0]; + state.v[3] += read_u64(ptr) * k3; ptr += 8; state.v[3] = rotate_right(state.v[3],29) + state.v[1]; + } + + // store remaining bytes in input buffer + if (ptr < end) + memcpy(input.b, ptr, end - ptr); +} + + +void MetroHash128::Finalize(uint8_t * const hash) +{ + // finalize bulk loop, if used + if (bytes >= 32) + { + state.v[2] ^= rotate_right(((state.v[0] + state.v[3]) * k0) + state.v[1], 21) * k1; + state.v[3] ^= rotate_right(((state.v[1] + state.v[2]) * k1) + state.v[0], 21) * k0; + state.v[0] ^= rotate_right(((state.v[0] + state.v[2]) * k0) + state.v[3], 21) * k1; + state.v[1] ^= rotate_right(((state.v[1] + state.v[3]) * k1) + state.v[2], 21) * k0; + } + + // process any bytes remaining in the input buffer + const uint8_t * ptr = reinterpret_cast(input.b); + const uint8_t * const end = ptr + (bytes % 32); + + if ((end - ptr) >= 16) + { + state.v[0] += read_u64(ptr) * k2; ptr += 8; state.v[0] = rotate_right(state.v[0],33) * k3; + state.v[1] += read_u64(ptr) * k2; ptr += 8; state.v[1] = rotate_right(state.v[1],33) * k3; + state.v[0] ^= rotate_right((state.v[0] * k2) + state.v[1], 45) * k1; + state.v[1] ^= rotate_right((state.v[1] * k3) + state.v[0], 45) * k0; + } + + if ((end - ptr) >= 8) + { + state.v[0] += read_u64(ptr) * k2; ptr += 8; state.v[0] = rotate_right(state.v[0],33) * k3; + state.v[0] ^= rotate_right((state.v[0] * k2) + state.v[1], 27) * k1; + } + + if ((end - ptr) >= 4) + { + state.v[1] += read_u32(ptr) * k2; ptr += 4; state.v[1] = rotate_right(state.v[1],33) * k3; + state.v[1] ^= rotate_right((state.v[1] * k3) + state.v[0], 46) * k0; + } + + if ((end - ptr) >= 2) + { + state.v[0] += read_u16(ptr) * k2; ptr += 2; state.v[0] = rotate_right(state.v[0],33) * k3; + state.v[0] ^= rotate_right((state.v[0] * k2) + state.v[1], 22) * k1; + } + + if ((end - ptr) >= 1) + { + state.v[1] += read_u8 (ptr) * k2; state.v[1] = rotate_right(state.v[1],33) * k3; + state.v[1] ^= rotate_right((state.v[1] * k3) + state.v[0], 58) * k0; + } + + state.v[0] += rotate_right((state.v[0] * k0) + state.v[1], 13); + state.v[1] += rotate_right((state.v[1] * k1) + state.v[0], 37); + state.v[0] += rotate_right((state.v[0] * k2) + state.v[1], 13); + state.v[1] += rotate_right((state.v[1] * k3) + state.v[0], 37); + + bytes = 0; + + // do any endian conversion here + + memcpy(hash, state.v, 16); +} + + +void MetroHash128::Hash(const uint8_t * buffer, const uint64_t length, uint8_t * const hash, const uint64_t seed) +{ + const uint8_t * ptr = reinterpret_cast(buffer); + const uint8_t * const end = ptr + length; + + uint64_t v[4]; + + v[0] = (static_cast(seed) - k0) * k3; + v[1] = (static_cast(seed) + k1) * k2; + + if (length >= 32) + { + v[2] = (static_cast(seed) + k0) * k2; + v[3] = (static_cast(seed) - k1) * k3; + + do + { + v[0] += read_u64(ptr) * k0; ptr += 8; v[0] = rotate_right(v[0],29) + v[2]; + v[1] += read_u64(ptr) * k1; ptr += 8; v[1] = rotate_right(v[1],29) + v[3]; + v[2] += read_u64(ptr) * k2; ptr += 8; v[2] = rotate_right(v[2],29) + v[0]; + v[3] += read_u64(ptr) * k3; ptr += 8; v[3] = rotate_right(v[3],29) + v[1]; + } + while (ptr <= (end - 32)); + + v[2] ^= rotate_right(((v[0] + v[3]) * k0) + v[1], 21) * k1; + v[3] ^= 
rotate_right(((v[1] + v[2]) * k1) + v[0], 21) * k0; + v[0] ^= rotate_right(((v[0] + v[2]) * k0) + v[3], 21) * k1; + v[1] ^= rotate_right(((v[1] + v[3]) * k1) + v[2], 21) * k0; + } + + if ((end - ptr) >= 16) + { + v[0] += read_u64(ptr) * k2; ptr += 8; v[0] = rotate_right(v[0],33) * k3; + v[1] += read_u64(ptr) * k2; ptr += 8; v[1] = rotate_right(v[1],33) * k3; + v[0] ^= rotate_right((v[0] * k2) + v[1], 45) * k1; + v[1] ^= rotate_right((v[1] * k3) + v[0], 45) * k0; + } + + if ((end - ptr) >= 8) + { + v[0] += read_u64(ptr) * k2; ptr += 8; v[0] = rotate_right(v[0],33) * k3; + v[0] ^= rotate_right((v[0] * k2) + v[1], 27) * k1; + } + + if ((end - ptr) >= 4) + { + v[1] += read_u32(ptr) * k2; ptr += 4; v[1] = rotate_right(v[1],33) * k3; + v[1] ^= rotate_right((v[1] * k3) + v[0], 46) * k0; + } + + if ((end - ptr) >= 2) + { + v[0] += read_u16(ptr) * k2; ptr += 2; v[0] = rotate_right(v[0],33) * k3; + v[0] ^= rotate_right((v[0] * k2) + v[1], 22) * k1; + } + + if ((end - ptr) >= 1) + { + v[1] += read_u8 (ptr) * k2; v[1] = rotate_right(v[1],33) * k3; + v[1] ^= rotate_right((v[1] * k3) + v[0], 58) * k0; + } + + v[0] += rotate_right((v[0] * k0) + v[1], 13); + v[1] += rotate_right((v[1] * k1) + v[0], 37); + v[0] += rotate_right((v[0] * k2) + v[1], 13); + v[1] += rotate_right((v[1] * k3) + v[0], 37); + + // do any endian conversion here + + memcpy(hash, v, 16); +} + + +bool MetroHash128::ImplementationVerified() +{ + uint8_t hash[16]; + const uint8_t * key = reinterpret_cast(MetroHash128::test_string); + + // verify one-shot implementation + MetroHash128::Hash(key, strlen(MetroHash128::test_string), hash, 0); + if (memcmp(hash, MetroHash128::test_seed_0, 16) != 0) return false; + + MetroHash128::Hash(key, strlen(MetroHash128::test_string), hash, 1); + if (memcmp(hash, MetroHash128::test_seed_1, 16) != 0) return false; + + // verify incremental implementation + MetroHash128 metro; + + metro.Initialize(0); + metro.Update(reinterpret_cast(MetroHash128::test_string), strlen(MetroHash128::test_string)); + metro.Finalize(hash); + if (memcmp(hash, MetroHash128::test_seed_0, 16) != 0) return false; + + metro.Initialize(1); + metro.Update(reinterpret_cast(MetroHash128::test_string), strlen(MetroHash128::test_string)); + metro.Finalize(hash); + if (memcmp(hash, MetroHash128::test_seed_1, 16) != 0) return false; + + return true; +} -#include "metrohash.h" void metrohash128_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out) { @@ -97,6 +328,8 @@ void metrohash128_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * v[0] += rotate_right((v[0] * k2) + v[1], 13); v[1] += rotate_right((v[1] * k3) + v[0], 37); + // do any endian conversion here + memcpy(out, v, 16); } @@ -173,6 +406,8 @@ void metrohash128_2(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * v[0] += rotate_right((v[0] * k2) + v[1], 33); v[1] += rotate_right((v[1] * k3) + v[0], 33); + // do any endian conversion here + memcpy(out, v, 16); } diff --git a/contrib/libmetrohash/src/metrohash128.h b/contrib/libmetrohash/src/metrohash128.h new file mode 100644 index 00000000000..639a4fa97e3 --- /dev/null +++ b/contrib/libmetrohash/src/metrohash128.h @@ -0,0 +1,72 @@ +// metrohash128.h +// +// Copyright 2015-2018 J. Andrew Rogers +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#ifndef METROHASH_METROHASH_128_H +#define METROHASH_METROHASH_128_H + +#include + +class MetroHash128 +{ +public: + static const uint32_t bits = 128; + + // Constructor initializes the same as Initialize() + MetroHash128(const uint64_t seed=0); + + // Initializes internal state for new hash with optional seed + void Initialize(const uint64_t seed=0); + + // Update the hash state with a string of bytes. If the length + // is sufficiently long, the implementation switches to a bulk + // hashing algorithm directly on the argument buffer for speed. + void Update(const uint8_t * buffer, const uint64_t length); + + // Constructs the final hash and writes it to the argument buffer. + // After a hash is finalized, this instance must be Initialized()-ed + // again or the behavior of Update() and Finalize() is undefined. + void Finalize(uint8_t * const hash); + + // A non-incremental function implementation. This can be significantly + // faster than the incremental implementation for some usage patterns. + static void Hash(const uint8_t * buffer, const uint64_t length, uint8_t * const hash, const uint64_t seed=0); + + // Does implementation correctly execute test vectors? + static bool ImplementationVerified(); + + // test vectors -- Hash(test_string, seed=0) => test_seed_0 + static const char * test_string; + static const uint8_t test_seed_0[16]; + static const uint8_t test_seed_1[16]; + +private: + static const uint64_t k0 = 0xC83A91E1; + static const uint64_t k1 = 0x8648DBDB; + static const uint64_t k2 = 0x7BDEC03B; + static const uint64_t k3 = 0x2F5870A5; + + struct { uint64_t v[4]; } state; + struct { uint8_t b[32]; } input; + uint64_t bytes; +}; + + +// Legacy 128-bit hash functions -- do not use +void metrohash128_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out); +void metrohash128_2(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out); + + +#endif // #ifndef METROHASH_METROHASH_128_H diff --git a/contrib/libmetrohash/src/metrohash128crc.cpp b/contrib/libmetrohash/src/metrohash128crc.cpp index c04cf5a6b23..775a9a944bf 100644 --- a/contrib/libmetrohash/src/metrohash128crc.cpp +++ b/contrib/libmetrohash/src/metrohash128crc.cpp @@ -1,31 +1,24 @@ // metrohash128crc.cpp // -// The MIT License (MIT) +// Copyright 2015-2018 J. Andrew Rogers // -// Copyright (c) 2015 J. Andrew Rogers +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at // -// Permission is hereby granted, free of charge, to any person obtaining a copy -// of this software and associated documentation files (the "Software"), to deal -// in the Software without restriction, including without limitation the rights -// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -// copies of the Software, and to permit persons to whom the Software is -// furnished to do so, subject to the following conditions: -// -// The above copyright notice and this permission notice shall be included in all -// copies or substantial portions of the Software. 
-// -// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE -// SOFTWARE. +// http://www.apache.org/licenses/LICENSE-2.0 // +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. -#include "metrohash.h" #include +#include +#include "metrohash.h" +#include "platform.h" void metrohash128crc_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out) diff --git a/contrib/libmetrohash/src/metrohash128crc.h b/contrib/libmetrohash/src/metrohash128crc.h new file mode 100644 index 00000000000..f151fd4200d --- /dev/null +++ b/contrib/libmetrohash/src/metrohash128crc.h @@ -0,0 +1,27 @@ +// metrohash128crc.h +// +// Copyright 2015-2018 J. Andrew Rogers +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#ifndef METROHASH_METROHASH_128_CRC_H +#define METROHASH_METROHASH_128_CRC_H + +#include + +// Legacy 128-bit hash functions +void metrohash128crc_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out); +void metrohash128crc_2(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out); + + +#endif // #ifndef METROHASH_METROHASH_128_CRC_H diff --git a/contrib/libmetrohash/src/metrohash64.cpp b/contrib/libmetrohash/src/metrohash64.cpp index bc4b41eb8f2..7b5ec7f1a42 100644 --- a/contrib/libmetrohash/src/metrohash64.cpp +++ b/contrib/libmetrohash/src/metrohash64.cpp @@ -1,29 +1,257 @@ // metrohash64.cpp // -// The MIT License (MIT) +// Copyright 2015-2018 J. Andrew Rogers // -// Copyright (c) 2015 J. Andrew Rogers +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at // -// Permission is hereby granted, free of charge, to any person obtaining a copy -// of this software and associated documentation files (the "Software"), to deal -// in the Software without restriction, including without limitation the rights -// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -// copies of the Software, and to permit persons to whom the Software is -// furnished to do so, subject to the following conditions: -// -// The above copyright notice and this permission notice shall be included in all -// copies or substantial portions of the Software. 
-// -// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE -// SOFTWARE. +// http://www.apache.org/licenses/LICENSE-2.0 // +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include "platform.h" +#include "metrohash64.h" + +#include + +const char * MetroHash64::test_string = "012345678901234567890123456789012345678901234567890123456789012"; + +const uint8_t MetroHash64::test_seed_0[8] = { 0x6B, 0x75, 0x3D, 0xAE, 0x06, 0x70, 0x4B, 0xAD }; +const uint8_t MetroHash64::test_seed_1[8] = { 0x3B, 0x0D, 0x48, 0x1C, 0xF4, 0xB9, 0xB8, 0xDF }; + + + +MetroHash64::MetroHash64(const uint64_t seed) +{ + Initialize(seed); +} + + +void MetroHash64::Initialize(const uint64_t seed) +{ + vseed = (static_cast(seed) + k2) * k0; + + // initialize internal hash registers + state.v[0] = vseed; + state.v[1] = vseed; + state.v[2] = vseed; + state.v[3] = vseed; + + // initialize total length of input + bytes = 0; +} + + +void MetroHash64::Update(const uint8_t * const buffer, const uint64_t length) +{ + const uint8_t * ptr = reinterpret_cast(buffer); + const uint8_t * const end = ptr + length; + + // input buffer may be partially filled + if (bytes % 32) + { + uint64_t fill = 32 - (bytes % 32); + if (fill > length) + fill = length; + + memcpy(input.b + (bytes % 32), ptr, static_cast(fill)); + ptr += fill; + bytes += fill; + + // input buffer is still partially filled + if ((bytes % 32) != 0) return; + + // process full input buffer + state.v[0] += read_u64(&input.b[ 0]) * k0; state.v[0] = rotate_right(state.v[0],29) + state.v[2]; + state.v[1] += read_u64(&input.b[ 8]) * k1; state.v[1] = rotate_right(state.v[1],29) + state.v[3]; + state.v[2] += read_u64(&input.b[16]) * k2; state.v[2] = rotate_right(state.v[2],29) + state.v[0]; + state.v[3] += read_u64(&input.b[24]) * k3; state.v[3] = rotate_right(state.v[3],29) + state.v[1]; + } + + // bulk update + bytes += static_cast(end - ptr); + while (ptr <= (end - 32)) + { + // process directly from the source, bypassing the input buffer + state.v[0] += read_u64(ptr) * k0; ptr += 8; state.v[0] = rotate_right(state.v[0],29) + state.v[2]; + state.v[1] += read_u64(ptr) * k1; ptr += 8; state.v[1] = rotate_right(state.v[1],29) + state.v[3]; + state.v[2] += read_u64(ptr) * k2; ptr += 8; state.v[2] = rotate_right(state.v[2],29) + state.v[0]; + state.v[3] += read_u64(ptr) * k3; ptr += 8; state.v[3] = rotate_right(state.v[3],29) + state.v[1]; + } + + // store remaining bytes in input buffer + if (ptr < end) + memcpy(input.b, ptr, static_cast(end - ptr)); +} + + +void MetroHash64::Finalize(uint8_t * const hash) +{ + // finalize bulk loop, if used + if (bytes >= 32) + { + state.v[2] ^= rotate_right(((state.v[0] + state.v[3]) * k0) + state.v[1], 37) * k1; + state.v[3] ^= rotate_right(((state.v[1] + state.v[2]) * k1) + state.v[0], 37) * k0; + state.v[0] ^= rotate_right(((state.v[0] + state.v[2]) * k0) + 
state.v[3], 37) * k1; + state.v[1] ^= rotate_right(((state.v[1] + state.v[3]) * k1) + state.v[2], 37) * k0; + + state.v[0] = vseed + (state.v[0] ^ state.v[1]); + } + + // process any bytes remaining in the input buffer + const uint8_t * ptr = reinterpret_cast(input.b); + const uint8_t * const end = ptr + (bytes % 32); + + if ((end - ptr) >= 16) + { + state.v[1] = state.v[0] + (read_u64(ptr) * k2); ptr += 8; state.v[1] = rotate_right(state.v[1],29) * k3; + state.v[2] = state.v[0] + (read_u64(ptr) * k2); ptr += 8; state.v[2] = rotate_right(state.v[2],29) * k3; + state.v[1] ^= rotate_right(state.v[1] * k0, 21) + state.v[2]; + state.v[2] ^= rotate_right(state.v[2] * k3, 21) + state.v[1]; + state.v[0] += state.v[2]; + } + + if ((end - ptr) >= 8) + { + state.v[0] += read_u64(ptr) * k3; ptr += 8; + state.v[0] ^= rotate_right(state.v[0], 55) * k1; + } + + if ((end - ptr) >= 4) + { + state.v[0] += read_u32(ptr) * k3; ptr += 4; + state.v[0] ^= rotate_right(state.v[0], 26) * k1; + } + + if ((end - ptr) >= 2) + { + state.v[0] += read_u16(ptr) * k3; ptr += 2; + state.v[0] ^= rotate_right(state.v[0], 48) * k1; + } + + if ((end - ptr) >= 1) + { + state.v[0] += read_u8 (ptr) * k3; + state.v[0] ^= rotate_right(state.v[0], 37) * k1; + } + + state.v[0] ^= rotate_right(state.v[0], 28); + state.v[0] *= k0; + state.v[0] ^= rotate_right(state.v[0], 29); + + bytes = 0; + + // do any endian conversion here + + memcpy(hash, state.v, 8); +} + + +void MetroHash64::Hash(const uint8_t * buffer, const uint64_t length, uint8_t * const hash, const uint64_t seed) +{ + const uint8_t * ptr = reinterpret_cast(buffer); + const uint8_t * const end = ptr + length; + + uint64_t h = (static_cast(seed) + k2) * k0; + + if (length >= 32) + { + uint64_t v[4]; + v[0] = h; + v[1] = h; + v[2] = h; + v[3] = h; + + do + { + v[0] += read_u64(ptr) * k0; ptr += 8; v[0] = rotate_right(v[0],29) + v[2]; + v[1] += read_u64(ptr) * k1; ptr += 8; v[1] = rotate_right(v[1],29) + v[3]; + v[2] += read_u64(ptr) * k2; ptr += 8; v[2] = rotate_right(v[2],29) + v[0]; + v[3] += read_u64(ptr) * k3; ptr += 8; v[3] = rotate_right(v[3],29) + v[1]; + } + while (ptr <= (end - 32)); + + v[2] ^= rotate_right(((v[0] + v[3]) * k0) + v[1], 37) * k1; + v[3] ^= rotate_right(((v[1] + v[2]) * k1) + v[0], 37) * k0; + v[0] ^= rotate_right(((v[0] + v[2]) * k0) + v[3], 37) * k1; + v[1] ^= rotate_right(((v[1] + v[3]) * k1) + v[2], 37) * k0; + h += v[0] ^ v[1]; + } + + if ((end - ptr) >= 16) + { + uint64_t v0 = h + (read_u64(ptr) * k2); ptr += 8; v0 = rotate_right(v0,29) * k3; + uint64_t v1 = h + (read_u64(ptr) * k2); ptr += 8; v1 = rotate_right(v1,29) * k3; + v0 ^= rotate_right(v0 * k0, 21) + v1; + v1 ^= rotate_right(v1 * k3, 21) + v0; + h += v1; + } + + if ((end - ptr) >= 8) + { + h += read_u64(ptr) * k3; ptr += 8; + h ^= rotate_right(h, 55) * k1; + } + + if ((end - ptr) >= 4) + { + h += read_u32(ptr) * k3; ptr += 4; + h ^= rotate_right(h, 26) * k1; + } + + if ((end - ptr) >= 2) + { + h += read_u16(ptr) * k3; ptr += 2; + h ^= rotate_right(h, 48) * k1; + } + + if ((end - ptr) >= 1) + { + h += read_u8 (ptr) * k3; + h ^= rotate_right(h, 37) * k1; + } + + h ^= rotate_right(h, 28); + h *= k0; + h ^= rotate_right(h, 29); + + memcpy(hash, &h, 8); +} + + +bool MetroHash64::ImplementationVerified() +{ + uint8_t hash[8]; + const uint8_t * key = reinterpret_cast(MetroHash64::test_string); + + // verify one-shot implementation + MetroHash64::Hash(key, strlen(MetroHash64::test_string), hash, 0); + if (memcmp(hash, MetroHash64::test_seed_0, 8) != 0) return false; + + MetroHash64::Hash(key, 
strlen(MetroHash64::test_string), hash, 1); + if (memcmp(hash, MetroHash64::test_seed_1, 8) != 0) return false; + + // verify incremental implementation + MetroHash64 metro; + + metro.Initialize(0); + metro.Update(reinterpret_cast(MetroHash64::test_string), strlen(MetroHash64::test_string)); + metro.Finalize(hash); + if (memcmp(hash, MetroHash64::test_seed_0, 8) != 0) return false; + + metro.Initialize(1); + metro.Update(reinterpret_cast(MetroHash64::test_string), strlen(MetroHash64::test_string)); + metro.Finalize(hash); + if (memcmp(hash, MetroHash64::test_seed_1, 8) != 0) return false; + + return true; +} -#include "metrohash.h" void metrohash64_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out) { diff --git a/contrib/libmetrohash/src/metrohash64.h b/contrib/libmetrohash/src/metrohash64.h new file mode 100644 index 00000000000..d58898b117d --- /dev/null +++ b/contrib/libmetrohash/src/metrohash64.h @@ -0,0 +1,73 @@ +// metrohash64.h +// +// Copyright 2015-2018 J. Andrew Rogers +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#ifndef METROHASH_METROHASH_64_H +#define METROHASH_METROHASH_64_H + +#include + +class MetroHash64 +{ +public: + static const uint32_t bits = 64; + + // Constructor initializes the same as Initialize() + MetroHash64(const uint64_t seed=0); + + // Initializes internal state for new hash with optional seed + void Initialize(const uint64_t seed=0); + + // Update the hash state with a string of bytes. If the length + // is sufficiently long, the implementation switches to a bulk + // hashing algorithm directly on the argument buffer for speed. + void Update(const uint8_t * buffer, const uint64_t length); + + // Constructs the final hash and writes it to the argument buffer. + // After a hash is finalized, this instance must be Initialized()-ed + // again or the behavior of Update() and Finalize() is undefined. + void Finalize(uint8_t * const hash); + + // A non-incremental function implementation. This can be significantly + // faster than the incremental implementation for some usage patterns. + static void Hash(const uint8_t * buffer, const uint64_t length, uint8_t * const hash, const uint64_t seed=0); + + // Does implementation correctly execute test vectors? 
+ static bool ImplementationVerified(); + + // test vectors -- Hash(test_string, seed=0) => test_seed_0 + static const char * test_string; + static const uint8_t test_seed_0[8]; + static const uint8_t test_seed_1[8]; + +private: + static const uint64_t k0 = 0xD6D018F5; + static const uint64_t k1 = 0xA2AA033B; + static const uint64_t k2 = 0x62992FC1; + static const uint64_t k3 = 0x30BC5B29; + + struct { uint64_t v[4]; } state; + struct { uint8_t b[32]; } input; + uint64_t bytes; + uint64_t vseed; +}; + + +// Legacy 64-bit hash functions -- do not use +void metrohash64_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out); +void metrohash64_2(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out); + + +#endif // #ifndef METROHASH_METROHASH_64_H diff --git a/contrib/libmetrohash/src/platform.h b/contrib/libmetrohash/src/platform.h new file mode 100644 index 00000000000..31291b94b33 --- /dev/null +++ b/contrib/libmetrohash/src/platform.h @@ -0,0 +1,50 @@ +// platform.h +// +// Copyright 2015-2018 J. Andrew Rogers +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#ifndef METROHASH_PLATFORM_H +#define METROHASH_PLATFORM_H + +#include + +// rotate right idiom recognized by most compilers +inline static uint64_t rotate_right(uint64_t v, unsigned k) +{ + return (v >> k) | (v << (64 - k)); +} + +// unaligned reads, fast and safe on Nehalem and later microarchitectures +inline static uint64_t read_u64(const void * const ptr) +{ + return static_cast(*reinterpret_cast(ptr)); +} + +inline static uint64_t read_u32(const void * const ptr) +{ + return static_cast(*reinterpret_cast(ptr)); +} + +inline static uint64_t read_u16(const void * const ptr) +{ + return static_cast(*reinterpret_cast(ptr)); +} + +inline static uint64_t read_u8 (const void * const ptr) +{ + return static_cast(*reinterpret_cast(ptr)); +} + + +#endif // #ifndef METROHASH_PLATFORM_H diff --git a/contrib/libmetrohash/src/testvector.h b/contrib/libmetrohash/src/testvector.h index 8c7967453e9..e4006182e4f 100644 --- a/contrib/libmetrohash/src/testvector.h +++ b/contrib/libmetrohash/src/testvector.h @@ -1,27 +1,18 @@ // testvector.h // -// The MIT License (MIT) +// Copyright 2015-2018 J. Andrew Rogers // -// Copyright (c) 2015 J. Andrew Rogers +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at // -// Permission is hereby granted, free of charge, to any person obtaining a copy -// of this software and associated documentation files (the "Software"), to deal -// in the Software without restriction, including without limitation the rights -// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -// copies of the Software, and to permit persons to whom the Software is -// furnished to do so, subject to the following conditions: -// -// The above copyright notice and this permission notice shall be included in all -// copies or substantial portions of the Software. 
-// -// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE -// SOFTWARE. +// http://www.apache.org/licenses/LICENSE-2.0 // +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. #ifndef METROHASH_TESTVECTOR_H #define METROHASH_TESTVECTOR_H @@ -46,6 +37,8 @@ struct TestVectorData static const char * test_key_63 = "012345678901234567890123456789012345678901234567890123456789012"; +// The hash assumes a little-endian architecture. Treating the hash results +// as an array of uint64_t should enable conversion for big-endian implementations. const TestVectorData TestVector [] = { // seed = 0 diff --git a/dbms/CMakeLists.txt b/dbms/CMakeLists.txt index 93d9a0a1a11..3eb84d8eefa 100644 --- a/dbms/CMakeLists.txt +++ b/dbms/CMakeLists.txt @@ -200,11 +200,18 @@ target_link_libraries (clickhouse_common_io ${Boost_SYSTEM_LIBRARY} PRIVATE apple_rt + PUBLIC + Threads::Threads + PRIVATE ${CMAKE_DL_LIBS} ) -if (NOT ARCH_ARM AND CPUID_LIBRARY) - target_link_libraries (clickhouse_common_io PRIVATE ${CPUID_LIBRARY}) +if(CPUID_LIBRARY) + target_link_libraries(clickhouse_common_io PRIVATE ${CPUID_LIBRARY}) +endif() + +if(CPUINFO_LIBRARY) + target_link_libraries(clickhouse_common_io PRIVATE ${CPUINFO_LIBRARY}) endif() target_link_libraries (dbms @@ -225,6 +232,7 @@ target_link_libraries (dbms ${Boost_PROGRAM_OPTIONS_LIBRARY} PUBLIC ${Boost_SYSTEM_LIBRARY} + Threads::Threads ) if (NOT USE_INTERNAL_RE2_LIBRARY) diff --git a/dbms/programs/benchmark/CMakeLists.txt b/dbms/programs/benchmark/CMakeLists.txt index af11c600b2d..9814fac9875 100644 --- a/dbms/programs/benchmark/CMakeLists.txt +++ b/dbms/programs/benchmark/CMakeLists.txt @@ -5,4 +5,5 @@ target_include_directories (clickhouse-benchmark-lib SYSTEM PRIVATE ${PCG_RANDOM if (CLICKHOUSE_SPLIT_BINARY) add_executable (clickhouse-benchmark clickhouse-benchmark.cpp) target_link_libraries (clickhouse-benchmark PRIVATE clickhouse-benchmark-lib clickhouse_aggregate_functions) + install (TARGETS clickhouse-benchmark ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse) endif () diff --git a/dbms/programs/client/CMakeLists.txt b/dbms/programs/client/CMakeLists.txt index c5c5cdc664f..462720dea0e 100644 --- a/dbms/programs/client/CMakeLists.txt +++ b/dbms/programs/client/CMakeLists.txt @@ -7,6 +7,7 @@ endif () if (CLICKHOUSE_SPLIT_BINARY) add_executable (clickhouse-client clickhouse-client.cpp) target_link_libraries (clickhouse-client PRIVATE clickhouse-client-lib) + install (TARGETS clickhouse-client ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse) endif () install (FILES clickhouse-client.xml DESTINATION ${CLICKHOUSE_ETC_DIR}/clickhouse-client COMPONENT clickhouse-client RENAME config.xml) diff --git a/dbms/programs/compressor/CMakeLists.txt b/dbms/programs/compressor/CMakeLists.txt index bf3accfb8af..a76986173a5 
100644 --- a/dbms/programs/compressor/CMakeLists.txt +++ b/dbms/programs/compressor/CMakeLists.txt @@ -5,4 +5,5 @@ if (CLICKHOUSE_SPLIT_BINARY) # Also in utils add_executable (clickhouse-compressor clickhouse-compressor.cpp) target_link_libraries (clickhouse-compressor PRIVATE clickhouse-compressor-lib) + install (TARGETS clickhouse-compressor ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse) endif () diff --git a/dbms/programs/copier/CMakeLists.txt b/dbms/programs/copier/CMakeLists.txt index ed3e55208aa..158080ffce6 100644 --- a/dbms/programs/copier/CMakeLists.txt +++ b/dbms/programs/copier/CMakeLists.txt @@ -4,4 +4,5 @@ target_link_libraries (clickhouse-copier-lib PRIVATE clickhouse-server-lib click if (CLICKHOUSE_SPLIT_BINARY) add_executable (clickhouse-copier clickhouse-copier.cpp) target_link_libraries (clickhouse-copier clickhouse-copier-lib) + install (TARGETS clickhouse-copier ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse) endif () diff --git a/dbms/programs/extract-from-config/CMakeLists.txt b/dbms/programs/extract-from-config/CMakeLists.txt index 62253649368..9d2ddcd7c2a 100644 --- a/dbms/programs/extract-from-config/CMakeLists.txt +++ b/dbms/programs/extract-from-config/CMakeLists.txt @@ -4,4 +4,5 @@ target_link_libraries (clickhouse-extract-from-config-lib PRIVATE clickhouse_com if (CLICKHOUSE_SPLIT_BINARY) add_executable (clickhouse-extract-from-config clickhouse-extract-from-config.cpp) target_link_libraries (clickhouse-extract-from-config PRIVATE clickhouse-extract-from-config-lib) + install (TARGETS clickhouse-extract-from-config ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse) endif () diff --git a/dbms/programs/format/CMakeLists.txt b/dbms/programs/format/CMakeLists.txt index 53d09e82621..67033730b07 100644 --- a/dbms/programs/format/CMakeLists.txt +++ b/dbms/programs/format/CMakeLists.txt @@ -3,4 +3,5 @@ target_link_libraries (clickhouse-format-lib PRIVATE dbms clickhouse_common_io c if (CLICKHOUSE_SPLIT_BINARY) add_executable (clickhouse-format clickhouse-format.cpp) target_link_libraries (clickhouse-format PRIVATE clickhouse-format-lib) + install (TARGETS clickhouse-format ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse) endif () diff --git a/dbms/programs/local/CMakeLists.txt b/dbms/programs/local/CMakeLists.txt index 07729d68563..5df54fd4e7a 100644 --- a/dbms/programs/local/CMakeLists.txt +++ b/dbms/programs/local/CMakeLists.txt @@ -4,4 +4,5 @@ target_link_libraries (clickhouse-local-lib PRIVATE clickhouse_common_io clickho if (CLICKHOUSE_SPLIT_BINARY) add_executable (clickhouse-local clickhouse-local.cpp) target_link_libraries (clickhouse-local PRIVATE clickhouse-local-lib) + install (TARGETS clickhouse-local ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse) endif () diff --git a/dbms/programs/obfuscator/CMakeLists.txt b/dbms/programs/obfuscator/CMakeLists.txt index 73c3f01e9cb..77096c2a169 100644 --- a/dbms/programs/obfuscator/CMakeLists.txt +++ b/dbms/programs/obfuscator/CMakeLists.txt @@ -5,4 +5,5 @@ if (CLICKHOUSE_SPLIT_BINARY) add_executable (clickhouse-obfuscator clickhouse-obfuscator.cpp) set_target_properties(clickhouse-obfuscator PROPERTIES RUNTIME_OUTPUT_DIRECTORY ..) 
target_link_libraries (clickhouse-obfuscator PRIVATE clickhouse-obfuscator-lib) + install (TARGETS clickhouse-obfuscator ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse) endif () diff --git a/dbms/programs/odbc-bridge/CMakeLists.txt b/dbms/programs/odbc-bridge/CMakeLists.txt index d57a41ebfc6..dd712a93c5a 100644 --- a/dbms/programs/odbc-bridge/CMakeLists.txt +++ b/dbms/programs/odbc-bridge/CMakeLists.txt @@ -36,4 +36,5 @@ endif () if (CLICKHOUSE_SPLIT_BINARY) add_executable (clickhouse-odbc-bridge odbc-bridge.cpp) target_link_libraries (clickhouse-odbc-bridge PRIVATE clickhouse-odbc-bridge-lib) + install (TARGETS clickhouse-odbc-bridge ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse) endif () diff --git a/dbms/programs/performance-test/CMakeLists.txt b/dbms/programs/performance-test/CMakeLists.txt index f1a08172009..014b62ade2d 100644 --- a/dbms/programs/performance-test/CMakeLists.txt +++ b/dbms/programs/performance-test/CMakeLists.txt @@ -5,4 +5,5 @@ target_include_directories (clickhouse-performance-test-lib SYSTEM PRIVATE ${PCG if (CLICKHOUSE_SPLIT_BINARY) add_executable (clickhouse-performance-test clickhouse-performance-test.cpp) target_link_libraries (clickhouse-performance-test PRIVATE clickhouse-performance-test-lib) + install (TARGETS clickhouse-performance-test ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse) endif () diff --git a/dbms/src/Common/Config/CMakeLists.txt b/dbms/src/Common/Config/CMakeLists.txt index a1bb2790fdf..298f1ed241b 100644 --- a/dbms/src/Common/Config/CMakeLists.txt +++ b/dbms/src/Common/Config/CMakeLists.txt @@ -4,5 +4,5 @@ add_headers_and_sources(clickhouse_common_config .) add_library(clickhouse_common_config ${LINK_MODE} ${clickhouse_common_config_headers} ${clickhouse_common_config_sources}) -target_link_libraries(clickhouse_common_config PUBLIC common PRIVATE clickhouse_common_zookeeper string_utils PUBLIC ${Poco_XML_LIBRARY} ${Poco_Util_LIBRARY}) +target_link_libraries(clickhouse_common_config PUBLIC common PRIVATE clickhouse_common_zookeeper string_utils PUBLIC ${Poco_XML_LIBRARY} ${Poco_Util_LIBRARY} Threads::Threads) target_include_directories(clickhouse_common_config PUBLIC ${DBMS_INCLUDE_DIR}) diff --git a/dbms/src/Common/ZooKeeper/CMakeLists.txt b/dbms/src/Common/ZooKeeper/CMakeLists.txt index 1f69f0af1ec..41fd406faae 100644 --- a/dbms/src/Common/ZooKeeper/CMakeLists.txt +++ b/dbms/src/Common/ZooKeeper/CMakeLists.txt @@ -4,7 +4,7 @@ add_headers_and_sources(clickhouse_common_zookeeper .) 
add_library(clickhouse_common_zookeeper ${LINK_MODE} ${clickhouse_common_zookeeper_headers} ${clickhouse_common_zookeeper_sources}) -target_link_libraries (clickhouse_common_zookeeper PUBLIC clickhouse_common_io common PRIVATE string_utils PUBLIC ${Poco_Util_LIBRARY}) +target_link_libraries (clickhouse_common_zookeeper PUBLIC clickhouse_common_io common PRIVATE string_utils PUBLIC ${Poco_Util_LIBRARY} Threads::Threads) target_include_directories(clickhouse_common_zookeeper PUBLIC ${DBMS_INCLUDE_DIR}) if (ENABLE_TESTS) diff --git a/dbms/src/Common/config.h.in b/dbms/src/Common/config.h.in index 0c756841f2e..aa57582f43c 100644 --- a/dbms/src/Common/config.h.in +++ b/dbms/src/Common/config.h.in @@ -18,6 +18,8 @@ #cmakedefine01 USE_XXHASH #cmakedefine01 USE_INTERNAL_LLVM_LIBRARY #cmakedefine01 USE_PROTOBUF +#cmakedefine01 USE_CPUID +#cmakedefine01 USE_CPUINFO #cmakedefine01 CLICKHOUSE_SPLIT_BINARY #cmakedefine01 LLVM_HAS_RTTI diff --git a/dbms/src/Common/getNumberOfPhysicalCPUCores.cpp b/dbms/src/Common/getNumberOfPhysicalCPUCores.cpp index 0a686b9c772..de158a51a77 100644 --- a/dbms/src/Common/getNumberOfPhysicalCPUCores.cpp +++ b/dbms/src/Common/getNumberOfPhysicalCPUCores.cpp @@ -1,19 +1,20 @@ #include #include -#if defined(__x86_64__) - - #include - #include - +#include +#if USE_CPUID +# include +# include namespace DB { namespace ErrorCodes { extern const int CPUID_ERROR; }} - +#elif USE_CPUINFO +# include #endif + unsigned getNumberOfPhysicalCPUCores() { -#if defined(__x86_64__) +#if USE_CPUID cpu_raw_data_t raw_data; if (0 != cpuid_get_raw_data(&raw_data)) throw DB::Exception("Cannot cpuid_get_raw_data: " + std::string(cpuid_error()), DB::ErrorCodes::CPUID_ERROR); @@ -37,6 +38,13 @@ unsigned getNumberOfPhysicalCPUCores() if (res != 0) return res; +#elif USE_CPUINFO + uint32_t cores = 0; + if (cpuinfo_initialize()) + cores = cpuinfo_get_cores_count(); + + if (cores) + return cores; #endif /// As a fallback (also for non-x86 architectures) assume there is no hyper-threading on the system.
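The hunk above replaces the compile-time `__x86_64__` check with the new `USE_CPUID`/`USE_CPUINFO` configuration macros, so non-x86 builds can still count physical cores through libcpuinfo. Below is a minimal standalone sketch of the new `USE_CPUINFO` branch, assuming libcpuinfo's public header `<cpuinfo.h>` is available and linked, and using `std::thread::hardware_concurrency()` as a plausible fallback consistent with the context comment; the helper name `getPhysicalCores` is illustrative, not part of the patch:

    #include <cpuinfo.h>  // libcpuinfo: cpuinfo_initialize(), cpuinfo_get_cores_count()
    #include <cstdint>
    #include <thread>

    static unsigned getPhysicalCores()
    {
        // cpuinfo_initialize() returns false if hardware detection fails.
        if (cpuinfo_initialize())
        {
            // Physical core count: hyper-threaded siblings are not double-counted.
            uint32_t cores = cpuinfo_get_cores_count();
            if (cores)
                return cores;
        }
        // Fallback: assume no hyper-threading, so the number of logical CPUs
        // equals the number of physical cores.
        return std::thread::hardware_concurrency();
    }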
diff --git a/dbms/src/DataTypes/DataTypeLowCardinalityHelpers.cpp b/dbms/src/DataTypes/DataTypeLowCardinalityHelpers.cpp index 215b21f7994..0812e968794 100644 --- a/dbms/src/DataTypes/DataTypeLowCardinalityHelpers.cpp +++ b/dbms/src/DataTypes/DataTypeLowCardinalityHelpers.cpp @@ -48,10 +48,24 @@ ColumnPtr recursiveRemoveLowCardinality(const ColumnPtr & column) return column; if (const auto * column_array = typeid_cast(column.get())) - return ColumnArray::create(recursiveRemoveLowCardinality(column_array->getDataPtr()), column_array->getOffsetsPtr()); + { + auto & data = column_array->getDataPtr(); + auto data_no_lc = recursiveRemoveLowCardinality(data); + if (data.get() == data_no_lc.get()) + return column; + + return ColumnArray::create(data_no_lc, column_array->getOffsetsPtr()); + } if (const auto * column_const = typeid_cast(column.get())) - return ColumnConst::create(recursiveRemoveLowCardinality(column_const->getDataColumnPtr()), column_const->size()); + { + auto & nested = column_const->getDataColumnPtr(); + auto nested_no_lc = recursiveRemoveLowCardinality(nested); + if (nested.get() == nested_no_lc.get()) + return column; + + return ColumnConst::create(nested_no_lc, column_const->size()); + } if (const auto * column_tuple = typeid_cast(column.get())) { @@ -76,8 +90,14 @@ ColumnPtr recursiveLowCardinalityConversion(const ColumnPtr & column, const Data return column; if (const auto * column_const = typeid_cast(column.get())) - return ColumnConst::create(recursiveLowCardinalityConversion(column_const->getDataColumnPtr(), from_type, to_type), - column_const->size()); + { + auto & nested = column_const->getDataColumnPtr(); + auto nested_no_lc = recursiveLowCardinalityConversion(nested, from_type, to_type); + if (nested.get() == nested_no_lc.get()) + return column; + + return ColumnConst::create(nested_no_lc, column_const->size()); + } if (const auto * low_cardinality_type = typeid_cast(from_type.get())) { @@ -125,11 +145,23 @@ ColumnPtr recursiveLowCardinalityConversion(const ColumnPtr & column, const Data Columns columns = column_tuple->getColumns(); auto & from_elements = from_tuple_type->getElements(); auto & to_elements = to_tuple_type->getElements(); + + bool has_converted = false; + for (size_t i = 0; i < columns.size(); ++i) { auto & element = columns[i]; - element = recursiveLowCardinalityConversion(element, from_elements.at(i), to_elements.at(i)); + auto element_no_lc = recursiveLowCardinalityConversion(element, from_elements.at(i), to_elements.at(i)); + if (element.get() != element_no_lc.get()) + { + element = element_no_lc; + has_converted = true; + } } + + if (!has_converted) + return column; + return ColumnTuple::create(columns); } } diff --git a/dbms/src/Dictionaries/CMakeLists.txt b/dbms/src/Dictionaries/CMakeLists.txt index d7f85a5c7eb..3ee300e065b 100644 --- a/dbms/src/Dictionaries/CMakeLists.txt +++ b/dbms/src/Dictionaries/CMakeLists.txt @@ -12,7 +12,7 @@ generate_code(CacheDictionary_generate3 UInt8 UInt16 UInt32 UInt64 UInt128 Int8 add_headers_and_sources(clickhouse_dictionaries ${CMAKE_CURRENT_BINARY_DIR}/generated/) add_library(clickhouse_dictionaries ${LINK_MODE} ${clickhouse_dictionaries_sources}) -target_link_libraries(clickhouse_dictionaries PRIVATE clickhouse_common_io pocoext ${MYSQLXX_LIBRARY} ${BTRIE_LIBRARIES}) +target_link_libraries(clickhouse_dictionaries PRIVATE clickhouse_common_io pocoext ${MYSQLXX_LIBRARY} ${BTRIE_LIBRARIES} PUBLIC Threads::Threads) if(Poco_SQL_FOUND AND NOT USE_INTERNAL_POCO_LIBRARY) 
target_include_directories(clickhouse_dictionaries SYSTEM PRIVATE ${Poco_SQL_INCLUDE_DIR}) diff --git a/dbms/src/Dictionaries/ClickHouseDictionarySource.cpp b/dbms/src/Dictionaries/ClickHouseDictionarySource.cpp index 390f63ff7bf..3ec40f79c32 100644 --- a/dbms/src/Dictionaries/ClickHouseDictionarySource.cpp +++ b/dbms/src/Dictionaries/ClickHouseDictionarySource.cpp @@ -70,7 +70,7 @@ ClickHouseDictionarySource::ClickHouseDictionarySource( , query_builder{dict_struct, db, table, where, IdentifierQuotingStyle::Backticks} , sample_block{sample_block} , context(context) - , is_local{isLocalAddress({host, port}, config.getInt("tcp_port", 0))} + , is_local{isLocalAddress({host, port}, context.getTCPPort())} , pool{is_local ? nullptr : createPool(host, port, secure, db, user, password, context)} , load_all_query{query_builder.composeLoadAllQuery()} { diff --git a/dbms/src/Functions/CMakeLists.txt b/dbms/src/Functions/CMakeLists.txt index f210800d279..47e059ba93a 100644 --- a/dbms/src/Functions/CMakeLists.txt +++ b/dbms/src/Functions/CMakeLists.txt @@ -23,7 +23,7 @@ target_link_libraries(clickhouse_functions ${OPENSSL_CRYPTO_LIBRARY} ${LZ4_LIBRARY}) -target_include_directories (clickhouse_functions SYSTEM BEFORE PUBLIC ${DIVIDE_INCLUDE_DIR}) +target_include_directories (clickhouse_functions SYSTEM BEFORE PUBLIC ${DIVIDE_INCLUDE_DIR} ${METROHASH_INCLUDE_DIR}) if (CONSISTENT_HASHING_INCLUDE_DIR) target_include_directories (clickhouse_functions PRIVATE ${CONSISTENT_HASHING_INCLUDE_DIR}) diff --git a/dbms/src/Functions/sleep.h b/dbms/src/Functions/sleep.h index 1c492f61f69..c0b49f24ce6 100644 --- a/dbms/src/Functions/sleep.h +++ b/dbms/src/Functions/sleep.h @@ -82,12 +82,11 @@ public: /// We do not sleep if the block is empty. if (size > 0) { - unsigned useconds = seconds * (variant == FunctionSleepVariant::PerBlock ? 1 : size) * 1e6; - /// When sleeping, the query cannot be cancelled. For ability to cancel query, we limit sleep time. - if (useconds > 3000000) /// The choice is arbitrary - throw Exception("The maximum sleep time is 3000000 microseconds. Requested: " + toString(useconds), ErrorCodes::TOO_SLOW); + if (seconds > 3.0) /// The choice is arbitrary + throw Exception("The maximum sleep time is 3 seconds. Requested: " + toString(seconds), ErrorCodes::TOO_SLOW); + UInt64 useconds = seconds * (variant == FunctionSleepVariant::PerBlock ?
1 : size) * 1e6; ::usleep(useconds); } diff --git a/dbms/src/Interpreters/Aggregator.cpp b/dbms/src/Interpreters/Aggregator.cpp index 517f882f103..3fb94397956 100644 --- a/dbms/src/Interpreters/Aggregator.cpp +++ b/dbms/src/Interpreters/Aggregator.cpp @@ -768,11 +768,12 @@ bool Aggregator::executeOnBlock(const Block & block, AggregatedDataVariants & re materialized_columns.push_back(block.safeGetByPosition(params.keys[i]).column->convertToFullColumnIfConst()); key_columns[i] = materialized_columns.back().get(); - if (const auto * low_cardinality_column = typeid_cast<const ColumnLowCardinality *>(key_columns[i])) + if (!result.isLowCardinality()) { - if (!result.isLowCardinality()) + auto column_no_lc = recursiveRemoveLowCardinality(key_columns[i]->getPtr()); + if (column_no_lc.get() != key_columns[i]) { - materialized_columns.push_back(low_cardinality_column->convertToFullColumn()); + materialized_columns.emplace_back(std::move(column_no_lc)); key_columns[i] = materialized_columns.back().get(); } } @@ -788,9 +789,10 @@ bool Aggregator::executeOnBlock(const Block & block, AggregatedDataVariants & re materialized_columns.push_back(block.safeGetByPosition(params.aggregates[i].arguments[j]).column->convertToFullColumnIfConst()); aggregate_columns[i][j] = materialized_columns.back().get(); - if (auto * col_low_cardinality = typeid_cast<const ColumnLowCardinality *>(aggregate_columns[i][j])) + auto column_no_lc = recursiveRemoveLowCardinality(aggregate_columns[i][j]->getPtr()); + if (column_no_lc.get() != aggregate_columns[i][j]) { - materialized_columns.push_back(col_low_cardinality->convertToFullColumn()); + materialized_columns.emplace_back(std::move(column_no_lc)); aggregate_columns[i][j] = materialized_columns.back().get(); } } diff --git a/dbms/src/Interpreters/Join.cpp b/dbms/src/Interpreters/Join.cpp index 6ef873fb6c7..ea6d4d06c9e 100644 --- a/dbms/src/Interpreters/Join.cpp +++ b/dbms/src/Interpreters/Join.cpp @@ -237,12 +237,18 @@ void Join::setSampleBlock(const Block & block) size_t keys_size = key_names_right.size(); ColumnRawPtrs key_columns(keys_size); - Columns materialized_columns(keys_size); + Columns materialized_columns; for (size_t i = 0; i < keys_size; ++i) { - materialized_columns[i] = recursiveRemoveLowCardinality(block.getByName(key_names_right[i]).column); - key_columns[i] = materialized_columns[i].get(); + auto & column = block.getByName(key_names_right[i]).column; + key_columns[i] = column.get(); + auto column_no_lc = recursiveRemoveLowCardinality(column); + if (column.get() != column_no_lc.get()) + { + materialized_columns.emplace_back(std::move(column_no_lc)); + key_columns[i] = materialized_columns.back().get(); + } /// We will join only keys, where all components are not NULL.
if (key_columns[i]->isColumnNullable()) diff --git a/dbms/src/Interpreters/tests/CMakeLists.txt b/dbms/src/Interpreters/tests/CMakeLists.txt index 2f814c5a6a0..0cf33595335 100644 --- a/dbms/src/Interpreters/tests/CMakeLists.txt +++ b/dbms/src/Interpreters/tests/CMakeLists.txt @@ -15,6 +15,7 @@ target_include_directories (hash_map SYSTEM BEFORE PRIVATE ${SPARCEHASH_INCLUDE_ target_link_libraries (hash_map PRIVATE dbms clickhouse_compression) add_executable (hash_map3 hash_map3.cpp) +target_include_directories(hash_map3 SYSTEM BEFORE PRIVATE ${METROHASH_INCLUDE_DIR}) target_link_libraries (hash_map3 PRIVATE dbms ${FARMHASH_LIBRARIES} ${METROHASH_LIBRARIES}) add_executable (hash_map_string hash_map_string.cpp) @@ -25,6 +26,7 @@ add_executable (hash_map_string_2 hash_map_string_2.cpp) target_link_libraries (hash_map_string_2 PRIVATE dbms clickhouse_compression) add_executable (hash_map_string_3 hash_map_string_3.cpp) +target_include_directories(hash_map_string_3 SYSTEM BEFORE PRIVATE ${METROHASH_INCLUDE_DIR}) target_link_libraries (hash_map_string_3 PRIVATE dbms clickhouse_compression ${FARMHASH_LIBRARIES} ${METROHASH_LIBRARIES}) add_executable (hash_map_string_small hash_map_string_small.cpp) @@ -54,5 +56,5 @@ target_link_libraries (users PRIVATE dbms clickhouse_common_config ${Boost_FILES if (OS_LINUX) add_executable (internal_iotop internal_iotop.cpp) - target_link_libraries (internal_iotop PRIVATE dbms) + target_link_libraries (internal_iotop PRIVATE dbms Threads::Threads) endif () diff --git a/dbms/src/Interpreters/tests/hash_map_string_3.cpp b/dbms/src/Interpreters/tests/hash_map_string_3.cpp index 5d8d8e4f7c8..f58d79d0db7 100644 --- a/dbms/src/Interpreters/tests/hash_map_string_3.cpp +++ b/dbms/src/Interpreters/tests/hash_map_string_3.cpp @@ -325,7 +325,7 @@ struct FarmHash64 template -struct MetroHash64 +struct SMetroHash64 { size_t operator() (StringRef x) const { @@ -507,8 +507,8 @@ int main(int argc, char ** argv) if (!m || m == 8) bench (data, "StringRef_VerySimpleHash"); if (!m || m == 9) bench (data, "StringRef_FarmHash64"); - if (!m || m == 10) bench>(data, "StringRef_MetroHash64_1"); - if (!m || m == 11) bench>(data, "StringRef_MetroHash64_2"); + if (!m || m == 10) bench>(data, "StringRef_MetroHash64_1"); + if (!m || m == 11) bench>(data, "StringRef_MetroHash64_2"); return 0; } diff --git a/dbms/src/Storages/System/StorageSystemBuildOptions.generated.cpp.in b/dbms/src/Storages/System/StorageSystemBuildOptions.generated.cpp.in index c7a378e4363..be7d93c9fc5 100644 --- a/dbms/src/Storages/System/StorageSystemBuildOptions.generated.cpp.in +++ b/dbms/src/Storages/System/StorageSystemBuildOptions.generated.cpp.in @@ -1,15 +1,23 @@ // .cpp autogenerated by cmake +#cmakedefine01 BUILD_DETERMINISTIC + const char * auto_config_build[] { "VERSION_FULL", "@VERSION_FULL@", "VERSION_DESCRIBE", "@VERSION_DESCRIBE@", + "VERSION_INTEGER", "@VERSION_INTEGER@", + +#if BUILD_DETERMINISTIC + "SYSTEM", "@CMAKE_SYSTEM_NAME@", +#else "VERSION_GITHASH", "@VERSION_GITHASH@", "VERSION_REVISION", "@VERSION_REVISION@", - "VERSION_INTEGER", "@VERSION_INTEGER@", "BUILD_DATE", "@BUILD_DATE@", - "BUILD_TYPE", "@CMAKE_BUILD_TYPE@", "SYSTEM", "@CMAKE_SYSTEM@", +#endif + + "BUILD_TYPE", "@CMAKE_BUILD_TYPE@", "SYSTEM_PROCESSOR", "@CMAKE_SYSTEM_PROCESSOR@", "LIBRARY_ARCHITECTURE", "@CMAKE_LIBRARY_ARCHITECTURE@", "CMAKE_VERSION", "@CMAKE_VERSION@", diff --git a/dbms/tests/clickhouse-test-server b/dbms/tests/clickhouse-test-server index b9003cc93b7..0bb61922ab8 100755 --- a/dbms/tests/clickhouse-test-server +++ 
b/dbms/tests/clickhouse-test-server @@ -125,6 +125,7 @@ if [ -n "$*" ]; then else TEST_RUN=${TEST_RUN=1} TEST_PERF=${TEST_PERF=1} + TEST_DICT=${TEST_DICT=1} CLICKHOUSE_CLIENT_QUERY="${CLICKHOUSE_CLIENT} --config ${CLICKHOUSE_CONFIG_CLIENT} --port $CLICKHOUSE_PORT_TCP -m -n -q" $CLICKHOUSE_CLIENT_QUERY 'SELECT * from system.build_options; SELECT * FROM system.clusters;' CLICKHOUSE_TEST="env PATH=$PATH:$BIN_DIR ${TEST_DIR}clickhouse-test --binary ${BIN_DIR}${CLICKHOUSE_BINARY} --configclient $CLICKHOUSE_CONFIG_CLIENT --configserver $CLICKHOUSE_CONFIG --tmp $DATA_DIR/tmp --queries $QUERIES_DIR $TEST_OPT0 $TEST_OPT" @@ -139,6 +140,7 @@ else fi ( [ "$TEST_RUN" ] && $CLICKHOUSE_TEST ) || ${TEST_TRUE:=false} ( [ "$TEST_PERF" ] && $CLICKHOUSE_PERFORMANCE_TEST $* ) || true + ( [ "$TEST_DICT" ] && mkdir -p $DATA_DIR/etc/dictionaries/ && cd $CUR_DIR/external_dictionaries && python generate_and_test.py --port=$CLICKHOUSE_PORT_TCP --client=$CLICKHOUSE_CLIENT --source=$CUR_DIR/external_dictionaries/source.tsv --reference=$CUR_DIR/external_dictionaries/reference --generated=$DATA_DIR/etc/dictionaries/ --no_mysql --no_mongo ) || true $CLICKHOUSE_CLIENT_QUERY "SELECT event, value FROM system.events; SELECT metric, value FROM system.metrics; SELECT metric, value FROM system.asynchronous_metrics;" $CLICKHOUSE_CLIENT_QUERY "SELECT 'Still alive'" fi diff --git a/dbms/tests/external_dictionaries/generate_and_test.py b/dbms/tests/external_dictionaries/generate_and_test.py index 2c72d29de9d..e8bed97a5cc 100755 --- a/dbms/tests/external_dictionaries/generate_and_test.py +++ b/dbms/tests/external_dictionaries/generate_and_test.py @@ -394,8 +394,8 @@ def generate_dictionaries(args): - 0 - 0 + 5 + 15 diff --git a/dbms/tests/queries/0_stateless/00800_low_cardinality_array_group_by_arg.reference b/dbms/tests/queries/0_stateless/00800_low_cardinality_array_group_by_arg.reference new file mode 100644 index 00000000000..916213553ff --- /dev/null +++ b/dbms/tests/queries/0_stateless/00800_low_cardinality_array_group_by_arg.reference @@ -0,0 +1 @@ +2019-01-14 1 ['aaa','aaa','bbb','ccc'] diff --git a/dbms/tests/queries/0_stateless/00800_low_cardinality_array_group_by_arg.sql b/dbms/tests/queries/0_stateless/00800_low_cardinality_array_group_by_arg.sql new file mode 100644 index 00000000000..8ca5647140d --- /dev/null +++ b/dbms/tests/queries/0_stateless/00800_low_cardinality_array_group_by_arg.sql @@ -0,0 +1,33 @@ +SET allow_experimental_low_cardinality_type = 1; + +DROP TABLE IF EXISTS test.table1; +DROP TABLE IF EXISTS test.table2; + +CREATE TABLE test.table1 +( +dt Date, +id Int32, +arr Array(LowCardinality(String)) +) ENGINE = MergeTree PARTITION BY toMonday(dt) +ORDER BY (dt, id) SETTINGS index_granularity = 8192; + +CREATE TABLE test.table2 +( +dt Date, +id Int32, +arr Array(LowCardinality(String)) +) ENGINE = MergeTree PARTITION BY toMonday(dt) +ORDER BY (dt, id) SETTINGS index_granularity = 8192; + +insert into test.table1 (dt, id, arr) values ('2019-01-14', 1, ['aaa']); +insert into test.table2 (dt, id, arr) values ('2019-01-14', 1, ['aaa','bbb','ccc']); + +select dt, id, arraySort(groupArrayArray(arr)) +from ( + select dt, id, arr from test.table1 + where dt = '2019-01-14' and id = 1 + UNION ALL + select dt, id, arr from test.table2 + where dt = '2019-01-14' and id = 1 +) +group by dt, id; diff --git a/dbms/tests/queries/0_stateless/00833_sleep_overflow.reference b/dbms/tests/queries/0_stateless/00833_sleep_overflow.reference new file mode 100644 index 00000000000..e69de29bb2d diff --git 
a/dbms/tests/queries/0_stateless/00833_sleep_overflow.sql b/dbms/tests/queries/0_stateless/00833_sleep_overflow.sql new file mode 100644 index 00000000000..155637eebd9 --- /dev/null +++ b/dbms/tests/queries/0_stateless/00833_sleep_overflow.sql @@ -0,0 +1 @@ +SELECT sleep(4295.967296); -- { serverError 160 } diff --git a/dbms/tests/server-test.xml b/dbms/tests/server-test.xml index c20d34cce3f..c936f15bf52 100644 --- a/dbms/tests/server-test.xml +++ b/dbms/tests/server-test.xml @@ -110,7 +110,7 @@ query_log
7500 - *_dictionary.xml + dictionaries/dictionary_*.xml diff --git a/docs/en/interfaces/formats.md b/docs/en/interfaces/formats.md index eddefaa9394..0cb84542396 100644 --- a/docs/en/interfaces/formats.md +++ b/docs/en/interfaces/formats.md @@ -323,7 +323,7 @@ Outputs data as separate JSON objects for each row (newline delimited JSON). Unlike the JSON format, there is no substitution of invalid UTF-8 sequences. Any set of bytes can be output in the rows. This is necessary so that data can be formatted without losing any information. Values are escaped in the same way as for JSON. -For parsing, any order is supported for the values of different columns. It is acceptable for some values to be omitted – they are treated as equal to their default values. In this case, zeros and blank rows are used as default values. Complex values that could be specified in the table are not supported as defaults. Whitespace between elements is ignored. If a comma is placed after the objects, it is ignored. Objects don't necessarily have to be separated by new lines. +For parsing, any order is supported for the values of different columns. It is acceptable for some values to be omitted – they are treated as equal to their default values. In this case, zeros and blank rows are used as default values. Complex values that could be specified in the table are not supported as defaults, but it can be turned on by option `insert_sample_with_metadata=1`. Whitespace between elements is ignored. If a comma is placed after the objects, it is ignored. Objects don't necessarily have to be separated by new lines. ## Native {#native} diff --git a/docs/en/operations/settings/settings.md b/docs/en/operations/settings/settings.md index 22568872092..c3a99080627 100644 --- a/docs/en/operations/settings/settings.md +++ b/docs/en/operations/settings/settings.md @@ -81,6 +81,9 @@ If an error occurred while reading rows but the error counter is still less than If `input_format_allow_errors_ratio` is exceeded, ClickHouse throws an exception. +## insert_sample_with_metadata + +For INSERT queries, specifies that the server need to send metadata about column defaults to the client. This will be used to calculate default expressions. Disabled by default. ## join_default_strictness diff --git a/docs/en/query_language/functions/array_functions.md b/docs/en/query_language/functions/array_functions.md index 3a16db67e8c..4fe0f8a4ffb 100644 --- a/docs/en/query_language/functions/array_functions.md +++ b/docs/en/query_language/functions/array_functions.md @@ -469,4 +469,64 @@ If you want to get a list of unique items in an array, you can use arrayReduce(' A special function. See the section ["ArrayJoin function"](array_join.md#functions_arrayjoin). +## arrayDifference(arr) + +Takes an array, returns an array with the difference between all pairs of neighboring elements. For example: + +```sql +SELECT arrayDifference([1, 2, 3, 4]) +``` + +``` +┌─arrayDifference([1, 2, 3, 4])─┐ +│ [0,1,1,1] │ +└───────────────────────────────┘ +``` + +## arrayDistinct(arr) + +Takes an array, returns an array containing the different elements in all the arrays. For example: + +```sql +SELECT arrayDifference([1, 2, 3, 4]) +``` + +``` +┌─arrayDifference([1, 2, 3, 4])─┐ +│ [0,1,1,1] │ +└───────────────────────────────┘ +``` + +## arrayEnumerateDense(arr) + +Returns an array of the same size as the source array, indicating where each element first appears in the source array. For example: arrayEnumerateDense([10,20,10,30]) = [1,2,1,4]. 
+ +## arrayIntersect(arr) + +Takes an array, returns the intersection of all array elements. For example: + +```sql +SELECT + arrayIntersect([1, 2], [1, 3], [2, 3]) AS no_intersect, + arrayIntersect([1, 2], [1, 3], [1, 4]) AS intersect +``` + +``` +┌─no_intersect─┬─intersect─┐ +│ [] │ [1] │ +└──────────────┴───────────┘ +``` + +## arrayReduce(agg_func, arr1, ...) + +Applies an aggregate function to array and returns its result.If aggregate function has multiple arguments, then this function can be applied to multiple arrays of the same size. + +arrayReduce('agg_func', arr1, ...) - apply the aggregate function `agg_func` to arrays `arr1...`. If multiple arrays passed, then elements on corresponding positions are passed as multiple arguments to the aggregate function. For example: SELECT arrayReduce('max', [1,2,3]) = 3 + +## arrayReverse(arr) + +Returns an array of the same size as the source array, containing the result of inverting all elements of the source array. + + + [Original article](https://clickhouse.yandex/docs/en/query_language/functions/array_functions/) diff --git a/docs/en/query_language/functions/bit_functions.md b/docs/en/query_language/functions/bit_functions.md index 1664664a6cf..c08a80e2bbf 100644 --- a/docs/en/query_language/functions/bit_functions.md +++ b/docs/en/query_language/functions/bit_functions.md @@ -16,5 +16,16 @@ The result type is an integer with bits equal to the maximum bits of its argumen ## bitShiftRight(a, b) +## bitRotateLeft(a, b) + +## bitRotateRight(a, b) + +## bitTest(a, b) + +## bitTestAll(a, b) + +## bitTestAny(a, b) + + [Original article](https://clickhouse.yandex/docs/en/query_language/functions/bit_functions/) diff --git a/docs/en/query_language/functions/date_time_functions.md b/docs/en/query_language/functions/date_time_functions.md index 9d9f60d627e..96852d82c3f 100644 --- a/docs/en/query_language/functions/date_time_functions.md +++ b/docs/en/query_language/functions/date_time_functions.md @@ -20,17 +20,29 @@ SELECT Only time zones that differ from UTC by a whole number of hours are supported. +## toTimeZone + +Convert time or date and time to the specified time zone. + ## toYear Converts a date or date with time to a UInt16 number containing the year number (AD). +## toQuarter + +Converts a date or date with time to a UInt8 number containing the quarter number. + ## toMonth Converts a date or date with time to a UInt8 number containing the month number (1-12). +## toDayOfYear + +Converts a date or date with time to a UInt8 number containing the number of the day of the year (1-366). + ## toDayOfMonth --Converts a date or date with time to a UInt8 number containing the number of the day of the month (1-31). +Converts a date or date with time to a UInt8 number containing the number of the day of the month (1-31). ## toDayOfWeek @@ -50,11 +62,20 @@ Converts a date with time to a UInt8 number containing the number of the minute Converts a date with time to a UInt8 number containing the number of the second in the minute (0-59). Leap seconds are not accounted for. +## toUnixTimestamp + +Converts a date with time to a unix timestamp. + ## toMonday Rounds down a date or date with time to the nearest Monday. Returns the date. +## toStartOfISOYear + +Rounds down a date or date with time to the first day of ISO year. +Returns the date. + ## toStartOfMonth Rounds down a date or date with time to the first day of the month. @@ -104,6 +125,10 @@ Converts a date with time to a certain fixed date, while preserving the time. 
Converts a date with time or date to the number of the year, starting from a certain fixed point in the past. +## toRelativeQuarterNum + +Converts a date with time or date to the number of the quarter, starting from a certain fixed point in the past. + ## toRelativeMonthNum Converts a date with time or date to the number of the month, starting from a certain fixed point in the past. @@ -128,6 +153,14 @@ Converts a date with time or date to the number of the minute, starting from a c Converts a date with time or date to the number of the second, starting from a certain fixed point in the past. +## toISOYear + +Converts a date or date with time to a UInt16 number containing the ISO Year number. + +## toISOWeek + +Converts a date or date with time to a UInt8 number containing the ISO Week number. + ## now Accepts zero arguments and returns the current time at one of the moments of request execution. @@ -148,6 +181,60 @@ The same as 'today() - 1'. Rounds the time to the half hour. This function is specific to Yandex.Metrica, since half an hour is the minimum amount of time for breaking a session into two sessions if a tracking tag shows a single user's consecutive pageviews that differ in time by strictly more than this amount. This means that tuples (the tag ID, user ID, and time slot) can be used to search for pageviews that are included in the corresponding session. +## toYYYYMM + +Converts a date or date with time to a UInt32 number containing the year and month number (YYYY * 100 + MM). + +## toYYYYMMDD + +Converts a date or date with time to a UInt32 number containing the year and month number (YYYY * 10000 + MM * 100 + DD). + +## toYYYYMMDDhhmmss + +Converts a date or date with time to a UInt64 number containing the year and month number (YYYY * 10000000000 + MM * 100000000 + DD * 1000000 + hh * 10000 + mm * 100 + ss). + +## addYears, addMonths, addWeeks, addDays, addHours, addMinutes, addSeconds, addQuarters + +Function adds a Date/DateTime interval to a Date/DateTime and then return the Date/DateTime. For example: + +```sql +WITH + toDate('2018-01-01') AS date, + toDateTime('2018-01-01 00:00:00') AS date_time +SELECT + addYears(date, 1) AS add_years_with_date, + addYears(date_time, 1) AS add_years_with_date_time +``` + +``` +┌─add_years_with_date─┬─add_years_with_date_time─┐ +│ 2019-01-01 │ 2019-01-01 00:00:00 │ +└─────────────────────┴──────────────────────────┘ +``` + +## subtractYears, subtractMonths, subtractWeeks, subtractDays, subtractHours, subtractMinutes, subtractSeconds, subtractQuarters + +Function subtract a Date/DateTime interval to a Date/DateTime and then return the Date/DateTime. For example: + +```sql +WITH + toDate('2019-01-01') AS date, + toDateTime('2019-01-01 00:00:00') AS date_time +SELECT + subtractYears(date, 1) AS subtract_years_with_date, + subtractYears(date_time, 1) AS subtract_years_with_date_time +``` + +``` +┌─subtract_years_with_date─┬─subtract_years_with_date_time─┐ +│ 2018-01-01 │ 2018-01-01 00:00:00 │ +└──────────────────────────┴───────────────────────────────┘ +``` + +## dateDiff('unit', t1, t2, \[timezone\]) + +Return the difference between two times, t1 and t2 can be Date or DateTime, If timezone is specified, it applied to both arguments. If not, timezones from datatypes t1 and t2 are used. If that timezones are not the same, the result is unspecified. 
+ ## timeSlots(StartTime, Duration,\[, Size\]) For a time interval starting at 'StartTime' and continuing for 'Duration' seconds, it returns an array of moments in time, consisting of points from this interval rounded down to the 'Size' in seconds. 'Size' is an optional parameter: a constant UInt32, set to 1800 by default. diff --git a/docs/en/query_language/functions/ext_dict_functions.md b/docs/en/query_language/functions/ext_dict_functions.md index d370e47e3f7..fd4bc7575be 100644 --- a/docs/en/query_language/functions/ext_dict_functions.md +++ b/docs/en/query_language/functions/ext_dict_functions.md @@ -21,7 +21,7 @@ If there is no `id` key in the dictionary, it returns the default value specifie ## dictGetTOrDefault {#ext_dict_functions_dictGetTOrDefault} -`dictGetT('dict_name', 'attr_name', id, default)` +`dictGetTOrDefault('dict_name', 'attr_name', id, default)` The same as the `dictGetT` functions, but the default value is taken from the function's last argument. diff --git a/docs/en/query_language/functions/hash_functions.md b/docs/en/query_language/functions/hash_functions.md index ffffe5584fc..788ad968663 100644 --- a/docs/en/query_language/functions/hash_functions.md +++ b/docs/en/query_language/functions/hash_functions.md @@ -64,5 +64,52 @@ A fast, decent-quality non-cryptographic hash function for a string obtained fro `URLHash(s, N)` – Calculates a hash from a string up to the N level in the URL hierarchy, without one of the trailing symbols `/`,`?` or `#` at the end, if present. Levels are the same as in URLHierarchy. This function is specific to Yandex.Metrica. +## farmHash64 + +Calculates FarmHash64 from a string. +Accepts a String-type argument. Returns UInt64. +For more information, see the link: [FarmHash64](https://github.com/google/farmhash) + +## javaHash + +Calculates JavaHash from a string. +Accepts a String-type argument. Returns Int32. +For more information, see the link: [JavaHash](http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/478a4add975b/src/share/classes/java/lang/String.java#l1452) + +## hiveHash + +Calculates HiveHash from a string. +Accepts a String-type argument. Returns Int32. +Same as for [JavaHash](./hash_functions.md#javaHash), except that the return value never has a negative number. + +## metroHash64 + +Calculates MetroHash from a string. +Accepts a String-type argument. Returns UInt64. +For more information, see the link: [MetroHash64](http://www.jandrewrogers.com/2015/05/27/metrohash/) + +## jumpConsistentHash + +Calculates JumpConsistentHash form a UInt64. +Accepts a UInt64-type argument. Returns Int32. +For more information, see the link: [JumpConsistentHash](https://arxiv.org/pdf/1406.2294.pdf) + +## murmurHash2_32, murmurHash2_64 + +Calculates MurmurHash2 from a string. +Accepts a String-type argument. Returns UInt64 Or UInt32. +For more information, see the link: [MurmurHash2](https://github.com/aappleby/smhasher) + +## murmurHash3_32, murmurHash3_64, murmurHash3_128 + +Calculates MurmurHash3 from a string. +Accepts a String-type argument. Returns UInt64 Or UInt32 Or FixedString(16). +For more information, see the link: [MurmurHash3](https://github.com/aappleby/smhasher) + +## xxHash32, xxHash64 + +Calculates xxHash from a string. +ccepts a String-type argument. Returns UInt64 Or UInt32. 
+For more information, see the link: [xxHash](http://cyan4973.github.io/xxHash/) [Original article](https://clickhouse.yandex/docs/en/query_language/functions/hash_functions/) diff --git a/docs/en/query_language/functions/higher_order_functions.md b/docs/en/query_language/functions/higher_order_functions.md index b00896cb4ab..dde52c05b7a 100644 --- a/docs/en/query_language/functions/higher_order_functions.md +++ b/docs/en/query_language/functions/higher_order_functions.md @@ -87,6 +87,20 @@ SELECT arrayCumSum([1, 1, 1, 1]) AS res └──────────────┘ ``` +### arrayCumSumNonNegative(arr) + +Same as arrayCumSum, returns an array of partial sums of elements in the source array (a running sum). Different arrayCumSum, when then returned value contains a value less than zero, the value is replace with zero and the subsequent calculation is performed with zero parameters. For example: + +``` sql +SELECT arrayCumSumNonNegative([1, 1, -4, 1]) AS res +``` + +``` +┌─res───────┐ +│ [1,2,0,1] │ +└───────────┘ +``` + ### arraySort(\[func,\] arr1, ...) Returns an array as result of sorting the elements of `arr1` in ascending order. If the `func` function is specified, sorting order is determined by the result of the function `func` applied to the elements of array (arrays) @@ -112,6 +126,6 @@ Returns an array as result of sorting the elements of `arr1` in descending order - + [Original article](https://clickhouse.yandex/docs/en/query_language/functions/higher_order_functions/) diff --git a/docs/en/query_language/functions/ip_address_functions.md b/docs/en/query_language/functions/ip_address_functions.md index 27e1290c63c..a3e1958677f 100644 --- a/docs/en/query_language/functions/ip_address_functions.md +++ b/docs/en/query_language/functions/ip_address_functions.md @@ -113,5 +113,38 @@ LIMIT 10 The reverse function of IPv6NumToString. If the IPv6 address has an invalid format, it returns a string of null bytes. HEX can be uppercase or lowercase. +## IPv4ToIPv6(x) + +Takes a UInt32 number. Interprets it as an IPv4 address in big endian. Returns a FixedString(16) value containing the IPv6 address in binary format. Examples: + +``` sql +SELECT IPv6NumToString(IPv4ToIPv6(IPv4StringToNum('192.168.0.1'))) AS addr +``` + +``` +┌─addr───────────────┐ +│ ::ffff:192.168.0.1 │ +└────────────────────┘ +``` + +## cutIPv6(x, bitsToCutForIPv6, bitsToCutForIPv4) + +Accepts a FixedString(16) value containing the IPv6 address in binary format. Returns a string containing the address of the specified number of bits removed in text format. For example: + +```sql +WITH + IPv6StringToNum('2001:0DB8:AC10:FE01:FEED:BABE:CAFE:F00D') AS ipv6, + IPv4ToIPv6(IPv4StringToNum('192.168.0.1')) AS ipv4 +SELECT + cutIPv6(ipv6, 2, 0), + cutIPv6(ipv4, 0, 2) + +``` + +``` +┌─cutIPv6(ipv6, 2, 0)─────────────────┬─cutIPv6(ipv4, 0, 2)─┐ +│ 2001:db8:ac10:fe01:feed:babe:cafe:0 │ ::ffff:192.168.0.0 │ +└─────────────────────────────────────┴─────────────────────┘ +``` [Original article](https://clickhouse.yandex/docs/en/query_language/functions/ip_address_functions/) diff --git a/docs/en/query_language/functions/math_functions.md b/docs/en/query_language/functions/math_functions.md index af4c9a30129..31deb337fdb 100644 --- a/docs/en/query_language/functions/math_functions.md +++ b/docs/en/query_language/functions/math_functions.md @@ -14,7 +14,7 @@ Returns a Float64 number that is close to the number π. Accepts a numeric argument and returns a Float64 number close to the exponent of the argument. 
-## log(x) +## log(x), ln(x) Accepts a numeric argument and returns a Float64 number close to the natural logarithm of the argument. @@ -94,8 +94,16 @@ The arc cosine. The arc tangent. -## pow(x, y) +## pow(x, y), power(x, y) Takes two numeric arguments x and y. Returns a Float64 number close to x to the power of y. +## intExp2 + +Accepts a numeric argument and returns a UInt64 number close to 2 to the power of x. + +## intExp10 + +Accepts a numeric argument and returns a UInt64 number close to 10 to the power of x. + [Original article](https://clickhouse.yandex/docs/en/query_language/functions/math_functions/) diff --git a/docs/en/query_language/functions/other_functions.md b/docs/en/query_language/functions/other_functions.md index e49bedd8199..b5a25a6276f 100644 --- a/docs/en/query_language/functions/other_functions.md +++ b/docs/en/query_language/functions/other_functions.md @@ -44,6 +44,10 @@ However, the argument is still evaluated. This can be used for benchmarks. Sleeps 'seconds' seconds on each data block. You can specify an integer or a floating-point number. +## sleepEachRow(seconds) + +Sleeps 'seconds' seconds on each row. You can specify an integer or a floating-point number. + ## currentDatabase() Returns the name of the current database. @@ -242,6 +246,18 @@ Returns the server's uptime in seconds. Returns the version of the server as a string. +## timezone() + +Returns the timezone of the server. + +## blockNumber + +Returns the sequence number of the data block where the row is located. + +## rowNumberInBlock + +Returns the ordinal number of the row in the data block. Different data blocks are always recalculated. + ## rowNumberInAllBlocks() Returns the ordinal number of the row in the data block. This function only considers the affected data blocks. @@ -283,6 +299,10 @@ FROM └─────────┴─────────────────────┴───────┘ ``` +## runningDifferenceStartingWithFirstValue + +Same as for [runningDifference](./other_functions.md#runningDifference), the difference is the value of the first row, returned the value of the first row, and each subsequent row returns the difference from the previous row. + ## MACNumToString(num) Accepts a UInt64 number. Interprets it as a MAC address in big endian. Returns a string containing the corresponding MAC address in the format AA:BB:CC:DD:EE:FF (colon-separated numbers in hexadecimal form). @@ -440,7 +460,7 @@ The expression passed to the function is not calculated, but ClickHouse applies **Returned value** -- 1. +- 1. **Example** @@ -558,5 +578,34 @@ SELECT replicate(1, ['a', 'b', 'c']) └───────────────────────────────┘ ``` +## filesystemAvailable + +Returns the remaining space information of the disk, in bytes. This information is evaluated using the configured by path. + +## filesystemCapacity + +Returns the capacity information of the disk, in bytes. This information is evaluated using the configured by path. + +## finalizeAggregation + +Takes state of aggregate function. Returns result of aggregation (finalized state). + +## runningAccumulate + +Takes the states of the aggregate function and returns a column with values, are the result of the accumulation of these states for a set of block lines, from the first to the current line. +For example, takes state of aggregate function (example runningAccumulate(uniqState(UserID))), and for each row of block, return result of aggregate function on merge of states of all previous rows and current row. +So, result of function depends on partition of data to blocks and on order of data in block. 
+ +## joinGet('join_storage_table_name', 'get_column', join_key) + +Get data from a table of type Join using the specified join key. + +## modelEvaluate(model_name, ...) +Evaluate external model. +Accepts a model name and model arguments. Returns Float64. + +## throwIf(x) + +Throw an exception if the argument is non zero. [Original article](https://clickhouse.yandex/docs/en/query_language/functions/other_functions/) diff --git a/docs/en/query_language/functions/random_functions.md b/docs/en/query_language/functions/random_functions.md index eca7e3279aa..7e8649990d5 100644 --- a/docs/en/query_language/functions/random_functions.md +++ b/docs/en/query_language/functions/random_functions.md @@ -16,5 +16,8 @@ Uses a linear congruential generator. Returns a pseudo-random UInt64 number, evenly distributed among all UInt64-type numbers. Uses a linear congruential generator. +## randConstant + +Returns a pseudo-random UInt32 number, The value is one for different blocks. [Original article](https://clickhouse.yandex/docs/en/query_language/functions/random_functions/) diff --git a/docs/en/query_language/functions/rounding_functions.md b/docs/en/query_language/functions/rounding_functions.md index 17407aee852..83d8334323a 100644 --- a/docs/en/query_language/functions/rounding_functions.md +++ b/docs/en/query_language/functions/rounding_functions.md @@ -12,7 +12,7 @@ Examples: `floor(123.45, 1) = 123.4, floor(123.45, -1) = 120.` For integer arguments, it makes sense to round with a negative 'N' value (for non-negative 'N', the function doesn't do anything). If rounding causes overflow (for example, floor(-128, -1)), an implementation-specific result is returned. -## ceil(x\[, N\]) +## ceil(x\[, N\]), ceiling(x\[, N\]) Returns the smallest round number that is greater than or equal to 'x'. In every other way, it is the same as the 'floor' function (see above). @@ -66,5 +66,8 @@ Accepts a number. If the number is less than one, it returns 0. Otherwise, it ro Accepts a number. If the number is less than 18, it returns 0. Otherwise, it rounds the number down to a number from the set: 18, 25, 35, 45, 55. This function is specific to Yandex.Metrica and used for implementing the report on user age. +## roundDown(num, arr) + +Accept a number, round it down to an element in the specified array. If the value is less than the lowest bound, the lowest bound is returned. [Original article](https://clickhouse.yandex/docs/en/query_language/functions/rounding_functions/) diff --git a/docs/en/query_language/functions/string_functions.md b/docs/en/query_language/functions/string_functions.md index 29b8583624d..6e90d218b5a 100644 --- a/docs/en/query_language/functions/string_functions.md +++ b/docs/en/query_language/functions/string_functions.md @@ -24,11 +24,21 @@ The function also works for arrays. Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it returns some result (it doesn't throw an exception). The result type is UInt64. -## lower +## char_length, CHAR_LENGTH + +Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it returns some result (it doesn't throw an exception). +The result type is UInt64. 
+ +## character_length, CHARACTER_LENGTH + +Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it returns some result (it doesn't throw an exception). +The result type is UInt64. + +## lower, lcase Converts ASCII Latin symbols in a string to lowercase. -## upper +## upper, ucase Converts ASCII Latin symbols in a string to uppercase. @@ -58,7 +68,11 @@ Reverses a sequence of Unicode code points, assuming that the string contains a Concatenates the strings listed in the arguments, without a separator. -## substring(s, offset, length) +## concatAssumeInjective(s1, s2, ...) + +Same as [concat](./string_functions.md#concat-s1-s2), the difference is that you need to ensure that concat(s1, s2, s3) -> s4 is injective, it will be used for optimization of GROUP BY + +## substring(s, offset, length), mid(s, offset, length), substr(s, offset, length) Returns a substring starting with the byte from the 'offset' index that is 'length' bytes long. Character indexing starts from one (as in standard SQL). The 'offset' and 'length' arguments must be constants. @@ -83,4 +97,24 @@ Decode base64-encoded string 's' into original string. In case of failure raises ## tryBase64Decode(s) Similar to base64Decode, but in case of error an empty string would be returned. -[Original article](https://clickhouse.yandex/docs/en/query_language/functions/string_functions/) \ No newline at end of file +## endsWith(s, suffix) + +Returns whether to end with the specified suffix. Returns 1 if the string ends with the specified suffix, otherwise it returns 0. + +## startsWith(s, prefix) + +Returns whether to end with the specified prefix. Returns 1 if the string ends with the specified prefix, otherwise it returns 0. + +## trimLeft(s) + +Returns a string that removes the whitespace characters on left side. + +## trimRight(s) + +Returns a string that removes the whitespace characters on right side. + +## trimBoth(s) + +Returns a string that removes the whitespace characters on either side. + +[Original article](https://clickhouse.yandex/docs/en/query_language/functions/string_functions/) diff --git a/docs/en/query_language/functions/string_replace_functions.md b/docs/en/query_language/functions/string_replace_functions.md index 400e4a7eff6..19339dd474d 100644 --- a/docs/en/query_language/functions/string_replace_functions.md +++ b/docs/en/query_language/functions/string_replace_functions.md @@ -5,7 +5,7 @@ Replaces the first occurrence, if it exists, of the 'pattern' substring in 'haystack' with the 'replacement' substring. Hereafter, 'pattern' and 'replacement' must be constants. -## replaceAll(haystack, pattern, replacement) +## replaceAll(haystack, pattern, replacement), replace(haystack, pattern, replacement) Replaces all occurrences of the 'pattern' substring in 'haystack' with the 'replacement' substring. @@ -78,4 +78,12 @@ SELECT replaceRegexpAll('Hello, World!', '^', 'here: ') AS res ``` +## regexpQuoteMeta(s) + +The function adds a backslash before some predefined characters in the string. +Predefined characters: '0', '\\', '|', '(', ')', '^', '$', '.', '[', ']', '?', '*', '+', '{', ':', '-'. +This implementation slightly differs from re2::RE2::QuoteMeta. It escapes zero byte as \0 instead of \x00 and it escapes only required characters. 
+For more information, see the link: [RE2](https://github.com/google/re2/blob/master/re2/re2.cc#L473) + + [Original article](https://clickhouse.yandex/docs/en/query_language/functions/string_replace_functions/) diff --git a/docs/en/query_language/functions/string_search_functions.md b/docs/en/query_language/functions/string_search_functions.md index ced657da2ed..a08693acaf7 100644 --- a/docs/en/query_language/functions/string_search_functions.md +++ b/docs/en/query_language/functions/string_search_functions.md @@ -3,7 +3,7 @@ The search is case-sensitive in all these functions. The search substring or regular expression must be a constant in all these functions. -## position(haystack, needle) +## position(haystack, needle), locate(haystack, needle) Search for the substring `needle` in the string `haystack`. Returns the position (in bytes) of the found substring, starting from 1, or returns 0 if the substring was not found. diff --git a/docs/en/query_language/functions/type_conversion_functions.md b/docs/en/query_language/functions/type_conversion_functions.md index a1a175db845..087a6e4c1ef 100644 --- a/docs/en/query_language/functions/type_conversion_functions.md +++ b/docs/en/query_language/functions/type_conversion_functions.md @@ -7,10 +7,12 @@ ## toFloat32, toFloat64 -## toUInt8OrZero, toUInt16OrZero, toUInt32OrZero, toUInt64OrZero, toInt8OrZero, toInt16OrZero, toInt32OrZero, toInt64OrZero, toFloat32OrZero, toFloat64OrZero - ## toDate, toDateTime +## toUInt8OrZero, toUInt16OrZero, toUInt32OrZero, toUInt64OrZero, toInt8OrZero, toInt16OrZero, toInt32OrZero, toInt64OrZero, toFloat32OrZero, toFloat64OrZero, toDateOrZero, toDateTimeOrZero + +## toUInt8OrNull, toUInt16OrNull, toUInt32OrNull, toUInt64OrNull, toInt8OrNull, toInt16OrNull, toInt32OrNull, toInt64OrNull, toFloat32OrNull, toFloat64OrNull, toDateOrNull, toDateTimeOrNull + ## toDecimal32(value, S), toDecimal64(value, S), toDecimal128(value, S) Converts `value` to [Decimal](../../data_types/decimal.md) of precision `S`. The `value` can be a number or a string. The `S` (scale) parameter specifies the number of decimal places. @@ -99,6 +101,9 @@ These functions accept a string and interpret the bytes placed at the beginning This function accepts a number or date or date with time, and returns a string containing bytes representing the corresponding value in host order (little endian). Null bytes are dropped from the end. For example, a UInt32 type value of 255 is a string that is one byte long. +## reinterpretAsFixedString + +This function accepts a number or date or date with time, and returns a FixedString containing bytes representing the corresponding value in host order (little endian). Null bytes are dropped from the end. For example, a UInt32 type value of 255 is a FixedString that is one byte long. ## CAST(x, t) @@ -141,5 +146,39 @@ SELECT toTypeName(CAST(x, 'Nullable(UInt16)')) FROM t_null └─────────────────────────────────────────┘ ``` +## toIntervalYear, toIntervalQuarter, toIntervalMonth, toIntervalWeek, toIntervalDay, toIntervalHour, toIntervalMinute, toIntervalSecond + +Converts a Number type argument to a Interval type (duration). +The interval type is actually very useful, you can use this type of data to perform arithmetic operations directly with Date or DateTime. At the same time, ClickHouse provides a more convenient syntax for declaring Interval type data. 
For example: + +```sql +WITH + toDate('2019-01-01') AS date, + INTERVAL 1 WEEK AS interval_week, + toIntervalWeek(1) AS interval_to_week +SELECT + date + interval_week, + date + interval_to_week +``` + +``` +┌─plus(date, interval_week)─┬─plus(date, interval_to_week)─┐ +│ 2019-01-08 │ 2019-01-08 │ +└───────────────────────────┴──────────────────────────────┘ +``` + +## parseDateTimeBestEffort + +Parse a number type argument to a Date or DateTime type. +different from toDate and toDateTime, parseDateTimeBestEffort can progress more complex date format. +For more information, see the link: [Complex Date Format](https://xkcd.com/1179/) + +## parseDateTimeBestEffortOrNull + +Same as for [parseDateTimeBestEffort](./type_conversion_functions.md#parseDateTimeBestEffort) except that it returns null when it encounters a date format that cannot be processed. + +## parseDateTimeBestEffortOrZero + +Same as for [parseDateTimeBestEffort](./type_conversion_functions.md#parseDateTimeBestEffort) except that it returns zero date or zero date time when it encounters a date format that cannot be processed. [Original article](https://clickhouse.yandex/docs/en/query_language/functions/type_conversion_functions/) diff --git a/docs/ru/getting_started/index.md b/docs/ru/getting_started/index.md index 99464d0260c..7b110aed88b 100644 --- a/docs/ru/getting_started/index.md +++ b/docs/ru/getting_started/index.md @@ -73,7 +73,7 @@ Server: dbms/programs/clickhouse-server Для запуска сервера в качестве демона, выполните: ``` bash -% sudo service clickhouse-server start +$ sudo service clickhouse-server start ``` Смотрите логи в директории `/var/log/clickhouse-server/`. diff --git a/docs/ru/interfaces/formats.md b/docs/ru/interfaces/formats.md index 303ed85cd73..1257486a3f8 100644 --- a/docs/ru/interfaces/formats.md +++ b/docs/ru/interfaces/formats.md @@ -323,7 +323,7 @@ ClickHouse поддерживает [NULL](../query_language/syntax.md), кот В отличие от формата JSON, нет замены невалидных UTF-8 последовательностей. В строках может выводиться произвольный набор байт. Это сделано для того, чтобы данные форматировались без потери информации. Экранирование значений осуществляется аналогично формату JSON. -При парсинге, поддерживается расположение значений разных столбцов в произвольном порядке. Допустимо отсутствие некоторых значений - тогда они воспринимаются как равные значениям по умолчанию. При этом, в качестве значений по умолчанию используются нули, пустые строки и не поддерживаются сложные значения по умолчанию, которые могут быть заданы в таблице. Пропускаются пробельные символы между элементами. После объектов может быть расположена запятая, которая игнорируется. Объекты не обязательно должны быть разделены переводами строк. +При парсинге, поддерживается расположение значений разных столбцов в произвольном порядке. Допустимо отсутствие некоторых значений - тогда они воспринимаются как равные значениям по умолчанию. При этом, в качестве значений по умолчанию используются нули, и пустые строки. Сложные значения которые могут быть заданы в таблице, не поддерживаются по умолчанию, но их можно включить с помощью опции `insert_sample_with_metadata = 1`. Пропускаются пробельные символы между элементами. После объектов может быть расположена запятая, которая игнорируется. Объекты не обязательно должны быть разделены переводами строк. 
## Native {#native} diff --git a/docs/ru/operations/settings/settings.md b/docs/ru/operations/settings/settings.md index c174507859b..169dc6c0823 100644 --- a/docs/ru/operations/settings/settings.md +++ b/docs/ru/operations/settings/settings.md @@ -322,6 +322,10 @@ ClickHouse применяет настройку в тех случаях, ко Если значение истинно, то при выполнении INSERT из входных данных пропускаются (не рассматриваются) колонки с неизвестными именами, иначе в данной ситуации будет сгенерировано исключение. Работает для форматов JSONEachRow и TSKV. +## insert_sample_with_metadata + +Для запросов INSERT. Указывает, что серверу необходимо отправлять клиенту метаданные о значениях столбцов по умолчанию, которые будут использоваться для вычисления выражений по умолчанию. По умолчанию отключено. + ## output_format_json_quote_64bit_integers Если значение истинно, то при использовании JSON\* форматов UInt64 и Int64 числа выводятся в кавычках (из соображений совместимости с большинством реализаций JavaScript), иначе - без кавычек. diff --git a/libs/libcommon/CMakeLists.txt b/libs/libcommon/CMakeLists.txt index 08a642e7cc1..4c6daa23e7d 100644 --- a/libs/libcommon/CMakeLists.txt +++ b/libs/libcommon/CMakeLists.txt @@ -106,7 +106,7 @@ target_link_libraries (common ${Boost_SYSTEM_LIBRARY} PRIVATE ${MALLOC_LIBRARIES} - ${CMAKE_THREAD_LIBS_INIT} + Threads::Threads ${MEMCPY_LIBRARIES}) if (RT_LIBRARY) diff --git a/libs/libcommon/src/tests/CMakeLists.txt b/libs/libcommon/src/tests/CMakeLists.txt index 113cda38a2a..906f371b876 100644 --- a/libs/libcommon/src/tests/CMakeLists.txt +++ b/libs/libcommon/src/tests/CMakeLists.txt @@ -16,7 +16,7 @@ target_link_libraries (date_lut3 common ${PLATFORM_LIBS}) target_link_libraries (date_lut4 common ${PLATFORM_LIBS}) target_link_libraries (date_lut_default_timezone common ${PLATFORM_LIBS}) target_link_libraries (local_date_time_comparison common) -target_link_libraries (realloc-perf common) +target_link_libraries (realloc-perf common Threads::Threads) add_check(local_date_time_comparison) if(USE_GTEST) diff --git a/utils/make_changelog.py b/utils/make_changelog.py new file mode 100755 index 00000000000..87bd3130867 --- /dev/null +++ b/utils/make_changelog.py @@ -0,0 +1,386 @@ +#!/usr/bin/env python +# Note: should work with python 2 and 3 +from __future__ import print_function + +import requests +import json +import subprocess +import re +import os +import time +import logging +import codecs +import argparse + + +GITHUB_API_URL = 'https://api.github.com/' +SCRIPT_DIR = os.path.dirname(os.path.realpath(__file__)) + + +def http_get_json(url, token, max_retries, retry_timeout): + + for t in range(max_retries): + + if token: + resp = requests.get(url, headers={"Authorization": "token {}".format(token)}) + else: + resp = requests.get(url) + + if resp.status_code != 200: + msg = "Request {} failed with code {}.\n{}\n".format(url, resp.status_code, resp.text) + + if resp.status_code == 403: + try: + if resp.json()['message'].startswith('API rate limit exceeded') and t + 1 < max_retries: + logging.warning(msg) + time.sleep(retry_timeout) + continue + except: + pass + + raise Exception(msg) + + return resp.json() + + +def github_api_get_json(query, token, max_retries, retry_timeout): + return http_get_json(GITHUB_API_URL + query, token, max_retries, retry_timeout) + + +def check_sha(sha): + if not (re.match('^[a-hA-H0-9]+$', sha) and len(sha) >= 7): + raise Exception("String " + sha + " doesn't look like git sha.") + + +def get_merge_base(first, second, project_root): + try: + command = 
"git merge-base {} {}".format(first, second) + text = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, cwd=project_root).stdout.read() + text = text.decode('utf-8', 'ignore') + sha = tuple(filter(len, text.split()))[0] + check_sha(sha) + return sha + except: + logging.error('Cannot find merge base for %s and %s', first, second) + raise + + +# Get list of commits from branch to base_sha. Update commits_info. +def get_commits_from_branch(repo, branch, base_sha, commits_info, max_pages, token, max_retries, retry_timeout): + + def get_commits_from_page(page): + query = 'repos/{}/commits?sha={}&page={}'.format(repo, branch, page) + resp = github_api_get_json(query, token, max_retries, retry_timeout) + for commit in resp: + sha = commit['sha'] + if sha not in commits_info: + commits_info[sha] = commit + + return [commit['sha'] for commit in resp] + + commits = [] + found_base_commit = False + + for page in range(max_pages): + page_commits = get_commits_from_page(page) + for commit in page_commits: + if commit == base_sha: + found_base_commit = True + break + commits.append(commit) + + if found_base_commit: + break + + if not found_base_commit: + raise Exception("Can't found base commit sha {} in branch {}. Checked {} commits on {} pages.\nCommits: {}" + .format(base_sha, branch, len(commits), max_pages, ' '.join(commits))) + return commits + + +# Use GitHub search api to check if commit from any pull request. Update pull_requests info. +def find_pull_request_for_commit(commit_sha, pull_requests, token, max_retries, retry_timeout): + resp = github_api_get_json('search/issues?q={}'.format(commit_sha), token, max_retries, retry_timeout) + + found = False + for item in resp['items']: + if 'pull_request' in item: + found = True + number = item['number'] + if number not in pull_requests: + pull_requests[number] = { + 'description': item['body'], + 'user': item['user']['login'], + } + + return found + + +# Find pull requests from list of commits. If no pull request found, add commit to not_found_commits list. +def find_pull_requests(commits, token, max_retries, retry_timeout): + not_found_commits = [] + pull_requests = {} + + for i, commit in enumerate(commits): + if (i + 1) % 10 == 0: + logging.info('Processed %d commits', i + 1) + if not find_pull_request_for_commit(commit, pull_requests, token, max_retries, retry_timeout): + not_found_commits.append(commit) + + return not_found_commits, pull_requests + + +# Get users for all unknown commits and pull requests. +def get_users_info(pull_requests, commits_info, token, max_retries, retry_timeout): + + users = {} + + def update_user(user): + if user not in users: + query = 'users/{}'.format(user) + resp = github_api_get_json(query, token, max_retries, retry_timeout) + users[user] = resp + + for pull_request in pull_requests.values(): + update_user(pull_request['user']) + + for commit_info in commits_info.values(): + if 'author' in commit_info and commit_info['author'] is not None: + update_user(commit_info['committer']['login']) + else: + logging.warning('Not found author for commit %s.', commit_info['html_url']) + + return users + + +# List of unknown commits -> text description. 
+def process_unknown_commits(commits, commits_info, users): + + pattern = 'Commit: [{}]({})\nAuthor: {}\nMessage: {}' + + texts = [] + + for commit in commits: + info = commits_info[commit] + html_url = info['html_url'] + msg = info['commit']['message'] + + # GitHub login + login = info['author']['login'] + name = None + + # Firstly, try get name from github user + try: + name = users[login]['name'] + except: + pass + + # Then, try get name from commit + if not name: + try: + name = info['commit']['author']['name'] + except: + pass + + author = '[{}]({})'.format(name or login, info['author']['html_url']) + texts.append(pattern.format(commit, html_url, author, msg)) + + text = 'Commits which are not from any pull request:\n\n' + return text + '\n\n'.join(texts) + + +# List of pull requests -> text description. +def process_pull_requests(pull_requests, users, repo): + groups = {} + + for id, item in pull_requests.items(): + lines = list(filter(len, map(lambda x: x.strip(), item['description'].split('\n')))) + + cat_pos = None + short_descr_pos = None + long_descr_pos = None + + if lines: + for i in range(len(lines) - 1): + if re.match('^\**Category\s*(\(leave one\))*:*\**\s*$', lines[i]): + cat_pos = i + if re.match('^\**\s*Short description', lines[i]): + short_descr_pos = i + if re.match('^\**\s*Detailed description', lines[i]): + long_descr_pos = i + + cat = '' + if cat_pos: + # TODO: Sometimes have more than one + cat = lines[cat_pos + 1] + cat = cat.strip().lstrip('-').strip() + + short_descr = '' + if short_descr_pos: + short_descr_end = long_descr_pos or len(lines) + short_descr = lines[short_descr_pos + 1] + if short_descr_pos + 2 != short_descr_end: + short_descr += ' ...' + + # TODO: Add detailed description somewhere + + pattern = u"{} [#{}]({}) ({})" + link = 'https://github.com/{}/pull/{}'.format(repo, id) + author = 'author not found' + if item['user'] in users: + # TODO get user from any commit if no user name on github + user = users[item['user']] + author = u'[{}]({})'.format(user['name'] or user['login'], user['html_url']) + + if cat not in groups: + groups[cat] = [] + groups[cat].append(pattern.format(short_descr, id, link, author)) + + texts = [] + for group, text in groups.items(): + items = [u'* {}'.format(pr) for pr in text] + texts.append(u'### {}\n{}'.format(group if group else u'[No category]', '\n'.join(items))) + + return '\n\n'.join(texts) + + +# Load inner state. For debug purposes. +def load_state(state_file, base_sha, new_tag, prev_tag): + + state = {} + + if state_file: + try: + if os.path.exists(state_file): + logging.info('Reading state from %', state_file) + with codecs.open(state_file, encoding='utf-8') as f: + state = json.loads(f.read()) + else: + logging.info('State file does not exist. Will create new one.') + except Exception as e: + logging.warning('Cannot load state from %s. Reason: %s', state_file, str(e)) + + if state: + if 'base_sha' not in state or 'new_tag' not in state or 'prev_tag' not in state: + logging.warning('Invalid state. Will create new one.') + elif state['base_sha'] == base_sha and state['new_tag'] == new_tag and state['prev_tag'] == prev_tag: + logging.info('State loaded.') + else: + logging.info('Loaded state has different tags or merge base sha. Will create new state.') + state = {} + + return state + + +# Save inner state. For debug purposes. 
+def save_state(state_file, state): + with codecs.open(state_file, 'w', encoding='utf-8') as f: + f.write(json.dumps(state, indent=4, separators=(',', ': '))) + + +def make_changelog(new_tag, prev_tag, repo, repo_folder, state_file, token, max_retries, retry_timeout): + + base_sha = get_merge_base(new_tag, prev_tag, repo_folder) + logging.info('Base sha: %s', base_sha) + + # Step 1. Get commits from merge_base to new_tag HEAD. + # Result is a list of commits + map with commits info (author, message) + commits_info = {} + commits = [] + is_commits_loaded = False + + # Step 2. For each commit check if it is from any pull request (using github search api). + # Result is a list of unknown commits + map with pull request info (author, description). + unknown_commits = [] + pull_requests = {} + is_pull_requests_loaded = False + + # Step 3. Map users with their info (Name) + users = {} + is_users_loaded = False + + # Step 4. Make changelog text from data above. + + state = load_state(state_file, base_sha, new_tag, prev_tag) + + if state: + + if 'commits' in state and 'commits_info' in state: + logging.info('Loading commits from %s', state_file) + commits_info = state['commits_info'] + commits = state['commits'] + is_commits_loaded = True + + if 'pull_requests' in state and 'unknown_commits' in state: + logging.info('Loading pull requests from %s', state_file) + unknown_commits = state['unknown_commits'] + pull_requests = state['pull_requests'] + is_pull_requests_loaded = True + + if 'users' in state: + logging.info('Loading users requests from %s', state_file) + users = state['users'] + is_users_loaded = True + + state['base_sha'] = base_sha + state['new_tag'] = new_tag + state['prev_tag'] = prev_tag + + if not is_commits_loaded: + logging.info('Getting commits using github api.') + commits = get_commits_from_branch(repo, new_tag, base_sha, commits_info, 100, token, max_retries, retry_timeout) + state['commits'] = commits + state['commits_info'] = commits_info + + logging.info('Found %d commits from %s to %s.\n', len(commits), new_tag, base_sha) + save_state(state_file, state) + + if not is_pull_requests_loaded: + logging.info('Searching for pull requests using github api.') + unknown_commits, pull_requests = find_pull_requests(commits, token, max_retries, retry_timeout) + state['unknown_commits'] = unknown_commits + state['pull_requests'] = pull_requests + + logging.info('Found %d pull requests and %d unknown commits.\n', len(pull_requests), len(unknown_commits)) + save_state(state_file, state) + + if not is_users_loaded: + logging.info('Getting users info using github api.') + users = get_users_info(pull_requests, commits_info, token, max_retries, retry_timeout) + state['users'] = users + + logging.info('Found %d users.', len(users)) + save_state(state_file, state) + + print(process_pull_requests(pull_requests, users, repo)) + print(process_unknown_commits(unknown_commits, commits_info, users)) + + +if __name__ == '__main__': + + parser = argparse.ArgumentParser(description='Make changelog.') + parser.add_argument('prev_release_tag', help='Git tag from previous release.') + parser.add_argument('new_release_tag', help='Git tag for new release.') + parser.add_argument('--token', help='Github token. Use it to increase github api query limit. ') + parser.add_argument('--directory', help='ClickHouse repo directory. 
Script dir by default.') + parser.add_argument('--state', help='File to dump inner states result.', default='changelog_state.json') + parser.add_argument('--repo', help='ClockHouse repo on GitHub.', default='yandex/ClickHouse') + parser.add_argument('--max_retry', default=100, type=int, + help='Max number of retries pre api query in case of API rate limit exceeded error.') + parser.add_argument('--retry_timeout', help='Timeout after retry in seconds.', type=int, default=5) + + args = parser.parse_args() + prev_release_tag = args.prev_release_tag + new_release_tag = args.new_release_tag + token = args.token or '' + repo_folder = args.directory or SCRIPT_DIR + state_file = args.state + repo = args.repo + max_retry = args.max_retry + retry_timeout = args.retry_timeout + + logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s') + + repo_folder = os.path.expanduser(repo_folder) + + make_changelog(new_release_tag, prev_release_tag, repo, repo_folder, state_file, token, max_retry, retry_timeout) diff --git a/utils/s3tools/s3uploader b/utils/s3tools/s3uploader new file mode 100755 index 00000000000..db3f7cb2335 --- /dev/null +++ b/utils/s3tools/s3uploader @@ -0,0 +1,132 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +import os +import logging +import argparse +import tarfile +import math + +try: + from boto.s3.connection import S3Connection +except ImportError: + raise ImportError("You have to install boto package 'pip install boto'") + + +class S3API(object): + def __init__(self, access_key, secret_access_key, mds_api, mds_url): + self.connection = S3Connection( + host=mds_api, + aws_access_key_id=access_key, + aws_secret_access_key=secret_access_key, + ) + self.mds_url = mds_url + + def upload_file(self, bucket_name, file_path, s3_path): + logging.info("Start uploading file to bucket %s", bucket_name) + bucket = self.connection.get_bucket(bucket_name) + key = bucket.initiate_multipart_upload(s3_path) + logging.info("Will upload to s3 path %s", s3_path) + chunksize = 1024 * 1024 * 1024 # 1 GB + filesize = os.stat(file_path).st_size + logging.info("File size if %s", filesize) + chunkcount = int(math.ceil(filesize / chunksize)) + + def call_back(x, y): + print "Uploaded {}/{} bytes".format(x, y) + try: + for i in range(chunkcount + 1): + logging.info("Uploading chunk %s of %s", i, chunkcount + 1) + offset = chunksize * i + bytes_size = min(chunksize, filesize - offset) + with open(file_path, 'r') as fp: + fp.seek(offset) + key.upload_part_from_file(fp=fp, part_num=i+1, + size=bytes_size, cb=call_back, + num_cb=100) + key.complete_upload() + except Exception as ex: + key.cancel_upload() + raise ex + logging.info("Contents were set") + return "https://{bucket}.{mds_url}/{path}".format( + bucket=bucket_name, mds_url=self.mds_url, path=s3_path) + + +def make_tar_file_for_table(clickhouse_data_path, db_name, table_name, + tmp_prefix): + + relative_data_path = os.path.join('data', db_name, table_name) + relative_meta_path = os.path.join('metadata', db_name, table_name + '.sql') + path_to_data = os.path.join(clickhouse_data_path, relative_data_path) + path_to_metadata = os.path.join(clickhouse_data_path, relative_meta_path) + temporary_file_name = tmp_prefix + '/{tname}.tar'.format(tname=table_name) + with tarfile.open(temporary_file_name, "w") as bundle: + bundle.add(path_to_data, arcname=relative_data_path) + bundle.add(path_to_metadata, arcname=relative_meta_path) + return temporary_file_name + + +USAGE_EXAMPLES = ''' +examples: +\ts3uploader --dataset-name some_ds --access-key-id 
XXX --secret-access-key YYY --clickhouse-data-path /opt/clickhouse/ --table-name default.some_tbl --bucket-name some-bucket +\ts3uploader --dataset-name some_ds --access-key-id XXX --secret-access-key YYY --file-name some_ds.tsv.xz --bucket-name some-bucket +''' + +if __name__ == "__main__": + logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s') + + parser = argparse.ArgumentParser( + description="Simple tool for uploading datasets to clickhouse S3", + usage='%(prog)s [options] {}'.format(USAGE_EXAMPLES)) + parser.add_argument('--s3-api-url', default='s3.mds.yandex.net') + parser.add_argument('--s3-common-url', default='s3.yandex.net') + parser.add_argument('--bucket-name', default='clickhouse-datasets') + parser.add_argument('--dataset-name', required=True, + help='Name of dataset, will be used in uploaded path') + parser.add_argument('--access-key-id', required=True) + parser.add_argument('--secret-access-key', required=True) + parser.add_argument('--clickhouse-data-path', + default='/var/lib/clickhouse/', + help='Path to clickhouse database on filesystem') + parser.add_argument('--s3-path', help='Path in s3, where to upload file') + parser.add_argument('--tmp-prefix', default='/tmp', + help='Prefix to store temporay downloaded file') + data_group = parser.add_mutually_exclusive_group(required=True) + data_group.add_argument('--table-name', + help='Name of table with database, if you are uploading partitions') + data_group.add_argument('--file-path', + help='Name of file, if you are uploading') + args = parser.parse_args() + + if args.table_name is not None and args.clickhouse_data_path is None: + raise argparse.ArgumentError( + "You should specify --clickhouse-data-path to upload --table") + + s3_conn = S3API( + args.access_key_id, args.secret_access_key, + args.s3_api_url, args.s3_common_url) + + if args.table_name is not None: + if '.' not in args.table_name: + db_name = 'default' + else: + db_name, table_name = args.table_name.split('.') + file_path = make_tar_file_for_table( + args.clickhouse_data_path, db_name, table_name, args.tmp_prefix) + else: + file_path = args.file_path + + if 'tsv' in file_path: + s3_path = os.path.join( + args.dataset_name, 'tsv', os.path.basename(file_path)) + elif args.table_name is not None: + s3_path = os.path.join( + args.dataset_name, 'partitions', os.path.basename(file_path)) + elif args.s3_path is not None: + s3_path = os.path.join( + args.dataset_name, s3_path, os.path.base_name(file_path)) + else: + raise Exception("Don't know s3-path to upload") + + url = s3_conn.upload_file(args.bucket_name, file_path, s3_path) + logging.info("Data uploaded: %s", url)