Merge branch 'master' of github.com:yandex/ClickHouse

This commit is contained in:
Ivan Blinkov 2019-01-30 13:42:47 +03:00
commit a4d43cd0c3
78 changed files with 2252 additions and 362 deletions

View File

@@ -1 +0,0 @@

View File

@@ -1,139 +1,113 @@
## ClickHouse release 19.1.6, 2019-01-24
### New Features
* Custom per column compression codecs for tables, as shown in the sketch after this list. [#3899](https://github.com/yandex/ClickHouse/pull/3899) [#4111](https://github.com/yandex/ClickHouse/pull/4111) ([alesapin](https://github.com/alesapin), [Winter Zhang](https://github.com/zhang2014), [Anatoly](https://github.com/Sindbag))
* Added compression codec `Delta`. [#4052](https://github.com/yandex/ClickHouse/pull/4052) ([alesapin](https://github.com/alesapin))
* Allow to `ALTER` compression codecs. [#4054](https://github.com/yandex/ClickHouse/pull/4054) ([alesapin](https://github.com/alesapin))
* Added functions `left`, `right`, `trim`, `ltrim`, `rtrim`, `timestampadd`, `timestampsub` for SQL standard compatibility. [#3826](https://github.com/yandex/ClickHouse/pull/3826) ([Ivan Blinkov](https://github.com/blinkov))
* Support for write in `HDFS` tables and `hdfs` table function. [#4084](https://github.com/yandex/ClickHouse/pull/4084) ([alesapin](https://github.com/alesapin))
* Added functions to search for multiple constant strings from big haystack: `multiPosition`, `multiSearch`, `firstMatch` also with `-UTF8`, `-CaseInsensitive`, and `-CaseInsensitiveUTF8` variants. [#4053](https://github.com/yandex/ClickHouse/pull/4053) ([Danila Kutenin](https://github.com/danlark1))
* Pruning of unused shards if `SELECT` query filters by sharding key (setting `distributed_optimize_skip_select_on_unused_shards`). [#3851](https://github.com/yandex/ClickHouse/pull/3851) ([Ivan](https://github.com/abyss7))
* Allow `Kafka` engine to ignore some number of parsing errors per block. [#4094](https://github.com/yandex/ClickHouse/pull/4094) ([Ivan](https://github.com/abyss7))
* Added support for `CatBoost` multiclass models evaluation. Function `modelEvaluate` returns tuple with per-class raw predictions for multiclass models. `libcatboostmodel.so` should be built with [#607](https://github.com/catboost/catboost/pull/607). [#3959](https://github.com/yandex/ClickHouse/pull/3959) ([KochetovNicolai](https://github.com/KochetovNicolai))
* Added functions `filesystemAvailable`, `filesystemFree`, `filesystemCapacity`. [#4097](https://github.com/yandex/ClickHouse/pull/4097) ([Boris Granveaud](https://github.com/bgranvea))
* Added hashing functions `xxHash64` and `xxHash32`. [#3905](https://github.com/yandex/ClickHouse/pull/3905) ([filimonov](https://github.com/filimonov))
* Added `gccMurmurHash` hashing function (GCC flavoured Murmur hash) which uses the same hash seed as [gcc](https://github.com/gcc-mirror/gcc/blob/41d6b10e96a1de98e90a7c0378437c3255814b16/libstdc%2B%2B-v3/include/bits/functional_hash.h#L191) [#4000](https://github.com/yandex/ClickHouse/pull/4000) ([sundyli](https://github.com/sundy-li))
* Added hashing functions `javaHash`, `hiveHash`. [#3811](https://github.com/yandex/ClickHouse/pull/3811) ([shangshujie365](https://github.com/shangshujie365))
* Added table function `remoteSecure`. Function works as `remote`, but uses secure connection. [#4088](https://github.com/yandex/ClickHouse/pull/4088) ([proller](https://github.com/proller))
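
To make the features above concrete, here is a minimal sketch in ClickHouse SQL. The table, column names, host, and credentials are hypothetical; the codec clauses and function calls follow the items in this list (per-column `CODEC(...)`, the `Delta` codec, the SQL-compatible string functions, the new hashing and filesystem functions, and the `remoteSecure` table function).

```sql
-- Hypothetical table using per-column compression codecs, including Delta.
CREATE TABLE codec_example
(
    dt Date CODEC(ZSTD),
    ts DateTime CODEC(Delta, LZ4),
    value Float64 CODEC(NONE)
) ENGINE = MergeTree()
ORDER BY dt;

-- SQL-standard string functions.
SELECT trim(BOTH ' ' FROM '  hello  '), left('ClickHouse', 5), right('ClickHouse', 5);

-- New hashing and filesystem introspection functions.
SELECT xxHash64('test'), xxHash32('test'), filesystemAvailable(), filesystemCapacity();

-- remoteSecure works like remote, but over a secure connection
-- (placeholder host and credentials).
SELECT * FROM remoteSecure('example-host:9440', default.codec_example, 'user', 'password');
```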
### Experimental features
* Added multiple JOINs emulation (`allow_experimental_multiple_joins_emulation` setting); see the sketch below. [#3946](https://github.com/yandex/ClickHouse/pull/3946) ([Artem Zuikov](https://github.com/4ertus2))
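
A short, hedged sketch of this experimental feature; the setting name comes from the item above, while the tables `t1`, `t2`, `t3` are hypothetical.

```sql
SET allow_experimental_multiple_joins_emulation = 1;

-- A query with more than one JOIN; the emulation rewrites it into
-- nested two-table joins internally.
SELECT t1.id, t2.name, t3.value
FROM t1
JOIN t2 ON t1.id = t2.id
JOIN t3 ON t1.id = t3.id;
```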
### Bug Fixes
* Make `compiled_expression_cache_size` setting limited by default to lower memory consumption. [#4041](https://github.com/yandex/ClickHouse/pull/4041) ([alesapin](https://github.com/alesapin))
* Fix a bug that led to hangups in threads that perform ALTERs of Replicated tables and in the thread that updates configuration from ZooKeeper. [#2947](https://github.com/yandex/ClickHouse/issues/2947) [#3891](https://github.com/yandex/ClickHouse/issues/3891) [#3934](https://github.com/yandex/ClickHouse/pull/3934) ([Alex Zatelepin](https://github.com/ztlpn))
* Fixed a race condition when executing a distributed ALTER task. The race condition led to more than one replica trying to execute the task and all replicas except one failing with a ZooKeeper error. [#3904](https://github.com/yandex/ClickHouse/pull/3904) ([Alex Zatelepin](https://github.com/ztlpn))
* Fix a bug when `from_zk` config elements weren't refreshed after a request to ZooKeeper timed out. [#2947](https://github.com/yandex/ClickHouse/issues/2947) [#3947](https://github.com/yandex/ClickHouse/pull/3947) ([Alex Zatelepin](https://github.com/ztlpn))
* Fix bug with wrong prefix for IPv4 subnet masks. [#3945](https://github.com/yandex/ClickHouse/pull/3945) ([alesapin](https://github.com/alesapin))
* Fixed crash (`std::terminate`) in rare cases when a new thread cannot be created due to exhausted resources. [#3956](https://github.com/yandex/ClickHouse/pull/3956) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed a bug in `remote` table function execution when wrong restrictions were used in `getStructureOfRemoteTable`. [#4009](https://github.com/yandex/ClickHouse/pull/4009) ([alesapin](https://github.com/alesapin))
* Fix a leak of netlink sockets. They were placed in a pool where they were never deleted and new sockets were created at the start of a new thread when all current sockets were in use. [#4017](https://github.com/yandex/ClickHouse/pull/4017) ([Alex Zatelepin](https://github.com/ztlpn))
* Fix bug with closing `/proc/self/fd` directory earlier than all fds were read from `/proc` after forking `odbc-bridge` subprocess. [#4120](https://github.com/yandex/ClickHouse/pull/4120) ([alesapin](https://github.com/alesapin))
* Fixed String to UInt monotonic conversion when `String` is used in the primary key. [#3870](https://github.com/yandex/ClickHouse/pull/3870) ([Winter Zhang](https://github.com/zhang2014))
* Fixed error in calculation of integer conversion function monotonicity. [#3921](https://github.com/yandex/ClickHouse/pull/3921) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed segfault in `arrayEnumerateUniq`, `arrayEnumerateDense` functions in case of some invalid arguments. [#3909](https://github.com/yandex/ClickHouse/pull/3909) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix UB in StorageMerge. [#3910](https://github.com/yandex/ClickHouse/pull/3910) ([Amos Bird](https://github.com/amosbird))
* Fixed segfault in functions `addDays`, `subtractDays`. [#3913](https://github.com/yandex/ClickHouse/pull/3913) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed error: functions `round`, `floor`, `trunc`, `ceil` may return bogus result when executed on integer argument and large negative scale. [#3914](https://github.com/yandex/ClickHouse/pull/3914) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed a bug induced by 'kill query sync' which leads to a core dump. [#3916](https://github.com/yandex/ClickHouse/pull/3916) ([muVulDeePecker](https://github.com/fancyqlx))
* Fix bug with long delay after empty replication queue. [#3928](https://github.com/yandex/ClickHouse/pull/3928) [#3932](https://github.com/yandex/ClickHouse/pull/3932) ([alesapin](https://github.com/alesapin))
* Fixed excessive memory usage in case of inserting into table with `LowCardinality` primary key. [#3955](https://github.com/yandex/ClickHouse/pull/3955) ([KochetovNicolai](https://github.com/KochetovNicolai))
* Fixed `LowCardinality` serialization for `Native` format in case of empty arrays. [#3907](https://github.com/yandex/ClickHouse/issues/3907) [#4011](https://github.com/yandex/ClickHouse/pull/4011) ([KochetovNicolai](https://github.com/KochetovNicolai))
* Fixed incorrect result while using distinct by single LowCardinality numeric column. [#3895](https://github.com/yandex/ClickHouse/issues/3895) [#4012](https://github.com/yandex/ClickHouse/pull/4012) ([KochetovNicolai](https://github.com/KochetovNicolai))
* Fixed specialized aggregation with LowCardinality key (in case when `compile` setting is enabled). [#3886](https://github.com/yandex/ClickHouse/pull/3886) ([KochetovNicolai](https://github.com/KochetovNicolai))
* Fix user and password forwarding for replicated tables queries. [#3957](https://github.com/yandex/ClickHouse/pull/3957) ([alesapin](https://github.com/alesapin)) ([小路](https://github.com/nicelulu))
* Fixed very rare race condition that can happen when listing tables in Dictionary database while reloading dictionaries. [#3970](https://github.com/yandex/ClickHouse/pull/3970) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed incorrect result when HAVING was used with ROLLUP or CUBE. [#3756](https://github.com/yandex/ClickHouse/issues/3756) [#3837](https://github.com/yandex/ClickHouse/pull/3837) ([Sam Chou](https://github.com/reflection))
* Fixed column aliases for query with `JOIN ON` syntax and distributed tables. [#3980](https://github.com/yandex/ClickHouse/pull/3980) ([Winter Zhang](https://github.com/zhang2014))
* Fixed error in internal implementation of `quantileTDigest` (found by Artem Vakhrushev). This error never happens in ClickHouse and was relevant only for those who use ClickHouse codebase as a library directly. [#3935](https://github.com/yandex/ClickHouse/pull/3935) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed a regression in master: "Unknown identifier" error when column names appear in lambdas. [#4115](https://github.com/yandex/ClickHouse/pull/4115) ([Artem Zuikov](https://github.com/4ertus2))
* Fixed UBSan bug in compression codecs. [#4069](https://github.com/yandex/ClickHouse/pull/4069) ([alesapin](https://github.com/alesapin))
* Fixed issues found by PVS-Studio. [#4103](https://github.com/yandex/ClickHouse/pull/4103) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed the way array join columns are collected. [#4121](https://github.com/yandex/ClickHouse/pull/4121) ([Artem Zuikov](https://github.com/4ertus2))
### Improvements
* Support for `IF NOT EXISTS` in `ALTER TABLE ADD COLUMN` statements along with `IF EXISTS` in `DROP/MODIFY/CLEAR/COMMENT COLUMN`. [#3900](https://github.com/yandex/ClickHouse/pull/3900) ([Boris Granveaud](https://github.com/bgranvea))
* Function `parseDateTimeBestEffort`: support for formats `DD.MM.YYYY`, `DD.MM.YY`, `DD-MM-YYYY`, `DD-Mon-YYYY`, `DD/Month/YYYY` and similar (see the sketch after this list). [#3922](https://github.com/yandex/ClickHouse/pull/3922) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Moved debian/ specific entries to debian/.gitignore [#4106](https://github.com/yandex/ClickHouse/pull/4106) ([gerasiov](https://github.com/gerasiov))
* Do not log from odbc-bridge when there is no console. [#3857](https://github.com/yandex/ClickHouse/pull/3857) ([alesapin](https://github.com/alesapin))
* Removed duplicate code. [#3915](https://github.com/yandex/ClickHouse/pull/3915) ([sergey-v-galtsev](https://github.com/sergey-v-galtsev))
* Minor improvements in StorageKafka. [#3919](https://github.com/yandex/ClickHouse/pull/3919) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Automatically disable logs in negative tests. [#3940](https://github.com/yandex/ClickHouse/pull/3940) ([4ertus2](https://github.com/4ertus2))
* Refactored SyntaxAnalyzer. [#4014](https://github.com/yandex/ClickHouse/pull/4014) ([4ertus2](https://github.com/4ertus2))
* Refactored QueryNormalizer. Unified column sources for ASTIdentifier and ASTQualifiedAsterisk (were different), removed column duplicates for ASTQualifiedAsterisk sources, cleared asterisks replacement. [#4031](https://github.com/yandex/ClickHouse/pull/4031) ([4ertus2](https://github.com/4ertus2))
* Refactored code with ASTIdentifier. [#4056](https://github.com/yandex/ClickHouse/pull/4056) [#4077](https://github.com/yandex/ClickHouse/pull/4077) [#4087](https://github.com/yandex/ClickHouse/pull/4087) ([4ertus2](https://github.com/4ertus2))
* Improve error message in `clickhouse-test` script when no ClickHouse binary was found. [#4130](https://github.com/yandex/ClickHouse/pull/4130) ([Miniwoffer](https://github.com/Miniwoffer))
* `CapnProtoInputStream` now supports jagged structures. [#4063](https://github.com/yandex/ClickHouse/pull/4063) ([Odin Hultgren Van Der Horst](https://github.com/Miniwoffer))
* Usability improvement: added a check that server process is started from the data directory's owner. Do not allow to start server from root if the data belongs to non-root user. [#3785](https://github.com/yandex/ClickHouse/pull/3785) ([sergey-v-galtsev](https://github.com/sergey-v-galtsev))
* Better logic of checking required columns during analysis of queries with JOINs. [#3930](https://github.com/yandex/ClickHouse/pull/3930) ([Artem Zuikov](https://github.com/4ertus2))
* Decreased the number of connections in case of large number of Distributed tables in a single server. [#3726](https://github.com/yandex/ClickHouse/pull/3726) ([Winter Zhang](https://github.com/zhang2014))
* Supported totals row for `WITH TOTALS` query for ODBC driver. [#3836](https://github.com/yandex/ClickHouse/pull/3836) ([Maksim Koritckiy](https://github.com/nightweb))
* Allowed to use `Enum`s as integers inside if function. [#3875](https://github.com/yandex/ClickHouse/pull/3875) ([Ivan](https://github.com/abyss7))
* Added `low_cardinality_allow_in_native_format` setting. If disabled, do not use `LowCardinality` type in `Native` format. [#3879](https://github.com/yandex/ClickHouse/pull/3879) ([KochetovNicolai](https://github.com/KochetovNicolai))
* Removed some redundant objects from compiled expressions cache to lower memory usage. [#4042](https://github.com/yandex/ClickHouse/pull/4042) ([alesapin](https://github.com/alesapin))
* Added a check that the `SET send_logs_level = 'value'` query accepts an appropriate value. [#3873](https://github.com/yandex/ClickHouse/pull/3873) ([Sabyanin Maxim](https://github.com/s-mx))
* Fixed data type check in type conversion functions. [#3896](https://github.com/yandex/ClickHouse/pull/3896) ([Winter Zhang](https://github.com/zhang2014))
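
As an illustration of the `parseDateTimeBestEffort` item above, the following calls use arbitrary example dates in the newly supported formats:

```sql
SELECT
    parseDateTimeBestEffort('24.01.2019'),      -- DD.MM.YYYY
    parseDateTimeBestEffort('24.01.19'),        -- DD.MM.YY
    parseDateTimeBestEffort('24-01-2019'),      -- DD-MM-YYYY
    parseDateTimeBestEffort('24-Jan-2019'),     -- DD-Mon-YYYY
    parseDateTimeBestEffort('24/January/2019'); -- DD/Month/YYYY
```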
### Performance Improvements
* Add a MergeTree setting `use_minimalistic_part_header_in_zookeeper`. If enabled, Replicated tables will store compact part metadata in a single part znode. This can dramatically reduce ZooKeeper snapshot size (especially if the tables have a lot of columns). Note that after enabling this setting you will not be able to downgrade to a version that doesn't support it. [#3960](https://github.com/yandex/ClickHouse/pull/3960) ([Alex Zatelepin](https://github.com/ztlpn))
* Added a DFA-based implementation for functions `sequenceMatch` and `sequenceCount` in case the pattern doesn't contain time (see the sketch after this list). [#4004](https://github.com/yandex/ClickHouse/pull/4004) ([Léo Ercolanelli](https://github.com/ercolanelli-leo))
* Performance improvement for integer numbers serialization. [#3968](https://github.com/yandex/ClickHouse/pull/3968) ([Amos Bird](https://github.com/amosbird))
* Zero left padding PODArray so that -1 element is always valid and zeroed. It's used for branchless calculation of offsets. [#3920](https://github.com/yandex/ClickHouse/pull/3920) ([Amos Bird](https://github.com/amosbird))
* Reverted `jemalloc` version which led to performance degradation. [#4018](https://github.com/yandex/ClickHouse/pull/4018) ([alexey-milovidov](https://github.com/alexey-milovidov))
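
A hedged example of the DFA-based `sequenceMatch` path referenced above: the pattern below contains no time conditions, so it qualifies. The `events` table and its columns are hypothetical.

```sql
-- Users with a 'login' event eventually followed by a 'purchase' event.
-- The pattern '(?1)(?2)' has no time constraints, so the DFA-based
-- implementation applies.
SELECT user_id
FROM events
GROUP BY user_id
HAVING sequenceMatch('(?1)(?2)')(event_time, event = 'login', event = 'purchase');
```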
### Backward Incompatible Changes
* Removed undocumented feature `ALTER MODIFY PRIMARY KEY` because it was superseded by the `ALTER MODIFY ORDER BY` command (see the sketch after this list). [#3887](https://github.com/yandex/ClickHouse/pull/3887) ([Alex Zatelepin](https://github.com/ztlpn))
* Removed function `shardByHash`. [#3833](https://github.com/yandex/ClickHouse/pull/3833) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Forbid using scalar subqueries with result of type `AggregateFunction`. [#3865](https://github.com/yandex/ClickHouse/pull/3865) ([Ivan](https://github.com/abyss7))
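
For migration from the removed `ALTER MODIFY PRIMARY KEY`, a minimal sketch with a hypothetical table and columns. Note that `MODIFY ORDER BY` only allows extending the sorting key with columns added in the same `ALTER` query:

```sql
-- Extend the sorting key by adding a new column and appending it to
-- ORDER BY in one statement (table and columns are illustrative).
ALTER TABLE hits
    ADD COLUMN browser String,
    MODIFY ORDER BY (CounterID, EventDate, browser);
```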
### Build/Testing/Packaging Improvements
* Added support for PowerPC (`ppc64le`) build. [#4132](https://github.com/yandex/ClickHouse/pull/4132) ([Danila Kutenin](https://github.com/danlark1))
* Stateful functional tests are run on a publicly available dataset. [#3969](https://github.com/yandex/ClickHouse/pull/3969) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed error when the server cannot start with the `bash: /usr/bin/clickhouse-extract-from-config: Operation not permitted` message within Docker or systemd-nspawn. [#4136](https://github.com/yandex/ClickHouse/pull/4136) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Updated `rdkafka` library to v1.0.0-RC5. Used cppkafka instead of raw C interface. [#4025](https://github.com/yandex/ClickHouse/pull/4025) ([Ivan](https://github.com/abyss7))
* Updated `mariadb-client` library. Fixed one of issues found by UBSan. [#3924](https://github.com/yandex/ClickHouse/pull/3924) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Some fixes for UBSan builds. [#3926](https://github.com/yandex/ClickHouse/pull/3926) [#3021](https://github.com/yandex/ClickHouse/pull/3021) [#3948](https://github.com/yandex/ClickHouse/pull/3948) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Added per-commit runs of tests with UBSan build.
* Added per-commit runs of PVS-Studio static analyzer.
* Fixed bugs found by PVS-Studio. [#4013](https://github.com/yandex/ClickHouse/pull/4013) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed glibc compatibility issues. [#4100](https://github.com/yandex/ClickHouse/pull/4100) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Move Docker images to 18.10 and add compatibility file for glibc >= 2.28 [#3965](https://github.com/yandex/ClickHouse/pull/3965) ([alesapin](https://github.com/alesapin))
* Added an env variable for users who don't want to chown directories in the server Docker image. [#3967](https://github.com/yandex/ClickHouse/pull/3967) ([alesapin](https://github.com/alesapin))
* Enabled most of the warnings from `-Weverything` in clang. Enabled `-Wpedantic`. [#3986](https://github.com/yandex/ClickHouse/pull/3986) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Added a few more warnings that are available only in clang 8. [#3993](https://github.com/yandex/ClickHouse/pull/3993) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Link to `libLLVM` rather than to individual LLVM libs when using shared linking. [#3989](https://github.com/yandex/ClickHouse/pull/3989) ([Orivej Desh](https://github.com/orivej))
* Added sanitizer variables for test images. [#4072](https://github.com/yandex/ClickHouse/pull/4072) ([alesapin](https://github.com/alesapin))
* `clickhouse-server` debian package will recommend `libcap2-bin` package to use `setcap` tool for setting capabilities. This is optional. [#4093](https://github.com/yandex/ClickHouse/pull/4093) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Improved compilation time, fixed includes. [#3898](https://github.com/yandex/ClickHouse/pull/3898) ([proller](https://github.com/proller))
* Added performance tests for hash functions. [#3918](https://github.com/yandex/ClickHouse/pull/3918) ([filimonov](https://github.com/filimonov))
* Fixed cyclic library dependencies. [#3958](https://github.com/yandex/ClickHouse/pull/3958) ([proller](https://github.com/proller))
* Improved compilation with low available memory. [#4030](https://github.com/yandex/ClickHouse/pull/4030) ([proller](https://github.com/proller))
* Added test script to reproduce performance degradation in `jemalloc`. [#4036](https://github.com/yandex/ClickHouse/pull/4036) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed misspells in comments and string literals under `dbms`. [#4122](https://github.com/yandex/ClickHouse/pull/4122) ([maiha](https://github.com/maiha))
* Fixed typos in comments. [#4089](https://github.com/yandex/ClickHouse/pull/4089) ([Evgenii Pravda](https://github.com/kvinty))
### Doc fixes
* Translated table engines related part to Chinese. [#3844](https://github.com/yandex/ClickHouse/pull/3844) ([lamber-ken](https://github.com/lamber-ken))
* Fixed `toStartOfFiveMinute` description. [#4096](https://github.com/yandex/ClickHouse/pull/4096) ([cheesedosa](https://github.com/cheesedosa))
* Added description for client `--secure` argument. [#3961](https://github.com/yandex/ClickHouse/pull/3961) ([vicdashkov](https://github.com/vicdashkov))
* Added descriptions for settings `merge_tree_uniform_read_distribution`, `merge_tree_min_rows_for_concurrent_read`, `merge_tree_min_rows_for_seek`, `merge_tree_coarse_index_granularity`, `merge_tree_max_rows_to_use_cache` [#4024](https://github.com/yandex/ClickHouse/pull/4024) ([BayoNet](https://github.com/BayoNet))
* Minor doc fixes. [#4098](https://github.com/yandex/ClickHouse/pull/4098) ([blinkov](https://github.com/blinkov))
* Updated example for zookeeper config setting. [#3883](https://github.com/yandex/ClickHouse/pull/3883) [#3894](https://github.com/yandex/ClickHouse/pull/3894) ([ogorbacheva](https://github.com/ogorbacheva))
* Updated info about escaping in formats Vertical, Pretty and VerticalRaw. [#4118](https://github.com/yandex/ClickHouse/pull/4118) ([ogorbacheva](https://github.com/ogorbacheva))
* Added description of the functions for working with UUID. [#4059](https://github.com/yandex/ClickHouse/pull/4059) ([ogorbacheva](https://github.com/ogorbacheva))
* Added the description of the `CHECK TABLE` query. [#3881](https://github.com/yandex/ClickHouse/pull/3881) [#4043](https://github.com/yandex/ClickHouse/pull/4043) ([ogorbacheva](https://github.com/ogorbacheva))
* Added Chinese translation of the `zh/tests` doc. [#4034](https://github.com/yandex/ClickHouse/pull/4034) ([sundy-li](https://github.com/sundy-li))
* Added documentation about functions `multiPosition`, `firstMatch`, `multiSearch`. [#4123](https://github.com/yandex/ClickHouse/pull/4123) ([danlark1](https://github.com/danlark1))
* Added the Puppet module to the list of third-party libraries. [#3862](https://github.com/yandex/ClickHouse/pull/3862) ([Felixoid](https://github.com/Felixoid))
* Fixed a typo in the English version of the Creating a Table example. [#3872](https://github.com/yandex/ClickHouse/pull/3872) ([areldar](https://github.com/areldar))
* Mentioned the Nagios plugin for ClickHouse. [#3878](https://github.com/yandex/ClickHouse/pull/3878) ([lisuml](https://github.com/lisuml))
* Update of query language syntax description. [#4065](https://github.com/yandex/ClickHouse/pull/4065) ([BayoNet](https://github.com/BayoNet))
* Added documentation for per-column compression codecs. [#4073](https://github.com/yandex/ClickHouse/pull/4073) ([alex-krash](https://github.com/alex-krash))
* Updated articles about `CollapsingMergeTree`, `GraphiteMergeTree`, `Replicated*MergeTree`, and the `CREATE TABLE` query. [#4085](https://github.com/yandex/ClickHouse/pull/4085) ([BayoNet](https://github.com/BayoNet))
* Other minor improvements. [#3897](https://github.com/yandex/ClickHouse/pull/3897) [#3923](https://github.com/yandex/ClickHouse/pull/3923) [#4066](https://github.com/yandex/ClickHouse/pull/4066) [#3860](https://github.com/yandex/ClickHouse/pull/3860) [#3906](https://github.com/yandex/ClickHouse/pull/3906) [#3936](https://github.com/yandex/ClickHouse/pull/3936) [#3975](https://github.com/yandex/ClickHouse/pull/3975) ([ogorbacheva](https://github.com/ogorbacheva)) ([blinkov](https://github.com/blinkov)) ([sdk2](https://github.com/sdk2))
### Other
* Fixed `hidden` on page title [#4033](https://github.com/yandex/ClickHouse/pull/4033) ([xboston](https://github.com/xboston))
* Updated year in copyright to 2019. [#4039](https://github.com/yandex/ClickHouse/pull/4039) ([xboston](https://github.com/xboston))
* Fixed typo in ClusterCopier. [#3854](https://github.com/yandex/ClickHouse/pull/3854) ([dqminh](https://github.com/dqminh))
* Minor grammar fixes. [#3855](https://github.com/yandex/ClickHouse/pull/3855) ([intgr](https://github.com/intgr))
## ClickHouse release 18.16.1, 2018-12-21

View File

@@ -3,6 +3,21 @@ cmake_minimum_required (VERSION 3.3)
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_CURRENT_SOURCE_DIR}/cmake/Modules/")
option(ENABLE_IPO "Enable inter-procedural optimization (aka LTO)" OFF) # need cmake 3.9+
if(ENABLE_IPO)
cmake_policy(SET CMP0069 NEW)
include(CheckIPOSupported)
check_ipo_supported(RESULT IPO_SUPPORTED OUTPUT IPO_NOT_SUPPORTED)
if(IPO_SUPPORTED)
message(STATUS "IPO/LTO is supported, enabling")
set(CMAKE_INTERPROCEDURAL_OPTIMIZATION TRUE)
else()
message(STATUS "IPO/LTO is not supported: <${IPO_NOT_SUPPORTED}>")
endif()
else()
message(STATUS "IPO/LTO not enabled.")
endif()
if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
# Require at least gcc 7
if (CMAKE_CXX_COMPILER_VERSION VERSION_LESS 7 AND NOT CMAKE_VERSION VERSION_LESS 2.8.9)
@@ -120,7 +135,9 @@ else()
message(STATUS "Disabling compiler -pipe option (have only ${AVAILABLE_PHYSICAL_MEMORY} mb of memory)")
endif()
include (cmake/test_cpu.cmake)
if(NOT DISABLE_CPU_OPTIMIZE)
include(cmake/test_cpu.cmake)
endif()
if(NOT COMPILER_CLANG) # clang: error: the clang compiler does not support '-march=native'
option(ARCH_NATIVE "Enable -march=native compiler flag" ${ARCH_ARM})
@@ -229,7 +246,10 @@ include (cmake/find_re2.cmake)
include (cmake/find_rdkafka.cmake)
include (cmake/find_capnp.cmake)
include (cmake/find_llvm.cmake)
include (cmake/find_cpuid.cmake)
include (cmake/find_cpuid.cmake) # Freebsd, bundled
if (NOT USE_CPUID)
include (cmake/find_cpuinfo.cmake) # Debian
endif()
include (cmake/find_libgsasl.cmake)
include (cmake/find_libxml2.cmake)
include (cmake/find_protobuf.cmake)

View File

@@ -28,7 +28,7 @@ find_library(METROHASH_LIBRARIES
find_path(METROHASH_INCLUDE_DIR
NAMES metrohash.h
PATHS ${METROHASH_ROOT_DIR}/include ${METROHASH_INCLUDE_PATHS}
PATHS ${METROHASH_ROOT_DIR}/include PATH_SUFFIXES metrohash ${METROHASH_INCLUDE_PATHS}
)
include(FindPackageHandleStandardArgs)

View File

@@ -2,11 +2,11 @@ if (NOT ARCH_ARM)
option (USE_INTERNAL_CPUID_LIBRARY "Set to FALSE to use system cpuid library instead of bundled" ${NOT_UNBUNDLED})
endif ()
#if (USE_INTERNAL_CPUID_LIBRARY AND NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/libcpuid/include/cpuid/libcpuid.h")
# message (WARNING "submodule contrib/libcpuid is missing. to fix try run: \n git submodule update --init --recursive")
# set (USE_INTERNAL_CPUID_LIBRARY 0)
# set (MISSING_INTERNAL_CPUID_LIBRARY 1)
#endif ()
if (USE_INTERNAL_CPUID_LIBRARY AND NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/libcpuid/CMakeLists.txt")
message (WARNING "submodule contrib/libcpuid is missing. to fix try run: \n git submodule update --init --recursive")
set (USE_INTERNAL_CPUID_LIBRARY 0)
set (MISSING_INTERNAL_CPUID_LIBRARY 1)
endif ()
if (NOT USE_INTERNAL_CPUID_LIBRARY)
find_library (CPUID_LIBRARY cpuid)
@@ -20,10 +20,12 @@ if (CPUID_LIBRARY AND CPUID_INCLUDE_DIR)
add_definitions(-DHAVE_STDINT_H)
# TODO: make virtual target cpuid:cpuid with COMPILE_DEFINITIONS property
endif ()
set (USE_CPUID 1)
elseif (NOT MISSING_INTERNAL_CPUID_LIBRARY)
set (CPUID_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/libcpuid/include)
set (USE_INTERNAL_CPUID_LIBRARY 1)
set (CPUID_LIBRARY cpuid)
set (USE_CPUID 1)
endif ()
message (STATUS "Using cpuid: ${CPUID_INCLUDE_DIR} : ${CPUID_LIBRARY}")
message (STATUS "Using cpuid=${USE_CPUID}: ${CPUID_INCLUDE_DIR} : ${CPUID_LIBRARY}")

cmake/find_cpuinfo.cmake Normal file
View File

@@ -0,0 +1,17 @@
option(USE_INTERNAL_CPUINFO_LIBRARY "Set to FALSE to use system cpuinfo library instead of bundled" ${NOT_UNBUNDLED})
if(NOT USE_INTERNAL_CPUINFO_LIBRARY)
find_library(CPUINFO_LIBRARY cpuinfo)
find_path(CPUINFO_INCLUDE_DIR NAMES cpuinfo.h PATHS ${CPUINFO_INCLUDE_PATHS})
endif()
if(CPUINFO_LIBRARY AND CPUINFO_INCLUDE_DIR)
set(USE_CPUINFO 1)
elseif(NOT MISSING_INTERNAL_CPUINFO_LIBRARY)
set(CPUINFO_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/libcpuinfo/include)
set(USE_INTERNAL_CPUINFO_LIBRARY 1)
set(CPUINFO_LIBRARY cpuinfo)
set(USE_CPUINFO 1)
endif()
message(STATUS "Using cpuinfo=${USE_CPUINFO}: ${CPUINFO_INCLUDE_DIR} : ${CPUINFO_LIBRARY}")

View File

@@ -8,18 +8,22 @@ if (NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/googletest/googletest/CMakeList
set (MISSING_INTERNAL_GTEST_LIBRARY 1)
endif ()
if (NOT USE_INTERNAL_GTEST_LIBRARY)
find_package (GTest)
endif ()
if (NOT GTEST_INCLUDE_DIRS AND NOT MISSING_INTERNAL_GTEST_LIBRARY)
if(NOT USE_INTERNAL_GTEST_LIBRARY)
# TODO: autodetect of GTEST_SRC_DIR by EXISTS /usr/src/googletest/CMakeLists.txt
if(NOT GTEST_SRC_DIR)
find_package(GTest)
endif()
endif()
if (NOT GTEST_SRC_DIR AND NOT GTEST_INCLUDE_DIRS AND NOT MISSING_INTERNAL_GTEST_LIBRARY)
set (USE_INTERNAL_GTEST_LIBRARY 1)
set (GTEST_MAIN_LIBRARIES gtest_main)
set (GTEST_INCLUDE_DIRS ${ClickHouse_SOURCE_DIR}/contrib/googletest/googletest)
endif ()
if(GTEST_INCLUDE_DIRS AND GTEST_MAIN_LIBRARIES)
if((GTEST_INCLUDE_DIRS AND GTEST_MAIN_LIBRARIES) OR GTEST_SRC_DIR)
set(USE_GTEST 1)
endif()
message (STATUS "Using gtest=${USE_GTEST}: ${GTEST_INCLUDE_DIRS} : ${GTEST_MAIN_LIBRARIES}")
message (STATUS "Using gtest=${USE_GTEST}: ${GTEST_INCLUDE_DIRS} : ${GTEST_MAIN_LIBRARIES} : ${GTEST_SRC_DIR}")

View File

@@ -2,4 +2,5 @@ set(DIVIDE_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/libdivide)
set(COMMON_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/libs/libcommon/include ${ClickHouse_BINARY_DIR}/libs/libcommon/include)
set(DBMS_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/dbms/src ${ClickHouse_BINARY_DIR}/dbms/src)
set(DOUBLE_CONVERSION_CONTRIB_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/double-conversion)
set(METROHASH_CONTRIB_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/libmetrohash/src)
set(PCG_RANDOM_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/libpcg-random/include)

View File

@@ -107,6 +107,11 @@ if (USE_INTERNAL_SSL_LIBRARY)
if (NOT MAKE_STATIC_LIBRARIES)
set (BUILD_SHARED 1)
endif ()
# By default, ${CMAKE_INSTALL_PREFIX}/etc/ssl is selected - that is not what we need.
# We need to use system wide ssl directory.
set (OPENSSLDIR "/etc/ssl")
set (LIBRESSL_SKIP_INSTALL 1 CACHE INTERNAL "")
add_subdirectory (ssl)
target_include_directories(${OPENSSL_CRYPTO_LIBRARY} SYSTEM PUBLIC ${OPENSSL_INCLUDE_DIR})
@@ -166,13 +171,16 @@ if (USE_INTERNAL_POCO_LIBRARY)
endif ()
endif ()
if (USE_INTERNAL_GTEST_LIBRARY)
if(USE_INTERNAL_GTEST_LIBRARY)
# Google Test from sources
add_subdirectory(${ClickHouse_SOURCE_DIR}/contrib/googletest/googletest ${CMAKE_CURRENT_BINARY_DIR}/googletest)
# avoid problems with <regexp.h>
target_compile_definitions (gtest INTERFACE GTEST_HAS_POSIX_RE=0)
target_include_directories (gtest SYSTEM INTERFACE ${ClickHouse_SOURCE_DIR}/contrib/googletest/include)
endif ()
elseif(GTEST_SRC_DIR)
add_subdirectory(${GTEST_SRC_DIR}/googletest ${CMAKE_CURRENT_BINARY_DIR}/googletest)
target_compile_definitions(gtest INTERFACE GTEST_HAS_POSIX_RE=0)
endif()
if (USE_INTERNAL_LLVM_LIBRARY)
file(GENERATE OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/empty.cpp CONTENT " ")

View File

@@ -1,5 +1,5 @@ if (HAVE_SSE42) # Not used. Pretty easy to port.
if (HAVE_SSE42) # Not used. Pretty easy to port.
set (SOURCES_SSE42_ONLY src/metrohash128crc.cpp)
set (SOURCES_SSE42_ONLY src/metrohash128crc.cpp src/metrohash128crc.h)
endif ()
add_library(metrohash

View File

@@ -1,22 +1,201 @@
The MIT License (MIT)
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
Copyright (c) 2015 J. Andrew Rogers
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
1. Definitions.
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

View File

@ -5,12 +5,44 @@ MetroHash is a set of state-of-the-art hash functions for *non-cryptographic* us
* Fastest general-purpose functions for bulk hashing.
* Fastest general-purpose functions for small, variable length keys.
* Robust statistical bias profile, similar to the MD5 cryptographic hash.
* Hashes can be constructed incrementally (**new**)
* 64-bit, 128-bit, and 128-bit CRC variants currently available.
* Optimized for modern x86-64 microarchitectures.
* Elegant, compact, readable functions.
You can read more about the design and history [here](http://www.jandrewrogers.com/2015/05/27/metrohash/).
## News
### 23 October 2018
The project has been re-licensed under Apache License v2.0. The purpose of this license change is consistency with the imminent release of MetroHash v2.0, which is also licensed under the Apache license.
### 27 July 2015
Two new 64-bit and 128-bit algorithms add the ability to construct hashes incrementally. In addition to supporting incremental construction, the algorithms are slightly superior to the prior versions.
A big change is that these new algorithms are implemented as C++ classes that support both incremental and stateless hashing. These classes also have a static method for verifying the implementation against the test vectors built into the classes. Implementations are now fully contained by their respective headers e.g. "metrohash128.h".
*Note: an incremental version of the 128-bit CRC variant is on its way but is not included in this push.*
**Usage Example For Stateless Hashing**
`MetroHash128::Hash(key, key_length, hash_ptr, seed)`
**Usage Example For Incremental Hashing**
`MetroHash128 hasher;`
`hasher.Update(partial_key, partial_key_length);`
`...`
`hasher.Update(partial_key, partial_key_length);`
`hasher.Finalize(hash_ptr);`
An `Initialize(seed)` method allows the hasher objects to be reused.
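For completeness, a minimal self-contained sketch combining both modes (the key string and the chunk split are illustrative; the API is as declared in `metrohash128.h`):

```cpp
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include "metrohash128.h"

int main()
{
    const uint8_t * key = reinterpret_cast<const uint8_t *>("hello metrohash");
    const uint64_t len = strlen("hello metrohash");
    uint8_t one_shot[16];
    uint8_t incremental[16];

    // Stateless hashing in a single call.
    MetroHash128::Hash(key, len, one_shot, /* seed */ 0);

    // Incremental hashing of the same bytes in two chunks.
    MetroHash128 hasher(0);
    hasher.Update(key, 5);
    hasher.Update(key + 5, len - 5);
    hasher.Finalize(incremental);

    // Both paths should produce the same 128-bit digest.
    printf("digests match: %s\n", memcmp(one_shot, incremental, 16) == 0 ? "yes" : "no");
    return 0;
}
```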
### 27 May 2015
Six hash functions have been included in the initial release:
* 64-bit hash functions, "metrohash64_1" and "metrohash64_2"

View File

@ -1,7 +1,4 @@
origin: git@github.com:jandrewrogers/MetroHash.git
commit d9dee18a54a8a6766e24c1950b814ac7ca9d1a89
Merge: 761e8a4 3d06b24
origin: https://github.com/jandrewrogers/MetroHash.git
commit 690a521d9beb2e1050cc8f273fdabc13b31bf8f6 tag: v1.1.3
Author: J. Andrew Rogers <andrew@jarbox.org>
Date: Sat Jun 6 16:12:06 2015 -0700
modified README
Date: Tue Oct 23 09:49:53 2018 -0700

View File

@ -1,73 +1,24 @@
// metrohash.h
//
// The MIT License (MIT)
// Copyright 2015-2018 J. Andrew Rogers
//
// Copyright (c) 2015 J. Andrew Rogers
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef METROHASH_METROHASH_H
#define METROHASH_METROHASH_H
#include <stdint.h>
#include <string.h>
// MetroHash 64-bit hash functions
void metrohash64_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out);
void metrohash64_2(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out);
// MetroHash 128-bit hash functions
void metrohash128_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out);
void metrohash128_2(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out);
// MetroHash 128-bit hash functions using CRC instruction
void metrohash128crc_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out);
void metrohash128crc_2(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out);
/* rotate right idiom recognized by compiler */
inline static uint64_t rotate_right(uint64_t v, unsigned k)
{
return (v >> k) | (v << (64 - k));
}
// unaligned reads, fast and safe on Nehalem and later microarchitectures
inline static uint64_t read_u64(const void * const ptr)
{
return static_cast<uint64_t>(*reinterpret_cast<const uint64_t*>(ptr));
}
inline static uint64_t read_u32(const void * const ptr)
{
return static_cast<uint64_t>(*reinterpret_cast<const uint32_t*>(ptr));
}
inline static uint64_t read_u16(const void * const ptr)
{
return static_cast<uint64_t>(*reinterpret_cast<const uint16_t*>(ptr));
}
inline static uint64_t read_u8 (const void * const ptr)
{
return static_cast<uint64_t>(*reinterpret_cast<const uint8_t *>(ptr));
}
#include "metrohash64.h"
#include "metrohash128.h"
#include "metrohash128crc.h"
#endif // #ifndef METROHASH_METROHASH_H

View File

@ -1,29 +1,260 @@
// metrohash128.cpp
//
// The MIT License (MIT)
// Copyright 2015-2018 J. Andrew Rogers
//
// Copyright (c) 2015 J. Andrew Rogers
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <string.h>
#include "platform.h"
#include "metrohash128.h"
const char * MetroHash128::test_string = "012345678901234567890123456789012345678901234567890123456789012";
const uint8_t MetroHash128::test_seed_0[16] = {
0xC7, 0x7C, 0xE2, 0xBF, 0xA4, 0xED, 0x9F, 0x9B,
0x05, 0x48, 0xB2, 0xAC, 0x50, 0x74, 0xA2, 0x97
};
const uint8_t MetroHash128::test_seed_1[16] = {
0x45, 0xA3, 0xCD, 0xB8, 0x38, 0x19, 0x9D, 0x7F,
0xBD, 0xD6, 0x8D, 0x86, 0x7A, 0x14, 0xEC, 0xEF
};
MetroHash128::MetroHash128(const uint64_t seed)
{
Initialize(seed);
}
void MetroHash128::Initialize(const uint64_t seed)
{
// initialize internal hash registers
state.v[0] = (static_cast<uint64_t>(seed) - k0) * k3;
state.v[1] = (static_cast<uint64_t>(seed) + k1) * k2;
state.v[2] = (static_cast<uint64_t>(seed) + k0) * k2;
state.v[3] = (static_cast<uint64_t>(seed) - k1) * k3;
// initialize total length of input
bytes = 0;
}
void MetroHash128::Update(const uint8_t * const buffer, const uint64_t length)
{
const uint8_t * ptr = reinterpret_cast<const uint8_t*>(buffer);
const uint8_t * const end = ptr + length;
// input buffer may be partially filled
if (bytes % 32)
{
uint64_t fill = 32 - (bytes % 32);
if (fill > length)
fill = length;
memcpy(input.b + (bytes % 32), ptr, static_cast<size_t>(fill));
ptr += fill;
bytes += fill;
// input buffer is still partially filled
if ((bytes % 32) != 0) return;
// process full input buffer
state.v[0] += read_u64(&input.b[ 0]) * k0; state.v[0] = rotate_right(state.v[0],29) + state.v[2];
state.v[1] += read_u64(&input.b[ 8]) * k1; state.v[1] = rotate_right(state.v[1],29) + state.v[3];
state.v[2] += read_u64(&input.b[16]) * k2; state.v[2] = rotate_right(state.v[2],29) + state.v[0];
state.v[3] += read_u64(&input.b[24]) * k3; state.v[3] = rotate_right(state.v[3],29) + state.v[1];
}
// bulk update
bytes += (end - ptr);
while (ptr <= (end - 32))
{
// process directly from the source, bypassing the input buffer
state.v[0] += read_u64(ptr) * k0; ptr += 8; state.v[0] = rotate_right(state.v[0],29) + state.v[2];
state.v[1] += read_u64(ptr) * k1; ptr += 8; state.v[1] = rotate_right(state.v[1],29) + state.v[3];
state.v[2] += read_u64(ptr) * k2; ptr += 8; state.v[2] = rotate_right(state.v[2],29) + state.v[0];
state.v[3] += read_u64(ptr) * k3; ptr += 8; state.v[3] = rotate_right(state.v[3],29) + state.v[1];
}
// store remaining bytes in input buffer
if (ptr < end)
memcpy(input.b, ptr, end - ptr);
}
void MetroHash128::Finalize(uint8_t * const hash)
{
// finalize bulk loop, if used
if (bytes >= 32)
{
state.v[2] ^= rotate_right(((state.v[0] + state.v[3]) * k0) + state.v[1], 21) * k1;
state.v[3] ^= rotate_right(((state.v[1] + state.v[2]) * k1) + state.v[0], 21) * k0;
state.v[0] ^= rotate_right(((state.v[0] + state.v[2]) * k0) + state.v[3], 21) * k1;
state.v[1] ^= rotate_right(((state.v[1] + state.v[3]) * k1) + state.v[2], 21) * k0;
}
// process any bytes remaining in the input buffer
const uint8_t * ptr = reinterpret_cast<const uint8_t*>(input.b);
const uint8_t * const end = ptr + (bytes % 32);
if ((end - ptr) >= 16)
{
state.v[0] += read_u64(ptr) * k2; ptr += 8; state.v[0] = rotate_right(state.v[0],33) * k3;
state.v[1] += read_u64(ptr) * k2; ptr += 8; state.v[1] = rotate_right(state.v[1],33) * k3;
state.v[0] ^= rotate_right((state.v[0] * k2) + state.v[1], 45) * k1;
state.v[1] ^= rotate_right((state.v[1] * k3) + state.v[0], 45) * k0;
}
if ((end - ptr) >= 8)
{
state.v[0] += read_u64(ptr) * k2; ptr += 8; state.v[0] = rotate_right(state.v[0],33) * k3;
state.v[0] ^= rotate_right((state.v[0] * k2) + state.v[1], 27) * k1;
}
if ((end - ptr) >= 4)
{
state.v[1] += read_u32(ptr) * k2; ptr += 4; state.v[1] = rotate_right(state.v[1],33) * k3;
state.v[1] ^= rotate_right((state.v[1] * k3) + state.v[0], 46) * k0;
}
if ((end - ptr) >= 2)
{
state.v[0] += read_u16(ptr) * k2; ptr += 2; state.v[0] = rotate_right(state.v[0],33) * k3;
state.v[0] ^= rotate_right((state.v[0] * k2) + state.v[1], 22) * k1;
}
if ((end - ptr) >= 1)
{
state.v[1] += read_u8 (ptr) * k2; state.v[1] = rotate_right(state.v[1],33) * k3;
state.v[1] ^= rotate_right((state.v[1] * k3) + state.v[0], 58) * k0;
}
state.v[0] += rotate_right((state.v[0] * k0) + state.v[1], 13);
state.v[1] += rotate_right((state.v[1] * k1) + state.v[0], 37);
state.v[0] += rotate_right((state.v[0] * k2) + state.v[1], 13);
state.v[1] += rotate_right((state.v[1] * k3) + state.v[0], 37);
bytes = 0;
// do any endian conversion here
memcpy(hash, state.v, 16);
}
void MetroHash128::Hash(const uint8_t * buffer, const uint64_t length, uint8_t * const hash, const uint64_t seed)
{
const uint8_t * ptr = reinterpret_cast<const uint8_t*>(buffer);
const uint8_t * const end = ptr + length;
uint64_t v[4];
v[0] = (static_cast<uint64_t>(seed) - k0) * k3;
v[1] = (static_cast<uint64_t>(seed) + k1) * k2;
if (length >= 32)
{
v[2] = (static_cast<uint64_t>(seed) + k0) * k2;
v[3] = (static_cast<uint64_t>(seed) - k1) * k3;
do
{
v[0] += read_u64(ptr) * k0; ptr += 8; v[0] = rotate_right(v[0],29) + v[2];
v[1] += read_u64(ptr) * k1; ptr += 8; v[1] = rotate_right(v[1],29) + v[3];
v[2] += read_u64(ptr) * k2; ptr += 8; v[2] = rotate_right(v[2],29) + v[0];
v[3] += read_u64(ptr) * k3; ptr += 8; v[3] = rotate_right(v[3],29) + v[1];
}
while (ptr <= (end - 32));
v[2] ^= rotate_right(((v[0] + v[3]) * k0) + v[1], 21) * k1;
v[3] ^= rotate_right(((v[1] + v[2]) * k1) + v[0], 21) * k0;
v[0] ^= rotate_right(((v[0] + v[2]) * k0) + v[3], 21) * k1;
v[1] ^= rotate_right(((v[1] + v[3]) * k1) + v[2], 21) * k0;
}
if ((end - ptr) >= 16)
{
v[0] += read_u64(ptr) * k2; ptr += 8; v[0] = rotate_right(v[0],33) * k3;
v[1] += read_u64(ptr) * k2; ptr += 8; v[1] = rotate_right(v[1],33) * k3;
v[0] ^= rotate_right((v[0] * k2) + v[1], 45) * k1;
v[1] ^= rotate_right((v[1] * k3) + v[0], 45) * k0;
}
if ((end - ptr) >= 8)
{
v[0] += read_u64(ptr) * k2; ptr += 8; v[0] = rotate_right(v[0],33) * k3;
v[0] ^= rotate_right((v[0] * k2) + v[1], 27) * k1;
}
if ((end - ptr) >= 4)
{
v[1] += read_u32(ptr) * k2; ptr += 4; v[1] = rotate_right(v[1],33) * k3;
v[1] ^= rotate_right((v[1] * k3) + v[0], 46) * k0;
}
if ((end - ptr) >= 2)
{
v[0] += read_u16(ptr) * k2; ptr += 2; v[0] = rotate_right(v[0],33) * k3;
v[0] ^= rotate_right((v[0] * k2) + v[1], 22) * k1;
}
if ((end - ptr) >= 1)
{
v[1] += read_u8 (ptr) * k2; v[1] = rotate_right(v[1],33) * k3;
v[1] ^= rotate_right((v[1] * k3) + v[0], 58) * k0;
}
v[0] += rotate_right((v[0] * k0) + v[1], 13);
v[1] += rotate_right((v[1] * k1) + v[0], 37);
v[0] += rotate_right((v[0] * k2) + v[1], 13);
v[1] += rotate_right((v[1] * k3) + v[0], 37);
// do any endian conversion here
memcpy(hash, v, 16);
}
bool MetroHash128::ImplementationVerified()
{
uint8_t hash[16];
const uint8_t * key = reinterpret_cast<const uint8_t *>(MetroHash128::test_string);
// verify one-shot implementation
MetroHash128::Hash(key, strlen(MetroHash128::test_string), hash, 0);
if (memcmp(hash, MetroHash128::test_seed_0, 16) != 0) return false;
MetroHash128::Hash(key, strlen(MetroHash128::test_string), hash, 1);
if (memcmp(hash, MetroHash128::test_seed_1, 16) != 0) return false;
// verify incremental implementation
MetroHash128 metro;
metro.Initialize(0);
metro.Update(reinterpret_cast<const uint8_t *>(MetroHash128::test_string), strlen(MetroHash128::test_string));
metro.Finalize(hash);
if (memcmp(hash, MetroHash128::test_seed_0, 16) != 0) return false;
metro.Initialize(1);
metro.Update(reinterpret_cast<const uint8_t *>(MetroHash128::test_string), strlen(MetroHash128::test_string));
metro.Finalize(hash);
if (memcmp(hash, MetroHash128::test_seed_1, 16) != 0) return false;
return true;
}
#include "metrohash.h"
void metrohash128_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out)
{
@ -97,6 +328,8 @@ void metrohash128_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t *
v[0] += rotate_right((v[0] * k2) + v[1], 13);
v[1] += rotate_right((v[1] * k3) + v[0], 37);
// do any endian conversion here
memcpy(out, v, 16);
}
@ -173,6 +406,8 @@ void metrohash128_2(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t *
v[0] += rotate_right((v[0] * k2) + v[1], 33);
v[1] += rotate_right((v[1] * k3) + v[0], 33);
// do any endian conversion here
memcpy(out, v, 16);
}

View File

@ -0,0 +1,72 @@
// metrohash128.h
//
// Copyright 2015-2018 J. Andrew Rogers
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef METROHASH_METROHASH_128_H
#define METROHASH_METROHASH_128_H
#include <stdint.h>
class MetroHash128
{
public:
static const uint32_t bits = 128;
// Constructor initializes the same as Initialize()
MetroHash128(const uint64_t seed=0);
// Initializes internal state for new hash with optional seed
void Initialize(const uint64_t seed=0);
// Update the hash state with a string of bytes. If the length
// is sufficiently long, the implementation switches to a bulk
// hashing algorithm directly on the argument buffer for speed.
void Update(const uint8_t * buffer, const uint64_t length);
// Constructs the final hash and writes it to the argument buffer.
// After a hash is finalized, this instance must be Initialized()-ed
// again or the behavior of Update() and Finalize() is undefined.
void Finalize(uint8_t * const hash);
// A non-incremental function implementation. This can be significantly
// faster than the incremental implementation for some usage patterns.
static void Hash(const uint8_t * buffer, const uint64_t length, uint8_t * const hash, const uint64_t seed=0);
// Does implementation correctly execute test vectors?
static bool ImplementationVerified();
// test vectors -- Hash(test_string, seed=0) => test_seed_0
static const char * test_string;
static const uint8_t test_seed_0[16];
static const uint8_t test_seed_1[16];
private:
static const uint64_t k0 = 0xC83A91E1;
static const uint64_t k1 = 0x8648DBDB;
static const uint64_t k2 = 0x7BDEC03B;
static const uint64_t k3 = 0x2F5870A5;
struct { uint64_t v[4]; } state;
struct { uint8_t b[32]; } input;
uint64_t bytes;
};
// Legacy 128-bit hash functions -- do not use
void metrohash128_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out);
void metrohash128_2(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out);
#endif // #ifndef METROHASH_METROHASH_128_H

View File

@ -1,31 +1,24 @@
// metrohash128crc.cpp
//
// The MIT License (MIT)
// Copyright 2015-2018 J. Andrew Rogers
//
// Copyright (c) 2015 J. Andrew Rogers
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "metrohash.h"
#include <nmmintrin.h>
#include <string.h>
#include "metrohash.h"
#include "platform.h"
void metrohash128crc_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out)

View File

@ -0,0 +1,27 @@
// metrohash128crc.h
//
// Copyright 2015-2018 J. Andrew Rogers
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef METROHASH_METROHASH_128_CRC_H
#define METROHASH_METROHASH_128_CRC_H
#include <stdint.h>
// Legacy 128-bit hash functions
void metrohash128crc_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out);
void metrohash128crc_2(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out);
#endif // #ifndef METROHASH_METROHASH_128_CRC_H

View File

@ -1,29 +1,257 @@
// metrohash64.cpp
//
// The MIT License (MIT)
// Copyright 2015-2018 J. Andrew Rogers
//
// Copyright (c) 2015 J. Andrew Rogers
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "platform.h"
#include "metrohash64.h"
#include <cstring>
const char * MetroHash64::test_string = "012345678901234567890123456789012345678901234567890123456789012";
const uint8_t MetroHash64::test_seed_0[8] = { 0x6B, 0x75, 0x3D, 0xAE, 0x06, 0x70, 0x4B, 0xAD };
const uint8_t MetroHash64::test_seed_1[8] = { 0x3B, 0x0D, 0x48, 0x1C, 0xF4, 0xB9, 0xB8, 0xDF };
MetroHash64::MetroHash64(const uint64_t seed)
{
Initialize(seed);
}
void MetroHash64::Initialize(const uint64_t seed)
{
vseed = (static_cast<uint64_t>(seed) + k2) * k0;
// initialize internal hash registers
state.v[0] = vseed;
state.v[1] = vseed;
state.v[2] = vseed;
state.v[3] = vseed;
// initialize total length of input
bytes = 0;
}
void MetroHash64::Update(const uint8_t * const buffer, const uint64_t length)
{
const uint8_t * ptr = reinterpret_cast<const uint8_t*>(buffer);
const uint8_t * const end = ptr + length;
// input buffer may be partially filled
if (bytes % 32)
{
uint64_t fill = 32 - (bytes % 32);
if (fill > length)
fill = length;
memcpy(input.b + (bytes % 32), ptr, static_cast<size_t>(fill));
ptr += fill;
bytes += fill;
// input buffer is still partially filled
if ((bytes % 32) != 0) return;
// process full input buffer
state.v[0] += read_u64(&input.b[ 0]) * k0; state.v[0] = rotate_right(state.v[0],29) + state.v[2];
state.v[1] += read_u64(&input.b[ 8]) * k1; state.v[1] = rotate_right(state.v[1],29) + state.v[3];
state.v[2] += read_u64(&input.b[16]) * k2; state.v[2] = rotate_right(state.v[2],29) + state.v[0];
state.v[3] += read_u64(&input.b[24]) * k3; state.v[3] = rotate_right(state.v[3],29) + state.v[1];
}
// bulk update
bytes += static_cast<uint64_t>(end - ptr);
while (ptr <= (end - 32))
{
// process directly from the source, bypassing the input buffer
state.v[0] += read_u64(ptr) * k0; ptr += 8; state.v[0] = rotate_right(state.v[0],29) + state.v[2];
state.v[1] += read_u64(ptr) * k1; ptr += 8; state.v[1] = rotate_right(state.v[1],29) + state.v[3];
state.v[2] += read_u64(ptr) * k2; ptr += 8; state.v[2] = rotate_right(state.v[2],29) + state.v[0];
state.v[3] += read_u64(ptr) * k3; ptr += 8; state.v[3] = rotate_right(state.v[3],29) + state.v[1];
}
// store remaining bytes in input buffer
if (ptr < end)
memcpy(input.b, ptr, static_cast<size_t>(end - ptr));
}
void MetroHash64::Finalize(uint8_t * const hash)
{
// finalize bulk loop, if used
if (bytes >= 32)
{
state.v[2] ^= rotate_right(((state.v[0] + state.v[3]) * k0) + state.v[1], 37) * k1;
state.v[3] ^= rotate_right(((state.v[1] + state.v[2]) * k1) + state.v[0], 37) * k0;
state.v[0] ^= rotate_right(((state.v[0] + state.v[2]) * k0) + state.v[3], 37) * k1;
state.v[1] ^= rotate_right(((state.v[1] + state.v[3]) * k1) + state.v[2], 37) * k0;
state.v[0] = vseed + (state.v[0] ^ state.v[1]);
}
// process any bytes remaining in the input buffer
const uint8_t * ptr = reinterpret_cast<const uint8_t*>(input.b);
const uint8_t * const end = ptr + (bytes % 32);
if ((end - ptr) >= 16)
{
state.v[1] = state.v[0] + (read_u64(ptr) * k2); ptr += 8; state.v[1] = rotate_right(state.v[1],29) * k3;
state.v[2] = state.v[0] + (read_u64(ptr) * k2); ptr += 8; state.v[2] = rotate_right(state.v[2],29) * k3;
state.v[1] ^= rotate_right(state.v[1] * k0, 21) + state.v[2];
state.v[2] ^= rotate_right(state.v[2] * k3, 21) + state.v[1];
state.v[0] += state.v[2];
}
if ((end - ptr) >= 8)
{
state.v[0] += read_u64(ptr) * k3; ptr += 8;
state.v[0] ^= rotate_right(state.v[0], 55) * k1;
}
if ((end - ptr) >= 4)
{
state.v[0] += read_u32(ptr) * k3; ptr += 4;
state.v[0] ^= rotate_right(state.v[0], 26) * k1;
}
if ((end - ptr) >= 2)
{
state.v[0] += read_u16(ptr) * k3; ptr += 2;
state.v[0] ^= rotate_right(state.v[0], 48) * k1;
}
if ((end - ptr) >= 1)
{
state.v[0] += read_u8 (ptr) * k3;
state.v[0] ^= rotate_right(state.v[0], 37) * k1;
}
state.v[0] ^= rotate_right(state.v[0], 28);
state.v[0] *= k0;
state.v[0] ^= rotate_right(state.v[0], 29);
bytes = 0;
// do any endian conversion here
memcpy(hash, state.v, 8);
}
void MetroHash64::Hash(const uint8_t * buffer, const uint64_t length, uint8_t * const hash, const uint64_t seed)
{
const uint8_t * ptr = reinterpret_cast<const uint8_t*>(buffer);
const uint8_t * const end = ptr + length;
uint64_t h = (static_cast<uint64_t>(seed) + k2) * k0;
if (length >= 32)
{
uint64_t v[4];
v[0] = h;
v[1] = h;
v[2] = h;
v[3] = h;
do
{
v[0] += read_u64(ptr) * k0; ptr += 8; v[0] = rotate_right(v[0],29) + v[2];
v[1] += read_u64(ptr) * k1; ptr += 8; v[1] = rotate_right(v[1],29) + v[3];
v[2] += read_u64(ptr) * k2; ptr += 8; v[2] = rotate_right(v[2],29) + v[0];
v[3] += read_u64(ptr) * k3; ptr += 8; v[3] = rotate_right(v[3],29) + v[1];
}
while (ptr <= (end - 32));
v[2] ^= rotate_right(((v[0] + v[3]) * k0) + v[1], 37) * k1;
v[3] ^= rotate_right(((v[1] + v[2]) * k1) + v[0], 37) * k0;
v[0] ^= rotate_right(((v[0] + v[2]) * k0) + v[3], 37) * k1;
v[1] ^= rotate_right(((v[1] + v[3]) * k1) + v[2], 37) * k0;
h += v[0] ^ v[1];
}
if ((end - ptr) >= 16)
{
uint64_t v0 = h + (read_u64(ptr) * k2); ptr += 8; v0 = rotate_right(v0,29) * k3;
uint64_t v1 = h + (read_u64(ptr) * k2); ptr += 8; v1 = rotate_right(v1,29) * k3;
v0 ^= rotate_right(v0 * k0, 21) + v1;
v1 ^= rotate_right(v1 * k3, 21) + v0;
h += v1;
}
if ((end - ptr) >= 8)
{
h += read_u64(ptr) * k3; ptr += 8;
h ^= rotate_right(h, 55) * k1;
}
if ((end - ptr) >= 4)
{
h += read_u32(ptr) * k3; ptr += 4;
h ^= rotate_right(h, 26) * k1;
}
if ((end - ptr) >= 2)
{
h += read_u16(ptr) * k3; ptr += 2;
h ^= rotate_right(h, 48) * k1;
}
if ((end - ptr) >= 1)
{
h += read_u8 (ptr) * k3;
h ^= rotate_right(h, 37) * k1;
}
h ^= rotate_right(h, 28);
h *= k0;
h ^= rotate_right(h, 29);
memcpy(hash, &h, 8);
}
bool MetroHash64::ImplementationVerified()
{
uint8_t hash[8];
const uint8_t * key = reinterpret_cast<const uint8_t *>(MetroHash64::test_string);
// verify one-shot implementation
MetroHash64::Hash(key, strlen(MetroHash64::test_string), hash, 0);
if (memcmp(hash, MetroHash64::test_seed_0, 8) != 0) return false;
MetroHash64::Hash(key, strlen(MetroHash64::test_string), hash, 1);
if (memcmp(hash, MetroHash64::test_seed_1, 8) != 0) return false;
// verify incremental implementation
MetroHash64 metro;
metro.Initialize(0);
metro.Update(reinterpret_cast<const uint8_t *>(MetroHash64::test_string), strlen(MetroHash64::test_string));
metro.Finalize(hash);
if (memcmp(hash, MetroHash64::test_seed_0, 8) != 0) return false;
metro.Initialize(1);
metro.Update(reinterpret_cast<const uint8_t *>(MetroHash64::test_string), strlen(MetroHash64::test_string));
metro.Finalize(hash);
if (memcmp(hash, MetroHash64::test_seed_1, 8) != 0) return false;
return true;
}
#include "metrohash.h"
void metrohash64_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out)
{

View File

@ -0,0 +1,73 @@
// metrohash64.h
//
// Copyright 2015-2018 J. Andrew Rogers
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef METROHASH_METROHASH_64_H
#define METROHASH_METROHASH_64_H
#include <stdint.h>
class MetroHash64
{
public:
static const uint32_t bits = 64;
// Constructor initializes the same as Initialize()
MetroHash64(const uint64_t seed=0);
// Initializes internal state for new hash with optional seed
void Initialize(const uint64_t seed=0);
// Update the hash state with a string of bytes. If the length
// is sufficiently long, the implementation switches to a bulk
// hashing algorithm directly on the argument buffer for speed.
void Update(const uint8_t * buffer, const uint64_t length);
// Constructs the final hash and writes it to the argument buffer.
// After a hash is finalized, this instance must be Initialized()-ed
// again or the behavior of Update() and Finalize() is undefined.
void Finalize(uint8_t * const hash);
// A non-incremental function implementation. This can be significantly
// faster than the incremental implementation for some usage patterns.
static void Hash(const uint8_t * buffer, const uint64_t length, uint8_t * const hash, const uint64_t seed=0);
// Does implementation correctly execute test vectors?
static bool ImplementationVerified();
// test vectors -- Hash(test_string, seed=0) => test_seed_0
static const char * test_string;
static const uint8_t test_seed_0[8];
static const uint8_t test_seed_1[8];
private:
static const uint64_t k0 = 0xD6D018F5;
static const uint64_t k1 = 0xA2AA033B;
static const uint64_t k2 = 0x62992FC1;
static const uint64_t k3 = 0x30BC5B29;
struct { uint64_t v[4]; } state;
struct { uint8_t b[32]; } input;
uint64_t bytes;
uint64_t vseed;
};
// Legacy 64-bit hash functions -- do not use
void metrohash64_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out);
void metrohash64_2(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out);
#endif // #ifndef METROHASH_METROHASH_64_H

View File

@ -0,0 +1,50 @@
// platform.h
//
// Copyright 2015-2018 J. Andrew Rogers
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef METROHASH_PLATFORM_H
#define METROHASH_PLATFORM_H
#include <stdint.h>
// rotate right idiom recognized by most compilers
inline static uint64_t rotate_right(uint64_t v, unsigned k)
{
return (v >> k) | (v << (64 - k));
}
// unaligned reads, fast and safe on Nehalem and later microarchitectures
inline static uint64_t read_u64(const void * const ptr)
{
return static_cast<uint64_t>(*reinterpret_cast<const uint64_t*>(ptr));
}
inline static uint64_t read_u32(const void * const ptr)
{
return static_cast<uint64_t>(*reinterpret_cast<const uint32_t*>(ptr));
}
inline static uint64_t read_u16(const void * const ptr)
{
return static_cast<uint64_t>(*reinterpret_cast<const uint16_t*>(ptr));
}
inline static uint64_t read_u8 (const void * const ptr)
{
return static_cast<uint64_t>(*reinterpret_cast<const uint8_t *>(ptr));
}
#endif // #ifndef METROHASH_PLATFORM_H
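One caveat worth noting about the reads above: dereferencing a `reinterpret_cast` pointer this way is formally undefined behavior in ISO C++ when the address is misaligned or aliases another type; it works because x86-64 tolerates unaligned loads. A portable sketch (an editorial addition, not part of the library) uses `memcpy`, which mainstream compilers fold into the same single load:

```cpp
#include <stdint.h>
#include <string.h>

// Portable unaligned read: defined behavior on any platform; GCC and Clang
// compile it to the same single mov on x86-64 as the cast-based version.
inline static uint64_t read_u64_portable(const void * ptr)
{
    uint64_t v;
    memcpy(&v, ptr, sizeof(v));
    return v;
}
```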

View File

@ -1,27 +1,18 @@
// testvector.h
//
// The MIT License (MIT)
// Copyright 2015-2018 J. Andrew Rogers
//
// Copyright (c) 2015 J. Andrew Rogers
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef METROHASH_TESTVECTOR_H
#define METROHASH_TESTVECTOR_H
@ -46,6 +37,8 @@ struct TestVectorData
static const char * test_key_63 = "012345678901234567890123456789012345678901234567890123456789012";
// The hash assumes a little-endian architecture. Treating the hash results
// as an array of uint64_t should enable conversion for big-endian implementations.
const TestVectorData TestVector [] =
{
// seed = 0

View File

@ -200,11 +200,18 @@ target_link_libraries (clickhouse_common_io
${Boost_SYSTEM_LIBRARY}
PRIVATE
apple_rt
PUBLIC
Threads::Threads
PRIVATE
${CMAKE_DL_LIBS}
)
if (NOT ARCH_ARM AND CPUID_LIBRARY)
target_link_libraries (clickhouse_common_io PRIVATE ${CPUID_LIBRARY})
if(CPUID_LIBRARY)
target_link_libraries(clickhouse_common_io PRIVATE ${CPUID_LIBRARY})
endif()
if(CPUINFO_LIBRARY)
target_link_libraries(clickhouse_common_io PRIVATE ${CPUINFO_LIBRARY})
endif()
target_link_libraries (dbms
@ -225,6 +232,7 @@ target_link_libraries (dbms
${Boost_PROGRAM_OPTIONS_LIBRARY}
PUBLIC
${Boost_SYSTEM_LIBRARY}
Threads::Threads
)
if (NOT USE_INTERNAL_RE2_LIBRARY)

View File

@ -5,4 +5,5 @@ target_include_directories (clickhouse-benchmark-lib SYSTEM PRIVATE ${PCG_RANDOM
if (CLICKHOUSE_SPLIT_BINARY)
add_executable (clickhouse-benchmark clickhouse-benchmark.cpp)
target_link_libraries (clickhouse-benchmark PRIVATE clickhouse-benchmark-lib clickhouse_aggregate_functions)
install (TARGETS clickhouse-benchmark ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse)
endif ()

View File

@ -7,6 +7,7 @@ endif ()
if (CLICKHOUSE_SPLIT_BINARY)
add_executable (clickhouse-client clickhouse-client.cpp)
target_link_libraries (clickhouse-client PRIVATE clickhouse-client-lib)
install (TARGETS clickhouse-client ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse)
endif ()
install (FILES clickhouse-client.xml DESTINATION ${CLICKHOUSE_ETC_DIR}/clickhouse-client COMPONENT clickhouse-client RENAME config.xml)

View File

@ -5,4 +5,5 @@ if (CLICKHOUSE_SPLIT_BINARY)
# Also in utils
add_executable (clickhouse-compressor clickhouse-compressor.cpp)
target_link_libraries (clickhouse-compressor PRIVATE clickhouse-compressor-lib)
install (TARGETS clickhouse-compressor ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse)
endif ()

View File

@ -4,4 +4,5 @@ target_link_libraries (clickhouse-copier-lib PRIVATE clickhouse-server-lib click
if (CLICKHOUSE_SPLIT_BINARY)
add_executable (clickhouse-copier clickhouse-copier.cpp)
target_link_libraries (clickhouse-copier clickhouse-copier-lib)
install (TARGETS clickhouse-copier ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse)
endif ()

View File

@ -4,4 +4,5 @@ target_link_libraries (clickhouse-extract-from-config-lib PRIVATE clickhouse_com
if (CLICKHOUSE_SPLIT_BINARY)
add_executable (clickhouse-extract-from-config clickhouse-extract-from-config.cpp)
target_link_libraries (clickhouse-extract-from-config PRIVATE clickhouse-extract-from-config-lib)
install (TARGETS clickhouse-extract-from-config ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse)
endif ()

View File

@ -3,4 +3,5 @@ target_link_libraries (clickhouse-format-lib PRIVATE dbms clickhouse_common_io c
if (CLICKHOUSE_SPLIT_BINARY)
add_executable (clickhouse-format clickhouse-format.cpp)
target_link_libraries (clickhouse-format PRIVATE clickhouse-format-lib)
install (TARGETS clickhouse-format ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse)
endif ()

View File

@ -4,4 +4,5 @@ target_link_libraries (clickhouse-local-lib PRIVATE clickhouse_common_io clickho
if (CLICKHOUSE_SPLIT_BINARY)
add_executable (clickhouse-local clickhouse-local.cpp)
target_link_libraries (clickhouse-local PRIVATE clickhouse-local-lib)
install (TARGETS clickhouse-local ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse)
endif ()

View File

@ -5,4 +5,5 @@ if (CLICKHOUSE_SPLIT_BINARY)
add_executable (clickhouse-obfuscator clickhouse-obfuscator.cpp)
set_target_properties(clickhouse-obfuscator PROPERTIES RUNTIME_OUTPUT_DIRECTORY ..)
target_link_libraries (clickhouse-obfuscator PRIVATE clickhouse-obfuscator-lib)
install (TARGETS clickhouse-obfuscator ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse)
endif ()

View File

@ -36,4 +36,5 @@ endif ()
if (CLICKHOUSE_SPLIT_BINARY)
add_executable (clickhouse-odbc-bridge odbc-bridge.cpp)
target_link_libraries (clickhouse-odbc-bridge PRIVATE clickhouse-odbc-bridge-lib)
install (TARGETS clickhouse-odbc-bridge ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse)
endif ()

View File

@ -5,4 +5,5 @@ target_include_directories (clickhouse-performance-test-lib SYSTEM PRIVATE ${PCG
if (CLICKHOUSE_SPLIT_BINARY)
add_executable (clickhouse-performance-test clickhouse-performance-test.cpp)
target_link_libraries (clickhouse-performance-test PRIVATE clickhouse-performance-test-lib)
install (TARGETS clickhouse-performance-test ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse)
endif ()

View File

@ -4,5 +4,5 @@ add_headers_and_sources(clickhouse_common_config .)
add_library(clickhouse_common_config ${LINK_MODE} ${clickhouse_common_config_headers} ${clickhouse_common_config_sources})
target_link_libraries(clickhouse_common_config PUBLIC common PRIVATE clickhouse_common_zookeeper string_utils PUBLIC ${Poco_XML_LIBRARY} ${Poco_Util_LIBRARY})
target_link_libraries(clickhouse_common_config PUBLIC common PRIVATE clickhouse_common_zookeeper string_utils PUBLIC ${Poco_XML_LIBRARY} ${Poco_Util_LIBRARY} Threads::Threads)
target_include_directories(clickhouse_common_config PUBLIC ${DBMS_INCLUDE_DIR})

View File

@ -4,7 +4,7 @@ add_headers_and_sources(clickhouse_common_zookeeper .)
add_library(clickhouse_common_zookeeper ${LINK_MODE} ${clickhouse_common_zookeeper_headers} ${clickhouse_common_zookeeper_sources})
target_link_libraries (clickhouse_common_zookeeper PUBLIC clickhouse_common_io common PRIVATE string_utils PUBLIC ${Poco_Util_LIBRARY})
target_link_libraries (clickhouse_common_zookeeper PUBLIC clickhouse_common_io common PRIVATE string_utils PUBLIC ${Poco_Util_LIBRARY} Threads::Threads)
target_include_directories(clickhouse_common_zookeeper PUBLIC ${DBMS_INCLUDE_DIR})
if (ENABLE_TESTS)

View File

@ -18,6 +18,8 @@
#cmakedefine01 USE_XXHASH
#cmakedefine01 USE_INTERNAL_LLVM_LIBRARY
#cmakedefine01 USE_PROTOBUF
#cmakedefine01 USE_CPUID
#cmakedefine01 USE_CPUINFO
#cmakedefine01 CLICKHOUSE_SPLIT_BINARY
#cmakedefine01 LLVM_HAS_RTTI

View File

@ -1,19 +1,20 @@
#include <Common/getNumberOfPhysicalCPUCores.h>
#include <thread>
#if defined(__x86_64__)
#include <libcpuid/libcpuid.h>
#include <Common/Exception.h>
#include <Common/config.h>
#if USE_CPUID
# include <libcpuid/libcpuid.h>
# include <Common/Exception.h>
namespace DB { namespace ErrorCodes { extern const int CPUID_ERROR; }}
#elif USE_CPUINFO
# include <cpuinfo.h>
#endif
unsigned getNumberOfPhysicalCPUCores()
{
#if defined(__x86_64__)
#if USE_CPUID
cpu_raw_data_t raw_data;
if (0 != cpuid_get_raw_data(&raw_data))
throw DB::Exception("Cannot cpuid_get_raw_data: " + std::string(cpuid_error()), DB::ErrorCodes::CPUID_ERROR);
@ -37,6 +38,13 @@ unsigned getNumberOfPhysicalCPUCores()
if (res != 0)
return res;
#elif USE_CPUINFO
uint32_t cores = 0;
if (cpuinfo_initialize())
cores = cpuinfo_get_cores_count();
if (cores)
return cores;
#endif
/// As a fallback (also for non-x86 architectures), assume there is no hyper-threading on the system.
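The fallback itself is truncated in this hunk. Since `<thread>` is included at the top of the file, it presumably reports the logical core count; a hedged sketch of such a fallback:

```cpp
// Sketch of the truncated fallback (an assumption, not the verbatim source):
// with neither libcpuid nor cpuinfo available, report logical cores and
// treat each one as a physical core.
return std::thread::hardware_concurrency();
```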

View File

@ -48,10 +48,24 @@ ColumnPtr recursiveRemoveLowCardinality(const ColumnPtr & column)
return column;
if (const auto * column_array = typeid_cast<const ColumnArray *>(column.get()))
return ColumnArray::create(recursiveRemoveLowCardinality(column_array->getDataPtr()), column_array->getOffsetsPtr());
{
auto & data = column_array->getDataPtr();
auto data_no_lc = recursiveRemoveLowCardinality(data);
if (data.get() == data_no_lc.get())
return column;
return ColumnArray::create(data_no_lc, column_array->getOffsetsPtr());
}
if (const auto * column_const = typeid_cast<const ColumnConst *>(column.get()))
return ColumnConst::create(recursiveRemoveLowCardinality(column_const->getDataColumnPtr()), column_const->size());
{
auto & nested = column_const->getDataColumnPtr();
auto nested_no_lc = recursiveRemoveLowCardinality(nested);
if (nested.get() == nested_no_lc.get())
return column;
return ColumnConst::create(nested_no_lc, column_const->size());
}
if (const auto * column_tuple = typeid_cast<const ColumnTuple *>(column.get()))
{
@ -76,8 +90,14 @@ ColumnPtr recursiveLowCardinalityConversion(const ColumnPtr & column, const Data
return column;
if (const auto * column_const = typeid_cast<const ColumnConst *>(column.get()))
return ColumnConst::create(recursiveLowCardinalityConversion(column_const->getDataColumnPtr(), from_type, to_type),
column_const->size());
{
auto & nested = column_const->getDataColumnPtr();
auto nested_no_lc = recursiveLowCardinalityConversion(nested, from_type, to_type);
if (nested.get() == nested_no_lc.get())
return column;
return ColumnConst::create(nested_no_lc, column_const->size());
}
if (const auto * low_cardinality_type = typeid_cast<const DataTypeLowCardinality *>(from_type.get()))
{
@ -125,11 +145,23 @@ ColumnPtr recursiveLowCardinalityConversion(const ColumnPtr & column, const Data
Columns columns = column_tuple->getColumns();
auto & from_elements = from_tuple_type->getElements();
auto & to_elements = to_tuple_type->getElements();
bool has_converted = false;
for (size_t i = 0; i < columns.size(); ++i)
{
auto & element = columns[i];
element = recursiveLowCardinalityConversion(element, from_elements.at(i), to_elements.at(i));
auto element_no_lc = recursiveLowCardinalityConversion(element, from_elements.at(i), to_elements.at(i));
if (element.get() != element_no_lc.get())
{
element = element_no_lc;
has_converted = true;
}
}
if (!has_converted)
return column;
return ColumnTuple::create(columns);
}
}
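The pattern repeated throughout this hunk is copy avoidance: recurse into the nested column, and if the recursion hands back the very same pointer, return the original `ColumnPtr` so unchanged subtrees stay shared instead of being recreated. The tuple branch of `recursiveLowCardinalityConversion` generalizes this with the `has_converted` flag, returning the input column untouched when no element actually changed.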

View File

@ -12,7 +12,7 @@ generate_code(CacheDictionary_generate3 UInt8 UInt16 UInt32 UInt64 UInt128 Int8
add_headers_and_sources(clickhouse_dictionaries ${CMAKE_CURRENT_BINARY_DIR}/generated/)
add_library(clickhouse_dictionaries ${LINK_MODE} ${clickhouse_dictionaries_sources})
target_link_libraries(clickhouse_dictionaries PRIVATE clickhouse_common_io pocoext ${MYSQLXX_LIBRARY} ${BTRIE_LIBRARIES})
target_link_libraries(clickhouse_dictionaries PRIVATE clickhouse_common_io pocoext ${MYSQLXX_LIBRARY} ${BTRIE_LIBRARIES} PUBLIC Threads::Threads)
if(Poco_SQL_FOUND AND NOT USE_INTERNAL_POCO_LIBRARY)
target_include_directories(clickhouse_dictionaries SYSTEM PRIVATE ${Poco_SQL_INCLUDE_DIR})

View File

@ -70,7 +70,7 @@ ClickHouseDictionarySource::ClickHouseDictionarySource(
, query_builder{dict_struct, db, table, where, IdentifierQuotingStyle::Backticks}
, sample_block{sample_block}
, context(context)
, is_local{isLocalAddress({host, port}, config.getInt("tcp_port", 0))}
, is_local{isLocalAddress({host, port}, context.getTCPPort())}
, pool{is_local ? nullptr : createPool(host, port, secure, db, user, password, context)}
, load_all_query{query_builder.composeLoadAllQuery()}
{

View File

@ -23,7 +23,7 @@ target_link_libraries(clickhouse_functions
${OPENSSL_CRYPTO_LIBRARY}
${LZ4_LIBRARY})
target_include_directories (clickhouse_functions SYSTEM BEFORE PUBLIC ${DIVIDE_INCLUDE_DIR})
target_include_directories (clickhouse_functions SYSTEM BEFORE PUBLIC ${DIVIDE_INCLUDE_DIR} ${METROHASH_INCLUDE_DIR})
if (CONSISTENT_HASHING_INCLUDE_DIR)
target_include_directories (clickhouse_functions PRIVATE ${CONSISTENT_HASHING_INCLUDE_DIR})

View File

@ -82,12 +82,11 @@ public:
/// We do not sleep if the block is empty.
if (size > 0)
{
unsigned useconds = seconds * (variant == FunctionSleepVariant::PerBlock ? 1 : size) * 1e6;
/// When sleeping, the query cannot be cancelled. To keep queries cancellable, we limit the sleep time.
if (useconds > 3000000) /// The choice is arbitrary
throw Exception("The maximum sleep time is 3000000 microseconds. Requested: " + toString(useconds), ErrorCodes::TOO_SLOW);
if (seconds > 3.0) /// The choice is arbitrary
throw Exception("The maximum sleep time is 3 seconds. Requested: " + toString(seconds), ErrorCodes::TOO_SLOW);
UInt64 useconds = seconds * (variant == FunctionSleepVariant::PerBlock ? 1 : size) * 1e6;
::usleep(useconds);
}

View File

@ -768,11 +768,12 @@ bool Aggregator::executeOnBlock(const Block & block, AggregatedDataVariants & re
materialized_columns.push_back(block.safeGetByPosition(params.keys[i]).column->convertToFullColumnIfConst());
key_columns[i] = materialized_columns.back().get();
if (const auto * low_cardinality_column = typeid_cast<const ColumnLowCardinality *>(key_columns[i]))
if (!result.isLowCardinality())
{
if (!result.isLowCardinality())
auto column_no_lc = recursiveRemoveLowCardinality(key_columns[i]->getPtr());
if (column_no_lc.get() != key_columns[i])
{
materialized_columns.push_back(low_cardinality_column->convertToFullColumn());
materialized_columns.emplace_back(std::move(column_no_lc));
key_columns[i] = materialized_columns.back().get();
}
}
@ -788,9 +789,10 @@ bool Aggregator::executeOnBlock(const Block & block, AggregatedDataVariants & re
materialized_columns.push_back(block.safeGetByPosition(params.aggregates[i].arguments[j]).column->convertToFullColumnIfConst());
aggregate_columns[i][j] = materialized_columns.back().get();
if (auto * col_low_cardinality = typeid_cast<const ColumnLowCardinality *>(aggregate_columns[i][j]))
auto column_no_lc = recursiveRemoveLowCardinality(aggregate_columns[i][j]->getPtr());
if (column_no_lc.get() != aggregate_columns[i][j])
{
materialized_columns.push_back(col_low_cardinality->convertToFullColumn());
materialized_columns.emplace_back(std::move(column_no_lc));
aggregate_columns[i][j] = materialized_columns.back().get();
}
}

View File

@ -237,12 +237,18 @@ void Join::setSampleBlock(const Block & block)
size_t keys_size = key_names_right.size();
ColumnRawPtrs key_columns(keys_size);
Columns materialized_columns(keys_size);
Columns materialized_columns;
for (size_t i = 0; i < keys_size; ++i)
{
materialized_columns[i] = recursiveRemoveLowCardinality(block.getByName(key_names_right[i]).column);
key_columns[i] = materialized_columns[i].get();
auto & column = block.getByName(key_names_right[i]).column;
key_columns[i] = column.get();
auto column_no_lc = recursiveRemoveLowCardinality(column);
if (column.get() != column_no_lc.get())
{
materialized_columns.emplace_back(std::move(column_no_lc));
key_columns[i] = materialized_columns.back().get();
}
/// We will join only keys, where all components are not NULL.
if (key_columns[i]->isColumnNullable())

View File

@ -15,6 +15,7 @@ target_include_directories (hash_map SYSTEM BEFORE PRIVATE ${SPARCEHASH_INCLUDE_
target_link_libraries (hash_map PRIVATE dbms clickhouse_compression)
add_executable (hash_map3 hash_map3.cpp)
target_include_directories(hash_map3 SYSTEM BEFORE PRIVATE ${METROHASH_INCLUDE_DIR})
target_link_libraries (hash_map3 PRIVATE dbms ${FARMHASH_LIBRARIES} ${METROHASH_LIBRARIES})
add_executable (hash_map_string hash_map_string.cpp)
@ -25,6 +26,7 @@ add_executable (hash_map_string_2 hash_map_string_2.cpp)
target_link_libraries (hash_map_string_2 PRIVATE dbms clickhouse_compression)
add_executable (hash_map_string_3 hash_map_string_3.cpp)
target_include_directories(hash_map_string_3 SYSTEM BEFORE PRIVATE ${METROHASH_INCLUDE_DIR})
target_link_libraries (hash_map_string_3 PRIVATE dbms clickhouse_compression ${FARMHASH_LIBRARIES} ${METROHASH_LIBRARIES})
add_executable (hash_map_string_small hash_map_string_small.cpp)
@ -54,5 +56,5 @@ target_link_libraries (users PRIVATE dbms clickhouse_common_config ${Boost_FILES
if (OS_LINUX)
add_executable (internal_iotop internal_iotop.cpp)
target_link_libraries (internal_iotop PRIVATE dbms)
target_link_libraries (internal_iotop PRIVATE dbms Threads::Threads)
endif ()

View File

@ -325,7 +325,7 @@ struct FarmHash64
template <void metrohash64(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out)>
struct MetroHash64
struct SMetroHash64
{
size_t operator() (StringRef x) const
{
@ -507,8 +507,8 @@ int main(int argc, char ** argv)
if (!m || m == 8) bench<StringRef, VerySimpleHash> (data, "StringRef_VerySimpleHash");
if (!m || m == 9) bench<StringRef, FarmHash64> (data, "StringRef_FarmHash64");
if (!m || m == 10) bench<StringRef, MetroHash64<metrohash64_1>>(data, "StringRef_MetroHash64_1");
if (!m || m == 11) bench<StringRef, MetroHash64<metrohash64_2>>(data, "StringRef_MetroHash64_2");
if (!m || m == 10) bench<StringRef, SMetroHash64<metrohash64_1>>(data, "StringRef_MetroHash64_1");
if (!m || m == 11) bench<StringRef, SMetroHash64<metrohash64_2>>(data, "StringRef_MetroHash64_2");
return 0;
}

View File

@ -1,15 +1,23 @@
// .cpp autogenerated by cmake
#cmakedefine01 BUILD_DETERMINISTIC
const char * auto_config_build[]
{
"VERSION_FULL", "@VERSION_FULL@",
"VERSION_DESCRIBE", "@VERSION_DESCRIBE@",
"VERSION_INTEGER", "@VERSION_INTEGER@",
#if BUILD_DETERMINISTIC
"SYSTEM", "@CMAKE_SYSTEM_NAME@",
#else
"VERSION_GITHASH", "@VERSION_GITHASH@",
"VERSION_REVISION", "@VERSION_REVISION@",
"VERSION_INTEGER", "@VERSION_INTEGER@",
"BUILD_DATE", "@BUILD_DATE@",
"BUILD_TYPE", "@CMAKE_BUILD_TYPE@",
"SYSTEM", "@CMAKE_SYSTEM@",
#endif
"BUILD_TYPE", "@CMAKE_BUILD_TYPE@",
"SYSTEM_PROCESSOR", "@CMAKE_SYSTEM_PROCESSOR@",
"LIBRARY_ARCHITECTURE", "@CMAKE_LIBRARY_ARCHITECTURE@",
"CMAKE_VERSION", "@CMAKE_VERSION@",

View File

@ -125,6 +125,7 @@ if [ -n "$*" ]; then
else
TEST_RUN=${TEST_RUN=1}
TEST_PERF=${TEST_PERF=1}
TEST_DICT=${TEST_DICT=1}
CLICKHOUSE_CLIENT_QUERY="${CLICKHOUSE_CLIENT} --config ${CLICKHOUSE_CONFIG_CLIENT} --port $CLICKHOUSE_PORT_TCP -m -n -q"
$CLICKHOUSE_CLIENT_QUERY 'SELECT * from system.build_options; SELECT * FROM system.clusters;'
CLICKHOUSE_TEST="env PATH=$PATH:$BIN_DIR ${TEST_DIR}clickhouse-test --binary ${BIN_DIR}${CLICKHOUSE_BINARY} --configclient $CLICKHOUSE_CONFIG_CLIENT --configserver $CLICKHOUSE_CONFIG --tmp $DATA_DIR/tmp --queries $QUERIES_DIR $TEST_OPT0 $TEST_OPT"
@ -139,6 +140,7 @@ else
fi
( [ "$TEST_RUN" ] && $CLICKHOUSE_TEST ) || ${TEST_TRUE:=false}
( [ "$TEST_PERF" ] && $CLICKHOUSE_PERFORMANCE_TEST $* ) || true
( [ "$TEST_DICT" ] && mkdir -p $DATA_DIR/etc/dictionaries/ && cd $CUR_DIR/external_dictionaries && python generate_and_test.py --port=$CLICKHOUSE_PORT_TCP --client=$CLICKHOUSE_CLIENT --source=$CUR_DIR/external_dictionaries/source.tsv --reference=$CUR_DIR/external_dictionaries/reference --generated=$DATA_DIR/etc/dictionaries/ --no_mysql --no_mongo ) || true
$CLICKHOUSE_CLIENT_QUERY "SELECT event, value FROM system.events; SELECT metric, value FROM system.metrics; SELECT metric, value FROM system.asynchronous_metrics;"
$CLICKHOUSE_CLIENT_QUERY "SELECT 'Still alive'"
fi

View File

@ -394,8 +394,8 @@ def generate_dictionaries(args):
</source>
<lifetime>
<min>0</min>
<max>0</max>
<min>5</min>
<max>15</max>
</lifetime>
<layout>

View File

@ -0,0 +1 @@
2019-01-14 1 ['aaa','aaa','bbb','ccc']

View File

@ -0,0 +1,33 @@
SET allow_experimental_low_cardinality_type = 1;
DROP TABLE IF EXISTS test.table1;
DROP TABLE IF EXISTS test.table2;
CREATE TABLE test.table1
(
dt Date,
id Int32,
arr Array(LowCardinality(String))
) ENGINE = MergeTree PARTITION BY toMonday(dt)
ORDER BY (dt, id) SETTINGS index_granularity = 8192;
CREATE TABLE test.table2
(
dt Date,
id Int32,
arr Array(LowCardinality(String))
) ENGINE = MergeTree PARTITION BY toMonday(dt)
ORDER BY (dt, id) SETTINGS index_granularity = 8192;
insert into test.table1 (dt, id, arr) values ('2019-01-14', 1, ['aaa']);
insert into test.table2 (dt, id, arr) values ('2019-01-14', 1, ['aaa','bbb','ccc']);
select dt, id, arraySort(groupArrayArray(arr))
from (
select dt, id, arr from test.table1
where dt = '2019-01-14' and id = 1
UNION ALL
select dt, id, arr from test.table2
where dt = '2019-01-14' and id = 1
)
group by dt, id;
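This regression test exercises the copy-avoidance path above through `groupArrayArray` over `LowCardinality(String)` arrays combined with `UNION ALL`; the expected result is the merged, sorted array `['aaa','aaa','bbb','ccc']` from the reference file shown earlier.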

View File

@ -0,0 +1 @@
SELECT sleep(4295.967296); -- { serverError 160 }
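The constant is chosen to catch the old 32-bit overflow: 4295.967296 seconds is 4,295,967,296 microseconds, which wraps modulo 2^32 to exactly 1,000,000, so the unpatched code would have slept one second instead of rejecting the query. With the fix, the seconds value is checked before the microsecond conversion and the server returns error 160 (`TOO_SLOW`).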

View File

@ -110,7 +110,7 @@
<table>query_log</table>
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
</query_log>
<dictionaries_config>*_dictionary.xml</dictionaries_config>
<dictionaries_config>dictionaries/dictionary_*.xml</dictionaries_config>
<compression incl="clickhouse_compression">
</compression>
<distributed_ddl>

View File

@ -323,7 +323,7 @@ Outputs data as separate JSON objects for each row (newline delimited JSON).
Unlike the JSON format, there is no substitution of invalid UTF-8 sequences. Any set of bytes can be output in the rows. This is necessary so that data can be formatted without losing any information. Values are escaped in the same way as for JSON.
For parsing, any order is supported for the values of different columns. It is acceptable for some values to be omitted they are treated as equal to their default values. In this case, zeros and blank rows are used as default values. Complex values that could be specified in the table are not supported as defaults. Whitespace between elements is ignored. If a comma is placed after the objects, it is ignored. Objects don't necessarily have to be separated by new lines.
For parsing, any order is supported for the values of different columns. It is acceptable for some values to be omitted; they are treated as equal to their default values. In this case, zeros and blank rows are used as default values. Complex values that could be specified in the table are not supported as defaults, but this can be turned on with the option `insert_sample_with_metadata = 1`. Whitespace between elements is ignored. If a comma is placed after the objects, it is ignored. Objects don't necessarily have to be separated by new lines.
## Native {#native}

View File

@ -81,6 +81,9 @@ If an error occurred while reading rows but the error counter is still less than
If `input_format_allow_errors_ratio` is exceeded, ClickHouse throws an exception.
## insert_sample_with_metadata
For INSERT queries, specifies that the server needs to send metadata about column defaults to the client. This is used to calculate default expressions. Disabled by default.
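A minimal sketch of how this can be used (the table name and JSON payload here are illustrative):
```sql
SET insert_sample_with_metadata = 1;
-- defaults for columns omitted from the JSON rows can now be calculated
-- from the default expressions of the target table
INSERT INTO table_with_defaults FORMAT JSONEachRow {"x": 1}
```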
## join_default_strictness

View File

@ -469,4 +469,64 @@ If you want to get a list of unique items in an array, you can use arrayReduce('
A special function. See the section ["ArrayJoin function"](array_join.md#functions_arrayjoin).
## arrayDifference(arr)
Takes an array, returns an array with the difference between all pairs of neighboring elements. For example:
```sql
SELECT arrayDifference([1, 2, 3, 4])
```
```
┌─arrayDifference([1, 2, 3, 4])─┐
│ [0,1,1,1] │
└───────────────────────────────┘
```
## arrayDistinct(arr)
Takes an array, returns an array containing only the distinct elements. For example:
```sql
SELECT arrayDistinct([1, 2, 2, 3, 1])
```
```
┌─arrayDistinct([1, 2, 2, 3, 1])─┐
│ [1,2,3]                        │
└────────────────────────────────┘
```
## arrayEnumerateDense(arr)
Returns an array of the same size as the source array, indicating where each element first appears in the source array. For example: arrayEnumerateDense([10,20,10,30]) = [1,2,1,3].
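For example:
```sql
SELECT arrayEnumerateDense([10, 20, 10, 30]) AS res
```
```
┌─res───────┐
│ [1,2,1,3] │
└───────────┘
```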
## arrayIntersect(arr)
Takes multiple arrays, returns an array with the elements that are present in all source arrays. For example:
```sql
SELECT
arrayIntersect([1, 2], [1, 3], [2, 3]) AS no_intersect,
arrayIntersect([1, 2], [1, 3], [1, 4]) AS intersect
```
```
┌─no_intersect─┬─intersect─┐
│ [] │ [1] │
└──────────────┴───────────┘
```
## arrayReduce(agg_func, arr1, ...)
Applies an aggregate function to array elements and returns its result. If the aggregate function has multiple arguments, this function can be applied to multiple arrays of the same size.
arrayReduce('agg_func', arr1, ...) applies the aggregate function `agg_func` to the arrays `arr1...`. If multiple arrays are passed, the elements at corresponding positions are passed as multiple arguments to the aggregate function. For example: SELECT arrayReduce('max', [1,2,3]) = 3
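A sketch of the multi-array form, using an aggregate function that takes two arguments (`maxIf`, which takes its condition from the second array):
```sql
SELECT arrayReduce('maxIf', [3, 5], [1, 0]) AS res
```
```
┌─res─┐
│   3 │
└─────┘
```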
## arrayReverse(arr)
Returns an array of the same size as the source array, containing the result of inverting all elements of the source array.
[Original article](https://clickhouse.yandex/docs/en/query_language/functions/array_functions/) <!--hide-->

View File

@ -16,5 +16,16 @@ The result type is an integer with bits equal to the maximum bits of its argumen
## bitShiftRight(a, b)
## bitRotateLeft(a, b)
## bitRotateRight(a, b)
## bitTest(a, b)
## bitTestAll(a, b)
## bitTestAny(a, b)
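For illustration, `bitTest(a, b)` returns the value of bit `b` of number `a` (bit positions are zero-based, starting from the least significant bit):
```sql
SELECT bitTest(43, 1) AS res -- 43 is 101011 in binary, so bit 1 is set
```
```
┌─res─┐
│   1 │
└─────┘
```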
[Original article](https://clickhouse.yandex/docs/en/query_language/functions/bit_functions/) <!--hide-->

View File

@ -20,17 +20,29 @@ SELECT
Only time zones that differ from UTC by a whole number of hours are supported.
## toTimeZone
Convert time or date and time to the specified time zone.
## toYear
Converts a date or date with time to a UInt16 number containing the year number (AD).
## toQuarter
Converts a date or date with time to a UInt8 number containing the quarter number.
## toMonth
Converts a date or date with time to a UInt8 number containing the month number (1-12).
## toDayOfYear
Converts a date or date with time to a UInt8 number containing the number of the day of the year (1-366).
## toDayOfMonth
Converts a date or date with time to a UInt8 number containing the number of the day of the month (1-31).
## toDayOfWeek
@ -50,11 +62,20 @@ Converts a date with time to a UInt8 number containing the number of the minute
Converts a date with time to a UInt8 number containing the number of the second in the minute (0-59).
Leap seconds are not accounted for.
## toUnixTimestamp
Converts a date with time to a unix timestamp.
## toMonday
Rounds down a date or date with time to the nearest Monday.
Returns the date.
## toStartOfISOYear
Rounds down a date or date with time to the first day of ISO year.
Returns the date.
## toStartOfMonth
Rounds down a date or date with time to the first day of the month.
@ -104,6 +125,10 @@ Converts a date with time to a certain fixed date, while preserving the time.
Converts a date with time or date to the number of the year, starting from a certain fixed point in the past.
## toRelativeQuarterNum
Converts a date with time or date to the number of the quarter, starting from a certain fixed point in the past.
## toRelativeMonthNum
Converts a date with time or date to the number of the month, starting from a certain fixed point in the past.
@ -128,6 +153,14 @@ Converts a date with time or date to the number of the minute, starting from a c
Converts a date with time or date to the number of the second, starting from a certain fixed point in the past.
## toISOYear
Converts a date or date with time to a UInt16 number containing the ISO Year number.
## toISOWeek
Converts a date or date with time to a UInt8 number containing the ISO Week number.
## now
Accepts zero arguments and returns the current time at one of the moments of request execution.
@ -148,6 +181,60 @@ The same as 'today() - 1'.
Rounds the time to the half hour.
This function is specific to Yandex.Metrica, since half an hour is the minimum amount of time for breaking a session into two sessions if a tracking tag shows a single user's consecutive pageviews that differ in time by strictly more than this amount. This means that tuples (the tag ID, user ID, and time slot) can be used to search for pageviews that are included in the corresponding session.
## toYYYYMM
Converts a date or date with time to a UInt32 number containing the year and month number (YYYY * 100 + MM).
## toYYYYMMDD
Converts a date or date with time to a UInt32 number containing the year, month, and day (YYYY * 10000 + MM * 100 + DD).
## toYYYYMMDDhhmmss
Converts a date or date with time to a UInt64 number containing the year, month, day, hours, minutes, and seconds (YYYY * 10000000000 + MM * 100000000 + DD * 1000000 + hh * 10000 + mm * 100 + ss).
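For example:
```sql
SELECT
    toYYYYMM(toDate('2019-01-24')) AS yyyymm,
    toYYYYMMDD(toDate('2019-01-24')) AS yyyymmdd
```
```
┌─yyyymm─┬─yyyymmdd─┐
│ 201901 │ 20190124 │
└────────┴──────────┘
```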
## addYears, addMonths, addWeeks, addDays, addHours, addMinutes, addSeconds, addQuarters
Adds a Date/DateTime interval to a Date/DateTime and then returns the Date/DateTime. For example:
```sql
WITH
toDate('2018-01-01') AS date,
toDateTime('2018-01-01 00:00:00') AS date_time
SELECT
addYears(date, 1) AS add_years_with_date,
addYears(date_time, 1) AS add_years_with_date_time
```
```
┌─add_years_with_date─┬─add_years_with_date_time─┐
│ 2019-01-01 │ 2019-01-01 00:00:00 │
└─────────────────────┴──────────────────────────┘
```
## subtractYears, subtractMonths, subtractWeeks, subtractDays, subtractHours, subtractMinutes, subtractSeconds, subtractQuarters
Subtracts a Date/DateTime interval from a Date/DateTime and then returns the Date/DateTime. For example:
```sql
WITH
toDate('2019-01-01') AS date,
toDateTime('2019-01-01 00:00:00') AS date_time
SELECT
subtractYears(date, 1) AS subtract_years_with_date,
subtractYears(date_time, 1) AS subtract_years_with_date_time
```
```
┌─subtract_years_with_date─┬─subtract_years_with_date_time─┐
│ 2018-01-01 │ 2018-01-01 00:00:00 │
└──────────────────────────┴───────────────────────────────┘
```
## dateDiff('unit', t1, t2, \[timezone\])
Returns the difference between two times. `t1` and `t2` can be Date or DateTime. If a timezone is specified, it is applied to both arguments; if not, the timezones from the data types of `t1` and `t2` are used. If those timezones are not the same, the result is unspecified.
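For example:
```sql
SELECT dateDiff('day', toDate('2019-01-01'), toDate('2019-01-24')) AS days
```
```
┌─days─┐
│   23 │
└──────┘
```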
## timeSlots(StartTime, Duration\[, Size\])
For a time interval starting at 'StartTime' and continuing for 'Duration' seconds, it returns an array of moments in time, consisting of points from this interval rounded down to the 'Size' in seconds. 'Size' is an optional parameter: a constant UInt32, set to 1800 by default.

View File

@ -21,7 +21,7 @@ If there is no `id` key in the dictionary, it returns the default value specifie
## dictGetTOrDefault {#ext_dict_functions_dictGetTOrDefault}
`dictGetT('dict_name', 'attr_name', id, default)`
`dictGetTOrDefault('dict_name', 'attr_name', id, default)`
The same as the `dictGetT` functions, but the default value is taken from the function's last argument.

View File

@ -64,5 +64,52 @@ A fast, decent-quality non-cryptographic hash function for a string obtained fro
`URLHash(s, N)` Calculates a hash from a string up to the N level in the URL hierarchy, without one of the trailing symbols `/`,`?` or `#` at the end, if present.
Levels are the same as in URLHierarchy. This function is specific to Yandex.Metrica.
## farmHash64
Calculates FarmHash64 from a string.
Accepts a String-type argument. Returns UInt64.
For more information, see the link: [FarmHash64](https://github.com/google/farmhash)
## javaHash
Calculates JavaHash from a string.
Accepts a String-type argument. Returns Int32.
For more information, see the link: [JavaHash](http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/478a4add975b/src/share/classes/java/lang/String.java#l1452)
## hiveHash
Calculates HiveHash from a string.
Accepts a String-type argument. Returns Int32.
Same as for [JavaHash](./hash_functions.md#javaHash), except that the return value is never negative.
## metroHash64
Calculates MetroHash from a string.
Accepts a String-type argument. Returns UInt64.
For more information, see the link: [MetroHash64](http://www.jandrewrogers.com/2015/05/27/metrohash/)
## jumpConsistentHash
Calculates JumpConsistentHash from a UInt64.
Accepts a UInt64-type argument. Returns Int32.
For more information, see the link: [JumpConsistentHash](https://arxiv.org/pdf/1406.2294.pdf)
## murmurHash2_32, murmurHash2_64
Calculates MurmurHash2 from a string.
Accepts a String-type argument. Returns UInt32 (murmurHash2_32) or UInt64 (murmurHash2_64).
For more information, see the link: [MurmurHash2](https://github.com/aappleby/smhasher)
## murmurHash3_32, murmurHash3_64, murmurHash3_128
Calculates MurmurHash3 from a string.
Accepts a String-type argument. Returns UInt32 (murmurHash3_32), UInt64 (murmurHash3_64), or FixedString(16) (murmurHash3_128).
For more information, see the link: [MurmurHash3](https://github.com/aappleby/smhasher)
## xxHash32, xxHash64
Calculates xxHash from a string.
Accepts a String-type argument. Returns UInt32 (xxHash32) or UInt64 (xxHash64).
For more information, see the link: [xxHash](http://cyan4973.github.io/xxHash/)
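The hash functions above share the same calling convention; a sketch (the concrete hash values are not reproduced here):
```sql
SELECT
    xxHash32('ClickHouse') AS h32,
    xxHash64('ClickHouse') AS h64
```
Both return unsigned integers of the corresponding width.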
[Original article](https://clickhouse.yandex/docs/en/query_language/functions/hash_functions/) <!--hide-->

View File

@ -87,6 +87,20 @@ SELECT arrayCumSum([1, 1, 1, 1]) AS res
└──────────────┘
```
### arrayCumSumNonNegative(arr)
Same as arrayCumSum, returns an array of partial sums of the elements in the source array (a running sum). Unlike arrayCumSum, whenever the running sum drops below zero, it is replaced with zero and the subsequent calculation continues from zero. For example:
``` sql
SELECT arrayCumSumNonNegative([1, 1, -4, 1]) AS res
```
```
┌─res───────┐
│ [1,2,0,1] │
└───────────┘
```
### arraySort(\[func,\] arr1, ...)
Returns an array as result of sorting the elements of `arr1` in ascending order. If the `func` function is specified, sorting order is determined by the result of the function `func` applied to the elements of array (arrays)
@ -112,6 +126,6 @@ Returns an array as result of sorting the elements of `arr1` in descending order
[Original article](https://clickhouse.yandex/docs/en/query_language/functions/higher_order_functions/) <!--hide-->

View File

@ -113,5 +113,38 @@ LIMIT 10
The reverse function of IPv6NumToString. If the IPv6 address has an invalid format, it returns a string of null bytes.
HEX can be uppercase or lowercase.
## IPv4ToIPv6(x)
Takes a UInt32 number. Interprets it as an IPv4 address in big endian. Returns a FixedString(16) value containing the IPv6 address in binary format. Examples:
``` sql
SELECT IPv6NumToString(IPv4ToIPv6(IPv4StringToNum('192.168.0.1'))) AS addr
```
```
┌─addr───────────────┐
│ ::ffff:192.168.0.1 │
└────────────────────┘
```
## cutIPv6(x, bitsToCutForIPv6, bitsToCutForIPv4)
Accepts a FixedString(16) value containing the IPv6 address in binary format. Returns a string containing the address, in text format, with the specified number of bits removed. For example:
```sql
WITH
IPv6StringToNum('2001:0DB8:AC10:FE01:FEED:BABE:CAFE:F00D') AS ipv6,
IPv4ToIPv6(IPv4StringToNum('192.168.0.1')) AS ipv4
SELECT
cutIPv6(ipv6, 2, 0),
cutIPv6(ipv4, 0, 2)
```
```
┌─cutIPv6(ipv6, 2, 0)─────────────────┬─cutIPv6(ipv4, 0, 2)─┐
│ 2001:db8:ac10:fe01:feed:babe:cafe:0 │ ::ffff:192.168.0.0 │
└─────────────────────────────────────┴─────────────────────┘
```
[Original article](https://clickhouse.yandex/docs/en/query_language/functions/ip_address_functions/) <!--hide-->

View File

@ -14,7 +14,7 @@ Returns a Float64 number that is close to the number π.
Accepts a numeric argument and returns a Float64 number close to the exponent of the argument.
## log(x)
## log(x), ln(x)
Accepts a numeric argument and returns a Float64 number close to the natural logarithm of the argument.
@ -94,8 +94,16 @@ The arc cosine.
The arc tangent.
## pow(x, y)
## pow(x, y), power(x, y)
Takes two numeric arguments x and y. Returns a Float64 number close to x to the power of y.
## intExp2
Accepts a numeric argument and returns a UInt64 number close to 2 to the power of x.
## intExp10
Accepts a numeric argument and returns a UInt64 number close to 10 to the power of x.
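For example:
```sql
SELECT pow(2, 10) AS p, power(2, 10) AS p_alias, intExp2(10) AS i2, intExp10(3) AS i10
```
```
┌────p─┬─p_alias─┬───i2─┬──i10─┐
│ 1024 │    1024 │ 1024 │ 1000 │
└──────┴─────────┴──────┴──────┘
```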
[Original article](https://clickhouse.yandex/docs/en/query_language/functions/math_functions/) <!--hide-->

View File

@ -44,6 +44,10 @@ However, the argument is still evaluated. This can be used for benchmarks.
Sleeps 'seconds' seconds on each data block. You can specify an integer or a floating-point number.
## sleepEachRow(seconds)
Sleeps 'seconds' seconds on each row. You can specify an integer or a floating-point number.
## currentDatabase()
Returns the name of the current database.
@ -242,6 +246,18 @@ Returns the server's uptime in seconds.
Returns the version of the server as a string.
## timezone()
Returns the timezone of the server.
## blockNumber
Returns the sequence number of the data block where the row is located.
## rowNumberInBlock
Returns the ordinal number of the row in the data block. Different data blocks are always recalculated.
## rowNumberInAllBlocks()
Returns the ordinal number of the row in the data block. This function only considers the affected data blocks.
@ -283,6 +299,10 @@ FROM
└─────────┴─────────────────────┴───────┘
```
## runningDifferenceStartingWithFirstValue
Same as for [runningDifference](./other_functions.md#runningDifference), but the first row returns the value of the first row itself, and each subsequent row returns the difference from the previous row.
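A sketch using the `numbers` table function (assuming the rows arrive in a single data block):
```sql
SELECT number, runningDifferenceStartingWithFirstValue(number) AS diff
FROM numbers(4)
```
```
┌─number─┬─diff─┐
│      0 │    0 │
│      1 │    1 │
│      2 │    1 │
│      3 │    1 │
└────────┴──────┘
```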
## MACNumToString(num)
Accepts a UInt64 number. Interprets it as a MAC address in big endian. Returns a string containing the corresponding MAC address in the format AA:BB:CC:DD:EE:FF (colon-separated numbers in hexadecimal form).
@ -440,7 +460,7 @@ The expression passed to the function is not calculated, but ClickHouse applies
**Returned value**
- 1.
- 1.
**Example**
@ -558,5 +578,34 @@ SELECT replicate(1, ['a', 'b', 'c'])
└───────────────────────────────┘
```
## filesystemAvailable
Returns the amount of remaining space on the disk, in bytes. This information is evaluated for the configured data path.
## filesystemCapacity
Returns the capacity of the disk, in bytes. This information is evaluated for the configured data path.
## finalizeAggregation
Takes state of aggregate function. Returns result of aggregation (finalized state).
## runningAccumulate
Takes the states of an aggregate function and returns a column with values that are the result of accumulating these states over a set of block rows, from the first to the current row.
For example, it takes the state of an aggregate function (e.g. runningAccumulate(uniqState(UserID))) and, for each row of the block, returns the result of the aggregate function on the merged states of all previous rows and the current row.
So, the result of the function depends on the partitioning of the data into blocks and on the order of data within a block.
## joinGet('join_storage_table_name', 'get_column', join_key)
Gets data from a table of the Join type using the specified join key.
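A minimal sketch, assuming a table with the Join engine (the names are illustrative):
```sql
CREATE TABLE id_val (id UInt32, val String) ENGINE = Join(ANY, LEFT, id);
INSERT INTO id_val VALUES (1, 'a'), (2, 'b');
SELECT joinGet('id_val', 'val', toUInt32(2)); -- returns 'b'
```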
## modelEvaluate(model_name, ...)
Evaluate external model.
Accepts a model name and model arguments. Returns Float64.
## throwIf(x)
Throws an exception if the argument is non-zero.
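For example:
```sql
SELECT throwIf(0); -- returns 0
SELECT throwIf(1); -- throws an exception
```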
[Original article](https://clickhouse.yandex/docs/en/query_language/functions/other_functions/) <!--hide-->

View File

@ -16,5 +16,8 @@ Uses a linear congruential generator.
Returns a pseudo-random UInt64 number, evenly distributed among all UInt64-type numbers.
Uses a linear congruential generator.
## randConstant
Returns a pseudo-random UInt32 number. The value is constant within a data block.
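A sketch of the behavior (the concrete value differs between queries):
```sql
SELECT randConstant() AS r FROM numbers(3)
-- all three rows contain the same pseudo-random value
```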
[Original article](https://clickhouse.yandex/docs/en/query_language/functions/random_functions/) <!--hide-->

View File

@ -12,7 +12,7 @@ Examples: `floor(123.45, 1) = 123.4, floor(123.45, -1) = 120.`
For integer arguments, it makes sense to round with a negative 'N' value (for non-negative 'N', the function doesn't do anything).
If rounding causes overflow (for example, floor(-128, -1)), an implementation-specific result is returned.
## ceil(x\[, N\])
## ceil(x\[, N\]), ceiling(x\[, N\])
Returns the smallest round number that is greater than or equal to 'x'. In every other way, it is the same as the 'floor' function (see above).
@ -66,5 +66,8 @@ Accepts a number. If the number is less than one, it returns 0. Otherwise, it ro
Accepts a number. If the number is less than 18, it returns 0. Otherwise, it rounds the number down to a number from the set: 18, 25, 35, 45, 55. This function is specific to Yandex.Metrica and used for implementing the report on user age.
## roundDown(num, arr)
Accepts a number and rounds it down to an element in the specified array. If the value is less than the lowest bound, the lowest bound is returned.
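For example:
```sql
SELECT
    roundDown(2, [6, 8, 19]) AS below_bound,
    roundDown(10, [6, 8, 19]) AS rounded
```
```
┌─below_bound─┬─rounded─┐
│           6 │       8 │
└─────────────┴─────────┘
```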
[Original article](https://clickhouse.yandex/docs/en/query_language/functions/rounding_functions/) <!--hide-->

View File

@ -24,11 +24,21 @@ The function also works for arrays.
Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it returns some result (it doesn't throw an exception).
The result type is UInt64.
## lower
## char_length, CHAR_LENGTH
Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it returns some result (it doesn't throw an exception).
The result type is UInt64.
## character_length, CHARACTER_LENGTH
Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it returns some result (it doesn't throw an exception).
The result type is UInt64.
## lower, lcase
Converts ASCII Latin symbols in a string to lowercase.
## upper
## upper, ucase
Converts ASCII Latin symbols in a string to uppercase.
@ -58,7 +68,11 @@ Reverses a sequence of Unicode code points, assuming that the string contains a
Concatenates the strings listed in the arguments, without a separator.
## substring(s, offset, length)
## concatAssumeInjective(s1, s2, ...)
Same as [concat](./string_functions.md#concat-s1-s2), with the difference that you need to ensure that concat(s1, s2, s3) -> s4 is injective; this is used for optimization of GROUP BY.
## substring(s, offset, length), mid(s, offset, length), substr(s, offset, length)
Returns a substring starting with the byte from the 'offset' index that is 'length' bytes long. Character indexing starts from one (as in standard SQL). The 'offset' and 'length' arguments must be constants.
@ -83,4 +97,24 @@ Decode base64-encoded string 's' into original string. In case of failure raises
## tryBase64Decode(s)
Similar to base64Decode, but in case of error an empty string would be returned.
[Original article](https://clickhouse.yandex/docs/en/query_language/functions/string_functions/) <!--hide-->
## endsWith(s, suffix)
Returns whether the string ends with the specified suffix: 1 if it does, otherwise 0.
## startsWith(s, prefix)
Returns whether the string starts with the specified prefix: 1 if it does, otherwise 0.
## trimLeft(s)
Returns a string with whitespace characters removed from the left side.
## trimRight(s)
Returns a string with whitespace characters removed from the right side.
## trimBoth(s)
Returns a string with whitespace characters removed from both sides.
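A combined example of the predicates and trimming helpers above:
```sql
SELECT
    startsWith('ClickHouse', 'Click') AS starts,
    endsWith('ClickHouse', 'House') AS ends,
    trimBoth('  ClickHouse  ') AS trimmed
```
```
┌─starts─┬─ends─┬─trimmed────┐
│      1 │    1 │ ClickHouse │
└────────┴──────┴────────────┘
```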
[Original article](https://clickhouse.yandex/docs/en/query_language/functions/string_functions/) <!--hide-->

View File

@ -5,7 +5,7 @@
Replaces the first occurrence, if it exists, of the 'pattern' substring in 'haystack' with the 'replacement' substring.
Hereafter, 'pattern' and 'replacement' must be constants.
## replaceAll(haystack, pattern, replacement)
## replaceAll(haystack, pattern, replacement), replace(haystack, pattern, replacement)
Replaces all occurrences of the 'pattern' substring in 'haystack' with the 'replacement' substring.
@ -78,4 +78,12 @@ SELECT replaceRegexpAll('Hello, World!', '^', 'here: ') AS res
```
## regexpQuoteMeta(s)
The function adds a backslash before some predefined characters in the string.
Predefined characters: '0', '\\', '|', '(', ')', '^', '$', '.', '[', ']', '?', '*', '+', '{', ':', '-'.
This implementation slightly differs from re2::RE2::QuoteMeta. It escapes zero byte as \0 instead of \x00 and it escapes only required characters.
For more information, see the link: [RE2](https://github.com/google/re2/blob/master/re2/re2.cc#L473)
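For example, `.` and `*` are among the predefined characters and get escaped:
```sql
SELECT regexpQuoteMeta('a.b*c') AS res
```
```
┌─res─────┐
│ a\.b\*c │
└─────────┘
```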
[Original article](https://clickhouse.yandex/docs/en/query_language/functions/string_replace_functions/) <!--hide-->

View File

@ -3,7 +3,7 @@
The search is case-sensitive in all these functions.
The search substring or regular expression must be a constant in all these functions.
## position(haystack, needle)
## position(haystack, needle), locate(haystack, needle)
Search for the substring `needle` in the string `haystack`.
Returns the position (in bytes) of the found substring, starting from 1, or returns 0 if the substring was not found.

View File

@ -7,10 +7,12 @@
## toFloat32, toFloat64
## toUInt8OrZero, toUInt16OrZero, toUInt32OrZero, toUInt64OrZero, toInt8OrZero, toInt16OrZero, toInt32OrZero, toInt64OrZero, toFloat32OrZero, toFloat64OrZero
## toDate, toDateTime
## toUInt8OrZero, toUInt16OrZero, toUInt32OrZero, toUInt64OrZero, toInt8OrZero, toInt16OrZero, toInt32OrZero, toInt64OrZero, toFloat32OrZero, toFloat64OrZero, toDateOrZero, toDateTimeOrZero
## toUInt8OrNull, toUInt16OrNull, toUInt32OrNull, toUInt64OrNull, toInt8OrNull, toInt16OrNull, toInt32OrNull, toInt64OrNull, toFloat32OrNull, toFloat64OrNull, toDateOrNull, toDateTimeOrNull
## toDecimal32(value, S), toDecimal64(value, S), toDecimal128(value, S)
Converts `value` to [Decimal](../../data_types/decimal.md) of precision `S`. The `value` can be a number or a string. The `S` (scale) parameter specifies the number of decimal places.
@ -99,6 +101,9 @@ These functions accept a string and interpret the bytes placed at the beginning
This function accepts a number or date or date with time, and returns a string containing bytes representing the corresponding value in host order (little endian). Null bytes are dropped from the end. For example, a UInt32 type value of 255 is a string that is one byte long.
## reinterpretAsFixedString
This function accepts a number or date or date with time, and returns a FixedString containing bytes representing the corresponding value in host order (little endian). Null bytes are dropped from the end. For example, a UInt32 type value of 255 is a FixedString that is one byte long.
## CAST(x, t)
@ -141,5 +146,39 @@ SELECT toTypeName(CAST(x, 'Nullable(UInt16)')) FROM t_null
└─────────────────────────────────────────┘
```
## toIntervalYear, toIntervalQuarter, toIntervalMonth, toIntervalWeek, toIntervalDay, toIntervalHour, toIntervalMinute, toIntervalSecond
Converts a Number type argument to an Interval type (duration).
The Interval type is very useful: values of this type can be used directly in arithmetic operations with Date or DateTime. At the same time, ClickHouse provides a more convenient syntax for declaring Interval type data. For example:
```sql
WITH
toDate('2019-01-01') AS date,
INTERVAL 1 WEEK AS interval_week,
toIntervalWeek(1) AS interval_to_week
SELECT
date + interval_week,
date + interval_to_week
```
```
┌─plus(date, interval_week)─┬─plus(date, interval_to_week)─┐
│ 2019-01-08 │ 2019-01-08 │
└───────────────────────────┴──────────────────────────────┘
```
## parseDateTimeBestEffort
Parses a date and time from a string argument into a Date or DateTime type.
Unlike toDate and toDateTime, parseDateTimeBestEffort can handle more complex date formats.
For more information, see the link: [Complex Date Format](https://xkcd.com/1179/)
## parseDateTimeBestEffortOrNull
Same as for [parseDateTimeBestEffort](./type_conversion_functions.md#parseDateTimeBestEffort) except that it returns null when it encounters a date format that cannot be processed.
## parseDateTimeBestEffortOrZero
Same as for [parseDateTimeBestEffort](./type_conversion_functions.md#parseDateTimeBestEffort) except that it returns zero date or zero date time when it encounters a date format that cannot be processed.
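A sketch of the lenient parsing (assuming the given input format is among the supported ones):
```sql
SELECT
    parseDateTimeBestEffort('24/01/2019 12:00:00') AS parsed,
    parseDateTimeBestEffortOrNull('not a date') AS failed
```
The first column yields `2019-01-24 12:00:00`; the second returns NULL instead of throwing.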
[Original article](https://clickhouse.yandex/docs/en/query_language/functions/type_conversion_functions/) <!--hide-->

View File

@ -73,7 +73,7 @@ Server: dbms/programs/clickhouse-server
To start the server as a daemon, run:
``` bash
% sudo service clickhouse-server start
$ sudo service clickhouse-server start
```
See the logs in the `/var/log/clickhouse-server/` directory.

View File

@ -323,7 +323,7 @@ ClickHouse поддерживает [NULL](../query_language/syntax.md), кот
Unlike the JSON format, there is no substitution of invalid UTF-8 sequences. An arbitrary set of bytes can be output in the rows. This is done so that data can be formatted without losing any information. Values are escaped in the same way as for JSON.
For parsing, any order of values in different columns is supported. It is acceptable for some values to be omitted; they are treated as equal to their default values. In this case, zeros and blank rows are used as default values, and complex default values that could be specified in the table are not supported. Whitespace between elements is skipped. A comma after objects is ignored. Objects don't necessarily have to be separated by new lines.
For parsing, any order of values in different columns is supported. It is acceptable for some values to be omitted; they are treated as equal to their default values. In this case, zeros and blank rows are used as default values. Complex values that could be specified in the table are not supported as defaults, but they can be enabled with the option `insert_sample_with_metadata = 1`. Whitespace between elements is skipped. A comma after objects is ignored. Objects don't necessarily have to be separated by new lines.
## Native {#native}

View File

@ -322,6 +322,10 @@ ClickHouse применяет настройку в тех случаях, ко
If the value is true, columns with unknown names are skipped (not considered) in the input data when performing an INSERT; otherwise an exception is thrown in this situation.
Works for the JSONEachRow and TSKV formats.
## insert_sample_with_metadata
For INSERT queries. Specifies that the server should send the client metadata about column default values, which will be used to calculate default expressions. Disabled by default.
## output_format_json_quote_64bit_integers
If the value is true, UInt64 and Int64 numbers are output in quotes when using JSON\* formats (for compatibility with most JavaScript implementations); otherwise they are output without quotes.

View File

@ -106,7 +106,7 @@ target_link_libraries (common
${Boost_SYSTEM_LIBRARY}
PRIVATE
${MALLOC_LIBRARIES}
${CMAKE_THREAD_LIBS_INIT}
Threads::Threads
${MEMCPY_LIBRARIES})
if (RT_LIBRARY)

View File

@ -16,7 +16,7 @@ target_link_libraries (date_lut3 common ${PLATFORM_LIBS})
target_link_libraries (date_lut4 common ${PLATFORM_LIBS})
target_link_libraries (date_lut_default_timezone common ${PLATFORM_LIBS})
target_link_libraries (local_date_time_comparison common)
target_link_libraries (realloc-perf common)
target_link_libraries (realloc-perf common Threads::Threads)
add_check(local_date_time_comparison)
if(USE_GTEST)

386
utils/make_changelog.py Executable file
View File

@ -0,0 +1,386 @@
#!/usr/bin/env python
# Note: should work with python 2 and 3
from __future__ import print_function
import requests
import json
import subprocess
import re
import os
import time
import logging
import codecs
import argparse
GITHUB_API_URL = 'https://api.github.com/'
SCRIPT_DIR = os.path.dirname(os.path.realpath(__file__))
def http_get_json(url, token, max_retries, retry_timeout):
for t in range(max_retries):
if token:
resp = requests.get(url, headers={"Authorization": "token {}".format(token)})
else:
resp = requests.get(url)
if resp.status_code != 200:
msg = "Request {} failed with code {}.\n{}\n".format(url, resp.status_code, resp.text)
if resp.status_code == 403:
try:
if resp.json()['message'].startswith('API rate limit exceeded') and t + 1 < max_retries:
logging.warning(msg)
time.sleep(retry_timeout)
continue
except:
pass
raise Exception(msg)
return resp.json()
def github_api_get_json(query, token, max_retries, retry_timeout):
return http_get_json(GITHUB_API_URL + query, token, max_retries, retry_timeout)
def check_sha(sha):
if not (re.match('^[a-fA-F0-9]+$', sha) and len(sha) >= 7):
raise Exception("String " + sha + " doesn't look like git sha.")
def get_merge_base(first, second, project_root):
try:
command = "git merge-base {} {}".format(first, second)
text = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, cwd=project_root).stdout.read()
text = text.decode('utf-8', 'ignore')
sha = tuple(filter(len, text.split()))[0]
check_sha(sha)
return sha
except:
logging.error('Cannot find merge base for %s and %s', first, second)
raise
# Get list of commits from branch to base_sha. Update commits_info.
def get_commits_from_branch(repo, branch, base_sha, commits_info, max_pages, token, max_retries, retry_timeout):
def get_commits_from_page(page):
query = 'repos/{}/commits?sha={}&page={}'.format(repo, branch, page)
resp = github_api_get_json(query, token, max_retries, retry_timeout)
for commit in resp:
sha = commit['sha']
if sha not in commits_info:
commits_info[sha] = commit
return [commit['sha'] for commit in resp]
commits = []
found_base_commit = False
for page in range(max_pages):
page_commits = get_commits_from_page(page)
for commit in page_commits:
if commit == base_sha:
found_base_commit = True
break
commits.append(commit)
if found_base_commit:
break
if not found_base_commit:
raise Exception("Can't found base commit sha {} in branch {}. Checked {} commits on {} pages.\nCommits: {}"
.format(base_sha, branch, len(commits), max_pages, ' '.join(commits)))
return commits
# Use GitHub search api to check if commit from any pull request. Update pull_requests info.
def find_pull_request_for_commit(commit_sha, pull_requests, token, max_retries, retry_timeout):
resp = github_api_get_json('search/issues?q={}'.format(commit_sha), token, max_retries, retry_timeout)
found = False
for item in resp['items']:
if 'pull_request' in item:
found = True
number = item['number']
if number not in pull_requests:
pull_requests[number] = {
'description': item['body'],
'user': item['user']['login'],
}
return found
# Find pull requests from list of commits. If no pull request found, add commit to not_found_commits list.
def find_pull_requests(commits, token, max_retries, retry_timeout):
not_found_commits = []
pull_requests = {}
for i, commit in enumerate(commits):
if (i + 1) % 10 == 0:
logging.info('Processed %d commits', i + 1)
if not find_pull_request_for_commit(commit, pull_requests, token, max_retries, retry_timeout):
not_found_commits.append(commit)
return not_found_commits, pull_requests
# Get users for all unknown commits and pull requests.
def get_users_info(pull_requests, commits_info, token, max_retries, retry_timeout):
users = {}
def update_user(user):
if user not in users:
query = 'users/{}'.format(user)
resp = github_api_get_json(query, token, max_retries, retry_timeout)
users[user] = resp
for pull_request in pull_requests.values():
update_user(pull_request['user'])
for commit_info in commits_info.values():
if 'author' in commit_info and commit_info['author'] is not None:
update_user(commit_info['author']['login'])
else:
logging.warning('Not found author for commit %s.', commit_info['html_url'])
return users
# List of unknown commits -> text description.
def process_unknown_commits(commits, commits_info, users):
pattern = 'Commit: [{}]({})\nAuthor: {}\nMessage: {}'
texts = []
for commit in commits:
info = commits_info[commit]
html_url = info['html_url']
msg = info['commit']['message']
# GitHub login
login = info['author']['login']
name = None
# Firstly, try get name from github user
try:
name = users[login]['name']
except:
pass
# Then, try get name from commit
if not name:
try:
name = info['commit']['author']['name']
except:
pass
author = '[{}]({})'.format(name or login, info['author']['html_url'])
texts.append(pattern.format(commit, html_url, author, msg))
text = 'Commits which are not from any pull request:\n\n'
return text + '\n\n'.join(texts)
# List of pull requests -> text description.
def process_pull_requests(pull_requests, users, repo):
groups = {}
for id, item in pull_requests.items():
lines = list(filter(len, map(lambda x: x.strip(), item['description'].split('\n'))))
cat_pos = None
short_descr_pos = None
long_descr_pos = None
if lines:
for i in range(len(lines) - 1):
if re.match('^\**Category\s*(\(leave one\))*:*\**\s*$', lines[i]):
cat_pos = i
if re.match('^\**\s*Short description', lines[i]):
short_descr_pos = i
if re.match('^\**\s*Detailed description', lines[i]):
long_descr_pos = i
cat = ''
if cat_pos:
# TODO: Sometimes have more than one
cat = lines[cat_pos + 1]
cat = cat.strip().lstrip('-').strip()
short_descr = ''
if short_descr_pos:
short_descr_end = long_descr_pos or len(lines)
short_descr = lines[short_descr_pos + 1]
if short_descr_pos + 2 != short_descr_end:
short_descr += ' ...'
# TODO: Add detailed description somewhere
pattern = u"{} [#{}]({}) ({})"
link = 'https://github.com/{}/pull/{}'.format(repo, id)
author = 'author not found'
if item['user'] in users:
# TODO get user from any commit if no user name on github
user = users[item['user']]
author = u'[{}]({})'.format(user['name'] or user['login'], user['html_url'])
if cat not in groups:
groups[cat] = []
groups[cat].append(pattern.format(short_descr, id, link, author))
texts = []
for group, text in groups.items():
items = [u'* {}'.format(pr) for pr in text]
texts.append(u'### {}\n{}'.format(group if group else u'[No category]', '\n'.join(items)))
return '\n\n'.join(texts)
# Load inner state. For debug purposes.
def load_state(state_file, base_sha, new_tag, prev_tag):
state = {}
if state_file:
try:
if os.path.exists(state_file):
logging.info('Reading state from %s', state_file)
with codecs.open(state_file, encoding='utf-8') as f:
state = json.loads(f.read())
else:
logging.info('State file does not exist. Will create new one.')
except Exception as e:
logging.warning('Cannot load state from %s. Reason: %s', state_file, str(e))
if state:
if 'base_sha' not in state or 'new_tag' not in state or 'prev_tag' not in state:
logging.warning('Invalid state. Will create new one.')
elif state['base_sha'] == base_sha and state['new_tag'] == new_tag and state['prev_tag'] == prev_tag:
logging.info('State loaded.')
else:
logging.info('Loaded state has different tags or merge base sha. Will create new state.')
state = {}
return state
# Save inner state. For debug purposes.
def save_state(state_file, state):
with codecs.open(state_file, 'w', encoding='utf-8') as f:
f.write(json.dumps(state, indent=4, separators=(',', ': ')))
def make_changelog(new_tag, prev_tag, repo, repo_folder, state_file, token, max_retries, retry_timeout):
base_sha = get_merge_base(new_tag, prev_tag, repo_folder)
logging.info('Base sha: %s', base_sha)
# Step 1. Get commits from merge_base to new_tag HEAD.
# Result is a list of commits + map with commits info (author, message)
commits_info = {}
commits = []
is_commits_loaded = False
# Step 2. For each commit check if it is from any pull request (using github search api).
# Result is a list of unknown commits + map with pull request info (author, description).
unknown_commits = []
pull_requests = {}
is_pull_requests_loaded = False
# Step 3. Map users with their info (Name)
users = {}
is_users_loaded = False
# Step 4. Make changelog text from data above.
state = load_state(state_file, base_sha, new_tag, prev_tag)
if state:
if 'commits' in state and 'commits_info' in state:
logging.info('Loading commits from %s', state_file)
commits_info = state['commits_info']
commits = state['commits']
is_commits_loaded = True
if 'pull_requests' in state and 'unknown_commits' in state:
logging.info('Loading pull requests from %s', state_file)
unknown_commits = state['unknown_commits']
pull_requests = state['pull_requests']
is_pull_requests_loaded = True
if 'users' in state:
logging.info('Loading users from %s', state_file)
users = state['users']
is_users_loaded = True
state['base_sha'] = base_sha
state['new_tag'] = new_tag
state['prev_tag'] = prev_tag
if not is_commits_loaded:
logging.info('Getting commits using github api.')
commits = get_commits_from_branch(repo, new_tag, base_sha, commits_info, 100, token, max_retries, retry_timeout)
state['commits'] = commits
state['commits_info'] = commits_info
logging.info('Found %d commits from %s to %s.\n', len(commits), new_tag, base_sha)
save_state(state_file, state)
if not is_pull_requests_loaded:
logging.info('Searching for pull requests using github api.')
unknown_commits, pull_requests = find_pull_requests(commits, token, max_retries, retry_timeout)
state['unknown_commits'] = unknown_commits
state['pull_requests'] = pull_requests
logging.info('Found %d pull requests and %d unknown commits.\n', len(pull_requests), len(unknown_commits))
save_state(state_file, state)
if not is_users_loaded:
logging.info('Getting users info using github api.')
users = get_users_info(pull_requests, commits_info, token, max_retries, retry_timeout)
state['users'] = users
logging.info('Found %d users.', len(users))
save_state(state_file, state)
print(process_pull_requests(pull_requests, users, repo))
print(process_unknown_commits(unknown_commits, commits_info, users))
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Make changelog.')
parser.add_argument('prev_release_tag', help='Git tag from previous release.')
parser.add_argument('new_release_tag', help='Git tag for new release.')
parser.add_argument('--token', help='Github token. Use it to increase github api query limit. ')
parser.add_argument('--directory', help='ClickHouse repo directory. Script dir by default.')
parser.add_argument('--state', help='File to dump inner states result.', default='changelog_state.json')
parser.add_argument('--repo', help='ClickHouse repo on GitHub.', default='yandex/ClickHouse')
parser.add_argument('--max_retry', default=100, type=int,
help='Max number of retries per api query in case of API rate limit exceeded error.')
parser.add_argument('--retry_timeout', help='Timeout after retry in seconds.', type=int, default=5)
args = parser.parse_args()
prev_release_tag = args.prev_release_tag
new_release_tag = args.new_release_tag
token = args.token or ''
repo_folder = args.directory or SCRIPT_DIR
state_file = args.state
repo = args.repo
max_retry = args.max_retry
retry_timeout = args.retry_timeout
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
repo_folder = os.path.expanduser(repo_folder)
make_changelog(new_release_tag, prev_release_tag, repo, repo_folder, state_file, token, max_retry, retry_timeout)

132
utils/s3tools/s3uploader Executable file
View File

@ -0,0 +1,132 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import logging
import argparse
import tarfile
import math
try:
from boto.s3.connection import S3Connection
except ImportError:
raise ImportError("You have to install boto package 'pip install boto'")
class S3API(object):
def __init__(self, access_key, secret_access_key, mds_api, mds_url):
self.connection = S3Connection(
host=mds_api,
aws_access_key_id=access_key,
aws_secret_access_key=secret_access_key,
)
self.mds_url = mds_url
def upload_file(self, bucket_name, file_path, s3_path):
logging.info("Start uploading file to bucket %s", bucket_name)
bucket = self.connection.get_bucket(bucket_name)
key = bucket.initiate_multipart_upload(s3_path)
logging.info("Will upload to s3 path %s", s3_path)
chunksize = 1024 * 1024 * 1024 # 1 GB
filesize = os.stat(file_path).st_size
logging.info("File size if %s", filesize)
chunkcount = int(math.ceil(filesize / chunksize))
def call_back(x, y):
print "Uploaded {}/{} bytes".format(x, y)
try:
for i in range(chunkcount + 1):
logging.info("Uploading chunk %s of %s", i, chunkcount + 1)
offset = chunksize * i
bytes_size = min(chunksize, filesize - offset)
with open(file_path, 'r') as fp:
fp.seek(offset)
key.upload_part_from_file(fp=fp, part_num=i+1,
size=bytes_size, cb=call_back,
num_cb=100)
key.complete_upload()
except Exception as ex:
key.cancel_upload()
raise ex
logging.info("Contents were set")
return "https://{bucket}.{mds_url}/{path}".format(
bucket=bucket_name, mds_url=self.mds_url, path=s3_path)
def make_tar_file_for_table(clickhouse_data_path, db_name, table_name,
tmp_prefix):
relative_data_path = os.path.join('data', db_name, table_name)
relative_meta_path = os.path.join('metadata', db_name, table_name + '.sql')
path_to_data = os.path.join(clickhouse_data_path, relative_data_path)
path_to_metadata = os.path.join(clickhouse_data_path, relative_meta_path)
temporary_file_name = tmp_prefix + '/{tname}.tar'.format(tname=table_name)
with tarfile.open(temporary_file_name, "w") as bundle:
bundle.add(path_to_data, arcname=relative_data_path)
bundle.add(path_to_metadata, arcname=relative_meta_path)
return temporary_file_name
USAGE_EXAMPLES = '''
examples:
\ts3uploader --dataset-name some_ds --access-key-id XXX --secret-access-key YYY --clickhouse-data-path /opt/clickhouse/ --table-name default.some_tbl --bucket-name some-bucket
\ts3uploader --dataset-name some_ds --access-key-id XXX --secret-access-key YYY --file-name some_ds.tsv.xz --bucket-name some-bucket
'''
if __name__ == "__main__":
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
parser = argparse.ArgumentParser(
description="Simple tool for uploading datasets to clickhouse S3",
usage='%(prog)s [options] {}'.format(USAGE_EXAMPLES))
parser.add_argument('--s3-api-url', default='s3.mds.yandex.net')
parser.add_argument('--s3-common-url', default='s3.yandex.net')
parser.add_argument('--bucket-name', default='clickhouse-datasets')
parser.add_argument('--dataset-name', required=True,
help='Name of dataset, will be used in uploaded path')
parser.add_argument('--access-key-id', required=True)
parser.add_argument('--secret-access-key', required=True)
parser.add_argument('--clickhouse-data-path',
default='/var/lib/clickhouse/',
help='Path to clickhouse database on filesystem')
parser.add_argument('--s3-path', help='Path in s3, where to upload file')
parser.add_argument('--tmp-prefix', default='/tmp',
help='Prefix to store temporary downloaded file')
data_group = parser.add_mutually_exclusive_group(required=True)
data_group.add_argument('--table-name',
help='Name of table with database, if you are uploading partitions')
data_group.add_argument('--file-path',
help='Name of file, if you are uploading')
args = parser.parse_args()
if args.table_name is not None and args.clickhouse_data_path is None:
parser.error(
"You should specify --clickhouse-data-path to upload --table")
s3_conn = S3API(
args.access_key_id, args.secret_access_key,
args.s3_api_url, args.s3_common_url)
if args.table_name is not None:
if '.' not in args.table_name:
db_name, table_name = 'default', args.table_name
else:
db_name, table_name = args.table_name.split('.')
file_path = make_tar_file_for_table(
args.clickhouse_data_path, db_name, table_name, args.tmp_prefix)
else:
file_path = args.file_path
if 'tsv' in file_path:
s3_path = os.path.join(
args.dataset_name, 'tsv', os.path.basename(file_path))
elif args.table_name is not None:
s3_path = os.path.join(
args.dataset_name, 'partitions', os.path.basename(file_path))
elif args.s3_path is not None:
s3_path = os.path.join(
args.dataset_name, args.s3_path, os.path.basename(file_path))
else:
raise Exception("Don't know s3-path to upload")
url = s3_conn.upload_file(args.bucket_name, file_path, s3_path)
logging.info("Data uploaded: %s", url)