Merged with master

This commit is contained in:
Nikolai Kochetov 2019-01-29 15:25:19 +03:00
commit 2c2932e185
301 changed files with 6939 additions and 2022 deletions

View File

@ -1,3 +1,140 @@
## ClickHouse release 19.1.6, 2019-01-24
### Backward Incompatible Change
* Removed `ALTER MODIFY PRIMARY KEY` command because it was superseded by the `ALTER MODIFY ORDER BY` command. [#3887](https://github.com/yandex/ClickHouse/pull/3887) ([ztlpn](https://github.com/ztlpn))
### New Features
* Add ability to choose per column codecs for storage log and tiny log. [#4111](https://github.com/yandex/ClickHouse/pull/4111) ([alesapin](https://github.com/alesapin))
* Added functions `filesystemAvailable`, `filesystemFree`, `filesystemCapacity`. [#4097](https://github.com/yandex/ClickHouse/pull/4097) ([bgranvea](https://github.com/bgranvea))
* Add custom compression codecs. [#3899](https://github.com/yandex/ClickHouse/pull/3899) ([alesapin](https://github.com/alesapin))
* Added hashing functions `xxHash64` and `xxHash32`. [#3905](https://github.com/yandex/ClickHouse/pull/3905) ([filimonov](https://github.com/filimonov))
* Added multiple joins emulation (very experimental). [#3946](https://github.com/yandex/ClickHouse/pull/3946) ([4ertus2](https://github.com/4ertus2))
* Added support for CatBoost multiclass models evaluation. Function `modelEvaluate` returns tuple with per-class raw predictions for multiclass models. `libcatboostmodel.so` should be built with [#607](https://github.com/catboost/catboost/pull/607). [#3959](https://github.com/yandex/ClickHouse/pull/3959) ([KochetovNicolai](https://github.com/KochetovNicolai))
* Added gccHash function which uses the same hash seed as [gcc](https://github.com/gcc-mirror/gcc/blob/41d6b10e96a1de98e90a7c0378437c3255814b16/libstdc%2B%2B-v3/include/bits/functional_hash.h#L191) [#4000](https://github.com/yandex/ClickHouse/pull/4000) ([sundy-li](https://github.com/sundy-li))
* Added compression codec delta. [#4052](https://github.com/yandex/ClickHouse/pull/4052) ([alesapin](https://github.com/alesapin))
* Added multi searcher to search from multiple constant strings from big haystack. Added functions (`multiPosition`, `multiSearch` ,`firstMatch`) * (` `, `UTF8`, `CaseInsensitive`, `CaseInsensitiveUTF8`) [#4053](https://github.com/yandex/ClickHouse/pull/4053) ([danlark1](https://github.com/danlark1))
* Added ability to alter compression codecs. [#4054](https://github.com/yandex/ClickHouse/pull/4054) ([alesapin](https://github.com/alesapin))
* Add ability to write data into HDFS and small refactoring. [#4084](https://github.com/yandex/ClickHouse/pull/4084) ([alesapin](https://github.com/alesapin))
* Removed some redundant objects from compiled expressions cache (optimization). [#4042](https://github.com/yandex/ClickHouse/pull/4042) ([alesapin](https://github.com/alesapin))
* Added functions `JavaHash`, `HiveHash`. [#3811](https://github.com/yandex/ClickHouse/pull/3811) ([shangshujie365](https://github.com/shangshujie365))
* Added functions `left`, `right`, `trim`, `ltrim`, `rtrim`, `timestampadd`, `timestampsub`. [#3826](https://github.com/yandex/ClickHouse/pull/3826) ([blinkov](https://github.com/blinkov))
* Added function `remoteSecure`. Function works as `remote`, but uses secure connection. [#4088](https://github.com/yandex/ClickHouse/pull/4088) ([proller](https://github.com/proller))
### Improvements
* Support for IF NOT EXISTS in ALTER TABLE ADD COLUMN statements, and for IF EXISTS in DROP/MODIFY/CLEAR/COMMENT COLUMN. [#3900](https://github.com/yandex/ClickHouse/pull/3900) ([bgranvea](https://github.com/bgranvea))
* Function `parseDateTimeBestEffort`: support for formats `DD.MM.YYYY`, `DD.MM.YY`, `DD-MM-YYYY`, `DD-Mon-YYYY`, `DD/Month/YYYY` and similar. [#3922](https://github.com/yandex/ClickHouse/pull/3922) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Add a MergeTree setting `use_minimalistic_part_header_in_zookeeper`. If enabled, Replicated tables will store compact part metadata in a single part znode. This can dramatically reduce ZooKeeper snapshot size (especially if the tables have a lot of columns). Note that after enabling this setting you will not be able to downgrade to a version that doesn't support it. [#3960](https://github.com/yandex/ClickHouse/pull/3960) ([ztlpn](https://github.com/ztlpn))
* Add an DFA-based implementation for functions `sequenceMatch` and `sequenceCount` in case pattern doesn't contain time. [#\](https://github.com/yandex/ClickHouse/pull/4004) ([ercolanelli-leo](https://github.com/ercolanelli-leo))
* Changed the way CapnProtoInputStream creates actions in such a way that it now support structures that are jagged. [#4063](https://github.com/yandex/ClickHouse/pull/4063) ([Miniwoffer](https://github.com/Miniwoffer))
* Better way to collect columns, tables and joins from AST when checking required columns. [#3930](https://github.com/yandex/ClickHouse/pull/3930) ([4ertus2](https://github.com/4ertus2))
* Zero left padding PODArray so that -1 element is always valid and zeroed. It's used for branchless Offset access. [#3920](https://github.com/yandex/ClickHouse/pull/3920) ([amosbird](https://github.com/amosbird))
* Performance improvement for int serialization. [#3968](https://github.com/yandex/ClickHouse/pull/3968) ([amosbird](https://github.com/amosbird))
* Moved debian/ specific entries to debian/.gitignore [#4106](https://github.com/yandex/ClickHouse/pull/4106) ([gerasiov](https://github.com/gerasiov))
* Decreased the number of connections in case of large number of Distributed tables in a single server. [#3726](https://github.com/yandex/ClickHouse/pull/3726) ([zhang2014](https://github.com/zhang2014))
* Supported totals row for `WITH TOTALS` query for ODBC driver (ODBCDriver2 format). [#3836](https://github.com/yandex/ClickHouse/pull/3836) ([nightweb](https://github.com/nightweb))
* Better constant expression folding. Possibility to skip unused shards if SELECT query filters by sharding_key (setting `distributed_optimize_skip_select_on_unused_shards`). [#3851](https://github.com/yandex/ClickHouse/pull/3851) ([abyss7](https://github.com/abyss7))
* Do not log from odbc-bridge when there is no console. [#3857](https://github.com/yandex/ClickHouse/pull/3857) ([alesapin](https://github.com/alesapin))
* Forbid using aggregate functions inside scalar subqueries. [#3865](https://github.com/yandex/ClickHouse/pull/3865) ([abyss7](https://github.com/abyss7))
* Added ability to use Enums as integers inside if function. [#3875](https://github.com/yandex/ClickHouse/pull/3875) ([abyss7](https://github.com/abyss7))
* Added `low_cardinality_allow_in_native_format` setting. If disabled, do not use `LowCadrinality` type in native format. [#3879](https://github.com/yandex/ClickHouse/pull/3879) ([KochetovNicolai](https://github.com/KochetovNicolai))
* Removed duplicate code. [#3915](https://github.com/yandex/ClickHouse/pull/3915) ([sergey-v-galtsev](https://github.com/sergey-v-galtsev))
* Minor improvements in StorageKafka. [#3919](https://github.com/yandex/ClickHouse/pull/3919) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Automatically disable logs in negative tests. [#3940](https://github.com/yandex/ClickHouse/pull/3940) ([4ertus2](https://github.com/4ertus2))
* Refactored SyntaxAnalyzer. [#4014](https://github.com/yandex/ClickHouse/pull/4014) ([4ertus2](https://github.com/4ertus2))
* Reverted jemalloc patch which lead to performance degradation. [#4018](https://github.com/yandex/ClickHouse/pull/4018) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Refactored QueryNormalizer. Unified column sources for ASTIdentifier and ASTQualifiedAsterisk (were different), removed column duplicates for ASTQualifiedAsterisk sources, cleared asterisks replacement. [#4031](https://github.com/yandex/ClickHouse/pull/4031) ([4ertus2](https://github.com/4ertus2))
* Refactored code with ASTIdentifier. [#4056](https://github.com/yandex/ClickHouse/pull/4056) [#4077](https://github.com/yandex/ClickHouse/pull/4077) [#4087](https://github.com/yandex/ClickHouse/pull/4087) ([4ertus2](https://github.com/4ertus2))
* Improve error message in `clickhouse-test` script when no ClickHouse binary was found. [#4130](https://github.com/yandex/ClickHouse/pull/4130) ([Miniwoffer](https://github.com/Miniwoffer))
* Rewrited code to calculate integer conversion function monotonicity. [#3921](https://github.com/yandex/ClickHouse/pull/3921) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed typos in comments. [#4089](https://github.com/yandex/ClickHouse/pull/4089) ([kvinty](https://github.com/kvinty))
### Build/Testing/Packaging Improvements
* Added minimal support for powerpc build. [#4132](https://github.com/yandex/ClickHouse/pull/4132) ([danlark1](https://github.com/danlark1))
* Fixed error when the server cannot start with the `bash: /usr/bin/clickhouse-extract-from-config: Operation not permitted` message within Docker or systemd-nspawn. [#4136](https://github.com/yandex/ClickHouse/pull/4136) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Updated `mariadb-client` library. Fixed one of issues found by UBSan. [#3924](https://github.com/yandex/ClickHouse/pull/3924) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Some fixes for UBSan builds. [#3926](https://github.com/yandex/ClickHouse/pull/3926) [#3948](https://github.com/yandex/ClickHouse/pull/3948) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Move docker images to 18.10 and add compatibility file for glibc >= 2.28 [#3965](https://github.com/yandex/ClickHouse/pull/3965) ([alesapin](https://github.com/alesapin))
* Add env variable if user don't want to chown directories in server docker image. [#3967](https://github.com/yandex/ClickHouse/pull/3967) ([alesapin](https://github.com/alesapin))
* Stateful functional tests are run on public available dataset. [#3969](https://github.com/yandex/ClickHouse/pull/3969) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Enabled most of the warnings from `-Weverything` in clang. Enabled `-Wpedantic`. [#3986](https://github.com/yandex/ClickHouse/pull/3986) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Link to libLLVM rather than to individual LLVM libs when USE_STATIC_LIBRARIES is off. [#3989](https://github.com/yandex/ClickHouse/pull/3989) ([orivej](https://github.com/orivej))
* Added a few more warnings that are available only in clang 8. [#3993](https://github.com/yandex/ClickHouse/pull/3993) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed bugs found by PVS-Studio. [#4013](https://github.com/yandex/ClickHouse/pull/4013) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Added sanitizer variables for test images. [#4072](https://github.com/yandex/ClickHouse/pull/4072) ([alesapin](https://github.com/alesapin))
* clickhouse-server debian package will recommend `libcap2-bin` package to use `setcap` tool for setting capabilities. This is optional. [#4093](https://github.com/yandex/ClickHouse/pull/4093) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Improved compilation time, fixed includes. [#3898](https://github.com/yandex/ClickHouse/pull/3898) ([proller](https://github.com/proller))
* Added performance tests for hash functions. [#3918](https://github.com/yandex/ClickHouse/pull/3918) ([filimonov](https://github.com/filimonov))
* Fixed cyclic library dependences. [#3958](https://github.com/yandex/ClickHouse/pull/3958) ([proller](https://github.com/proller))
* Improved compilation with low available memory. [#4030](https://github.com/yandex/ClickHouse/pull/4030) ([proller](https://github.com/proller))
### Bug Fixes
* Fix bug when in remote table function execution when wrong restrictions were used for in `getStructureOfRemoteTable`. [#4009](https://github.com/yandex/ClickHouse/pull/4009) ([alesapin](https://github.com/alesapin))
* Fix a leak of netlink sockets. They were placed in a pool where they were never deleted and new sockets were created at the start of a new thread when all current sockets were in use. [#4017](https://github.com/yandex/ClickHouse/pull/4017) ([ztlpn](https://github.com/ztlpn))
* Regression in master. Fix "Unknown identifier" error in case column names appear in lambdas. [#4115](https://github.com/yandex/ClickHouse/pull/4115) ([4ertus2](https://github.com/4ertus2))
* Fix bug with closing /proc/self/fd earlier than all fds were read from /proc. [#4120](https://github.com/yandex/ClickHouse/pull/4120) ([alesapin](https://github.com/alesapin))
* Fixed misspells in **comments** and **string literals** under `dbms`. [#4122](https://github.com/yandex/ClickHouse/pull/4122) ([maiha](https://github.com/maiha))
* Fixed String to UInt monotonic conversion in case of usage String in primary key. [#3870](https://github.com/yandex/ClickHouse/pull/3870) ([zhang2014](https://github.com/zhang2014))
* Add checking that 'SET send_logs_level = value' query accept appropriate value. [#3873](https://github.com/yandex/ClickHouse/pull/3873) ([s-mx](https://github.com/s-mx))
* Fixed a race condition when executing a distributed ALTER task. The race condition led to more than one replica trying to execute the task and all replicas except one failing with a ZooKeeper error. [#3904](https://github.com/yandex/ClickHouse/pull/3904) ([ztlpn](https://github.com/ztlpn))
* Fixed segfault in `arrayEnumerateUniq`, `arrayEnumerateDense` functions in case of some invalid arguments. [#3909](https://github.com/yandex/ClickHouse/pull/3909) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix UB in StorageMerge. [#3910](https://github.com/yandex/ClickHouse/pull/3910) ([amosbird](https://github.com/amosbird))
* Fixed segfault in functions `addDays`, `subtractDays`. [#3913](https://github.com/yandex/ClickHouse/pull/3913) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed error: functions `round`, `floor`, `trunc`, `ceil` may return bogus result when executed on integer argument and large negative scale. [#3914](https://github.com/yandex/ClickHouse/pull/3914) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed a bug introduced by 'kill query sync' which leads to a core dump. [#3916](https://github.com/yandex/ClickHouse/pull/3916) ([fancyqlx](https://github.com/fancyqlx))
* Fix bug with long delay after empty replication queue. [#3928](https://github.com/yandex/ClickHouse/pull/3928) ([alesapin](https://github.com/alesapin))
* Don't do exponential backoff when there is nothing to do for task. [#3932](https://github.com/yandex/ClickHouse/pull/3932) ([alesapin](https://github.com/alesapin))
* Fix a bug that led to hangups in threads that perform ALTERs of Replicated tables and in the thread that updates configuration from ZooKeeper. #2947 #3891 [#3934](https://github.com/yandex/ClickHouse/pull/3934) ([ztlpn](https://github.com/ztlpn))
* Fixed error in internal implementation of `quantileTDigest` (found by Artem Vakhrushev). This error never happens in ClickHouse and was relevant only for those who use ClickHouse codebase as a library directly. [#3935](https://github.com/yandex/ClickHouse/pull/3935) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix bug with wrong prefix for ipv4 subnet masks. [#3945](https://github.com/yandex/ClickHouse/pull/3945) ([alesapin](https://github.com/alesapin))
* Fix a bug when `from_zk` config elements weren't refreshed after a request to ZooKeeper timed out. #2947 [#3947](https://github.com/yandex/ClickHouse/pull/3947) ([ztlpn](https://github.com/ztlpn))
* Fixed dictionary copying at LowCardinality::cloneEmpty() method which lead to excessive memory usage in case of inserting into table with LowCardinality primary key. [#3955](https://github.com/yandex/ClickHouse/pull/3955) ([KochetovNicolai](https://github.com/KochetovNicolai))
* Fixed crash (`std::terminate`) in rare cases when a new thread cannot be created due to exhausted resources. [#3956](https://github.com/yandex/ClickHouse/pull/3956) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix user and password forwarding for replicated tables queries. [#3957](https://github.com/yandex/ClickHouse/pull/3957) ([alesapin](https://github.com/alesapin))
* Fixed very rare race condition that can happen when listing tables in Dictionary database while reloading dictionaries. [#3970](https://github.com/yandex/ClickHouse/pull/3970) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed LowCardinality serialization for Native format in case of empty arrays. #3907 [#4011](https://github.com/yandex/ClickHouse/pull/4011) ([KochetovNicolai](https://github.com/KochetovNicolai))
* Fixed incorrect result while using distinct by single LowCardinality numeric column. #3895 [#4012](https://github.com/yandex/ClickHouse/pull/4012) ([KochetovNicolai](https://github.com/KochetovNicolai))
* Make compiled_expression_cache_size setting limited by default. [#4041](https://github.com/yandex/ClickHouse/pull/4041) ([alesapin](https://github.com/alesapin))
* Fix ubsan bug in compression codecs. [#4069](https://github.com/yandex/ClickHouse/pull/4069) ([alesapin](https://github.com/alesapin))
* Allow Kafka Engine to ignore some number of parsing errors per block. [#4094](https://github.com/yandex/ClickHouse/pull/4094) ([abyss7](https://github.com/abyss7))
* Fixed glibc compatibility issues. [#4100](https://github.com/yandex/ClickHouse/pull/4100) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed issues found by PVS-Studio. [#4103](https://github.com/yandex/ClickHouse/pull/4103) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix a way how to collect array join columns. [#4121](https://github.com/yandex/ClickHouse/pull/4121) ([4ertus2](https://github.com/4ertus2))
* Fixed incorrect result when HAVING was used with ROLLUP or CUBE. [#3756](https://github.com/yandex/ClickHouse/issues/3756) [#3837](https://github.com/yandex/ClickHouse/pull/3837) ([reflection](https://github.com/reflection))
* Fixed specialized aggregation with LowCardinality key (in case when `compile` setting is enabled). [#3886](https://github.com/yandex/ClickHouse/pull/3886) ([KochetovNicolai](https://github.com/KochetovNicolai))
* Fixed data type check in type conversion functions. [#3896](https://github.com/yandex/ClickHouse/pull/3896) ([zhang2014](https://github.com/zhang2014))
* Fixed column aliases for query with `JOIN ON` syntax and distributed tables. [#3980](https://github.com/yandex/ClickHouse/pull/3980) ([zhang2014](https://github.com/zhang2014))
* Fixed issues detected by UBSan. [#3021](https://github.com/yandex/ClickHouse/pull/3021) ([alexey-milovidov](https://github.com/alexey-milovidov))
### Doc fixes
* Translated table engines related part to Chinese. [#3844](https://github.com/yandex/ClickHouse/pull/3844) ([lamber-ken](https://github.com/lamber-ken))
* Fixed `toStartOfFiveMinute` description. [#4096](https://github.com/yandex/ClickHouse/pull/4096) ([cheesedosa](https://github.com/cheesedosa))
* Added description for client `--secure` argument. [#3961](https://github.com/yandex/ClickHouse/pull/3961) ([vicdashkov](https://github.com/vicdashkov))
* Added descriptions for settings `merge_tree_uniform_read_distribution`, `merge_tree_min_rows_for_concurrent_read`, `merge_tree_min_rows_for_seek`, `merge_tree_coarse_index_granularity`, `merge_tree_max_rows_to_use_cache` [#4024](https://github.com/yandex/ClickHouse/pull/4024) ([BayoNet](https://github.com/BayoNet))
* Minor doc fixes. [#4098](https://github.com/yandex/ClickHouse/pull/4098) ([blinkov](https://github.com/blinkov))
* Updated example for zookeeper config setting. [#3883](https://github.com/yandex/ClickHouse/pull/3883) [#3894](https://github.com/yandex/ClickHouse/pull/3894) ([ogorbacheva](https://github.com/ogorbacheva))
* Updated info about escaping in formats Vertical, Pretty and VerticalRaw. [#4118](https://github.com/yandex/ClickHouse/pull/4118) ([ogorbacheva](https://github.com/ogorbacheva))
* Adding description of the functions for working with UUID. [#4059](https://github.com/yandex/ClickHouse/pull/4059) ([ogorbacheva](https://github.com/ogorbacheva))
* Add the description of the CHECK TABLE query. [#3881](https://github.com/yandex/ClickHouse/pull/3881) [#4043](https://github.com/yandex/ClickHouse/pull/4043) ([ogorbacheva](https://github.com/ogorbacheva))
* Add `zh/tests` doc translate to Chinese. [#4034](https://github.com/yandex/ClickHouse/pull/4034) ([sundy-li](https://github.com/sundy-li))
* Added documentation about functions `multiPosition`, `firstMatch`, `multiSearch`. [#4123](https://github.com/yandex/ClickHouse/pull/4123) ([danlark1](https://github.com/danlark1))
* Add puppet module to the list of the third party libraries. [#3862](https://github.com/yandex/ClickHouse/pull/3862) ([Felixoid](https://github.com/Felixoid))
* Fixed typo in the English version of Creating a Table example [#3872](https://github.com/yandex/ClickHouse/pull/3872) ([areldar](https://github.com/areldar))
* Mention about nagios plugin for ClickHouse [#3878](https://github.com/yandex/ClickHouse/pull/3878) ([lisuml](https://github.com/lisuml))
* Update of query language syntax description. [#4065](https://github.com/yandex/ClickHouse/pull/4065) ([BayoNet](https://github.com/BayoNet))
* Added documentation for per-column compression codecs. [#4073](https://github.com/yandex/ClickHouse/pull/4073) ([alex-krash](https://github.com/alex-krash))
* Updated articles about CollapsingMergeTree, GraphiteMergeTree, Replicated*MergeTree, `CREATE TABLE` query [#4085](https://github.com/yandex/ClickHouse/pull/4085) ([BayoNet](https://github.com/BayoNet))
* Other minor improvements. [#3897](https://github.com/yandex/ClickHouse/pull/3897) [#3923](https://github.com/yandex/ClickHouse/pull/3923) [#4066](https://github.com/yandex/ClickHouse/pull/4066) [#3860](https://github.com/yandex/ClickHouse/pull/3860) [#3906](https://github.com/yandex/ClickHouse/pull/3906) [#3936](https://github.com/yandex/ClickHouse/pull/3936) [#3975](https://github.com/yandex/ClickHouse/pull/3975) ([ogorbacheva](https://github.com/ogorbacheva)) ([ogorbacheva](https://github.com/ogorbacheva)) ([ogorbacheva](https://github.com/ogorbacheva)) ([blinkov](https://github.com/blinkov)) ([blinkov](https://github.com/blinkov)) ([sdk2](https://github.com/sdk2)) ([blinkov](https://github.com/blinkov))
### Other
* Updated librdkafka to v1.0.0-RC5. Used cppkafka instead of raw C interface. [#4025](https://github.com/yandex/ClickHouse/pull/4025) ([abyss7](https://github.com/abyss7))
* Fixed `hidden` on page title [#4033](https://github.com/yandex/ClickHouse/pull/4033) ([xboston](https://github.com/xboston))
* Updated year in copyright to 2019. [#4039](https://github.com/yandex/ClickHouse/pull/4039) ([xboston](https://github.com/xboston))
* Added check that server process is started from the data directory's owner. Do not start server from root. [#3785](https://github.com/yandex/ClickHouse/pull/3785) ([sergey-v-galtsev](https://github.com/sergey-v-galtsev))
* Removed function `shardByHash`. [#3833](https://github.com/yandex/ClickHouse/pull/3833) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed typo in ClusterCopier. [#3854](https://github.com/yandex/ClickHouse/pull/3854) ([dqminh](https://github.com/dqminh))
* Minor grammar fixes. [#3855](https://github.com/yandex/ClickHouse/pull/3855) ([intgr](https://github.com/intgr))
* Added test script to reproduce performance degradation in jemalloc. [#4036](https://github.com/yandex/ClickHouse/pull/4036) ([alexey-milovidov](https://github.com/alexey-milovidov))
## ClickHouse release 18.16.1, 2018-12-21
### Bug fixes:

View File

@ -3,6 +3,21 @@ cmake_minimum_required (VERSION 3.3)
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_CURRENT_SOURCE_DIR}/cmake/Modules/")
option(ENABLE_IPO "Enable inter-procedural optimization (aka LTO)" OFF) # need cmake 3.9+
if(ENABLE_IPO)
cmake_policy(SET CMP0069 NEW)
include(CheckIPOSupported)
check_ipo_supported(RESULT IPO_SUPPORTED OUTPUT IPO_NOT_SUPPORTED)
if(IPO_SUPPORTED)
message(STATUS "IPO/LTO is supported, enabling")
set(CMAKE_INTERPROCEDURAL_OPTIMIZATION TRUE)
else()
message(STATUS "IPO/LTO is not supported: <${IPO_NOT_SUPPORTED}>")
endif()
else()
message(STATUS "IPO/LTO not enabled.")
endif()
if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
# Require at least gcc 7
if (CMAKE_CXX_COMPILER_VERSION VERSION_LESS 7 AND NOT CMAKE_VERSION VERSION_LESS 2.8.9)
@ -81,7 +96,7 @@ option (ENABLE_TESTS "Enables tests" ON)
if (CMAKE_SYSTEM_PROCESSOR MATCHES "amd64|x86_64")
option (USE_INTERNAL_MEMCPY "Use internal implementation of 'memcpy' function instead of provided by libc. Only for x86_64." ON)
if (OS_LINUX AND NOT UNBUNDLED)
if (OS_LINUX AND NOT UNBUNDLED AND MAKE_STATIC_LIBRARIES)
option (GLIBC_COMPATIBILITY "Set to TRUE to enable compatibility with older glibc libraries. Only for x86_64, Linux. Implies USE_INTERNAL_MEMCPY." ON)
if (GLIBC_COMPATIBILITY)
message (STATUS "Some symbols from glibc will be replaced for compatibility")
@ -120,7 +135,9 @@ else()
message(STATUS "Disabling compiler -pipe option (have only ${AVAILABLE_PHYSICAL_MEMORY} mb of memory)")
endif()
include (cmake/test_cpu.cmake)
if(NOT DISABLE_CPU_OPTIMIZE)
include(cmake/test_cpu.cmake)
endif()
if(NOT COMPILER_CLANG) # clang: error: the clang compiler does not support '-march=native'
option(ARCH_NATIVE "Enable -march=native compiler flag" ${ARCH_ARM})
@ -229,9 +246,13 @@ include (cmake/find_re2.cmake)
include (cmake/find_rdkafka.cmake)
include (cmake/find_capnp.cmake)
include (cmake/find_llvm.cmake)
include (cmake/find_cpuid.cmake)
include (cmake/find_cpuid.cmake) # Freebsd, bundled
if (NOT USE_CPUID)
include (cmake/find_cpuinfo.cmake) # Debian
endif()
include (cmake/find_libgsasl.cmake)
include (cmake/find_libxml2.cmake)
include (cmake/find_protobuf.cmake)
include (cmake/find_hdfs3.cmake)
include (cmake/find_consistent-hashing.cmake)
include (cmake/find_base64.cmake)

View File

@ -28,7 +28,7 @@ find_library(METROHASH_LIBRARIES
find_path(METROHASH_INCLUDE_DIR
NAMES metrohash.h
PATHS ${METROHASH_ROOT_DIR}/include ${METROHASH_INCLUDE_PATHS}
PATHS ${METROHASH_ROOT_DIR}/include PATH_SUFFIXES metrohash ${METROHASH_INCLUDE_PATHS}
)
include(FindPackageHandleStandardArgs)

View File

@ -2,11 +2,11 @@ if (NOT ARCH_ARM)
option (USE_INTERNAL_CPUID_LIBRARY "Set to FALSE to use system cpuid library instead of bundled" ${NOT_UNBUNDLED})
endif ()
#if (USE_INTERNAL_CPUID_LIBRARY AND NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/libcpuid/include/cpuid/libcpuid.h")
# message (WARNING "submodule contrib/libcpuid is missing. to fix try run: \n git submodule update --init --recursive")
# set (USE_INTERNAL_CPUID_LIBRARY 0)
# set (MISSING_INTERNAL_CPUID_LIBRARY 1)
#endif ()
if (USE_INTERNAL_CPUID_LIBRARY AND NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/libcpuid/CMakeLists.txt")
message (WARNING "submodule contrib/libcpuid is missing. to fix try run: \n git submodule update --init --recursive")
set (USE_INTERNAL_CPUID_LIBRARY 0)
set (MISSING_INTERNAL_CPUID_LIBRARY 1)
endif ()
if (NOT USE_INTERNAL_CPUID_LIBRARY)
find_library (CPUID_LIBRARY cpuid)
@ -20,10 +20,12 @@ if (CPUID_LIBRARY AND CPUID_INCLUDE_DIR)
add_definitions(-DHAVE_STDINT_H)
# TODO: make virtual target cpuid:cpuid with COMPILE_DEFINITIONS property
endif ()
set (USE_CPUID 1)
elseif (NOT MISSING_INTERNAL_CPUID_LIBRARY)
set (CPUID_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/libcpuid/include)
set (USE_INTERNAL_CPUID_LIBRARY 1)
set (CPUID_LIBRARY cpuid)
set (USE_CPUID 1)
endif ()
message (STATUS "Using cpuid: ${CPUID_INCLUDE_DIR} : ${CPUID_LIBRARY}")
message (STATUS "Using cpuid=${USE_CPUID}: ${CPUID_INCLUDE_DIR} : ${CPUID_LIBRARY}")

17
cmake/find_cpuinfo.cmake Normal file
View File

@ -0,0 +1,17 @@
option(USE_INTERNAL_CPUINFO_LIBRARY "Set to FALSE to use system cpuinfo library instead of bundled" ${NOT_UNBUNDLED})
if(NOT USE_INTERNAL_CPUINFO_LIBRARY)
find_library(CPUINFO_LIBRARY cpuinfo)
find_path(CPUINFO_INCLUDE_DIR NAMES cpuinfo.h PATHS ${CPUINFO_INCLUDE_PATHS})
endif()
if(CPUID_LIBRARY AND CPUID_INCLUDE_DIR)
set(USE_CPUINFO 1)
elseif(NOT MISSING_INTERNAL_CPUINFO_LIBRARY)
set(CPUINFO_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/libcpuinfo/include)
set(USE_INTERNAL_CPUINFO_LIBRARY 1)
set(CPUINFO_LIBRARY cpuinfo)
set(USE_CPUINFO 1)
endif()
message(STATUS "Using cpuinfo=${USE_CPUINFO}: ${CPUINFO_INCLUDE_DIR} : ${CPUINFO_LIBRARY}")

View File

@ -8,13 +8,22 @@ if (NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/googletest/googletest/CMakeList
set (MISSING_INTERNAL_GTEST_LIBRARY 1)
endif ()
if (NOT USE_INTERNAL_GTEST_LIBRARY)
find_package (GTest)
endif ()
if (NOT GTEST_INCLUDE_DIRS AND NOT MISSING_INTERNAL_GTEST_LIBRARY)
if(NOT USE_INTERNAL_GTEST_LIBRARY)
# TODO: autodetect of GTEST_SRC_DIR by EXISTS /usr/src/googletest/CMakeLists.txt
if(NOT GTEST_SRC_DIR)
find_package(GTest)
endif()
endif()
if (NOT GTEST_SRC_DIR AND NOT GTEST_INCLUDE_DIRS AND NOT MISSING_INTERNAL_GTEST_LIBRARY)
set (USE_INTERNAL_GTEST_LIBRARY 1)
set (GTEST_MAIN_LIBRARIES gtest_main)
set (GTEST_INCLUDE_DIRS ${ClickHouse_SOURCE_DIR}/contrib/googletest/googletest)
endif ()
message (STATUS "Using gtest: ${GTEST_INCLUDE_DIRS} : ${GTEST_MAIN_LIBRARIES}")
if((GTEST_INCLUDE_DIRS AND GTEST_MAIN_LIBRARIES) OR GTEST_SRC_DIR)
set(USE_GTEST 1)
endif()
message (STATUS "Using gtest=${USE_GTEST}: ${GTEST_INCLUDE_DIRS} : ${GTEST_MAIN_LIBRARIES} : ${GTEST_SRC_DIR}")

View File

@ -1,18 +1,29 @@
option (USE_INTERNAL_PROTOBUF_LIBRARY "Set to FALSE to use system protobuf instead of bundled" ON)
option(USE_INTERNAL_PROTOBUF_LIBRARY "Set to FALSE to use system protobuf instead of bundled" ${NOT_UNBUNDLED})
if (NOT USE_INTERNAL_PROTOBUF_LIBRARY)
if(NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/protobuf/cmake/CMakeLists.txt")
if(USE_INTERNAL_PROTOBUF_LIBRARY)
message(WARNING "submodule contrib/protobuf is missing. to fix try run: \n git submodule update --init --recursive")
set(USE_INTERNAL_PROTOBUF_LIBRARY 0)
endif()
set(MISSING_INTERNAL_PROTOBUF_LIBRARY 1)
endif()
if(NOT USE_INTERNAL_PROTOBUF_LIBRARY)
find_package(Protobuf)
endif ()
endif()
if (Protobuf_LIBRARY AND Protobuf_INCLUDE_DIR)
else ()
set(Protobuf_INCLUDE_DIR ${CMAKE_SOURCE_DIR}/contrib/protobuf/src)
set(USE_PROTOBUF 1)
elseif(NOT MISSING_INTERNAL_PROTOBUF_LIBRARY)
set(Protobuf_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/protobuf/src)
set(USE_PROTOBUF 1)
set(USE_INTERNAL_PROTOBUF_LIBRARY 1)
set(Protobuf_LIBRARY libprotobuf)
set(Protobuf_PROTOC_LIBRARY libprotoc)
set(Protobuf_LITE_LIBRARY libprotobuf-lite)
set(Protobuf_PROTOC_EXECUTABLE ${CMAKE_BINARY_DIR}/contrib/protobuf/cmake/protoc)
set(Protobuf_PROTOC_EXECUTABLE ${ClickHouse_BINARY_DIR}/contrib/protobuf/cmake/protoc)
if(NOT DEFINED PROTOBUF_GENERATE_CPP_APPEND_PATH)
set(PROTOBUF_GENERATE_CPP_APPEND_PATH TRUE)
@ -77,4 +88,4 @@ else ()
endfunction()
endif()
message (STATUS "Using protobuf: ${Protobuf_INCLUDE_DIR} : ${Protobuf_LIBRARY}")
message(STATUS "Using protobuf=${USE_PROTOBUF}: ${Protobuf_INCLUDE_DIR} : ${Protobuf_LIBRARY}")

View File

@ -2,6 +2,11 @@ if (NOT ARCH_ARM AND NOT ARCH_32 AND NOT APPLE)
option (ENABLE_RDKAFKA "Enable kafka" ON)
endif ()
if (NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/cppkafka/CMakeLists.txt")
message (WARNING "submodule contrib/cppkafka is missing. to fix try run: \n git submodule update --init --recursive")
set (ENABLE_RDKAFKA 0)
endif ()
if (ENABLE_RDKAFKA)
if (OS_LINUX AND NOT ARCH_ARM)

View File

@ -14,6 +14,7 @@ if (ZSTD_LIBRARY AND ZSTD_INCLUDE_DIR)
else ()
set (USE_INTERNAL_ZSTD_LIBRARY 1)
set (ZSTD_LIBRARY zstd)
set (ZSTD_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/zstd/lib)
endif ()
message (STATUS "Using zstd: ${ZSTD_INCLUDE_DIR} : ${ZSTD_LIBRARY}")

View File

@ -2,4 +2,5 @@ set(DIVIDE_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/libdivide)
set(COMMON_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/libs/libcommon/include ${ClickHouse_BINARY_DIR}/libs/libcommon/include)
set(DBMS_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/dbms/src ${ClickHouse_BINARY_DIR}/dbms/src)
set(DOUBLE_CONVERSION_CONTRIB_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/double-conversion)
set(METROHASH_CONTRIB_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/libmetrohash/src)
set(PCG_RANDOM_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/libpcg-random/include)

View File

@ -107,6 +107,11 @@ if (USE_INTERNAL_SSL_LIBRARY)
if (NOT MAKE_STATIC_LIBRARIES)
set (BUILD_SHARED 1)
endif ()
# By default, ${CMAKE_INSTALL_PREFIX}/etc/ssl is selected - that is not what we need.
# We need to use system wide ssl directory.
set (OPENSSLDIR "/etc/ssl")
set (LIBRESSL_SKIP_INSTALL 1 CACHE INTERNAL "")
add_subdirectory (ssl)
target_include_directories(${OPENSSL_CRYPTO_LIBRARY} SYSTEM PUBLIC ${OPENSSL_INCLUDE_DIR})
@ -166,13 +171,16 @@ if (USE_INTERNAL_POCO_LIBRARY)
endif ()
endif ()
if (USE_INTERNAL_GTEST_LIBRARY)
if(USE_INTERNAL_GTEST_LIBRARY)
# Google Test from sources
add_subdirectory(${ClickHouse_SOURCE_DIR}/contrib/googletest/googletest ${CMAKE_CURRENT_BINARY_DIR}/googletest)
# avoid problems with <regexp.h>
target_compile_definitions (gtest INTERFACE GTEST_HAS_POSIX_RE=0)
target_include_directories (gtest SYSTEM INTERFACE ${ClickHouse_SOURCE_DIR}/contrib/googletest/include)
endif ()
elseif(GTEST_SRC_DIR)
add_subdirectory(${GTEST_SRC_DIR}/googletest ${CMAKE_CURRENT_BINARY_DIR}/googletest)
target_compile_definitions(gtest INTERFACE GTEST_HAS_POSIX_RE=0)
endif()
if (USE_INTERNAL_LLVM_LIBRARY)
file(GENERATE OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/empty.cpp CONTENT " ")
@ -207,14 +215,14 @@ if (USE_INTERNAL_LIBXML2_LIBRARY)
add_subdirectory(libxml2-cmake)
endif ()
if (USE_INTERNAL_PROTOBUF_LIBRARY)
set(protobuf_BUILD_TESTS OFF CACHE INTERNAL "" FORCE)
set(protobuf_BUILD_SHARED_LIBS OFF CACHE INTERNAL "" FORCE)
set(protobuf_WITH_ZLIB 0 CACHE INTERNAL "" FORCE) # actually will use zlib, but skip find
add_subdirectory(protobuf/cmake)
endif ()
if (USE_INTERNAL_HDFS3_LIBRARY)
include(${ClickHouse_SOURCE_DIR}/cmake/find_protobuf.cmake)
if (USE_INTERNAL_PROTOBUF_LIBRARY)
set(protobuf_BUILD_TESTS OFF CACHE INTERNAL "" FORCE)
set(protobuf_BUILD_SHARED_LIBS OFF CACHE INTERNAL "" FORCE)
set(protobuf_WITH_ZLIB 0 CACHE INTERNAL "" FORCE) # actually will use zlib, but skip find
add_subdirectory(protobuf/cmake)
endif ()
add_subdirectory(libhdfs3-cmake)
endif ()

2
contrib/jemalloc vendored

@ -1 +1 @@
Subproject commit 41b7372eadee941b9164751b8d4963f915d3ceae
Subproject commit cd2931ad9bbd78208565716ab102e86d858c2fff

View File

@ -1,5 +1,5 @@
if (HAVE_SSE42) # Not used. Pretty easy to port.
set (SOURCES_SSE42_ONLY src/metrohash128crc.cpp)
set (SOURCES_SSE42_ONLY src/metrohash128crc.cpp src/metrohash128crc.h)
endif ()
add_library(metrohash

View File

@ -1,22 +1,201 @@
The MIT License (MIT)
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
Copyright (c) 2015 J. Andrew Rogers
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
1. Definitions.
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

View File

@ -5,12 +5,44 @@ MetroHash is a set of state-of-the-art hash functions for *non-cryptographic* us
* Fastest general-purpose functions for bulk hashing.
* Fastest general-purpose functions for small, variable length keys.
* Robust statistical bias profile, similar to the MD5 cryptographic hash.
* Hashes can be constructed incrementally (**new**)
* 64-bit, 128-bit, and 128-bit CRC variants currently available.
* Optimized for modern x86-64 microarchitectures.
* Elegant, compact, readable functions.
You can read more about the design and history [here](http://www.jandrewrogers.com/2015/05/27/metrohash/).
## News
### 23 October 2018
The project has been re-licensed under Apache License v2.0. The purpose of this license change is consistency with the imminent release of MetroHash v2.0, which is also licensed under the Apache license.
### 27 July 2015
Two new 64-bit and 128-bit algorithms add the ability to construct hashes incrementally. In addition to supporting incremental construction, the algorithms are slightly superior to the prior versions.
A big change is that these new algorithms are implemented as C++ classes that support both incremental and stateless hashing. These classes also have a static method for verifying the implementation against the test vectors built into the classes. Implementations are now fully contained by their respective headers e.g. "metrohash128.h".
*Note: an incremental version of the 128-bit CRC version is on its way but is not included in this push.*
**Usage Example For Stateless Hashing**
`MetroHash128::Hash(key, key_length, hash_ptr, seed)`
**Usage Example For Incremental Hashing**
`MetroHash128 hasher;`
`hasher.Update(partial_key, partial_key_length);`
`...`
`hasher.Update(partial_key, partial_key_length);`
`hasher.Finalize(hash_ptr);`
An `Initialize(seed)` method allows the hasher objects to be reused.
### 27 May 2015
Six hash functions have been included in the initial release:
* 64-bit hash functions, "metrohash64_1" and "metrohash64_2"

View File

@ -1,7 +1,4 @@
origin: git@github.com:jandrewrogers/MetroHash.git
commit d9dee18a54a8a6766e24c1950b814ac7ca9d1a89
Merge: 761e8a4 3d06b24
origin: https://github.com/jandrewrogers/MetroHash.git
commit 690a521d9beb2e1050cc8f273fdabc13b31bf8f6 tag: v1.1.3
Author: J. Andrew Rogers <andrew@jarbox.org>
Date: Sat Jun 6 16:12:06 2015 -0700
modified README
Date: Tue Oct 23 09:49:53 2018 -0700

View File

@ -1,73 +1,24 @@
// metrohash.h
//
// The MIT License (MIT)
// Copyright 2015-2018 J. Andrew Rogers
//
// Copyright (c) 2015 J. Andrew Rogers
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef METROHASH_METROHASH_H
#define METROHASH_METROHASH_H
#include <stdint.h>
#include <string.h>
// MetroHash 64-bit hash functions
void metrohash64_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out);
void metrohash64_2(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out);
// MetroHash 128-bit hash functions
void metrohash128_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out);
void metrohash128_2(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out);
// MetroHash 128-bit hash functions using CRC instruction
void metrohash128crc_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out);
void metrohash128crc_2(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out);
/* rotate right idiom recognized by compiler*/
inline static uint64_t rotate_right(uint64_t v, unsigned k)
{
return (v >> k) | (v << (64 - k));
}
// unaligned reads, fast and safe on Nehalem and later microarchitectures
inline static uint64_t read_u64(const void * const ptr)
{
return static_cast<uint64_t>(*reinterpret_cast<const uint64_t*>(ptr));
}
inline static uint64_t read_u32(const void * const ptr)
{
return static_cast<uint64_t>(*reinterpret_cast<const uint32_t*>(ptr));
}
inline static uint64_t read_u16(const void * const ptr)
{
return static_cast<uint64_t>(*reinterpret_cast<const uint16_t*>(ptr));
}
inline static uint64_t read_u8 (const void * const ptr)
{
return static_cast<uint64_t>(*reinterpret_cast<const uint8_t *>(ptr));
}
#include "metrohash64.h"
#include "metrohash128.h"
#include "metrohash128crc.h"
#endif // #ifndef METROHASH_METROHASH_H

View File

@ -1,29 +1,260 @@
// metrohash128.cpp
//
// The MIT License (MIT)
// Copyright 2015-2018 J. Andrew Rogers
//
// Copyright (c) 2015 J. Andrew Rogers
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <string.h>
#include "platform.h"
#include "metrohash128.h"
const char * MetroHash128::test_string = "012345678901234567890123456789012345678901234567890123456789012";
const uint8_t MetroHash128::test_seed_0[16] = {
0xC7, 0x7C, 0xE2, 0xBF, 0xA4, 0xED, 0x9F, 0x9B,
0x05, 0x48, 0xB2, 0xAC, 0x50, 0x74, 0xA2, 0x97
};
const uint8_t MetroHash128::test_seed_1[16] = {
0x45, 0xA3, 0xCD, 0xB8, 0x38, 0x19, 0x9D, 0x7F,
0xBD, 0xD6, 0x8D, 0x86, 0x7A, 0x14, 0xEC, 0xEF
};
MetroHash128::MetroHash128(const uint64_t seed)
{
Initialize(seed);
}
void MetroHash128::Initialize(const uint64_t seed)
{
// initialize internal hash registers
state.v[0] = (static_cast<uint64_t>(seed) - k0) * k3;
state.v[1] = (static_cast<uint64_t>(seed) + k1) * k2;
state.v[2] = (static_cast<uint64_t>(seed) + k0) * k2;
state.v[3] = (static_cast<uint64_t>(seed) - k1) * k3;
// initialize total length of input
bytes = 0;
}
void MetroHash128::Update(const uint8_t * const buffer, const uint64_t length)
{
const uint8_t * ptr = reinterpret_cast<const uint8_t*>(buffer);
const uint8_t * const end = ptr + length;
// input buffer may be partially filled
if (bytes % 32)
{
uint64_t fill = 32 - (bytes % 32);
if (fill > length)
fill = length;
memcpy(input.b + (bytes % 32), ptr, static_cast<size_t>(fill));
ptr += fill;
bytes += fill;
// input buffer is still partially filled
if ((bytes % 32) != 0) return;
// process full input buffer
state.v[0] += read_u64(&input.b[ 0]) * k0; state.v[0] = rotate_right(state.v[0],29) + state.v[2];
state.v[1] += read_u64(&input.b[ 8]) * k1; state.v[1] = rotate_right(state.v[1],29) + state.v[3];
state.v[2] += read_u64(&input.b[16]) * k2; state.v[2] = rotate_right(state.v[2],29) + state.v[0];
state.v[3] += read_u64(&input.b[24]) * k3; state.v[3] = rotate_right(state.v[3],29) + state.v[1];
}
// bulk update
bytes += (end - ptr);
while (ptr <= (end - 32))
{
// process directly from the source, bypassing the input buffer
state.v[0] += read_u64(ptr) * k0; ptr += 8; state.v[0] = rotate_right(state.v[0],29) + state.v[2];
state.v[1] += read_u64(ptr) * k1; ptr += 8; state.v[1] = rotate_right(state.v[1],29) + state.v[3];
state.v[2] += read_u64(ptr) * k2; ptr += 8; state.v[2] = rotate_right(state.v[2],29) + state.v[0];
state.v[3] += read_u64(ptr) * k3; ptr += 8; state.v[3] = rotate_right(state.v[3],29) + state.v[1];
}
// store remaining bytes in input buffer
if (ptr < end)
memcpy(input.b, ptr, end - ptr);
}
void MetroHash128::Finalize(uint8_t * const hash)
{
// finalize bulk loop, if used
if (bytes >= 32)
{
state.v[2] ^= rotate_right(((state.v[0] + state.v[3]) * k0) + state.v[1], 21) * k1;
state.v[3] ^= rotate_right(((state.v[1] + state.v[2]) * k1) + state.v[0], 21) * k0;
state.v[0] ^= rotate_right(((state.v[0] + state.v[2]) * k0) + state.v[3], 21) * k1;
state.v[1] ^= rotate_right(((state.v[1] + state.v[3]) * k1) + state.v[2], 21) * k0;
}
// process any bytes remaining in the input buffer
const uint8_t * ptr = reinterpret_cast<const uint8_t*>(input.b);
const uint8_t * const end = ptr + (bytes % 32);
if ((end - ptr) >= 16)
{
state.v[0] += read_u64(ptr) * k2; ptr += 8; state.v[0] = rotate_right(state.v[0],33) * k3;
state.v[1] += read_u64(ptr) * k2; ptr += 8; state.v[1] = rotate_right(state.v[1],33) * k3;
state.v[0] ^= rotate_right((state.v[0] * k2) + state.v[1], 45) * k1;
state.v[1] ^= rotate_right((state.v[1] * k3) + state.v[0], 45) * k0;
}
if ((end - ptr) >= 8)
{
state.v[0] += read_u64(ptr) * k2; ptr += 8; state.v[0] = rotate_right(state.v[0],33) * k3;
state.v[0] ^= rotate_right((state.v[0] * k2) + state.v[1], 27) * k1;
}
if ((end - ptr) >= 4)
{
state.v[1] += read_u32(ptr) * k2; ptr += 4; state.v[1] = rotate_right(state.v[1],33) * k3;
state.v[1] ^= rotate_right((state.v[1] * k3) + state.v[0], 46) * k0;
}
if ((end - ptr) >= 2)
{
state.v[0] += read_u16(ptr) * k2; ptr += 2; state.v[0] = rotate_right(state.v[0],33) * k3;
state.v[0] ^= rotate_right((state.v[0] * k2) + state.v[1], 22) * k1;
}
if ((end - ptr) >= 1)
{
state.v[1] += read_u8 (ptr) * k2; state.v[1] = rotate_right(state.v[1],33) * k3;
state.v[1] ^= rotate_right((state.v[1] * k3) + state.v[0], 58) * k0;
}
state.v[0] += rotate_right((state.v[0] * k0) + state.v[1], 13);
state.v[1] += rotate_right((state.v[1] * k1) + state.v[0], 37);
state.v[0] += rotate_right((state.v[0] * k2) + state.v[1], 13);
state.v[1] += rotate_right((state.v[1] * k3) + state.v[0], 37);
bytes = 0;
// do any endian conversion here
memcpy(hash, state.v, 16);
}
void MetroHash128::Hash(const uint8_t * buffer, const uint64_t length, uint8_t * const hash, const uint64_t seed)
{
const uint8_t * ptr = reinterpret_cast<const uint8_t*>(buffer);
const uint8_t * const end = ptr + length;
uint64_t v[4];
v[0] = (static_cast<uint64_t>(seed) - k0) * k3;
v[1] = (static_cast<uint64_t>(seed) + k1) * k2;
if (length >= 32)
{
v[2] = (static_cast<uint64_t>(seed) + k0) * k2;
v[3] = (static_cast<uint64_t>(seed) - k1) * k3;
do
{
v[0] += read_u64(ptr) * k0; ptr += 8; v[0] = rotate_right(v[0],29) + v[2];
v[1] += read_u64(ptr) * k1; ptr += 8; v[1] = rotate_right(v[1],29) + v[3];
v[2] += read_u64(ptr) * k2; ptr += 8; v[2] = rotate_right(v[2],29) + v[0];
v[3] += read_u64(ptr) * k3; ptr += 8; v[3] = rotate_right(v[3],29) + v[1];
}
while (ptr <= (end - 32));
v[2] ^= rotate_right(((v[0] + v[3]) * k0) + v[1], 21) * k1;
v[3] ^= rotate_right(((v[1] + v[2]) * k1) + v[0], 21) * k0;
v[0] ^= rotate_right(((v[0] + v[2]) * k0) + v[3], 21) * k1;
v[1] ^= rotate_right(((v[1] + v[3]) * k1) + v[2], 21) * k0;
}
if ((end - ptr) >= 16)
{
v[0] += read_u64(ptr) * k2; ptr += 8; v[0] = rotate_right(v[0],33) * k3;
v[1] += read_u64(ptr) * k2; ptr += 8; v[1] = rotate_right(v[1],33) * k3;
v[0] ^= rotate_right((v[0] * k2) + v[1], 45) * k1;
v[1] ^= rotate_right((v[1] * k3) + v[0], 45) * k0;
}
if ((end - ptr) >= 8)
{
v[0] += read_u64(ptr) * k2; ptr += 8; v[0] = rotate_right(v[0],33) * k3;
v[0] ^= rotate_right((v[0] * k2) + v[1], 27) * k1;
}
if ((end - ptr) >= 4)
{
v[1] += read_u32(ptr) * k2; ptr += 4; v[1] = rotate_right(v[1],33) * k3;
v[1] ^= rotate_right((v[1] * k3) + v[0], 46) * k0;
}
if ((end - ptr) >= 2)
{
v[0] += read_u16(ptr) * k2; ptr += 2; v[0] = rotate_right(v[0],33) * k3;
v[0] ^= rotate_right((v[0] * k2) + v[1], 22) * k1;
}
if ((end - ptr) >= 1)
{
v[1] += read_u8 (ptr) * k2; v[1] = rotate_right(v[1],33) * k3;
v[1] ^= rotate_right((v[1] * k3) + v[0], 58) * k0;
}
v[0] += rotate_right((v[0] * k0) + v[1], 13);
v[1] += rotate_right((v[1] * k1) + v[0], 37);
v[0] += rotate_right((v[0] * k2) + v[1], 13);
v[1] += rotate_right((v[1] * k3) + v[0], 37);
// do any endian conversion here
memcpy(hash, v, 16);
}
bool MetroHash128::ImplementationVerified()
{
uint8_t hash[16];
const uint8_t * key = reinterpret_cast<const uint8_t *>(MetroHash128::test_string);
// verify one-shot implementation
MetroHash128::Hash(key, strlen(MetroHash128::test_string), hash, 0);
if (memcmp(hash, MetroHash128::test_seed_0, 16) != 0) return false;
MetroHash128::Hash(key, strlen(MetroHash128::test_string), hash, 1);
if (memcmp(hash, MetroHash128::test_seed_1, 16) != 0) return false;
// verify incremental implementation
MetroHash128 metro;
metro.Initialize(0);
metro.Update(reinterpret_cast<const uint8_t *>(MetroHash128::test_string), strlen(MetroHash128::test_string));
metro.Finalize(hash);
if (memcmp(hash, MetroHash128::test_seed_0, 16) != 0) return false;
metro.Initialize(1);
metro.Update(reinterpret_cast<const uint8_t *>(MetroHash128::test_string), strlen(MetroHash128::test_string));
metro.Finalize(hash);
if (memcmp(hash, MetroHash128::test_seed_1, 16) != 0) return false;
return true;
}
#include "metrohash.h"
void metrohash128_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out)
{
@ -97,6 +328,8 @@ void metrohash128_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t *
v[0] += rotate_right((v[0] * k2) + v[1], 13);
v[1] += rotate_right((v[1] * k3) + v[0], 37);
// do any endian conversion here
memcpy(out, v, 16);
}
@ -173,6 +406,8 @@ void metrohash128_2(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t *
v[0] += rotate_right((v[0] * k2) + v[1], 33);
v[1] += rotate_right((v[1] * k3) + v[0], 33);
// do any endian conversion here
memcpy(out, v, 16);
}

View File

@ -0,0 +1,72 @@
// metrohash128.h
//
// Copyright 2015-2018 J. Andrew Rogers
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef METROHASH_METROHASH_128_H
#define METROHASH_METROHASH_128_H
#include <stdint.h>
class MetroHash128
{
public:
static const uint32_t bits = 128;
// Constructor initializes the same as Initialize()
MetroHash128(const uint64_t seed=0);
// Initializes internal state for new hash with optional seed
void Initialize(const uint64_t seed=0);
// Update the hash state with a string of bytes. If the length
// is sufficiently long, the implementation switches to a bulk
// hashing algorithm directly on the argument buffer for speed.
void Update(const uint8_t * buffer, const uint64_t length);
// Constructs the final hash and writes it to the argument buffer.
// After a hash is finalized, this instance must be Initialized()-ed
// again or the behavior of Update() and Finalize() is undefined.
void Finalize(uint8_t * const hash);
// A non-incremental function implementation. This can be significantly
// faster than the incremental implementation for some usage patterns.
static void Hash(const uint8_t * buffer, const uint64_t length, uint8_t * const hash, const uint64_t seed=0);
// Does implementation correctly execute test vectors?
static bool ImplementationVerified();
// test vectors -- Hash(test_string, seed=0) => test_seed_0
static const char * test_string;
static const uint8_t test_seed_0[16];
static const uint8_t test_seed_1[16];
private:
static const uint64_t k0 = 0xC83A91E1;
static const uint64_t k1 = 0x8648DBDB;
static const uint64_t k2 = 0x7BDEC03B;
static const uint64_t k3 = 0x2F5870A5;
struct { uint64_t v[4]; } state;
struct { uint8_t b[32]; } input;
uint64_t bytes;
};
// Legacy 128-bit hash functions -- do not use
void metrohash128_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out);
void metrohash128_2(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out);
#endif // #ifndef METROHASH_METROHASH_128_H

View File

@ -1,31 +1,24 @@
// metrohash128crc.cpp
//
// The MIT License (MIT)
// Copyright 2015-2018 J. Andrew Rogers
//
// Copyright (c) 2015 J. Andrew Rogers
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "metrohash.h"
#include <nmmintrin.h>
#include <string.h>
#include "metrohash.h"
#include "platform.h"
void metrohash128crc_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out)

View File

@ -0,0 +1,27 @@
// metrohash128crc.h
//
// Copyright 2015-2018 J. Andrew Rogers
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef METROHASH_METROHASH_128_CRC_H
#define METROHASH_METROHASH_128_CRC_H
#include <stdint.h>
// Legacy 128-bit hash functions
void metrohash128crc_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out);
void metrohash128crc_2(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out);
#endif // #ifndef METROHASH_METROHASH_128_CRC_H

View File

@ -1,29 +1,257 @@
// metrohash64.cpp
//
// The MIT License (MIT)
// Copyright 2015-2018 J. Andrew Rogers
//
// Copyright (c) 2015 J. Andrew Rogers
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "platform.h"
#include "metrohash64.h"
#include <cstring>
const char * MetroHash64::test_string = "012345678901234567890123456789012345678901234567890123456789012";
const uint8_t MetroHash64::test_seed_0[8] = { 0x6B, 0x75, 0x3D, 0xAE, 0x06, 0x70, 0x4B, 0xAD };
const uint8_t MetroHash64::test_seed_1[8] = { 0x3B, 0x0D, 0x48, 0x1C, 0xF4, 0xB9, 0xB8, 0xDF };
MetroHash64::MetroHash64(const uint64_t seed)
{
Initialize(seed);
}
void MetroHash64::Initialize(const uint64_t seed)
{
vseed = (static_cast<uint64_t>(seed) + k2) * k0;
// initialize internal hash registers
state.v[0] = vseed;
state.v[1] = vseed;
state.v[2] = vseed;
state.v[3] = vseed;
// initialize total length of input
bytes = 0;
}
void MetroHash64::Update(const uint8_t * const buffer, const uint64_t length)
{
const uint8_t * ptr = reinterpret_cast<const uint8_t*>(buffer);
const uint8_t * const end = ptr + length;
// input buffer may be partially filled
if (bytes % 32)
{
uint64_t fill = 32 - (bytes % 32);
if (fill > length)
fill = length;
memcpy(input.b + (bytes % 32), ptr, static_cast<size_t>(fill));
ptr += fill;
bytes += fill;
// input buffer is still partially filled
if ((bytes % 32) != 0) return;
// process full input buffer
state.v[0] += read_u64(&input.b[ 0]) * k0; state.v[0] = rotate_right(state.v[0],29) + state.v[2];
state.v[1] += read_u64(&input.b[ 8]) * k1; state.v[1] = rotate_right(state.v[1],29) + state.v[3];
state.v[2] += read_u64(&input.b[16]) * k2; state.v[2] = rotate_right(state.v[2],29) + state.v[0];
state.v[3] += read_u64(&input.b[24]) * k3; state.v[3] = rotate_right(state.v[3],29) + state.v[1];
}
// bulk update
bytes += static_cast<uint64_t>(end - ptr);
while (ptr <= (end - 32))
{
// process directly from the source, bypassing the input buffer
state.v[0] += read_u64(ptr) * k0; ptr += 8; state.v[0] = rotate_right(state.v[0],29) + state.v[2];
state.v[1] += read_u64(ptr) * k1; ptr += 8; state.v[1] = rotate_right(state.v[1],29) + state.v[3];
state.v[2] += read_u64(ptr) * k2; ptr += 8; state.v[2] = rotate_right(state.v[2],29) + state.v[0];
state.v[3] += read_u64(ptr) * k3; ptr += 8; state.v[3] = rotate_right(state.v[3],29) + state.v[1];
}
// store remaining bytes in input buffer
if (ptr < end)
memcpy(input.b, ptr, static_cast<size_t>(end - ptr));
}
void MetroHash64::Finalize(uint8_t * const hash)
{
// finalize bulk loop, if used
if (bytes >= 32)
{
state.v[2] ^= rotate_right(((state.v[0] + state.v[3]) * k0) + state.v[1], 37) * k1;
state.v[3] ^= rotate_right(((state.v[1] + state.v[2]) * k1) + state.v[0], 37) * k0;
state.v[0] ^= rotate_right(((state.v[0] + state.v[2]) * k0) + state.v[3], 37) * k1;
state.v[1] ^= rotate_right(((state.v[1] + state.v[3]) * k1) + state.v[2], 37) * k0;
state.v[0] = vseed + (state.v[0] ^ state.v[1]);
}
// process any bytes remaining in the input buffer
const uint8_t * ptr = reinterpret_cast<const uint8_t*>(input.b);
const uint8_t * const end = ptr + (bytes % 32);
if ((end - ptr) >= 16)
{
state.v[1] = state.v[0] + (read_u64(ptr) * k2); ptr += 8; state.v[1] = rotate_right(state.v[1],29) * k3;
state.v[2] = state.v[0] + (read_u64(ptr) * k2); ptr += 8; state.v[2] = rotate_right(state.v[2],29) * k3;
state.v[1] ^= rotate_right(state.v[1] * k0, 21) + state.v[2];
state.v[2] ^= rotate_right(state.v[2] * k3, 21) + state.v[1];
state.v[0] += state.v[2];
}
if ((end - ptr) >= 8)
{
state.v[0] += read_u64(ptr) * k3; ptr += 8;
state.v[0] ^= rotate_right(state.v[0], 55) * k1;
}
if ((end - ptr) >= 4)
{
state.v[0] += read_u32(ptr) * k3; ptr += 4;
state.v[0] ^= rotate_right(state.v[0], 26) * k1;
}
if ((end - ptr) >= 2)
{
state.v[0] += read_u16(ptr) * k3; ptr += 2;
state.v[0] ^= rotate_right(state.v[0], 48) * k1;
}
if ((end - ptr) >= 1)
{
state.v[0] += read_u8 (ptr) * k3;
state.v[0] ^= rotate_right(state.v[0], 37) * k1;
}
state.v[0] ^= rotate_right(state.v[0], 28);
state.v[0] *= k0;
state.v[0] ^= rotate_right(state.v[0], 29);
bytes = 0;
// do any endian conversion here
memcpy(hash, state.v, 8);
}
void MetroHash64::Hash(const uint8_t * buffer, const uint64_t length, uint8_t * const hash, const uint64_t seed)
{
const uint8_t * ptr = reinterpret_cast<const uint8_t*>(buffer);
const uint8_t * const end = ptr + length;
uint64_t h = (static_cast<uint64_t>(seed) + k2) * k0;
if (length >= 32)
{
uint64_t v[4];
v[0] = h;
v[1] = h;
v[2] = h;
v[3] = h;
do
{
v[0] += read_u64(ptr) * k0; ptr += 8; v[0] = rotate_right(v[0],29) + v[2];
v[1] += read_u64(ptr) * k1; ptr += 8; v[1] = rotate_right(v[1],29) + v[3];
v[2] += read_u64(ptr) * k2; ptr += 8; v[2] = rotate_right(v[2],29) + v[0];
v[3] += read_u64(ptr) * k3; ptr += 8; v[3] = rotate_right(v[3],29) + v[1];
}
while (ptr <= (end - 32));
v[2] ^= rotate_right(((v[0] + v[3]) * k0) + v[1], 37) * k1;
v[3] ^= rotate_right(((v[1] + v[2]) * k1) + v[0], 37) * k0;
v[0] ^= rotate_right(((v[0] + v[2]) * k0) + v[3], 37) * k1;
v[1] ^= rotate_right(((v[1] + v[3]) * k1) + v[2], 37) * k0;
h += v[0] ^ v[1];
}
if ((end - ptr) >= 16)
{
uint64_t v0 = h + (read_u64(ptr) * k2); ptr += 8; v0 = rotate_right(v0,29) * k3;
uint64_t v1 = h + (read_u64(ptr) * k2); ptr += 8; v1 = rotate_right(v1,29) * k3;
v0 ^= rotate_right(v0 * k0, 21) + v1;
v1 ^= rotate_right(v1 * k3, 21) + v0;
h += v1;
}
if ((end - ptr) >= 8)
{
h += read_u64(ptr) * k3; ptr += 8;
h ^= rotate_right(h, 55) * k1;
}
if ((end - ptr) >= 4)
{
h += read_u32(ptr) * k3; ptr += 4;
h ^= rotate_right(h, 26) * k1;
}
if ((end - ptr) >= 2)
{
h += read_u16(ptr) * k3; ptr += 2;
h ^= rotate_right(h, 48) * k1;
}
if ((end - ptr) >= 1)
{
h += read_u8 (ptr) * k3;
h ^= rotate_right(h, 37) * k1;
}
h ^= rotate_right(h, 28);
h *= k0;
h ^= rotate_right(h, 29);
memcpy(hash, &h, 8);
}
bool MetroHash64::ImplementationVerified()
{
uint8_t hash[8];
const uint8_t * key = reinterpret_cast<const uint8_t *>(MetroHash64::test_string);
// verify one-shot implementation
MetroHash64::Hash(key, strlen(MetroHash64::test_string), hash, 0);
if (memcmp(hash, MetroHash64::test_seed_0, 8) != 0) return false;
MetroHash64::Hash(key, strlen(MetroHash64::test_string), hash, 1);
if (memcmp(hash, MetroHash64::test_seed_1, 8) != 0) return false;
// verify incremental implementation
MetroHash64 metro;
metro.Initialize(0);
metro.Update(reinterpret_cast<const uint8_t *>(MetroHash64::test_string), strlen(MetroHash64::test_string));
metro.Finalize(hash);
if (memcmp(hash, MetroHash64::test_seed_0, 8) != 0) return false;
metro.Initialize(1);
metro.Update(reinterpret_cast<const uint8_t *>(MetroHash64::test_string), strlen(MetroHash64::test_string));
metro.Finalize(hash);
if (memcmp(hash, MetroHash64::test_seed_1, 8) != 0) return false;
return true;
}
#include "metrohash.h"
void metrohash64_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out)
{

View File

@ -0,0 +1,73 @@
// metrohash64.h
//
// Copyright 2015-2018 J. Andrew Rogers
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef METROHASH_METROHASH_64_H
#define METROHASH_METROHASH_64_H
#include <stdint.h>
class MetroHash64
{
public:
static const uint32_t bits = 64;
// Constructor initializes the same as Initialize()
MetroHash64(const uint64_t seed=0);
// Initializes internal state for new hash with optional seed
void Initialize(const uint64_t seed=0);
// Update the hash state with a string of bytes. If the length
// is sufficiently long, the implementation switches to a bulk
// hashing algorithm directly on the argument buffer for speed.
void Update(const uint8_t * buffer, const uint64_t length);
// Constructs the final hash and writes it to the argument buffer.
// After a hash is finalized, this instance must be Initialized()-ed
// again or the behavior of Update() and Finalize() is undefined.
void Finalize(uint8_t * const hash);
// A non-incremental function implementation. This can be significantly
// faster than the incremental implementation for some usage patterns.
static void Hash(const uint8_t * buffer, const uint64_t length, uint8_t * const hash, const uint64_t seed=0);
// Does implementation correctly execute test vectors?
static bool ImplementationVerified();
// test vectors -- Hash(test_string, seed=0) => test_seed_0
static const char * test_string;
static const uint8_t test_seed_0[8];
static const uint8_t test_seed_1[8];
private:
static const uint64_t k0 = 0xD6D018F5;
static const uint64_t k1 = 0xA2AA033B;
static const uint64_t k2 = 0x62992FC1;
static const uint64_t k3 = 0x30BC5B29;
struct { uint64_t v[4]; } state;
struct { uint8_t b[32]; } input;
uint64_t bytes;
uint64_t vseed;
};
// Legacy 64-bit hash functions -- do not use
void metrohash64_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out);
void metrohash64_2(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out);
#endif // #ifndef METROHASH_METROHASH_64_H

View File

@ -0,0 +1,50 @@
// platform.h
//
// Copyright 2015-2018 J. Andrew Rogers
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef METROHASH_PLATFORM_H
#define METROHASH_PLATFORM_H
#include <stdint.h>
// rotate right idiom recognized by most compilers
inline static uint64_t rotate_right(uint64_t v, unsigned k)
{
return (v >> k) | (v << (64 - k));
}
// unaligned reads, fast and safe on Nehalem and later microarchitectures
inline static uint64_t read_u64(const void * const ptr)
{
return static_cast<uint64_t>(*reinterpret_cast<const uint64_t*>(ptr));
}
inline static uint64_t read_u32(const void * const ptr)
{
return static_cast<uint64_t>(*reinterpret_cast<const uint32_t*>(ptr));
}
inline static uint64_t read_u16(const void * const ptr)
{
return static_cast<uint64_t>(*reinterpret_cast<const uint16_t*>(ptr));
}
inline static uint64_t read_u8 (const void * const ptr)
{
return static_cast<uint64_t>(*reinterpret_cast<const uint8_t *>(ptr));
}
#endif // #ifndef METROHASH_PLATFORM_H

View File

@ -1,27 +1,18 @@
// testvector.h
//
// The MIT License (MIT)
// Copyright 2015-2018 J. Andrew Rogers
//
// Copyright (c) 2015 J. Andrew Rogers
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef METROHASH_TESTVECTOR_H
#define METROHASH_TESTVECTOR_H
@ -46,6 +37,8 @@ struct TestVectorData
static const char * test_key_63 = "012345678901234567890123456789012345678901234567890123456789012";
// The hash assumes a little-endian architecture. Treating the hash results
// as an array of uint64_t should enable conversion for big-endian implementations.
const TestVectorData TestVector [] =
{
// seed = 0

View File

@ -2,6 +2,7 @@ set(RDKAFKA_SOURCE_DIR ${CMAKE_SOURCE_DIR}/contrib/librdkafka/src)
set(SRCS
${RDKAFKA_SOURCE_DIR}/crc32c.c
${RDKAFKA_SOURCE_DIR}/rdkafka_zstd.c
${RDKAFKA_SOURCE_DIR}/rdaddr.c
${RDKAFKA_SOURCE_DIR}/rdavl.c
${RDKAFKA_SOURCE_DIR}/rdbuf.c
@ -59,5 +60,6 @@ set(SRCS
add_library(rdkafka ${LINK_MODE} ${SRCS})
target_include_directories(rdkafka SYSTEM PUBLIC include)
target_include_directories(rdkafka SYSTEM PUBLIC ${RDKAFKA_SOURCE_DIR})
target_link_libraries(rdkafka PUBLIC ${ZLIB_LIBRARIES} ${OPENSSL_SSL_LIBRARY} ${OPENSSL_CRYPTO_LIBRARY})
target_include_directories(rdkafka SYSTEM PUBLIC ${RDKAFKA_SOURCE_DIR}) # Because weird logic with "include_next" is used.
target_include_directories(rdkafka SYSTEM PRIVATE ${ZSTD_INCLUDE_DIR}/common) # Because wrong path to "zstd_errors.h" is used.
target_link_libraries(rdkafka PUBLIC ${ZLIB_LIBRARIES} ${ZSTD_LIBRARY} ${OPENSSL_SSL_LIBRARY} ${OPENSSL_CRYPTO_LIBRARY})

View File

@ -51,6 +51,8 @@
//#define WITH_PLUGINS 1
// zlib
#define WITH_ZLIB 1
// zstd
#define WITH_ZSTD 1
// WITH_SNAPPY
#define WITH_SNAPPY 1
// WITH_SOCKEM

View File

@ -200,11 +200,18 @@ target_link_libraries (clickhouse_common_io
${Boost_SYSTEM_LIBRARY}
PRIVATE
apple_rt
PUBLIC
Threads::Threads
PRIVATE
${CMAKE_DL_LIBS}
)
if (NOT ARCH_ARM AND CPUID_LIBRARY)
target_link_libraries (clickhouse_common_io PRIVATE ${CPUID_LIBRARY})
if(CPUID_LIBRARY)
target_link_libraries(clickhouse_common_io PRIVATE ${CPUID_LIBRARY})
endif()
if(CPUINFO_LIBRARY)
target_link_libraries(clickhouse_common_io PRIVATE ${CPUINFO_LIBRARY})
endif()
target_link_libraries (dbms
@ -225,6 +232,7 @@ target_link_libraries (dbms
${Boost_PROGRAM_OPTIONS_LIBRARY}
PUBLIC
${Boost_SYSTEM_LIBRARY}
Threads::Threads
)
if (NOT USE_INTERNAL_RE2_LIBRARY)
@ -298,6 +306,11 @@ target_link_libraries(dbms PRIVATE ${OPENSSL_CRYPTO_LIBRARY} Threads::Threads)
target_include_directories (dbms SYSTEM BEFORE PRIVATE ${DIVIDE_INCLUDE_DIR})
target_include_directories (dbms SYSTEM BEFORE PRIVATE ${SPARCEHASH_INCLUDE_DIR})
if (USE_PROTOBUF)
target_link_libraries (dbms PRIVATE ${Protobuf_LIBRARY})
target_include_directories (dbms SYSTEM BEFORE PRIVATE ${Protobuf_INCLUDE_DIR})
endif ()
if (USE_HDFS)
target_link_libraries (clickhouse_common_io PRIVATE ${HDFS3_LIBRARY})
target_include_directories (clickhouse_common_io SYSTEM BEFORE PRIVATE ${HDFS3_INCLUDE_DIR})
@ -318,7 +331,7 @@ target_include_directories (clickhouse_common_io BEFORE PRIVATE ${COMMON_INCLUDE
add_subdirectory (programs)
add_subdirectory (tests)
if (ENABLE_TESTS)
if (ENABLE_TESTS AND USE_GTEST)
macro (grep_gtest_sources BASE_DIR DST_VAR)
# Cold match files that are not in tests/ directories
file(GLOB_RECURSE "${DST_VAR}" RELATIVE "${BASE_DIR}" "gtest*.cpp")

View File

@ -11,7 +11,7 @@
#include <Poco/File.h>
#include <Poco/Util/Application.h>
#include <Common/Stopwatch.h>
#include <common/ThreadPool.h>
#include <Common/ThreadPool.h>
#include <AggregateFunctions/ReservoirSampler.h>
#include <AggregateFunctions/registerAggregateFunctions.h>
#include <boost/program_options.hpp>

View File

@ -5,4 +5,5 @@ target_include_directories (clickhouse-benchmark-lib SYSTEM PRIVATE ${PCG_RANDOM
if (CLICKHOUSE_SPLIT_BINARY)
add_executable (clickhouse-benchmark clickhouse-benchmark.cpp)
target_link_libraries (clickhouse-benchmark PRIVATE clickhouse-benchmark-lib clickhouse_aggregate_functions)
install (TARGETS clickhouse-benchmark ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse)
endif ()

View File

@ -7,6 +7,7 @@ endif ()
if (CLICKHOUSE_SPLIT_BINARY)
add_executable (clickhouse-client clickhouse-client.cpp)
target_link_libraries (clickhouse-client PRIVATE clickhouse-client-lib)
install (TARGETS clickhouse-client ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse)
endif ()
install (FILES clickhouse-client.xml DESTINATION ${CLICKHOUSE_ETC_DIR}/clickhouse-client COMPONENT clickhouse-client RENAME config.xml)

View File

@ -56,6 +56,7 @@
#include <Parsers/formatAST.h>
#include <Parsers/parseQuery.h>
#include <Interpreters/Context.h>
#include <Interpreters/InterpreterSetQuery.h>
#include <Client/Connection.h>
#include <Common/InterruptListener.h>
#include <Functions/registerFunctions.h>
@ -219,6 +220,9 @@ private:
APPLY_FOR_SETTINGS(EXTRACT_SETTING)
#undef EXTRACT_SETTING
/// Set path for format schema files
if (config().has("format_schema_path"))
context.setFormatSchemaPath(Poco::Path(config().getString("format_schema_path")).toString());
}
@ -1206,6 +1210,10 @@ private:
const auto & id = typeid_cast<const ASTIdentifier &>(*query_with_output->format);
current_format = id.name;
}
if (query_with_output->settings_ast)
{
InterpreterSetQuery(query_with_output->settings_ast, context).executeForCurrentContext();
}
}
if (has_vertical_output_suffix)

View File

@ -5,4 +5,5 @@ if (CLICKHOUSE_SPLIT_BINARY)
# Also in utils
add_executable (clickhouse-compressor clickhouse-compressor.cpp)
target_link_libraries (clickhouse-compressor PRIVATE clickhouse-compressor-lib)
install (TARGETS clickhouse-compressor ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse)
endif ()

View File

@ -4,4 +4,5 @@ target_link_libraries (clickhouse-copier-lib PRIVATE clickhouse-server-lib click
if (CLICKHOUSE_SPLIT_BINARY)
add_executable (clickhouse-copier clickhouse-copier.cpp)
target_link_libraries (clickhouse-copier clickhouse-copier-lib)
install (TARGETS clickhouse-copier ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse)
endif ()

View File

@ -18,7 +18,7 @@
#include <pcg_random.hpp>
#include <common/logger_useful.h>
#include <common/ThreadPool.h>
#include <Common/ThreadPool.h>
#include <daemon/OwnPatternFormatter.h>
#include <Common/Exception.h>

View File

@ -4,4 +4,5 @@ target_link_libraries (clickhouse-extract-from-config-lib PRIVATE clickhouse_com
if (CLICKHOUSE_SPLIT_BINARY)
add_executable (clickhouse-extract-from-config clickhouse-extract-from-config.cpp)
target_link_libraries (clickhouse-extract-from-config PRIVATE clickhouse-extract-from-config-lib)
install (TARGETS clickhouse-extract-from-config ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse)
endif ()

View File

@ -3,4 +3,5 @@ target_link_libraries (clickhouse-format-lib PRIVATE dbms clickhouse_common_io c
if (CLICKHOUSE_SPLIT_BINARY)
add_executable (clickhouse-format clickhouse-format.cpp)
target_link_libraries (clickhouse-format PRIVATE clickhouse-format-lib)
install (TARGETS clickhouse-format ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse)
endif ()

View File

@ -4,4 +4,5 @@ target_link_libraries (clickhouse-local-lib PRIVATE clickhouse_common_io clickho
if (CLICKHOUSE_SPLIT_BINARY)
add_executable (clickhouse-local clickhouse-local.cpp)
target_link_libraries (clickhouse-local PRIVATE clickhouse-local-lib)
install (TARGETS clickhouse-local ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse)
endif ()

View File

@ -17,6 +17,7 @@
#include <Common/Config/ConfigProcessor.h>
#include <Common/escapeForFileName.h>
#include <Common/ClickHouseRevision.h>
#include <Common/ThreadStatus.h>
#include <Common/config_version.h>
#include <IO/ReadBufferFromString.h>
#include <IO/WriteBufferFromString.h>
@ -102,7 +103,7 @@ int LocalServer::main(const std::vector<std::string> & /*args*/)
try
{
Logger * log = &logger();
ThreadStatus thread_status;
UseSSL use_ssl;
if (!config().has("query") && !config().has("table-structure")) /// Nothing to process

View File

@ -5,4 +5,5 @@ if (CLICKHOUSE_SPLIT_BINARY)
add_executable (clickhouse-obfuscator clickhouse-obfuscator.cpp)
set_target_properties(clickhouse-obfuscator PROPERTIES RUNTIME_OUTPUT_DIRECTORY ..)
target_link_libraries (clickhouse-obfuscator PRIVATE clickhouse-obfuscator-lib)
install (TARGETS clickhouse-obfuscator ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse)
endif ()

View File

@ -36,4 +36,5 @@ endif ()
if (CLICKHOUSE_SPLIT_BINARY)
add_executable (clickhouse-odbc-bridge odbc-bridge.cpp)
target_link_libraries (clickhouse-odbc-bridge PRIVATE clickhouse-odbc-bridge-lib)
install (TARGETS clickhouse-odbc-bridge ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse)
endif ()

View File

@ -5,4 +5,5 @@ target_include_directories (clickhouse-performance-test-lib SYSTEM PRIVATE ${PCG
if (CLICKHOUSE_SPLIT_BINARY)
add_executable (clickhouse-performance-test clickhouse-performance-test.cpp)
target_link_libraries (clickhouse-performance-test PRIVATE clickhouse-performance-test-lib)
install (TARGETS clickhouse-performance-test ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse)
endif ()

View File

@ -23,7 +23,7 @@
#include <IO/ConnectionTimeouts.h>
#include <IO/UseSSL.h>
#include <Interpreters/Settings.h>
#include <common/ThreadPool.h>
#include <Common/ThreadPool.h>
#include <common/getMemoryAmount.h>
#include <Poco/AutoPtr.h>
#include <Poco/Exception.h>

View File

@ -23,7 +23,7 @@ if (CLICKHOUSE_SPLIT_BINARY)
install (TARGETS clickhouse-server ${CLICKHOUSE_ALL_TARGETS} RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse)
endif ()
if (OS_LINUX AND MAKE_STATIC_LIBRARIES)
if (GLIBC_COMPATIBILITY)
set (GLIBC_MAX_REQUIRED 2.4 CACHE INTERNAL "")
# temporary disabled. to enable - change 'exit 0' to 'exit $a'
add_test(NAME GLIBC_required_version COMMAND bash -c "readelf -s ${CMAKE_CURRENT_BINARY_DIR}/../clickhouse-server | perl -nE 'END {exit 0 if $a} ++$a, print if /\\x40GLIBC_(\\S+)/ and pack(q{C*}, split /\\./, \$1) gt pack q{C*}, split /\\./, q{${GLIBC_MAX_REQUIRED}}'")

View File

@ -647,6 +647,7 @@ void HTTPHandler::trySendExceptionToClient(const std::string & s, int exception_
void HTTPHandler::handleRequest(Poco::Net::HTTPServerRequest & request, Poco::Net::HTTPServerResponse & response)
{
setThreadName("HTTPHandler");
ThreadStatus thread_status;
Output used_output;

View File

@ -6,6 +6,7 @@
#include <thread>
#include <vector>
#include <Common/ProfileEvents.h>
#include <Common/ThreadPool.h>
namespace DB
@ -46,7 +47,7 @@ private:
bool quit = false;
std::mutex mutex;
std::condition_variable cond;
std::thread thread{&MetricsTransmitter::run, this};
ThreadFromGlobalPool thread{&MetricsTransmitter::run, this};
static constexpr auto profile_events_path_prefix = "ClickHouse.ProfileEvents.";
static constexpr auto current_metrics_path_prefix = "ClickHouse.Metrics.";

View File

@ -27,6 +27,7 @@
#include <Common/getMultipleKeysFromConfig.h>
#include <Common/getNumberOfPhysicalCPUCores.h>
#include <Common/TaskStatsInfoGetter.h>
#include <Common/ThreadStatus.h>
#include <IO/HTTPCommon.h>
#include <IO/UseSSL.h>
#include <Interpreters/AsynchronousMetrics.h>
@ -129,9 +130,10 @@ std::string Server::getDefaultCorePath() const
int Server::main(const std::vector<std::string> & /*args*/)
{
Logger * log = &logger();
UseSSL use_ssl;
ThreadStatus thread_status;
registerFunctions();
registerAggregateFunctions();
registerTableFunctions();
@ -418,7 +420,7 @@ int Server::main(const std::vector<std::string> & /*args*/)
/// Set path for format schema files
auto format_schema_path = Poco::File(config().getString("format_schema_path", path + "format_schemas/"));
global_context->setFormatSchemaPath(format_schema_path.path() + "/");
global_context->setFormatSchemaPath(format_schema_path.path());
format_schema_path.createDirectories();
LOG_INFO(log, "Loading metadata.");

View File

@ -55,6 +55,7 @@ namespace ErrorCodes
void TCPHandler::runImpl()
{
setThreadName("TCPHandler");
ThreadStatus thread_status;
connection_context = server.context();
connection_context.setSessionContext(connection_context);

View File

@ -100,7 +100,7 @@ public:
return res;
}
void NO_SANITIZE_UNDEFINED add(AggregateDataPtr place, const IColumn ** columns, size_t row_num, Arena *) const override
void add(AggregateDataPtr place, const IColumn ** columns, size_t row_num, Arena *) const override
{
/// Out of range conversion may occur. This is Ok.
@ -177,8 +177,11 @@ public:
static void assertSecondArg(const DataTypes & argument_types)
{
if constexpr (has_second_arg)
/// TODO: check that second argument is of numerical type.
{
assertBinary(Name::name, argument_types);
if (!isUnsignedInteger(argument_types[1]))
throw Exception("Second argument (weight) for function " + std::string(Name::name) + " must be unsigned integer, but it has type " + argument_types[1]->getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
}
else
assertUnary(Name::name, argument_types);
}

View File

@ -12,10 +12,41 @@ namespace DB
namespace
{
AggregateFunctionPtr createAggregateFunctionSumMap(const std::string & name, const DataTypes & arguments, const Array & params)
struct WithOverflowPolicy
{
assertNoParameters(name, params);
/// Overflow, meaning that the returned type is the same as the input type.
static DataTypePtr promoteType(const DataTypePtr & data_type) { return data_type; }
};
struct WithoutOverflowPolicy
{
/// No overflow, meaning we promote the types if necessary.
static DataTypePtr promoteType(const DataTypePtr & data_type)
{
if (!data_type->canBePromoted())
throw new Exception{"Values to be summed are expected to be Numeric, Float or Decimal.",
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT};
return data_type->promoteNumericType();
}
};
template <typename T>
using SumMapWithOverflow = AggregateFunctionSumMap<T, WithOverflowPolicy>;
template <typename T>
using SumMapWithoutOverflow = AggregateFunctionSumMap<T, WithoutOverflowPolicy>;
template <typename T>
using SumMapFilteredWithOverflow = AggregateFunctionSumMapFiltered<T, WithOverflowPolicy>;
template <typename T>
using SumMapFilteredWithoutOverflow = AggregateFunctionSumMapFiltered<T, WithoutOverflowPolicy>;
using SumMapArgs = std::pair<DataTypePtr, DataTypes>;
SumMapArgs parseArguments(const std::string & name, const DataTypes & arguments)
{
if (arguments.size() < 2)
throw Exception("Aggregate function " + name + " requires at least two arguments of Array type.",
ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);
@ -25,9 +56,11 @@ AggregateFunctionPtr createAggregateFunctionSumMap(const std::string & name, con
throw Exception("First argument for function " + name + " must be an array.",
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
const DataTypePtr & keys_type = array_type->getNestedType();
DataTypePtr keys_type = array_type->getNestedType();
DataTypes values_types;
values_types.reserve(arguments.size() - 1);
for (size_t i = 1; i < arguments.size(); ++i)
{
array_type = checkAndGetDataType<DataTypeArray>(arguments[i].get());
@ -37,20 +70,55 @@ AggregateFunctionPtr createAggregateFunctionSumMap(const std::string & name, con
values_types.push_back(array_type->getNestedType());
}
AggregateFunctionPtr res(createWithNumericBasedType<AggregateFunctionSumMap>(*keys_type, keys_type, values_types));
return {std::move(keys_type), std::move(values_types)};
}
template <template <typename> class Function>
AggregateFunctionPtr createAggregateFunctionSumMap(const std::string & name, const DataTypes & arguments, const Array & params)
{
assertNoParameters(name, params);
auto [keys_type, values_types] = parseArguments(name, arguments);
AggregateFunctionPtr res(createWithNumericBasedType<Function>(*keys_type, keys_type, values_types));
if (!res)
res.reset(createWithDecimalType<AggregateFunctionSumMap>(*keys_type, keys_type, values_types));
res.reset(createWithDecimalType<Function>(*keys_type, keys_type, values_types));
if (!res)
throw Exception("Illegal type of argument for aggregate function " + name, ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
return res;
}
template <template <typename> class Function>
AggregateFunctionPtr createAggregateFunctionSumMapFiltered(const std::string & name, const DataTypes & arguments, const Array & params)
{
if (params.size() != 1)
throw Exception("Aggregate function " + name + " requires exactly one parameter of Array type.",
ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);
Array keys_to_keep;
if (!params.front().tryGet<Array>(keys_to_keep))
throw Exception("Aggregate function " + name + " requires an Array as parameter.",
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
auto [keys_type, values_types] = parseArguments(name, arguments);
AggregateFunctionPtr res(createWithNumericBasedType<Function>(*keys_type, keys_type, values_types, keys_to_keep));
if (!res)
res.reset(createWithDecimalType<Function>(*keys_type, keys_type, values_types, keys_to_keep));
if (!res)
throw Exception("Illegal type of argument for aggregate function " + name, ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
return res;
}
}
void registerAggregateFunctionSumMap(AggregateFunctionFactory & factory)
{
factory.registerFunction("sumMap", createAggregateFunctionSumMap);
factory.registerFunction("sumMap", createAggregateFunctionSumMap<SumMapWithoutOverflow>);
factory.registerFunction("sumMapWithOverflow", createAggregateFunctionSumMap<SumMapWithOverflow>);
factory.registerFunction("sumMapFiltered", createAggregateFunctionSumMapFiltered<SumMapFilteredWithoutOverflow>);
factory.registerFunction("sumMapFilteredWithOverflow", createAggregateFunctionSumMapFiltered<SumMapFilteredWithOverflow>);
}
}

View File

@ -50,9 +50,9 @@ struct AggregateFunctionSumMapData
* ([1,2,3,4,5,6,7,8,9,10],[10,10,45,20,35,20,15,30,20,20])
*/
template <typename T>
class AggregateFunctionSumMap final : public IAggregateFunctionDataHelper<
AggregateFunctionSumMapData<NearestFieldType<T>>, AggregateFunctionSumMap<T>>
template <typename T, typename Derived, typename OverflowPolicy>
class AggregateFunctionSumMapBase : public IAggregateFunctionDataHelper<
AggregateFunctionSumMapData<NearestFieldType<T>>, Derived>
{
private:
using ColVecType = std::conditional_t<IsDecimalNumber<T>, ColumnDecimal<T>, ColumnVector<T>>;
@ -61,7 +61,7 @@ private:
DataTypes values_types;
public:
AggregateFunctionSumMap(const DataTypePtr & keys_type, const DataTypes & values_types)
AggregateFunctionSumMapBase(const DataTypePtr & keys_type, const DataTypes & values_types)
: keys_type(keys_type), values_types(values_types) {}
String getName() const override { return "sumMap"; }
@ -72,7 +72,7 @@ public:
types.emplace_back(std::make_shared<DataTypeArray>(keys_type));
for (const auto & value_type : values_types)
types.emplace_back(std::make_shared<DataTypeArray>(value_type));
types.emplace_back(std::make_shared<DataTypeArray>(OverflowPolicy::promoteType(value_type)));
return std::make_shared<DataTypeTuple>(types);
}
@ -109,6 +109,11 @@ public:
array_column.getData().get(values_vec_offset + i, value);
const auto & key = keys_vec.getData()[keys_vec_offset + i];
if (!keepKey(key))
{
continue;
}
IteratorType it;
if constexpr (IsDecimalNumber<T>)
{
@ -253,6 +258,52 @@ public:
}
const char * getHeaderFilePath() const override { return __FILE__; }
bool keepKey(const T & key) const { return static_cast<const Derived &>(*this).keepKey(key); }
};
template <typename T, typename OverflowPolicy>
class AggregateFunctionSumMap final :
public AggregateFunctionSumMapBase<T, AggregateFunctionSumMap<T, OverflowPolicy>, OverflowPolicy>
{
private:
using Self = AggregateFunctionSumMap<T, OverflowPolicy>;
using Base = AggregateFunctionSumMapBase<T, Self, OverflowPolicy>;
public:
AggregateFunctionSumMap(const DataTypePtr & keys_type, DataTypes & values_types)
: Base{keys_type, values_types}
{}
String getName() const override { return "sumMap"; }
bool keepKey(const T &) const { return true; }
};
template <typename T, typename OverflowPolicy>
class AggregateFunctionSumMapFiltered final :
public AggregateFunctionSumMapBase<T, AggregateFunctionSumMapFiltered<T, OverflowPolicy>, OverflowPolicy>
{
private:
using Self = AggregateFunctionSumMapFiltered<T, OverflowPolicy>;
using Base = AggregateFunctionSumMapBase<T, Self, OverflowPolicy>;
std::unordered_set<T> keys_to_keep;
public:
AggregateFunctionSumMapFiltered(const DataTypePtr & keys_type, const DataTypes & values_types, const Array & keys_to_keep_)
: Base{keys_type, values_types}
{
keys_to_keep.reserve(keys_to_keep_.size());
for (const Field & f : keys_to_keep_)
{
keys_to_keep.emplace(f.safeGet<NearestFieldType<T>>());
}
}
String getName() const override { return "sumMapFiltered"; }
bool keepKey(const T & key) const { return keys_to_keep.count(key); }
};
}

View File

@ -6,25 +6,25 @@ namespace DB
{
/** Data for HyperLogLogBiasEstimator in the uniqCombined function.
  * The development plan is as follows:
  * 1. Assemble ClickHouse.
  * 2. Run the script src/dbms/scripts/gen-bias-data.py, which returns one array for getRawEstimates()
  * and another array for getBiases().
  * 3. Update `raw_estimates` and `biases` arrays. Also update the size of arrays in InterpolatedData.
  * 4. Assemble ClickHouse.
  * 5. Run the script src/dbms/scripts/linear-counting-threshold.py, which creates 3 files:
  * - raw_graph.txt (1st column: the present number of unique values;
  * 2nd column: relative error in the case of HyperLogLog without applying any corrections)
  * - linear_counting_graph.txt (1st column: the present number of unique values;
  * 2nd column: relative error in the case of HyperLogLog using LinearCounting)
  * - bias_corrected_graph.txt (1st column: the present number of unique values;
  * 2nd column: relative error in the case of HyperLogLog with the use of corrections from the algorithm HyperLogLog++)
  * 6. Generate a graph with gnuplot based on this data.
  * 7. Determine the minimum number of unique values at which it is better to correct the error
  * using its evaluation (ie, using the HyperLogLog++ algorithm) than applying the LinearCounting algorithm.
  * 7. Accordingly, update the constant in the function getThreshold()
  * 8. Assemble ClickHouse.
  */
* The development plan is as follows:
* 1. Assemble ClickHouse.
* 2. Run the script src/dbms/scripts/gen-bias-data.py, which returns one array for getRawEstimates()
* and another array for getBiases().
* 3. Update `raw_estimates` and `biases` arrays. Also update the size of arrays in InterpolatedData.
* 4. Assemble ClickHouse.
* 5. Run the script src/dbms/scripts/linear-counting-threshold.py, which creates 3 files:
* - raw_graph.txt (1st column: the present number of unique values;
* 2nd column: relative error in the case of HyperLogLog without applying any corrections)
* - linear_counting_graph.txt (1st column: the present number of unique values;
* 2nd column: relative error in the case of HyperLogLog using LinearCounting)
* - bias_corrected_graph.txt (1st column: the present number of unique values;
* 2nd column: relative error in the case of HyperLogLog with the use of corrections from the algorithm HyperLogLog++)
* 6. Generate a graph with gnuplot based on this data.
* 7. Determine the minimum number of unique values at which it is better to correct the error
* using its evaluation (ie, using the HyperLogLog++ algorithm) than applying the LinearCounting algorithm.
* 7. Accordingly, update the constant in the function getThreshold()
* 8. Assemble ClickHouse.
*/
struct UniqCombinedBiasData
{
using InterpolatedData = std::array<double, 200>;

View File

@ -15,33 +15,33 @@
/** Approximate calculation of anything, as usual, is constructed according to the following scheme:
  * - some data structure is used to calculate the value of X;
  * - Not all values are added to the data structure, but only selected ones (according to some selectivity criteria);
  * - after processing all elements, the data structure is in some state S;
  * - as an approximate value of X, the value calculated according to the maximum likelihood principle is returned:
  * at what real value X, the probability of finding the data structure in the obtained state S is maximal.
  */
* - some data structure is used to calculate the value of X;
* - Not all values are added to the data structure, but only selected ones (according to some selectivity criteria);
* - after processing all elements, the data structure is in some state S;
* - as an approximate value of X, the value calculated according to the maximum likelihood principle is returned:
* at what real value X, the probability of finding the data structure in the obtained state S is maximal.
*/
/** In particular, what is described below can be found by the name of the BJKST algorithm.
  */
*/
/** Very simple hash-set for approximate number of unique values.
  * Works like this:
  * - you can insert UInt64;
  * - before insertion, first the hash function UInt64 -> UInt32 is calculated;
  * - the original value is not saved (lost);
  * - further all operations are made with these hashes;
  * - hash table is constructed according to the scheme:
  * - open addressing (one buffer, position in buffer is calculated by taking remainder of division by its size);
  * - linear probing (if the cell already has a value, then the cell following it is taken, etc.);
  * - the missing value is zero-encoded; to remember presence of zero in set, separate variable of type bool is used;
  * - buffer growth by 2 times when filling more than 50%;
  * - if the set has more UNIQUES_HASH_MAX_SIZE elements, then all the elements are removed from the set,
  * not divisible by 2, and then all elements that do not divide by 2 are not inserted into the set;
  * - if the situation repeats, then only elements dividing by 4, etc., are taken.
  * - the size() method returns an approximate number of elements that have been inserted into the set;
  * - there are methods for quick reading and writing in binary and text form.
  */
* Works like this:
* - you can insert UInt64;
* - before insertion, first the hash function UInt64 -> UInt32 is calculated;
* - the original value is not saved (lost);
* - further all operations are made with these hashes;
* - hash table is constructed according to the scheme:
* - open addressing (one buffer, position in buffer is calculated by taking remainder of division by its size);
* - linear probing (if the cell already has a value, then the cell following it is taken, etc.);
* - the missing value is zero-encoded; to remember presence of zero in set, separate variable of type bool is used;
* - buffer growth by 2 times when filling more than 50%;
* - if the set has more UNIQUES_HASH_MAX_SIZE elements, then all the elements are removed from the set,
* not divisible by 2, and then all elements that do not divide by 2 are not inserted into the set;
* - if the situation repeats, then only elements dividing by 4, etc., are taken.
* - the size() method returns an approximate number of elements that have been inserted into the set;
* - there are methods for quick reading and writing in binary and text form.
*/
/// The maximum degree of buffer size before the values are discarded
#define UNIQUES_HASH_MAX_SIZE_DEGREE 17
@ -50,8 +50,8 @@
#define UNIQUES_HASH_MAX_SIZE (1ULL << (UNIQUES_HASH_MAX_SIZE_DEGREE - 1))
/** The number of least significant bits used for thinning. The remaining high-order bits are used to determine the position in the hash table.
  * (high-order bits are taken because the younger bits will be constant after dropping some of the values)
  */
* (high-order bits are taken because the younger bits will be constant after dropping some of the values)
*/
#define UNIQUES_HASH_BITS_FOR_SKIP (32 - UNIQUES_HASH_MAX_SIZE_DEGREE)
/// Initial buffer size degree
@ -59,8 +59,8 @@
/** This hash function is not the most optimal, but UniquesHashSet states counted with it,
  * stored in many places on disks (in the Yandex.Metrika), so it continues to be used.
  */
* stored in many places on disks (in the Yandex.Metrika), so it continues to be used.
*/
struct UniquesHashSetDefaultHash
{
size_t operator() (UInt64 x) const

View File

@ -9,9 +9,9 @@
/** In a loop it connects to the server and immediately breaks the connection.
  * Using the SO_LINGER option, we ensure that the connection is terminated by sending a RST packet (not FIN).
  * This behavior causes a bug in the TCPServer implementation in the Poco library.
  */
* Using the SO_LINGER option, we ensure that the connection is terminated by sending a RST packet (not FIN).
* This behavior causes a bug in the TCPServer implementation in the Poco library.
*/
int main(int argc, char ** argv)
try
{

View File

@ -1,4 +1,4 @@
set(SRCS)
add_executable (column_unique column_unique.cpp ${SRCS})
target_link_libraries (column_unique PRIVATE dbms gtest_main)
if(USE_GTEST)
add_executable(column_unique column_unique.cpp)
target_link_libraries(column_unique PRIVATE dbms gtest_main)
endif()

View File

@ -4,5 +4,5 @@ add_headers_and_sources(clickhouse_common_config .)
add_library(clickhouse_common_config ${LINK_MODE} ${clickhouse_common_config_headers} ${clickhouse_common_config_sources})
target_link_libraries(clickhouse_common_config PUBLIC common PRIVATE clickhouse_common_zookeeper string_utils PUBLIC ${Poco_XML_LIBRARY} ${Poco_Util_LIBRARY})
target_link_libraries(clickhouse_common_config PUBLIC common PRIVATE clickhouse_common_zookeeper string_utils PUBLIC ${Poco_XML_LIBRARY} ${Poco_Util_LIBRARY} Threads::Threads)
target_include_directories(clickhouse_common_config PUBLIC ${DBMS_INCLUDE_DIR})

View File

@ -33,7 +33,7 @@ ConfigReloader::ConfigReloader(
void ConfigReloader::start()
{
thread = std::thread(&ConfigReloader::run, this);
thread = ThreadFromGlobalPool(&ConfigReloader::run, this);
}

View File

@ -1,6 +1,7 @@
#pragma once
#include "ConfigProcessor.h"
#include <Common/ThreadPool.h>
#include <Common/ZooKeeper/Common.h>
#include <Common/ZooKeeper/ZooKeeperNodeCache.h>
#include <time.h>
@ -81,7 +82,7 @@ private:
Updater updater;
std::atomic<bool> quit{false};
std::thread thread;
ThreadFromGlobalPool thread;
/// Locked inside reloadIfNewer.
std::mutex reload_mutex;

View File

@ -2,6 +2,7 @@
#include "CurrentThread.h"
#include <common/logger_useful.h>
#include <common/likely.h>
#include <Common/ThreadStatus.h>
#include <Common/TaskStatsInfoGetter.h>
#include <Interpreters/ProcessList.h>
@ -10,11 +11,6 @@
#include <Poco/Logger.h>
#if defined(ARCADIA_ROOT)
# include <util/thread/singleton.h>
#endif
namespace DB
{
@ -23,91 +19,62 @@ namespace ErrorCodes
extern const int LOGICAL_ERROR;
}
// Smoker's implementation to avoid thread_local usage: error: undefined symbol: __cxa_thread_atexit
#if defined(ARCADIA_ROOT)
struct ThreadStatusPtrHolder : ThreadStatusPtr
{
ThreadStatusPtrHolder() { ThreadStatusPtr::operator=(ThreadStatus::create()); }
};
struct ThreadScopePtrHolder : CurrentThread::ThreadScopePtr
{
ThreadScopePtrHolder() { CurrentThread::ThreadScopePtr::operator=(std::make_shared<CurrentThread::ThreadScope>()); }
};
# define current_thread (*FastTlsSingleton<ThreadStatusPtrHolder>())
# define current_thread_scope (*FastTlsSingleton<ThreadScopePtrHolder>())
#else
/// Order of current_thread and current_thread_scope matters
thread_local ThreadStatusPtr _current_thread = ThreadStatus::create();
thread_local CurrentThread::ThreadScopePtr _current_thread_scope = std::make_shared<CurrentThread::ThreadScope>();
# define current_thread _current_thread
# define current_thread_scope _current_thread_scope
#endif
void CurrentThread::updatePerformanceCounters()
{
get()->updatePerformanceCounters();
get().updatePerformanceCounters();
}
ThreadStatusPtr CurrentThread::get()
ThreadStatus & CurrentThread::get()
{
#ifndef NDEBUG
if (!current_thread || current_thread.use_count() <= 0)
if (unlikely(!current_thread))
throw Exception("Thread #" + std::to_string(Poco::ThreadNumber::get()) + " status was not initialized", ErrorCodes::LOGICAL_ERROR);
if (Poco::ThreadNumber::get() != current_thread->thread_number)
throw Exception("Current thread has different thread number", ErrorCodes::LOGICAL_ERROR);
#endif
return current_thread;
}
CurrentThread::ThreadScopePtr CurrentThread::getScope()
{
return current_thread_scope;
return *current_thread;
}
ProfileEvents::Counters & CurrentThread::getProfileEvents()
{
return current_thread->performance_counters;
return current_thread ? get().performance_counters : ProfileEvents::global_counters;
}
MemoryTracker & CurrentThread::getMemoryTracker()
{
return current_thread->memory_tracker;
return get().memory_tracker;
}
void CurrentThread::updateProgressIn(const Progress & value)
{
current_thread->progress_in.incrementPiecewiseAtomically(value);
get().progress_in.incrementPiecewiseAtomically(value);
}
void CurrentThread::updateProgressOut(const Progress & value)
{
current_thread->progress_out.incrementPiecewiseAtomically(value);
get().progress_out.incrementPiecewiseAtomically(value);
}
void CurrentThread::attachInternalTextLogsQueue(const std::shared_ptr<InternalTextLogsQueue> & logs_queue)
{
get()->attachInternalTextLogsQueue(logs_queue);
get().attachInternalTextLogsQueue(logs_queue);
}
std::shared_ptr<InternalTextLogsQueue> CurrentThread::getInternalTextLogsQueue()
{
/// NOTE: this method could be called at early server startup stage
/// NOTE: this method could be called in ThreadStatus destructor, therefore we make use_count() check just in case
if (!current_thread || current_thread.use_count() <= 0)
if (!current_thread)
return nullptr;
if (current_thread->getCurrentState() == ThreadStatus::ThreadState::Died)
if (get().getCurrentState() == ThreadStatus::ThreadState::Died)
return nullptr;
return current_thread->getInternalTextLogsQueue();
return get().getInternalTextLogsQueue();
}
ThreadGroupStatusPtr CurrentThread::getGroup()
{
return get()->getThreadGroup();
if (!current_thread)
return nullptr;
return get().getThreadGroup();
}
}

View File

@ -32,7 +32,7 @@ class CurrentThread
{
public:
/// Handler to current thread
static ThreadStatusPtr get();
static ThreadStatus & get();
/// Group to which belongs current thread
static ThreadGroupStatusPtr getGroup();
@ -85,25 +85,6 @@ public:
bool log_peak_memory_usage_in_destructor = true;
};
/// Implicitly finalizes current thread in the destructor
class ThreadScope
{
public:
void (*deleter)() = nullptr;
ThreadScope() = default;
~ThreadScope()
{
if (deleter)
deleter();
/// std::terminate on exception: this is Ok.
}
};
using ThreadScopePtr = std::shared_ptr<ThreadScope>;
static ThreadScopePtr getScope();
private:
static void defaultThreadDeleter();
};

View File

@ -408,6 +408,12 @@ namespace ErrorCodes
extern const int ILLEGAL_SYNTAX_FOR_CODEC_TYPE = 431;
extern const int UNKNOWN_CODEC = 432;
extern const int ILLEGAL_CODEC_PARAMETER = 433;
extern const int CANNOT_PARSE_PROTOBUF_SCHEMA = 434;
extern const int NO_DATA_FOR_REQUIRED_PROTOBUF_FIELD = 435;
extern const int CANNOT_CONVERT_TO_PROTOBUF_TYPE = 436;
extern const int PROTOBUF_FIELD_NOT_REPEATED = 437;
extern const int DATA_TYPE_CANNOT_BE_PROMOTED = 438;
extern const int CANNOT_SCHEDULE_TASK = 439;
extern const int KEEPER_EXCEPTION = 999;
extern const int POCO_EXCEPTION = 1000;

View File

@ -45,10 +45,10 @@ static int sigtimedwait(const sigset_t *set, siginfo_t *info, const struct times
/** As long as there exists an object of this class - it blocks the INT signal, at the same time it lets you know if it came.
  * This is necessary so that you can interrupt the execution of the request with Ctrl+C.
  * Use only one instance of this class at a time.
  * If `check` method returns true (the signal has arrived), the next call will wait for the next signal.
  */
* This is necessary so that you can interrupt the execution of the request with Ctrl+C.
* Use only one instance of this class at a time.
* If `check` method returns true (the signal has arrived), the next call will wait for the next signal.
*/
class InterruptListener
{
private:

View File

@ -190,17 +190,20 @@ namespace CurrentMemoryTracker
{
void alloc(Int64 size)
{
DB::CurrentThread::getMemoryTracker().alloc(size);
if (DB::current_thread)
DB::CurrentThread::getMemoryTracker().alloc(size);
}
void realloc(Int64 old_size, Int64 new_size)
{
DB::CurrentThread::getMemoryTracker().alloc(new_size - old_size);
if (DB::current_thread)
DB::CurrentThread::getMemoryTracker().alloc(new_size - old_size);
}
void free(Int64 size)
{
DB::CurrentThread::getMemoryTracker().free(size);
if (DB::current_thread)
DB::CurrentThread::getMemoryTracker().free(size);
}
}

View File

@ -79,7 +79,7 @@ private:
constexpr uint64_t nextAlphaSize(uint64_t x)
{
constexpr uint64_t ALPHA_MAP_ELEMENTS_PER_COUNTER = 6;
return 1ULL<<(sizeof(uint64_t) * 8 - __builtin_clzll(x * ALPHA_MAP_ELEMENTS_PER_COUNTER));
return 1ULL << (sizeof(uint64_t) * 8 - __builtin_clzll(x * ALPHA_MAP_ELEMENTS_PER_COUNTER));
}
public:

View File

@ -0,0 +1,235 @@
#include <Common/ThreadPool.h>
#include <Common/Exception.h>
#include <iostream>
#include <type_traits>
namespace DB
{
namespace ErrorCodes
{
extern const int CANNOT_SCHEDULE_TASK;
}
}
template <typename Thread>
ThreadPoolImpl<Thread>::ThreadPoolImpl(size_t max_threads)
: ThreadPoolImpl(max_threads, max_threads, max_threads)
{
}
template <typename Thread>
ThreadPoolImpl<Thread>::ThreadPoolImpl(size_t max_threads, size_t max_free_threads, size_t queue_size)
: max_threads(max_threads), max_free_threads(max_free_threads), queue_size(queue_size)
{
}
template <typename Thread>
template <typename ReturnType>
ReturnType ThreadPoolImpl<Thread>::scheduleImpl(Job job, int priority, std::optional<uint64_t> wait_microseconds)
{
auto on_error = []
{
if constexpr (std::is_same_v<ReturnType, void>)
throw DB::Exception("Cannot schedule a task", DB::ErrorCodes::CANNOT_SCHEDULE_TASK);
else
return false;
};
{
std::unique_lock lock(mutex);
auto pred = [this] { return !queue_size || scheduled_jobs < queue_size || shutdown; };
if (wait_microseconds)
{
if (!job_finished.wait_for(lock, std::chrono::microseconds(*wait_microseconds), pred))
return on_error();
}
else
job_finished.wait(lock, pred);
if (shutdown)
return on_error();
jobs.emplace(std::move(job), priority);
++scheduled_jobs;
if (threads.size() < std::min(max_threads, scheduled_jobs))
{
threads.emplace_front();
try
{
threads.front() = Thread([this, it = threads.begin()] { worker(it); });
}
catch (...)
{
threads.pop_front();
}
}
}
new_job_or_shutdown.notify_one();
return ReturnType(true);
}
template <typename Thread>
void ThreadPoolImpl<Thread>::schedule(Job job, int priority)
{
scheduleImpl<void>(std::move(job), priority, std::nullopt);
}
template <typename Thread>
bool ThreadPoolImpl<Thread>::trySchedule(Job job, int priority, uint64_t wait_microseconds)
{
return scheduleImpl<bool>(std::move(job), priority, wait_microseconds);
}
template <typename Thread>
void ThreadPoolImpl<Thread>::scheduleOrThrow(Job job, int priority, uint64_t wait_microseconds)
{
scheduleImpl<void>(std::move(job), priority, wait_microseconds);
}
template <typename Thread>
void ThreadPoolImpl<Thread>::wait()
{
{
std::unique_lock lock(mutex);
job_finished.wait(lock, [this] { return scheduled_jobs == 0; });
if (first_exception)
{
std::exception_ptr exception;
std::swap(exception, first_exception);
std::rethrow_exception(exception);
}
}
}
template <typename Thread>
ThreadPoolImpl<Thread>::~ThreadPoolImpl()
{
finalize();
}
template <typename Thread>
void ThreadPoolImpl<Thread>::finalize()
{
{
std::unique_lock lock(mutex);
shutdown = true;
}
new_job_or_shutdown.notify_all();
for (auto & thread : threads)
thread.join();
threads.clear();
}
template <typename Thread>
size_t ThreadPoolImpl<Thread>::active() const
{
std::unique_lock lock(mutex);
return scheduled_jobs;
}
template <typename Thread>
void ThreadPoolImpl<Thread>::worker(typename std::list<Thread>::iterator thread_it)
{
while (true)
{
Job job;
bool need_shutdown = false;
{
std::unique_lock lock(mutex);
new_job_or_shutdown.wait(lock, [this] { return shutdown || !jobs.empty(); });
need_shutdown = shutdown;
if (!jobs.empty())
{
job = jobs.top().job;
jobs.pop();
}
else
{
return;
}
}
if (!need_shutdown)
{
try
{
job();
}
catch (...)
{
{
std::unique_lock lock(mutex);
if (!first_exception)
first_exception = std::current_exception();
shutdown = true;
--scheduled_jobs;
}
job_finished.notify_all();
new_job_or_shutdown.notify_all();
return;
}
}
{
std::unique_lock lock(mutex);
--scheduled_jobs;
if (threads.size() > scheduled_jobs + max_free_threads)
{
threads.erase(thread_it);
job_finished.notify_all();
return;
}
}
job_finished.notify_all();
}
}
template class ThreadPoolImpl<std::thread>;
template class ThreadPoolImpl<ThreadFromGlobalPool>;
void ExceptionHandler::setException(std::exception_ptr && exception)
{
std::unique_lock lock(mutex);
if (!first_exception)
first_exception = std::move(exception);
}
void ExceptionHandler::throwIfException()
{
std::unique_lock lock(mutex);
if (first_exception)
std::rethrow_exception(first_exception);
}
ThreadPool::Job createExceptionHandledJob(ThreadPool::Job job, ExceptionHandler & handler)
{
return [job{std::move(job)}, &handler] ()
{
try
{
job();
}
catch (...)
{
handler.setException(std::current_exception());
}
};
}

View File

@ -0,0 +1,203 @@
#pragma once
#include <cstdint>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <functional>
#include <queue>
#include <list>
#include <optional>
#include <ext/singleton.h>
#include <Common/ThreadStatus.h>
/** Very simple thread pool similar to boost::threadpool.
* Advantages:
* - catches exceptions and rethrows on wait.
*
* This thread pool can be used as a task queue.
* For example, you can create a thread pool with 10 threads (and queue of size 10) and schedule 1000 tasks
* - in this case you will be blocked to keep 10 tasks in fly.
*
* Thread: std::thread or something with identical interface.
*/
template <typename Thread>
class ThreadPoolImpl
{
public:
using Job = std::function<void()>;
/// Size is constant. Up to num_threads are created on demand and then run until shutdown.
explicit ThreadPoolImpl(size_t max_threads);
/// queue_size - maximum number of running plus scheduled jobs. It can be greater than max_threads. Zero means unlimited.
ThreadPoolImpl(size_t max_threads, size_t max_free_threads, size_t queue_size);
/// Add new job. Locks until number of scheduled jobs is less than maximum or exception in one of threads was thrown.
/// If an exception in some thread was thrown, method silently returns, and exception will be rethrown only on call to 'wait' function.
/// Priority: greater is higher.
void schedule(Job job, int priority = 0);
/// Wait for specified amount of time and schedule a job or return false.
bool trySchedule(Job job, int priority = 0, uint64_t wait_microseconds = 0);
/// Wait for specified amount of time and schedule a job or throw an exception.
void scheduleOrThrow(Job job, int priority = 0, uint64_t wait_microseconds = 0);
/// Wait for all currently active jobs to be done.
/// You may call schedule and wait many times in arbitary order.
/// If any thread was throw an exception, first exception will be rethrown from this method,
/// and exception will be cleared.
void wait();
/// Waits for all threads. Doesn't rethrow exceptions (use 'wait' method to rethrow exceptions).
/// You should not destroy object while calling schedule or wait methods from another threads.
~ThreadPoolImpl();
/// Returns number of running and scheduled jobs.
size_t active() const;
private:
mutable std::mutex mutex;
std::condition_variable job_finished;
std::condition_variable new_job_or_shutdown;
const size_t max_threads;
const size_t max_free_threads;
const size_t queue_size;
size_t scheduled_jobs = 0;
bool shutdown = false;
struct JobWithPriority
{
Job job;
int priority;
JobWithPriority(Job job, int priority)
: job(job), priority(priority) {}
bool operator< (const JobWithPriority & rhs) const
{
return priority < rhs.priority;
}
};
std::priority_queue<JobWithPriority> jobs;
std::list<Thread> threads;
std::exception_ptr first_exception;
template <typename ReturnType>
ReturnType scheduleImpl(Job job, int priority, std::optional<uint64_t> wait_microseconds);
void worker(typename std::list<Thread>::iterator thread_it);
void finalize();
};
/// ThreadPool with std::thread for threads.
using FreeThreadPool = ThreadPoolImpl<std::thread>;
/** Global ThreadPool that can be used as a singleton.
* Why it is needed?
*
* Linux can create and destroy about 100 000 threads per second (quite good).
* With simple ThreadPool (based on mutex and condvar) you can assign about 200 000 tasks per second
* - not much difference comparing to not using a thread pool at all.
*
* But if you reuse OS threads instead of creating and destroying them, several benefits exist:
* - allocator performance will usually be better due to reuse of thread local caches, especially for jemalloc:
* https://github.com/jemalloc/jemalloc/issues/1347
* - address sanitizer and thread sanitizer will not fail due to global limit on number of created threads.
* - program will work faster in gdb;
*/
class GlobalThreadPool : public FreeThreadPool, public ext::singleton<GlobalThreadPool>
{
public:
GlobalThreadPool() : FreeThreadPool(10000, 1000, 10000) {}
};
/** Looks like std::thread but allocates threads in GlobalThreadPool.
* Also holds ThreadStatus for ClickHouse.
*/
class ThreadFromGlobalPool
{
public:
ThreadFromGlobalPool() {}
template <typename Function, typename... Args>
explicit ThreadFromGlobalPool(Function && func, Args &&... args)
{
mutex = std::make_unique<std::mutex>();
/// The function object must be copyable, so we wrap lock_guard in shared_ptr.
GlobalThreadPool::instance().scheduleOrThrow([
lock = std::make_shared<std::lock_guard<std::mutex>>(*mutex),
func = std::forward<Function>(func),
args = std::make_tuple(std::forward<Args>(args)...)]
{
DB::ThreadStatus thread_status;
std::apply(func, args);
});
}
ThreadFromGlobalPool(ThreadFromGlobalPool && rhs)
{
*this = std::move(rhs);
}
ThreadFromGlobalPool & operator=(ThreadFromGlobalPool && rhs)
{
if (mutex)
std::terminate();
mutex = std::move(rhs.mutex);
return *this;
}
~ThreadFromGlobalPool()
{
if (mutex)
std::terminate();
}
void join()
{
{
std::lock_guard lock(*mutex);
}
mutex.reset();
}
bool joinable() const
{
return static_cast<bool>(mutex);
}
private:
std::unique_ptr<std::mutex> mutex; /// Object must be moveable.
};
/// Recommended thread pool for the case when multiple thread pools are created and destroyed.
using ThreadPool = ThreadPoolImpl<ThreadFromGlobalPool>;
/// Allows to save first catched exception in jobs and postpone its rethrow.
class ExceptionHandler
{
public:
void setException(std::exception_ptr && exception);
void throwIfException();
private:
std::exception_ptr first_exception;
std::mutex mutex;
};
ThreadPool::Job createExceptionHandledJob(ThreadPool::Job job, ExceptionHandler & handler);

View File

@ -21,10 +21,13 @@ namespace ErrorCodes
}
thread_local ThreadStatusPtr current_thread = nullptr;
TasksStatsCounters TasksStatsCounters::current()
{
TasksStatsCounters res;
CurrentThread::get()->taskstats_getter->getStat(res.stat, CurrentThread::get()->os_thread_id);
CurrentThread::get().taskstats_getter->getStat(res.stat, CurrentThread::get().os_thread_id);
return res;
}
@ -39,17 +42,19 @@ ThreadStatus::ThreadStatus()
memory_tracker.setDescription("(for thread)");
log = &Poco::Logger::get("ThreadStatus");
current_thread = this;
/// NOTE: It is important not to do any non-trivial actions (like updating ProfileEvents or logging) before ThreadStatus is created
/// Otherwise it could lead to SIGSEGV due to current_thread dereferencing
}
ThreadStatusPtr ThreadStatus::create()
ThreadStatus::~ThreadStatus()
{
return ThreadStatusPtr(new ThreadStatus);
if (deleter)
deleter();
current_thread = nullptr;
}
ThreadStatus::~ThreadStatus() = default;
void ThreadStatus::initPerformanceCounters()
{
performance_counters_finalized = false;

View File

@ -9,6 +9,8 @@
#include <map>
#include <mutex>
#include <shared_mutex>
#include <functional>
#include <boost/noncopyable.hpp>
namespace Poco
@ -23,7 +25,7 @@ namespace DB
class Context;
class QueryStatus;
class ThreadStatus;
using ThreadStatusPtr = std::shared_ptr<ThreadStatus>;
using ThreadStatusPtr = ThreadStatus*;
class QueryThreadLog;
struct TasksStatsCounters;
struct RUsageCounters;
@ -67,14 +69,20 @@ public:
using ThreadGroupStatusPtr = std::shared_ptr<ThreadGroupStatus>;
extern thread_local ThreadStatusPtr current_thread;
/** Encapsulates all per-thread info (ProfileEvents, MemoryTracker, query_id, query context, etc.).
* Used inside thread-local variable. See variables in CurrentThread.cpp
* The object must be created in thread function and destroyed in the same thread before the exit.
* It is accessed through thread-local pointer.
*
* This object should be used only via "CurrentThread", see CurrentThread.h
*/
class ThreadStatus : public std::enable_shared_from_this<ThreadStatus>
class ThreadStatus : public boost::noncopyable
{
public:
ThreadStatus();
~ThreadStatus();
/// Poco's thread number (the same number is used in logs)
UInt32 thread_number = 0;
/// Linux's PID (or TGID) (the same id is shown by ps util)
@ -88,8 +96,8 @@ public:
Progress progress_in;
Progress progress_out;
public:
static ThreadStatusPtr create();
using Deleter = std::function<void()>;
Deleter deleter;
ThreadGroupStatusPtr getThreadGroup() const
{
@ -136,11 +144,7 @@ public:
/// Detaches thread from the thread group and the query, dumps performance counters if they have not been dumped
void detachQuery(bool exit_if_already_detached = false, bool thread_exits = false);
~ThreadStatus();
protected:
ThreadStatus();
void initPerformanceCounters();
void logToQueryThreadLog(QueryThreadLog & thread_log);

View File

@ -4,7 +4,7 @@ add_headers_and_sources(clickhouse_common_zookeeper .)
add_library(clickhouse_common_zookeeper ${LINK_MODE} ${clickhouse_common_zookeeper_headers} ${clickhouse_common_zookeeper_sources})
target_link_libraries (clickhouse_common_zookeeper PUBLIC clickhouse_common_io common PRIVATE string_utils PUBLIC ${Poco_Util_LIBRARY})
target_link_libraries (clickhouse_common_zookeeper PUBLIC clickhouse_common_io common PRIVATE string_utils PUBLIC ${Poco_Util_LIBRARY} Threads::Threads)
target_include_directories(clickhouse_common_zookeeper PUBLIC ${DBMS_INCLUDE_DIR})
if (ENABLE_TESTS)

View File

@ -853,8 +853,8 @@ ZooKeeper::ZooKeeper(
if (!auth_scheme.empty())
sendAuth(auth_scheme, auth_data);
send_thread = std::thread([this] { sendThread(); });
receive_thread = std::thread([this] { receiveThread(); });
send_thread = ThreadFromGlobalPool([this] { sendThread(); });
receive_thread = ThreadFromGlobalPool([this] { receiveThread(); });
ProfileEvents::increment(ProfileEvents::ZooKeeperInit);
}

View File

@ -3,6 +3,7 @@
#include <Core/Types.h>
#include <Common/ConcurrentBoundedQueue.h>
#include <Common/CurrentMetrics.h>
#include <Common/ThreadPool.h>
#include <Common/ZooKeeper/IKeeper.h>
#include <IO/ReadBuffer.h>
@ -209,8 +210,8 @@ private:
Watches watches;
std::mutex watches_mutex;
std::thread send_thread;
std::thread receive_thread;
ThreadFromGlobalPool send_thread;
ThreadFromGlobalPool receive_thread;
void connect(
const Addresses & addresses,

View File

@ -17,6 +17,9 @@
#cmakedefine01 USE_HDFS
#cmakedefine01 USE_XXHASH
#cmakedefine01 USE_INTERNAL_LLVM_LIBRARY
#cmakedefine01 USE_PROTOBUF
#cmakedefine01 USE_CPUID
#cmakedefine01 USE_CPUINFO
#cmakedefine01 CLICKHOUSE_SPLIT_BINARY
#cmakedefine01 LLVM_HAS_RTTI

View File

@ -1,19 +1,20 @@
#include <Common/getNumberOfPhysicalCPUCores.h>
#include <thread>
#if defined(__x86_64__)
#include <libcpuid/libcpuid.h>
#include <Common/Exception.h>
#include <Common/config.h>
#if USE_CPUID
# include <libcpuid/libcpuid.h>
# include <Common/Exception.h>
namespace DB { namespace ErrorCodes { extern const int CPUID_ERROR; }}
#elif USE_CPUINFO
# include <cpuinfo.h>
#endif
unsigned getNumberOfPhysicalCPUCores()
{
#if defined(__x86_64__)
#if USE_CPUID
cpu_raw_data_t raw_data;
if (0 != cpuid_get_raw_data(&raw_data))
throw DB::Exception("Cannot cpuid_get_raw_data: " + std::string(cpuid_error()), DB::ErrorCodes::CPUID_ERROR);
@ -37,6 +38,13 @@ unsigned getNumberOfPhysicalCPUCores()
if (res != 0)
return res;
#elif USE_CPUINFO
uint32_t cores = 0;
if (cpuinfo_initialize())
cores = cpuinfo_get_cores_count();
if (cores)
return cores;
#endif
/// As a fallback (also for non-x86 architectures) assume there are no hyper-threading on the system.

View File

@ -53,6 +53,13 @@ target_link_libraries (thread_creation_latency PRIVATE clickhouse_common_io)
add_executable (thread_pool thread_pool.cpp)
target_link_libraries (thread_pool PRIVATE clickhouse_common_io)
add_executable (thread_pool_2 thread_pool_2.cpp)
target_link_libraries (thread_pool_2 PRIVATE clickhouse_common_io)
add_executable (multi_version multi_version.cpp)
target_link_libraries (multi_version PRIVATE clickhouse_common_io)
add_check(multi_version)
add_executable (array_cache array_cache.cpp)
target_link_libraries (array_cache PRIVATE clickhouse_common_io)

View File

@ -8,7 +8,7 @@
#include <Common/RWLock.h>
#include <Common/Stopwatch.h>
#include <common/Types.h>
#include <common/ThreadPool.h>
#include <Common/ThreadPool.h>
#include <random>
#include <pcg_random.hpp>
#include <thread>

View File

@ -1,8 +1,8 @@
#include <string.h>
#include <iostream>
#include <common/ThreadPool.h>
#include <Common/ThreadPool.h>
#include <functional>
#include <common/MultiVersion.h>
#include <Common/MultiVersion.h>
#include <Poco/Exception.h>
@ -23,7 +23,7 @@ void thread2(MV & x, const char * result)
}
int main(int argc, char ** argv)
int main(int, char **)
{
try
{

View File

@ -16,7 +16,7 @@
#include <Compression/CompressedReadBuffer.h>
#include <Common/Stopwatch.h>
#include <common/ThreadPool.h>
#include <Common/ThreadPool.h>
using Key = UInt64;

View File

@ -16,7 +16,7 @@
#include <Compression/CompressedReadBuffer.h>
#include <Common/Stopwatch.h>
#include <common/ThreadPool.h>
#include <Common/ThreadPool.h>
using Key = UInt64;

View File

@ -5,7 +5,7 @@
#include <Common/Stopwatch.h>
#include <Common/Exception.h>
#include <common/ThreadPool.h>
#include <Common/ThreadPool.h>
int x = 0;

View File

@ -1,4 +1,4 @@
#include <common/ThreadPool.h>
#include <Common/ThreadPool.h>
/** Reproduces bug in ThreadPool.
* It get stuck if we call 'wait' many times from many other threads simultaneously.

View File

@ -0,0 +1,21 @@
#include <atomic>
#include <iostream>
#include <Common/ThreadPool.h>
int main(int, char **)
{
std::atomic<size_t> res{0};
for (size_t i = 0; i < 1000; ++i)
{
size_t threads = 16;
ThreadPool pool(threads);
for (size_t j = 0; j < threads; ++j)
pool.schedule([&]{ ++res; });
pool.wait();
}
std::cerr << res << "\n";
return 0;
}

View File

@ -161,9 +161,9 @@ BackgroundSchedulePool::BackgroundSchedulePool(size_t size)
threads.resize(size);
for (auto & thread : threads)
thread = std::thread([this] { threadFunction(); });
thread = ThreadFromGlobalPool([this] { threadFunction(); });
delayed_thread = std::thread([this] { delayExecutionThreadFunction(); });
delayed_thread = ThreadFromGlobalPool([this] { delayExecutionThreadFunction(); });
}
@ -181,7 +181,7 @@ BackgroundSchedulePool::~BackgroundSchedulePool()
delayed_thread.join();
LOG_TRACE(&Logger::get("BackgroundSchedulePool"), "Waiting for threads to finish.");
for (std::thread & thread : threads)
for (auto & thread : threads)
thread.join();
}
catch (...)

View File

@ -13,6 +13,8 @@
#include <boost/noncopyable.hpp>
#include <Common/ZooKeeper/Types.h>
#include <Common/CurrentThread.h>
#include <Common/ThreadPool.h>
namespace DB
{
@ -119,7 +121,7 @@ public:
~BackgroundSchedulePool();
private:
using Threads = std::vector<std::thread>;
using Threads = std::vector<ThreadFromGlobalPool>;
void threadFunction();
void delayExecutionThreadFunction();
@ -141,7 +143,7 @@ private:
std::condition_variable wakeup_cond;
std::mutex delayed_tasks_mutex;
/// Thread waiting for next delayed task.
std::thread delayed_thread;
ThreadFromGlobalPool delayed_thread;
/// Tasks ordered by scheduled time.
DelayedTasks delayed_tasks;

View File

@ -427,6 +427,18 @@ Names Block::getNames() const
}
DataTypes Block::getDataTypes() const
{
DataTypes res;
res.reserve(columns());
for (const auto & elem : data)
res.push_back(elem.type);
return res;
}
template <typename ReturnType>
static ReturnType checkBlockStructure(const Block & lhs, const Block & rhs, const std::string & context_description)
{

View File

@ -82,6 +82,7 @@ public:
const ColumnsWithTypeAndName & getColumnsWithTypeAndName() const;
NamesAndTypesList getNamesAndTypesList() const;
Names getNames() const;
DataTypes getDataTypes() const;
/// Returns number of rows from first column in block, not equal to nullptr. If no columns, returns 0.
size_t rows() const;

View File

@ -166,3 +166,20 @@ template <> constexpr bool IsDecimalNumber<Decimal64> = true;
template <> constexpr bool IsDecimalNumber<Decimal128> = true;
}
/// Specialization of `std::hash` for the Decimal<T> types.
namespace std
{
template <typename T>
struct hash<DB::Decimal<T>> { size_t operator()(const DB::Decimal<T> & x) const { return hash<T>()(x.value); } };
template <>
struct hash<DB::Decimal128>
{
size_t operator()(const DB::Decimal128 & x) const
{
return std::hash<DB::Int64>()(x.value >> 64)
^ std::hash<DB::Int64>()(x.value & std::numeric_limits<DB::UInt64>::max());
}
};
}

View File

@ -35,7 +35,7 @@ void AsynchronousBlockInputStream::next()
{
ready.reset();
pool.schedule([this, thread_group=CurrentThread::getGroup()] ()
pool.schedule([this, thread_group = CurrentThread::getGroup()] ()
{
CurrentMetrics::Increment metric_increment{CurrentMetrics::QueryThread};

View File

@ -5,7 +5,7 @@
#include <DataStreams/IBlockInputStream.h>
#include <Common/setThreadName.h>
#include <Common/CurrentMetrics.h>
#include <common/ThreadPool.h>
#include <Common/ThreadPool.h>
#include <Common/MemoryTracker.h>
#include <Poco/Ext/ThreadNumber.h>

View File

@ -195,7 +195,7 @@ void MergingAggregatedMemoryEfficientBlockInputStream::start()
*/
for (size_t i = 0; i < merging_threads; ++i)
pool.schedule([this, thread_group=CurrentThread::getGroup()] () { mergeThread(thread_group); });
pool.schedule([this, thread_group = CurrentThread::getGroup()] () { mergeThread(thread_group); });
}
}

View File

@ -4,7 +4,7 @@
#include <DataStreams/IBlockInputStream.h>
#include <Common/ConcurrentBoundedQueue.h>
#include <Common/CurrentThread.h>
#include <common/ThreadPool.h>
#include <Common/ThreadPool.h>
#include <condition_variable>

View File

@ -13,6 +13,7 @@
#include <Common/CurrentMetrics.h>
#include <Common/MemoryTracker.h>
#include <Common/CurrentThread.h>
#include <Common/ThreadPool.h>
/** Allows to process multiple block input streams (sources) in parallel, using specified number of threads.
@ -303,8 +304,8 @@ private:
Handler & handler;
/// Streams.
using ThreadsData = std::vector<std::thread>;
/// Threads.
using ThreadsData = std::vector<ThreadFromGlobalPool>;
ThreadsData threads;
/** A set of available sources that are not currently processed by any thread.

View File

@ -5,7 +5,7 @@
#include <Common/CurrentThread.h>
#include <Common/setThreadName.h>
#include <Common/getNumberOfPhysicalCPUCores.h>
#include <common/ThreadPool.h>
#include <Common/ThreadPool.h>
#include <Storages/MergeTree/ReplicatedMergeTreeBlockOutputStream.h>
namespace DB

View File

@ -181,7 +181,7 @@ SummingSortedBlockInputStream::SummingSortedBlockInputStream(
if (map_desc.key_col_nums.size() == 1)
{
// Create summation for all value columns in the map
desc.init("sumMap", argument_types);
desc.init("sumMapWithOverflow", argument_types);
columns_to_aggregate.emplace_back(std::move(desc));
}
else
@ -220,7 +220,7 @@ void SummingSortedBlockInputStream::insertCurrentRowIfNeeded(MutableColumns & me
}
else
{
/// It is sumMap aggregate function.
/// It is sumMapWithOverflow aggregate function.
/// Assume that the row isn't empty in this case (just because it is compatible with previous version)
current_row_is_zero = false;
}

View File

@ -70,7 +70,7 @@ private:
/// Stores aggregation function, state, and columns to be used as function arguments
struct AggregateDescription
{
/// An aggregate function 'sumWithOverflow' or 'sumMap' for summing.
/// An aggregate function 'sumWithOverflow' or 'sumMapWithOverflow' for summing.
AggregateFunctionPtr function;
IAggregateFunction::AddFunc add_function = nullptr;
std::vector<size_t> column_numbers;

View File

@ -9,6 +9,7 @@
#include <Common/AlignedBuffer.h>
#include <Formats/FormatSettings.h>
#include <Formats/ProtobufWriter.h>
#include <DataTypes/DataTypeAggregateFunction.h>
#include <DataTypes/DataTypeFactory.h>
@ -248,6 +249,12 @@ void DataTypeAggregateFunction::deserializeTextCSV(IColumn & column, ReadBuffer
}
void DataTypeAggregateFunction::serializeProtobuf(const IColumn & column, size_t row_num, ProtobufWriter & protobuf) const
{
protobuf.writeAggregateFunction(function, static_cast<const ColumnAggregateFunction &>(column).getData()[row_num]);
}
MutableColumnPtr DataTypeAggregateFunction::createColumn() const
{
return ColumnAggregateFunction::create(function);

View File

@ -56,6 +56,7 @@ public:
void serializeTextXML(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings &) const override;
void serializeTextCSV(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings &) const override;
void deserializeTextCSV(IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const override;
void serializeProtobuf(const IColumn & column, size_t row_num, ProtobufWriter & protobuf) const override;
MutableColumnPtr createColumn() const override;

View File

@ -430,6 +430,18 @@ void DataTypeArray::deserializeTextCSV(IColumn & column, ReadBuffer & istr, cons
}
void DataTypeArray::serializeProtobuf(const IColumn & column, size_t row_num, ProtobufWriter & protobuf) const
{
const ColumnArray & column_array = static_cast<const ColumnArray &>(column);
const ColumnArray::Offsets & offsets = column_array.getOffsets();
size_t offset = offsets[row_num - 1];
size_t next_offset = offsets[row_num];
const IColumn & nested_column = column_array.getData();
for (size_t i = offset; i < next_offset; ++i)
nested->serializeProtobuf(nested_column, i, protobuf);
}
MutableColumnPtr DataTypeArray::createColumn() const
{
return ColumnArray::create(nested->createColumn(), ColumnArray::ColumnOffsets::create());

View File

@ -84,6 +84,10 @@ public:
DeserializeBinaryBulkSettings & settings,
DeserializeBinaryBulkStatePtr & state) const override;
void serializeProtobuf(const IColumn & column,
size_t row_num,
ProtobufWriter & protobuf) const override;
MutableColumnPtr createColumn() const override;
Field getDefault() const override;

View File

@ -4,6 +4,7 @@
#include <Columns/ColumnsNumber.h>
#include <DataTypes/DataTypeDate.h>
#include <DataTypes/DataTypeFactory.h>
#include <Formats/ProtobufWriter.h>
namespace DB
@ -72,6 +73,11 @@ void DataTypeDate::deserializeTextCSV(IColumn & column, ReadBuffer & istr, const
static_cast<ColumnUInt16 &>(column).getData().push_back(value.getDayNum());
}
void DataTypeDate::serializeProtobuf(const IColumn & column, size_t row_num, ProtobufWriter & protobuf) const
{
protobuf.writeDate(DayNum(static_cast<const ColumnUInt16 &>(column).getData()[row_num]));
}
bool DataTypeDate::equals(const IDataType & rhs) const
{
return typeid(rhs) == typeid(*this);

Some files were not shown because too many files have changed in this diff Show More