Merge branch 'master' of github.com:ClickHouse/ClickHouse into poco-file-to-std-fs

kssenii 2021-05-22 21:26:48 +03:00
commit 1c43b333ea
264 changed files with 5957 additions and 1404 deletions

4
.gitmodules vendored

@@ -228,3 +228,7 @@
[submodule "contrib/datasketches-cpp"]
path = contrib/datasketches-cpp
url = https://github.com/ClickHouse-Extras/datasketches-cpp.git
[submodule "contrib/yaml-cpp"]
path = contrib/yaml-cpp
url = https://github.com/ClickHouse-Extras/yaml-cpp.git


@@ -1,3 +1,142 @@
## ClickHouse release 21.5, 2021-05-20
#### Backward Incompatible Change
* Change comparison of integers and floating point numbers when the integer is not exactly representable in the floating point data type. In the new version the comparison returns false because a rounding error occurs. Example: `9223372036854775808.0 != 9223372036854775808`, because the number `9223372036854775808` is not exactly representable as a floating point number (and `9223372036854775808.0` is rounded to `9223372036854776000.0`). In previous versions the comparison reported the numbers as equal, because converting the floating point number `9223372036854776000.0` back to UInt64 yields `9223372036854775808`. For reference, the Python programming language also treats these numbers as equal. But this behaviour was dependent on the CPU model (different results on AMD64 and AArch64 for some out-of-range numbers), so we make the comparison more precise. It will treat int and float numbers as equal only if the int is exactly representable in the floating point type. [#22595](https://github.com/ClickHouse/ClickHouse/pull/22595) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Remove support for `argMin` and `argMax` for single `Tuple` argument. The code was not memory-safe. The feature was added by mistake and it is confusing for people. These functions can be reintroduced under different names later. This fixes [#22384](https://github.com/ClickHouse/ClickHouse/issues/22384) and reverts [#17359](https://github.com/ClickHouse/ClickHouse/issues/17359). [#23393](https://github.com/ClickHouse/ClickHouse/pull/23393) ([alexey-milovidov](https://github.com/alexey-milovidov)).
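
The exact-representability rule in the first item above can be illustrated outside ClickHouse. This is a minimal Python sketch, not ClickHouse code: the function names are hypothetical, and the value `2**53 + 1` is chosen here as an integer that a 64-bit float cannot hold. It contrasts an exact comparison with a lossy one that first converts the integer to a float.

```python
def exactly_representable(n: int) -> bool:
    """True if the integer n survives a round trip through a 64-bit float."""
    return int(float(n)) == n


def compare_exact(i: int, f: float) -> bool:
    """Python compares int and float by their exact mathematical values."""
    return f == i


def compare_via_float(i: int, f: float) -> bool:
    """Lossy comparison: the integer is rounded to a float first."""
    return float(i) == f


big = 2**53 + 1            # not representable: a double has a 53-bit significand
nearest = float(2**53)     # the float that `big` rounds to

print(exactly_representable(2**53))     # True
print(exactly_representable(big))       # False
print(compare_exact(big, nearest))      # False: the values really differ
print(compare_via_float(big, nearest))  # True: rounding hides the difference
```

The lossy variant only shows how rounding can make distinct values look equal; it is not a reconstruction of the previous ClickHouse implementation.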
#### New Feature
* Added functions `dictGetChildren(dictionary, key)`, `dictGetDescendants(dictionary, key, level)`. Function `dictGetChildren` returns all children as an array of indexes. It is an inverse transformation for `dictGetHierarchy`. Function `dictGetDescendants` returns all descendants as if `dictGetChildren` was applied `level` times recursively. Zero `level` value is equivalent to infinity. Improved performance of `dictGetHierarchy`, `dictIsIn` functions. Closes [#14656](https://github.com/ClickHouse/ClickHouse/issues/14656). [#22096](https://github.com/ClickHouse/ClickHouse/pull/22096) ([Maksim Kita](https://github.com/kitaisreal)).
* Added function `dictGetOrNull`. It works like `dictGet`, but returns `Null` if the key was not found in the dictionary. Closes [#22375](https://github.com/ClickHouse/ClickHouse/issues/22375). [#22413](https://github.com/ClickHouse/ClickHouse/pull/22413) ([Maksim Kita](https://github.com/kitaisreal)).
* Added a table function `s3Cluster`, which allows processing files from `s3` in parallel on every node of a specified cluster. [#22012](https://github.com/ClickHouse/ClickHouse/pull/22012) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Added support for replicas and shards in MySQL/PostgreSQL table engine / table function. You can write `SELECT * FROM mysql('host{1,2}-{1|2}', ...)`. Closes [#20969](https://github.com/ClickHouse/ClickHouse/issues/20969). [#22217](https://github.com/ClickHouse/ClickHouse/pull/22217) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Added `ALTER TABLE ... FETCH PART ...` query. It's similar to `FETCH PARTITION`, but fetches only one part. [#22706](https://github.com/ClickHouse/ClickHouse/pull/22706) ([turbo jason](https://github.com/songenjie)).
* Added a setting `max_distributed_depth` that limits the depth of recursive queries to `Distributed` tables. Closes [#20229](https://github.com/ClickHouse/ClickHouse/issues/20229). [#21942](https://github.com/ClickHouse/ClickHouse/pull/21942) ([flynn](https://github.com/ucasFL)).
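
The semantics described for `dictGetChildren` and `dictGetDescendants` can be modelled with a plain parent map. This is a hedged Python sketch with hypothetical helper names and sample data, not the ClickHouse implementation. It assumes an acyclic hierarchy and reads "applied `level` times recursively" as "return the keys reached after exactly `level` steps", with `level = 0` meaning all descendants.

```python
from collections import defaultdict
from typing import Dict, List


def build_children(parent_of: Dict[int, int]) -> Dict[int, List[int]]:
    """Invert a child -> parent map into a parent -> children map."""
    children: Dict[int, List[int]] = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)
    return children


def get_children(children: Dict[int, List[int]], key: int) -> List[int]:
    """Direct children of key (empty list if there are none)."""
    return list(children.get(key, []))


def get_descendants(children: Dict[int, List[int]], key: int, level: int = 0) -> List[int]:
    """Apply get_children `level` times; level == 0 collects every level."""
    frontier = [key]
    if level > 0:
        for _ in range(level):
            frontier = [c for k in frontier for c in children.get(k, [])]
        return frontier
    collected: List[int] = []
    while frontier:
        frontier = [c for k in frontier for c in children.get(k, [])]
        collected.extend(frontier)
    return collected


# Hypothetical hierarchy: 1 is a child of 0, 2 and 3 of 1, 4 of 2.
tree = build_children({1: 0, 2: 1, 3: 1, 4: 2})
print(get_children(tree, 1))        # [2, 3]
print(get_descendants(tree, 0))     # [1, 2, 3, 4]
print(get_descendants(tree, 0, 2))  # [2, 3]
```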
#### Performance Improvement
* Improved performance of `intDiv` by dynamic dispatch for AVX2. This closes [#22314](https://github.com/ClickHouse/ClickHouse/issues/22314). [#23000](https://github.com/ClickHouse/ClickHouse/pull/23000) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Improved performance of reading from `ArrowStream` input format for sources other than local file (e.g. URL). [#22673](https://github.com/ClickHouse/ClickHouse/pull/22673) ([nvartolomei](https://github.com/nvartolomei)).
* Disabled compression by default when interacting with localhost (with clickhouse-client or server to server with distributed queries) via native protocol. It may improve performance of some import/export operations. This closes [#22234](https://github.com/ClickHouse/ClickHouse/issues/22234). [#22237](https://github.com/ClickHouse/ClickHouse/pull/22237) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Exclude values that do not belong to the shard from the right part of the IN section for distributed queries (under `optimize_skip_unused_shards_rewrite_in`, enabled by default, since it still requires `optimize_skip_unused_shards`). [#21511](https://github.com/ClickHouse/ClickHouse/pull/21511) ([Azat Khuzhin](https://github.com/azat)).
* Improved performance of reading a subset of columns with File-like table engine and column-oriented format like Parquet, Arrow or ORC. This closes [#20129](https://github.com/ClickHouse/ClickHouse/issues/20129). [#21302](https://github.com/ClickHouse/ClickHouse/pull/21302) ([keenwolf](https://github.com/keen-wolf)).
* Allow to move more conditions to `PREWHERE` as it was before version 21.1 (adjustment of internal heuristics). Insufficient number of moved conditions could lead to worse performance. [#23397](https://github.com/ClickHouse/ClickHouse/pull/23397) ([Anton Popov](https://github.com/CurtizJ)).
* Improved performance of ODBC connections and fixed all the outstanding issues from the backlog. Using `nanodbc` library instead of `Poco::ODBC`. Closes [#9678](https://github.com/ClickHouse/ClickHouse/issues/9678). Add support for DateTime64 and Decimal* for ODBC table engine. Closes [#21961](https://github.com/ClickHouse/ClickHouse/issues/21961). Fixed issue with cyrillic text being truncated. Closes [#16246](https://github.com/ClickHouse/ClickHouse/issues/16246). Added connection pools for odbc bridge. [#21972](https://github.com/ClickHouse/ClickHouse/pull/21972) ([Kseniia Sumarokova](https://github.com/kssenii)).
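
The idea behind the `optimize_skip_unused_shards_rewrite_in` item above (each shard only receives the `IN` values that can live on it) can be sketched as follows. This is an illustration under strong assumptions, not the ClickHouse algorithm: it assumes an integer sharding key distributed by plain modulo over equally weighted shards, and the function name is hypothetical.

```python
from typing import Dict, List


def rewrite_in_per_shard(values: List[int], shard_count: int) -> Dict[int, List[int]]:
    """Split the right-hand side of `key IN (...)` by the shard owning each value.

    Assumption: shard index = key % shard_count (no weights, integer keys only).
    """
    per_shard: Dict[int, List[int]] = {shard: [] for shard in range(shard_count)}
    for value in values:
        per_shard[value % shard_count].append(value)
    return per_shard


# key IN (1, 2, 3, 4, 5, 6) on a two-shard cluster
print(rewrite_in_per_shard([1, 2, 3, 4, 5, 6], 2))  # {0: [2, 4, 6], 1: [1, 3, 5]}
```

A shard whose list ends up empty has nothing to match, so it does not need to be queried at all.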
#### Improvement
* Increase `max_uri_size` (the maximum size of URL in HTTP interface) to 1 MiB by default. This closes [#21197](https://github.com/ClickHouse/ClickHouse/issues/21197). [#22997](https://github.com/ClickHouse/ClickHouse/pull/22997) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Set `background_fetches_pool_size` to `8` that is better for production usage with frequent small insertions or slow ZooKeeper cluster. [#22945](https://github.com/ClickHouse/ClickHouse/pull/22945) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* FlatDictionary added `initial_array_size`, `max_array_size` options. [#22521](https://github.com/ClickHouse/ClickHouse/pull/22521) ([Maksim Kita](https://github.com/kitaisreal)).
* Add new setting `non_replicated_deduplication_window` for non-replicated MergeTree inserts deduplication. [#22514](https://github.com/ClickHouse/ClickHouse/pull/22514) ([alesapin](https://github.com/alesapin)).
* Update paths to the `CatBoost` model configs in config reloading. [#22434](https://github.com/ClickHouse/ClickHouse/pull/22434) ([Kruglov Pavel](https://github.com/Avogar)).
* Added `Decimal256` type support in dictionaries. `Decimal256` is experimental feature. Closes [#20979](https://github.com/ClickHouse/ClickHouse/issues/20979). [#22960](https://github.com/ClickHouse/ClickHouse/pull/22960) ([Maksim Kita](https://github.com/kitaisreal)).
* Enabled `async_socket_for_remote` by default (using fewer OS threads for distributed queries). [#23683](https://github.com/ClickHouse/ClickHouse/pull/23683) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fixed `quantile(s)TDigest`. Added special handling of singleton centroids according to tdunning/t-digest 3.2+. Also a bug with over-compression of centroids in implementation of earlier version of the algorithm was fixed. [#23314](https://github.com/ClickHouse/ClickHouse/pull/23314) ([Vladimir Chebotarev](https://github.com/excitoon)).
* Make function name `unhex` case insensitive for compatibility with MySQL. [#23229](https://github.com/ClickHouse/ClickHouse/pull/23229) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Implement functions `arrayHasAny`, `arrayHasAll`, `has`, `indexOf`, `countEqual` for the generic case when types of array elements are different. In previous versions the functions `arrayHasAny`, `arrayHasAll` returned false and `has`, `indexOf`, `countEqual` threw an exception. Also add support for `Decimal` and big integer types in functions `has` and similar. This closes [#20272](https://github.com/ClickHouse/ClickHouse/issues/20272). [#23044](https://github.com/ClickHouse/ClickHouse/pull/23044) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Raised the threshold on max number of matches in result of the function `extractAllGroupsHorizontal`. [#23036](https://github.com/ClickHouse/ClickHouse/pull/23036) ([Vasily Nemkov](https://github.com/Enmk)).
* Do not perform `optimize_skip_unused_shards` for cluster with one node. [#22999](https://github.com/ClickHouse/ClickHouse/pull/22999) ([Azat Khuzhin](https://github.com/azat)).
* Added ability to run clickhouse-keeper (experimental drop-in replacement to ZooKeeper) with SSL. Config settings `keeper_server.tcp_port_secure` can be used for secure interaction between client and keeper-server. `keeper_server.raft_configuration.secure` can be used to enable internal secure communication between nodes. [#22992](https://github.com/ClickHouse/ClickHouse/pull/22992) ([alesapin](https://github.com/alesapin)).
* Added ability to flush buffer only in background for `Buffer` tables. [#22986](https://github.com/ClickHouse/ClickHouse/pull/22986) ([Azat Khuzhin](https://github.com/azat)).
* When selecting from MergeTree table with NULL in WHERE condition, in rare cases, exception was thrown. This closes [#20019](https://github.com/ClickHouse/ClickHouse/issues/20019). [#22978](https://github.com/ClickHouse/ClickHouse/pull/22978) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix error handling in Poco HTTP Client for AWS. [#22973](https://github.com/ClickHouse/ClickHouse/pull/22973) ([kreuzerkrieg](https://github.com/kreuzerkrieg)).
* Respect `max_part_removal_threads` for `ReplicatedMergeTree`. [#22971](https://github.com/ClickHouse/ClickHouse/pull/22971) ([Azat Khuzhin](https://github.com/azat)).
* Fix obscure corner case of MergeTree settings inactive_parts_to_throw_insert = 0 with inactive_parts_to_delay_insert > 0. [#22947](https://github.com/ClickHouse/ClickHouse/pull/22947) ([Azat Khuzhin](https://github.com/azat)).
* `dateDiff` now works with `DateTime64` arguments (even for values outside of `DateTime` range) [#22931](https://github.com/ClickHouse/ClickHouse/pull/22931) ([Vasily Nemkov](https://github.com/Enmk)).
* MaterializeMySQL (experimental feature): added an ability to replicate MySQL databases containing views without failing. This is accomplished by ignoring the views. [#22760](https://github.com/ClickHouse/ClickHouse/pull/22760) ([Christian](https://github.com/cfroystad)).
* Allow RBAC row policy via postgresql protocol. Closes [#22658](https://github.com/ClickHouse/ClickHouse/issues/22658). PostgreSQL protocol is enabled in configuration by default. [#22755](https://github.com/ClickHouse/ClickHouse/pull/22755) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Add metric to track how much time is spent waiting for the Buffer layer lock. [#22725](https://github.com/ClickHouse/ClickHouse/pull/22725) ([Azat Khuzhin](https://github.com/azat)).
* Allow to use CTE in VIEW definition. This closes [#22491](https://github.com/ClickHouse/ClickHouse/issues/22491). [#22657](https://github.com/ClickHouse/ClickHouse/pull/22657) ([Amos Bird](https://github.com/amosbird)).
* Clear the rest of the screen and show cursor in `clickhouse-client` if previous program has left garbage in terminal. This closes [#16518](https://github.com/ClickHouse/ClickHouse/issues/16518). [#22634](https://github.com/ClickHouse/ClickHouse/pull/22634) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Make `round` function to behave consistently on non-x86_64 platforms. Rounding half to nearest even (Banker's rounding) is used. [#22582](https://github.com/ClickHouse/ClickHouse/pull/22582) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Correctly check the structure of blocks of data that are sent by Distributed tables. [#22325](https://github.com/ClickHouse/ClickHouse/pull/22325) ([Azat Khuzhin](https://github.com/azat)).
* Allow publishing Kafka errors to a virtual column of Kafka engine, controlled by the `kafka_handle_error_mode` setting. [#21850](https://github.com/ClickHouse/ClickHouse/pull/21850) ([fastio](https://github.com/fastio)).
* Add aliases `simpleJSONExtract/simpleJSONHas` to `visitParam/visitParamExtract{UInt, Int, Bool, Float, Raw, String}`. Fixes [#21383](https://github.com/ClickHouse/ClickHouse/issues/21383). [#21519](https://github.com/ClickHouse/ClickHouse/pull/21519) ([fastio](https://github.com/fastio)).
* Add `clickhouse-library-bridge` for library dictionary source. Closes [#9502](https://github.com/ClickHouse/ClickHouse/issues/9502). [#21509](https://github.com/ClickHouse/ClickHouse/pull/21509) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Forbid to drop a column if it's referenced by materialized view. Closes [#21164](https://github.com/ClickHouse/ClickHouse/issues/21164). [#21303](https://github.com/ClickHouse/ClickHouse/pull/21303) ([flynn](https://github.com/ucasFL)).
* Support dynamic interserver credentials (rotating credentials without downtime). [#14113](https://github.com/ClickHouse/ClickHouse/pull/14113) ([johnskopis](https://github.com/johnskopis)).
* Add support for Kafka storage with `Arrow` and `ArrowStream` format messages. [#23415](https://github.com/ClickHouse/ClickHouse/pull/23415) ([Chao Ma](https://github.com/godliness)).
* Fixed missing semicolon in exception message. The user may find this exception message unpleasant to read. [#23208](https://github.com/ClickHouse/ClickHouse/pull/23208) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fixed missing whitespace in some exception messages about `LowCardinality` type. [#23207](https://github.com/ClickHouse/ClickHouse/pull/23207) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Some values were formatted with alignment in center in table cells in `Markdown` format. Not anymore. [#23096](https://github.com/ClickHouse/ClickHouse/pull/23096) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Remove non-essential details from suggestions in clickhouse-client. This closes [#22158](https://github.com/ClickHouse/ClickHouse/issues/22158). [#23040](https://github.com/ClickHouse/ClickHouse/pull/23040) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Correct calculation of `bytes_allocated` field in system.dictionaries for sparse_hashed dictionaries. [#22867](https://github.com/ClickHouse/ClickHouse/pull/22867) ([Azat Khuzhin](https://github.com/azat)).
* Fixed approximate total rows accounting for reverse reading from MergeTree. [#22726](https://github.com/ClickHouse/ClickHouse/pull/22726) ([Azat Khuzhin](https://github.com/azat)).
* Fix the case when it was possible to configure a dictionary with a clickhouse source that referred to itself, which led to an infinite loop. Closes [#14314](https://github.com/ClickHouse/ClickHouse/issues/14314). [#22479](https://github.com/ClickHouse/ClickHouse/pull/22479) ([Maksim Kita](https://github.com/kitaisreal)).
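
The `round` item above names rounding half to nearest even (Banker's rounding). A minimal Python sketch of that rule follows. The helper name is hypothetical and it only handles rounding to an integer, so it illustrates the rule and is not ClickHouse's `round`.

```python
import math


def round_half_even(x: float) -> int:
    """Round to the nearest integer; exact halves go to the even neighbour."""
    lower = math.floor(x)
    fraction = x - lower
    if fraction < 0.5:
        return lower
    if fraction > 0.5:
        return lower + 1
    # Exactly halfway: pick whichever neighbour is even.
    return lower if lower % 2 == 0 else lower + 1


for value in (0.5, 1.5, 2.5, -0.5, -1.5, 2.4, 2.6):
    # Python's built-in round() follows the same half-to-even rule.
    print(value, round_half_even(value), round(value))
```

Halves alternate between rounding down and up (0.5 gives 0, 1.5 gives 2, 2.5 gives 2), which avoids the upward bias of always rounding halves away from zero.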
#### Bug Fix
* Multiple fixes for hedged requests. Fixed an error `Can't initialize pipeline with empty pipe` for queries with `GLOBAL IN/JOIN` when the setting `use_hedged_requests` is enabled. Fixes [#23431](https://github.com/ClickHouse/ClickHouse/issues/23431). [#23805](https://github.com/ClickHouse/ClickHouse/pull/23805) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). Fixed a race condition in hedged connections which leads to crash. This fixes [#22161](https://github.com/ClickHouse/ClickHouse/issues/22161). [#22443](https://github.com/ClickHouse/ClickHouse/pull/22443) ([Kruglov Pavel](https://github.com/Avogar)). Fix possible crash in case if `unknown packet` was received from remote query (with `async_socket_for_remote` enabled). Fixes [#21167](https://github.com/ClickHouse/ClickHouse/issues/21167). [#23309](https://github.com/ClickHouse/ClickHouse/pull/23309) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fixed the behavior when disabling the `input_format_with_names_use_header` setting discards all the input with CSVWithNames format. This fixes [#22406](https://github.com/ClickHouse/ClickHouse/issues/22406). [#23202](https://github.com/ClickHouse/ClickHouse/pull/23202) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Fixed remote JDBC bridge timeout connection issue. Closes [#9609](https://github.com/ClickHouse/ClickHouse/issues/9609). [#23771](https://github.com/ClickHouse/ClickHouse/pull/23771) ([Maksim Kita](https://github.com/kitaisreal), [alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix the logic of initial load of `complex_key_hashed` if `update_field` is specified. Closes [#23800](https://github.com/ClickHouse/ClickHouse/issues/23800). [#23824](https://github.com/ClickHouse/ClickHouse/pull/23824) ([Maksim Kita](https://github.com/kitaisreal)).
* Fixed crash when `PREWHERE` and row policy filter are both in effect with empty result. [#23763](https://github.com/ClickHouse/ClickHouse/pull/23763) ([Amos Bird](https://github.com/amosbird)).
* Avoid possible "Cannot schedule a task" error (in case some exception had occurred) on INSERT into Distributed. [#23744](https://github.com/ClickHouse/ClickHouse/pull/23744) ([Azat Khuzhin](https://github.com/azat)).
* Added an exception in case of completely the same values in both samples in aggregate function `mannWhitneyUTest`. This fixes [#23646](https://github.com/ClickHouse/ClickHouse/issues/23646). [#23654](https://github.com/ClickHouse/ClickHouse/pull/23654) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Fixed server fault when inserting data through HTTP caused an exception. This fixes [#23512](https://github.com/ClickHouse/ClickHouse/issues/23512). [#23643](https://github.com/ClickHouse/ClickHouse/pull/23643) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Fixed misinterpretation of some `LIKE` expressions with escape sequences. [#23610](https://github.com/ClickHouse/ClickHouse/pull/23610) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fixed restart / stop command hanging. Closes [#20214](https://github.com/ClickHouse/ClickHouse/issues/20214). [#23552](https://github.com/ClickHouse/ClickHouse/pull/23552) ([filimonov](https://github.com/filimonov)).
* Fixed `COLUMNS` matcher in case of multiple JOINs in select query. Closes [#22736](https://github.com/ClickHouse/ClickHouse/issues/22736). [#23501](https://github.com/ClickHouse/ClickHouse/pull/23501) ([Maksim Kita](https://github.com/kitaisreal)).
* Fixed a crash when modifying column's default value when a column itself is used as `ReplacingMergeTree`'s parameter. [#23483](https://github.com/ClickHouse/ClickHouse/pull/23483) ([hexiaoting](https://github.com/hexiaoting)).
* Fixed corner cases in vertical merges with `ReplacingMergeTree`. In rare cases they could lead to merge failures with exceptions like `Incomplete granules are not allowed while blocks are granules size`. [#23459](https://github.com/ClickHouse/ClickHouse/pull/23459) ([Anton Popov](https://github.com/CurtizJ)).
* Fixed a bug that did not allow a cast from an empty array literal to an array with dimensions greater than 1, e.g. `CAST([] AS Array(Array(String)))`. Closes [#14476](https://github.com/ClickHouse/ClickHouse/issues/14476). [#23456](https://github.com/ClickHouse/ClickHouse/pull/23456) ([Maksim Kita](https://github.com/kitaisreal)).
* Fixed a bug when `deltaSum` aggregate function produced incorrect result after resetting the counter. [#23437](https://github.com/ClickHouse/ClickHouse/pull/23437) ([Russ Frank](https://github.com/rf)).
* Fixed `Cannot unlink file` error on unsuccessful creation of ReplicatedMergeTree table with multidisk configuration. This closes [#21755](https://github.com/ClickHouse/ClickHouse/issues/21755). [#23433](https://github.com/ClickHouse/ClickHouse/pull/23433) ([tavplubix](https://github.com/tavplubix)).
* Fixed incompatible constant expression generation during partition pruning based on virtual columns. This fixes https://github.com/ClickHouse/ClickHouse/pull/21401#discussion_r611888913. [#23366](https://github.com/ClickHouse/ClickHouse/pull/23366) ([Amos Bird](https://github.com/amosbird)).
* Fixed a crash when setting join_algorithm is set to 'auto' and Join is performed with a Dictionary. Close [#23002](https://github.com/ClickHouse/ClickHouse/issues/23002). [#23312](https://github.com/ClickHouse/ClickHouse/pull/23312) ([Vladimir](https://github.com/vdimir)).
* Don't relax NOT conditions during partition pruning. This fixes [#23305](https://github.com/ClickHouse/ClickHouse/issues/23305) and [#21539](https://github.com/ClickHouse/ClickHouse/issues/21539). [#23310](https://github.com/ClickHouse/ClickHouse/pull/23310) ([Amos Bird](https://github.com/amosbird)).
* Fixed very rare race condition on background cleanup of old blocks. It might cause a block not to be deduplicated if it's too close to the end of deduplication window. [#23301](https://github.com/ClickHouse/ClickHouse/pull/23301) ([tavplubix](https://github.com/tavplubix)).
* Fixed very rare (distributed) race condition between creation and removal of ReplicatedMergeTree tables. It might cause exceptions like `node doesn't exist` on attempt to create replicated table. Fixes [#21419](https://github.com/ClickHouse/ClickHouse/issues/21419). [#23294](https://github.com/ClickHouse/ClickHouse/pull/23294) ([tavplubix](https://github.com/tavplubix)).
* Fixed simple key dictionary from DDL creation if primary key is not first attribute. Fixes [#23236](https://github.com/ClickHouse/ClickHouse/issues/23236). [#23262](https://github.com/ClickHouse/ClickHouse/pull/23262) ([Maksim Kita](https://github.com/kitaisreal)).
* Fixed reading from ODBC when there are many long column names in a table. Closes [#8853](https://github.com/ClickHouse/ClickHouse/issues/8853). [#23215](https://github.com/ClickHouse/ClickHouse/pull/23215) ([Kseniia Sumarokova](https://github.com/kssenii)).
* MaterializeMySQL (experimental feature): fixed `Not found column` error when selecting from `MaterializeMySQL` with condition on key column. Fixes [#22432](https://github.com/ClickHouse/ClickHouse/issues/22432). [#23200](https://github.com/ClickHouse/ClickHouse/pull/23200) ([tavplubix](https://github.com/tavplubix)).
* Correct aliases handling if subquery was optimized to constant. Fixes [#22924](https://github.com/ClickHouse/ClickHouse/issues/22924). Fixes [#10401](https://github.com/ClickHouse/ClickHouse/issues/10401). [#23191](https://github.com/ClickHouse/ClickHouse/pull/23191) ([Maksim Kita](https://github.com/kitaisreal)).
* Fixed a server startup failure that could happen if the `data_type_default_nullable` setting is enabled in the default profile. Fixes [#22573](https://github.com/ClickHouse/ClickHouse/issues/22573). [#23185](https://github.com/ClickHouse/ClickHouse/pull/23185) ([tavplubix](https://github.com/tavplubix)).
* Fixed a crash on shutdown which happened because of wrong accounting of current connections. [#23154](https://github.com/ClickHouse/ClickHouse/pull/23154) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fixed `Table .inner_id... doesn't exist` error when selecting from Materialized View after detaching it from Atomic database and attaching back. [#23047](https://github.com/ClickHouse/ClickHouse/pull/23047) ([tavplubix](https://github.com/tavplubix)).
* Fix error `Cannot find column in ActionsDAG result` which may happen if subquery uses `untuple`. Fixes [#22290](https://github.com/ClickHouse/ClickHouse/issues/22290). [#22991](https://github.com/ClickHouse/ClickHouse/pull/22991) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix usage of constant columns of type `Map` with nullable values. [#22939](https://github.com/ClickHouse/ClickHouse/pull/22939) ([Anton Popov](https://github.com/CurtizJ)).
* Fixed `formatDateTime()` on `DateTime64` and the "%C" format specifier; fixed `toDateTime64()` for large values and non-zero scale. [#22937](https://github.com/ClickHouse/ClickHouse/pull/22937) ([Vasily Nemkov](https://github.com/Enmk)).
* Fixed a crash when using `mannWhitneyUTest` and `rankCorr` with window functions. This fixes [#22728](https://github.com/ClickHouse/ClickHouse/issues/22728). [#22876](https://github.com/ClickHouse/ClickHouse/pull/22876) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* LIVE VIEW (experimental feature): fixed possible hanging in concurrent DROP/CREATE of TEMPORARY LIVE VIEW in `TemporaryLiveViewCleaner`, [see](https://gist.github.com/vzakaznikov/0c03195960fc86b56bfe2bc73a90019e). [#22858](https://github.com/ClickHouse/ClickHouse/pull/22858) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fixed pushdown of `HAVING` in case, when filter column is used in aggregation. [#22763](https://github.com/ClickHouse/ClickHouse/pull/22763) ([Anton Popov](https://github.com/CurtizJ)).
* Fixed possible hangs in Zookeeper requests in case of OOM exception. Fixes [#22438](https://github.com/ClickHouse/ClickHouse/issues/22438). [#22684](https://github.com/ClickHouse/ClickHouse/pull/22684) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fixed wait for mutations on several replicas for ReplicatedMergeTree table engines. Previously, a mutation/alter query might finish before the mutation was actually executed on other replicas. [#22669](https://github.com/ClickHouse/ClickHouse/pull/22669) ([alesapin](https://github.com/alesapin)).
* Fixed exception for Log with nested types without columns in the SELECT clause. [#22654](https://github.com/ClickHouse/ClickHouse/pull/22654) ([Azat Khuzhin](https://github.com/azat)).
* Fix unlimited wait for auxiliary AWS requests. [#22594](https://github.com/ClickHouse/ClickHouse/pull/22594) ([Vladimir Chebotarev](https://github.com/excitoon)).
* Fixed a crash when client closes connection very early [#22579](https://github.com/ClickHouse/ClickHouse/issues/22579). [#22591](https://github.com/ClickHouse/ClickHouse/pull/22591) ([nvartolomei](https://github.com/nvartolomei)).
* `Map` data type (experimental feature): fixed an incorrect formatting of function `map` in distributed queries. [#22588](https://github.com/ClickHouse/ClickHouse/pull/22588) ([foolchi](https://github.com/foolchi)).
* Fixed deserialization of empty string without newline at end of TSV format. This closes [#20244](https://github.com/ClickHouse/ClickHouse/issues/20244). Possible workaround without version update: set `input_format_null_as_default` to zero. It was zero in old versions. [#22527](https://github.com/ClickHouse/ClickHouse/pull/22527) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fixed wrong cast of a column of `LowCardinality` type in Merge Join algorithm. Close [#22386](https://github.com/ClickHouse/ClickHouse/issues/22386), close [#22388](https://github.com/ClickHouse/ClickHouse/issues/22388). [#22510](https://github.com/ClickHouse/ClickHouse/pull/22510) ([Vladimir](https://github.com/vdimir)).
* Buffer overflow (on read) was possible in `tokenbf_v1` full text index. The excessive bytes are not used but the read operation may lead to crash in rare cases. This closes [#19233](https://github.com/ClickHouse/ClickHouse/issues/19233). [#22421](https://github.com/ClickHouse/ClickHouse/pull/22421) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Do not limit HTTP chunk size. Fixes [#21907](https://github.com/ClickHouse/ClickHouse/issues/21907). [#22322](https://github.com/ClickHouse/ClickHouse/pull/22322) ([Ivan](https://github.com/abyss7)).
* Fixed a bug, which leads to underaggregation of data in case of enabled `optimize_aggregation_in_order` and many parts in table. Slightly improve performance of aggregation with enabled `optimize_aggregation_in_order`. [#21889](https://github.com/ClickHouse/ClickHouse/pull/21889) ([Anton Popov](https://github.com/CurtizJ)).
* Check if table function view is used as a column. This complements #20350. [#21465](https://github.com/ClickHouse/ClickHouse/pull/21465) ([Amos Bird](https://github.com/amosbird)).
* Fix "unknown column" error for tables with `Merge` engine in queries with `JOIN` and aggregation. Closes [#18368](https://github.com/ClickHouse/ClickHouse/issues/18368), close [#22226](https://github.com/ClickHouse/ClickHouse/issues/22226). [#21370](https://github.com/ClickHouse/ClickHouse/pull/21370) ([Vladimir](https://github.com/vdimir)).
* Fixed name clashes in pushdown optimization. It caused incorrect `WHERE` filtration after FULL JOIN. Close [#20497](https://github.com/ClickHouse/ClickHouse/issues/20497). [#20622](https://github.com/ClickHouse/ClickHouse/pull/20622) ([Vladimir](https://github.com/vdimir)).
* Fixed very rare bug when quorum insert with `quorum_parallel=1` is not really "quorum" because of deduplication. [#18215](https://github.com/ClickHouse/ClickHouse/pull/18215) ([filimonov](https://github.com/filimonov) - reported, [alesapin](https://github.com/alesapin) - fixed).
#### Build/Testing/Packaging Improvement
* Run stateless tests in parallel in CI. [#22300](https://github.com/ClickHouse/ClickHouse/pull/22300) ([alesapin](https://github.com/alesapin)).
* Simplify debian packages. This fixes [#21698](https://github.com/ClickHouse/ClickHouse/issues/21698). [#22976](https://github.com/ClickHouse/ClickHouse/pull/22976) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Added support for ClickHouse build on Apple M1. [#21639](https://github.com/ClickHouse/ClickHouse/pull/21639) ([changvvb](https://github.com/changvvb)).
* Fixed ClickHouse Keeper build for MacOS. [#22860](https://github.com/ClickHouse/ClickHouse/pull/22860) ([alesapin](https://github.com/alesapin)).
* Fixed some tests on AArch64 platform. [#22596](https://github.com/ClickHouse/ClickHouse/pull/22596) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Added function alignment for possibly better performance. [#21431](https://github.com/ClickHouse/ClickHouse/pull/21431) ([Danila Kutenin](https://github.com/danlark1)).
* Adjust some tests to output identical results on amd64 and aarch64 (qemu). The result was depending on implementation specific CPU behaviour. [#22590](https://github.com/ClickHouse/ClickHouse/pull/22590) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Allow query profiling only on x86_64. See [#15174](https://github.com/ClickHouse/ClickHouse/issues/15174#issuecomment-812954965) and [#15638](https://github.com/ClickHouse/ClickHouse/issues/15638#issuecomment-703805337). This closes [#15638](https://github.com/ClickHouse/ClickHouse/issues/15638). [#22580](https://github.com/ClickHouse/ClickHouse/pull/22580) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Allow building with unbundled xz (lzma) using `USE_INTERNAL_XZ_LIBRARY=OFF` CMake option. [#22571](https://github.com/ClickHouse/ClickHouse/pull/22571) ([Kfir Itzhak](https://github.com/mastertheknife)).
* Enable bundled `openldap` on `ppc64le`. [#22487](https://github.com/ClickHouse/ClickHouse/pull/22487) ([Kfir Itzhak](https://github.com/mastertheknife)).
* Disable incompatible libraries (typically platform-specific) on `ppc64le`. [#22475](https://github.com/ClickHouse/ClickHouse/pull/22475) ([Kfir Itzhak](https://github.com/mastertheknife)).
* Add Jepsen test in CI for ClickHouse Keeper. [#22373](https://github.com/ClickHouse/ClickHouse/pull/22373) ([alesapin](https://github.com/alesapin)).
* Build `jemalloc` with support for [heap profiling](https://github.com/jemalloc/jemalloc/wiki/Use-Case%3A-Heap-Profiling). [#22834](https://github.com/ClickHouse/ClickHouse/pull/22834) ([nvartolomei](https://github.com/nvartolomei)).
* Avoid UB in `*Log` engines caused by unlocking the rwlock from another thread. [#22583](https://github.com/ClickHouse/ClickHouse/pull/22583) ([Azat Khuzhin](https://github.com/azat)).
* Fixed UB by unlocking the rwlock of the TinyLog from the same thread. [#22560](https://github.com/ClickHouse/ClickHouse/pull/22560) ([Azat Khuzhin](https://github.com/azat)).
## ClickHouse release 21.4
### ClickHouse release 21.4.1 2021-04-12

View File

@ -527,6 +527,7 @@ include (cmake/find/nanodbc.cmake)
include (cmake/find/rocksdb.cmake)
include (cmake/find/libpqxx.cmake)
include (cmake/find/nuraft.cmake)
include (cmake/find/yaml-cpp.cmake)
if(NOT USE_INTERNAL_PARQUET_LIBRARY)

View File

@ -3,5 +3,11 @@ add_library (bridge
)
target_include_directories (daemon PUBLIC ..)
target_link_libraries (bridge PRIVATE daemon dbms Poco::Data Poco::Data::ODBC)
target_link_libraries (bridge
PRIVATE
daemon
dbms
Poco::Data
Poco::Data::ODBC
)

View File

@ -78,6 +78,8 @@ PoolWithFailover::PoolWithFailover(
const RemoteDescription & addresses,
const std::string & user,
const std::string & password,
unsigned default_connections_,
unsigned max_connections_,
size_t max_tries_)
: max_tries(max_tries_)
, shareable(false)
@ -85,7 +87,13 @@ PoolWithFailover::PoolWithFailover(
/// Replicas have the same priority, but traversed replicas are moved to the end of the queue.
for (const auto & [host, port] : addresses)
{
replicas_by_priority[0].emplace_back(std::make_shared<Pool>(database, host, user, password, port));
replicas_by_priority[0].emplace_back(std::make_shared<Pool>(database,
host, user, password, port,
/* socket_ = */ "",
MYSQLXX_DEFAULT_TIMEOUT,
MYSQLXX_DEFAULT_RW_TIMEOUT,
default_connections_,
max_connections_));
}
}

View File

@ -115,6 +115,8 @@ namespace mysqlxx
const RemoteDescription & addresses,
const std::string & user,
const std::string & password,
unsigned default_connections_ = MYSQLXX_POOL_WITH_FAILOVER_DEFAULT_START_CONNECTIONS,
unsigned max_connections_ = MYSQLXX_POOL_WITH_FAILOVER_DEFAULT_MAX_CONNECTIONS,
size_t max_tries_ = MYSQLXX_POOL_WITH_FAILOVER_DEFAULT_MAX_TRIES);
PoolWithFailover(const PoolWithFailover & other);

View File

@ -1,9 +1,9 @@
# This strings autochanged from release_lib.sh:
SET(VERSION_REVISION 54451)
SET(VERSION_REVISION 54452)
SET(VERSION_MAJOR 21)
SET(VERSION_MINOR 6)
SET(VERSION_MINOR 7)
SET(VERSION_PATCH 1)
SET(VERSION_GITHASH 96fced4c3cf432fb0b401d2ab01f0c56e5f74a96)
SET(VERSION_DESCRIBE v21.6.1.1-prestable)
SET(VERSION_STRING 21.6.1.1)
SET(VERSION_GITHASH 976ccc2e908ac3bc28f763bfea8134ea0a121b40)
SET(VERSION_DESCRIBE v21.7.1.1-prestable)
SET(VERSION_STRING 21.7.1.1)
# end of autochange

View File

@ -0,0 +1,9 @@
option(USE_YAML_CPP "Enable yaml-cpp" ${ENABLE_LIBRARIES})
if (NOT USE_YAML_CPP)
return()
endif()
if (NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/yaml-cpp")
message (FATAL_ERROR "submodule contrib/yaml-cpp is missing. To fix, try running: \n git submodule update --init --recursive")
endif()

View File

@ -50,6 +50,10 @@ add_subdirectory (replxx-cmake)
add_subdirectory (unixodbc-cmake)
add_subdirectory (nanodbc-cmake)
if (USE_YAML_CPP)
add_subdirectory (yaml-cpp-cmake)
endif()
if (USE_INTERNAL_XZ_LIBRARY)
add_subdirectory (xz)
endif()

1
contrib/yaml-cpp vendored Submodule

@ -0,0 +1 @@
Subproject commit 0c86adac6d117ee2b4afcedb8ade19036ca0327d

View File

@ -0,0 +1,39 @@
set (LIBRARY_DIR ${ClickHouse_SOURCE_DIR}/contrib/yaml-cpp)
set (SRCS
${LIBRARY_DIR}/src/binary.cpp
${LIBRARY_DIR}/src/emitterutils.cpp
${LIBRARY_DIR}/src/null.cpp
${LIBRARY_DIR}/src/scantoken.cpp
${LIBRARY_DIR}/src/convert.cpp
${LIBRARY_DIR}/src/exceptions.cpp
${LIBRARY_DIR}/src/ostream_wrapper.cpp
${LIBRARY_DIR}/src/simplekey.cpp
${LIBRARY_DIR}/src/depthguard.cpp
${LIBRARY_DIR}/src/exp.cpp
${LIBRARY_DIR}/src/parse.cpp
${LIBRARY_DIR}/src/singledocparser.cpp
${LIBRARY_DIR}/src/directives.cpp
${LIBRARY_DIR}/src/memory.cpp
${LIBRARY_DIR}/src/parser.cpp
${LIBRARY_DIR}/src/stream.cpp
${LIBRARY_DIR}/src/emit.cpp
${LIBRARY_DIR}/src/nodebuilder.cpp
${LIBRARY_DIR}/src/regex_yaml.cpp
${LIBRARY_DIR}/src/tag.cpp
${LIBRARY_DIR}/src/emitfromevents.cpp
${LIBRARY_DIR}/src/node.cpp
${LIBRARY_DIR}/src/scanner.cpp
${LIBRARY_DIR}/src/emitter.cpp
${LIBRARY_DIR}/src/node_data.cpp
${LIBRARY_DIR}/src/scanscalar.cpp
${LIBRARY_DIR}/src/emitterstate.cpp
${LIBRARY_DIR}/src/nodeevents.cpp
${LIBRARY_DIR}/src/scantag.cpp
)
add_library (yaml-cpp ${SRCS})
target_include_directories(yaml-cpp PRIVATE ${LIBRARY_DIR}/include/yaml-cpp)
target_include_directories(yaml-cpp SYSTEM BEFORE PUBLIC ${LIBRARY_DIR}/include)

4
debian/changelog vendored
View File

@ -1,5 +1,5 @@
clickhouse (21.6.1.1) unstable; urgency=low
clickhouse (21.7.1.1) unstable; urgency=low
* Modified source code
-- clickhouse-release <clickhouse-release@yandex-team.ru> Tue, 20 Apr 2021 01:48:16 +0300
-- clickhouse-release <clickhouse-release@yandex-team.ru> Thu, 20 May 2021 22:23:29 +0300

View File

@ -1,7 +1,7 @@
FROM ubuntu:18.04
ARG repository="deb https://repo.clickhouse.tech/deb/stable/ main/"
ARG version=21.6.1.*
ARG version=21.7.1.*
RUN apt-get update \
&& apt-get install --yes --no-install-recommends \

View File

@ -1,7 +1,7 @@
FROM ubuntu:20.04
ARG repository="deb https://repo.clickhouse.tech/deb/stable/ main/"
ARG version=21.6.1.*
ARG version=21.7.1.*
ARG gosu_ver=1.10
# set non-empty deb_location_url url to create a docker image

View File

@ -1,7 +1,7 @@
FROM ubuntu:18.04
ARG repository="deb https://repo.clickhouse.tech/deb/stable/ main/"
ARG version=21.6.1.*
ARG version=21.7.1.*
RUN apt-get update && \
apt-get install -y apt-transport-https dirmngr && \

View File

@ -15,7 +15,12 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1] [TTL expr1],
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [TTL expr2],
...
) ENGINE = MySQL('host:port', 'database', 'table', 'user', 'password'[, replace_query, 'on_duplicate_clause']);
) ENGINE = MySQL('host:port', 'database', 'table', 'user', 'password'[, replace_query, 'on_duplicate_clause'])
SETTINGS
[connection_pool_size=16, ]
[connection_max_tries=3, ]
[connection_auto_close=true ]
;
```
See a detailed description of the [CREATE TABLE](../../../sql-reference/statements/create/table.md#create-table-query) query.

View File

@ -5,11 +5,11 @@ toc_title: "Функции для шифрования"
# Функции шифрования {#encryption-functions}
Даннвые функции реализуют шифрование и расшифровку данных с помощью AES (Advanced Encryption Standard) алгоритма.
Данные функции реализуют шифрование и расшифровку данных с помощью AES (Advanced Encryption Standard) алгоритма.
Длина ключа зависит от режима шифрования. Он может быть длинной в 16, 24 и 32 байта для режимов шифрования `-128-`, `-196-` и `-256-` соответственно.
Длина инициализирующего вектора всегда 16 байт (лишнии байты игнорируются).
Длина инициализирующего вектора всегда 16 байт (лишние байты игнорируются).
Обратите внимание, что до версии Clickhouse 21.1 эти функции работали медленно.

View File

@ -62,7 +62,6 @@ def build_amp(lang, args, cfg):
for root, _, filenames in os.walk(site_temp):
if 'index.html' in filenames:
paths.append(prepare_amp_html(lang, args, root, site_temp, main_site_dir))
test.test_amp(paths, lang)
logging.info(f'Finished building AMP version for {lang}')

View File

@ -40,7 +40,7 @@ def build_for_lang(lang, args):
site_names = {
'en': 'ClickHouse Blog',
'ru': 'Блог ClickHouse '
'ru': 'Блог ClickHouse'
}
assert len(site_names) == len(languages)
@ -62,7 +62,7 @@ def build_for_lang(lang, args):
strict=True,
theme=theme_cfg,
nav=blog_nav,
copyright='©2016–2020 Yandex LLC',
copyright='©2016–2021 Yandex LLC',
use_directory_urls=True,
repo_name='ClickHouse/ClickHouse',
repo_url='https://github.com/ClickHouse/ClickHouse/',

View File

@ -94,7 +94,7 @@ def build_for_lang(lang, args):
site_dir=site_dir,
strict=True,
theme=theme_cfg,
copyright='©2016–2020 Yandex LLC',
copyright='©2016–2021 Yandex LLC',
use_directory_urls=True,
repo_name='ClickHouse/ClickHouse',
repo_url='https://github.com/ClickHouse/ClickHouse/',

View File

@ -31,7 +31,16 @@ def build_nav_entry(root, args):
result_items.append((prio, title, payload))
elif filename.endswith('.md'):
path = os.path.join(root, filename)
meta, content = util.read_md_file(path)
meta = ''
content = ''
try:
meta, content = util.read_md_file(path)
except:
print('Error in file: {}'.format(path))
raise
path = path.split('/', 2)[-1]
title = meta.get('toc_title', find_first_header(content))
if title:

View File

@ -3,34 +3,9 @@
import logging
import os
import sys
import bs4
import logging
import os
import subprocess
import bs4
def test_amp(paths, lang):
try:
# Get latest amp validator version
subprocess.check_call('amphtml-validator --help',
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
shell=True)
except subprocess.CalledProcessError:
subprocess.check_call('npm i -g amphtml-validator', stderr=subprocess.DEVNULL, shell=True)
paths = ' '.join(paths)
command = f'amphtml-validator {paths}'
try:
subprocess.check_output(command, shell=True).decode('utf-8')
except subprocess.CalledProcessError:
logging.error(f'Invalid AMP for {lang}')
raise
def test_template(template_path):
if template_path.endswith('amp.html'):

View File

@ -1369,6 +1369,27 @@ private:
{
const auto * exception = server_exception ? server_exception.get() : client_exception.get();
fmt::print(stderr, "Error on processing query '{}': {}\n", ast_to_process->formatForErrorMessage(), exception->message());
// Try to reconnect after errors, for two reasons:
// 1. We might not have realized that the server died, e.g. if
// it sent us a <Fatal> trace and closed connection properly.
// 2. The connection might have gotten into a wrong state and
// the next query will get false positive about
// "Unknown packet from server".
try
{
connection->forceConnected(connection_parameters.timeouts);
}
catch (...)
{
// Just report it, we'll terminate below.
fmt::print(stderr,
"Error while reconnecting to the server: Code: {}: {}\n",
getCurrentExceptionCode(),
getCurrentExceptionMessage(true));
assert(!connection->isConnected());
}
}
if (!connection->isConnected())
@ -1472,11 +1493,6 @@ private:
server_exception.reset();
client_exception.reset();
have_error = false;
// We have to reinitialize connection after errors, because it
// might have gotten into a wrong state and we'll get false
// positives about "Unknown packet from server".
connection->forceConnected(connection_parameters.timeouts);
}
else if (ast_to_process->formatForErrorMessage().size() > 500)
{

View File

@ -0,0 +1,86 @@
# We can use 3 main node types in YAML: Scalar, Map and Sequence.
# A Scalar is a simple key-value pair:
scalar: 123
# Here we have a key "scalar" and value "123"
# If we rewrite this in XML, we will get <scalar>123</scalar>
# We can also represent an empty value with '':
key: ''
# A Map is a node, which contains other nodes:
map:
  key1: value1
  key2: value2
  small_map:
    key3: value3
# This map can be converted into:
# <map>
#     <key1>value1</key1>
#     <key2>value2</key2>
#     <small_map>
#         <key3>value3</key3>
#     </small_map>
# </map>
# A Sequence is a node, which contains also other nodes.
# The main difference from Map is that Sequence can also contain simple values.
sequence:
  - val1
  - val2
  - key: 123
  - map:
      mkey1: foo
      mkey2: bar
# We can represent it in XML this way:
# <sequence>val1</sequence>
# <sequence>val2</sequence>
# <sequence>
#     <key>123</key>
# </sequence>
# <sequence>
#     <map>
#         <mkey1>foo</mkey1>
#         <mkey2>bar</mkey2>
#     </map>
# </sequence>
# YAML does not have direct support for structures like XML attributes.
# We represent them as nodes with @ prefix in key. Note, that @ is reserved by YAML standard,
# so you will need to write double quotes around the key. Both Map and Sequence can have
# attributes as children nodes
map:
  "@attr1": value1
  "@attr2": value2
  key: 123
# This gives us:
# <map attr1="value1" attr2="value2">
#     <key>123</key>
# </map>
sequence:
  - "@attr1": value1
  - "@attr2": value2
  - 123
  - abc
# And this gives us:
# <sequence attr1="value1" attr2="value2">123</sequence>
# <sequence attr1="value1" attr2="value2">abc</sequence>
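The mapping rules described in the comments above can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not the converter ClickHouse ships: the `Node` type and the `toXml`, `attributesOf`, `childrenOf` and `isAttributeOnly` helpers are hypothetical names, no XML escaping is done, and sequences nested directly inside sequences are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory tree standing in for a parsed YAML document.
struct Node
{
    enum Kind { Scalar, Map, Sequence };

    Kind kind = Scalar;
    std::string scalar;              // payload of a Scalar
    std::vector<std::string> keys;   // Map: one key per child; Sequence: empty strings
    std::vector<Node> items;         // children of a Map or Sequence

    static Node value(const std::string & text)
    {
        Node n;
        n.scalar = text;
        return n;
    }

    static Node container(Kind kind)
    {
        Node n;
        n.kind = kind;
        return n;
    }

    Node & add(const std::string & key, const Node & child)
    {
        keys.push_back(key);
        items.push_back(child);
        return *this;
    }
};

std::string toXml(const std::string & key, const Node & node);

// Children whose key starts with '@' become XML attributes.
static std::string attributesOf(const Node & map)
{
    std::string out;
    for (std::size_t i = 0; i < map.keys.size(); ++i)
        if (!map.keys[i].empty() && map.keys[i][0] == '@')
            out += " " + map.keys[i].substr(1) + "=\"" + map.items[i].scalar + "\"";
    return out;
}

// All other children become nested elements.
static std::string childrenOf(const Node & map)
{
    std::string out;
    for (std::size_t i = 0; i < map.keys.size(); ++i)
        if (map.keys[i].empty() || map.keys[i][0] != '@')
            out += toXml(map.keys[i], map.items[i]);
    return out;
}

// True for a map that holds nothing but '@'-prefixed keys.
static bool isAttributeOnly(const Node & node)
{
    if (node.kind != Node::Map || node.keys.empty())
        return false;
    for (std::size_t i = 0; i < node.keys.size(); ++i)
        if (node.keys[i].empty() || node.keys[i][0] != '@')
            return false;
    return true;
}

std::string toXml(const std::string & key, const Node & node)
{
    if (node.kind == Node::Scalar)
        return "<" + key + ">" + node.scalar + "</" + key + ">";

    if (node.kind == Node::Map)
        return "<" + key + attributesOf(node) + ">" + childrenOf(node) + "</" + key + ">";

    // Sequence: attribute-only items apply to every emitted element,
    // and each remaining item repeats the parent key as its own element.
    std::string attrs;
    for (std::size_t i = 0; i < node.items.size(); ++i)
        if (isAttributeOnly(node.items[i]))
            attrs += attributesOf(node.items[i]);

    std::string out;
    for (std::size_t i = 0; i < node.items.size(); ++i)
    {
        const Node & item = node.items[i];
        if (isAttributeOnly(item))
            continue;
        const std::string body = item.kind == Node::Scalar ? item.scalar : childrenOf(item);
        out += "<" + key + attrs + ">" + body + "</" + key + ">";
    }
    return out;
}
```

Building the attribute examples from the file by hand with `Node::container` and `add` and passing them to `toXml` should reproduce the XML shown in the comments, without the whitespace.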

View File

@ -52,6 +52,9 @@ template <typename Value, bool float_return> using FuncQuantilesTDigest = Aggreg
template <typename Value, bool float_return> using FuncQuantileTDigestWeighted = AggregateFunctionQuantile<Value, QuantileTDigest<Value>, NameQuantileTDigestWeighted, true, std::conditional_t<float_return, Float32, void>, false>;
template <typename Value, bool float_return> using FuncQuantilesTDigestWeighted = AggregateFunctionQuantile<Value, QuantileTDigest<Value>, NameQuantilesTDigestWeighted, true, std::conditional_t<float_return, Float32, void>, true>;
template <typename Value, bool float_return> using FuncQuantileBFloat16 = AggregateFunctionQuantile<Value, QuantileBFloat16Histogram<Value>, NameQuantileBFloat16, false, std::conditional_t<float_return, Float64, void>, false>;
template <typename Value, bool float_return> using FuncQuantilesBFloat16 = AggregateFunctionQuantile<Value, QuantileBFloat16Histogram<Value>, NameQuantilesBFloat16, false, std::conditional_t<float_return, Float64, void>, true>;
template <template <typename, bool> class Function>
static constexpr bool supportDecimal()
@ -156,6 +159,9 @@ void registerAggregateFunctionsQuantile(AggregateFunctionFactory & factory)
factory.registerFunction(NameQuantileTDigestWeighted::name, createAggregateFunctionQuantile<FuncQuantileTDigestWeighted>);
factory.registerFunction(NameQuantilesTDigestWeighted::name, createAggregateFunctionQuantile<FuncQuantilesTDigestWeighted>);
factory.registerFunction(NameQuantileBFloat16::name, createAggregateFunctionQuantile<FuncQuantileBFloat16>);
factory.registerFunction(NameQuantilesBFloat16::name, createAggregateFunctionQuantile<FuncQuantilesBFloat16>);
/// 'median' is an alias for 'quantile'
factory.registerAlias("median", NameQuantile::name);
factory.registerAlias("medianDeterministic", NameQuantileDeterministic::name);
@ -167,6 +173,7 @@ void registerAggregateFunctionsQuantile(AggregateFunctionFactory & factory)
factory.registerAlias("medianTimingWeighted", NameQuantileTimingWeighted::name);
factory.registerAlias("medianTDigest", NameQuantileTDigest::name);
factory.registerAlias("medianTDigestWeighted", NameQuantileTDigestWeighted::name);
factory.registerAlias("medianBFloat16", NameQuantileBFloat16::name);
}
}

View File

@ -9,6 +9,7 @@
#include <AggregateFunctions/QuantileExactWeighted.h>
#include <AggregateFunctions/QuantileTiming.h>
#include <AggregateFunctions/QuantileTDigest.h>
#include <AggregateFunctions/QuantileBFloat16Histogram.h>
#include <AggregateFunctions/IAggregateFunction.h>
#include <AggregateFunctions/QuantilesCommon.h>
@ -228,4 +229,7 @@ struct NameQuantileTDigestWeighted { static constexpr auto name = "quantileTDige
struct NameQuantilesTDigest { static constexpr auto name = "quantilesTDigest"; };
struct NameQuantilesTDigestWeighted { static constexpr auto name = "quantilesTDigestWeighted"; };
struct NameQuantileBFloat16 { static constexpr auto name = "quantileBFloat16"; };
struct NameQuantilesBFloat16 { static constexpr auto name = "quantilesBFloat16"; };
}

View File

@ -0,0 +1,63 @@
#include <AggregateFunctions/AggregateFunctionFactory.h>
#include <AggregateFunctions/AggregateFunctionSegmentLengthSum.h>
#include <AggregateFunctions/FactoryHelpers.h>
#include <AggregateFunctions/Helpers.h>
#include <DataTypes/DataTypeDate.h>
#include <DataTypes/DataTypeDateTime.h>
#include <ext/range.h>
namespace DB
{
namespace ErrorCodes
{
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
}
namespace
{
template <template <typename> class Data>
AggregateFunctionPtr createAggregateFunctionSegmentLengthSum(const std::string & name, const DataTypes & arguments, const Array &)
{
if (arguments.size() != 2)
throw Exception(
"Aggregate function " + name + " requires two timestamp arguments.", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);
auto args = {arguments[0].get(), arguments[1].get()};
if (WhichDataType{args.begin()[0]}.idx != WhichDataType{args.begin()[1]}.idx)
throw Exception(
"Illegal type " + args.begin()[0]->getName() + " and " + args.begin()[1]->getName() + " of arguments of aggregate function "
+ name + ", these two arguments should have the same DataType",
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
for (const auto & arg : args)
{
if (!isNativeNumber(arg) && !isDateOrDateTime(arg))
throw Exception(
"Illegal type " + arg->getName() + " of argument of aggregate function " + name
+ ", must be Number, Date, DateTime or DateTime64",
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
}
AggregateFunctionPtr res(createWithBasicNumberOrDateOrDateTime<AggregateFunctionSegmentLengthSum, Data>(*arguments[0], arguments));
if (res)
return res;
throw Exception(
"Illegal type " + arguments.front().get()->getName() + " of first argument of aggregate function " + name
+ ", must be Native Unsigned Number",
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
}
}
void registerAggregateFunctionSegmentLengthSum(AggregateFunctionFactory & factory)
{
factory.registerFunction("segmentLengthSum", createAggregateFunctionSegmentLengthSum<AggregateFunctionSegmentLengthSumData>);
}
}

View File

@ -0,0 +1,199 @@
#pragma once
#include <unordered_set>
#include <Columns/ColumnsNumber.h>
#include <DataTypes/DataTypeDateTime.h>
#include <DataTypes/DataTypesNumber.h>
#include <IO/ReadHelpers.h>
#include <IO/WriteHelpers.h>
#include <Common/ArenaAllocator.h>
#include <Common/assert_cast.h>
#include <AggregateFunctions/AggregateFunctionNull.h>
namespace DB
{
template <typename T>
struct AggregateFunctionSegmentLengthSumData
{
using Segment = std::pair<T, T>;
using Segments = PODArrayWithStackMemory<Segment, 64>;
bool sorted = false;
Segments segments;
size_t size() const { return segments.size(); }
void add(T start, T end)
{
if (sorted && segments.size() > 0)
{
sorted = segments.back().first <= start;
}
segments.emplace_back(start, end);
}
void merge(const AggregateFunctionSegmentLengthSumData & other)
{
if (other.segments.empty())
return;
const auto size = segments.size();
segments.insert(std::begin(other.segments), std::end(other.segments));
/// either sort whole container or do so partially merging ranges afterwards
if (!sorted && !other.sorted)
std::stable_sort(std::begin(segments), std::end(segments));
else
{
const auto begin = std::begin(segments);
const auto middle = std::next(begin, size);
const auto end = std::end(segments);
if (!sorted)
std::stable_sort(begin, middle);
if (!other.sorted)
std::stable_sort(middle, end);
std::inplace_merge(begin, middle, end);
}
sorted = true;
}
void sort()
{
if (!sorted)
{
std::stable_sort(std::begin(segments), std::end(segments));
sorted = true;
}
}
void serialize(WriteBuffer & buf) const
{
writeBinary(sorted, buf);
writeBinary(segments.size(), buf);
for (const auto & time_gap : segments)
{
writeBinary(time_gap.first, buf);
writeBinary(time_gap.second, buf);
}
}
void deserialize(ReadBuffer & buf)
{
readBinary(sorted, buf);
size_t size;
readBinary(size, buf);
segments.clear();
segments.reserve(size);
T start, end;
for (size_t i = 0; i < size; ++i)
{
readBinary(start, buf);
readBinary(end, buf);
segments.emplace_back(start, end);
}
}
};
template <typename T, typename Data>
class AggregateFunctionSegmentLengthSum final : public IAggregateFunctionDataHelper<Data, AggregateFunctionSegmentLengthSum<T, Data>>
{
private:
template <typename TResult>
TResult getSegmentLengthSum(Data & data) const
{
if (data.size() == 0)
return 0;
data.sort();
TResult res = 0;
typename Data::Segment cur_segment = data.segments[0];
for (size_t i = 1; i < data.segments.size(); ++i)
{
if (cur_segment.second < data.segments[i].first)
{
res += cur_segment.second - cur_segment.first;
cur_segment = data.segments[i];
}
else
cur_segment.second = std::max(cur_segment.second, data.segments[i].second);
}
res += cur_segment.second - cur_segment.first;
return res;
}
public:
String getName() const override { return "segmentLengthSum"; }
explicit AggregateFunctionSegmentLengthSum(const DataTypes & arguments)
: IAggregateFunctionDataHelper<Data, AggregateFunctionSegmentLengthSum<T, Data>>(arguments, {})
{
}
DataTypePtr getReturnType() const override
{
if constexpr (std::is_floating_point_v<T>)
return std::make_shared<DataTypeFloat64>();
return std::make_shared<DataTypeUInt64>();
}
bool allocatesMemoryInArena() const override { return false; }
AggregateFunctionPtr getOwnNullAdapter(
const AggregateFunctionPtr & nested_function,
const DataTypes & arguments,
const Array & params,
const AggregateFunctionProperties & /*properties*/) const override
{
return std::make_shared<AggregateFunctionNullVariadic<false, false, false>>(nested_function, arguments, params);
}
void add(AggregateDataPtr __restrict place, const IColumn ** columns, const size_t row_num, Arena *) const override
{
auto start = assert_cast<const ColumnVector<T> *>(columns[0])->getData()[row_num];
auto end = assert_cast<const ColumnVector<T> *>(columns[1])->getData()[row_num];
this->data(place).add(start, end);
}
void merge(AggregateDataPtr __restrict place, ConstAggregateDataPtr rhs, Arena *) const override
{
this->data(place).merge(this->data(rhs));
}
void serialize(ConstAggregateDataPtr __restrict place, WriteBuffer & buf) const override
{
this->data(place).serialize(buf);
}
void deserialize(AggregateDataPtr __restrict place, ReadBuffer & buf, Arena *) const override
{
this->data(place).deserialize(buf);
}
void insertResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena *) const override
{
if constexpr (std::is_floating_point_v<T>)
assert_cast<ColumnFloat64 &>(to).getData().push_back(getSegmentLengthSum<Float64>(this->data(place)));
else
assert_cast<ColumnUInt64 &>(to).getData().push_back(getSegmentLengthSum<UInt64>(this->data(place)));
}
};
}
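The core of `getSegmentLengthSum` above is a sort followed by a single sweep that merges overlapping or touching segments. A minimal standalone sketch of that sweep, assuming plain `std::vector` and `std::sort` in place of ClickHouse's `PODArrayWithStackMemory` and its sortedness tracking (the free function `segmentLengthSum` here is a hypothetical stand-in, not the aggregate function itself):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Total length covered by the union of [start, end] segments.
// The vector is taken by value because the function sorts its own copy.
inline uint64_t segmentLengthSum(std::vector<std::pair<uint64_t, uint64_t>> segments)
{
    if (segments.empty())
        return 0;

    // Sort by start (then by end) so that overlapping segments become adjacent.
    std::sort(segments.begin(), segments.end());

    uint64_t total = 0;
    std::pair<uint64_t, uint64_t> current = segments[0];
    for (std::size_t i = 1; i < segments.size(); ++i)
    {
        if (current.second < segments[i].first)
        {
            // Gap found: close the current merged segment and start a new one.
            total += current.second - current.first;
            current = segments[i];
        }
        else
        {
            // Overlap or touch: extend the current merged segment.
            current.second = std::max(current.second, segments[i].second);
        }
    }
    total += current.second - current.first;
    return total;
}
```

For example, the segments `[1, 3]`, `[2, 5]` and `[7, 9]` merge into `[1, 5]` and `[7, 9]`, so the covered length is 4 + 2 = 6 regardless of input order.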

View File

@ -114,6 +114,24 @@ static IAggregateFunction * createWithUnsignedIntegerType(const IDataType & argu
return nullptr;
}
template <template <typename, typename> class AggregateFunctionTemplate, template <typename> class Data, typename... TArgs>
static IAggregateFunction * createWithBasicNumberOrDateOrDateTime(const IDataType & argument_type, TArgs &&... args)
{
WhichDataType which(argument_type);
#define DISPATCH(TYPE) \
if (which.idx == TypeIndex::TYPE) \
return new AggregateFunctionTemplate<TYPE, Data<TYPE>>(std::forward<TArgs>(args)...);
FOR_BASIC_NUMERIC_TYPES(DISPATCH)
#undef DISPATCH
if (which.idx == TypeIndex::Date)
return new AggregateFunctionTemplate<UInt16, Data<UInt16>>(std::forward<TArgs>(args)...);
if (which.idx == TypeIndex::DateTime)
return new AggregateFunctionTemplate<UInt32, Data<UInt32>>(std::forward<TArgs>(args)...);
return nullptr;
}
template <template <typename> class AggregateFunctionTemplate, typename... TArgs>
static IAggregateFunction * createWithNumericBasedType(const IDataType & argument_type, TArgs && ... args)
{

View File

@ -0,0 +1,207 @@
#pragma once
#include <IO/ReadBuffer.h>
#include <IO/WriteBuffer.h>
#include <Common/HashTable/HashMap.h>
#include <common/types.h>
#include <ext/bit_cast.h>
namespace DB
{
/** `bfloat16` is a 16-bit floating point data type that is the same as the corresponding most significant 16 bits of the `float`.
* https://en.wikipedia.org/wiki/Bfloat16_floating-point_format
*
* To calculate quantile, simply convert input value to 16 bit (convert to float, then take the most significant 16 bits),
* and calculate the histogram of these values.
*
* Hash table is the preferred way to store histogram, because the number of distinct values is small:
* ```
* SELECT uniq(bfloat)
* FROM
* (
* SELECT
* number,
* toFloat32(number) AS f,
* bitShiftRight(bitAnd(reinterpretAsUInt32(reinterpretAsFixedString(f)), 4294901760) AS cut, 16),
* reinterpretAsFloat32(reinterpretAsFixedString(cut)) AS bfloat
* FROM numbers(100000000)
* )
*
* uniq(bfloat)
* 2623
*
* ```
* (when increasing the range of values 1000 times, the number of distinct bfloat16 values increases just by 1280).
*
* Then calculate quantile from the histogram.
*
* This sketch is very simple and rough. Its relative precision is constant 1 / 256 = 0.390625%.
*/
template <typename Value>
struct QuantileBFloat16Histogram
{
using BFloat16 = UInt16;
using Weight = UInt64;
/// Make automatic memory for 16 elements to avoid allocations for small states.
/// The usage of trivial hash is ok, because we effectively take logarithm of the values and pathological cases are unlikely.
using Data = HashMapWithStackMemory<BFloat16, Weight, TrivialHash, 4>;
Data data;
void add(const Value & x)
{
add(x, 1);
}
void add(const Value & x, Weight w)
{
if (!isNaN(x))
data[toBFloat16(x)] += w;
}
void merge(const QuantileBFloat16Histogram & rhs)
{
for (const auto & pair : rhs.data)
data[pair.getKey()] += pair.getMapped();
}
void serialize(WriteBuffer & buf) const
{
data.write(buf);
}
void deserialize(ReadBuffer & buf)
{
data.read(buf);
}
Value get(Float64 level) const
{
return getImpl<Value>(level);
}
void getMany(const Float64 * levels, const size_t * indices, size_t size, Value * result) const
{
getManyImpl(levels, indices, size, result);
}
Float64 getFloat(Float64 level) const
{
return getImpl<Float64>(level);
}
void getManyFloat(const Float64 * levels, const size_t * indices, size_t size, Float64 * result) const
{
getManyImpl(levels, indices, size, result);
}
private:
/// Take the most significant 16 bits of the floating point number.
BFloat16 toBFloat16(const Value & x) const
{
return ext::bit_cast<UInt32>(static_cast<Float32>(x)) >> 16;
}
/// Put the bits into most significant 16 bits of the floating point number and fill other bits with zeros.
Float32 toFloat32(const BFloat16 & x) const
{
return ext::bit_cast<Float32>(x << 16);
}
using Pair = PairNoInit<Float32, Weight>;
template <typename T>
T getImpl(Float64 level) const
{
size_t size = data.size();
if (0 == size)
return std::numeric_limits<T>::quiet_NaN();
std::unique_ptr<Pair[]> array_holder(new Pair[size]);
Pair * array = array_holder.get();
Float64 sum_weight = 0;
Pair * arr_it = array;
for (const auto & pair : data)
{
sum_weight += pair.getMapped();
*arr_it = {toFloat32(pair.getKey()), pair.getMapped()};
++arr_it;
}
std::sort(array, array + size, [](const Pair & a, const Pair & b) { return a.first < b.first; });
Float64 threshold = std::ceil(sum_weight * level);
Float64 accumulated = 0;
for (const Pair * p = array; p != (array + size); ++p)
{
accumulated += p->second;
if (accumulated >= threshold)
return p->first;
}
return array[size - 1].first;
}
template <typename T>
void getManyImpl(const Float64 * levels, const size_t * indices, size_t num_levels, T * result) const
{
size_t size = data.size();
if (0 == size)
{
for (size_t i = 0; i < num_levels; ++i)
result[i] = std::numeric_limits<T>::quiet_NaN();
return;
}
std::unique_ptr<Pair[]> array_holder(new Pair[size]);
Pair * array = array_holder.get();
Float64 sum_weight = 0;
Pair * arr_it = array;
for (const auto & pair : data)
{
sum_weight += pair.getMapped();
*arr_it = {toFloat32(pair.getKey()), pair.getMapped()};
++arr_it;
}
std::sort(array, array + size, [](const Pair & a, const Pair & b) { return a.first < b.first; });
size_t level_index = 0;
Float64 accumulated = 0;
Float64 threshold = std::ceil(sum_weight * levels[indices[level_index]]);
for (const Pair * p = array; p != (array + size); ++p)
{
accumulated += p->second;
while (accumulated >= threshold)
{
result[indices[level_index]] = p->first;
++level_index;
if (level_index == num_levels)
return;
threshold = std::ceil(sum_weight * levels[indices[level_index]]);
}
}
while (level_index < num_levels)
{
result[indices[level_index]] = array[size - 1].first;
++level_index;
}
}
};
}
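The technique in this header (truncate each value to its top 16 bits, count occurrences, then walk the sorted histogram until the accumulated weight reaches the threshold) can be tried outside ClickHouse. The sketch below makes several assumptions: it uses `std::memcpy` instead of `ext::bit_cast`, an ordered `std::map` instead of `HashMapWithStackMemory` plus a sort, and the names `toBFloat16`, `fromBFloat16` and `Bf16Histogram` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>
#include <map>

// Keep the most significant 16 bits of a 32-bit float (truncation, no rounding).
inline uint16_t toBFloat16(float x)
{
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    return static_cast<uint16_t>(bits >> 16);
}

// Put the 16 bits back into the high half of a float; the low half is zero.
inline float fromBFloat16(uint16_t x)
{
    const uint32_t bits = static_cast<uint32_t>(x) << 16;
    float result;
    std::memcpy(&result, &bits, sizeof(result));
    return result;
}

struct Bf16Histogram
{
    // Keyed by the decoded value so that iteration is already in ascending order.
    std::map<float, uint64_t> buckets;

    void add(float x, uint64_t weight = 1)
    {
        if (!std::isnan(x))
            buckets[fromBFloat16(toBFloat16(x))] += weight;
    }

    float quantile(double level) const
    {
        if (buckets.empty())
            return std::numeric_limits<float>::quiet_NaN();

        double total = 0;
        for (const auto & bucket : buckets)
            total += static_cast<double>(bucket.second);

        // Same rule as getImpl above: the first value whose accumulated weight reaches the threshold.
        const double threshold = std::ceil(total * level);
        double accumulated = 0;
        for (const auto & bucket : buckets)
        {
            accumulated += static_cast<double>(bucket.second);
            if (accumulated >= threshold)
                return bucket.first;
        }
        return buckets.rbegin()->first;
    }
};
```

Small integers such as 1 to 5 survive the truncation unchanged, so a histogram built from them returns exact quantiles; values with more mantissa bits are rounded toward zero before they are counted.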

View File

@ -62,6 +62,7 @@ void registerAggregateFunctionCombinatorDistinct(AggregateFunctionCombinatorFact
void registerWindowFunctions(AggregateFunctionFactory & factory);
void registerAggregateFunctionSegmentLengthSum(AggregateFunctionFactory &);
void registerAggregateFunctions()
{
@ -111,6 +112,8 @@ void registerAggregateFunctions()
registerAggregateFunctionStudentTTest(factory);
registerWindowFunctions(factory);
registerAggregateFunctionSegmentLengthSum(factory);
}
{

View File

@ -43,6 +43,7 @@ SRCS(
AggregateFunctionRankCorrelation.cpp
AggregateFunctionResample.cpp
AggregateFunctionRetention.cpp
AggregateFunctionSegmentLengthSum.cpp
AggregateFunctionSequenceMatch.cpp
AggregateFunctionSimpleLinearRegression.cpp
AggregateFunctionSimpleState.cpp

View File

@ -187,6 +187,7 @@ add_object_library(clickhouse_interpreters_clusterproxy Interpreters/ClusterProx
add_object_library(clickhouse_interpreters_jit Interpreters/JIT)
add_object_library(clickhouse_columns Columns)
add_object_library(clickhouse_storages Storages)
add_object_library(clickhouse_storages_mysql Storages/MySQL)
add_object_library(clickhouse_storages_distributed Storages/Distributed)
add_object_library(clickhouse_storages_mergetree Storages/MergeTree)
add_object_library(clickhouse_storages_liveview Storages/LiveView)

View File

@ -1,309 +0,0 @@
#pragma once
#include <cstddef>
#include <cstdlib>
#include <Common/Exception.h>
#include <Common/formatReadable.h>
namespace DB
{
namespace ErrorCodes
{
extern const int CANNOT_ALLOCATE_MEMORY;
}
/** An array of (almost) unchangeable size:
* the size is specified in the constructor;
* `resize` method removes old data, and necessary only for
* so that you can first create an empty object using the default constructor,
* and then decide on the size.
*
* There is a possibility to not initialize elements by default, but create them inplace.
* Member destructors are called automatically.
*
* `sizeof` is equal to the size of one pointer.
*
* Not exception-safe.
*
* Copying is supported via assign() method. Moving empties the original object.
* That is, it is inconvenient to use this array in many cases.
*
* Designed for situations in which many arrays of the same small size are created,
* but the size is not known at compile time.
* Also gives a significant advantage in cases where it is important that `sizeof` is minimal.
* For example, if arrays are put in an open-addressing hash table with inplace storage of values (like HashMap)
*
* In this case, compared to std::vector:
* - for arrays of 1 element size - an advantage of about 2 times;
* - for arrays of 5 elements - an advantage of about 1.5 times
* (DB::Field, containing UInt64 and String, used as T);
*/
const size_t empty_auto_array_helper = 0;
template <typename T>
class AutoArray
{
public:
/// For deferred creation.
AutoArray()
{
setEmpty();
}
explicit AutoArray(size_t size_)
{
init(size_, false);
}
/** Initializes all elements with a copy constructor with the `value` parameter.
*/
AutoArray(size_t size_, const T & value)
{
init(size_, true);
for (size_t i = 0; i < size_; ++i)
{
new (place(i)) T(value);
}
}
/** `resize` removes all existing items.
*/
void resize(size_t size_, bool dont_init_elems = false)
{
uninit();
init(size_, dont_init_elems);
}
/** Move operations.
*/
AutoArray(AutoArray && src)
{
if (this == &src)
return;
setEmpty();
data_ptr = src.data_ptr;
src.setEmpty();
}
AutoArray & operator= (AutoArray && src)
{
if (this == &src)
return *this;
uninit();
data_ptr = src.data_ptr;
src.setEmpty();
return *this;
}
~AutoArray()
{
uninit();
}
size_t size() const
{
return m_size();
}
bool empty() const
{
return size() == 0;
}
void clear()
{
uninit();
setEmpty();
}
template <typename It>
void assign(It from_begin, It from_end)
{
uninit();
size_t size = from_end - from_begin;
init(size, /* dont_init_elems = */ true);
It it = from_begin;
for (size_t i = 0; i < size; ++i, ++it)
new (place(i)) T(*it);
}
void assign(const AutoArray & from)
{
assign(from.begin(), from.end());
}
/** You can read and modify elements using the [] operator
* only if the items were initialized
* (that is, DontInitElemsTag was not passed to the constructor,
* or you initialized them using `place` and placement new).
*/
T & operator[](size_t i)
{
return elem(i);
}
const T & operator[](size_t i) const
{
return elem(i);
}
T * data()
{
return elemPtr(0);
}
const T * data() const
{
return elemPtr(0);
}
/** Get the piece of memory in which the element should be located.
* The function is intended to initialize an element,
* which has not yet been initialized
* new (arr.place(i)) T(args);
*/
char * place(size_t i)
{
return data_ptr + sizeof(T) * i;
}
using iterator = T *;
using const_iterator = const T *;
iterator begin() { return elemPtr(0); }
iterator end() { return elemPtr(size()); }
const_iterator begin() const { return elemPtr(0); }
const_iterator end() const { return elemPtr(size()); }
bool operator== (const AutoArray<T> & rhs) const
{
size_t s = size();
if (s != rhs.size())
return false;
for (size_t i = 0; i < s; ++i)
if (elem(i) != rhs.elem(i))
return false;
return true;
}
bool operator!= (const AutoArray<T> & rhs) const
{
return !(*this == rhs);
}
bool operator< (const AutoArray<T> & rhs) const
{
size_t s = size();
size_t rhs_s = rhs.size();
if (s < rhs_s)
return true;
if (s > rhs_s)
return false;
for (size_t i = 0; i < s; ++i)
{
if (elem(i) < rhs.elem(i))
return true;
if (elem(i) > rhs.elem(i))
return false;
}
return false;
}
private:
static constexpr size_t alignment = alignof(T);
/// Bytes allocated to store the size of the array before the data. It is padded so that its size is at least the alignment.
/// Padding is at left and the size is stored at right (just before the first data element).
static constexpr size_t prefix_size = std::max(sizeof(size_t), alignment);
char * data_ptr;
size_t & m_size()
{
return reinterpret_cast<size_t *>(data_ptr)[-1];
}
size_t m_size() const
{
return reinterpret_cast<const size_t *>(data_ptr)[-1];
}
T * elemPtr(size_t i)
{
return reinterpret_cast<T *>(data_ptr) + i;
}
const T * elemPtr(size_t i) const
{
return reinterpret_cast<const T *>(data_ptr) + i;
}
T & elem(size_t i)
{
return *elemPtr(i);
}
const T & elem(size_t i) const
{
return *elemPtr(i);
}
void setEmpty()
{
data_ptr = const_cast<char *>(reinterpret_cast<const char *>(&empty_auto_array_helper)) + sizeof(size_t);
}
void init(size_t new_size, bool dont_init_elems)
{
if (!new_size)
{
setEmpty();
return;
}
void * new_data = nullptr;
int res = posix_memalign(&new_data, alignment, prefix_size + new_size * sizeof(T));
if (0 != res)
throwFromErrno(fmt::format("Cannot allocate memory (posix_memalign) {}.", ReadableSize(new_size)),
ErrorCodes::CANNOT_ALLOCATE_MEMORY, res);
data_ptr = static_cast<char *>(new_data);
data_ptr += prefix_size;
m_size() = new_size;
if (!dont_init_elems)
for (size_t i = 0; i < new_size; ++i)
new (place(i)) T();
}
void uninit()
{
size_t s = size();
if (s)
{
for (size_t i = 0; i < s; ++i)
elem(i).~T();
data_ptr -= prefix_size;
free(data_ptr);
}
}
};
}
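The size-prefix layout used by AutoArray above (one pointer per object, with the element count stored just before the first element) can be sketched in isolation as follows. This is a minimal illustration, not the ClickHouse code: `prefixSize`, `allocateWithSizePrefix`, `storedSize` and `releaseWithSizePrefix` are hypothetical names, and the sketch uses `malloc` instead of `posix_memalign`, so it assumes `alignof(T)` does not exceed the default allocation alignment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical helper: bytes reserved in front of the data for the element count.
// Padded up to alignof(T) so that the first element stays properly aligned.
template <typename T>
constexpr std::size_t prefixSize()
{
    return std::max(sizeof(std::size_t), alignof(T));
}

// Allocates room for `count` objects of type T (elements are NOT constructed)
// and writes `count` into the size_t slot directly before the returned pointer.
template <typename T>
char * allocateWithSizePrefix(std::size_t count)
{
    // Assumption of this sketch: malloc alignment is sufficient for T.
    static_assert(alignof(T) <= alignof(std::max_align_t), "sketch relies on malloc alignment");

    void * raw = std::malloc(prefixSize<T>() + count * sizeof(T));
    if (!raw)
        throw std::bad_alloc();

    char * data_ptr = static_cast<char *>(raw) + prefixSize<T>();
    reinterpret_cast<std::size_t *>(data_ptr)[-1] = count;
    return data_ptr;
}

// Reads the element count back from the slot in front of the data.
inline std::size_t storedSize(const char * data_ptr)
{
    return reinterpret_cast<const std::size_t *>(data_ptr)[-1];
}

// Frees the whole block, including the prefix.
template <typename T>
void releaseWithSizePrefix(char * data_ptr)
{
    std::free(data_ptr - prefixSize<T>());
}
```

A caller would hold only the returned `char *`, which is why `sizeof` of the owning object can be the size of one pointer; the trade-off is that every size query dereferences that pointer.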

View File

@ -3,6 +3,7 @@ set (SRCS
ConfigProcessor.cpp
configReadClient.cpp
ConfigReloader.cpp
YAMLParser.cpp
)
add_library(clickhouse_common_config ${SRCS})
@ -15,3 +16,10 @@ target_link_libraries(clickhouse_common_config
PRIVATE
string_utils
)
if (USE_YAML_CPP)
target_link_libraries(clickhouse_common_config
PRIVATE
yaml-cpp
)
endif()

View File

@ -1,4 +1,8 @@
#if !defined(ARCADIA_BUILD)
#include <Common/config.h>
#endif
#include "ConfigProcessor.h"
#include "YAMLParser.h"
#include <sys/utsname.h>
#include <cerrno>
@ -434,7 +438,9 @@ ConfigProcessor::Files ConfigProcessor::getConfigMergeFiles(const std::string &
std::string base_name = path.stem();
// Skip non-config and temporary files
if (fs::is_regular_file(path) && (extension == ".xml" || extension == ".conf") && !startsWith(base_name, "."))
if (fs::is_regular_file(path)
&& (extension == ".xml" || extension == ".conf" || extension == ".yaml" || extension == ".yml")
&& !startsWith(base_name, "."))
files.push_back(it->path());
}
}
@ -449,12 +455,21 @@ XMLDocumentPtr ConfigProcessor::processConfig(
zkutil::ZooKeeperNodeCache * zk_node_cache,
const zkutil::EventPtr & zk_changed_event)
{
XMLDocumentPtr config;
LOG_DEBUG(log, "Processing configuration file '{}'.", path);
XMLDocumentPtr config;
if (fs::exists(path))
{
config = dom_parser.parse(path);
fs::path p(path);
if (p.extension() == ".xml")
{
config = dom_parser.parse(path);
}
else if (p.extension() == ".yaml" || p.extension() == ".yml")
{
config = YAMLParser::parse(path);
}
}
else
{
@ -489,8 +504,20 @@ XMLDocumentPtr ConfigProcessor::processConfig(
{
LOG_DEBUG(log, "Merging configuration file '{}'.", merge_file);
XMLDocumentPtr with = dom_parser.parse(merge_file);
XMLDocumentPtr with;
fs::path p(merge_file);
if (p.extension() == ".yaml" || p.extension() == ".yml")
{
with = YAMLParser::parse(merge_file);
}
else
{
with = dom_parser.parse(merge_file);
}
merge(config, with);
contributing_files.push_back(merge_file);
}
catch (Exception & e)
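The extension-based choice between the XML and YAML parsers in the hunks above can be summarized by a small classifier. This is only a sketch under assumptions: `ConfigFormat` and `detectConfigFormat` are hypothetical names that do not exist in ClickHouse, and treating `.conf` as XML mirrors the merge-file branch (which falls back to the DOM parser for anything that is not YAML), not the main-config branch.

```cpp
#include <cassert>
#include <filesystem>

// Hypothetical enum for illustration only.
enum class ConfigFormat
{
    XML,
    YAML,
    Unknown
};

// Classifies a configuration file by its extension, as the diff does with
// fs::path::extension(). The comparison is case-sensitive, like in the diff.
inline ConfigFormat detectConfigFormat(const std::filesystem::path & path)
{
    const std::filesystem::path extension = path.extension();

    if (extension == ".yaml" || extension == ".yml")
        return ConfigFormat::YAML;
    if (extension == ".xml" || extension == ".conf")
        return ConfigFormat::XML;
    return ConfigFormat::Unknown;
}
```

An explicit `Unknown` result makes it easy for the caller to decide whether an unrecognized extension should be rejected or handed to a default parser.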

View File

@ -1,5 +1,9 @@
#pragma once
#if !defined(ARCADIA_BUILD)
#include <Common/config.h>
#endif
#include <string>
#include <unordered_set>
#include <vector>
@ -141,3 +145,4 @@ private:
};
}

View File

@ -0,0 +1,166 @@
#if !defined(ARCADIA_BUILD)
#include <Common/config.h>
#endif
#if USE_YAML_CPP
#include "YAMLParser.h"
#include <string>
#include <cstring>
#include <vector>
#include <Poco/DOM/Document.h>
#include <Poco/DOM/DOMParser.h>
#include <Poco/DOM/DOMWriter.h>
#include <Poco/DOM/NodeList.h>
#include <Poco/DOM/Element.h>
#include <Poco/DOM/AutoPtr.h>
#include <Poco/DOM/NamedNodeMap.h>
#include <Poco/DOM/Text.h>
#include <Common/Exception.h>
#include <yaml-cpp/yaml.h> // Y_IGNORE
#include <common/logger_useful.h>
using namespace Poco::XML;
namespace DB
{
namespace ErrorCodes
{
extern const int CANNOT_OPEN_FILE;
extern const int CANNOT_PARSE_YAML;
}
/// A prefix symbol in a YAML key.
/// We add attributes to nodes by using a prefix symbol in the key part.
/// Currently we use @ as the prefix symbol. Note that @ is reserved
/// by the YAML standard, so a key-value pair must be quoted like this: "@attribute": attr_value
const char YAML_ATTRIBUTE_PREFIX = '@';
namespace
{
Poco::AutoPtr<Poco::XML::Element> createCloneNode(Poco::XML::Element & original_node)
{
Poco::AutoPtr<Poco::XML::Element> clone_node = original_node.ownerDocument()->createElement(original_node.nodeName());
original_node.parentNode()->appendChild(clone_node);
return clone_node;
}
void processNode(const YAML::Node & node, Poco::XML::Element & parent_xml_element)
{
auto * xml_document = parent_xml_element.ownerDocument();
switch (node.Type())
{
case YAML::NodeType::Scalar:
{
auto value = node.as<std::string>();
Poco::AutoPtr<Poco::XML::Text> xml_value = xml_document->createTextNode(value);
parent_xml_element.appendChild(xml_value);
break;
}
/// We process YAML Sequences as a
/// list of <key>value</key> tags with same key and different values.
/// For example, we translate this sequence
/// seq:
/// - val1
/// - val2
///
/// into this:
/// <seq>val1</seq>
/// <seq>val2</seq>
case YAML::NodeType::Sequence:
{
for (const auto & child_node : node)
if (parent_xml_element.hasChildNodes())
{
/// We want to process sequences like that:
/// seq:
/// - val1
/// - k2: val2
/// - val3
/// - k4: val4
/// - val5
/// into xml like this:
/// <seq>val1</seq>
/// <seq>
/// <k2>val2</k2>
/// </seq>
/// <seq>val3</seq>
/// <seq>
/// <k4>val4</k4>
/// </seq>
/// <seq>val5</seq>
/// So, we create a new parent node with same tag for each child node
processNode(child_node, *createCloneNode(parent_xml_element));
}
else
{
processNode(child_node, parent_xml_element);
}
break;
}
case YAML::NodeType::Map:
{
for (const auto & key_value_pair : node)
{
const auto & key_node = key_value_pair.first;
const auto & value_node = key_value_pair.second;
auto key = key_node.as<std::string>();
bool is_attribute = (key.starts_with(YAML_ATTRIBUTE_PREFIX) && value_node.IsScalar());
if (is_attribute)
{
/// we use substr(1) here to remove YAML_ATTRIBUTE_PREFIX from key
auto attribute_name = key.substr(1);
auto value = value_node.as<std::string>();
parent_xml_element.setAttribute(attribute_name, value);
}
else
{
Poco::AutoPtr<Poco::XML::Element> xml_key = xml_document->createElement(key);
parent_xml_element.appendChild(xml_key);
processNode(value_node, *xml_key);
}
}
break;
}
case YAML::NodeType::Null: break;
case YAML::NodeType::Undefined:
{
throw Exception(ErrorCodes::CANNOT_PARSE_YAML, "YAMLParser has encountered node with undefined type and cannot continue parsing of the file");
}
}
}
}
Poco::AutoPtr<Poco::XML::Document> YAMLParser::parse(const String& path)
{
YAML::Node node_yml;
try
{
node_yml = YAML::LoadFile(path);
}
catch (const YAML::ParserException& e)
{
/// yaml-cpp cannot parse the file because its contents are incorrect
throw Exception(ErrorCodes::CANNOT_PARSE_YAML, "Unable to parse YAML configuration file {}, {}", path, e.what());
}
catch (const YAML::BadFile&)
{
/// yaml-cpp cannot open the file even though it exists
throw Exception(ErrorCodes::CANNOT_OPEN_FILE, "Unable to open YAML configuration file {}", path);
}
Poco::AutoPtr<Poco::XML::Document> xml = new Document;
Poco::AutoPtr<Poco::XML::Element> root_node = xml->createElement("yandex");
xml->appendChild(root_node);
processNode(node_yml, *root_node);
return xml;
}
}
#endif
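The translation rules described in the comments of processNode (a sequence becomes repeated tags with the same name, and a scalar under a key prefixed with `@` becomes an attribute) can be illustrated without yaml-cpp or Poco. The following is a simplified sketch, not the parser above: `Node`, `makeScalar`, `makeSequence`, `makeMap` and `render` are hypothetical, the output is a plain string with no XML escaping, and corner cases such as nested sequences may differ from the real implementation.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, minimal stand-in for a parsed YAML tree.
struct Node
{
    enum class Kind { Scalar, Sequence, Map };

    Kind kind = Kind::Scalar;
    std::string scalar;              // used when kind == Scalar
    std::vector<std::string> keys;   // used when kind == Map, parallel to `children`
    std::vector<Node> children;      // used when kind == Sequence or Map
};

inline Node makeScalar(std::string value)
{
    Node node;
    node.kind = Node::Kind::Scalar;
    node.scalar = std::move(value);
    return node;
}

inline Node makeSequence(std::vector<Node> items)
{
    Node node;
    node.kind = Node::Kind::Sequence;
    node.children = std::move(items);
    return node;
}

inline Node makeMap(std::vector<std::pair<std::string, Node>> entries)
{
    Node node;
    node.kind = Node::Kind::Map;
    for (auto & entry : entries)
    {
        node.keys.push_back(std::move(entry.first));
        node.children.push_back(std::move(entry.second));
    }
    return node;
}

// Renders `key: node` as XML text.
inline std::string render(const std::string & key, const Node & node)
{
    switch (node.kind)
    {
        case Node::Kind::Scalar:
            return "<" + key + ">" + node.scalar + "</" + key + ">";

        case Node::Kind::Sequence:
        {
            // Every item gets its own element carrying the sequence's key.
            std::string out;
            for (const auto & item : node.children)
                out += render(key, item);
            return out;
        }

        case Node::Kind::Map:
        {
            std::string attributes;
            std::string body;
            for (std::size_t i = 0; i < node.keys.size(); ++i)
            {
                const std::string & child_key = node.keys[i];
                const Node & child = node.children[i];
                const bool is_attribute = !child_key.empty() && child_key[0] == '@'
                    && child.kind == Node::Kind::Scalar;

                if (is_attribute)
                    attributes += " " + child_key.substr(1) + "=\"" + child.scalar + "\"";
                else
                    body += render(child_key, child);
            }
            return "<" + key + attributes + ">" + body + "</" + key + ">";
        }
    }
    return {};
}
```

Rendering the mixed sequence from the comment (`val1`, `k2: val2`, `val3`) under the key `seq` yields three sibling `<seq>` elements, the second of which wraps `<k2>val2</k2>`.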

View File

@ -0,0 +1,55 @@
#pragma once
#if !defined(ARCADIA_BUILD)
#include <Common/config.h>
#endif
#include <string>
#include <Poco/DOM/Document.h>
#include "Poco/DOM/AutoPtr.h"
#include <common/logger_useful.h>
#if USE_YAML_CPP
namespace DB
{
/// Real YAML parser: loads a YAML file into a YAML::Node and converts it into a Poco XML document
class YAMLParserImpl
{
public:
static Poco::AutoPtr<Poco::XML::Document> parse(const String& path);
};
using YAMLParser = YAMLParserImpl;
}
#else
namespace DB
{
namespace ErrorCodes
{
extern const int CANNOT_PARSE_YAML;
}
/// Fake YAML parser: throws an exception if we try to parse YAML configs in a build without yaml-cpp
class DummyYAMLParser
{
public:
static Poco::AutoPtr<Poco::XML::Document> parse(const String& path)
{
Poco::AutoPtr<Poco::XML::Document> xml = new Poco::XML::Document;
throw Exception(ErrorCodes::CANNOT_PARSE_YAML, "Unable to parse YAML configuration file {} without usage of yaml-cpp library", path);
return xml;
}
};
using YAMLParser = DummyYAMLParser;
}
#endif

View File

@ -87,9 +87,20 @@ static DNSResolver::IPAddresses resolveIPAddressImpl(const std::string & host)
{
Poco::Net::IPAddress ip;
/// NOTE: Poco::Net::DNS::resolveOne(host) doesn't work for IP addresses like 127.0.0.2
if (Poco::Net::IPAddress::tryParse(host, ip))
return DNSResolver::IPAddresses(1, ip);
/// NOTE:
/// - Poco::Net::DNS::resolveOne(host) doesn't work for IP addresses like 127.0.0.2
/// - Poco::Net::IPAddress::tryParse() expects a hex string for IPv6 (w/o brackets)
if (host.starts_with('['))
{
assert(host.ends_with(']'));
if (Poco::Net::IPAddress::tryParse(host.substr(1, host.size() - 2), ip))
return DNSResolver::IPAddresses(1, ip);
}
else
{
if (Poco::Net::IPAddress::tryParse(host, ip))
return DNSResolver::IPAddresses(1, ip);
}
/// Family: AF_UNSPEC
/// AI_ALL is required for checking if client is allowed to connect from an address
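The bracket handling added to resolveIPAddressImpl boils down to removing one pair of square brackets before the string is handed to an IP parser. A minimal sketch of that step, with a hypothetical helper name (`stripIPv6Brackets` is not part of ClickHouse or Poco) and with a softer policy than the diff: where the diff asserts on a missing closing bracket, this version returns the input unchanged.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: "[::1]" -> "::1"; anything else is returned as is.
inline std::string stripIPv6Brackets(const std::string & host)
{
    if (host.size() >= 2 && host.front() == '[' && host.back() == ']')
        return host.substr(1, host.size() - 2);
    return host;
}
```

The result can then be passed to a parser that expects a bare IPv6 literal.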

View File

@ -552,6 +552,7 @@
M(582, NO_SUCH_PROJECTION_IN_TABLE) \
M(583, ILLEGAL_PROJECTION) \
M(584, PROJECTION_NOT_USED) \
M(585, CANNOT_PARSE_YAML) \
\
M(997, CANNOT_CREATE_FILE) \
M(998, POSTGRESQL_CONNECTION_FAILURE) \

View File

@ -123,7 +123,7 @@ inline bool isWhitespaceASCII(char c)
/// Since |isWhiteSpaceASCII()| is used inside algorithms it's easier to implement another function than add extra argument.
inline bool isWhitespaceASCIIOneLine(char c)
{
return c == ' ' || c == '\t' || c == '\r' || c == '\f' || c == '\v';
return c == ' ' || c == '\t' || c == '\f' || c == '\v';
}
inline bool isControlASCII(char c)

View File

@ -16,3 +16,4 @@
#cmakedefine01 USE_STATS
#cmakedefine01 CLICKHOUSE_SPLIT_BINARY
#cmakedefine01 USE_DATASKETCHES
#cmakedefine01 USE_YAML_CPP

View File

@ -7,9 +7,6 @@ endif()
add_executable (sip_hash_perf sip_hash_perf.cpp)
target_link_libraries (sip_hash_perf PRIVATE clickhouse_common_io)
add_executable (auto_array auto_array.cpp)
target_link_libraries (auto_array PRIVATE clickhouse_common_io)
add_executable (small_table small_table.cpp)
target_link_libraries (small_table PRIVATE clickhouse_common_io)

View File

@ -1,197 +0,0 @@
#include <iostream>
#include <iomanip>
#include <map>
#include <pcg_random.hpp>
#include <Core/Field.h>
#include <Common/HashTable/HashMap.h>
#include <Common/AutoArray.h>
#include <IO/WriteHelpers.h>
#include <Common/Stopwatch.h>
int main(int argc, char ** argv)
{
pcg64 rng;
{
size_t n = 10;
using T = std::string;
DB::AutoArray<T> arr(n);
for (size_t i = 0; i < arr.size(); ++i)
arr[i] = "Hello, world! " + DB::toString(i);
for (auto & elem : arr)
std::cerr << elem << std::endl;
}
std::cerr << std::endl;
{
size_t n = 10;
using T = std::string;
using Arr = DB::AutoArray<T>;
Arr arr;
arr.resize(n);
for (size_t i = 0; i < arr.size(); ++i)
arr[i] = "Hello, world! " + DB::toString(i);
for (auto & elem : arr)
std::cerr << elem << std::endl;
std::cerr << std::endl;
Arr arr2 = std::move(arr);
std::cerr << arr.size() << ", " << arr2.size() << std::endl; // NOLINT
for (auto & elem : arr2)
std::cerr << elem << std::endl;
}
std::cerr << std::endl;
{
size_t n = 10;
size_t keys = 10;
using T = std::string;
using Arr = DB::AutoArray<T>;
using Map = std::map<Arr, T>;
Map map;
for (size_t i = 0; i < keys; ++i)
{
Arr key(n);
for (size_t j = 0; j < n; ++j)
key[j] = DB::toString(rng());
map[std::move(key)] = "Hello, world! " + DB::toString(i);
}
for (const auto & kv : map)
{
std::cerr << "[";
for (size_t j = 0; j < n; ++j)
std::cerr << (j == 0 ? "" : ", ") << kv.first[j];
std::cerr << "]";
std::cerr << ":\t" << kv.second << std::endl;
}
std::cerr << std::endl;
Map map2 = std::move(map);
for (const auto & kv : map2)
{
std::cerr << "[";
for (size_t j = 0; j < n; ++j)
std::cerr << (j == 0 ? "" : ", ") << kv.first[j];
std::cerr << "]";
std::cerr << ":\t" << kv.second << std::endl;
}
}
std::cerr << std::endl;
{
size_t n = 10;
size_t keys = 10;
using T = std::string;
using Arr = DB::AutoArray<T>;
using Vec = std::vector<Arr>;
Vec vec;
for (size_t i = 0; i < keys; ++i)
{
Arr key(n);
for (size_t j = 0; j < n; ++j)
key[j] = DB::toString(rng());
vec.push_back(std::move(key));
}
for (const auto & elem : vec)
{
std::cerr << "[";
for (size_t j = 0; j < n; ++j)
std::cerr << (j == 0 ? "" : ", ") << elem[j];
std::cerr << "]" << std::endl;
}
std::cerr << std::endl;
Vec vec2 = std::move(vec);
for (const auto & elem : vec2)
{
std::cerr << "[";
for (size_t j = 0; j < n; ++j)
std::cerr << (j == 0 ? "" : ", ") << elem[j];
std::cerr << "]" << std::endl;
}
}
if (argc == 2 && !strcmp(argv[1], "1"))
{
size_t n = 5;
size_t map_size = 1000000;
using T = DB::Field;
T field = std::string("Hello, world");
using Arr = std::vector<T>;
using Map = HashMap<UInt64, Arr>;
Stopwatch watch;
Map map;
for (size_t i = 0; i < map_size; ++i)
{
Map::LookupResult it;
bool inserted;
map.emplace(rng(), it, inserted);
if (inserted)
{
new (&it->getMapped()) Arr(n);
for (size_t j = 0; j < n; ++j)
(it->getMapped())[j] = field;
}
}
std::cerr << std::fixed << std::setprecision(2)
<< "Vector: Elapsed: " << watch.elapsedSeconds()
<< " (" << map_size / watch.elapsedSeconds() << " rows/sec., "
<< "sizeof(Map::value_type) = " << sizeof(Map::value_type)
<< std::endl;
}
{
size_t n = 10000;
using Arr = DB::AutoArray<std::string>;
Arr arr1(n);
Arr arr2(n);
for (size_t i = 0; i < n; ++i)
{
arr1[i] = "Hello, world! " + DB::toString(i);
arr2[i] = "Goodbye, world! " + DB::toString(i);
}
arr2 = std::move(arr1);
arr1.resize(n); // NOLINT
std::cerr
<< "arr1.size(): " << arr1.size() << ", arr2.size(): " << arr2.size() << std::endl
<< "arr1.data(): " << arr1.data() << ", arr2.data(): " << arr2.data() << std::endl
<< "arr1[0]: " << arr1[0] << ", arr2[0]: " << arr2[0] << std::endl;
}
return 0;
}

View File

@ -61,6 +61,10 @@ static void NO_INLINE testForType(size_t method, size_t rows_size)
test<Key, ::absl::flat_hash_map<Key, UInt64>>(data.data(), data.size(), "Abseil HashMap");
}
else if (method == 3)
{
test<Key, ::absl::flat_hash_map<Key, UInt64, DefaultHash<Key>>>(data.data(), data.size(), "Abseil HashMap with CH Hash");
}
else if (method == 4)
{
test<Key, std::unordered_map<Key, UInt64>>(data.data(), data.size(), "std::unordered_map");
}
@ -81,50 +85,110 @@ static void NO_INLINE testForType(size_t method, size_t rows_size)
* ./integer_hash_tables_benchmark 1 $2 100000000 < $1
* ./integer_hash_tables_benchmark 2 $2 100000000 < $1
* ./integer_hash_tables_benchmark 3 $2 100000000 < $1
* ./integer_hash_tables_benchmark 4 $2 100000000 < $1
*
* Results of this benchmark on hits_100m_obfuscated
* Results of this benchmark on hits_100m_obfuscated X86-64
*
* File hits_100m_obfuscated/201307_1_96_4/WatchID.bin
* CH HashMap: Elapsed: 7.366 (13575745.933 elem/sec.), map size: 99997493
* Google DenseMap: Elapsed: 10.089 (9911817.125 elem/sec.), map size: 99997493
* Abseil HashMap: Elapsed: 9.011 (11097794.073 elem/sec.), map size: 99997493
* std::unordered_map: Elapsed: 44.758 (2234223.189 elem/sec.), map size: 99997493
* CH HashMap: Elapsed: 7.416 (13484217.815 elem/sec.), map size: 99997493
* Google DenseMap: Elapsed: 10.303 (9706022.031 elem/sec.), map size: 99997493
* Abseil HashMap: Elapsed: 9.106 (10982139.229 elem/sec.), map size: 99997493
* Abseil HashMap with CH Hash: Elapsed: 9.221 (10845360.669 elem/sec.), map size: 99997493
* std::unordered_map: Elapsed: 45.213 (2211758.706 elem/sec.), map size: 9999749
*
* File hits_100m_obfuscated/201307_1_96_4/URLHash.bin
* CH HashMap: Elapsed: 2.672 (37421588.347 elem/sec.), map size: 20714865
* Google DenseMap: Elapsed: 3.409 (29333308.209 elem/sec.), map size: 20714865
* Abseil HashMap: Elapsed: 2.778 (36000540.035 elem/sec.), map size: 20714865
* std::unordered_map: Elapsed: 8.643 (11570012.207 elem/sec.), map size: 20714865
* CH HashMap: Elapsed: 2.620 (38168135.308 elem/sec.), map size: 20714865
* Google DenseMap: Elapsed: 3.426 (29189309.058 elem/sec.), map size: 20714865
* Abseil HashMap: Elapsed: 2.788 (35870495.097 elem/sec.), map size: 20714865
* Abseil HashMap with CH Hash: Elapsed: 2.991 (33428850.155 elem/sec.), map size: 20714865
* std::unordered_map: Elapsed: 8.503 (11760331.346 elem/sec.), map size: 20714865
*
* File hits_100m_obfuscated/201307_1_96_4/UserID.bin
* CH HashMap: Elapsed: 2.116 (47267659.076 elem/sec.), map size: 17630976
* Google DenseMap: Elapsed: 2.722 (36740693.786 elem/sec.), map size: 17630976
* Abseil HashMap: Elapsed: 2.597 (38509988.663 elem/sec.), map size: 17630976
* std::unordered_map: Elapsed: 7.327 (13647271.471 elem/sec.), map size: 17630976
* CH HashMap: Elapsed: 2.157 (46352039.753 elem/sec.), map size: 17630976
* Google DenseMap: Elapsed: 2.725 (36694226.782 elem/sec.), map size: 17630976
* Abseil HashMap: Elapsed: 2.590 (38604284.187 elem/sec.), map size: 17630976
* Abseil HashMap with CH Hash: Elapsed: 2.785 (35904856.137 elem/sec.), map size: 17630976
* std::unordered_map: Elapsed: 7.268 (13759557.609 elem/sec.), map size: 17630976
*
* File hits_100m_obfuscated/201307_1_96_4/RegionID.bin
* CH HashMap: Elapsed: 0.201 (498144193.695 elem/sec.), map size: 9040
* Google DenseMap: Elapsed: 0.261 (382656387.016 elem/sec.), map size: 9046
* Abseil HashMap: Elapsed: 0.307 (325874545.117 elem/sec.), map size: 9040
* std::unordered_map: Elapsed: 0.466 (214379083.420 elem/sec.), map size: 9040
* CH HashMap: Elapsed: 0.192 (521583315.810 elem/sec.), map size: 9040
* Google DenseMap: Elapsed: 0.297 (337081407.799 elem/sec.), map size: 9046
* Abseil HashMap: Elapsed: 0.295 (338805623.511 elem/sec.), map size: 9040
* Abseil HashMap with CH Hash: Elapsed: 0.331 (302155391.036 elem/sec.), map size: 9040
* std::unordered_map: Elapsed: 0.455 (219971555.390 elem/sec.), map size: 9040
*
* File hits_100m_obfuscated/201307_1_96_4/CounterID.bin
* CH HashMap: Elapsed: 0.220 (455344735.648 elem/sec.), map size: 6506
* Google DenseMap: Elapsed: 0.297 (336187522.818 elem/sec.), map size: 6506
* Abseil HashMap: Elapsed: 0.307 (325264214.480 elem/sec.), map size: 6506
* std::unordered_map: Elapsed: 0.389 (257195996.114 elem/sec.), map size: 6506
* CH HashMap: Elapsed: 0.217 (460216823.609 elem/sec.), map size: 6506
* Google DenseMap: Elapsed: 0.373 (267838665.098 elem/sec.), map size: 6506
* Abseil HashMap: Elapsed: 0.325 (308124728.989 elem/sec.), map size: 6506
* Abseil HashMap with CH Hash: Elapsed: 0.354 (282167144.801 elem/sec.), map size: 6506
* std::unordered_map: Elapsed: 0.390 (256573354.171 elem/sec.), map size: 6506
*
* File hits_100m_obfuscated/201307_1_96_4/TraficSourceID.bin
* CH HashMap: Elapsed: 0.274 (365196673.729 elem/sec.), map size: 10
* Google DenseMap: Elapsed: 0.782 (127845746.927 elem/sec.), map size: 1565609 /// Broken because there is 0 key in dataset
* Abseil HashMap: Elapsed: 0.303 (330461565.053 elem/sec.), map size: 10
* std::unordered_map: Elapsed: 0.843 (118596530.649 elem/sec.), map size: 10
* CH HashMap: Elapsed: 0.246 (406714566.282 elem/sec.), map size: 10
* Google DenseMap: Elapsed: 0.760 (131615151.233 elem/sec.), map size: 1565609 /// Broken because there is 0 key in dataset
* Abseil HashMap: Elapsed: 0.309 (324068156.680 elem/sec.), map size: 10
* Abseil HashMap with CH Hash: Elapsed: 0.339 (295108223.814 elem/sec.), map size: 10
* std::unordered_map: Elapsed: 0.811 (123304031.195 elem/sec.), map size: 10
*
* File hits_100m_obfuscated/201307_1_96_4/AdvEngineID.bin
* CH HashMap: Elapsed: 0.160 (623399865.019 elem/sec.), map size: 19
* Google DenseMap: Elapsed: 1.673 (59757144.027 elem/sec.), map size: 32260732 /// Broken because there is 0 key in dataset
* Abseil HashMap: Elapsed: 0.297 (336589258.845 elem/sec.), map size: 19
* std::unordered_map: Elapsed: 0.332 (301114451.384 elem/sec.), map size: 19
* CH HashMap: Elapsed: 0.155 (643245257.748 elem/sec.), map size: 19
* Google DenseMap: Elapsed: 1.629 (61395025.417 elem/sec.), map size: 32260732 /// Broken because there is 0 key in dataset
* Abseil HashMap: Elapsed: 0.292 (342765027.204 elem/sec.), map size: 19
* Abseil HashMap with CH Hash: Elapsed: 0.330 (302822020.210 elem/sec.), map size: 19
* std::unordered_map: Elapsed: 0.308 (325059333.730 elem/sec.), map size: 19
*
*
* Results of this benchmark on hits_100m_obfuscated AARCH64
*
* File hits_100m_obfuscated/201307_1_96_4/WatchID.bin
* CH HashMap: Elapsed: 9.530 (10493528.533 elem/sec.), map size: 99997493
* Google DenseMap: Elapsed: 14.436 (6927091.135 elem/sec.), map size: 99997493
* Abseil HashMap: Elapsed: 16.671 (5998504.085 elem/sec.), map size: 99997493
* Abseil HashMap with CH Hash: Elapsed: 16.803 (5951365.711 elem/sec.), map size: 99997493
* std::unordered_map: Elapsed: 50.805 (1968305.658 elem/sec.), map size: 99997493
*
* File hits_100m_obfuscated/201307_1_96_4/URLHash.bin
* CH HashMap: Elapsed: 3.693 (27076878.092 elem/sec.), map size: 20714865
* Google DenseMap: Elapsed: 5.051 (19796401.694 elem/sec.), map size: 20714865
* Abseil HashMap: Elapsed: 5.617 (17804528.625 elem/sec.), map size: 20714865
* Abseil HashMap with CH Hash: Elapsed: 5.702 (17537013.639 elem/sec.), map size: 20714865
* std::unordered_map: Elapsed: 10.757 (9296040.953 elem/sec.), map size: 2071486
*
* File hits_100m_obfuscated/201307_1_96_4/UserID.bin
* CH HashMap: Elapsed: 2.982 (33535795.695 elem/sec.), map size: 17630976
* Google DenseMap: Elapsed: 3.940 (25381557.959 elem/sec.), map size: 17630976
* Abseil HashMap: Elapsed: 4.493 (22259078.458 elem/sec.), map size: 17630976
* Abseil HashMap with CH Hash: Elapsed: 4.596 (21759738.710 elem/sec.), map size: 17630976
* std::unordered_map: Elapsed: 9.035 (11067903.596 elem/sec.), map size: 17630976
*
* File hits_100m_obfuscated/201307_1_96_4/RegionID.bin
* CH HashMap: Elapsed: 0.302 (331026285.361 elem/sec.), map size: 9040
* Google DenseMap: Elapsed: 0.623 (160419421.840 elem/sec.), map size: 9046
* Abseil HashMap: Elapsed: 0.981 (101971186.758 elem/sec.), map size: 9040
* Abseil HashMap with CH Hash: Elapsed: 0.991 (100932993.199 elem/sec.), map size: 9040
* std::unordered_map: Elapsed: 0.809 (123541402.715 elem/sec.), map size: 9040
*
* File hits_100m_obfuscated/201307_1_96_4/CounterID.bin
* CH HashMap: Elapsed: 0.343 (291821742.078 elem/sec.), map size: 6506
* Google DenseMap: Elapsed: 0.718 (139191105.450 elem/sec.), map size: 6506
* Abseil HashMap: Elapsed: 1.019 (98148285.278 elem/sec.), map size: 6506
* Abseil HashMap with CH Hash: Elapsed: 1.048 (95446843.667 elem/sec.), map size: 6506
* std::unordered_map: Elapsed: 0.701 (142701070.085 elem/sec.), map size: 6506
*
* File hits_100m_obfuscated/201307_1_96_4/TraficSourceID.bin
* CH HashMap: Elapsed: 0.376 (265905243.103 elem/sec.), map size: 10
* Google DenseMap: Elapsed: 1.309 (76420707.298 elem/sec.), map size: 1565609 /// Broken because there is 0 key in dataset
* Abseil HashMap: Elapsed: 0.955 (104668109.775 elem/sec.), map size: 10
* Abseil HashMap with CH Hash: Elapsed: 0.967 (103456305.391 elem/sec.), map size: 10
* std::unordered_map: Elapsed: 1.241 (80591305.890 elem/sec.), map size: 10
*
* File hits_100m_obfuscated/201307_1_96_4/AdvEngineID.bin
* CH HashMap: Elapsed: 0.213 (470208130.105 elem/sec.), map size: 19
* Google DenseMap: Elapsed: 2.525 (39607131.523 elem/sec.), map size: 32260732 /// Broken because there is 0 key in dataset
* Abseil HashMap: Elapsed: 0.950 (105233678.618 elem/sec.), map size: 19
* Abseil HashMap with CH Hash: Elapsed: 0.962 (104001230.717 elem/sec.), map size: 19
* std::unordered_map: Elapsed: 0.585 (171059989.837 elem/sec.), map size: 19
*/
int main(int argc, char ** argv)

View File

@ -81,10 +81,41 @@ struct NetworkInterfaces
bool isLocalAddress(const Poco::Net::IPAddress & address)
{
/** 127.0.0.1 is treated as a local address unconditionally.
* ::1 is also treated as a local address unconditionally.
*
* 127.0.0.{2..255} are not treated as local addresses, because they are used in tests
* to emulate distributed queries across localhost.
*
* But 127.{0,1}.{0,1}.{0,1} are treated as local addresses,
* because they are used in Debian for localhost.
*/
if (address.isLoopback())
{
if (address.family() == Poco::Net::AddressFamily::IPv4)
{
/// The address is located in memory in big endian form (network byte order).
const unsigned char * digits = static_cast<const unsigned char *>(address.addr());
if (digits[0] == 127
&& digits[1] <= 1
&& digits[2] <= 1
&& digits[3] <= 1)
{
return true;
}
}
else if (address.family() == Poco::Net::AddressFamily::IPv6)
{
return true;
}
}
NetworkInterfaces interfaces;
return interfaces.hasAddress(address);
}
bool isLocalAddress(const Poco::Net::SocketAddress & address, UInt16 clickhouse_port)
{
return clickhouse_port == address.port() && isLocalAddress(address.host());
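The IPv4 part of the loopback check above operates on the four address bytes in network byte order. Isolated from Poco, the predicate can be sketched like this; `isUnconditionallyLocalIPv4` is a hypothetical name, and the caller is assumed to have already established that the address is an IPv4 loopback address.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: true for 127.{0,1}.{0,1}.{0,1}.
// `octets` holds the address in network byte order, so octets[0] is the leading 127.
inline bool isUnconditionallyLocalIPv4(const std::array<std::uint8_t, 4> & octets)
{
    return octets[0] == 127
        && octets[1] <= 1
        && octets[2] <= 1
        && octets[3] <= 1;
}
```

Addresses such as 127.0.0.2 fail this predicate and fall through to the network interface lookup, which matches the intent stated in the comment about tests that emulate distributed queries across localhost.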

View File

@ -28,15 +28,27 @@ std::pair<std::string, UInt16> parseAddress(const std::string & str, UInt16 defa
throw Exception("Illegal address passed to function parseAddress: "
"the address begins with opening square bracket, but no closing square bracket found", ErrorCodes::BAD_ARGUMENTS);
port = find_first_symbols<':'>(closing_square_bracket + 1, end);
port = closing_square_bracket + 1;
}
else
port = find_first_symbols<':'>(begin, end);
if (port != end)
{
UInt16 port_number = parse<UInt16>(port + 1);
return { std::string(begin, port), port_number };
if (*port != ':')
throw Exception(ErrorCodes::BAD_ARGUMENTS,
"Illegal port prefix passed to function parseAddress: {}", port);
++port;
UInt16 port_number;
ReadBufferFromMemory port_buf(port, end - port);
if (!tryReadText<UInt16>(port_number, port_buf) || !port_buf.eof())
{
throw Exception(ErrorCodes::BAD_ARGUMENTS,
"Illegal port passed to function parseAddress: {}", port);
}
return { std::string(begin, port - 1), port_number };
}
else if (default_port)
{
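The stricter port handling in parseAddress (the character after a bracketed host must be `:`, and the port must be a complete UInt16 with no trailing characters) can be sketched with the standard library alone. This is an illustration under assumptions: `splitHostAndPort` is a hypothetical function, it reports failure through `std::optional` instead of throwing, and it has no default-port fallback, so an address without a port is treated as a failure here.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string>
#include <system_error>
#include <utility>

// Hypothetical helper: splits "host:port" or "[ipv6]:port".
// As in the diff, the brackets stay part of the returned host.
inline std::optional<std::pair<std::string, std::uint16_t>> splitHostAndPort(const std::string & str)
{
    std::size_t host_end = std::string::npos;

    if (!str.empty() && str.front() == '[')
    {
        const std::size_t closing = str.find(']');
        if (closing == std::string::npos)
            return std::nullopt;          // opening bracket without a closing one
        host_end = closing + 1;
    }
    else
    {
        host_end = str.find(':');
    }

    // No port part at all: this sketch gives up instead of using a default port.
    if (host_end == std::string::npos || host_end >= str.size())
        return std::nullopt;

    // The host must be followed directly by ':'.
    if (str[host_end] != ':')
        return std::nullopt;

    const char * first = str.data() + host_end + 1;
    const char * last = str.data() + str.size();

    std::uint16_t port = 0;
    const auto result = std::from_chars(first, last, port);

    // Reject empty, non-numeric and out-of-range ports, and any trailing garbage.
    if (result.ec != std::errc() || result.ptr != last)
        return std::nullopt;

    return std::make_pair(str.substr(0, host_end), port);
}
```

Requiring `result.ptr == last` plays the same role as the `port_buf.eof()` check in the diff: input such as `localhost:9000abc` is rejected rather than silently truncated.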

View File

@ -11,9 +11,30 @@ TEST(LocalAddress, SmokeTest)
std::string address_str;
DB::readString(address_str, cmd->out);
cmd->wait();
std::cerr << "Got Address:" << address_str << std::endl;
std::cerr << "Got Address: " << address_str << std::endl;
Poco::Net::IPAddress address(address_str);
EXPECT_TRUE(DB::isLocalAddress(address));
}
TEST(LocalAddress, Localhost)
{
EXPECT_TRUE(DB::isLocalAddress(Poco::Net::IPAddress{"127.0.0.1"}));
EXPECT_TRUE(DB::isLocalAddress(Poco::Net::IPAddress{"127.0.1.1"}));
EXPECT_TRUE(DB::isLocalAddress(Poco::Net::IPAddress{"127.1.1.1"}));
EXPECT_TRUE(DB::isLocalAddress(Poco::Net::IPAddress{"127.1.0.1"}));
EXPECT_TRUE(DB::isLocalAddress(Poco::Net::IPAddress{"127.1.0.0"}));
EXPECT_TRUE(DB::isLocalAddress(Poco::Net::IPAddress{"::1"}));
/// Make sure we don't mess with the byte order.
EXPECT_FALSE(DB::isLocalAddress(Poco::Net::IPAddress{"1.0.0.127"}));
EXPECT_FALSE(DB::isLocalAddress(Poco::Net::IPAddress{"1.1.1.127"}));
EXPECT_FALSE(DB::isLocalAddress(Poco::Net::IPAddress{"0.0.0.0"}));
EXPECT_FALSE(DB::isLocalAddress(Poco::Net::IPAddress{"::"}));
EXPECT_FALSE(DB::isLocalAddress(Poco::Net::IPAddress{"::2"}));
/// See the comment in the implementation of isLocalAddress.
EXPECT_FALSE(DB::isLocalAddress(Poco::Net::IPAddress{"127.0.0.2"}));
}

View File

@ -721,6 +721,9 @@ private:
#undef DBMS_MIN_FIELD_SIZE
using Row = std::vector<Field>;
template <> struct Field::TypeToEnum<Null> { static const Types::Which value = Types::Null; };
template <> struct Field::TypeToEnum<UInt64> { static const Types::Which value = Types::UInt64; };
template <> struct Field::TypeToEnum<UInt128> { static const Types::Which value = Types::UInt128; };

View File

@ -452,7 +452,7 @@ namespace MySQLReplication
UInt32 number_columns;
String schema;
String table;
std::vector<Field> rows;
Row rows;
RowsEvent(std::shared_ptr<TableMapEvent> table_map_, EventHeader && header_, const RowsEventHeader & rows_header)
: EventBase(std::move(header_)), number_columns(0), table_map(table_map_)

View File

@ -1,18 +0,0 @@
#pragma once
#include <vector>
#include <Common/AutoArray.h>
#include <Core/Field.h>
namespace DB
{
/** The data type for representing one row of the table in the RAM.
* Warning! It is preferable to store column blocks instead of single rows. See Block.h
*/
using Row = AutoArray<Field>;
}

View File

@ -1,6 +1,5 @@
#pragma once
#include <Core/Row.h>
#include <Core/SortDescription.h>
#include <Core/SortCursor.h>

View File

@ -193,7 +193,7 @@ void PostgreSQLBlockInputStream::insertValue(IColumn & column, std::string_view
size_t dimension = 0, max_dimension = 0, expected_dimensions = array_info[idx].num_dimensions;
const auto parse_value = array_info[idx].pqxx_parser;
std::vector<std::vector<Field>> dimensions(expected_dimensions + 1);
std::vector<Row> dimensions(expected_dimensions + 1);
while (parsed.first != pqxx::array_parser::juncture::done)
{

View File

@ -20,6 +20,7 @@
# include <Parsers/parseQuery.h>
# include <Parsers/queryToString.h>
# include <Storages/StorageMySQL.h>
# include <Storages/MySQL/MySQLSettings.h>
# include <Common/escapeForFileName.h>
# include <Common/parseAddress.h>
# include <Common/setThreadName.h>
@ -253,12 +254,13 @@ void DatabaseConnectionMySQL::fetchLatestTablesStructureIntoCache(
std::move(mysql_pool),
database_name_in_mysql,
table_name,
false,
"",
/* replace_query_ */ false,
/* on_duplicate_clause = */ "",
ColumnsDescription{columns_name_and_type},
ConstraintsDescription{},
String{},
getContext()));
getContext(),
MySQLSettings{}));
}
}

View File

@ -477,7 +477,7 @@ static inline void fillSignAndVersionColumnsData(Block & data, Int8 sign_value,
template <bool assert_nullable = false>
static void writeFieldsToColumn(
IColumn & column_to, const std::vector<Field> & rows_data, size_t column_index, const std::vector<bool> & mask, ColumnUInt8 * null_map_column = nullptr)
IColumn & column_to, const Row & rows_data, size_t column_index, const std::vector<bool> & mask, ColumnUInt8 * null_map_column = nullptr)
{
if (ColumnNullable * column_nullable = typeid_cast<ColumnNullable *>(&column_to))
writeFieldsToColumn<true>(column_nullable->getNestedColumn(), rows_data, column_index, mask, &column_nullable->getNullMapColumn());
@ -599,7 +599,7 @@ static void writeFieldsToColumn(
}
template <Int8 sign>
static size_t onWriteOrDeleteData(const std::vector<Field> & rows_data, Block & buffer, size_t version)
static size_t onWriteOrDeleteData(const Row & rows_data, Block & buffer, size_t version)
{
size_t prev_bytes = buffer.bytes();
for (size_t column = 0; column < buffer.columns() - 2; ++column)
@ -623,7 +623,7 @@ static inline bool differenceSortingKeys(const Tuple & row_old_data, const Tuple
return false;
}
static inline size_t onUpdateData(const std::vector<Field> & rows_data, Block & buffer, size_t version, const std::vector<size_t> & sorting_columns_index)
static inline size_t onUpdateData(const Row & rows_data, Block & buffer, size_t version, const std::vector<size_t> & sorting_columns_index)
{
if (rows_data.size() % 2 != 0)
throw Exception("LOGICAL ERROR: It is a bug.", ErrorCodes::LOGICAL_ERROR);

View File

@ -1,6 +1,5 @@
#include <Common/quoteString.h>
#include <Common/typeid_cast.h>
#include <Core/Row.h>
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionsMiscellaneous.h>

View File

@ -34,7 +34,7 @@ public:
FillColumnDescription & getFillDescription(size_t ind) { return description[ind].fill_description; }
private:
std::vector<Field> row;
Row row;
SortDescription description;
};

View File

@ -1,7 +1,6 @@
#include <optional>
#include <Core/Field.h>
#include <Core/Row.h>
#include <Columns/ColumnsNumber.h>
#include <Columns/ColumnTuple.h>

View File

@ -12,7 +12,6 @@
#define DBMS_HASH_MAP_COUNT_COLLISIONS
*/
#include <common/types.h>
#include <Core/Row.h>
#include <IO/ReadBufferFromFile.h>
#include <Compression/CompressedReadBuffer.h>
#include <Common/HashTable/HashMap.h>
@ -27,19 +26,6 @@
* This is important, because if you run all the tests one by one, the results will be incorrect.
* (Due to the peculiarities of the work of the allocator, the first test takes advantage.)
*
* Depending on USE_AUTO_ARRAY, one of the structures is selected as the value.
* USE_AUTO_ARRAY = 0 - uses std::vector (hard-copy structure, sizeof = 24 bytes).
* USE_AUTO_ARRAY = 1 - uses AutoArray (a structure specially designed for such cases, sizeof = 8 bytes).
*
* That is, the test also allows you to compare AutoArray and std::vector.
*
* If USE_AUTO_ARRAY = 0, then HashMap confidently overtakes all.
* If USE_AUTO_ARRAY = 1, then HashMap is slightly less serious (20%) ahead of google::dense_hash_map.
*
* When using HashMap, AutoArray has a rather serious (40%) advantage over std::vector.
* And when using other hash tables, AutoArray even more seriously overtakes std::vector
* (up to three and a half times in the case of std::unordered_map and google::sparse_hash_map).
*
* HashMap, unlike google::dense_hash_map, much more depends on the quality of the hash function.
*
* PS. Measure everything yourself, otherwise I'm almost confused.
@ -49,9 +35,6 @@
* But in this test, there was something similar to the old scenario of using hash tables in the aggregation.
*/
#define USE_AUTO_ARRAY 0
struct AlternativeHash
{
size_t operator() (UInt64 x) const
@ -85,12 +68,7 @@ int main(int argc, char ** argv)
using namespace DB;
using Key = UInt64;
#if USE_AUTO_ARRAY
using Value = AutoArray<IAggregateFunction*>;
#else
using Value = std::vector<IAggregateFunction*>;
#endif
size_t n = argc < 2 ? 10000000 : std::stol(argv[1]);
//size_t m = std::stol(argv[2]);
@ -119,13 +97,8 @@ int main(int argc, char ** argv)
INIT
#ifndef USE_AUTO_ARRAY
#undef INIT
#define INIT
#endif
Row row(1);
row[0] = UInt64(0);
std::cerr << "sizeof(Key) = " << sizeof(Key) << ", sizeof(Value) = " << sizeof(Value) << std::endl;

View File

@ -1,57 +1,57 @@
#include <iostream>
#include <llvm/IR/IRBuilder.h>
// #include <llvm/IR/IRBuilder.h>
#include <Interpreters/JIT/CHJIT.h>
// #include <Interpreters/JIT/CHJIT.h>
void test_function()
{
std::cerr << "Test function" << std::endl;
}
// void test_function()
// {
// std::cerr << "Test function" << std::endl;
// }
int main(int argc, char **argv)
{
(void)(argc);
(void)(argv);
auto jit = DB::CHJIT();
// auto jit = DB::CHJIT();
jit.registerExternalSymbol("test_function", reinterpret_cast<void *>(&test_function));
// jit.registerExternalSymbol("test_function", reinterpret_cast<void *>(&test_function));
auto compiled_module_info = jit.compileModule([](llvm::Module & module)
{
auto & context = module.getContext();
llvm::IRBuilder<> b (context);
// auto compiled_module_info = jit.compileModule([](llvm::Module & module)
// {
// auto & context = module.getContext();
// llvm::IRBuilder<> b (context);
auto * func_declaration_type = llvm::FunctionType::get(b.getVoidTy(), { }, /*isVarArg=*/false);
auto * func_declaration = llvm::Function::Create(func_declaration_type, llvm::Function::ExternalLinkage, "test_function", module);
// auto * func_declaration_type = llvm::FunctionType::get(b.getVoidTy(), { }, /*isVarArg=*/false);
// auto * func_declaration = llvm::Function::Create(func_declaration_type, llvm::Function::ExternalLinkage, "test_function", module);
auto * value_type = b.getInt64Ty();
auto * pointer_type = value_type->getPointerTo();
// auto * value_type = b.getInt64Ty();
// auto * pointer_type = value_type->getPointerTo();
auto * func_type = llvm::FunctionType::get(b.getVoidTy(), { pointer_type }, /*isVarArg=*/false);
auto * function = llvm::Function::Create(func_type, llvm::Function::ExternalLinkage, "test_name", module);
auto * entry = llvm::BasicBlock::Create(context, "entry", function);
// auto * func_type = llvm::FunctionType::get(b.getVoidTy(), { pointer_type }, /*isVarArg=*/false);
// auto * function = llvm::Function::Create(func_type, llvm::Function::ExternalLinkage, "test_name", module);
// auto * entry = llvm::BasicBlock::Create(context, "entry", function);
auto * argument = function->args().begin();
b.SetInsertPoint(entry);
// auto * argument = function->args().begin();
// b.SetInsertPoint(entry);
b.CreateCall(func_declaration);
// b.CreateCall(func_declaration);
auto * load_argument = b.CreateLoad(argument);
auto * value = b.CreateAdd(load_argument, load_argument);
b.CreateRet(value);
});
// auto * load_argument = b.CreateLoad(argument);
// auto * value = b.CreateAdd(load_argument, load_argument);
// b.CreateRet(value);
// });
for (const auto & compiled_function_name : compiled_module_info.compiled_functions)
{
std::cerr << compiled_function_name << std::endl;
}
// for (const auto & compiled_function_name : compiled_module_info.compiled_functions)
// {
// std::cerr << compiled_function_name << std::endl;
// }
int64_t value = 5;
auto * test_name_function = reinterpret_cast<int64_t (*)(int64_t *)>(jit.findCompiledFunction(compiled_module_info, "test_name"));
auto result = test_name_function(&value);
std::cerr << "Result " << result << std::endl;
// int64_t value = 5;
// auto * test_name_function = reinterpret_cast<int64_t (*)(int64_t *)>(jit.findCompiledFunction(compiled_module_info, "test_name"));
// auto result = test_name_function(&value);
// std::cerr << "Result " << result << std::endl;
return 0;
}

View File

@ -95,7 +95,7 @@ void ASTSelectQuery::formatImpl(const FormatSettings & s, FormatState & state, F
if (tables())
{
s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << indent_str << "FROM " << (s.hilite ? hilite_none : "");
s.ostr << (s.hilite ? hilite_keyword : "") << s.nl_or_ws << indent_str << "FROM" << (s.hilite ? hilite_none : "");
tables()->formatImpl(s, state, frame);
}

View File

@ -35,24 +35,24 @@ void ASTSelectWithUnionQuery::formatQueryImpl(const FormatSettings & settings, F
if (mode == Mode::Unspecified)
return "";
else if (mode == Mode::ALL)
return "ALL";
return " ALL";
else
return "DISTINCT";
return " DISTINCT";
};
for (ASTs::const_iterator it = list_of_selects->children.begin(); it != list_of_selects->children.end(); ++it)
{
if (it != list_of_selects->children.begin())
settings.ostr << settings.nl_or_ws << indent_str << (settings.hilite ? hilite_keyword : "") << "UNION "
settings.ostr << settings.nl_or_ws << indent_str << (settings.hilite ? hilite_keyword : "") << "UNION"
<< mode_to_str((is_normalized) ? union_mode : list_of_modes[it - list_of_selects->children.begin() - 1])
<< (settings.hilite ? hilite_none : "");
if (auto * node = (*it)->as<ASTSelectWithUnionQuery>())
{
settings.ostr << settings.nl_or_ws << indent_str;
if (node->list_of_selects->children.size() == 1)
{
if (it != list_of_selects->children.begin())
settings.ostr << settings.nl_or_ws;
(node->list_of_selects->children.at(0))->formatImpl(settings, state, frame);
}
else

View File

@ -29,6 +29,11 @@ void ASTSubquery::appendColumnNameImpl(WriteBuffer & ostr) const
void ASTSubquery::formatImplWithoutAlias(const FormatSettings & settings, FormatState & state, FormatStateStacked frame) const
{
/// NOTE: due to trickery of filling cte_name (in interpreters) it is hard
/// to print it w/o newline (for !oneline case), since if nl_or_ws
/// prepended here, then formatting will be incorrect with alias:
///
/// (select 1 in ((select 1) as sub))
if (!cte_name.empty())
{
settings.ostr << (settings.hilite ? hilite_identifier : "");
@ -40,7 +45,7 @@ void ASTSubquery::formatImplWithoutAlias(const FormatSettings & settings, Format
std::string indent_str = settings.one_line ? "" : std::string(4u * frame.indent, ' ');
std::string nl_or_nothing = settings.one_line ? "" : "\n";
settings.ostr << nl_or_nothing << indent_str << "(" << nl_or_nothing;
settings.ostr << "(" << nl_or_nothing;
FormatStateStacked frame_nested = frame;
frame_nested.need_parens = false;
++frame_nested.indent;

View File

@ -109,14 +109,17 @@ void ASTTableExpression::formatImpl(const FormatSettings & settings, FormatState
if (database_and_table_name)
{
settings.ostr << " ";
database_and_table_name->formatImpl(settings, state, frame);
}
else if (table_function)
{
settings.ostr << " ";
table_function->formatImpl(settings, state, frame);
}
else if (subquery)
{
settings.ostr << settings.nl_or_ws << indent_str;
subquery->formatImpl(settings, state, frame);
}
@ -142,9 +145,15 @@ void ASTTableExpression::formatImpl(const FormatSettings & settings, FormatState
}
void ASTTableJoin::formatImplBeforeTable(const FormatSettings & settings, FormatState &, FormatStateStacked) const
void ASTTableJoin::formatImplBeforeTable(const FormatSettings & settings, FormatState &, FormatStateStacked frame) const
{
settings.ostr << (settings.hilite ? hilite_keyword : "");
std::string indent_str = settings.one_line ? "" : std::string(4 * frame.indent, ' ');
if (kind != Kind::Comma)
{
settings.ostr << settings.nl_or_ws << indent_str;
}
switch (locality)
{
@ -241,6 +250,7 @@ void ASTArrayJoin::formatImpl(const FormatSettings & settings, FormatState & sta
frame.expression_list_prepend_whitespace = true;
settings.ostr << (settings.hilite ? hilite_keyword : "")
<< settings.nl_or_ws
<< (kind == Kind::Left ? "LEFT " : "") << "ARRAY JOIN" << (settings.hilite ? hilite_none : "");
settings.one_line
@ -254,10 +264,7 @@ void ASTTablesInSelectQueryElement::formatImpl(const FormatSettings & settings,
if (table_expression)
{
if (table_join)
{
table_join->as<ASTTableJoin &>().formatImplBeforeTable(settings, state, frame);
settings.ostr << " ";
}
table_expression->formatImpl(settings, state, frame);
@ -275,13 +282,8 @@ void ASTTablesInSelectQuery::formatImpl(const FormatSettings & settings, FormatS
{
std::string indent_str = settings.one_line ? "" : std::string(4 * frame.indent, ' ');
for (ASTs::const_iterator it = children.begin(); it != children.end(); ++it)
{
if (it != children.begin())
settings.ostr << settings.nl_or_ws << indent_str;
(*it)->formatImpl(settings, state, frame);
}
for (const auto & child : children)
child->formatImpl(settings, state, frame);
}
}

View File

@ -16,8 +16,11 @@ ASTPtr ASTWithElement::clone() const
void ASTWithElement::formatImpl(const FormatSettings & settings, FormatState & state, FormatStateStacked frame) const
{
std::string indent_str = settings.one_line ? "" : std::string(4 * frame.indent, ' ');
settings.writeIdentifier(name);
settings.ostr << (settings.hilite ? hilite_keyword : "") << " AS " << (settings.hilite ? hilite_none : "");
settings.ostr << (settings.hilite ? hilite_keyword : "") << " AS" << (settings.hilite ? hilite_none : "");
settings.ostr << settings.nl_or_ws << indent_str;
dynamic_cast<const ASTWithAlias &>(*subquery).formatImplWithoutAlias(settings, state, frame);
}

View File

@ -344,7 +344,7 @@ AvroDeserializer::DeserializeFn AvroDeserializer::createDeserializeFn(avro::Node
if (target.isEnum())
{
const auto & enum_type = dynamic_cast<const IDataTypeEnum &>(*target_type);
std::vector<Field> symbol_mapping;
Row symbol_mapping;
for (size_t i = 0; i < root_node->names(); i++)
{
symbol_mapping.push_back(enum_type.castToValue(root_node->nameAt(i)));

View File

@ -12,6 +12,7 @@
#include <DataTypes/NestedUtils.h>
#include <IO/WriteHelpers.h>
namespace DB
{
@ -109,6 +110,9 @@ static bool isInPartitionKey(const std::string & column_name, const Names & part
return is_in_partition_key != partition_key_columns.end();
}
using Row = std::vector<Field>;
/// Returns true if merge result is not empty
static bool mergeMap(const SummingSortedAlgorithm::MapDescription & desc,
Row & row, const ColumnRawPtrs & raw_columns, size_t row_number)

View File

@ -2,7 +2,7 @@
#include <Processors/Merges/Algorithms/IMergingAlgorithmWithDelayedChunk.h>
#include <Processors/Merges/Algorithms/MergedData.h>
#include <Core/Row.h>
namespace DB
{

View File

@ -68,9 +68,6 @@ void readHeaders(
if (in.eof())
throw Poco::Net::MessageException("Field is invalid");
if (value.empty())
throw Poco::Net::MessageException("Field value is empty");
if (ch == '\n')
throw Poco::Net::MessageException("No CRLF found");

View File

@ -2,7 +2,6 @@
#include <DataStreams/IBlockInputStream.h>
#include <Core/Row.h>
#include <Core/Block.h>
#include <common/types.h>
#include <Core/NamesAndTypes.h>

View File

@ -211,9 +211,12 @@ namespace
virtual void insertStringColumn(const ColumnPtr & column, const String & name) = 0;
virtual void insertUInt64Column(const ColumnPtr & column, const String & name) = 0;
virtual void insertUUIDColumn(const ColumnPtr & column, const String & name) = 0;
virtual void
insertPartitionValueColumn(size_t rows, const Row & partition_value, const DataTypePtr & partition_value_type, const String & name)
= 0;
virtual void insertPartitionValueColumn(
size_t rows,
const Row & partition_value,
const DataTypePtr & partition_value_type,
const String & name) = 0;
};
}
@ -358,8 +361,8 @@ namespace
columns.push_back(column);
}
void
insertPartitionValueColumn(size_t rows, const Row & partition_value, const DataTypePtr & partition_value_type, const String &) final
void insertPartitionValueColumn(
size_t rows, const Row & partition_value, const DataTypePtr & partition_value_type, const String &) final
{
ColumnPtr column;
if (rows)

View File

@ -79,7 +79,7 @@ void MergeTreeDataPartInMemory::flushToDisk(const String & base_path, const Stri
new_data_part->uuid = uuid;
new_data_part->setColumns(columns);
new_data_part->partition.value.assign(partition.value);
new_data_part->partition.value = partition.value;
new_data_part->minmax_idx = minmax_idx;
if (disk->exists(destination_path))

View File

@ -150,7 +150,7 @@ BlocksWithPartition MergeTreeDataWriter::splitBlockIntoParts(const Block & block
if (!metadata_snapshot->hasPartitionKey()) /// Table is not partitioned.
{
result.emplace_back(Block(block), Row());
result.emplace_back(Block(block), Row{});
return result;
}

View File

@ -1,7 +1,6 @@
#pragma once
#include <Core/Block.h>
#include <Core/Row.h>
#include <IO/WriteBufferFromFile.h>
#include <Compression/CompressedWriteBuffer.h>

View File

@ -1,9 +1,10 @@
#pragma once
#include <Core/Row.h>
#include <common/types.h>
#include <Disks/IDisk.h>
#include <IO/WriteBuffer.h>
#include <Core/Field.h>
namespace DB
{
@ -38,7 +39,7 @@ public:
void store(const MergeTreeData & storage, const DiskPtr & disk, const String & part_path, MergeTreeDataPartChecksums & checksums) const;
void store(const Block & partition_key_sample, const DiskPtr & disk, const String & part_path, MergeTreeDataPartChecksums & checksums) const;
void assign(const MergeTreePartition & other) { value.assign(other.value); }
void assign(const MergeTreePartition & other) { value = other.value; }
void create(const StorageMetadataPtr & metadata_snapshot, Block block, size_t row);
};

View File

@ -0,0 +1,42 @@
#include <Storages/MySQL/MySQLSettings.h>
#include <Parsers/ASTCreateQuery.h>
#include <Parsers/ASTSetQuery.h>
#include <Parsers/ASTFunction.h>
#include <Common/Exception.h>
namespace DB
{
namespace ErrorCodes
{
extern const int UNKNOWN_SETTING;
}
IMPLEMENT_SETTINGS_TRAITS(MySQLSettingsTraits, LIST_OF_MYSQL_SETTINGS)
void MySQLSettings::loadFromQuery(ASTStorage & storage_def)
{
if (storage_def.settings)
{
try
{
applyChanges(storage_def.settings->changes);
}
catch (Exception & e)
{
if (e.code() == ErrorCodes::UNKNOWN_SETTING)
e.addMessage("for storage " + storage_def.engine->name);
throw;
}
}
else
{
auto settings_ast = std::make_shared<ASTSetQuery>();
settings_ast->is_standalone = false;
storage_def.set(storage_def.settings, settings_ast);
}
}
}

View File

@ -0,0 +1,32 @@
#pragma once
#include <Core/Defines.h>
#include <Core/BaseSettings.h>
namespace Poco::Util
{
class AbstractConfiguration;
}
namespace DB
{
class ASTStorage;
#define LIST_OF_MYSQL_SETTINGS(M) \
M(UInt64, connection_pool_size, 16, "Size of connection pool (if all connections are in use, the query will wait until a connection is freed).", 0) \
M(UInt64, connection_max_tries, 3, "Number of retries for pool with failover", 0) \
M(Bool, connection_auto_close, true, "Auto-close connection after query execution, i.e. disable connection reuse.", 0) \
DECLARE_SETTINGS_TRAITS(MySQLSettingsTraits, LIST_OF_MYSQL_SETTINGS)
/** Settings for the MySQL family of engines.
*/
struct MySQLSettings : public BaseSettings<MySQLSettingsTraits>
{
void loadFromQuery(ASTStorage & storage_def);
};
}

View File

@ -12,6 +12,7 @@
#include <Processors/Pipe.h>
#include <Common/parseRemoteDescription.h>
#include <Storages/StorageMySQL.h>
#include <Storages/MySQL/MySQLSettings.h>
#include <Storages/StoragePostgreSQL.h>
#include <Storages/StorageURL.h>
#include <common/logger_useful.h>
@ -79,7 +80,8 @@ StorageExternalDistributed::StorageExternalDistributed(
columns_,
constraints_,
String{},
context);
context,
MySQLSettings{});
break;
}
#endif

View File

@ -15,6 +15,7 @@
#include <IO/Operators.h>
#include <IO/WriteHelpers.h>
#include <Parsers/ASTLiteral.h>
#include <Parsers/ASTCreateQuery.h>
#include <mysqlxx/Transaction.h>
#include <Processors/Sources/SourceFromInputStream.h>
#include <Processors/Pipe.h>
@ -50,13 +51,15 @@ StorageMySQL::StorageMySQL(
const ColumnsDescription & columns_,
const ConstraintsDescription & constraints_,
const String & comment,
ContextPtr context_)
ContextPtr context_,
const MySQLSettings & mysql_settings_)
: IStorage(table_id_)
, WithContext(context_->getGlobalContext())
, remote_database_name(remote_database_name_)
, remote_table_name(remote_table_name_)
, replace_query{replace_query_}
, on_duplicate_clause{on_duplicate_clause_}
, mysql_settings(mysql_settings_)
, pool(std::make_shared<mysqlxx::PoolWithFailover>(pool_))
{
StorageInMemoryMetadata storage_metadata;
@ -98,7 +101,8 @@ Pipe StorageMySQL::read(
}
StreamSettings mysql_input_stream_settings(context_->getSettingsRef(), true, false);
StreamSettings mysql_input_stream_settings(context_->getSettingsRef(),
mysql_settings.connection_auto_close);
return Pipe(std::make_shared<SourceFromInputStream>(
std::make_shared<MySQLWithFailoverBlockInputStream>(pool, query, sample_block, mysql_input_stream_settings)));
}
@ -250,8 +254,22 @@ void registerStorageMySQL(StorageFactory & factory)
const String & password = engine_args[4]->as<ASTLiteral &>().value.safeGet<String>();
size_t max_addresses = args.getContext()->getSettingsRef().glob_expansion_max_elements;
/// TODO: move some arguments from the arguments to the SETTINGS.
MySQLSettings mysql_settings;
if (args.storage_def->settings)
{
mysql_settings.loadFromQuery(*args.storage_def);
}
if (!mysql_settings.connection_pool_size)
throw Exception("connection_pool_size cannot be zero.", ErrorCodes::BAD_ARGUMENTS);
auto addresses = parseRemoteDescriptionForExternalDatabase(host_port, max_addresses, 3306);
mysqlxx::PoolWithFailover pool(remote_database, addresses, username, password);
mysqlxx::PoolWithFailover pool(remote_database, addresses,
username, password,
MYSQLXX_POOL_WITH_FAILOVER_DEFAULT_START_CONNECTIONS,
mysql_settings.connection_pool_size,
mysql_settings.connection_max_tries);
bool replace_query = false;
std::string on_duplicate_clause;
@ -275,9 +293,11 @@ void registerStorageMySQL(StorageFactory & factory)
args.columns,
args.constraints,
args.comment,
args.getContext());
args.getContext(),
mysql_settings);
},
{
.supports_settings = true,
.source_access_type = AccessType::MYSQL,
});
}

View File

@ -9,6 +9,7 @@
#include <ext/shared_ptr_helper.h>
#include <Storages/IStorage.h>
#include <Storages/MySQL/MySQLSettings.h>
#include <mysqlxx/PoolWithFailover.h>
@ -33,7 +34,8 @@ public:
const ColumnsDescription & columns_,
const ConstraintsDescription & constraints_,
const String & comment,
ContextPtr context_);
ContextPtr context_,
const MySQLSettings & mysql_settings_);
std::string getName() const override { return "MySQL"; }
@ -56,6 +58,8 @@ private:
bool replace_query;
std::string on_duplicate_clause;
MySQLSettings mysql_settings;
mysqlxx::PoolWithFailoverPtr pool;
};

File diff suppressed because it is too large

View File

@ -112,6 +112,7 @@ SRCS(
MergeTree/localBackup.cpp
MergeTree/registerStorageMergeTree.cpp
MutationCommands.cpp
MySQL/MySQLSettings.cpp
PartitionCommands.cpp
ProjectionsDescription.cpp
ReadInOrderOptimizer.cpp

View File

@ -15,6 +15,7 @@
#include <Parsers/ASTFunction.h>
#include <Parsers/ASTLiteral.h>
#include <Storages/StorageMySQL.h>
#include <Storages/MySQL/MySQLSettings.h>
#include <TableFunctions/ITableFunction.h>
#include <TableFunctions/TableFunctionFactory.h>
#include <TableFunctions/TableFunctionMySQL.h>
@ -107,7 +108,8 @@ StoragePtr TableFunctionMySQL::executeImpl(
columns,
ConstraintsDescription{},
String{},
context);
context,
MySQLSettings{});
pool.reset();

View File

@ -314,11 +314,12 @@ def colored(text, args, color=None, on_color=None, attrs=None):
SERVER_DIED = False
exit_code = 0
stop_time = None
queue = multiprocessing.Queue(maxsize=1)
# def run_tests_array(all_tests, suite, suite_dir, suite_tmp_dir, run_total):
def run_tests_array(all_tests_with_params):
all_tests, suite, suite_dir, suite_tmp_dir = all_tests_with_params
all_tests, num_tests, suite, suite_dir, suite_tmp_dir = all_tests_with_params
global exit_code
global SERVER_DIED
global stop_time
@ -348,10 +349,21 @@ def run_tests_array(all_tests_with_params):
else:
return ''
if all_tests:
print(f"\nRunning {len(all_tests)} {suite} tests ({multiprocessing.current_process().name}).\n")
if num_tests > 0:
about = 'about ' if is_concurrent else ''
print(f"\nRunning {about}{num_tests} {suite} tests ({multiprocessing.current_process().name}).\n")
while True:
if is_concurrent:
case = queue.get()
if not case:
break
else:
if all_tests:
case = all_tests.pop(0)
else:
break
for case in all_tests:
if SERVER_DIED:
stop_tests()
break
@ -677,6 +689,47 @@ def collect_build_flags(client):
return result
def do_run_tests(jobs, suite, suite_dir, suite_tmp_dir, all_tests, parallel_tests, sequential_tests, parallel):
if jobs > 1 and len(parallel_tests) > 0:
print("Found", len(parallel_tests), "parallel tests and", len(sequential_tests), "sequential tests")
run_n, run_total = parallel.split('/')
run_n = float(run_n)
run_total = float(run_total)
tests_n = len(parallel_tests)
if run_total > tests_n:
run_total = tests_n
if jobs > tests_n:
jobs = tests_n
if jobs > run_total:
run_total = jobs
batch_size = max(1, len(parallel_tests) // jobs)
parallel_tests_array = []
for _ in range(jobs):
parallel_tests_array.append((None, batch_size, suite, suite_dir, suite_tmp_dir))
with closing(multiprocessing.Pool(processes=jobs)) as pool:
pool.map_async(run_tests_array, parallel_tests_array)
for suit in parallel_tests:
queue.put(suit)
for _ in range(jobs):
queue.put(None)
queue.close()
pool.join()
run_tests_array((sequential_tests, len(sequential_tests), suite, suite_dir, suite_tmp_dir))
return len(sequential_tests) + len(parallel_tests)
else:
num_tests = len(all_tests)
run_tests_array((all_tests, num_tests, suite, suite_dir, suite_tmp_dir))
return num_tests
def main(args):
global SERVER_DIED
global stop_time
@ -840,34 +893,8 @@ def main(args):
else:
parallel_tests.append(test)
if jobs > 1 and len(parallel_tests) > 0:
print("Found", len(parallel_tests), "parallel tests and", len(sequential_tests), "sequential tests")
run_n, run_total = args.parallel.split('/')
run_n = float(run_n)
run_total = float(run_total)
tests_n = len(parallel_tests)
if run_total > tests_n:
run_total = tests_n
if jobs > tests_n:
jobs = tests_n
if jobs > run_total:
run_total = jobs
# Create two batches per process for more uniform execution time.
batch_size = max(1, len(parallel_tests) // (jobs * 2))
parallel_tests_array = []
for i in range(0, len(parallel_tests), batch_size):
parallel_tests_array.append((parallel_tests[i:i+batch_size], suite, suite_dir, suite_tmp_dir))
with closing(multiprocessing.Pool(processes=jobs)) as pool:
pool.map(run_tests_array, parallel_tests_array)
run_tests_array((sequential_tests, suite, suite_dir, suite_tmp_dir))
total_tests_run += len(sequential_tests) + len(parallel_tests)
else:
run_tests_array((all_tests, suite, suite_dir, suite_tmp_dir))
total_tests_run += len(all_tests)
total_tests_run += do_run_tests(
jobs, suite, suite_dir, suite_tmp_dir, all_tests, parallel_tests, sequential_tests, args.parallel)
if args.hung_check:
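
The hunks above replace pre-sliced test batches with a bounded queue that workers drain until they receive a `None` sentinel, one sentinel per worker. A minimal sketch of that dispatch pattern follows. It is an illustration only: it uses `threading` and `queue.Queue` in place of the `multiprocessing.Pool` and `multiprocessing.Queue` used in the diff so that it stays self-contained, and the names `worker` and `run_with_queue` are hypothetical.

```python
import queue
import threading


def worker(task_queue, results, lock):
    # Pull cases until the None sentinel arrives, then stop.
    while True:
        case = task_queue.get()
        if case is None:
            break
        with lock:
            results.append(case)  # stand-in for actually running the test


def run_with_queue(cases, jobs):
    # maxsize=1 mirrors the diff: the producer blocks until a worker is free,
    # so faster workers naturally pick up more cases.
    task_queue = queue.Queue(maxsize=1)
    results = []
    lock = threading.Lock()
    workers = [
        threading.Thread(target=worker, args=(task_queue, results, lock))
        for _ in range(jobs)
    ]
    for w in workers:
        w.start()
    for case in cases:
        task_queue.put(case)
    # One sentinel per worker so every worker terminates.
    for _ in range(jobs):
        task_queue.put(None)
    for w in workers:
        w.join()
    return results
```

Because each worker takes the next case only when it is idle, the number of tests per worker is not known in advance, which is presumably why the new banner prints "about" before the test count in the concurrent case.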

View File

@ -126,7 +126,8 @@ class ClickHouseCluster:
"""
def __init__(self, base_path, name=None, base_config_dir=None, server_bin_path=None, client_bin_path=None,
odbc_bridge_bin_path=None, library_bridge_bin_path=None, zookeeper_config_path=None, custom_dockerd_host=None):
odbc_bridge_bin_path=None, library_bridge_bin_path=None, zookeeper_config_path=None,
custom_dockerd_host=None):
for param in list(os.environ.keys()):
print("ENV %40s %s" % (param, os.environ[param]))
self.base_dir = p.dirname(base_path)
@ -219,7 +220,9 @@ class ClickHouseCluster:
with_redis=False, with_minio=False, with_cassandra=False,
hostname=None, env_variables=None, image="yandex/clickhouse-integration-test", tag=None,
stay_alive=False, ipv4_address=None, ipv6_address=None, with_installed_binary=False, tmpfs=None,
zookeeper_docker_compose_path=None, zookeeper_use_tmpfs=True, minio_certs_dir=None, use_keeper=True):
zookeeper_docker_compose_path=None, zookeeper_use_tmpfs=True, minio_certs_dir=None, use_keeper=True,
main_config_name="config.xml", users_config_name="users.xml", copy_common_configs=True):
"""Add an instance to the cluster.
name - the name of the instance directory and the value of the 'instance' macro in ClickHouse.
@ -280,6 +283,9 @@ class ClickHouseCluster:
ipv4_address=ipv4_address,
ipv6_address=ipv6_address,
with_installed_binary=with_installed_binary,
main_config_name=main_config_name,
users_config_name=users_config_name,
copy_common_configs=copy_common_configs,
tmpfs=tmpfs or [])
docker_compose_yml_dir = get_docker_compose_path()
@ -944,7 +950,7 @@ class ClickHouseCluster:
subprocess_check_call(self.base_zookeeper_cmd + ["start", n])
CLICKHOUSE_START_COMMAND = "clickhouse server --config-file=/etc/clickhouse-server/config.xml --log-file=/var/log/clickhouse-server/clickhouse-server.log --errorlog-file=/var/log/clickhouse-server/clickhouse-server.err.log"
CLICKHOUSE_START_COMMAND = "clickhouse server --config-file=/etc/clickhouse-server/{main_config_file} --log-file=/var/log/clickhouse-server/clickhouse-server.log --errorlog-file=/var/log/clickhouse-server/clickhouse-server.err.log"
CLICKHOUSE_STAY_ALIVE_COMMAND = 'bash -c "{} --daemon; tail -f /dev/null"'.format(CLICKHOUSE_START_COMMAND)
@ -1000,6 +1006,8 @@ class ClickHouseInstance:
macros, with_zookeeper, zookeeper_config_path, with_mysql, with_mysql_cluster, with_kafka, with_kerberized_kafka, with_rabbitmq, with_kerberized_hdfs,
with_mongo, with_redis, with_minio,
with_cassandra, server_bin_path, odbc_bridge_bin_path, library_bridge_bin_path, clickhouse_path_dir, with_odbc_drivers,
clickhouse_start_command=CLICKHOUSE_START_COMMAND,
main_config_name="config.xml", users_config_name="users.xml", copy_common_configs=True,
hostname=None, env_variables=None,
image="yandex/clickhouse-integration-test", tag="latest",
stay_alive=False, ipv4_address=None, ipv6_address=None, with_installed_binary=False, tmpfs=None):
@ -1036,6 +1044,12 @@ class ClickHouseInstance:
self.with_minio = with_minio
self.with_cassandra = with_cassandra
self.main_config_name = main_config_name
self.users_config_name = users_config_name
self.copy_common_configs = copy_common_configs
self.clickhouse_start_command = clickhouse_start_command.replace("{main_config_file}", self.main_config_name)
self.path = p.join(self.cluster.instances_dir, name)
self.docker_compose_path = p.join(self.path, 'docker-compose.yml')
self.env_variables = env_variables or {}
@ -1177,7 +1191,7 @@ class ClickHouseInstance:
if not self.stay_alive:
raise Exception("clickhouse can be started again only with stay_alive=True instance")
self.exec_in_container(["bash", "-c", "{} --daemon".format(CLICKHOUSE_START_COMMAND)], user=str(os.getuid()))
self.exec_in_container(["bash", "-c", "{} --daemon".format(self.clickhouse_start_command)], user=str(os.getuid()))
# wait start
from helpers.test_tools import assert_eq_with_retry
assert_eq_with_retry(self, "select 1", "1", retry_count=int(start_wait_sec / 0.5), sleep_time=0.5)
@ -1263,7 +1277,7 @@ class ClickHouseInstance:
self.exec_in_container(["bash", "-c",
"cp /usr/share/clickhouse-odbc-bridge_fresh /usr/bin/clickhouse-odbc-bridge && chmod 777 /usr/bin/clickhouse"],
user='root')
self.exec_in_container(["bash", "-c", "{} --daemon".format(CLICKHOUSE_START_COMMAND)], user=str(os.getuid()))
self.exec_in_container(["bash", "-c", "{} --daemon".format(self.clickhouse_start_command)], user=str(os.getuid()))
from helpers.test_tools import assert_eq_with_retry
# wait start
assert_eq_with_retry(self, "select 1", "1", retry_count=retries)
@ -1404,8 +1418,10 @@ class ClickHouseInstance:
os.makedirs(instance_config_dir)
print("Copy common default production configuration from {}".format(self.base_config_dir))
shutil.copyfile(p.join(self.base_config_dir, 'config.xml'), p.join(instance_config_dir, 'config.xml'))
shutil.copyfile(p.join(self.base_config_dir, 'users.xml'), p.join(instance_config_dir, 'users.xml'))
shutil.copyfile(p.join(self.base_config_dir, self.main_config_name), p.join(instance_config_dir, self.main_config_name))
shutil.copyfile(p.join(self.base_config_dir, self.users_config_name), p.join(instance_config_dir, self.users_config_name))
print("Create directory for configuration generated in this helper")
# used by all utils with any config
@ -1423,7 +1439,9 @@ class ClickHouseInstance:
print("Copy common configuration from helpers")
# The file is named with 0_ prefix to be processed before other configuration overloads.
shutil.copy(p.join(HELPERS_DIR, '0_common_instance_config.xml'), self.config_d_dir)
if self.copy_common_configs:
shutil.copy(p.join(HELPERS_DIR, '0_common_instance_config.xml'), self.config_d_dir)
shutil.copy(p.join(HELPERS_DIR, '0_common_instance_users.xml'), users_d_dir)
if len(self.custom_dictionaries_paths):
shutil.copy(p.join(HELPERS_DIR, '0_common_enable_dictionaries.xml'), self.config_d_dir)
@ -1502,10 +1520,10 @@ class ClickHouseInstance:
self._create_odbc_config_file()
odbc_ini_path = '- ' + self.odbc_ini_path
entrypoint_cmd = CLICKHOUSE_START_COMMAND
entrypoint_cmd = self.clickhouse_start_command
if self.stay_alive:
entrypoint_cmd = CLICKHOUSE_STAY_ALIVE_COMMAND
entrypoint_cmd = CLICKHOUSE_STAY_ALIVE_COMMAND.replace("{main_config_file}", self.main_config_name)
print("Entrypoint cmd: {}".format(entrypoint_cmd))
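
Note on the entrypoint hunk above: the `{main_config_file}` placeholder is filled with `str.replace` rather than `str.format`. The sketch below illustrates why that can be the safer choice for shell command templates. It assumes a hypothetical template string (`STAY_ALIVE_TEMPLATE`) and a hypothetical helper (`build_entrypoint`); the real `CLICKHOUSE_STAY_ALIVE_COMMAND` is defined elsewhere in the helper and is not shown in this hunk. `replace` touches only the named placeholder, so any other braces in the shell command pass through unchanged.

```python
# Hypothetical template for illustration only; the real
# CLICKHOUSE_STAY_ALIVE_COMMAND is not shown in this diff.
STAY_ALIVE_TEMPLATE = (
    "bash -c 'clickhouse server "
    "--config-file=/etc/clickhouse-server/{main_config_file}; "
    "while true; do sleep ${SLEEP_SECONDS:-1}; done'"
)


def build_entrypoint(template: str, main_config_name: str) -> str:
    """Fill only the {main_config_file} placeholder; leave other braces alone."""
    return template.replace("{main_config_file}", main_config_name)


# Usage: select a YAML main config instead of the default XML one.
entrypoint_cmd = build_entrypoint(STAY_ALIVE_TEMPLATE, "config.yaml")
print(entrypoint_cmd)
```

Because the shell expansion `${SLEEP_SECONDS:-1}` also uses braces, calling `str.format` on such a template would try to interpret it as a replacement field, which is one plausible reason to prefer `replace` here.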

View File

@ -70,11 +70,11 @@ def check_args_and_update_paths(args):
if not os.path.exists(path):
raise Exception("Path {} doesn't exist".format(path))
if not os.path.exists(os.path.join(args.base_configs_dir, "config.xml")):
raise Exception("No configs.xml in {}".format(args.base_configs_dir))
if (not os.path.exists(os.path.join(args.base_configs_dir, "config.xml"))) and (not os.path.exists(os.path.join(args.base_configs_dir, "config.yaml"))):
raise Exception("No config.xml or config.yaml in {}".format(args.base_configs_dir))
if not os.path.exists(os.path.join(args.base_configs_dir, "users.xml")):
raise Exception("No users.xml in {}".format(args.base_configs_dir))
if (not os.path.exists(os.path.join(args.base_configs_dir, "users.xml"))) and (not os.path.exists(os.path.join(args.base_configs_dir, "users.yaml"))):
raise Exception("No users.xml or users.yaml in {}".format(args.base_configs_dir))
def docker_kill_handler_handler(signum, frame):
subprocess.check_call('docker kill $(docker ps -a -q --filter name={name} --format="{{{{.ID}}}}")'.format(name=CONTAINER_NAME), shell=True)
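
Note on the `check_args_and_update_paths` hunk above: the same "XML or YAML" existence check is written out twice, once for the main config and once for the users config. A minimal sketch of how the two checks could share one helper follows. The name `find_base_config` and its `extensions` parameter are hypothetical and are not part of the runner script; only `os.path.exists` and `os.path.join` come from the original code.

```python
import os
import tempfile


def find_base_config(base_configs_dir, stem, extensions=("xml", "yaml")):
    """Return the first existing '<stem>.<ext>' path in base_configs_dir.

    Raises an Exception that names every candidate file if none exists.
    """
    candidates = [
        os.path.join(base_configs_dir, "{}.{}".format(stem, ext))
        for ext in extensions
    ]
    for path in candidates:
        if os.path.exists(path):
            return path
    names = " or ".join(os.path.basename(c) for c in candidates)
    raise Exception("No {} in {}".format(names, base_configs_dir))


# Usage: a directory that holds only YAML configs passes both checks.
with tempfile.TemporaryDirectory() as base_dir:
    for name in ("config.yaml", "users.yaml"):
        open(os.path.join(base_dir, name), "w").close()
    main_config = find_base_config(base_dir, "config")
    users_config = find_base_config(base_dir, "users")
    print(os.path.basename(main_config), os.path.basename(users_config))
```

Returning the path that was found, instead of only checking that one exists, would also let the caller learn which format is in use without probing the directory a second time.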

View File

@ -0,0 +1,13 @@
<yandex>
<!-- Sources to read users, roles, access rights, profiles of settings, quotas. -->
<user_directories replace="replace">
<users_xml>
<!-- Path to configuration file with predefined users. -->
<path>users.xml</path>
</users_xml>
<local_directory>
<!-- Path to folder where users created by SQL commands are stored. -->
<path>access/</path>
</local_directory>
</user_directories>
</yandex>

View File

@ -0,0 +1,23 @@
<yandex>
<keeper_server>
<tcp_port>9181</tcp_port>
<server_id>1</server_id>
<coordination_settings>
<operation_timeout_ms>10000</operation_timeout_ms>
<session_timeout_ms>30000</session_timeout_ms>
<force_sync>false</force_sync>
<startup_timeout>60000</startup_timeout>
<!-- Keep all log entries so that complex problems can be investigated. -->
<reserved_log_items>1000000000000000</reserved_log_items>
</coordination_settings>
<raft_configuration>
<server>
<id>1</id>
<hostname>localhost</hostname>
<port>44444</port>
</server>
</raft_configuration>
</keeper_server>
</yandex>

View File

@ -0,0 +1,7 @@
<yandex>
<logger>
<console>true</console>
<log remove="remove"/>
<errorlog remove="remove"/>
</logger>
</yandex>

View File

@ -0,0 +1,8 @@
<yandex>
<logger>
<!-- Disable rotation
https://pocoproject.org/docs/Poco.FileChannel.html
-->
<size>never</size>
</logger>
</yandex>

View File

@ -0,0 +1,9 @@
<yandex>
<macros>
<test>Hello, world!</test>
<shard>s1</shard>
<replica>r1</replica>
<default_path_test>/clickhouse/tables/{database}/{shard}/</default_path_test>
<default_name_test>table_{table}</default_name_test>
</macros>
</yandex>

View File

@ -0,0 +1,8 @@
<yandex>
<metric_log>
<database>system</database>
<table>metric_log</table>
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
<collect_interval_milliseconds>1000</collect_interval_milliseconds>
</metric_log>
</yandex>

View File

@ -0,0 +1,49 @@
<yandex>
<remote_servers>
<![CDATA[
You can run additional servers simply as
./clickhouse-server -- --path=9001 --tcp_port=9001
]]>
<single_remote_shard_at_port_9001>
<shard>
<replica>
<host>localhost</host>
<port>9001</port>
</replica>
</shard>
</single_remote_shard_at_port_9001>
<two_remote_shards_at_port_9001_9002>
<shard>
<replica>
<host>localhost</host>
<port>9001</port>
</replica>
</shard>
<shard>
<replica>
<host>localhost</host>
<port>9002</port>
</replica>
</shard>
</two_remote_shards_at_port_9001_9002>
<two_shards_one_local_one_remote_at_port_9001>
<shard>
<replica>
<host>localhost</host>
<port>9000</port>
</replica>
</shard>
<shard>
<replica>
<host>localhost</host>
<port>9001</port>
</replica>
</shard>
</two_shards_one_local_one_remote_at_port_9001>
</remote_servers>
</yandex>

View File

@ -0,0 +1,8 @@
<yandex>
<part_log>
<database>system</database>
<table>part_log</table>
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
</part_log>
</yandex>

View File

@ -0,0 +1,8 @@
<yandex>
<path replace="replace">./</path>
<tmp_path replace="replace">./tmp/</tmp_path>
<user_files_path replace="replace">./user_files/</user_files_path>
<format_schema_path replace="replace">./format_schemas/</format_schema_path>
<access_control_path replace="replace">./access/</access_control_path>
<top_level_domains_path replace="replace">./top_level_domains/</top_level_domains_path>
</yandex>

View File

@ -0,0 +1,10 @@
<?xml version="1.0"?>
<!-- Config for test server -->
<yandex>
<query_masking_rules>
<rule>
<regexp>TOPSECRET.TOPSECRET</regexp>
<replace>[hidden]</replace>
</rule>
</query_masking_rules>
</yandex>

View File

@ -0,0 +1,3 @@
<yandex>
<tcp_with_proxy_port>9010</tcp_with_proxy_port>
</yandex>

View File

@ -0,0 +1,7 @@
<yandex>
<text_log>
<database>system</database>
<table>text_log</table>
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
</text_log>
</yandex>

Some files were not shown because too many files have changed in this diff