mirror of
https://github.com/ClickHouse/ClickHouse.git
synced 2024-11-21 15:12:02 +00:00
Merge branch 'master' into progress-bar
This commit is contained in:
commit
08eca33d81
17
.github/ISSUE_TEMPLATE/40_bug-report.md
vendored
17
.github/ISSUE_TEMPLATE/40_bug-report.md
vendored
@ -10,12 +10,26 @@ assignees: ''
|
||||
You have to provide the following information whenever possible.
|
||||
|
||||
**Describe the bug**
|
||||
|
||||
A clear and concise description of what works not as it is supposed to.
|
||||
|
||||
**Does it reproduce on recent release?**
|
||||
|
||||
[The list of releases](https://github.com/ClickHouse/ClickHouse/blob/master/utils/list-versions/version_date.tsv)
|
||||
|
||||
**Enable crash reporting**
|
||||
|
||||
If possible, change "enabled" to true in "send_crash_reports" section in `config.xml`:
|
||||
|
||||
```
|
||||
<send_crash_reports>
|
||||
<!-- Changing <enabled> to true allows sending crash reports to -->
|
||||
<!-- the ClickHouse core developers team via Sentry https://sentry.io -->
|
||||
<enabled>false</enabled>
|
||||
```
|
||||
|
||||
**How to reproduce**
|
||||
|
||||
* Which ClickHouse server version to use
|
||||
* Which interface to use, if matters
|
||||
* Non-default settings, if any
|
||||
@ -24,10 +38,13 @@ A clear and concise description of what works not as it is supposed to.
|
||||
* Queries to run that lead to unexpected result
|
||||
|
||||
**Expected behavior**
|
||||
|
||||
A clear and concise description of what you expected to happen.
|
||||
|
||||
**Error message and/or stacktrace**
|
||||
|
||||
If applicable, add screenshots to help explain your problem.
|
||||
|
||||
**Additional context**
|
||||
|
||||
Add any other context about the problem here.
|
||||
|
4
.gitmodules
vendored
4
.gitmodules
vendored
@ -228,3 +228,7 @@
|
||||
[submodule "contrib/datasketches-cpp"]
|
||||
path = contrib/datasketches-cpp
|
||||
url = https://github.com/ClickHouse-Extras/datasketches-cpp.git
|
||||
|
||||
[submodule "contrib/yaml-cpp"]
|
||||
path = contrib/yaml-cpp
|
||||
url = https://github.com/ClickHouse-Extras/yaml-cpp.git
|
||||
|
139
CHANGELOG.md
139
CHANGELOG.md
@ -1,3 +1,142 @@
|
||||
## ClickHouse release 21.5, 2021-05-20
|
||||
|
||||
#### Backward Incompatible Change
|
||||
|
||||
* Change comparison of integers and floating point numbers when integer is not exactly representable in the floating point data type. In new version comparison will return false as the rounding error will occur. Example: `9223372036854775808.0 != 9223372036854775808`, because the number `9223372036854775808` is not representable as floating point number exactly (and `9223372036854775808.0` is rounded to `9223372036854776000.0`). But in previous version the comparison will return as the numbers are equal, because if the floating point number `9223372036854776000.0` get converted back to UInt64, it will yield `9223372036854775808`. For the reference, the Python programming language also treats these numbers as equal. But this behaviour was dependend on CPU model (different results on AMD64 and AArch64 for some out-of-range numbers), so we make the comparison more precise. It will treat int and float numbers equal only if int is represented in floating point type exactly. [#22595](https://github.com/ClickHouse/ClickHouse/pull/22595) ([alexey-milovidov](https://github.com/alexey-milovidov)).
|
||||
* Remove support for `argMin` and `argMax` for single `Tuple` argument. The code was not memory-safe. The feature was added by mistake and it is confusing for people. These functions can be reintroduced under different names later. This fixes [#22384](https://github.com/ClickHouse/ClickHouse/issues/22384) and reverts [#17359](https://github.com/ClickHouse/ClickHouse/issues/17359). [#23393](https://github.com/ClickHouse/ClickHouse/pull/23393) ([alexey-milovidov](https://github.com/alexey-milovidov)).
|
||||
|
||||
#### New Feature
|
||||
|
||||
* Added functions `dictGetChildren(dictionary, key)`, `dictGetDescendants(dictionary, key, level)`. Function `dictGetChildren` return all children as an array if indexes. It is a inverse transformation for `dictGetHierarchy`. Function `dictGetDescendants` return all descendants as if `dictGetChildren` was applied `level` times recursively. Zero `level` value is equivalent to infinity. Improved performance of `dictGetHierarchy`, `dictIsIn` functions. Closes [#14656](https://github.com/ClickHouse/ClickHouse/issues/14656). [#22096](https://github.com/ClickHouse/ClickHouse/pull/22096) ([Maksim Kita](https://github.com/kitaisreal)).
|
||||
* Added function `dictGetOrNull`. It works like `dictGet`, but return `Null` in case key was not found in dictionary. Closes [#22375](https://github.com/ClickHouse/ClickHouse/issues/22375). [#22413](https://github.com/ClickHouse/ClickHouse/pull/22413) ([Maksim Kita](https://github.com/kitaisreal)).
|
||||
* Added a table function `s3Cluster`, which allows to process files from `s3` in parallel on every node of a specified cluster. [#22012](https://github.com/ClickHouse/ClickHouse/pull/22012) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
|
||||
* Added support for replicas and shards in MySQL/PostgreSQL table engine / table function. You can write `SELECT * FROM mysql('host{1,2}-{1|2}', ...)`. Closes [#20969](https://github.com/ClickHouse/ClickHouse/issues/20969). [#22217](https://github.com/ClickHouse/ClickHouse/pull/22217) ([Kseniia Sumarokova](https://github.com/kssenii)).
|
||||
* Added `ALTER TABLE ... FETCH PART ...` query. It's similar to `FETCH PARTITION`, but fetches only one part. [#22706](https://github.com/ClickHouse/ClickHouse/pull/22706) ([turbo jason](https://github.com/songenjie)).
|
||||
* Added a setting `max_distributed_depth` that limits the depth of recursive queries to `Distributed` tables. Closes [#20229](https://github.com/ClickHouse/ClickHouse/issues/20229). [#21942](https://github.com/ClickHouse/ClickHouse/pull/21942) ([flynn](https://github.com/ucasFL)).
|
||||
|
||||
#### Performance Improvement
|
||||
|
||||
* Improved performance of `intDiv` by dynamic dispatch for AVX2. This closes [#22314](https://github.com/ClickHouse/ClickHouse/issues/22314). [#23000](https://github.com/ClickHouse/ClickHouse/pull/23000) ([alexey-milovidov](https://github.com/alexey-milovidov)).
|
||||
* Improved performance of reading from `ArrowStream` input format for sources other then local file (e.g. URL). [#22673](https://github.com/ClickHouse/ClickHouse/pull/22673) ([nvartolomei](https://github.com/nvartolomei)).
|
||||
* Disabled compression by default when interacting with localhost (with clickhouse-client or server to server with distributed queries) via native protocol. It may improve performance of some import/export operations. This closes [#22234](https://github.com/ClickHouse/ClickHouse/issues/22234). [#22237](https://github.com/ClickHouse/ClickHouse/pull/22237) ([alexey-milovidov](https://github.com/alexey-milovidov)).
|
||||
* Exclude values that does not belong to the shard from right part of IN section for distributed queries (under `optimize_skip_unused_shards_rewrite_in`, enabled by default, since it still requires `optimize_skip_unused_shards`). [#21511](https://github.com/ClickHouse/ClickHouse/pull/21511) ([Azat Khuzhin](https://github.com/azat)).
|
||||
* Improved performance of reading a subset of columns with File-like table engine and column-oriented format like Parquet, Arrow or ORC. This closes [#issue:20129](https://github.com/ClickHouse/ClickHouse/issues/20129). [#21302](https://github.com/ClickHouse/ClickHouse/pull/21302) ([keenwolf](https://github.com/keen-wolf)).
|
||||
* Allow to move more conditions to `PREWHERE` as it was before version 21.1 (adjustment of internal heuristics). Insufficient number of moved condtions could lead to worse performance. [#23397](https://github.com/ClickHouse/ClickHouse/pull/23397) ([Anton Popov](https://github.com/CurtizJ)).
|
||||
* Improved performance of ODBC connections and fixed all the outstanding issues from the backlog. Using `nanodbc` library instead of `Poco::ODBC`. Closes [#9678](https://github.com/ClickHouse/ClickHouse/issues/9678). Add support for DateTime64 and Decimal* for ODBC table engine. Closes [#21961](https://github.com/ClickHouse/ClickHouse/issues/21961). Fixed issue with cyrillic text being truncated. Closes [#16246](https://github.com/ClickHouse/ClickHouse/issues/16246). Added connection pools for odbc bridge. [#21972](https://github.com/ClickHouse/ClickHouse/pull/21972) ([Kseniia Sumarokova](https://github.com/kssenii)).
|
||||
|
||||
#### Improvement
|
||||
|
||||
* Increase `max_uri_size` (the maximum size of URL in HTTP interface) to 1 MiB by default. This closes [#21197](https://github.com/ClickHouse/ClickHouse/issues/21197). [#22997](https://github.com/ClickHouse/ClickHouse/pull/22997) ([alexey-milovidov](https://github.com/alexey-milovidov)).
|
||||
* Set `background_fetches_pool_size` to `8` that is better for production usage with frequent small insertions or slow ZooKeeper cluster. [#22945](https://github.com/ClickHouse/ClickHouse/pull/22945) ([alexey-milovidov](https://github.com/alexey-milovidov)).
|
||||
* FlatDictionary added `initial_array_size`, `max_array_size` options. [#22521](https://github.com/ClickHouse/ClickHouse/pull/22521) ([Maksim Kita](https://github.com/kitaisreal)).
|
||||
* Add new setting `non_replicated_deduplication_window` for non-replicated MergeTree inserts deduplication. [#22514](https://github.com/ClickHouse/ClickHouse/pull/22514) ([alesapin](https://github.com/alesapin)).
|
||||
* Update paths to the `CatBoost` model configs in config reloading. [#22434](https://github.com/ClickHouse/ClickHouse/pull/22434) ([Kruglov Pavel](https://github.com/Avogar)).
|
||||
* Added `Decimal256` type support in dictionaries. `Decimal256` is experimental feature. Closes [#20979](https://github.com/ClickHouse/ClickHouse/issues/20979). [#22960](https://github.com/ClickHouse/ClickHouse/pull/22960) ([Maksim Kita](https://github.com/kitaisreal)).
|
||||
* Enabled `async_socket_for_remote` by default (using less amount of OS threads for distributed queries). [#23683](https://github.com/ClickHouse/ClickHouse/pull/23683) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
* Fixed `quantile(s)TDigest`. Added special handling of singleton centroids according to tdunning/t-digest 3.2+. Also a bug with over-compression of centroids in implementation of earlier version of the algorithm was fixed. [#23314](https://github.com/ClickHouse/ClickHouse/pull/23314) ([Vladimir Chebotarev](https://github.com/excitoon)).
|
||||
* Make function name `unhex` case insensitive for compatibility with MySQL. [#23229](https://github.com/ClickHouse/ClickHouse/pull/23229) ([alexey-milovidov](https://github.com/alexey-milovidov)).
|
||||
* Implement functions `arrayHasAny`, `arrayHasAll`, `has`, `indexOf`, `countEqual` for generic case when types of array elements are different. In previous versions the functions `arrayHasAny`, `arrayHasAll` returned false and `has`, `indexOf`, `countEqual` thrown exception. Also add support for `Decimal` and big integer types in functions `has` and similar. This closes [#20272](https://github.com/ClickHouse/ClickHouse/issues/20272). [#23044](https://github.com/ClickHouse/ClickHouse/pull/23044) ([alexey-milovidov](https://github.com/alexey-milovidov)).
|
||||
* Raised the threshold on max number of matches in result of the function `extractAllGroupsHorizontal`. [#23036](https://github.com/ClickHouse/ClickHouse/pull/23036) ([Vasily Nemkov](https://github.com/Enmk)).
|
||||
* Do not perform `optimize_skip_unused_shards` for cluster with one node. [#22999](https://github.com/ClickHouse/ClickHouse/pull/22999) ([Azat Khuzhin](https://github.com/azat)).
|
||||
* Added ability to run clickhouse-keeper (experimental drop-in replacement to ZooKeeper) with SSL. Config settings `keeper_server.tcp_port_secure` can be used for secure interaction between client and keeper-server. `keeper_server.raft_configuration.secure` can be used to enable internal secure communication between nodes. [#22992](https://github.com/ClickHouse/ClickHouse/pull/22992) ([alesapin](https://github.com/alesapin)).
|
||||
* Added ability to flush buffer only in background for `Buffer` tables. [#22986](https://github.com/ClickHouse/ClickHouse/pull/22986) ([Azat Khuzhin](https://github.com/azat)).
|
||||
* When selecting from MergeTree table with NULL in WHERE condition, in rare cases, exception was thrown. This closes [#20019](https://github.com/ClickHouse/ClickHouse/issues/20019). [#22978](https://github.com/ClickHouse/ClickHouse/pull/22978) ([alexey-milovidov](https://github.com/alexey-milovidov)).
|
||||
* Fix error handling in Poco HTTP Client for AWS. [#22973](https://github.com/ClickHouse/ClickHouse/pull/22973) ([kreuzerkrieg](https://github.com/kreuzerkrieg)).
|
||||
* Respect `max_part_removal_threads` for `ReplicatedMergeTree`. [#22971](https://github.com/ClickHouse/ClickHouse/pull/22971) ([Azat Khuzhin](https://github.com/azat)).
|
||||
* Fix obscure corner case of MergeTree settings inactive_parts_to_throw_insert = 0 with inactive_parts_to_delay_insert > 0. [#22947](https://github.com/ClickHouse/ClickHouse/pull/22947) ([Azat Khuzhin](https://github.com/azat)).
|
||||
* `dateDiff` now works with `DateTime64` arguments (even for values outside of `DateTime` range) [#22931](https://github.com/ClickHouse/ClickHouse/pull/22931) ([Vasily Nemkov](https://github.com/Enmk)).
|
||||
* MaterializeMySQL (experimental feature): added an ability to replicate MySQL databases containing views without failing. This is accomplished by ignoring the views. [#22760](https://github.com/ClickHouse/ClickHouse/pull/22760) ([Christian](https://github.com/cfroystad)).
|
||||
* Allow RBAC row policy via postgresql protocol. Closes [#22658](https://github.com/ClickHouse/ClickHouse/issues/22658). PostgreSQL protocol is enabled in configuration by default. [#22755](https://github.com/ClickHouse/ClickHouse/pull/22755) ([Kseniia Sumarokova](https://github.com/kssenii)).
|
||||
* Add metric to track how much time is spend during waiting for Buffer layer lock. [#22725](https://github.com/ClickHouse/ClickHouse/pull/22725) ([Azat Khuzhin](https://github.com/azat)).
|
||||
* Allow to use CTE in VIEW definition. This closes [#22491](https://github.com/ClickHouse/ClickHouse/issues/22491). [#22657](https://github.com/ClickHouse/ClickHouse/pull/22657) ([Amos Bird](https://github.com/amosbird)).
|
||||
* Clear the rest of the screen and show cursor in `clickhouse-client` if previous program has left garbage in terminal. This closes [#16518](https://github.com/ClickHouse/ClickHouse/issues/16518). [#22634](https://github.com/ClickHouse/ClickHouse/pull/22634) ([alexey-milovidov](https://github.com/alexey-milovidov)).
|
||||
* Make `round` function to behave consistently on non-x86_64 platforms. Rounding half to nearest even (Banker's rounding) is used. [#22582](https://github.com/ClickHouse/ClickHouse/pull/22582) ([alexey-milovidov](https://github.com/alexey-milovidov)).
|
||||
* Correctly check structure of blocks of data that are sending by Distributed tables. [#22325](https://github.com/ClickHouse/ClickHouse/pull/22325) ([Azat Khuzhin](https://github.com/azat)).
|
||||
* Allow publishing Kafka errors to a virtual column of Kafka engine, controlled by the `kafka_handle_error_mode` setting. [#21850](https://github.com/ClickHouse/ClickHouse/pull/21850) ([fastio](https://github.com/fastio)).
|
||||
* Add aliases `simpleJSONExtract/simpleJSONHas` to `visitParam/visitParamExtract{UInt, Int, Bool, Float, Raw, String}`. Fixes [#21383](https://github.com/ClickHouse/ClickHouse/issues/21383). [#21519](https://github.com/ClickHouse/ClickHouse/pull/21519) ([fastio](https://github.com/fastio)).
|
||||
* Add `clickhouse-library-bridge` for library dictionary source. Closes [#9502](https://github.com/ClickHouse/ClickHouse/issues/9502). [#21509](https://github.com/ClickHouse/ClickHouse/pull/21509) ([Kseniia Sumarokova](https://github.com/kssenii)).
|
||||
* Forbid to drop a column if it's referenced by materialized view. Closes [#21164](https://github.com/ClickHouse/ClickHouse/issues/21164). [#21303](https://github.com/ClickHouse/ClickHouse/pull/21303) ([flynn](https://github.com/ucasFL)).
|
||||
* Support dynamic interserver credentials (rotating credentials without downtime). [#14113](https://github.com/ClickHouse/ClickHouse/pull/14113) ([johnskopis](https://github.com/johnskopis)).
|
||||
* Add support for Kafka storage with `Arrow` and `ArrowStream` format messages. [#23415](https://github.com/ClickHouse/ClickHouse/pull/23415) ([Chao Ma](https://github.com/godliness)).
|
||||
* Fixed missing semicolon in exception message. The user may find this exception message unpleasant to read. [#23208](https://github.com/ClickHouse/ClickHouse/pull/23208) ([alexey-milovidov](https://github.com/alexey-milovidov)).
|
||||
* Fixed missing whitespace in some exception messages about `LowCardinality` type. [#23207](https://github.com/ClickHouse/ClickHouse/pull/23207) ([alexey-milovidov](https://github.com/alexey-milovidov)).
|
||||
* Some values were formatted with alignment in center in table cells in `Markdown` format. Not anymore. [#23096](https://github.com/ClickHouse/ClickHouse/pull/23096) ([alexey-milovidov](https://github.com/alexey-milovidov)).
|
||||
* Remove non-essential details from suggestions in clickhouse-client. This closes [#22158](https://github.com/ClickHouse/ClickHouse/issues/22158). [#23040](https://github.com/ClickHouse/ClickHouse/pull/23040) ([alexey-milovidov](https://github.com/alexey-milovidov)).
|
||||
* Correct calculation of `bytes_allocated` field in system.dictionaries for sparse_hashed dictionaries. [#22867](https://github.com/ClickHouse/ClickHouse/pull/22867) ([Azat Khuzhin](https://github.com/azat)).
|
||||
* Fixed approximate total rows accounting for reverse reading from MergeTree. [#22726](https://github.com/ClickHouse/ClickHouse/pull/22726) ([Azat Khuzhin](https://github.com/azat)).
|
||||
* Fix the case when it was possible to configure dictionary with clickhouse source that was looking to itself that leads to infinite loop. Closes [#14314](https://github.com/ClickHouse/ClickHouse/issues/14314). [#22479](https://github.com/ClickHouse/ClickHouse/pull/22479) ([Maksim Kita](https://github.com/kitaisreal)).
|
||||
|
||||
#### Bug Fix
|
||||
|
||||
* Multiple fixes for hedged requests. Fixed an error `Can't initialize pipeline with empty pipe` for queries with `GLOBAL IN/JOIN` when the setting `use_hedged_requests` is enabled. Fixes [#23431](https://github.com/ClickHouse/ClickHouse/issues/23431). [#23805](https://github.com/ClickHouse/ClickHouse/pull/23805) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). Fixed a race condition in hedged connections which leads to crash. This fixes [#22161](https://github.com/ClickHouse/ClickHouse/issues/22161). [#22443](https://github.com/ClickHouse/ClickHouse/pull/22443) ([Kruglov Pavel](https://github.com/Avogar)). Fix possible crash in case if `unknown packet` was received from remote query (with `async_socket_for_remote` enabled). Fixes [#21167](https://github.com/ClickHouse/ClickHouse/issues/21167). [#23309](https://github.com/ClickHouse/ClickHouse/pull/23309) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
* Fixed the behavior when disabling `input_format_with_names_use_header ` setting discards all the input with CSVWithNames format. This fixes [#22406](https://github.com/ClickHouse/ClickHouse/issues/22406). [#23202](https://github.com/ClickHouse/ClickHouse/pull/23202) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
|
||||
* Fixed remote JDBC bridge timeout connection issue. Closes [#9609](https://github.com/ClickHouse/ClickHouse/issues/9609). [#23771](https://github.com/ClickHouse/ClickHouse/pull/23771) ([Maksim Kita](https://github.com/kitaisreal), [alexey-milovidov](https://github.com/alexey-milovidov)).
|
||||
* Fix the logic of initial load of `complex_key_hashed` if `update_field` is specified. Closes [#23800](https://github.com/ClickHouse/ClickHouse/issues/23800). [#23824](https://github.com/ClickHouse/ClickHouse/pull/23824) ([Maksim Kita](https://github.com/kitaisreal)).
|
||||
* Fixed crash when `PREWHERE` and row policy filter are both in effect with empty result. [#23763](https://github.com/ClickHouse/ClickHouse/pull/23763) ([Amos Bird](https://github.com/amosbird)).
|
||||
* Avoid possible "Cannot schedule a task" error (in case some exception had been occurred) on INSERT into Distributed. [#23744](https://github.com/ClickHouse/ClickHouse/pull/23744) ([Azat Khuzhin](https://github.com/azat)).
|
||||
* Added an exception in case of completely the same values in both samples in aggregate function `mannWhitneyUTest`. This fixes [#23646](https://github.com/ClickHouse/ClickHouse/issues/23646). [#23654](https://github.com/ClickHouse/ClickHouse/pull/23654) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
|
||||
* Fixed server fault when inserting data through HTTP caused an exception. This fixes [#23512](https://github.com/ClickHouse/ClickHouse/issues/23512). [#23643](https://github.com/ClickHouse/ClickHouse/pull/23643) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
|
||||
* Fixed misinterpretation of some `LIKE` expressions with escape sequences. [#23610](https://github.com/ClickHouse/ClickHouse/pull/23610) ([alexey-milovidov](https://github.com/alexey-milovidov)).
|
||||
* Fixed restart / stop command hanging. Closes [#20214](https://github.com/ClickHouse/ClickHouse/issues/20214). [#23552](https://github.com/ClickHouse/ClickHouse/pull/23552) ([filimonov](https://github.com/filimonov)).
|
||||
* Fixed `COLUMNS` matcher in case of multiple JOINs in select query. Closes [#22736](https://github.com/ClickHouse/ClickHouse/issues/22736). [#23501](https://github.com/ClickHouse/ClickHouse/pull/23501) ([Maksim Kita](https://github.com/kitaisreal)).
|
||||
* Fixed a crash when modifying column's default value when a column itself is used as `ReplacingMergeTree`'s parameter. [#23483](https://github.com/ClickHouse/ClickHouse/pull/23483) ([hexiaoting](https://github.com/hexiaoting)).
|
||||
* Fixed corner cases in vertical merges with `ReplacingMergeTree`. In rare cases they could lead to fails of merges with exceptions like `Incomplete granules are not allowed while blocks are granules size`. [#23459](https://github.com/ClickHouse/ClickHouse/pull/23459) ([Anton Popov](https://github.com/CurtizJ)).
|
||||
* Fixed bug that does not allow cast from empty array literal, to array with dimensions greater than 1, e.g. `CAST([] AS Array(Array(String)))`. Closes [#14476](https://github.com/ClickHouse/ClickHouse/issues/14476). [#23456](https://github.com/ClickHouse/ClickHouse/pull/23456) ([Maksim Kita](https://github.com/kitaisreal)).
|
||||
* Fixed a bug when `deltaSum` aggregate function produced incorrect result after resetting the counter. [#23437](https://github.com/ClickHouse/ClickHouse/pull/23437) ([Russ Frank](https://github.com/rf)).
|
||||
* Fixed `Cannot unlink file` error on unsuccessful creation of ReplicatedMergeTree table with multidisk configuration. This closes [#21755](https://github.com/ClickHouse/ClickHouse/issues/21755). [#23433](https://github.com/ClickHouse/ClickHouse/pull/23433) ([tavplubix](https://github.com/tavplubix)).
|
||||
* Fixed incompatible constant expression generation during partition pruning based on virtual columns. This fixes https://github.com/ClickHouse/ClickHouse/pull/21401#discussion_r611888913. [#23366](https://github.com/ClickHouse/ClickHouse/pull/23366) ([Amos Bird](https://github.com/amosbird)).
|
||||
* Fixed a crash when setting join_algorithm is set to 'auto' and Join is performed with a Dictionary. Close [#23002](https://github.com/ClickHouse/ClickHouse/issues/23002). [#23312](https://github.com/ClickHouse/ClickHouse/pull/23312) ([Vladimir](https://github.com/vdimir)).
|
||||
* Don't relax NOT conditions during partition pruning. This fixes [#23305](https://github.com/ClickHouse/ClickHouse/issues/23305) and [#21539](https://github.com/ClickHouse/ClickHouse/issues/21539). [#23310](https://github.com/ClickHouse/ClickHouse/pull/23310) ([Amos Bird](https://github.com/amosbird)).
|
||||
* Fixed very rare race condition on background cleanup of old blocks. It might cause a block not to be deduplicated if it's too close to the end of deduplication window. [#23301](https://github.com/ClickHouse/ClickHouse/pull/23301) ([tavplubix](https://github.com/tavplubix)).
|
||||
* Fixed very rare (distributed) race condition between creation and removal of ReplicatedMergeTree tables. It might cause exceptions like `node doesn't exist` on attempt to create replicated table. Fixes [#21419](https://github.com/ClickHouse/ClickHouse/issues/21419). [#23294](https://github.com/ClickHouse/ClickHouse/pull/23294) ([tavplubix](https://github.com/tavplubix)).
|
||||
* Fixed simple key dictionary from DDL creation if primary key is not first attribute. Fixes [#23236](https://github.com/ClickHouse/ClickHouse/issues/23236). [#23262](https://github.com/ClickHouse/ClickHouse/pull/23262) ([Maksim Kita](https://github.com/kitaisreal)).
|
||||
* Fixed reading from ODBC when there are many long column names in a table. Closes [#8853](https://github.com/ClickHouse/ClickHouse/issues/8853). [#23215](https://github.com/ClickHouse/ClickHouse/pull/23215) ([Kseniia Sumarokova](https://github.com/kssenii)).
|
||||
* MaterializeMySQL (experimental feature): fixed `Not found column` error when selecting from `MaterializeMySQL` with condition on key column. Fixes [#22432](https://github.com/ClickHouse/ClickHouse/issues/22432). [#23200](https://github.com/ClickHouse/ClickHouse/pull/23200) ([tavplubix](https://github.com/tavplubix)).
|
||||
* Correct aliases handling if subquery was optimized to constant. Fixes [#22924](https://github.com/ClickHouse/ClickHouse/issues/22924). Fixes [#10401](https://github.com/ClickHouse/ClickHouse/issues/10401). [#23191](https://github.com/ClickHouse/ClickHouse/pull/23191) ([Maksim Kita](https://github.com/kitaisreal)).
|
||||
* Server might fail to start if `data_type_default_nullable` setting is enabled in default profile, it's fixed. Fixes [#22573](https://github.com/ClickHouse/ClickHouse/issues/22573). [#23185](https://github.com/ClickHouse/ClickHouse/pull/23185) ([tavplubix](https://github.com/tavplubix)).
|
||||
* Fixed a crash on shutdown which happened because of wrong accounting of current connections. [#23154](https://github.com/ClickHouse/ClickHouse/pull/23154) ([Vitaly Baranov](https://github.com/vitlibar)).
|
||||
* Fixed `Table .inner_id... doesn't exist` error when selecting from Materialized View after detaching it from Atomic database and attaching back. [#23047](https://github.com/ClickHouse/ClickHouse/pull/23047) ([tavplubix](https://github.com/tavplubix)).
|
||||
* Fix error `Cannot find column in ActionsDAG result` which may happen if subquery uses `untuple`. Fixes [#22290](https://github.com/ClickHouse/ClickHouse/issues/22290). [#22991](https://github.com/ClickHouse/ClickHouse/pull/22991) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
* Fix usage of constant columns of type `Map` with nullable values. [#22939](https://github.com/ClickHouse/ClickHouse/pull/22939) ([Anton Popov](https://github.com/CurtizJ)).
|
||||
* fixed `formatDateTime()` on `DateTime64` and "%C" format specifier fixed `toDateTime64()` for large values and non-zero scale. [#22937](https://github.com/ClickHouse/ClickHouse/pull/22937) ([Vasily Nemkov](https://github.com/Enmk)).
|
||||
* Fixed a crash when using `mannWhitneyUTest` and `rankCorr` with window functions. This fixes [#22728](https://github.com/ClickHouse/ClickHouse/issues/22728). [#22876](https://github.com/ClickHouse/ClickHouse/pull/22876) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
|
||||
* LIVE VIEW (experimental feature): fixed possible hanging in concurrent DROP/CREATE of TEMPORARY LIVE VIEW in `TemporaryLiveViewCleaner`, [see](https://gist.github.com/vzakaznikov/0c03195960fc86b56bfe2bc73a90019e). [#22858](https://github.com/ClickHouse/ClickHouse/pull/22858) ([Vitaly Baranov](https://github.com/vitlibar)).
|
||||
* Fixed pushdown of `HAVING` in case, when filter column is used in aggregation. [#22763](https://github.com/ClickHouse/ClickHouse/pull/22763) ([Anton Popov](https://github.com/CurtizJ)).
|
||||
* Fixed possible hangs in Zookeeper requests in case of OOM exception. Fixes [#22438](https://github.com/ClickHouse/ClickHouse/issues/22438). [#22684](https://github.com/ClickHouse/ClickHouse/pull/22684) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
* Fixed wait for mutations on several replicas for ReplicatedMergeTree table engines. Previously, mutation/alter query may finish before mutation actually executed on other replicas. [#22669](https://github.com/ClickHouse/ClickHouse/pull/22669) ([alesapin](https://github.com/alesapin)).
|
||||
* Fixed exception for Log with nested types without columns in the SELECT clause. [#22654](https://github.com/ClickHouse/ClickHouse/pull/22654) ([Azat Khuzhin](https://github.com/azat)).
|
||||
* Fix unlimited wait for auxiliary AWS requests. [#22594](https://github.com/ClickHouse/ClickHouse/pull/22594) ([Vladimir Chebotarev](https://github.com/excitoon)).
|
||||
* Fixed a crash when client closes connection very early [#22579](https://github.com/ClickHouse/ClickHouse/issues/22579). [#22591](https://github.com/ClickHouse/ClickHouse/pull/22591) ([nvartolomei](https://github.com/nvartolomei)).
|
||||
* `Map` data type (experimental feature): fixed an incorrect formatting of function `map` in distributed queries. [#22588](https://github.com/ClickHouse/ClickHouse/pull/22588) ([foolchi](https://github.com/foolchi)).
|
||||
* Fixed deserialization of empty string without newline at end of TSV format. This closes [#20244](https://github.com/ClickHouse/ClickHouse/issues/20244). Possible workaround without version update: set `input_format_null_as_default` to zero. It was zero in old versions. [#22527](https://github.com/ClickHouse/ClickHouse/pull/22527) ([alexey-milovidov](https://github.com/alexey-milovidov)).
|
||||
* Fixed wrong cast of a column of `LowCardinality` type in Merge Join algorithm. Close [#22386](https://github.com/ClickHouse/ClickHouse/issues/22386), close [#22388](https://github.com/ClickHouse/ClickHouse/issues/22388). [#22510](https://github.com/ClickHouse/ClickHouse/pull/22510) ([Vladimir](https://github.com/vdimir)).
|
||||
* Buffer overflow (on read) was possible in `tokenbf_v1` full text index. The excessive bytes are not used but the read operation may lead to crash in rare cases. This closes [#19233](https://github.com/ClickHouse/ClickHouse/issues/19233). [#22421](https://github.com/ClickHouse/ClickHouse/pull/22421) ([alexey-milovidov](https://github.com/alexey-milovidov)).
|
||||
* Do not limit HTTP chunk size. Fixes [#21907](https://github.com/ClickHouse/ClickHouse/issues/21907). [#22322](https://github.com/ClickHouse/ClickHouse/pull/22322) ([Ivan](https://github.com/abyss7)).
|
||||
* Fixed a bug, which leads to underaggregation of data in case of enabled `optimize_aggregation_in_order` and many parts in table. Slightly improve performance of aggregation with enabled `optimize_aggregation_in_order`. [#21889](https://github.com/ClickHouse/ClickHouse/pull/21889) ([Anton Popov](https://github.com/CurtizJ)).
|
||||
* Check if table function view is used as a column. This complements #20350. [#21465](https://github.com/ClickHouse/ClickHouse/pull/21465) ([Amos Bird](https://github.com/amosbird)).
|
||||
* Fix "unknown column" error for tables with `Merge` engine in queris with `JOIN` and aggregation. Closes [#18368](https://github.com/ClickHouse/ClickHouse/issues/18368), close [#22226](https://github.com/ClickHouse/ClickHouse/issues/22226). [#21370](https://github.com/ClickHouse/ClickHouse/pull/21370) ([Vladimir](https://github.com/vdimir)).
|
||||
* Fixed name clashes in pushdown optimization. It caused incorrect `WHERE` filtration after FULL JOIN. Close [#20497](https://github.com/ClickHouse/ClickHouse/issues/20497). [#20622](https://github.com/ClickHouse/ClickHouse/pull/20622) ([Vladimir](https://github.com/vdimir)).
|
||||
* Fixed very rare bug when quorum insert with `quorum_parallel=1` is not really "quorum" because of deduplication. [#18215](https://github.com/ClickHouse/ClickHouse/pull/18215) ([filimonov](https://github.com/filimonov) - reported, [alesapin](https://github.com/alesapin) - fixed).
|
||||
|
||||
#### Build/Testing/Packaging Improvement
|
||||
|
||||
* Run stateless tests in parallel in CI. [#22300](https://github.com/ClickHouse/ClickHouse/pull/22300) ([alesapin](https://github.com/alesapin)).
|
||||
* Simplify debian packages. This fixes [#21698](https://github.com/ClickHouse/ClickHouse/issues/21698). [#22976](https://github.com/ClickHouse/ClickHouse/pull/22976) ([alexey-milovidov](https://github.com/alexey-milovidov)).
|
||||
* Added support for ClickHouse build on Apple M1. [#21639](https://github.com/ClickHouse/ClickHouse/pull/21639) ([changvvb](https://github.com/changvvb)).
|
||||
* Fixed ClickHouse Keeper build for MacOS. [#22860](https://github.com/ClickHouse/ClickHouse/pull/22860) ([alesapin](https://github.com/alesapin)).
|
||||
* Fixed some tests on AArch64 platform. [#22596](https://github.com/ClickHouse/ClickHouse/pull/22596) ([alexey-milovidov](https://github.com/alexey-milovidov)).
|
||||
* Added function alignment for possibly better performance. [#21431](https://github.com/ClickHouse/ClickHouse/pull/21431) ([Danila Kutenin](https://github.com/danlark1)).
|
||||
* Adjust some tests to output identical results on amd64 and aarch64 (qemu). The result was depending on implementation specific CPU behaviour. [#22590](https://github.com/ClickHouse/ClickHouse/pull/22590) ([alexey-milovidov](https://github.com/alexey-milovidov)).
|
||||
* Allow query profiling only on x86_64. See [#15174](https://github.com/ClickHouse/ClickHouse/issues/15174#issuecomment-812954965) and [#15638](https://github.com/ClickHouse/ClickHouse/issues/15638#issuecomment-703805337). This closes [#15638](https://github.com/ClickHouse/ClickHouse/issues/15638). [#22580](https://github.com/ClickHouse/ClickHouse/pull/22580) ([alexey-milovidov](https://github.com/alexey-milovidov)).
|
||||
* Allow building with unbundled xz (lzma) using `USE_INTERNAL_XZ_LIBRARY=OFF` CMake option. [#22571](https://github.com/ClickHouse/ClickHouse/pull/22571) ([Kfir Itzhak](https://github.com/mastertheknife)).
|
||||
* Enable bundled `openldap` on `ppc64le` [#22487](https://github.com/ClickHouse/ClickHouse/pull/22487) ([Kfir Itzhak](https://github.com/mastertheknife)).
|
||||
* Disable incompatible libraries (platform specific typically) on `ppc64le` [#22475](https://github.com/ClickHouse/ClickHouse/pull/22475) ([Kfir Itzhak](https://github.com/mastertheknife)).
|
||||
* Add Jepsen test in CI for clickhouse Keeper. [#22373](https://github.com/ClickHouse/ClickHouse/pull/22373) ([alesapin](https://github.com/alesapin)).
|
||||
* Build `jemalloc` with support for [heap profiling](https://github.com/jemalloc/jemalloc/wiki/Use-Case%3A-Heap-Profiling). [#22834](https://github.com/ClickHouse/ClickHouse/pull/22834) ([nvartolomei](https://github.com/nvartolomei)).
|
||||
* Avoid UB in `*Log` engines for rwlock unlock due to unlock from another thread. [#22583](https://github.com/ClickHouse/ClickHouse/pull/22583) ([Azat Khuzhin](https://github.com/azat)).
|
||||
* Fixed UB by unlocking the rwlock of the TinyLog from the same thread. [#22560](https://github.com/ClickHouse/ClickHouse/pull/22560) ([Azat Khuzhin](https://github.com/azat)).
|
||||
|
||||
|
||||
## ClickHouse release 21.4
|
||||
|
||||
### ClickHouse release 21.4.1 2021-04-12
|
||||
|
@ -36,7 +36,7 @@ option(FAIL_ON_UNSUPPORTED_OPTIONS_COMBINATION
|
||||
if(FAIL_ON_UNSUPPORTED_OPTIONS_COMBINATION)
|
||||
set(RECONFIGURE_MESSAGE_LEVEL FATAL_ERROR)
|
||||
else()
|
||||
set(RECONFIGURE_MESSAGE_LEVEL STATUS)
|
||||
set(RECONFIGURE_MESSAGE_LEVEL WARNING)
|
||||
endif()
|
||||
|
||||
enable_language(C CXX ASM)
|
||||
@ -504,7 +504,6 @@ include (cmake/find/libuv.cmake) # for amqpcpp and cassandra
|
||||
include (cmake/find/amqpcpp.cmake)
|
||||
include (cmake/find/capnp.cmake)
|
||||
include (cmake/find/llvm.cmake)
|
||||
include (cmake/find/termcap.cmake) # for external static llvm
|
||||
include (cmake/find/h3.cmake)
|
||||
include (cmake/find/libxml2.cmake)
|
||||
include (cmake/find/brotli.cmake)
|
||||
@ -527,6 +526,7 @@ include (cmake/find/nanodbc.cmake)
|
||||
include (cmake/find/rocksdb.cmake)
|
||||
include (cmake/find/libpqxx.cmake)
|
||||
include (cmake/find/nuraft.cmake)
|
||||
include (cmake/find/yaml-cpp.cmake)
|
||||
|
||||
|
||||
if(NOT USE_INTERNAL_PARQUET_LIBRARY)
|
||||
|
@ -8,8 +8,11 @@ ClickHouse® is an open-source column-oriented database management system that a
|
||||
* [Tutorial](https://clickhouse.tech/docs/en/getting_started/tutorial/) shows how to set up and query small ClickHouse cluster.
|
||||
* [Documentation](https://clickhouse.tech/docs/en/) provides more in-depth information.
|
||||
* [YouTube channel](https://www.youtube.com/c/ClickHouseDB) has a lot of content about ClickHouse in video format.
|
||||
* [Slack](https://join.slack.com/t/clickhousedb/shared_invite/zt-nwwakmk4-xOJ6cdy0sJC3It8j348~IA) and [Telegram](https://telegram.me/clickhouse_en) allow to chat with ClickHouse users in real-time.
|
||||
* [Slack](https://join.slack.com/t/clickhousedb/shared_invite/zt-qfort0u8-TWqK4wIP0YSdoDE0btKa1w) and [Telegram](https://telegram.me/clickhouse_en) allow to chat with ClickHouse users in real-time.
|
||||
* [Blog](https://clickhouse.yandex/blog/en/) contains various ClickHouse-related articles, as well as announcements and reports about events.
|
||||
* [Code Browser](https://clickhouse.tech/codebrowser/html_report/ClickHouse/index.html) with syntax highlight and navigation.
|
||||
* [Contacts](https://clickhouse.tech/#contacts) can help to get your questions answered if there are any.
|
||||
* You can also [fill this form](https://clickhouse.tech/#meet) to meet Yandex ClickHouse team in person.
|
||||
|
||||
## Upcoming Events
|
||||
* [SF Bay Area ClickHouse Community Meetup (online)](https://www.meetup.com/San-Francisco-Bay-Area-ClickHouse-Meetup/events/278144089/) on 16 June 2021.
|
||||
|
@ -3,5 +3,11 @@ add_library (bridge
|
||||
)
|
||||
|
||||
target_include_directories (daemon PUBLIC ..)
|
||||
target_link_libraries (bridge PRIVATE daemon dbms Poco::Data Poco::Data::ODBC)
|
||||
target_link_libraries (bridge
|
||||
PRIVATE
|
||||
daemon
|
||||
dbms
|
||||
Poco::Data
|
||||
Poco::Data::ODBC
|
||||
)
|
||||
|
||||
|
@ -468,7 +468,7 @@ void BaseDaemon::reloadConfiguration()
|
||||
* instead of using files specified in config.xml.
|
||||
* (It's convenient to log in console when you start server without any command line parameters.)
|
||||
*/
|
||||
config_path = config().getString("config-file", "config.xml");
|
||||
config_path = config().getString("config-file", getDefaultConfigFileName());
|
||||
DB::ConfigProcessor config_processor(config_path, false, true);
|
||||
config_processor.setConfigPath(Poco::Path(config_path).makeParent().toString());
|
||||
loaded_config = config_processor.loadConfig(/* allow_zk_includes = */ true);
|
||||
@ -516,6 +516,11 @@ std::string BaseDaemon::getDefaultCorePath() const
|
||||
return "/opt/cores/";
|
||||
}
|
||||
|
||||
std::string BaseDaemon::getDefaultConfigFileName() const
|
||||
{
|
||||
return "config.xml";
|
||||
}
|
||||
|
||||
void BaseDaemon::closeFDs()
|
||||
{
|
||||
#if defined(OS_FREEBSD) || defined(OS_DARWIN)
|
||||
|
@ -149,6 +149,8 @@ protected:
|
||||
|
||||
virtual std::string getDefaultCorePath() const;
|
||||
|
||||
virtual std::string getDefaultConfigFileName() const;
|
||||
|
||||
std::optional<DB::StatusFile> pid_file;
|
||||
|
||||
std::atomic_bool is_cancelled{false};
|
||||
|
@ -78,6 +78,8 @@ PoolWithFailover::PoolWithFailover(
|
||||
const RemoteDescription & addresses,
|
||||
const std::string & user,
|
||||
const std::string & password,
|
||||
unsigned default_connections_,
|
||||
unsigned max_connections_,
|
||||
size_t max_tries_)
|
||||
: max_tries(max_tries_)
|
||||
, shareable(false)
|
||||
@ -85,7 +87,13 @@ PoolWithFailover::PoolWithFailover(
|
||||
/// Replicas have the same priority, but traversed replicas are moved to the end of the queue.
|
||||
for (const auto & [host, port] : addresses)
|
||||
{
|
||||
replicas_by_priority[0].emplace_back(std::make_shared<Pool>(database, host, user, password, port));
|
||||
replicas_by_priority[0].emplace_back(std::make_shared<Pool>(database,
|
||||
host, user, password, port,
|
||||
/* socket_ = */ "",
|
||||
MYSQLXX_DEFAULT_TIMEOUT,
|
||||
MYSQLXX_DEFAULT_RW_TIMEOUT,
|
||||
default_connections_,
|
||||
max_connections_));
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -115,6 +115,8 @@ namespace mysqlxx
|
||||
const RemoteDescription & addresses,
|
||||
const std::string & user,
|
||||
const std::string & password,
|
||||
unsigned default_connections_ = MYSQLXX_POOL_WITH_FAILOVER_DEFAULT_START_CONNECTIONS,
|
||||
unsigned max_connections_ = MYSQLXX_POOL_WITH_FAILOVER_DEFAULT_MAX_CONNECTIONS,
|
||||
size_t max_tries_ = MYSQLXX_POOL_WITH_FAILOVER_DEFAULT_MAX_TRIES);
|
||||
|
||||
PoolWithFailover(const PoolWithFailover & other);
|
||||
|
@ -1,9 +1,9 @@
|
||||
# This strings autochanged from release_lib.sh:
|
||||
SET(VERSION_REVISION 54451)
|
||||
SET(VERSION_REVISION 54452)
|
||||
SET(VERSION_MAJOR 21)
|
||||
SET(VERSION_MINOR 6)
|
||||
SET(VERSION_MINOR 7)
|
||||
SET(VERSION_PATCH 1)
|
||||
SET(VERSION_GITHASH 96fced4c3cf432fb0b401d2ab01f0c56e5f74a96)
|
||||
SET(VERSION_DESCRIBE v21.6.1.1-prestable)
|
||||
SET(VERSION_STRING 21.6.1.1)
|
||||
SET(VERSION_GITHASH 976ccc2e908ac3bc28f763bfea8134ea0a121b40)
|
||||
SET(VERSION_DESCRIBE v21.7.1.1-prestable)
|
||||
SET(VERSION_STRING 21.7.1.1)
|
||||
# end of autochange
|
||||
|
@ -1,102 +1,34 @@
|
||||
if (APPLE OR SPLIT_SHARED_LIBRARIES OR NOT ARCH_AMD64)
|
||||
if (APPLE OR SPLIT_SHARED_LIBRARIES OR NOT ARCH_AMD64 OR SANITIZE STREQUAL "undefined")
|
||||
set (ENABLE_EMBEDDED_COMPILER OFF CACHE INTERNAL "")
|
||||
endif()
|
||||
|
||||
option (ENABLE_EMBEDDED_COMPILER "Enable support for 'compile_expressions' option for query execution" ON)
|
||||
# Broken in macos. TODO: update clang, re-test, enable on Apple
|
||||
if (ENABLE_EMBEDDED_COMPILER AND NOT SPLIT_SHARED_LIBRARIES AND ARCH_AMD64 AND NOT (SANITIZE STREQUAL "undefined"))
|
||||
option (USE_INTERNAL_LLVM_LIBRARY "Use bundled or system LLVM library." ${NOT_UNBUNDLED})
|
||||
endif()
|
||||
|
||||
if (NOT ENABLE_EMBEDDED_COMPILER)
|
||||
if(USE_INTERNAL_LLVM_LIBRARY)
|
||||
message (${RECONFIGURE_MESSAGE_LEVEL} "Cannot use internal LLVM library with ENABLE_EMBEDDED_COMPILER=OFF")
|
||||
endif()
|
||||
set (USE_EMBEDDED_COMPILER 0)
|
||||
return()
|
||||
endif()
|
||||
|
||||
if (NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/llvm/llvm/CMakeLists.txt")
|
||||
if (USE_INTERNAL_LLVM_LIBRARY)
|
||||
message (WARNING "submodule contrib/llvm is missing. to fix try run: \n git submodule update --init --recursive")
|
||||
message (${RECONFIGURE_MESSAGE_LEVEL} "Can't fidd internal LLVM library")
|
||||
endif()
|
||||
set (MISSING_INTERNAL_LLVM_LIBRARY 1)
|
||||
message (${RECONFIGURE_MESSAGE_LEVEL} "submodule /contrib/llvm is missing. to fix try run: \n git submodule update --init --recursive")
|
||||
endif ()
|
||||
|
||||
if (NOT USE_INTERNAL_LLVM_LIBRARY)
|
||||
set (LLVM_PATHS "/usr/local/lib/llvm" "/usr/lib/llvm")
|
||||
set (USE_EMBEDDED_COMPILER 1)
|
||||
|
||||
foreach(llvm_v 11.1 11)
|
||||
if (NOT LLVM_FOUND)
|
||||
find_package (LLVM ${llvm_v} CONFIG PATHS ${LLVM_PATHS})
|
||||
endif ()
|
||||
endforeach ()
|
||||
set (LLVM_FOUND 1)
|
||||
set (LLVM_VERSION "12.0.0bundled")
|
||||
set (LLVM_INCLUDE_DIRS
|
||||
"${ClickHouse_SOURCE_DIR}/contrib/llvm/llvm/include"
|
||||
"${ClickHouse_BINARY_DIR}/contrib/llvm/llvm/include"
|
||||
)
|
||||
set (LLVM_LIBRARY_DIRS "${ClickHouse_BINARY_DIR}/contrib/llvm/llvm")
|
||||
|
||||
if (LLVM_FOUND)
|
||||
# Remove dynamically-linked zlib and libedit from LLVM's dependencies:
|
||||
set_target_properties(LLVMSupport PROPERTIES INTERFACE_LINK_LIBRARIES "-lpthread;LLVMDemangle;${ZLIB_LIBRARIES}")
|
||||
set_target_properties(LLVMLineEditor PROPERTIES INTERFACE_LINK_LIBRARIES "LLVMSupport")
|
||||
|
||||
option(LLVM_HAS_RTTI "Enable if LLVM was build with RTTI enabled" ON)
|
||||
set (USE_EMBEDDED_COMPILER 1)
|
||||
else()
|
||||
message (${RECONFIGURE_MESSAGE_LEVEL} "Can't find system LLVM")
|
||||
set (USE_EMBEDDED_COMPILER 0)
|
||||
endif()
|
||||
|
||||
if (LLVM_FOUND AND OS_LINUX AND USE_LIBCXX AND NOT FORCE_LLVM_WITH_LIBCXX)
|
||||
message(WARNING "Option USE_INTERNAL_LLVM_LIBRARY is not set but the LLVM library from OS packages "
|
||||
"in Linux is incompatible with libc++ ABI. LLVM Will be disabled. Force: -DFORCE_LLVM_WITH_LIBCXX=ON")
|
||||
message (${RECONFIGURE_MESSAGE_LEVEL} "Unsupported LLVM configuration, cannot enable LLVM")
|
||||
set (LLVM_FOUND 0)
|
||||
set (USE_EMBEDDED_COMPILER 0)
|
||||
endif ()
|
||||
endif()
|
||||
|
||||
if(NOT LLVM_FOUND AND NOT MISSING_INTERNAL_LLVM_LIBRARY)
|
||||
if (CMAKE_CURRENT_SOURCE_DIR STREQUAL CMAKE_CURRENT_BINARY_DIR)
|
||||
message(WARNING "Option ENABLE_EMBEDDED_COMPILER is set but internal LLVM library cannot build if build directory is the same as source directory.")
|
||||
set (LLVM_FOUND 0)
|
||||
set (USE_EMBEDDED_COMPILER 0)
|
||||
elseif (SPLIT_SHARED_LIBRARIES)
|
||||
# llvm-tablegen cannot find shared libraries that we build. Probably can be easily fixed.
|
||||
message(WARNING "Option USE_INTERNAL_LLVM_LIBRARY is not compatible with SPLIT_SHARED_LIBRARIES. Build of LLVM will be disabled.")
|
||||
set (LLVM_FOUND 0)
|
||||
set (USE_EMBEDDED_COMPILER 0)
|
||||
elseif (NOT ARCH_AMD64)
|
||||
# It's not supported yet, but you can help.
|
||||
message(WARNING "Option USE_INTERNAL_LLVM_LIBRARY is only available for x86_64. Build of LLVM will be disabled.")
|
||||
set (LLVM_FOUND 0)
|
||||
set (USE_EMBEDDED_COMPILER 0)
|
||||
elseif (SANITIZE STREQUAL "undefined")
|
||||
# llvm-tblgen, that is used during LLVM build, doesn't work with UBSan.
|
||||
message(WARNING "Option USE_INTERNAL_LLVM_LIBRARY does not work with UBSan, because 'llvm-tblgen' tool from LLVM has undefined behaviour. Build of LLVM will be disabled.")
|
||||
set (LLVM_FOUND 0)
|
||||
set (USE_EMBEDDED_COMPILER 0)
|
||||
else ()
|
||||
set (USE_INTERNAL_LLVM_LIBRARY ON)
|
||||
set (LLVM_FOUND 1)
|
||||
set (USE_EMBEDDED_COMPILER 1)
|
||||
set (LLVM_VERSION "9.0.0bundled")
|
||||
set (LLVM_INCLUDE_DIRS
|
||||
"${ClickHouse_SOURCE_DIR}/contrib/llvm/llvm/include"
|
||||
"${ClickHouse_BINARY_DIR}/contrib/llvm/llvm/include"
|
||||
)
|
||||
set (LLVM_LIBRARY_DIRS "${ClickHouse_BINARY_DIR}/contrib/llvm/llvm")
|
||||
endif()
|
||||
endif()
|
||||
|
||||
if (LLVM_FOUND)
|
||||
message(STATUS "LLVM include Directory: ${LLVM_INCLUDE_DIRS}")
|
||||
message(STATUS "LLVM library Directory: ${LLVM_LIBRARY_DIRS}")
|
||||
message(STATUS "LLVM C++ compiler flags: ${LLVM_CXXFLAGS}")
|
||||
else()
|
||||
message (${RECONFIGURE_MESSAGE_LEVEL} "Can't enable LLVM")
|
||||
endif()
|
||||
message(STATUS "LLVM include Directory: ${LLVM_INCLUDE_DIRS}")
|
||||
message(STATUS "LLVM library Directory: ${LLVM_LIBRARY_DIRS}")
|
||||
message(STATUS "LLVM C++ compiler flags: ${LLVM_CXXFLAGS}")
|
||||
|
||||
# This list was generated by listing all LLVM libraries, compiling the binary and removing all libraries while it still compiles.
|
||||
set (REQUIRED_LLVM_LIBRARIES
|
||||
LLVMOrcJIT
|
||||
LLVMExecutionEngine
|
||||
LLVMRuntimeDyld
|
||||
LLVMX86CodeGen
|
||||
|
@ -1,17 +0,0 @@
|
||||
if (ENABLE_EMBEDDED_COMPILER AND NOT USE_INTERNAL_LLVM_LIBRARY AND USE_STATIC_LIBRARIES)
|
||||
find_library (TERMCAP_LIBRARY tinfo)
|
||||
if (NOT TERMCAP_LIBRARY)
|
||||
find_library (TERMCAP_LIBRARY ncurses)
|
||||
endif()
|
||||
if (NOT TERMCAP_LIBRARY)
|
||||
find_library (TERMCAP_LIBRARY termcap)
|
||||
endif()
|
||||
|
||||
if (NOT TERMCAP_LIBRARY)
|
||||
message (FATAL_ERROR "Statically Linking external LLVM requires termcap")
|
||||
endif()
|
||||
|
||||
target_link_libraries(LLVMSupport INTERFACE ${TERMCAP_LIBRARY})
|
||||
|
||||
message (STATUS "Using termcap: ${TERMCAP_LIBRARY}")
|
||||
endif()
|
9
cmake/find/yaml-cpp.cmake
Normal file
9
cmake/find/yaml-cpp.cmake
Normal file
@ -0,0 +1,9 @@
|
||||
option(USE_YAML_CPP "Enable yaml-cpp" ${ENABLE_LIBRARIES})
|
||||
|
||||
if (NOT USE_YAML_CPP)
|
||||
return()
|
||||
endif()
|
||||
|
||||
if (NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/yaml-cpp")
|
||||
message (ERROR "submodule contrib/yaml-cpp is missing. to fix try run: \n git submodule update --init --recursive")
|
||||
endif()
|
9
contrib/CMakeLists.txt
vendored
9
contrib/CMakeLists.txt
vendored
@ -50,6 +50,10 @@ add_subdirectory (replxx-cmake)
|
||||
add_subdirectory (unixodbc-cmake)
|
||||
add_subdirectory (nanodbc-cmake)
|
||||
|
||||
if (USE_YAML_CPP)
|
||||
add_subdirectory (yaml-cpp-cmake)
|
||||
endif()
|
||||
|
||||
if (USE_INTERNAL_XZ_LIBRARY)
|
||||
add_subdirectory (xz)
|
||||
endif()
|
||||
@ -205,11 +209,12 @@ elseif(GTEST_SRC_DIR)
|
||||
target_compile_definitions(gtest INTERFACE GTEST_HAS_POSIX_RE=0)
|
||||
endif()
|
||||
|
||||
if (USE_EMBEDDED_COMPILER AND USE_INTERNAL_LLVM_LIBRARY)
|
||||
if (USE_EMBEDDED_COMPILER)
|
||||
# ld: unknown option: --color-diagnostics
|
||||
if (APPLE)
|
||||
set (LINKER_SUPPORTS_COLOR_DIAGNOSTICS 0 CACHE INTERNAL "")
|
||||
endif ()
|
||||
|
||||
set (LLVM_ENABLE_EH 1 CACHE INTERNAL "")
|
||||
set (LLVM_ENABLE_RTTI 1 CACHE INTERNAL "")
|
||||
set (LLVM_ENABLE_PIC 0 CACHE INTERNAL "")
|
||||
@ -224,8 +229,6 @@ if (USE_EMBEDDED_COMPILER AND USE_INTERNAL_LLVM_LIBRARY)
|
||||
|
||||
set (CMAKE_CXX_STANDARD ${CMAKE_CXX_STANDARD_bak})
|
||||
unset (CMAKE_CXX_STANDARD_bak)
|
||||
|
||||
target_include_directories(LLVMSupport SYSTEM BEFORE PRIVATE ${ZLIB_INCLUDE_DIR})
|
||||
endif ()
|
||||
|
||||
if (USE_INTERNAL_LIBGSASL_LIBRARY)
|
||||
|
2
contrib/avro
vendored
2
contrib/avro
vendored
@ -1 +1 @@
|
||||
Subproject commit 92caca2d42fc9a97e34e95f963593539d32ed331
|
||||
Subproject commit 1ee16d8c5a7808acff5cf0475f771195d9aa3faa
|
2
contrib/grpc
vendored
2
contrib/grpc
vendored
@ -1 +1 @@
|
||||
Subproject commit 5b79aae85c515e0df4abfb7b1e07975fdc7cecc1
|
||||
Subproject commit 60c986e15cae70aade721d26badabab1f822fdd6
|
2
contrib/libunwind
vendored
2
contrib/libunwind
vendored
@ -1 +1 @@
|
||||
Subproject commit 8fe25d7dc70f2a4ea38c3e5a33fa9d4199b67a5a
|
||||
Subproject commit a491c27b33109a842d577c0f7ac5f5f218859181
|
2
contrib/llvm
vendored
2
contrib/llvm
vendored
@ -1 +1 @@
|
||||
Subproject commit cfaf365cf96918999d09d976ec736b4518cf5d02
|
||||
Subproject commit e5751459412bce1391fb7a2e9bbc01e131bf72f1
|
2
contrib/re2
vendored
2
contrib/re2
vendored
@ -1 +1 @@
|
||||
Subproject commit 7cf8b88e8f70f97fd4926b56aa87e7f53b2717e0
|
||||
Subproject commit 13ebb377c6ad763ca61d12dd6f88b1126bd0b911
|
@ -1,7 +1,7 @@
|
||||
file (READ ${SOURCE_FILENAME} CONTENT)
|
||||
string (REGEX REPLACE "using re2::RE2;" "" CONTENT "${CONTENT}")
|
||||
string (REGEX REPLACE "using re2::LazyRE2;" "" CONTENT "${CONTENT}")
|
||||
string (REGEX REPLACE "namespace re2" "namespace re2_st" CONTENT "${CONTENT}")
|
||||
string (REGEX REPLACE "namespace re2 {" "namespace re2_st {" CONTENT "${CONTENT}")
|
||||
string (REGEX REPLACE "re2::" "re2_st::" CONTENT "${CONTENT}")
|
||||
string (REGEX REPLACE "\"re2/" "\"re2_st/" CONTENT "${CONTENT}")
|
||||
string (REGEX REPLACE "(.\\*?_H)" "\\1_ST" CONTENT "${CONTENT}")
|
||||
|
1
contrib/yaml-cpp
vendored
Submodule
1
contrib/yaml-cpp
vendored
Submodule
@ -0,0 +1 @@
|
||||
Subproject commit 0c86adac6d117ee2b4afcedb8ade19036ca0327d
|
39
contrib/yaml-cpp-cmake/CMakeLists.txt
Normal file
39
contrib/yaml-cpp-cmake/CMakeLists.txt
Normal file
@ -0,0 +1,39 @@
|
||||
set (LIBRARY_DIR ${ClickHouse_SOURCE_DIR}/contrib/yaml-cpp)
|
||||
|
||||
set (SRCS
|
||||
${LIBRARY_DIR}/src/binary.cpp
|
||||
${LIBRARY_DIR}/src/emitterutils.cpp
|
||||
${LIBRARY_DIR}/src/null.cpp
|
||||
${LIBRARY_DIR}/src/scantoken.cpp
|
||||
${LIBRARY_DIR}/src/convert.cpp
|
||||
${LIBRARY_DIR}/src/exceptions.cpp
|
||||
${LIBRARY_DIR}/src/ostream_wrapper.cpp
|
||||
${LIBRARY_DIR}/src/simplekey.cpp
|
||||
${LIBRARY_DIR}/src/depthguard.cpp
|
||||
${LIBRARY_DIR}/src/exp.cpp
|
||||
${LIBRARY_DIR}/src/parse.cpp
|
||||
${LIBRARY_DIR}/src/singledocparser.cpp
|
||||
${LIBRARY_DIR}/src/directives.cpp
|
||||
${LIBRARY_DIR}/src/memory.cpp
|
||||
${LIBRARY_DIR}/src/parser.cpp
|
||||
${LIBRARY_DIR}/src/stream.cpp
|
||||
${LIBRARY_DIR}/src/emit.cpp
|
||||
${LIBRARY_DIR}/src/nodebuilder.cpp
|
||||
${LIBRARY_DIR}/src/regex_yaml.cpp
|
||||
${LIBRARY_DIR}/src/tag.cpp
|
||||
${LIBRARY_DIR}/src/emitfromevents.cpp
|
||||
${LIBRARY_DIR}/src/node.cpp
|
||||
${LIBRARY_DIR}/src/scanner.cpp
|
||||
${LIBRARY_DIR}/src/emitter.cpp
|
||||
${LIBRARY_DIR}/src/node_data.cpp
|
||||
${LIBRARY_DIR}/src/scanscalar.cpp
|
||||
${LIBRARY_DIR}/src/emitterstate.cpp
|
||||
${LIBRARY_DIR}/src/nodeevents.cpp
|
||||
${LIBRARY_DIR}/src/scantag.cpp
|
||||
)
|
||||
|
||||
add_library (yaml-cpp ${SRCS})
|
||||
|
||||
|
||||
target_include_directories(yaml-cpp PRIVATE ${LIBRARY_DIR}/include/yaml-cpp)
|
||||
target_include_directories(yaml-cpp SYSTEM BEFORE PUBLIC ${LIBRARY_DIR}/include)
|
4
debian/changelog
vendored
4
debian/changelog
vendored
@ -1,5 +1,5 @@
|
||||
clickhouse (21.6.1.1) unstable; urgency=low
|
||||
clickhouse (21.7.1.1) unstable; urgency=low
|
||||
|
||||
* Modified source code
|
||||
|
||||
-- clickhouse-release <clickhouse-release@yandex-team.ru> Tue, 20 Apr 2021 01:48:16 +0300
|
||||
-- clickhouse-release <clickhouse-release@yandex-team.ru> Thu, 20 May 2021 22:23:29 +0300
|
||||
|
@ -1,7 +1,7 @@
|
||||
FROM ubuntu:18.04
|
||||
|
||||
ARG repository="deb https://repo.clickhouse.tech/deb/stable/ main/"
|
||||
ARG version=21.6.1.*
|
||||
ARG version=21.7.1.*
|
||||
|
||||
RUN apt-get update \
|
||||
&& apt-get install --yes --no-install-recommends \
|
||||
|
@ -154,6 +154,10 @@ def parse_env_variables(build_type, compiler, sanitizer, package_type, image_typ
|
||||
|
||||
if clang_tidy:
|
||||
cmake_flags.append('-DENABLE_CLANG_TIDY=1')
|
||||
cmake_flags.append('-DENABLE_UTILS=1')
|
||||
cmake_flags.append('-DUSE_GTEST=1')
|
||||
cmake_flags.append('-DENABLE_TESTS=1')
|
||||
cmake_flags.append('-DENABLE_EXAMPLES=1')
|
||||
# Don't stop on first error to find more clang-tidy errors in one run.
|
||||
result.append('NINJA_FLAGS=-k0')
|
||||
|
||||
|
@ -1,7 +1,7 @@
|
||||
FROM ubuntu:20.04
|
||||
|
||||
ARG repository="deb https://repo.clickhouse.tech/deb/stable/ main/"
|
||||
ARG version=21.6.1.*
|
||||
ARG version=21.7.1.*
|
||||
ARG gosu_ver=1.10
|
||||
|
||||
# set non-empty deb_location_url url to create a docker image
|
||||
|
@ -1,7 +1,7 @@
|
||||
FROM ubuntu:18.04
|
||||
|
||||
ARG repository="deb https://repo.clickhouse.tech/deb/stable/ main/"
|
||||
ARG version=21.6.1.*
|
||||
ARG version=21.7.1.*
|
||||
|
||||
RUN apt-get update && \
|
||||
apt-get install -y apt-transport-https dirmngr && \
|
||||
|
@ -73,7 +73,7 @@ function start_server
|
||||
--path "$FASTTEST_DATA"
|
||||
--user_files_path "$FASTTEST_DATA/user_files"
|
||||
--top_level_domains_path "$FASTTEST_DATA/top_level_domains"
|
||||
--keeper_server.log_storage_path "$FASTTEST_DATA/coordination"
|
||||
--keeper_server.storage_path "$FASTTEST_DATA/coordination"
|
||||
)
|
||||
clickhouse-server "${opts[@]}" &>> "$FASTTEST_OUTPUT/server.log" &
|
||||
server_pid=$!
|
||||
@ -374,37 +374,17 @@ function run_tests
|
||||
01801_s3_cluster
|
||||
|
||||
# Depends on LLVM JIT
|
||||
01072_nullable_jit
|
||||
01852_jit_if
|
||||
01865_jit_comparison_constant_result
|
||||
01871_merge_tree_compile_expressions
|
||||
)
|
||||
|
||||
(time clickhouse-test --hung-check -j 8 --order=random --use-skip-list --no-long --testname --shard --zookeeper --skip "${TESTS_TO_SKIP[@]}" -- "$FASTTEST_FOCUS" 2>&1 ||:) | ts '%Y-%m-%d %H:%M:%S' | tee "$FASTTEST_OUTPUT/test_log.txt"
|
||||
|
||||
# substr is to remove semicolon after test name
|
||||
readarray -t FAILED_TESTS < <(awk '/\[ FAIL|TIMEOUT|ERROR \]/ { print substr($3, 1, length($3)-1) }' "$FASTTEST_OUTPUT/test_log.txt" | tee "$FASTTEST_OUTPUT/failed-parallel-tests.txt")
|
||||
|
||||
# We will rerun sequentially any tests that have failed during parallel run.
|
||||
# They might have failed because there was some interference from other tests
|
||||
# running concurrently. If they fail even in seqential mode, we will report them.
|
||||
# FIXME All tests that require exclusive access to the server must be
|
||||
# explicitly marked as `sequential`, and `clickhouse-test` must detect them and
|
||||
# run them in a separate group after all other tests. This is faster and also
|
||||
# explicit instead of guessing.
|
||||
if [[ -n "${FAILED_TESTS[*]}" ]]
|
||||
then
|
||||
stop_server ||:
|
||||
|
||||
# Clean the data so that there is no interference from the previous test run.
|
||||
rm -rf "$FASTTEST_DATA"/{{meta,}data,user_files,coordination} ||:
|
||||
|
||||
start_server
|
||||
|
||||
echo "Going to run again: ${FAILED_TESTS[*]}"
|
||||
|
||||
clickhouse-test --hung-check --order=random --no-long --testname --shard --zookeeper "${FAILED_TESTS[@]}" 2>&1 | ts '%Y-%m-%d %H:%M:%S' | tee -a "$FASTTEST_OUTPUT/test_log.txt"
|
||||
else
|
||||
echo "No failed tests"
|
||||
fi
|
||||
time clickhouse-test --hung-check -j 8 --order=random --use-skip-list \
|
||||
--no-long --testname --shard --zookeeper --skip "${TESTS_TO_SKIP[@]}" \
|
||||
-- "$FASTTEST_FOCUS" 2>&1 \
|
||||
| ts '%Y-%m-%d %H:%M:%S' \
|
||||
| tee "$FASTTEST_OUTPUT/test_log.txt"
|
||||
}
|
||||
|
||||
case "$stage" in
|
||||
|
@ -56,17 +56,19 @@ function watchdog
|
||||
sleep 3600
|
||||
|
||||
echo "Fuzzing run has timed out"
|
||||
killall clickhouse-client ||:
|
||||
for _ in {1..10}
|
||||
do
|
||||
if ! pgrep -f clickhouse-client
|
||||
# Only kill by pid the particular client that runs the fuzzing, or else
|
||||
# we can kill some clickhouse-client processes this script starts later,
|
||||
# e.g. for checking server liveness.
|
||||
if ! kill $fuzzer_pid
|
||||
then
|
||||
break
|
||||
fi
|
||||
sleep 1
|
||||
done
|
||||
|
||||
killall -9 clickhouse-client ||:
|
||||
kill -9 -- $fuzzer_pid ||:
|
||||
}
|
||||
|
||||
function filter_exists
|
||||
@ -85,7 +87,7 @@ function fuzz
|
||||
{
|
||||
# Obtain the list of newly added tests. They will be fuzzed in more extreme way than other tests.
|
||||
# Don't overwrite the NEW_TESTS_OPT so that it can be set from the environment.
|
||||
NEW_TESTS="$(grep -P 'tests/queries/0_stateless/.*\.sql' ci-changed-files.txt | sed -r -e 's!^!ch/!' | sort -R)"
|
||||
NEW_TESTS="$(sed -n 's!\(^tests/queries/0_stateless/.*\.sql\)$!ch/\1!p' ci-changed-files.txt | sort -R)"
|
||||
# ci-changed-files.txt contains also files that has been deleted/renamed, filter them out.
|
||||
NEW_TESTS="$(filter_exists $NEW_TESTS)"
|
||||
if [[ -n "$NEW_TESTS" ]]
|
||||
@ -115,17 +117,49 @@ continue
|
||||
|
||||
gdb -batch -command script.gdb -p "$(pidof clickhouse-server)" &
|
||||
|
||||
fuzzer_exit_code=0
|
||||
# SC2012: Use find instead of ls to better handle non-alphanumeric filenames. They are all alphanumeric.
|
||||
# SC2046: Quote this to prevent word splitting. Actually I need word splitting.
|
||||
# shellcheck disable=SC2012,SC2046
|
||||
clickhouse-client --query-fuzzer-runs=1000 --queries-file $(ls -1 ch/tests/queries/0_stateless/*.sql | sort -R) $NEW_TESTS_OPT \
|
||||
> >(tail -n 100000 > fuzzer.log) \
|
||||
2>&1 \
|
||||
|| fuzzer_exit_code=$?
|
||||
2>&1 &
|
||||
fuzzer_pid=$!
|
||||
echo "Fuzzer pid is $fuzzer_pid"
|
||||
|
||||
# Start a watchdog that should kill the fuzzer on timeout.
|
||||
# The shell won't kill the child sleep when we kill it, so we have to put it
|
||||
# into a separate process group so that we can kill them all.
|
||||
set -m
|
||||
watchdog &
|
||||
watchdog_pid=$!
|
||||
set +m
|
||||
# Check that the watchdog has started.
|
||||
kill -0 $watchdog_pid
|
||||
|
||||
# Wait for the fuzzer to complete.
|
||||
# Note that the 'wait || ...' thing is required so that the script doesn't
|
||||
# exit because of 'set -e' when 'wait' returns nonzero code.
|
||||
fuzzer_exit_code=0
|
||||
wait "$fuzzer_pid" || fuzzer_exit_code=$?
|
||||
echo "Fuzzer exit code is $fuzzer_exit_code"
|
||||
|
||||
kill -- -$watchdog_pid ||:
|
||||
|
||||
# If the server dies, most often the fuzzer returns code 210: connetion
|
||||
# refused, and sometimes also code 32: attempt to read after eof. For
|
||||
# simplicity, check again whether the server is accepting connections, using
|
||||
# clickhouse-client. We don't check for existence of server process, because
|
||||
# the process is still present while the server is terminating and not
|
||||
# accepting the connections anymore.
|
||||
if clickhouse-client --query "select 1 format Null"
|
||||
then
|
||||
server_died=0
|
||||
else
|
||||
echo "Server live check returns $?"
|
||||
server_died=1
|
||||
fi
|
||||
|
||||
# Stop the server.
|
||||
clickhouse-client --query "select elapsed, query from system.processes" ||:
|
||||
killall clickhouse-server ||:
|
||||
for _ in {1..10}
|
||||
@ -137,6 +171,41 @@ continue
|
||||
sleep 1
|
||||
done
|
||||
killall -9 clickhouse-server ||:
|
||||
|
||||
# Debug.
|
||||
date
|
||||
sleep 10
|
||||
jobs
|
||||
pstree -aspgT
|
||||
|
||||
# Make files with status and description we'll show for this check on Github.
|
||||
task_exit_code=$fuzzer_exit_code
|
||||
if [ "$server_died" == 1 ]
|
||||
then
|
||||
# The server has died.
|
||||
task_exit_code=210
|
||||
echo "failure" > status.txt
|
||||
if ! grep -ao "Received signal.*\|Logical error.*\|Assertion.*failed\|Failed assertion.*\|.*runtime error: .*\|.*is located.*\|SUMMARY: AddressSanitizer:.*\|SUMMARY: MemorySanitizer:.*\|SUMMARY: ThreadSanitizer:.*\|.*_LIBCPP_ASSERT.*" server.log > description.txt
|
||||
then
|
||||
echo "Lost connection to server. See the logs." > description.txt
|
||||
fi
|
||||
elif [ "$fuzzer_exit_code" == "143" ] || [ "$fuzzer_exit_code" == "0" ]
|
||||
then
|
||||
# Variants of a normal run:
|
||||
# 0 -- fuzzing ended earlier than timeout.
|
||||
# 143 -- SIGTERM -- the fuzzer was killed by timeout.
|
||||
task_exit_code=0
|
||||
echo "success" > status.txt
|
||||
echo "OK" > description.txt
|
||||
else
|
||||
# The server was alive, but the fuzzer returned some error. Probably this
|
||||
# is a problem in the fuzzer itself. Don't grep the server log in this
|
||||
# case, because we will find a message about normal server termination
|
||||
# (Received signal 15), which is confusing.
|
||||
task_exit_code=$fuzzer_exit_code
|
||||
echo "failure" > status.txt
|
||||
echo "Fuzzer failed ($fuzzer_exit_code). See the logs." > description.txt
|
||||
fi
|
||||
}
|
||||
|
||||
case "$stage" in
|
||||
@ -165,50 +234,7 @@ case "$stage" in
|
||||
time configure
|
||||
;&
|
||||
"fuzz")
|
||||
# Start a watchdog that should kill the fuzzer on timeout.
|
||||
# The shell won't kill the child sleep when we kill it, so we have to put it
|
||||
# into a separate process group so that we can kill them all.
|
||||
set -m
|
||||
watchdog &
|
||||
watchdog_pid=$!
|
||||
set +m
|
||||
# Check that the watchdog has started
|
||||
kill -0 $watchdog_pid
|
||||
|
||||
fuzzer_exit_code=0
|
||||
time fuzz || fuzzer_exit_code=$?
|
||||
kill -- -$watchdog_pid ||:
|
||||
|
||||
# Debug
|
||||
date
|
||||
sleep 10
|
||||
jobs
|
||||
pstree -aspgT
|
||||
|
||||
# Make files with status and description we'll show for this check on Github
|
||||
task_exit_code=$fuzzer_exit_code
|
||||
if [ "$fuzzer_exit_code" == 143 ]
|
||||
then
|
||||
# SIGTERM -- the fuzzer was killed by timeout, which means a normal run.
|
||||
echo "success" > status.txt
|
||||
echo "OK" > description.txt
|
||||
task_exit_code=0
|
||||
elif [ "$fuzzer_exit_code" == 210 ]
|
||||
then
|
||||
# Lost connection to the server. This probably means that the server died
|
||||
# with abort.
|
||||
echo "failure" > status.txt
|
||||
if ! grep -ao "Received signal.*\|Logical error.*\|Assertion.*failed\|Failed assertion.*\|.*runtime error: .*\|.*is located.*\|SUMMARY: AddressSanitizer:.*\|SUMMARY: MemorySanitizer:.*\|SUMMARY: ThreadSanitizer:.*\|.*_LIBCPP_ASSERT.*" server.log > description.txt
|
||||
then
|
||||
echo "Lost connection to server. See the logs." > description.txt
|
||||
fi
|
||||
else
|
||||
# Something different -- maybe the fuzzer itself died? Don't grep the
|
||||
# server log in this case, because we will find a message about normal
|
||||
# server termination (Received signal 15), which is confusing.
|
||||
echo "failure" > status.txt
|
||||
echo "Fuzzer failed ($fuzzer_exit_code). See the logs." > description.txt
|
||||
fi
|
||||
time fuzz
|
||||
;&
|
||||
"report")
|
||||
cat > report.html <<EOF ||:
|
||||
|
@ -80,7 +80,8 @@ RUN python3 -m pip install \
|
||||
redis \
|
||||
tzlocal \
|
||||
urllib3 \
|
||||
requests-kerberos
|
||||
requests-kerberos \
|
||||
pyhdfs
|
||||
|
||||
COPY modprobe.sh /usr/local/bin/modprobe
|
||||
COPY dockerd-entrypoint.sh /usr/local/bin/
|
||||
|
@ -0,0 +1,92 @@
|
||||
version: '2.3'
|
||||
services:
|
||||
zoo1:
|
||||
image: ${image:-yandex/clickhouse-integration-test}
|
||||
restart: always
|
||||
user: ${user:-}
|
||||
volumes:
|
||||
- type: bind
|
||||
source: ${keeper_binary:-}
|
||||
target: /usr/bin/clickhouse
|
||||
- type: bind
|
||||
source: ${keeper_config_dir1:-}
|
||||
target: /etc/clickhouse-keeper
|
||||
- type: bind
|
||||
source: ${keeper_logs_dir1:-}
|
||||
target: /var/log/clickhouse-keeper
|
||||
- type: ${keeper_fs:-tmpfs}
|
||||
source: ${keeper_db_dir1:-}
|
||||
target: /var/lib/clickhouse-keeper
|
||||
entrypoint: "clickhouse keeper --config=/etc/clickhouse-keeper/keeper_config1.xml --log-file=/var/log/clickhouse-keeper/clickhouse-keeper.log --errorlog-file=/var/log/clickhouse-keeper/clickhouse-keeper.err.log"
|
||||
cap_add:
|
||||
- SYS_PTRACE
|
||||
- NET_ADMIN
|
||||
- IPC_LOCK
|
||||
- SYS_NICE
|
||||
security_opt:
|
||||
- label:disable
|
||||
dns_opt:
|
||||
- attempts:2
|
||||
- timeout:1
|
||||
- inet6
|
||||
- rotate
|
||||
zoo2:
|
||||
image: ${image:-yandex/clickhouse-integration-test}
|
||||
restart: always
|
||||
user: ${user:-}
|
||||
volumes:
|
||||
- type: bind
|
||||
source: ${keeper_binary:-}
|
||||
target: /usr/bin/clickhouse
|
||||
- type: bind
|
||||
source: ${keeper_config_dir2:-}
|
||||
target: /etc/clickhouse-keeper
|
||||
- type: bind
|
||||
source: ${keeper_logs_dir2:-}
|
||||
target: /var/log/clickhouse-keeper
|
||||
- type: ${keeper_fs:-tmpfs}
|
||||
source: ${keeper_db_dir2:-}
|
||||
target: /var/lib/clickhouse-keeper
|
||||
entrypoint: "clickhouse keeper --config=/etc/clickhouse-keeper/keeper_config2.xml --log-file=/var/log/clickhouse-keeper/clickhouse-keeper.log --errorlog-file=/var/log/clickhouse-keeper/clickhouse-keeper.err.log"
|
||||
cap_add:
|
||||
- SYS_PTRACE
|
||||
- NET_ADMIN
|
||||
- IPC_LOCK
|
||||
- SYS_NICE
|
||||
security_opt:
|
||||
- label:disable
|
||||
dns_opt:
|
||||
- attempts:2
|
||||
- timeout:1
|
||||
- inet6
|
||||
- rotate
|
||||
zoo3:
|
||||
image: ${image:-yandex/clickhouse-integration-test}
|
||||
restart: always
|
||||
user: ${user:-}
|
||||
volumes:
|
||||
- type: bind
|
||||
source: ${keeper_binary:-}
|
||||
target: /usr/bin/clickhouse
|
||||
- type: bind
|
||||
source: ${keeper_config_dir3:-}
|
||||
target: /etc/clickhouse-keeper
|
||||
- type: bind
|
||||
source: ${keeper_logs_dir3:-}
|
||||
target: /var/log/clickhouse-keeper
|
||||
- type: ${keeper_fs:-tmpfs}
|
||||
source: ${keeper_db_dir3:-}
|
||||
target: /var/lib/clickhouse-keeper
|
||||
entrypoint: "clickhouse keeper --config=/etc/clickhouse-keeper/keeper_config3.xml --log-file=/var/log/clickhouse-keeper/clickhouse-keeper.log --errorlog-file=/var/log/clickhouse-keeper/clickhouse-keeper.err.log"
|
||||
cap_add:
|
||||
- SYS_PTRACE
|
||||
- NET_ADMIN
|
||||
- IPC_LOCK
|
||||
- SYS_NICE
|
||||
security_opt:
|
||||
- label:disable
|
||||
dns_opt:
|
||||
- attempts:2
|
||||
- timeout:1
|
||||
- inet6
|
||||
- rotate
|
@ -27,6 +27,10 @@ while true; do
|
||||
done
|
||||
set -e
|
||||
|
||||
# cleanup for retry run if volume is not recreated
|
||||
docker kill "$(docker ps -aq)" || true
|
||||
docker rm "$(docker ps -aq)" || true
|
||||
|
||||
echo "Start tests"
|
||||
export CLICKHOUSE_TESTS_SERVER_BIN_PATH=/clickhouse
|
||||
export CLICKHOUSE_TESTS_CLIENT_BIN_PATH=/clickhouse
|
||||
|
@ -552,6 +552,63 @@ create table query_metric_stats_denorm engine File(TSVWithNamesAndTypes,
|
||||
order by test, query_index, metric_name
|
||||
;
|
||||
" 2> >(tee -a analyze/errors.log 1>&2)
|
||||
|
||||
# Fetch historical query variability thresholds from the CI database
|
||||
clickhouse-local --query "
|
||||
left join file('analyze/report-thresholds.tsv', TSV,
|
||||
'test text, report_threshold float') thresholds
|
||||
on query_metric_stats.test = thresholds.test
|
||||
"
|
||||
|
||||
if [ -v CHPC_DATABASE_URL ]
|
||||
then
|
||||
set +x # Don't show password in the log
|
||||
client=(clickhouse-client
|
||||
# Surprisingly, clickhouse-client doesn't understand --host 127.0.0.1:9000
|
||||
# so I have to extract host and port with clickhouse-local. I tried to use
|
||||
# Poco URI parser to support this in the client, but it's broken and can't
|
||||
# parse host:port.
|
||||
$(clickhouse-local --query "with '${CHPC_DATABASE_URL}' as url select '--host ' || domain(url) || ' --port ' || toString(port(url)) format TSV")
|
||||
--secure
|
||||
--user "${CHPC_DATABASE_USER}"
|
||||
--password "${CHPC_DATABASE_PASSWORD}"
|
||||
--config "right/config/client_config.xml"
|
||||
--database perftest
|
||||
--date_time_input_format=best_effort)
|
||||
|
||||
|
||||
# Precision is going to be 1.5 times worse for PRs. How do I know it? I ran this:
|
||||
# SELECT quantilesExact(0., 0.1, 0.5, 0.75, 0.95, 1.)(p / m)
|
||||
# FROM
|
||||
# (
|
||||
# SELECT
|
||||
# quantileIf(0.95)(stat_threshold, pr_number = 0) AS m,
|
||||
# quantileIf(0.95)(stat_threshold, (pr_number != 0) AND (abs(diff) < stat_threshold)) AS p
|
||||
# FROM query_metrics_v2
|
||||
# WHERE (event_date > (today() - toIntervalMonth(1))) AND (metric = 'client_time')
|
||||
# GROUP BY
|
||||
# test,
|
||||
# query_index,
|
||||
# query_display_name
|
||||
# HAVING count(*) > 100
|
||||
# )
|
||||
# The file can be empty if the server is inaccessible, so we can't use TSVWithNamesAndTypes.
|
||||
"${client[@]}" --query "
|
||||
select test, query_index,
|
||||
quantileExact(0.99)(abs(diff)) max_diff,
|
||||
quantileExactIf(0.99)(stat_threshold, abs(diff) < stat_threshold) * 1.5 max_stat_threshold,
|
||||
query_display_name
|
||||
from query_metrics_v2
|
||||
where event_date > now() - interval 1 month
|
||||
and metric = 'client_time'
|
||||
and pr_number = 0
|
||||
group by test, query_index, query_display_name
|
||||
having count(*) > 100
|
||||
" > analyze/historical-thresholds.tsv
|
||||
else
|
||||
touch analyze/historical-thresholds.tsv
|
||||
fi
|
||||
|
||||
}
|
||||
|
||||
# Analyze results
|
||||
@ -596,6 +653,26 @@ create view query_metric_stats as
|
||||
diff float, stat_threshold float')
|
||||
;
|
||||
|
||||
create table report_thresholds engine File(TSVWithNamesAndTypes, 'report/thresholds.tsv')
|
||||
as select
|
||||
query_display_names.test test, query_display_names.query_index query_index,
|
||||
ceil(greatest(0.1, historical_thresholds.max_diff,
|
||||
test_thresholds.report_threshold), 2) changed_threshold,
|
||||
ceil(greatest(0.2, historical_thresholds.max_stat_threshold,
|
||||
test_thresholds.report_threshold + 0.1), 2) unstable_threshold,
|
||||
query_display_names.query_display_name query_display_name
|
||||
from query_display_names
|
||||
left join file('analyze/historical-thresholds.tsv', TSV,
|
||||
'test text, query_index int, max_diff float, max_stat_threshold float,
|
||||
query_display_name text') historical_thresholds
|
||||
on query_display_names.test = historical_thresholds.test
|
||||
and query_display_names.query_index = historical_thresholds.query_index
|
||||
and query_display_names.query_display_name = historical_thresholds.query_display_name
|
||||
left join file('analyze/report-thresholds.tsv', TSV,
|
||||
'test text, report_threshold float') test_thresholds
|
||||
on query_display_names.test = test_thresholds.test
|
||||
;
|
||||
|
||||
-- Main statistics for queries -- query time as reported in query log.
|
||||
create table queries engine File(TSVWithNamesAndTypes, 'report/queries.tsv')
|
||||
as select
|
||||
@ -610,23 +687,23 @@ create table queries engine File(TSVWithNamesAndTypes, 'report/queries.tsv')
|
||||
-- uncaught regressions, because for the default 7 runs we do for PRs,
|
||||
-- the randomization distribution has only 16 values, so the max quantile
|
||||
-- is actually 0.9375.
|
||||
abs(diff) > report_threshold and abs(diff) >= stat_threshold as changed_fail,
|
||||
abs(diff) > report_threshold - 0.05 and abs(diff) >= stat_threshold as changed_show,
|
||||
abs(diff) > changed_threshold and abs(diff) >= stat_threshold as changed_fail,
|
||||
abs(diff) > changed_threshold - 0.05 and abs(diff) >= stat_threshold as changed_show,
|
||||
|
||||
not changed_fail and stat_threshold > report_threshold + 0.10 as unstable_fail,
|
||||
not changed_show and stat_threshold > report_threshold - 0.05 as unstable_show,
|
||||
not changed_fail and stat_threshold > unstable_threshold as unstable_fail,
|
||||
not changed_show and stat_threshold > unstable_threshold - 0.05 as unstable_show,
|
||||
|
||||
left, right, diff, stat_threshold,
|
||||
if(report_threshold > 0, report_threshold, 0.10) as report_threshold,
|
||||
query_metric_stats.test test, query_metric_stats.query_index query_index,
|
||||
query_display_name
|
||||
query_display_names.query_display_name query_display_name
|
||||
from query_metric_stats
|
||||
left join file('analyze/report-thresholds.tsv', TSV,
|
||||
'test text, report_threshold float') thresholds
|
||||
on query_metric_stats.test = thresholds.test
|
||||
left join query_display_names
|
||||
on query_metric_stats.test = query_display_names.test
|
||||
and query_metric_stats.query_index = query_display_names.query_index
|
||||
left join report_thresholds
|
||||
on query_display_names.test = report_thresholds.test
|
||||
and query_display_names.query_index = report_thresholds.query_index
|
||||
and query_display_names.query_display_name = report_thresholds.query_display_name
|
||||
-- 'server_time' is rounded down to ms, which might be bad for very short queries.
|
||||
-- Use 'client_time' instead.
|
||||
where metric_name = 'client_time'
|
||||
@ -889,7 +966,6 @@ create table all_query_metrics_tsv engine File(TSV, 'report/all-query-metrics.ts
|
||||
order by test, query_index;
|
||||
" 2> >(tee -a report/errors.log 1>&2)
|
||||
|
||||
|
||||
# Prepare source data for metrics and flamegraphs for queries that were profiled
|
||||
# by perf.py.
|
||||
for version in {right,left}
|
||||
|
@ -44,7 +44,7 @@ parser.add_argument('--port', nargs='*', default=[9000], help="Space-separated l
|
||||
parser.add_argument('--runs', type=int, default=1, help='Number of query runs per server.')
|
||||
parser.add_argument('--max-queries', type=int, default=None, help='Test no more than this number of queries, chosen at random.')
|
||||
parser.add_argument('--queries-to-run', nargs='*', type=int, default=None, help='Space-separated list of indexes of queries to test.')
|
||||
parser.add_argument('--max-query-seconds', type=int, default=10, help='For how many seconds at most a query is allowed to run. The script finishes with error if this time is exceeded.')
|
||||
parser.add_argument('--max-query-seconds', type=int, default=15, help='For how many seconds at most a query is allowed to run. The script finishes with error if this time is exceeded.')
|
||||
parser.add_argument('--profile-seconds', type=int, default=0, help='For how many seconds to profile a query for which the performance has changed.')
|
||||
parser.add_argument('--long', action='store_true', help='Do not skip the tests tagged as long.')
|
||||
parser.add_argument('--print-queries', action='store_true', help='Print test queries and exit.')
|
||||
@ -273,8 +273,14 @@ for query_index in queries_to_run:
|
||||
prewarm_id = f'{query_prefix}.prewarm0'
|
||||
|
||||
try:
|
||||
# Will also detect too long queries during warmup stage
|
||||
res = c.execute(q, query_id = prewarm_id, settings = {'max_execution_time': args.max_query_seconds})
|
||||
# During the warmup runs, we will also:
|
||||
# * detect queries that are exceedingly long, to fail fast,
|
||||
# * collect profiler traces, which might be helpful for analyzing
|
||||
# test coverage. We disable profiler for normal runs because
|
||||
# it makes the results unstable.
|
||||
res = c.execute(q, query_id = prewarm_id,
|
||||
settings = {'max_execution_time': args.max_query_seconds,
|
||||
'query_profiler_real_time_period_ns': 10000000})
|
||||
except clickhouse_driver.errors.Error as e:
|
||||
# Add query id to the exception to make debugging easier.
|
||||
e.args = (prewarm_id, *e.args)
|
||||
@ -359,10 +365,11 @@ for query_index in queries_to_run:
|
||||
# For very short queries we have a special mode where we run them for at
|
||||
# least some time. The recommended lower bound of run time for "normal"
|
||||
# queries is about 0.1 s, and we run them about 10 times, giving the
|
||||
# time per query per server of about one second. Use this value as a
|
||||
# reference for "short" queries.
|
||||
# time per query per server of about one second. Run "short" queries
|
||||
# for longer time, because they have a high percentage of overhead and
|
||||
# might give less stable results.
|
||||
if is_short[query_index]:
|
||||
if server_seconds >= 2 * len(this_query_connections):
|
||||
if server_seconds >= 8 * len(this_query_connections):
|
||||
break
|
||||
# Also limit the number of runs, so that we don't go crazy processing
|
||||
# the results -- 'eqmed.sql' is really suboptimal.
|
||||
|
@ -446,11 +446,17 @@ if args.report == 'main':
|
||||
attrs[3] = f'style="background: {color_bad}"'
|
||||
else:
|
||||
attrs[3] = ''
|
||||
# Just don't add the slightly unstable queries we don't consider
|
||||
# errors. It's not clear what the user should do with them.
|
||||
continue
|
||||
|
||||
text += tableRow(r, attrs, anchor)
|
||||
|
||||
text += tableEnd()
|
||||
tables.append(text)
|
||||
|
||||
# Don't add an empty table.
|
||||
if very_unstable_queries:
|
||||
tables.append(text)
|
||||
|
||||
add_unstable_queries()
|
||||
|
||||
@ -549,16 +555,15 @@ if args.report == 'main':
|
||||
message_array.append(str(slower_queries) + ' slower')
|
||||
|
||||
if unstable_partial_queries:
|
||||
unstable_queries += unstable_partial_queries
|
||||
error_tests += unstable_partial_queries
|
||||
very_unstable_queries += unstable_partial_queries
|
||||
status = 'failure'
|
||||
|
||||
if unstable_queries:
|
||||
message_array.append(str(unstable_queries) + ' unstable')
|
||||
|
||||
# Disabled before fix.
|
||||
# if very_unstable_queries:
|
||||
# status = 'failure'
|
||||
# Don't show mildly unstable queries, only the very unstable ones we
|
||||
# treat as errors.
|
||||
if very_unstable_queries:
|
||||
error_tests += very_unstable_queries
|
||||
status = 'failure'
|
||||
message_array.append(str(very_unstable_queries) + ' unstable')
|
||||
|
||||
error_tests += slow_average_tests
|
||||
if error_tests:
|
||||
|
@ -2,7 +2,6 @@
|
||||
FROM ubuntu:20.04
|
||||
|
||||
RUN apt-get update --yes && env DEBIAN_FRONTEND=noninteractive apt-get install wget unzip git openjdk-14-jdk maven python3 --yes --no-install-recommends
|
||||
|
||||
RUN wget https://github.com/sqlancer/sqlancer/archive/master.zip -O /sqlancer.zip
|
||||
RUN mkdir /sqlancer && \
|
||||
cd /sqlancer && \
|
||||
|
@ -73,4 +73,4 @@ RUN set -x \
|
||||
VOLUME /var/lib/docker
|
||||
EXPOSE 2375
|
||||
ENTRYPOINT ["dockerd-entrypoint.sh"]
|
||||
CMD ["sh", "-c", "python3 regression.py --no-color -o classic --local --clickhouse-binary-path ${CLICKHOUSE_TESTS_SERVER_BIN_PATH} --log test.log ${TESTFLOWS_OPTS}; cat test.log | tfs report results --format json > results.json; /usr/local/bin/process_testflows_result.py || echo -e 'failure\tCannot parse results' > check_status.tsv"]
|
||||
CMD ["sh", "-c", "python3 regression.py --no-color -o new-fails --local --clickhouse-binary-path ${CLICKHOUSE_TESTS_SERVER_BIN_PATH} --log test.log ${TESTFLOWS_OPTS}; cat test.log | tfs report results --format json > results.json; /usr/local/bin/process_testflows_result.py || echo -e 'failure\tCannot parse results' > check_status.tsv; find * -type f | grep _instances | grep clickhouse-server | xargs -n1 tar -rvf clickhouse_logs.tar; gzip -9 clickhouse_logs.tar"]
|
||||
|
@ -41,6 +41,14 @@ toc_title: Cloud
|
||||
- Built-in monitoring and database management platform
|
||||
- Professional database expert technical support and service
|
||||
|
||||
## SberCloud {#sbercloud}
|
||||
|
||||
[SberCloud.Advanced](https://sbercloud.ru/en/advanced) provides [MapReduce Service (MRS)](https://docs.sbercloud.ru/mrs/ug/topics/ug__clickhouse.html), a reliable, secure, and easy-to-use enterprise-level platform for storing, processing, and analyzing big data. MRS allows you to quickly create and manage ClickHouse clusters.
|
||||
|
||||
- A ClickHouse instance consists of three ZooKeeper nodes and multiple ClickHouse nodes. The Dedicated Replica mode is used to ensure high reliability of dual data copies.
|
||||
- MRS provides smooth and elastic scaling capabilities to quickly meet service growth requirements in scenarios where the cluster storage capacity or CPU computing resources are not enough. When you expand the capacity of ClickHouse nodes in a cluster, MRS provides a one-click data balancing tool and gives you the initiative to balance data. You can determine the data balancing mode and time based on service characteristics to ensure service availability, implementing smooth scaling.
|
||||
- MRS uses the Elastic Load Balance ensuring high availability deployment architecture to automatically distribute user access traffic to multiple backend nodes, expanding service capabilities to external systems and improving fault tolerance. With the ELB polling mechanism, data is written to local tables and read from distributed tables on different nodes. In this way, data read/write load and high availability of application access are guaranteed.
|
||||
|
||||
## Tencent Cloud {#tencent-cloud}
|
||||
|
||||
[Tencent Managed Service for ClickHouse](https://cloud.tencent.com/product/cdwch) provides the following key features:
|
||||
|
@ -14,4 +14,4 @@ Service categories:
|
||||
- [Support](../commercial/support.md)
|
||||
|
||||
!!! note "For service providers"
|
||||
If you happen to represent one of them, feel free to open a pull request adding your company to the respective section (or even adding a new section if the service doesn’t fit into existing categories). The easiest way to open a pull-request for documentation page is by using a “pencil” edit button in the top-right corner. If your service available in some local market, make sure to mention it in a localized documentation page as well (or at least point it out in a pull-request description).
|
||||
If you happen to represent one of them, feel free to open a pull request adding your company to the respective section (or even adding a new section if the service does not fit into existing categories). The easiest way to open a pull-request for documentation page is by using a “pencil” edit button in the top-right corner. If your service available in some local market, make sure to mention it in a localized documentation page as well (or at least point it out in a pull-request description).
|
||||
|
@ -120,11 +120,11 @@ clickhouse-client -nmT < tests/queries/0_stateless/01521_dummy_test.sql | tee te
|
||||
- don't switch databases (unless necessary)
|
||||
- you can create several table replicas on the same node if needed
|
||||
- you can use one of the test cluster definitions when needed (see system.clusters)
|
||||
- use `number` / `numbers_mt` / `zeros` / `zeros_mt` and similar for queries / to initialize data when appliable
|
||||
- use `number` / `numbers_mt` / `zeros` / `zeros_mt` and similar for queries / to initialize data when applicable
|
||||
- clean up the created objects after test and before the test (DROP IF EXISTS) - in case of some dirty state
|
||||
- prefer sync mode of operations (mutations, merges, etc.)
|
||||
- use other SQL files in the `0_stateless` folder as an example
|
||||
- ensure the feature / feature combination you want to tests is not covered yet with existing tests
|
||||
- ensure the feature / feature combination you want to test is not yet covered with existing tests
|
||||
|
||||
#### Commit / push / create PR.
|
||||
|
||||
|
@ -21,11 +21,11 @@ Various `IColumn` implementations (`ColumnUInt8`, `ColumnString`, and so on) are
|
||||
|
||||
Nevertheless, it is possible to work with individual values as well. To represent an individual value, the `Field` is used. `Field` is just a discriminated union of `UInt64`, `Int64`, `Float64`, `String` and `Array`. `IColumn` has the `operator []` method to get the n-th value as a `Field`, and the `insert` method to append a `Field` to the end of a column. These methods are not very efficient, because they require dealing with temporary `Field` objects representing an individual value. There are more efficient methods, such as `insertFrom`, `insertRangeFrom`, and so on.
|
||||
|
||||
`Field` doesn’t have enough information about a specific data type for a table. For example, `UInt8`, `UInt16`, `UInt32`, and `UInt64` are all represented as `UInt64` in a `Field`.
|
||||
`Field` does not have enough information about a specific data type for a table. For example, `UInt8`, `UInt16`, `UInt32`, and `UInt64` are all represented as `UInt64` in a `Field`.
|
||||
|
||||
## Leaky Abstractions {#leaky-abstractions}
|
||||
|
||||
`IColumn` has methods for common relational transformations of data, but they don’t meet all needs. For example, `ColumnUInt64` doesn’t have a method to calculate the sum of two columns, and `ColumnString` doesn’t have a method to run a substring search. These countless routines are implemented outside of `IColumn`.
|
||||
`IColumn` has methods for common relational transformations of data, but they do not meet all needs. For example, `ColumnUInt64` does not have a method to calculate the sum of two columns, and `ColumnString` does not have a method to run a substring search. These countless routines are implemented outside of `IColumn`.
|
||||
|
||||
Various functions on columns can be implemented in a generic, non-efficient way using `IColumn` methods to extract `Field` values, or in a specialized way using knowledge of inner memory layout of data in a specific `IColumn` implementation. It is implemented by casting functions to a specific `IColumn` type and deal with internal representation directly. For example, `ColumnUInt64` has the `getData` method that returns a reference to an internal array, then a separate routine reads or fills that array directly. We have “leaky abstractions” to allow efficient specializations of various routines.
|
||||
|
||||
@ -35,7 +35,7 @@ Various functions on columns can be implemented in a generic, non-efficient way
|
||||
|
||||
`IDataType` and `IColumn` are only loosely related to each other. Different data types can be represented in memory by the same `IColumn` implementations. For example, `DataTypeUInt32` and `DataTypeDateTime` are both represented by `ColumnUInt32` or `ColumnConstUInt32`. In addition, the same data type can be represented by different `IColumn` implementations. For example, `DataTypeUInt8` can be represented by `ColumnUInt8` or `ColumnConstUInt8`.
|
||||
|
||||
`IDataType` only stores metadata. For instance, `DataTypeUInt8` doesn’t store anything at all (except virtual pointer `vptr`) and `DataTypeFixedString` stores just `N` (the size of fixed-size strings).
|
||||
`IDataType` only stores metadata. For instance, `DataTypeUInt8` does not store anything at all (except virtual pointer `vptr`) and `DataTypeFixedString` stores just `N` (the size of fixed-size strings).
|
||||
|
||||
`IDataType` has helper methods for various data formats. Examples are methods to serialize a value with possible quoting, to serialize a value for JSON, and to serialize a value as part of the XML format. There is no direct correspondence to data formats. For example, the different data formats `Pretty` and `TabSeparated` can use the same `serializeTextEscaped` helper method from the `IDataType` interface.
|
||||
|
||||
@ -43,7 +43,7 @@ Various functions on columns can be implemented in a generic, non-efficient way
|
||||
|
||||
A `Block` is a container that represents a subset (chunk) of a table in memory. It is just a set of triples: `(IColumn, IDataType, column name)`. During query execution, data is processed by `Block`s. If we have a `Block`, we have data (in the `IColumn` object), we have information about its type (in `IDataType`) that tells us how to deal with that column, and we have the column name. It could be either the original column name from the table or some artificial name assigned for getting temporary results of calculations.
|
||||
|
||||
When we calculate some function over columns in a block, we add another column with its result to the block, and we don’t touch columns for arguments of the function because operations are immutable. Later, unneeded columns can be removed from the block, but not modified. It is convenient for the elimination of common subexpressions.
|
||||
When we calculate some function over columns in a block, we add another column with its result to the block, and we do not touch columns for arguments of the function because operations are immutable. Later, unneeded columns can be removed from the block, but not modified. It is convenient for the elimination of common subexpressions.
|
||||
|
||||
Blocks are created for every processed chunk of data. Note that for the same type of calculation, the column names and types remain the same for different blocks, and only column data changes. It is better to split block data from the block header because small block sizes have a high overhead of temporary strings for copying shared_ptrs and column names.
|
||||
|
||||
@ -118,11 +118,11 @@ Interpreters are responsible for creating the query execution pipeline from an `
|
||||
|
||||
There are ordinary functions and aggregate functions. For aggregate functions, see the next section.
|
||||
|
||||
Ordinary functions don’t change the number of rows – they work as if they are processing each row independently. In fact, functions are not called for individual rows, but for `Block`’s of data to implement vectorized query execution.
|
||||
Ordinary functions do not change the number of rows – they work as if they are processing each row independently. In fact, functions are not called for individual rows, but for `Block`’s of data to implement vectorized query execution.
|
||||
|
||||
There are some miscellaneous functions, like [blockSize](../sql-reference/functions/other-functions.md#function-blocksize), [rowNumberInBlock](../sql-reference/functions/other-functions.md#function-rownumberinblock), and [runningAccumulate](../sql-reference/functions/other-functions.md#runningaccumulate), that exploit block processing and violate the independence of rows.
|
||||
|
||||
ClickHouse has strong typing, so there’s no implicit type conversion. If a function doesn’t support a specific combination of types, it throws an exception. But functions can work (be overloaded) for many different combinations of types. For example, the `plus` function (to implement the `+` operator) works for any combination of numeric types: `UInt8` + `Float32`, `UInt16` + `Int8`, and so on. Also, some variadic functions can accept any number of arguments, such as the `concat` function.
|
||||
ClickHouse has strong typing, so there’s no implicit type conversion. If a function does not support a specific combination of types, it throws an exception. But functions can work (be overloaded) for many different combinations of types. For example, the `plus` function (to implement the `+` operator) works for any combination of numeric types: `UInt8` + `Float32`, `UInt16` + `Int8`, and so on. Also, some variadic functions can accept any number of arguments, such as the `concat` function.
|
||||
|
||||
Implementing a function may be slightly inconvenient because a function explicitly dispatches supported data types and supported `IColumns`. For example, the `plus` function has code generated by instantiation of a C++ template for each combination of numeric types, and constant or non-constant left and right arguments.
|
||||
|
||||
@ -152,7 +152,7 @@ Internally, it is just a primitive multithreaded server without coroutines or fi
|
||||
|
||||
The server initializes the `Context` class with the necessary environment for query execution: the list of available databases, users and access rights, settings, clusters, the process list, the query log, and so on. Interpreters use this environment.
|
||||
|
||||
We maintain full backward and forward compatibility for the server TCP protocol: old clients can talk to new servers, and new clients can talk to old servers. But we don’t want to maintain it eternally, and we are removing support for old versions after about one year.
|
||||
We maintain full backward and forward compatibility for the server TCP protocol: old clients can talk to new servers, and new clients can talk to old servers. But we do not want to maintain it eternally, and we are removing support for old versions after about one year.
|
||||
|
||||
!!! note "Note"
|
||||
For most external applications, we recommend using the HTTP interface because it is simple and easy to use. The TCP protocol is more tightly linked to internal data structures: it uses an internal format for passing blocks of data, and it uses custom framing for compressed data. We haven’t released a C library for that protocol because it requires linking most of the ClickHouse codebase, which is not practical.
|
||||
@ -169,13 +169,13 @@ There is no global query plan for distributed query execution. Each node has its
|
||||
|
||||
`MergeTree` is a family of storage engines that supports indexing by primary key. The primary key can be an arbitrary tuple of columns or expressions. Data in a `MergeTree` table is stored in “parts”. Each part stores data in the primary key order, so data is ordered lexicographically by the primary key tuple. All the table columns are stored in separate `column.bin` files in these parts. The files consist of compressed blocks. Each block is usually from 64 KB to 1 MB of uncompressed data, depending on the average value size. The blocks consist of column values placed contiguously one after the other. Column values are in the same order for each column (the primary key defines the order), so when you iterate by many columns, you get values for the corresponding rows.
|
||||
|
||||
The primary key itself is “sparse”. It doesn’t address every single row, but only some ranges of data. A separate `primary.idx` file has the value of the primary key for each N-th row, where N is called `index_granularity` (usually, N = 8192). Also, for each column, we have `column.mrk` files with “marks,” which are offsets to each N-th row in the data file. Each mark is a pair: the offset in the file to the beginning of the compressed block, and the offset in the decompressed block to the beginning of data. Usually, compressed blocks are aligned by marks, and the offset in the decompressed block is zero. Data for `primary.idx` always resides in memory, and data for `column.mrk` files is cached.
|
||||
The primary key itself is “sparse”. It does not address every single row, but only some ranges of data. A separate `primary.idx` file has the value of the primary key for each N-th row, where N is called `index_granularity` (usually, N = 8192). Also, for each column, we have `column.mrk` files with “marks,” which are offsets to each N-th row in the data file. Each mark is a pair: the offset in the file to the beginning of the compressed block, and the offset in the decompressed block to the beginning of data. Usually, compressed blocks are aligned by marks, and the offset in the decompressed block is zero. Data for `primary.idx` always resides in memory, and data for `column.mrk` files is cached.
|
||||
|
||||
When we are going to read something from a part in `MergeTree`, we look at `primary.idx` data and locate ranges that could contain requested data, then look at `column.mrk` data and calculate offsets for where to start reading those ranges. Because of sparseness, excess data may be read. ClickHouse is not suitable for a high load of simple point queries, because the entire range with `index_granularity` rows must be read for each key, and the entire compressed block must be decompressed for each column. We made the index sparse because we must be able to maintain trillions of rows per single server without noticeable memory consumption for the index. Also, because the primary key is sparse, it is not unique: it cannot check the existence of the key in the table at INSERT time. You could have many rows with the same key in a table.
|
||||
|
||||
When you `INSERT` a bunch of data into `MergeTree`, that bunch is sorted by primary key order and forms a new part. There are background threads that periodically select some parts and merge them into a single sorted part to keep the number of parts relatively low. That’s why it is called `MergeTree`. Of course, merging leads to “write amplification”. All parts are immutable: they are only created and deleted, but not modified. When SELECT is executed, it holds a snapshot of the table (a set of parts). After merging, we also keep old parts for some time to make a recovery after failure easier, so if we see that some merged part is probably broken, we can replace it with its source parts.
|
||||
|
||||
`MergeTree` is not an LSM tree because it doesn’t contain “memtable” and “log”: inserted data is written directly to the filesystem. This makes it suitable only to INSERT data in batches, not by individual row and not very frequently – about once per second is ok, but a thousand times a second is not. We did it this way for simplicity’s sake, and because we are already inserting data in batches in our applications.
|
||||
`MergeTree` is not an LSM tree because it does not contain “memtable” and “log”: inserted data is written directly to the filesystem. This makes it suitable only to INSERT data in batches, not by individual row and not very frequently – about once per second is ok, but a thousand times a second is not. We did it this way for simplicity’s sake, and because we are already inserting data in batches in our applications.
|
||||
|
||||
There are MergeTree engines that are doing additional work during background merges. Examples are `CollapsingMergeTree` and `AggregatingMergeTree`. This could be treated as special support for updates. Keep in mind that these are not real updates because users usually have no control over the time when background merges are executed, and data in a `MergeTree` table is almost always stored in more than one part, not in completely merged form.
|
||||
|
||||
@ -185,7 +185,7 @@ Replication in ClickHouse can be configured on a per-table basis. You could have
|
||||
|
||||
Replication is implemented in the `ReplicatedMergeTree` storage engine. The path in `ZooKeeper` is specified as a parameter for the storage engine. All tables with the same path in `ZooKeeper` become replicas of each other: they synchronize their data and maintain consistency. Replicas can be added and removed dynamically simply by creating or dropping a table.
|
||||
|
||||
Replication uses an asynchronous multi-master scheme. You can insert data into any replica that has a session with `ZooKeeper`, and data is replicated to all other replicas asynchronously. Because ClickHouse doesn’t support UPDATEs, replication is conflict-free. As there is no quorum acknowledgment of inserts, just-inserted data might be lost if one node fails.
|
||||
Replication uses an asynchronous multi-master scheme. You can insert data into any replica that has a session with `ZooKeeper`, and data is replicated to all other replicas asynchronously. Because ClickHouse does not support UPDATEs, replication is conflict-free. As there is no quorum acknowledgment of inserts, just-inserted data might be lost if one node fails.
|
||||
|
||||
Metadata for replication is stored in ZooKeeper. There is a replication log that lists what actions to do. Actions are: get part; merge parts; drop a partition, and so on. Each replica copies the replication log to its queue and then executes the actions from the queue. For example, on insertion, the “get the part” action is created in the log, and every replica downloads that part. Merges are coordinated between replicas to get byte-identical results. All parts are merged in the same way on all replicas. One of the leaders initiates a new merge first and writes “merge parts” actions to the log. Multiple replicas (or all) can be leaders at the same time. A replica can be prevented from becoming a leader using the `merge_tree` setting `replicated_can_become_leader`. The leaders are responsible for scheduling background merges.
|
||||
|
||||
|
@ -20,7 +20,7 @@ Install the latest [Xcode](https://apps.apple.com/am/app/xcode/id497799835?mt=12
|
||||
|
||||
Open it at least once to accept the end-user license agreement and automatically install the required components.
|
||||
|
||||
Then, make sure that the latest Comman Line Tools are installed and selected in the system:
|
||||
Then, make sure that the latest Command Line Tools are installed and selected in the system:
|
||||
|
||||
``` bash
|
||||
sudo rm -rf /Library/Developer/CommandLineTools
|
||||
|
@ -134,7 +134,7 @@ $ ./release
|
||||
|
||||
## Faster builds for development
|
||||
|
||||
Normally all tools of the ClickHouse bundle, such as `clickhouse-server`, `clickhouse-client` etc., are linked into a single static executable, `clickhouse`. This executable must be re-linked on every change, which might be slow. Two common ways to improve linking time are to use `lld` linker, and use the 'split' build configuration, which builds a separate binary for every tool, and further splits the code into serveral shared libraries. To enable these tweaks, pass the following flags to `cmake`:
|
||||
Normally all tools of the ClickHouse bundle, such as `clickhouse-server`, `clickhouse-client` etc., are linked into a single static executable, `clickhouse`. This executable must be re-linked on every change, which might be slow. Two common ways to improve linking time are to use `lld` linker, and use the 'split' build configuration, which builds a separate binary for every tool, and further splits the code into several shared libraries. To enable these tweaks, pass the following flags to `cmake`:
|
||||
|
||||
```
|
||||
-DCMAKE_C_FLAGS="--ld-path=lld" -DCMAKE_CXX_FLAGS="--ld-path=lld" -DUSE_STATIC_LIBRARIES=0 -DSPLIT_SHARED_LIBRARIES=1 -DCLICKHOUSE_SPLIT_BINARY=1
|
||||
|
@ -15,7 +15,7 @@ ClickHouse cannot work or build on a 32-bit system. You should acquire access to
|
||||
|
||||
To start working with ClickHouse repository you will need a GitHub account.
|
||||
|
||||
You probably already have one, but if you don’t, please register at https://github.com. In case you do not have SSH keys, you should generate them and then upload them on GitHub. It is required for sending over your patches. It is also possible to use the same SSH keys that you use with any other SSH servers - probably you already have those.
|
||||
You probably already have one, but if you do not, please register at https://github.com. In case you do not have SSH keys, you should generate them and then upload them on GitHub. It is required for sending over your patches. It is also possible to use the same SSH keys that you use with any other SSH servers - probably you already have those.
|
||||
|
||||
Create a fork of ClickHouse repository. To do that please click on the “fork” button in the upper right corner at https://github.com/ClickHouse/ClickHouse. It will fork your own copy of ClickHouse/ClickHouse to your account.
|
||||
|
||||
|
@ -195,7 +195,7 @@ std::cerr << static_cast<int>(c) << std::endl;
|
||||
|
||||
The same is true for small methods in any classes or structs.
|
||||
|
||||
For templated classes and structs, don’t separate the method declarations from the implementation (because otherwise they must be defined in the same translation unit).
|
||||
For templated classes and structs, do not separate the method declarations from the implementation (because otherwise they must be defined in the same translation unit).
|
||||
|
||||
**31.** You can wrap lines at 140 characters, instead of 80.
|
||||
|
||||
@ -442,7 +442,7 @@ Use `RAII` and see above.
|
||||
|
||||
**3.** Error handling.
|
||||
|
||||
Use exceptions. In most cases, you only need to throw an exception, and don’t need to catch it (because of `RAII`).
|
||||
Use exceptions. In most cases, you only need to throw an exception, and do not need to catch it (because of `RAII`).
|
||||
|
||||
In offline data processing applications, it’s often acceptable to not catch exceptions.
|
||||
|
||||
@ -599,7 +599,7 @@ public:
|
||||
|
||||
There is no need to use a separate `namespace` for application code.
|
||||
|
||||
Small libraries don’t need this, either.
|
||||
Small libraries do not need this, either.
|
||||
|
||||
For medium to large libraries, put everything in a `namespace`.
|
||||
|
||||
@ -755,9 +755,9 @@ If there is a good solution already available, then use it, even if it means you
|
||||
|
||||
(But be prepared to remove bad libraries from code.)
|
||||
|
||||
**3.** You can install a library that isn’t in the packages, if the packages don’t have what you need or have an outdated version or the wrong type of compilation.
|
||||
**3.** You can install a library that isn’t in the packages, if the packages do not have what you need or have an outdated version or the wrong type of compilation.
|
||||
|
||||
**4.** If the library is small and doesn’t have its own complex build system, put the source files in the `contrib` folder.
|
||||
**4.** If the library is small and does not have its own complex build system, put the source files in the `contrib` folder.
|
||||
|
||||
**5.** Preference is always given to libraries that are already in use.
|
||||
|
||||
|
@ -35,7 +35,7 @@ Tests should use (create, drop, etc) only tables in `test` database that is assu
|
||||
|
||||
### Choosing the Test Name
|
||||
|
||||
The name of the test starts with a five-digit prefix followed by a descriptive name, such as `00422_hash_function_constexpr.sql`. To choose the prefix, find the largest prefix already present in the directory, and increment it by one. In the meantime, some other tests might be added with the same numeric prefix, but this is OK and doesn't lead to any problems, you don't have to change it later.
|
||||
The name of the test starts with a five-digit prefix followed by a descriptive name, such as `00422_hash_function_constexpr.sql`. To choose the prefix, find the largest prefix already present in the directory, and increment it by one. In the meantime, some other tests might be added with the same numeric prefix, but this is OK and does not lead to any problems, you don't have to change it later.
|
||||
|
||||
Some tests are marked with `zookeeper`, `shard` or `long` in their names. `zookeeper` is for tests that are using ZooKeeper. `shard` is for tests that requires server to listen `127.0.0.*`; `distributed` or `global` have the same meaning. `long` is for tests that run slightly longer that one second. You can disable these groups of tests using `--no-zookeeper`, `--no-shard` and `--no-long` options, respectively. Make sure to add a proper prefix to your test name if it needs ZooKeeper or distributed queries.
|
||||
|
||||
@ -51,7 +51,7 @@ Do not check for a particular wording of error message, it may change in the fut
|
||||
|
||||
### Testing a Distributed Query
|
||||
|
||||
If you want to use distributed queries in functional tests, you can leverage `remote` table function with `127.0.0.{1..2}` addresses for the server to query itself; or you can use predefined test clusters in server configuration file like `test_shard_localhost`. Remember to add the words `shard` or `distributed` to the test name, so that it is ran in CI in correct configurations, where the server is configured to support distributed queries.
|
||||
If you want to use distributed queries in functional tests, you can leverage `remote` table function with `127.0.0.{1..2}` addresses for the server to query itself; or you can use predefined test clusters in server configuration file like `test_shard_localhost`. Remember to add the words `shard` or `distributed` to the test name, so that it is run in CI in correct configurations, where the server is configured to support distributed queries.
|
||||
|
||||
|
||||
## Known Bugs {#known-bugs}
|
||||
@ -60,11 +60,11 @@ If we know some bugs that can be easily reproduced by functional tests, we place
|
||||
|
||||
## Integration Tests {#integration-tests}
|
||||
|
||||
Integration tests allow to test ClickHouse in clustered configuration and ClickHouse interaction with other servers like MySQL, Postgres, MongoDB. They are useful to emulate network splits, packet drops, etc. These tests are run under Docker and create multiple containers with various software.
|
||||
Integration tests allow testing ClickHouse in clustered configuration and ClickHouse interaction with other servers like MySQL, Postgres, MongoDB. They are useful to emulate network splits, packet drops, etc. These tests are run under Docker and create multiple containers with various software.
|
||||
|
||||
See `tests/integration/README.md` on how to run these tests.
|
||||
|
||||
Note that integration of ClickHouse with third-party drivers is not tested. Also we currently don’t have integration tests with our JDBC and ODBC drivers.
|
||||
Note that integration of ClickHouse with third-party drivers is not tested. Also, we currently do not have integration tests with our JDBC and ODBC drivers.
|
||||
|
||||
## Unit Tests {#unit-tests}
|
||||
|
||||
@ -123,7 +123,7 @@ Example with gdb:
|
||||
$ sudo -u clickhouse gdb --args /usr/bin/clickhouse server --config-file /etc/clickhouse-server/config.xml
|
||||
```
|
||||
|
||||
If the system clickhouse-server is already running and you don’t want to stop it, you can change port numbers in your `config.xml` (or override them in a file in `config.d` directory), provide appropriate data path, and run it.
|
||||
If the system clickhouse-server is already running and you do not want to stop it, you can change port numbers in your `config.xml` (or override them in a file in `config.d` directory), provide appropriate data path, and run it.
|
||||
|
||||
`clickhouse` binary has almost no dependencies and works across wide range of Linux distributions. To quick and dirty test your changes on a server, you can simply `scp` your fresh built `clickhouse` binary to your server and then run it as in examples above.
|
||||
|
||||
@ -161,7 +161,7 @@ $ clickhouse benchmark --concurrency 16 < queries.tsv
|
||||
|
||||
Then leave it for a night or weekend and go take a rest.
|
||||
|
||||
You should check that `clickhouse-server` doesn’t crash, memory footprint is bounded and performance not degrading over time.
|
||||
You should check that `clickhouse-server` does not crash, memory footprint is bounded and performance not degrading over time.
|
||||
|
||||
Precise query execution timings are not recorded and not compared due to high variability of queries and environment.
|
||||
|
||||
@ -230,7 +230,7 @@ Fuzzers are not built by default. To build fuzzers both `-DENABLE_FUZZING=1` and
|
||||
We recommend to disable Jemalloc while building fuzzers. Configuration used to integrate ClickHouse fuzzing to
|
||||
Google OSS-Fuzz can be found at `docker/fuzz`.
|
||||
|
||||
We also use simple fuzz test to generate random SQL queries and to check that the server doesn’t die executing them.
|
||||
We also use simple fuzz test to generate random SQL queries and to check that the server does not die executing them.
|
||||
You can find it in `00746_sql_fuzzy.pl`. This test should be run continuously (overnight and longer).
|
||||
|
||||
We also use sophisticated AST-based query fuzzer that is able to find huge amount of corner cases. It does random permutations and substitutions in queries AST. It remembers AST nodes from previous tests to use them for fuzzing of subsequent tests while processing them in random order. You can learn more about this fuzzer in [this blog article](https://clickhouse.tech/blog/en/2021/fuzzing-clickhouse/).
|
||||
@ -332,7 +332,7 @@ We run tests with Yandex internal CI and job automation system named “Sandbox
|
||||
|
||||
Build jobs and tests are run in Sandbox on per commit basis. Resulting packages and test results are published in GitHub and can be downloaded by direct links. Artifacts are stored for several months. When you send a pull request on GitHub, we tag it as “can be tested” and our CI system will build ClickHouse packages (release, debug, with address sanitizer, etc) for you.
|
||||
|
||||
We don’t use Travis CI due to the limit on time and computational power.
|
||||
We don’t use Jenkins. It was used before and now we are happy we are not using Jenkins.
|
||||
We do not use Travis CI due to the limit on time and computational power.
|
||||
We do not use Jenkins. It was used before and now we are happy we are not using Jenkins.
|
||||
|
||||
[Original article](https://clickhouse.tech/docs/en/development/tests/) <!--hide-->
|
||||
|
@ -47,7 +47,7 @@ EXCHANGE TABLES new_table AND old_table;
|
||||
|
||||
### ReplicatedMergeTree in Atomic Database {#replicatedmergetree-in-atomic-database}
|
||||
|
||||
For [ReplicatedMergeTree](../table-engines/mergetree-family/replication.md#table_engines-replication) tables is recomended do not specify parameters of engine - path in ZooKeeper and replica name. In this case will be used parameters of the configuration [default_replica_path](../../operations/server-configuration-parameters/settings.md#default_replica_path) and [default_replica_name](../../operations/server-configuration-parameters/settings.md#default_replica_name). If you want specify parameters of engine explicitly than recomended to use {uuid} macros. This is useful so that unique paths are automatically generated for each table in the ZooKeeper.
|
||||
For [ReplicatedMergeTree](../table-engines/mergetree-family/replication.md#table_engines-replication) tables, it is recommended to not specify engine parameters - path in ZooKeeper and replica name. In this case, configuration parameters will be used [default_replica_path](../../operations/server-configuration-parameters/settings.md#default_replica_path) and [default_replica_name](../../operations/server-configuration-parameters/settings.md#default_replica_name). If you want to specify engine parameters explicitly, it is recommended to use {uuid} macros. This is useful so that unique paths are automatically generated for each table in ZooKeeper.
|
||||
|
||||
## See Also
|
||||
|
||||
|
@ -82,8 +82,8 @@ Virtual column is an integral table engine attribute that is defined in the engi
|
||||
|
||||
You shouldn’t specify virtual columns in the `CREATE TABLE` query and you can’t see them in `SHOW CREATE TABLE` and `DESCRIBE TABLE` query results. Virtual columns are also read-only, so you can’t insert data into virtual columns.
|
||||
|
||||
To select data from a virtual column, you must specify its name in the `SELECT` query. `SELECT *` doesn’t return values from virtual columns.
|
||||
To select data from a virtual column, you must specify its name in the `SELECT` query. `SELECT *` does not return values from virtual columns.
|
||||
|
||||
If you create a table with a column that has the same name as one of the table virtual columns, the virtual column becomes inaccessible. We don’t recommend doing this. To help avoid conflicts, virtual column names are usually prefixed with an underscore.
|
||||
If you create a table with a column that has the same name as one of the table virtual columns, the virtual column becomes inaccessible. We do not recommend doing this. To help avoid conflicts, virtual column names are usually prefixed with an underscore.
|
||||
|
||||
[Original article](https://clickhouse.tech/docs/en/engines/table-engines/) <!--hide-->
|
||||
|
@ -40,7 +40,7 @@ Required parameters:
|
||||
|
||||
- `kafka_broker_list` — A comma-separated list of brokers (for example, `localhost:9092`).
|
||||
- `kafka_topic_list` — A list of Kafka topics.
|
||||
- `kafka_group_name` — A group of Kafka consumers. Reading margins are tracked for each group separately. If you don’t want messages to be duplicated in the cluster, use the same group name everywhere.
|
||||
- `kafka_group_name` — A group of Kafka consumers. Reading margins are tracked for each group separately. If you do not want messages to be duplicated in the cluster, use the same group name everywhere.
|
||||
- `kafka_format` — Message format. Uses the same notation as the SQL `FORMAT` function, such as `JSONEachRow`. For more information, see the [Formats](../../../interfaces/formats.md) section.
|
||||
|
||||
Optional parameters:
|
||||
|
@ -15,7 +15,12 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
|
||||
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1] [TTL expr1],
|
||||
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [TTL expr2],
|
||||
...
|
||||
) ENGINE = MySQL('host:port', 'database', 'table', 'user', 'password'[, replace_query, 'on_duplicate_clause']);
|
||||
) ENGINE = MySQL('host:port', 'database', 'table', 'user', 'password'[, replace_query, 'on_duplicate_clause'])
|
||||
SETTINGS
|
||||
[connection_pool_size=16, ]
|
||||
[connection_max_tries=3, ]
|
||||
[connection_auto_close=true ]
|
||||
;
|
||||
```
|
||||
|
||||
See a detailed description of the [CREATE TABLE](../../../sql-reference/statements/create/table.md#create-table-query) query.
|
||||
|
@ -38,7 +38,7 @@ Engines:
|
||||
|
||||
## Differences {#differences}
|
||||
|
||||
The `TinyLog` engine is the simplest in the family and provides the poorest functionality and lowest efficiency. The `TinyLog` engine doesn’t support parallel data reading by several threads in a single query. It reads data slower than other engines in the family that support parallel reading from a single query and it uses almost as many file descriptors as the `Log` engine because it stores each column in a separate file. Use it only in simple scenarios.
|
||||
The `TinyLog` engine is the simplest in the family and provides the poorest functionality and lowest efficiency. The `TinyLog` engine does not support parallel data reading by several threads in a single query. It reads data slower than other engines in the family that support parallel reading from a single query and it uses almost as many file descriptors as the `Log` engine because it stores each column in a separate file. Use it only in simple scenarios.
|
||||
|
||||
The `Log` and `StripeLog` engines support parallel data reading. When reading data, ClickHouse uses multiple threads. Each thread processes a separate data block. The `Log` engine uses a separate file for each column of the table. `StripeLog` stores all the data in one file. As a result, the `StripeLog` engine uses fewer file descriptors, but the `Log` engine provides higher efficiency when reading data.
|
||||
|
||||
|
@ -126,7 +126,7 @@ Also when there are at least 2 more “state” rows than “cancel” rows, or
|
||||
Thus, collapsing should not change the results of calculating statistics.
|
||||
Changes gradually collapsed so that in the end only the last state of almost every object left.
|
||||
|
||||
The `Sign` is required because the merging algorithm doesn’t guarantee that all of the rows with the same sorting key will be in the same resulting data part and even on the same physical server. ClickHouse process `SELECT` queries with multiple threads, and it can not predict the order of rows in the result. The aggregation is required if there is a need to get completely “collapsed” data from `CollapsingMergeTree` table.
|
||||
The `Sign` is required because the merging algorithm does not guarantee that all of the rows with the same sorting key will be in the same resulting data part and even on the same physical server. ClickHouse process `SELECT` queries with multiple threads, and it can not predict the order of rows in the result. The aggregation is required if there is a need to get completely “collapsed” data from `CollapsingMergeTree` table.
|
||||
|
||||
To finalize collapsing, write a query with `GROUP BY` clause and aggregate functions that account for the sign. For example, to calculate quantity, use `sum(Sign)` instead of `count()`. To calculate the sum of something, use `sum(Sign * x)` instead of `sum(x)`, and so on, and also add `HAVING sum(Sign) > 0`.
|
||||
|
||||
|
@ -33,6 +33,8 @@ ORDER BY (CounterID, StartDate, intHash32(UserID));
|
||||
|
||||
In this example, we set partitioning by the event types that occurred during the current week.
|
||||
|
||||
By default, the floating-point partition key is not supported. To use it enable the setting [allow_floating_point_partition_key](../../../operations/settings/merge-tree-settings.md#allow_floating_point_partition_key).
|
||||
|
||||
When inserting new data to a table, this data is stored as a separate part (chunk) sorted by the primary key. In 10-15 minutes after inserting, the parts of the same partition are merged into the entire part.
|
||||
|
||||
!!! info "Info"
|
||||
|
@ -7,7 +7,7 @@ toc_title: GraphiteMergeTree
|
||||
|
||||
This engine is designed for thinning and aggregating/averaging (rollup) [Graphite](http://graphite.readthedocs.io/en/latest/index.html) data. It may be helpful to developers who want to use ClickHouse as a data store for Graphite.
|
||||
|
||||
You can use any ClickHouse table engine to store the Graphite data if you don’t need rollup, but if you need a rollup use `GraphiteMergeTree`. The engine reduces the volume of storage and increases the efficiency of queries from Graphite.
|
||||
You can use any ClickHouse table engine to store the Graphite data if you do not need rollup, but if you need a rollup use `GraphiteMergeTree`. The engine reduces the volume of storage and increases the efficiency of queries from Graphite.
|
||||
|
||||
The engine inherits properties from [MergeTree](../../../engines/table-engines/mergetree-family/mergetree.md).
|
||||
|
||||
|
@ -64,7 +64,7 @@ For a description of parameters, see the [CREATE query description](../../../sql
|
||||
|
||||
ClickHouse uses the sorting key as a primary key if the primary key is not defined obviously by the `PRIMARY KEY` clause.
|
||||
|
||||
Use the `ORDER BY tuple()` syntax, if you don’t need sorting. See [Selecting the Primary Key](#selecting-the-primary-key).
|
||||
Use the `ORDER BY tuple()` syntax, if you do not need sorting. See [Selecting the Primary Key](#selecting-the-primary-key).
|
||||
|
||||
- `PARTITION BY` — The [partitioning key](../../../engines/table-engines/mergetree-family/custom-partitioning-key.md). Optional.
|
||||
|
||||
@ -162,7 +162,7 @@ Data parts can be stored in `Wide` or `Compact` format. In `Wide` format each co
|
||||
|
||||
Data storing format is controlled by the `min_bytes_for_wide_part` and `min_rows_for_wide_part` settings of the table engine. If the number of bytes or rows in a data part is less then the corresponding setting's value, the part is stored in `Compact` format. Otherwise it is stored in `Wide` format. If none of these settings is set, data parts are stored in `Wide` format.
|
||||
|
||||
Each data part is logically divided into granules. A granule is the smallest indivisible data set that ClickHouse reads when selecting data. ClickHouse doesn’t split rows or values, so each granule always contains an integer number of rows. The first row of a granule is marked with the value of the primary key for the row. For each data part, ClickHouse creates an index file that stores the marks. For each column, whether it’s in the primary key or not, ClickHouse also stores the same marks. These marks let you find data directly in column files.
|
||||
Each data part is logically divided into granules. A granule is the smallest indivisible data set that ClickHouse reads when selecting data. ClickHouse does not split rows or values, so each granule always contains an integer number of rows. The first row of a granule is marked with the value of the primary key for the row. For each data part, ClickHouse creates an index file that stores the marks. For each column, whether it’s in the primary key or not, ClickHouse also stores the same marks. These marks let you find data directly in column files.
|
||||
|
||||
The granule size is restricted by the `index_granularity` and `index_granularity_bytes` settings of the table engine. The number of rows in a granule lays in the `[1, index_granularity]` range, depending on the size of the rows. The size of a granule can exceed `index_granularity_bytes` if the size of a single row is greater than the value of the setting. In this case, the size of the granule equals the size of the row.
|
||||
|
||||
@ -227,7 +227,7 @@ This feature is helpful when using the [SummingMergeTree](../../../engines/table
|
||||
|
||||
In this case it makes sense to leave only a few columns in the primary key that will provide efficient range scans and add the remaining dimension columns to the sorting key tuple.
|
||||
|
||||
[ALTER](../../../sql-reference/statements/alter/index.md) of the sorting key is a lightweight operation because when a new column is simultaneously added to the table and to the sorting key, existing data parts don’t need to be changed. Since the old sorting key is a prefix of the new sorting key and there is no data in the newly added column, the data is sorted by both the old and new sorting keys at the moment of table modification.
|
||||
[ALTER](../../../sql-reference/statements/alter/index.md) of the sorting key is a lightweight operation because when a new column is simultaneously added to the table and to the sorting key, existing data parts do not need to be changed. Since the old sorting key is a prefix of the new sorting key and there is no data in the newly added column, the data is sorted by both the old and new sorting keys at the moment of table modification.
|
||||
|
||||
### Use of Indexes and Partitions in Queries {#use-of-indexes-and-partitions-in-queries}
|
||||
|
||||
@ -265,7 +265,7 @@ The key for partitioning by month allows reading only those data blocks which co
|
||||
|
||||
Consider, for example, the days of the month. They form a [monotonic sequence](https://en.wikipedia.org/wiki/Monotonic_function) for one month, but not monotonic for more extended periods. This is a partially-monotonic sequence. If a user creates the table with partially-monotonic primary key, ClickHouse creates a sparse index as usual. When a user selects data from this kind of table, ClickHouse analyzes the query conditions. If the user wants to get data between two marks of the index and both these marks fall within one month, ClickHouse can use the index in this particular case because it can calculate the distance between the parameters of a query and index marks.
|
||||
|
||||
ClickHouse cannot use an index if the values of the primary key in the query parameter range don’t represent a monotonic sequence. In this case, ClickHouse uses the full scan method.
|
||||
ClickHouse cannot use an index if the values of the primary key in the query parameter range do not represent a monotonic sequence. In this case, ClickHouse uses the full scan method.
|
||||
|
||||
ClickHouse uses this logic not only for days of the month sequences, but for any primary key that represents a partially-monotonic sequence.
|
||||
|
||||
|
@ -7,9 +7,9 @@ toc_title: ReplacingMergeTree
|
||||
|
||||
The engine differs from [MergeTree](../../../engines/table-engines/mergetree-family/mergetree.md#table_engines-mergetree) in that it removes duplicate entries with the same [sorting key](../../../engines/table-engines/mergetree-family/mergetree.md) value (`ORDER BY` table section, not `PRIMARY KEY`).
|
||||
|
||||
Data deduplication occurs only during a merge. Merging occurs in the background at an unknown time, so you can’t plan for it. Some of the data may remain unprocessed. Although you can run an unscheduled merge using the `OPTIMIZE` query, don’t count on using it, because the `OPTIMIZE` query will read and write a large amount of data.
|
||||
Data deduplication occurs only during a merge. Merging occurs in the background at an unknown time, so you can’t plan for it. Some of the data may remain unprocessed. Although you can run an unscheduled merge using the `OPTIMIZE` query, do not count on using it, because the `OPTIMIZE` query will read and write a large amount of data.
|
||||
|
||||
Thus, `ReplacingMergeTree` is suitable for clearing out duplicate data in the background in order to save space, but it doesn’t guarantee the absence of duplicates.
|
||||
Thus, `ReplacingMergeTree` is suitable for clearing out duplicate data in the background in order to save space, but it does not guarantee the absence of duplicates.
|
||||
|
||||
## Creating a Table {#creating-a-table}
|
||||
|
||||
@ -34,7 +34,7 @@ For a description of request parameters, see [statement description](../../../sq
|
||||
|
||||
**ReplacingMergeTree Parameters**
|
||||
|
||||
- `ver` — column with version. Type `UInt*`, `Date` or `DateTime`. Optional parameter.
|
||||
- `ver` — column with the version number. Type `UInt*`, `Date`, `DateTime` or `DateTime64`. Optional parameter.
|
||||
|
||||
When merging, `ReplacingMergeTree` from all the rows with the same sorting key leaves only one:
|
||||
|
||||
@ -66,5 +66,3 @@ All of the parameters excepting `ver` have the same meaning as in `MergeTree`.
|
||||
- `ver` - column with the version. Optional parameter. For a description, see the text above.
|
||||
|
||||
</details>
|
||||
|
||||
[Original article](https://clickhouse.tech/docs/en/operations/table_engines/replacingmergetree/) <!--hide-->
|
||||
|
@ -95,17 +95,19 @@ If ZooKeeper isn’t set in the config file, you can’t create replicated table
|
||||
|
||||
ZooKeeper is not used in `SELECT` queries because replication does not affect the performance of `SELECT` and queries run just as fast as they do for non-replicated tables. When querying distributed replicated tables, ClickHouse behavior is controlled by the settings [max_replica_delay_for_distributed_queries](../../../operations/settings/settings.md#settings-max_replica_delay_for_distributed_queries) and [fallback_to_stale_replicas_for_distributed_queries](../../../operations/settings/settings.md#settings-fallback_to_stale_replicas_for_distributed_queries).
|
||||
|
||||
For each `INSERT` query, approximately ten entries are added to ZooKeeper through several transactions. (To be more precise, this is for each inserted block of data; an INSERT query contains one block or one block per `max_insert_block_size = 1048576` rows.) This leads to slightly longer latencies for `INSERT` compared to non-replicated tables. But if you follow the recommendations to insert data in batches of no more than one `INSERT` per second, it doesn’t create any problems. The entire ClickHouse cluster used for coordinating one ZooKeeper cluster has a total of several hundred `INSERTs` per second. The throughput on data inserts (the number of rows per second) is just as high as for non-replicated data.
|
||||
For each `INSERT` query, approximately ten entries are added to ZooKeeper through several transactions. (To be more precise, this is for each inserted block of data; an INSERT query contains one block or one block per `max_insert_block_size = 1048576` rows.) This leads to slightly longer latencies for `INSERT` compared to non-replicated tables. But if you follow the recommendations to insert data in batches of no more than one `INSERT` per second, it does not create any problems. The entire ClickHouse cluster used for coordinating one ZooKeeper cluster has a total of several hundred `INSERTs` per second. The throughput on data inserts (the number of rows per second) is just as high as for non-replicated data.
|
||||
|
||||
For very large clusters, you can use different ZooKeeper clusters for different shards. However, this hasn’t proven necessary on the Yandex.Metrica cluster (approximately 300 servers).
|
||||
|
||||
Replication is asynchronous and multi-master. `INSERT` queries (as well as `ALTER`) can be sent to any available server. Data is inserted on the server where the query is run, and then it is copied to the other servers. Because it is asynchronous, recently inserted data appears on the other replicas with some latency. If part of the replicas are not available, the data is written when they become available. If a replica is available, the latency is the amount of time it takes to transfer the block of compressed data over the network. The number of threads performing background tasks for replicated tables can be set by [background_schedule_pool_size](../../../operations/settings/settings.md#background_schedule_pool_size) setting.
|
||||
|
||||
`ReplicatedMergeTree` engine uses a separate thread pool for replicated fetches. Size of the pool is limited by the [background_fetches_pool_size](../../../operations/settings/settings.md#background_fetches_pool_size) setting which can be tuned with a server restart.
|
||||
|
||||
By default, an INSERT query waits for confirmation of writing the data from only one replica. If the data was successfully written to only one replica and the server with this replica ceases to exist, the stored data will be lost. To enable getting confirmation of data writes from multiple replicas, use the `insert_quorum` option.
|
||||
|
||||
Each block of data is written atomically. The INSERT query is divided into blocks up to `max_insert_block_size = 1048576` rows. In other words, if the `INSERT` query has less than 1048576 rows, it is made atomically.
|
||||
|
||||
Data blocks are deduplicated. For multiple writes of the same data block (data blocks of the same size containing the same rows in the same order), the block is only written once. The reason for this is in case of network failures when the client application doesn’t know if the data was written to the DB, so the `INSERT` query can simply be repeated. It doesn’t matter which replica INSERTs were sent to with identical data. `INSERTs` are idempotent. Deduplication parameters are controlled by [merge_tree](../../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-merge_tree) server settings.
|
||||
Data blocks are deduplicated. For multiple writes of the same data block (data blocks of the same size containing the same rows in the same order), the block is only written once. The reason for this is in case of network failures when the client application does not know if the data was written to the DB, so the `INSERT` query can simply be repeated. It does not matter which replica INSERTs were sent to with identical data. `INSERTs` are idempotent. Deduplication parameters are controlled by [merge_tree](../../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-merge_tree) server settings.
|
||||
|
||||
During replication, only the source data to insert is transferred over the network. Further data transformation (merging) is coordinated and performed on all the replicas in the same way. This minimizes network usage, which means that replication works well when replicas reside in different datacenters. (Note that duplicating data in different datacenters is the main goal of replication.)
|
||||
|
||||
@ -172,7 +174,7 @@ In this case, the path consists of the following parts:
|
||||
|
||||
`{layer}-{shard}` is the shard identifier. In this example it consists of two parts, since the Yandex.Metrica cluster uses bi-level sharding. For most tasks, you can leave just the {shard} substitution, which will be expanded to the shard identifier.
|
||||
|
||||
`table_name` is the name of the node for the table in ZooKeeper. It is a good idea to make it the same as the table name. It is defined explicitly, because in contrast to the table name, it doesn’t change after a RENAME query.
|
||||
`table_name` is the name of the node for the table in ZooKeeper. It is a good idea to make it the same as the table name. It is defined explicitly, because in contrast to the table name, it does not change after a RENAME query.
|
||||
*HINT*: you could add a database name in front of `table_name` as well. E.g. `db_name.table_name`
|
||||
|
||||
The two built-in substitutions `{database}` and `{table}` can be used, they expand into the table name and the database name respectively (unless these macros are defined in the `macros` section). So the zookeeper path can be specified as `'/clickhouse/tables/{layer}-{shard}/{database}/{table}'`.
|
||||
@ -284,6 +286,7 @@ If the data in ZooKeeper was lost or damaged, you can save data by moving it to
|
||||
**See Also**
|
||||
|
||||
- [background_schedule_pool_size](../../../operations/settings/settings.md#background_schedule_pool_size)
|
||||
- [background_fetches_pool_size](../../../operations/settings/settings.md#background_fetches_pool_size)
|
||||
- [execute_merges_on_single_replica_time_threshold](../../../operations/settings/settings.md#execute-merges-on-single-replica-time-threshold)
|
||||
|
||||
[Original article](https://clickhouse.tech/docs/en/operations/table_engines/replication/) <!--hide-->
|
||||
|
@ -7,7 +7,7 @@ toc_title: SummingMergeTree
|
||||
|
||||
The engine inherits from [MergeTree](../../../engines/table-engines/mergetree-family/mergetree.md#table_engines-mergetree). The difference is that when merging data parts for `SummingMergeTree` tables ClickHouse replaces all the rows with the same primary key (or more accurately, with the same [sorting key](../../../engines/table-engines/mergetree-family/mergetree.md)) with one row which contains summarized values for the columns with the numeric data type. If the sorting key is composed in a way that a single key value corresponds to large number of rows, this significantly reduces storage volume and speeds up data selection.
|
||||
|
||||
We recommend to use the engine together with `MergeTree`. Store complete data in `MergeTree` table, and use `SummingMergeTree` for aggregated data storing, for example, when preparing reports. Such an approach will prevent you from losing valuable data due to an incorrectly composed primary key.
|
||||
We recommend using the engine together with `MergeTree`. Store complete data in `MergeTree` table, and use `SummingMergeTree` for aggregated data storing, for example, when preparing reports. Such an approach will prevent you from losing valuable data due to an incorrectly composed primary key.
|
||||
|
||||
## Creating a Table {#creating-a-table}
|
||||
|
||||
|
@ -133,7 +133,7 @@ When ClickHouse inserts data, it orders rows by the primary key. If the `Version
|
||||
|
||||
## Selecting Data {#selecting-data}
|
||||
|
||||
ClickHouse doesn’t guarantee that all of the rows with the same primary key will be in the same resulting data part or even on the same physical server. This is true both for writing the data and for subsequent merging of the data parts. In addition, ClickHouse processes `SELECT` queries with multiple threads, and it cannot predict the order of rows in the result. This means that aggregation is required if there is a need to get completely “collapsed” data from a `VersionedCollapsingMergeTree` table.
|
||||
ClickHouse does not guarantee that all of the rows with the same primary key will be in the same resulting data part or even on the same physical server. This is true both for writing the data and for subsequent merging of the data parts. In addition, ClickHouse processes `SELECT` queries with multiple threads, and it cannot predict the order of rows in the result. This means that aggregation is required if there is a need to get completely “collapsed” data from a `VersionedCollapsingMergeTree` table.
|
||||
|
||||
To finalize collapsing, write a query with a `GROUP BY` clause and aggregate functions that account for the sign. For example, to calculate quantity, use `sum(Sign)` instead of `count()`. To calculate the sum of something, use `sum(Sign * x)` instead of `sum(x)`, and add `HAVING sum(Sign) > 0`.
|
||||
|
||||
@ -219,7 +219,7 @@ HAVING sum(Sign) > 0
|
||||
└─────────────────────┴───────────┴──────────┴─────────┘
|
||||
```
|
||||
|
||||
If we don’t need aggregation and want to force collapsing, we can use the `FINAL` modifier for the `FROM` clause.
|
||||
If we do not need aggregation and want to force collapsing, we can use the `FINAL` modifier for the `FROM` clause.
|
||||
|
||||
``` sql
|
||||
SELECT * FROM UAct FINAL
|
||||
|
@ -20,11 +20,11 @@ Engine parameters:
|
||||
|
||||
Optional engine parameters:
|
||||
|
||||
- `flush_time`, `flush_rows`, `flush_bytes` – Conditions for flushing data from the buffer, that will happen only in background (ommited or zero means no `flush*` parameters).
|
||||
- `flush_time`, `flush_rows`, `flush_bytes` – Conditions for flushing data from the buffer, that will happen only in background (omitted or zero means no `flush*` parameters).
|
||||
|
||||
Data is flushed from the buffer and written to the destination table if all the `min*` conditions or at least one `max*` condition are met.
|
||||
|
||||
Also if at least one `flush*` condition are met flush initiated in background, this is different from `max*`, since `flush*` allows you to configure background flushes separately to avoid adding latency for `INSERT` (into `Buffer`) queries.
|
||||
Also, if at least one `flush*` condition are met flush initiated in background, this is different from `max*`, since `flush*` allows you to configure background flushes separately to avoid adding latency for `INSERT` (into `Buffer`) queries.
|
||||
|
||||
- `min_time`, `max_time`, `flush_time` – Condition for the time in seconds from the moment of the first write to the buffer.
|
||||
- `min_rows`, `max_rows`, `flush_rows` – Condition for the number of rows in the buffer.
|
||||
@ -49,12 +49,12 @@ You can set empty strings in single quotation marks for the database and table n
|
||||
When reading from a Buffer table, data is processed both from the buffer and from the destination table (if there is one).
|
||||
Note that the Buffer tables does not support an index. In other words, data in the buffer is fully scanned, which might be slow for large buffers. (For data in a subordinate table, the index that it supports will be used.)
|
||||
|
||||
If the set of columns in the Buffer table doesn’t match the set of columns in a subordinate table, a subset of columns that exist in both tables is inserted.
|
||||
If the set of columns in the Buffer table does not match the set of columns in a subordinate table, a subset of columns that exist in both tables is inserted.
|
||||
|
||||
If the types don’t match for one of the columns in the Buffer table and a subordinate table, an error message is entered in the server log and the buffer is cleared.
|
||||
The same thing happens if the subordinate table doesn’t exist when the buffer is flushed.
|
||||
If the types do not match for one of the columns in the Buffer table and a subordinate table, an error message is entered in the server log, and the buffer is cleared.
|
||||
The same thing happens if the subordinate table does not exist when the buffer is flushed.
|
||||
|
||||
If you need to run ALTER for a subordinate table and the Buffer table, we recommend first deleting the Buffer table, running ALTER for the subordinate table, then creating the Buffer table again.
|
||||
If you need to run ALTER for a subordinate table, and the Buffer table, we recommend first deleting the Buffer table, running ALTER for the subordinate table, then creating the Buffer table again.
|
||||
|
||||
If the server is restarted abnormally, the data in the buffer is lost.
|
||||
|
||||
@ -70,6 +70,6 @@ Due to these disadvantages, we can only recommend using a Buffer table in rare c
|
||||
|
||||
A Buffer table is used when too many INSERTs are received from a large number of servers over a unit of time and data can’t be buffered before insertion, which means the INSERTs can’t run fast enough.
|
||||
|
||||
Note that it doesn’t make sense to insert data one row at a time, even for Buffer tables. This will only produce a speed of a few thousand rows per second, while inserting larger blocks of data can produce over a million rows per second (see the section “Performance”).
|
||||
Note that it does not make sense to insert data one row at a time, even for Buffer tables. This will only produce a speed of a few thousand rows per second, while inserting larger blocks of data can produce over a million rows per second (see the section “Performance”).
|
||||
|
||||
[Original article](https://clickhouse.tech/docs/en/operations/table_engines/buffer/) <!--hide-->
|
||||
|
@ -25,7 +25,7 @@ The Distributed engine accepts parameters:
|
||||
- [insert_distributed_sync](../../../operations/settings/settings.md#insert_distributed_sync) setting
|
||||
- [MergeTree](../../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-multiple-volumes) for the examples
|
||||
|
||||
Also it accept the following settings:
|
||||
Also, it accepts the following settings:
|
||||
|
||||
- `fsync_after_insert` - do the `fsync` for the file data after asynchronous insert to Distributed. Guarantees that the OS flushed the whole inserted data to a file **on the initiator node** disk.
|
||||
|
||||
@ -124,7 +124,7 @@ Replicas are duplicating servers (in order to read all the data, you can access
|
||||
Cluster names must not contain dots.
|
||||
|
||||
The parameters `host`, `port`, and optionally `user`, `password`, `secure`, `compression` are specified for each server:
|
||||
- `host` – The address of the remote server. You can use either the domain or the IPv4 or IPv6 address. If you specify the domain, the server makes a DNS request when it starts, and the result is stored as long as the server is running. If the DNS request fails, the server doesn’t start. If you change the DNS record, restart the server.
|
||||
- `host` – The address of the remote server. You can use either the domain or the IPv4 or IPv6 address. If you specify the domain, the server makes a DNS request when it starts, and the result is stored as long as the server is running. If the DNS request fails, the server does not start. If you change the DNS record, restart the server.
|
||||
- `port` – The TCP port for messenger activity (`tcp_port` in the config, usually set to 9000). Do not confuse it with http_port.
|
||||
- `user` – Name of the user for connecting to a remote server. Default value: default. This user must have access to connect to the specified server. Access is configured in the users.xml file. For more information, see the section [Access rights](../../../operations/access-rights.md).
|
||||
- `password` – The password for connecting to a remote server (not masked). Default value: empty string.
|
||||
@ -143,13 +143,13 @@ To view your clusters, use the `system.clusters` table.
|
||||
|
||||
The Distributed engine allows working with a cluster like a local server. However, the cluster is inextensible: you must write its configuration in the server config file (even better, for all the cluster’s servers).
|
||||
|
||||
The Distributed engine requires writing clusters to the config file. Clusters from the config file are updated on the fly, without restarting the server. If you need to send a query to an unknown set of shards and replicas each time, you don’t need to create a Distributed table – use the `remote` table function instead. See the section [Table functions](../../../sql-reference/table-functions/index.md).
|
||||
The Distributed engine requires writing clusters to the config file. Clusters from the config file are updated on the fly, without restarting the server. If you need to send a query to an unknown set of shards and replicas each time, you do not need to create a Distributed table – use the `remote` table function instead. See the section [Table functions](../../../sql-reference/table-functions/index.md).
|
||||
|
||||
There are two methods for writing data to a cluster:
|
||||
|
||||
First, you can define which servers to write which data to and perform the write directly on each shard. In other words, perform INSERT in the tables that the distributed table “looks at”. This is the most flexible solution as you can use any sharding scheme, which could be non-trivial due to the requirements of the subject area. This is also the most optimal solution since data can be written to different shards completely independently.
|
||||
|
||||
Second, you can perform INSERT in a Distributed table. In this case, the table will distribute the inserted data across the servers itself. In order to write to a Distributed table, it must have a sharding key set (the last parameter). In addition, if there is only one shard, the write operation works without specifying the sharding key, since it doesn’t mean anything in this case.
|
||||
Second, you can perform INSERT in a Distributed table. In this case, the table will distribute the inserted data across the servers itself. In order to write to a Distributed table, it must have a sharding key set (the last parameter). In addition, if there is only one shard, the write operation works without specifying the sharding key, since it does not mean anything in this case.
|
||||
|
||||
Each shard can have a weight defined in the config file. By default, the weight is equal to one. Data is distributed across shards in the amount proportional to the shard weight. For example, if there are two shards and the first has a weight of 9 while the second has a weight of 10, the first will be sent 9 / 19 parts of the rows, and the second will be sent 10 / 19.
|
||||
|
||||
@ -165,7 +165,7 @@ The sharding expression can be any expression from constants and table columns t
|
||||
|
||||
A simple reminder from the division is a limited solution for sharding and isn’t always appropriate. It works for medium and large volumes of data (dozens of servers), but not for very large volumes of data (hundreds of servers or more). In the latter case, use the sharding scheme required by the subject area, rather than using entries in Distributed tables.
|
||||
|
||||
SELECT queries are sent to all the shards and work regardless of how data is distributed across the shards (they can be distributed completely randomly). When you add a new shard, you don’t have to transfer the old data to it. You can write new data with a heavier weight – the data will be distributed slightly unevenly, but queries will work correctly and efficiently.
|
||||
SELECT queries are sent to all the shards and work regardless of how data is distributed across the shards (they can be distributed completely randomly). When you add a new shard, you do not have to transfer the old data to it. You can write new data with a heavier weight – the data will be distributed slightly unevenly, but queries will work correctly and efficiently.
|
||||
|
||||
You should be concerned about the sharding scheme in the following cases:
|
||||
|
||||
|
@ -9,7 +9,7 @@ ClickHouse allows sending a server the data that is needed for processing a quer
|
||||
|
||||
For example, if you have a text file with important user identifiers, you can upload it to the server along with a query that uses filtration by this list.
|
||||
|
||||
If you need to run more than one query with a large volume of external data, don’t use this feature. It is better to upload the data to the DB ahead of time.
|
||||
If you need to run more than one query with a large volume of external data, do not use this feature. It is better to upload the data to the DB ahead of time.
|
||||
|
||||
External data can be uploaded using the command-line client (in non-interactive mode), or using the HTTP interface.
|
||||
|
||||
|
@ -24,7 +24,7 @@ The `Format` parameter specifies one of the available file formats. To perform
|
||||
`INSERT` queries – for output. The available formats are listed in the
|
||||
[Formats](../../../interfaces/formats.md#formats) section.
|
||||
|
||||
ClickHouse does not allow to specify filesystem path for`File`. It will use folder defined by [path](../../../operations/server-configuration-parameters/settings.md) setting in server configuration.
|
||||
ClickHouse does not allow specifying filesystem path for`File`. It will use folder defined by [path](../../../operations/server-configuration-parameters/settings.md) setting in server configuration.
|
||||
|
||||
When creating table using `File(Format)` it creates empty subdirectory in that folder. When data is written to that table, it’s put into `data.Format` file in that subdirectory.
|
||||
|
||||
|
@ -28,7 +28,7 @@ See the detailed description of the [CREATE TABLE](../../../sql-reference/statem
|
||||
- `join_type` – [JOIN type](../../../sql-reference/statements/select/join.md#select-join-types).
|
||||
- `k1[, k2, ...]` – Key columns from the `USING` clause that the `JOIN` operation is made with.
|
||||
|
||||
Enter `join_strictness` and `join_type` parameters without quotes, for example, `Join(ANY, LEFT, col1)`. They must match the `JOIN` operation that the table will be used for. If the parameters don’t match, ClickHouse doesn’t throw an exception and may return incorrect data.
|
||||
Enter `join_strictness` and `join_type` parameters without quotes, for example, `Join(ANY, LEFT, col1)`. They must match the `JOIN` operation that the table will be used for. If the parameters do not match, ClickHouse does not throw an exception and may return incorrect data.
|
||||
|
||||
## Table Usage {#table-usage}
|
||||
|
||||
|
@ -6,7 +6,7 @@ toc_title: Memory
|
||||
# Memory Table Engine {#memory}
|
||||
|
||||
The Memory engine stores data in RAM, in uncompressed form. Data is stored in exactly the same form as it is received when read. In other words, reading from this table is completely free.
|
||||
Concurrent data access is synchronized. Locks are short: read and write operations don’t block each other.
|
||||
Concurrent data access is synchronized. Locks are short: read and write operations do not block each other.
|
||||
Indexes are not supported. Reading is parallelized.
|
||||
|
||||
Maximal productivity (over 10 GB/sec) is reached on simple queries, because there is no reading from the disk, decompressing, or deserializing data. (We should note that in many cases, the productivity of the MergeTree engine is almost as high.)
|
||||
|
@ -22,4 +22,4 @@ Here is the illustration of the difference between traditional row-oriented syst
|
||||
**Columnar**
|
||||
![Columnar](https://clickhouse.tech/docs/en/images/column-oriented.gif#)
|
||||
|
||||
A columnar database is a preferred choice for analytical applications because it allows to have many columns in a table just in case, but don’t pay the cost for unused columns on read query execution time. Column-oriented databases are designed for big data processing because and data warehousing, they often natively scale using distributed clusters of low-cost hardware to increase throughput. ClickHouse does it with combination of [distributed](../../engines/table-engines/special/distributed.md) and [replicated](../../engines/table-engines/mergetree-family/replication.md) tables.
|
||||
A columnar database is a preferred choice for analytical applications because it allows to have many columns in a table just in case, but do not pay the cost for unused columns on read query execution time. Column-oriented databases are designed for big data processing because and data warehousing, they often natively scale using distributed clusters of low-cost hardware to increase throughput. ClickHouse does it with combination of [distributed](../../engines/table-engines/special/distributed.md) and [replicated](../../engines/table-engines/mergetree-family/replication.md) tables.
|
||||
|
@ -6,7 +6,7 @@ toc_priority: 10
|
||||
|
||||
# What Does “ClickHouse” Mean? {#what-does-clickhouse-mean}
|
||||
|
||||
It’s a combination of “**Click**stream” and “Data ware**House**”. It comes from the original use case at Yandex.Metrica, where ClickHouse was supposed to keep records of all clicks by people from all over the Internet and it still does the job. You can read more about this use case on [ClickHouse history](../../introduction/history.md) page.
|
||||
It’s a combination of “**Click**stream” and “Data ware**House**”. It comes from the original use case at Yandex.Metrica, where ClickHouse was supposed to keep records of all clicks by people from all over the Internet, and it still does the job. You can read more about this use case on [ClickHouse history](../../introduction/history.md) page.
|
||||
|
||||
This two-part meaning has two consequences:
|
||||
|
||||
|
@ -15,9 +15,9 @@ One of the following batches of those t-shirts was supposed to be given away on
|
||||
|
||||
So, what does it mean? Here are some ways to translate *“не тормозит”*:
|
||||
|
||||
- If you translate it literally, it’d be something like *“ClickHouse doesn’t press the brake pedal”*.
|
||||
- If you translate it literally, it’d be something like *“ClickHouse does not press the brake pedal”*.
|
||||
- If you’d want to express it as close to how it sounds to a Russian person with IT background, it’d be something like *“If your larger system lags, it’s not because it uses ClickHouse”*.
|
||||
- Shorter, but not so precise versions could be *“ClickHouse is not slow”*, *“ClickHouse doesn’t lag”* or just *“ClickHouse is fast”*.
|
||||
- Shorter, but not so precise versions could be *“ClickHouse is not slow”*, *“ClickHouse does not lag”* or just *“ClickHouse is fast”*.
|
||||
|
||||
If you haven’t seen one of those t-shirts in person, you can check them out online in many ClickHouse-related videos. For example, this one:
|
||||
|
||||
|
@ -31,7 +31,7 @@ All database management systems could be classified into two groups: OLAP (Onlin
|
||||
|
||||
In practice OLAP and OLTP are not categories, it’s more like a spectrum. Most real systems usually focus on one of them but provide some solutions or workarounds if the opposite kind of workload is also desired. This situation often forces businesses to operate multiple storage systems integrated, which might be not so big deal but having more systems make it more expensive to maintain. So the trend of recent years is HTAP (**Hybrid Transactional/Analytical Processing**) when both kinds of the workload are handled equally well by a single database management system.
|
||||
|
||||
Even if a DBMS started as a pure OLAP or pure OLTP, they are forced to move towards that HTAP direction to keep up with their competition. And ClickHouse is no exception, initially, it has been designed as [fast-as-possible OLAP system](../../faq/general/why-clickhouse-is-so-fast.md) and it still doesn’t have full-fledged transaction support, but some features like consistent read/writes and mutations for updating/deleting data had to be added.
|
||||
Even if a DBMS started as a pure OLAP or pure OLTP, they are forced to move towards that HTAP direction to keep up with their competition. And ClickHouse is no exception, initially, it has been designed as [fast-as-possible OLAP system](../../faq/general/why-clickhouse-is-so-fast.md) and it still does not have full-fledged transaction support, but some features like consistent read/writes and mutations for updating/deleting data had to be added.
|
||||
|
||||
The fundamental trade-off between OLAP and OLTP systems remains:
|
||||
|
||||
|
@ -6,9 +6,9 @@ toc_priority: 9
|
||||
|
||||
# Who Is Using ClickHouse? {#who-is-using-clickhouse}
|
||||
|
||||
Being an open-source product makes this question not so straightforward to answer. You don’t have to tell anyone if you want to start using ClickHouse, you just go grab source code or pre-compiled packages. There’s no contract to sign and the [Apache 2.0 license](https://github.com/ClickHouse/ClickHouse/blob/master/LICENSE) allows for unconstrained software distribution.
|
||||
Being an open-source product makes this question not so straightforward to answer. You do not have to tell anyone if you want to start using ClickHouse, you just go grab source code or pre-compiled packages. There’s no contract to sign and the [Apache 2.0 license](https://github.com/ClickHouse/ClickHouse/blob/master/LICENSE) allows for unconstrained software distribution.
|
||||
|
||||
Also, the technology stack is often in a grey zone of what’s covered by an NDA. Some companies consider technologies they use as a competitive advantage even if they are open-source and don’t allow employees to share any details publicly. Some see some PR risks and allow employees to share implementation details only with their PR department approval.
|
||||
Also, the technology stack is often in a grey zone of what’s covered by an NDA. Some companies consider technologies they use as a competitive advantage even if they are open-source and do not allow employees to share any details publicly. Some see some PR risks and allow employees to share implementation details only with their PR department approval.
|
||||
|
||||
So how to tell who is using ClickHouse?
|
||||
|
||||
|
@ -39,7 +39,7 @@ Question candidates:
|
||||
- How to kill a process (query) in ClickHouse?
|
||||
- How to implement pivot (like in pandas)?
|
||||
- How to remove the default ClickHouse user through users.d?
|
||||
- Importing MySQL dump to Clickhouse
|
||||
- Importing MySQL dump to ClickHouse
|
||||
- Window function workarounds (row_number, lag/lead, running diff/sum/average)
|
||||
##}
|
||||
|
||||
|
@ -12,7 +12,7 @@ The short answer is “yes”. ClickHouse has multiple mechanisms that allow fre
|
||||
|
||||
ClickHouse allows to automatically drop values when some condition happens. This condition is configured as an expression based on any columns, usually just static offset for any timestamp column.
|
||||
|
||||
The key advantage of this approach is that it doesn’t need any external system to trigger, once TTL is configured, data removal happens automatically in background.
|
||||
The key advantage of this approach is that it does not need any external system to trigger, once TTL is configured, data removal happens automatically in background.
|
||||
|
||||
!!! note "Note"
|
||||
TTL can also be used to move data not only to [/dev/null](https://en.wikipedia.org/wiki/Null_device), but also between different storage systems, like from SSD to HDD.
|
||||
@ -21,7 +21,7 @@ More details on [configuring TTL](../../engines/table-engines/mergetree-family/m
|
||||
|
||||
## ALTER DELETE {#alter-delete}
|
||||
|
||||
ClickHouse doesn’t have real-time point deletes like in [OLTP](https://en.wikipedia.org/wiki/Online_transaction_processing) databases. The closest thing to them are mutations. They are issued as `ALTER ... DELETE` or `ALTER ... UPDATE` queries to distinguish from normal `DELETE` or `UPDATE` as they are asynchronous batch operations, not immediate modifications. The rest of syntax after `ALTER TABLE` prefix is similar.
|
||||
ClickHouse does not have real-time point deletes like in [OLTP](https://en.wikipedia.org/wiki/Online_transaction_processing) databases. The closest thing to them are mutations. They are issued as `ALTER ... DELETE` or `ALTER ... UPDATE` queries to distinguish from normal `DELETE` or `UPDATE` as they are asynchronous batch operations, not immediate modifications. The rest of syntax after `ALTER TABLE` prefix is similar.
|
||||
|
||||
`ALTER DELETE` can be issued to flexibly remove old data. If you need to do it regularly, the main downside will be the need to have an external system to submit the query. There are also some performance considerations since mutation rewrite complete parts even there’s only a single row to be deleted.
|
||||
|
||||
|
@ -25,7 +25,7 @@ Here’re some key points to get reasonable fidelity in a pre-production environ
|
||||
- Don’t make it read-only with some frozen data.
|
||||
- Don’t make it write-only with just copying data without building some typical reports.
|
||||
- Don’t wipe it clean instead of applying schema migrations.
|
||||
- Use a sample of real production data and queries. Try to choose a sample that’s still representative and makes `SELECT` queries return reasonable results. Use obfuscation if your data is sensitive and internal policies don’t allow it to leave the production environment.
|
||||
- Use a sample of real production data and queries. Try to choose a sample that’s still representative and makes `SELECT` queries return reasonable results. Use obfuscation if your data is sensitive and internal policies do not allow it to leave the production environment.
|
||||
- Make sure that pre-production is covered by your monitoring and alerting software the same way as your production environment does.
|
||||
- If your production spans across multiple datacenters or regions, make your pre-production does the same.
|
||||
- If your production uses complex features like replication, distributed table, cascading materialize views, make sure they are configured similarly in pre-production.
|
||||
@ -61,8 +61,8 @@ For production use, there are two key options: `stable` and `lts`. Here is some
|
||||
|
||||
- `stable` is the kind of package we recommend by default. They are released roughly monthly (and thus provide new features with reasonable delay) and three latest stable releases are supported in terms of diagnostics and backporting of bugfixes.
|
||||
- `lts` are released twice a year and are supported for a year after their initial release. You might prefer them over `stable` in the following cases:
|
||||
- Your company has some internal policies that don’t allow for frequent upgrades or using non-LTS software.
|
||||
- You are using ClickHouse in some secondary products that either doesn’t require any complex ClickHouse features and don’t have enough resources to keep it updated.
|
||||
- Your company has some internal policies that do not allow for frequent upgrades or using non-LTS software.
|
||||
- You are using ClickHouse in some secondary products that either does not require any complex ClickHouse features and do not have enough resources to keep it updated.
|
||||
|
||||
Many teams who initially thought that `lts` is the way to go, often switch to `stable` anyway because of some recent feature that’s important for their product.
|
||||
|
||||
|
@ -132,7 +132,7 @@ To start the server as a daemon, run:
|
||||
$ sudo service clickhouse-server start
|
||||
```
|
||||
|
||||
If you don’t have `service` command, run as
|
||||
If you do not have `service` command, run as
|
||||
|
||||
``` bash
|
||||
$ sudo /etc/init.d/clickhouse-server start
|
||||
@ -140,7 +140,7 @@ $ sudo /etc/init.d/clickhouse-server start
|
||||
|
||||
See the logs in the `/var/log/clickhouse-server/` directory.
|
||||
|
||||
If the server doesn’t start, check the configurations in the file `/etc/clickhouse-server/config.xml`.
|
||||
If the server does not start, check the configurations in the file `/etc/clickhouse-server/config.xml`.
|
||||
|
||||
You can also manually launch the server from the console:
|
||||
|
||||
@ -149,7 +149,7 @@ $ clickhouse-server --config-file=/etc/clickhouse-server/config.xml
|
||||
```
|
||||
|
||||
In this case, the log will be printed to the console, which is convenient during development.
|
||||
If the configuration file is in the current directory, you don’t need to specify the `--config-file` parameter. By default, it uses `./config.xml`.
|
||||
If the configuration file is in the current directory, you do not need to specify the `--config-file` parameter. By default, it uses `./config.xml`.
|
||||
|
||||
ClickHouse supports access restriction settings. They are located in the `users.xml` file (next to `config.xml`).
|
||||
By default, access is allowed from anywhere for the `default` user, without a password. See `user/default/networks`.
|
||||
|
@ -105,7 +105,7 @@ Syntax for creating tables is way more complicated compared to databases (see [r
|
||||
2. Table schema, i.e. list of columns and their [data types](../sql-reference/data-types/index.md).
|
||||
3. [Table engine](../engines/table-engines/index.md) and its settings, which determines all the details on how queries to this table will be physically executed.
|
||||
|
||||
Yandex.Metrica is a web analytics service, and sample dataset doesn’t cover its full functionality, so there are only two tables to create:
|
||||
Yandex.Metrica is a web analytics service, and sample dataset does not cover its full functionality, so there are only two tables to create:
|
||||
|
||||
- `hits` is a table with each action done by all users on all websites covered by the service.
|
||||
- `visits` is a table that contains pre-built sessions instead of individual actions.
|
||||
|
@ -20,7 +20,7 @@ For more information about training CatBoost models, see [Training and applying
|
||||
|
||||
## Prerequisites {#prerequisites}
|
||||
|
||||
If you don’t have the [Docker](https://docs.docker.com/install/) yet, install it.
|
||||
If you do not have the [Docker](https://docs.docker.com/install/) yet, install it.
|
||||
|
||||
!!! note "Note"
|
||||
[Docker](https://www.docker.com) is a software platform that allows you to create containers that isolate a CatBoost and ClickHouse installation from the rest of the system.
|
||||
|
@ -54,7 +54,7 @@ The higher the load on the system, the more important it is to customize the sys
|
||||
- There is one large table per query. All tables are small, except for one.
|
||||
- A query result is significantly smaller than the source data. In other words, data is filtered or aggregated, so the result fits in a single server’s RAM.
|
||||
|
||||
It is easy to see that the OLAP scenario is very different from other popular scenarios (such as OLTP or Key-Value access). So it doesn’t make sense to try to use OLTP or a Key-Value DB for processing analytical queries if you want to get decent performance. For example, if you try to use MongoDB or Redis for analytics, you will get very poor performance compared to OLAP databases.
|
||||
It is easy to see that the OLAP scenario is very different from other popular scenarios (such as OLTP or Key-Value access). So it does not make sense to try to use OLTP or a Key-Value DB for processing analytical queries if you want to get decent performance. For example, if you try to use MongoDB or Redis for analytics, you will get very poor performance compared to OLAP databases.
|
||||
|
||||
## Why Column-Oriented Databases Work Better in the OLAP Scenario {#why-column-oriented-databases-work-better-in-the-olap-scenario}
|
||||
|
||||
@ -80,15 +80,15 @@ For example, the query “count the number of records for each advertising platf
|
||||
|
||||
### CPU {#cpu}
|
||||
|
||||
Since executing a query requires processing a large number of rows, it helps to dispatch all operations for entire vectors instead of for separate rows, or to implement the query engine so that there is almost no dispatching cost. If you don’t do this, with any half-decent disk subsystem, the query interpreter inevitably stalls the CPU. It makes sense to both store data in columns and process it, when possible, by columns.
|
||||
Since executing a query requires processing a large number of rows, it helps to dispatch all operations for entire vectors instead of for separate rows, or to implement the query engine so that there is almost no dispatching cost. If you do not do this, with any half-decent disk subsystem, the query interpreter inevitably stalls the CPU. It makes sense to both store data in columns and process it, when possible, by columns.
|
||||
|
||||
There are two ways to do this:
|
||||
|
||||
1. A vector engine. All operations are written for vectors, instead of for separate values. This means you don’t need to call operations very often, and dispatching costs are negligible. Operation code contains an optimized internal cycle.
|
||||
1. A vector engine. All operations are written for vectors, instead of for separate values. This means you do not need to call operations very often, and dispatching costs are negligible. Operation code contains an optimized internal cycle.
|
||||
|
||||
2. Code generation. The code generated for the query has all the indirect calls in it.
|
||||
|
||||
This is not done in “normal” databases, because it doesn’t make sense when running simple queries. However, there are exceptions. For example, MemSQL uses code generation to reduce latency when processing SQL queries. (For comparison, analytical DBMSs require optimization of throughput, not latency.)
|
||||
This is not done in “normal” databases, because it does not make sense when running simple queries. However, there are exceptions. For example, MemSQL uses code generation to reduce latency when processing SQL queries. (For comparison, analytical DBMSs require optimization of throughput, not latency.)
|
||||
|
||||
Note that for CPU efficiency, the query language must be declarative (SQL or MDX), or at least a vector (J, K). The query should only contain implicit loops, allowing for optimization.
|
||||
|
||||
|
@ -66,7 +66,7 @@ When processing a query, the client shows:
|
||||
3. The result in the specified format.
|
||||
4. The number of lines in the result, the time passed, and the average speed of query processing.
|
||||
|
||||
You can cancel a long query by pressing Ctrl+C. However, you will still need to wait for a little for the server to abort the request. It is not possible to cancel a query at certain stages. If you don’t wait and press Ctrl+C a second time, the client will exit.
|
||||
You can cancel a long query by pressing Ctrl+C. However, you will still need to wait for a little for the server to abort the request. It is not possible to cancel a query at certain stages. If you do not wait and press Ctrl+C a second time, the client will exit.
|
||||
|
||||
The command-line client allows passing external data (external temporary tables) for querying. For more information, see the section “External data for query processing”.
|
||||
|
||||
|
@ -662,7 +662,7 @@ ClickHouse allows:
|
||||
- Any order of key-value pairs in the object.
|
||||
- Omitting some values.
|
||||
|
||||
ClickHouse ignores spaces between elements and commas after the objects. You can pass all the objects in one line. You don’t have to separate them with line breaks.
|
||||
ClickHouse ignores spaces between elements and commas after the objects. You can pass all the objects in one line. You do not have to separate them with line breaks.
|
||||
|
||||
**Omitted values processing**
|
||||
|
||||
@ -770,9 +770,9 @@ SELECT * FROM json_each_row_nested
|
||||
|
||||
## Native {#native}
|
||||
|
||||
The most efficient format. Data is written and read by blocks in binary format. For each block, the number of rows, number of columns, column names and types, and parts of columns in this block are recorded one after another. In other words, this format is “columnar” – it doesn’t convert columns to rows. This is the format used in the native interface for interaction between servers, for using the command-line client, and for C++ clients.
|
||||
The most efficient format. Data is written and read by blocks in binary format. For each block, the number of rows, number of columns, column names and types, and parts of columns in this block are recorded one after another. In other words, this format is “columnar” – it does not convert columns to rows. This is the format used in the native interface for interaction between servers, for using the command-line client, and for C++ clients.
|
||||
|
||||
You can use this format to quickly generate dumps that can only be read by the ClickHouse DBMS. It doesn’t make sense to work with this format yourself.
|
||||
You can use this format to quickly generate dumps that can only be read by the ClickHouse DBMS. It does not make sense to work with this format yourself.
|
||||
|
||||
## Null {#null}
|
||||
|
||||
@ -1039,7 +1039,7 @@ struct Message {
|
||||
}
|
||||
```
|
||||
|
||||
Deserialization is effective and usually doesn’t increase the system load.
|
||||
Deserialization is effective and usually does not increase the system load.
|
||||
|
||||
See also [Format Schema](#formatschema).
|
||||
|
||||
@ -1312,7 +1312,7 @@ ClickHouse supports configurable precision of the `Decimal` type. The `INSERT` q
|
||||
|
||||
Unsupported ORC data types: `TIME32`, `FIXED_SIZE_BINARY`, `JSON`, `UUID`, `ENUM`.
|
||||
|
||||
The data types of ClickHouse table columns don’t have to match the corresponding ORC data fields. When inserting data, ClickHouse interprets data types according to the table above and then [casts](../sql-reference/functions/type-conversion-functions.md#type_conversion_function-cast) the data to the data type set for the ClickHouse table column.
|
||||
The data types of ClickHouse table columns do not have to match the corresponding ORC data fields. When inserting data, ClickHouse interprets data types according to the table above and then [casts](../sql-reference/functions/type-conversion-functions.md#type_conversion_function-cast) the data to the data type set for the ClickHouse table column.
|
||||
|
||||
### Inserting Data {#inserting-data-2}
|
||||
|
||||
|
@ -52,7 +52,7 @@ X-ClickHouse-Summary: {"read_rows":"0","read_bytes":"0","written_rows":"0","writ
|
||||
```
|
||||
|
||||
As you can see, curl is somewhat inconvenient in that spaces must be URL escaped.
|
||||
Although wget escapes everything itself, we don’t recommend using it because it doesn’t work well over HTTP 1.1 when using keep-alive and Transfer-Encoding: chunked.
|
||||
Although wget escapes everything itself, we do not recommend using it because it does not work well over HTTP 1.1 when using keep-alive and Transfer-Encoding: chunked.
|
||||
|
||||
``` bash
|
||||
$ echo 'SELECT 1' | curl 'http://localhost:8123/' --data-binary @-
|
||||
@ -146,7 +146,7 @@ Deleting the table.
|
||||
$ echo 'DROP TABLE t' | curl 'http://localhost:8123/' --data-binary @-
|
||||
```
|
||||
|
||||
For successful requests that don’t return a data table, an empty response body is returned.
|
||||
For successful requests that do not return a data table, an empty response body is returned.
|
||||
|
||||
|
||||
## Compression {#compression}
|
||||
@ -273,7 +273,7 @@ Possible header fields:
|
||||
- `written_rows` — Number of rows written.
|
||||
- `written_bytes` — Volume of data written in bytes.
|
||||
|
||||
Running requests don’t stop automatically if the HTTP connection is lost. Parsing and data formatting are performed on the server-side, and using the network might be ineffective.
|
||||
Running requests do not stop automatically if the HTTP connection is lost. Parsing and data formatting are performed on the server-side, and using the network might be ineffective.
|
||||
The optional ‘query_id’ parameter can be passed as the query ID (any string). For more information, see the section “Settings, replace_running_query”.
|
||||
|
||||
The optional ‘quota_key’ parameter can be passed as the quota key (any string). For more information, see the section “Quotas”.
|
||||
|
@ -61,7 +61,7 @@ Unlike other database management systems, secondary indexes in ClickHouse does n
|
||||
|
||||
## Suitable for Online Queries {#suitable-for-online-queries}
|
||||
|
||||
Most OLAP database management systems don’t aim for online queries with sub-second latencies. In alternative systems, report building time of tens of seconds or even minutes is often considered acceptable. Sometimes it takes even more which forces to prepare reports offline (in advance or by responding with “come back later”).
|
||||
Most OLAP database management systems do not aim for online queries with sub-second latencies. In alternative systems, report building time of tens of seconds or even minutes is often considered acceptable. Sometimes it takes even more which forces to prepare reports offline (in advance or by responding with “come back later”).
|
||||
|
||||
In ClickHouse low latency means that queries can be processed without delay and without trying to prepare an answer in advance, right at the same moment while the user interface page is loading. In other words, online.
|
||||
|
||||
|
@ -31,7 +31,7 @@ To see all users, roles, profiles, etc. and all their grants use [SHOW ACCESS](.
|
||||
|
||||
## Usage {#access-control-usage}
|
||||
|
||||
By default, the ClickHouse server provides the `default` user account which is not allowed using SQL-driven access control and account management but has all the rights and permissions. The `default` user account is used in any cases when the username is not defined, for example, at login from client or in distributed queries. In distributed query processing a default user account is used, if the configuration of the server or cluster doesn’t specify the [user and password](../engines/table-engines/special/distributed.md) properties.
|
||||
By default, the ClickHouse server provides the `default` user account which is not allowed using SQL-driven access control and account management but has all the rights and permissions. The `default` user account is used in any cases when the username is not defined, for example, at login from client or in distributed queries. In distributed query processing a default user account is used, if the configuration of the server or cluster does not specify the [user and password](../engines/table-engines/special/distributed.md) properties.
|
||||
|
||||
If you just started using ClickHouse, consider the following scenario:
|
||||
|
||||
|
@ -5,7 +5,7 @@ toc_title: Data Backup
|
||||
|
||||
# Data Backup {#data-backup}
|
||||
|
||||
While [replication](../engines/table-engines/mergetree-family/replication.md) provides protection from hardware failures, it does not protect against human errors: accidental deletion of data, deletion of the wrong table or a table on the wrong cluster, and software bugs that result in incorrect data processing or data corruption. In many cases mistakes like these will affect all replicas. ClickHouse has built-in safeguards to prevent some types of mistakes — for example, by default [you can’t just drop tables with a MergeTree-like engine containing more than 50 Gb of data](server-configuration-parameters/settings.md#max-table-size-to-drop). However, these safeguards don’t cover all possible cases and can be circumvented.
|
||||
While [replication](../engines/table-engines/mergetree-family/replication.md) provides protection from hardware failures, it does not protect against human errors: accidental deletion of data, deletion of the wrong table or a table on the wrong cluster, and software bugs that result in incorrect data processing or data corruption. In many cases mistakes like these will affect all replicas. ClickHouse has built-in safeguards to prevent some types of mistakes — for example, by default [you can’t just drop tables with a MergeTree-like engine containing more than 50 Gb of data](server-configuration-parameters/settings.md#max-table-size-to-drop). However, these safeguards do not cover all possible cases and can be circumvented.
|
||||
|
||||
In order to effectively mitigate possible human errors, you should carefully prepare a strategy for backing up and restoring your data **in advance**.
|
||||
|
||||
@ -30,7 +30,7 @@ For smaller volumes of data, a simple `INSERT INTO ... SELECT ...` to remote tab
|
||||
|
||||
## Manipulations with Parts {#manipulations-with-parts}
|
||||
|
||||
ClickHouse allows using the `ALTER TABLE ... FREEZE PARTITION ...` query to create a local copy of table partitions. This is implemented using hardlinks to the `/var/lib/clickhouse/shadow/` folder, so it usually does not consume extra disk space for old data. The created copies of files are not handled by ClickHouse server, so you can just leave them there: you will have a simple backup that doesn’t require any additional external system, but it will still be prone to hardware issues. For this reason, it’s better to remotely copy them to another location and then remove the local copies. Distributed filesystems and object stores are still a good options for this, but normal attached file servers with a large enough capacity might work as well (in this case the transfer will occur via the network filesystem or maybe [rsync](https://en.wikipedia.org/wiki/Rsync)).
|
||||
ClickHouse allows using the `ALTER TABLE ... FREEZE PARTITION ...` query to create a local copy of table partitions. This is implemented using hardlinks to the `/var/lib/clickhouse/shadow/` folder, so it usually does not consume extra disk space for old data. The created copies of files are not handled by ClickHouse server, so you can just leave them there: you will have a simple backup that does not require any additional external system, but it will still be prone to hardware issues. For this reason, it’s better to remotely copy them to another location and then remove the local copies. Distributed filesystems and object stores are still a good options for this, but normal attached file servers with a large enough capacity might work as well (in this case the transfer will occur via the network filesystem or maybe [rsync](https://en.wikipedia.org/wiki/Rsync)).
|
||||
Data can be restored from backup using the `ALTER TABLE ... ATTACH PARTITION ...`
|
||||
|
||||
For more information about queries related to partition manipulations, see the [ALTER documentation](../sql-reference/statements/alter/partition.md#alter_manipulations-with-partitions).
|
||||
|
@ -5,9 +5,9 @@ toc_title: Configuration Files
|
||||
|
||||
# Configuration Files {#configuration_files}
|
||||
|
||||
ClickHouse supports multi-file configuration management. The main server configuration file is `/etc/clickhouse-server/config.xml`. Other files must be in the `/etc/clickhouse-server/config.d` directory.
|
||||
ClickHouse supports multi-file configuration management. The main server configuration file is `/etc/clickhouse-server/config.xml` or `/etc/clickhouse-server/config.yaml`. Other files must be in the `/etc/clickhouse-server/config.d` directory. Note, that any configuration file can be written either in XML or YAML, but mixing formats in one file is not supported. For example, you can have main configs as `config.xml` and `users.xml` and write additional files in `config.d` and `users.d` directories in `.yaml`.
|
||||
|
||||
All the configuration files should be in XML format. Also, they should have the same root element, usually `<yandex>`.
|
||||
All the configuration files should be in XML or YAML formats. All XML files should have the same root element, usually `<yandex>`. As for YAML, `yandex:` should not be present, the parser will insert it automatically.
|
||||
|
||||
## Override {#override}
|
||||
|
||||
@ -32,7 +32,7 @@ Users configuration can be splitted into separate files similar to `config.xml`
|
||||
Directory name is defined as `users_config` setting without `.xml` postfix concatenated with `.d`.
|
||||
Directory `users.d` is used by default, as `users_config` defaults to `users.xml`.
|
||||
|
||||
## Example {#example}
|
||||
## XML example {#example}
|
||||
|
||||
For example, you can have separate config file for each user like this:
|
||||
|
||||
@ -55,6 +55,70 @@ $ cat /etc/clickhouse-server/users.d/alice.xml
|
||||
</yandex>
|
||||
```
|
||||
|
||||
## YAML examples {#example}
|
||||
|
||||
Here you can see default config written in YAML: [config.yaml.example](https://github.com/ClickHouse/ClickHouse/blob/master/programs/server/config.yaml.example).
|
||||
|
||||
There are some differences between YAML and XML formats in terms of ClickHouse configurations. Here are some tips for writing a configuration in YAML format.
|
||||
|
||||
You should use a Scalar node to write a key-value pair:
|
||||
``` yaml
|
||||
key: value
|
||||
```
|
||||
|
||||
To create a node, containing other nodes you should use a Map:
|
||||
``` yaml
|
||||
map_key:
|
||||
key1: val1
|
||||
key2: val2
|
||||
key3: val3
|
||||
```
|
||||
|
||||
To create a list of values or nodes assigned to one tag you should use a Sequence:
|
||||
``` yaml
|
||||
seq_key:
|
||||
- val1
|
||||
- val2
|
||||
- key1: val3
|
||||
- map:
|
||||
key2: val4
|
||||
key3: val5
|
||||
```
|
||||
|
||||
If you want to write an attribute for a Sequence or Map node, you should use a @ prefix before the attribute key. Note, that @ is reserved by YAML standard, so you should also to wrap it into double quotes:
|
||||
|
||||
``` yaml
|
||||
map:
|
||||
"@attr1": value1
|
||||
"@attr2": value2
|
||||
key: 123
|
||||
```
|
||||
|
||||
From that Map we will get these XML nodes:
|
||||
|
||||
``` xml
|
||||
<map attr1="value1" attr2="value2">
|
||||
<key>123</key>
|
||||
</map>
|
||||
```
|
||||
|
||||
You can also set attributes for Sequence:
|
||||
|
||||
``` yaml
|
||||
seq:
|
||||
- "@attr1": value1
|
||||
- "@attr2": value2
|
||||
- 123
|
||||
- abc
|
||||
```
|
||||
|
||||
So, we can get YAML config equal to this XML one:
|
||||
|
||||
``` xml
|
||||
<seq attr1="value1" attr2="value2">123</seq>
|
||||
<seq attr1="value1" attr2="value2">abc</seq>
|
||||
```
|
||||
|
||||
## Implementation Details {#implementation-details}
|
||||
|
||||
For each config file, the server also generates `file-preprocessed.xml` files when starting. These files contain all the completed substitutions and overrides, and they are intended for informational use. If ZooKeeper substitutions were used in the config files but ZooKeeper is not available on the server start, the server loads the configuration from the preprocessed file.
|
||||
|
@ -17,6 +17,7 @@ To define LDAP server you must add `ldap_servers` section to the `config.xml`.
|
||||
<yandex>
|
||||
<!- ... -->
|
||||
<ldap_servers>
|
||||
<!- Typical LDAP server. -->
|
||||
<my_ldap_server>
|
||||
<host>localhost</host>
|
||||
<port>636</port>
|
||||
@ -31,6 +32,18 @@ To define LDAP server you must add `ldap_servers` section to the `config.xml`.
|
||||
<tls_ca_cert_dir>/path/to/tls_ca_cert_dir</tls_ca_cert_dir>
|
||||
<tls_cipher_suite>ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:AES256-GCM-SHA384</tls_cipher_suite>
|
||||
</my_ldap_server>
|
||||
|
||||
<!- Typical Active Directory with configured user DN detection for further role mapping. -->
|
||||
<my_ad_server>
|
||||
<host>localhost</host>
|
||||
<port>389</port>
|
||||
<bind_dn>EXAMPLE\{user_name}</bind_dn>
|
||||
<user_dn_detection>
|
||||
<base_dn>CN=Users,DC=example,DC=com</base_dn>
|
||||
<search_filter>(&(objectClass=user)(sAMAccountName={user_name}))</search_filter>
|
||||
</user_dn_detection>
|
||||
<enable_tls>no</enable_tls>
|
||||
</my_ad_server>
|
||||
</ldap_servers>
|
||||
</yandex>
|
||||
```
|
||||
@ -43,6 +56,15 @@ Note, that you can define multiple LDAP servers inside the `ldap_servers` sectio
|
||||
- `port` — LDAP server port, default is `636` if `enable_tls` is set to `true`, `389` otherwise.
|
||||
- `bind_dn` — Template used to construct the DN to bind to.
|
||||
- The resulting DN will be constructed by replacing all `{user_name}` substrings of the template with the actual user name during each authentication attempt.
|
||||
- `user_dn_detection` - Section with LDAP search parameters for detecting the actual user DN of the bound user.
|
||||
- This is mainly used in search filters for further role mapping when the server is Active Directory. The resulting user DN will be used when replacing `{user_dn}` substrings wherever they are allowed. By default, user DN is set equal to bind DN, but once search is performed, it will be updated with to the actual detected user DN value.
|
||||
- `base_dn` - Template used to construct the base DN for the LDAP search.
|
||||
- The resulting DN will be constructed by replacing all `{user_name}` and `{bind_dn}` substrings of the template with the actual user name and bind DN during the LDAP search.
|
||||
- `scope` - Scope of the LDAP search.
|
||||
- Accepted values are: `base`, `one_level`, `children`, `subtree` (the default).
|
||||
- `search_filter` - Template used to construct the search filter for the LDAP search.
|
||||
- The resulting filter will be constructed by replacing all `{user_name}`, `{bind_dn}`, and `{base_dn}` substrings of the template with the actual user name, bind DN, and base DN during the LDAP search.
|
||||
- Note, that the special characters must be escaped properly in XML.
|
||||
- `verification_cooldown` — A period of time, in seconds, after a successful bind attempt, during which the user will be assumed to be successfully authenticated for all consecutive requests without contacting the LDAP server.
|
||||
- Specify `0` (the default) to disable caching and force contacting the LDAP server for each authentication request.
|
||||
- `enable_tls` — A flag to trigger the use of the secure connection to the LDAP server.
|
||||
@ -107,7 +129,7 @@ Goes into `config.xml`.
|
||||
<yandex>
|
||||
<!- ... -->
|
||||
<user_directories>
|
||||
<!- ... -->
|
||||
<!- Typical LDAP server. -->
|
||||
<ldap>
|
||||
<server>my_ldap_server</server>
|
||||
<roles>
|
||||
@ -122,6 +144,18 @@ Goes into `config.xml`.
|
||||
<prefix>clickhouse_</prefix>
|
||||
</role_mapping>
|
||||
</ldap>
|
||||
|
||||
<!- Typical Active Directory with role mapping that relies on the detected user DN. -->
|
||||
<ldap>
|
||||
<server>my_ad_server</server>
|
||||
<role_mapping>
|
||||
<base_dn>CN=Users,DC=example,DC=com</base_dn>
|
||||
<attribute>CN</attribute>
|
||||
<scope>subtree</scope>
|
||||
<search_filter>(&(objectClass=group)(member={user_dn}))</search_filter>
|
||||
<prefix>clickhouse_</prefix>
|
||||
</role_mapping>
|
||||
</ldap>
|
||||
</user_directories>
|
||||
</yandex>
|
||||
```
|
||||
@ -137,13 +171,13 @@ Note that `my_ldap_server` referred in the `ldap` section inside the `user_direc
|
||||
- When a user authenticates, while still bound to LDAP, an LDAP search is performed using `search_filter` and the name of the logged-in user. For each entry found during that search, the value of the specified attribute is extracted. For each attribute value that has the specified prefix, the prefix is removed, and the rest of the value becomes the name of a local role defined in ClickHouse, which is expected to be created beforehand by the [CREATE ROLE](../../sql-reference/statements/create/role.md#create-role-statement) statement.
|
||||
- There can be multiple `role_mapping` sections defined inside the same `ldap` section. All of them will be applied.
|
||||
- `base_dn` — Template used to construct the base DN for the LDAP search.
|
||||
- The resulting DN will be constructed by replacing all `{user_name}` and `{bind_dn}` substrings of the template with the actual user name and bind DN during each LDAP search.
|
||||
- The resulting DN will be constructed by replacing all `{user_name}`, `{bind_dn}`, and `{user_dn}` substrings of the template with the actual user name, bind DN, and user DN during each LDAP search.
|
||||
- `scope` — Scope of the LDAP search.
|
||||
- Accepted values are: `base`, `one_level`, `children`, `subtree` (the default).
|
||||
- `search_filter` — Template used to construct the search filter for the LDAP search.
|
||||
- The resulting filter will be constructed by replacing all `{user_name}`, `{bind_dn}` and `{base_dn}` substrings of the template with the actual user name, bind DN and base DN during each LDAP search.
|
||||
- The resulting filter will be constructed by replacing all `{user_name}`, `{bind_dn}`, `{user_dn}`, and `{base_dn}` substrings of the template with the actual user name, bind DN, user DN, and base DN during each LDAP search.
|
||||
- Note, that the special characters must be escaped properly in XML.
|
||||
- `attribute` — Attribute name whose values will be returned by the LDAP search.
|
||||
- `attribute` — Attribute name whose values will be returned by the LDAP search. `cn`, by default.
|
||||
- `prefix` — Prefix, that will be expected to be in front of each string in the original list of strings returned by the LDAP search. The prefix will be removed from the original strings and the resulting strings will be treated as local role names. Empty by default.
|
||||
|
||||
[Original article](https://clickhouse.tech/docs/en/operations/external-authenticators/ldap/) <!--hide-->
|
||||
|
@ -11,13 +11,13 @@ To use profiler:
|
||||
|
||||
- Setup the [trace_log](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-trace_log) section of the server configuration.
|
||||
|
||||
This section configures the [trace_log](../../operations/system-tables/trace_log.md#system_tables-trace_log) system table containing the results of the profiler functioning. It is configured by default. Remember that data in this table is valid only for a running server. After the server restart, ClickHouse doesn’t clean up the table and all the stored virtual memory address may become invalid.
|
||||
This section configures the [trace_log](../../operations/system-tables/trace_log.md#system_tables-trace_log) system table containing the results of the profiler functioning. It is configured by default. Remember that data in this table is valid only for a running server. After the server restart, ClickHouse does not clean up the table and all the stored virtual memory address may become invalid.
|
||||
|
||||
- Setup the [query_profiler_cpu_time_period_ns](../../operations/settings/settings.md#query_profiler_cpu_time_period_ns) or [query_profiler_real_time_period_ns](../../operations/settings/settings.md#query_profiler_real_time_period_ns) settings. Both settings can be used simultaneously.
|
||||
|
||||
These settings allow you to configure profiler timers. As these are the session settings, you can get different sampling frequency for the whole server, individual users or user profiles, for your interactive session, and for each individual query.
|
||||
|
||||
The default sampling frequency is one sample per second and both CPU and real timers are enabled. This frequency allows collecting enough information about ClickHouse cluster. At the same time, working with this frequency, profiler doesn’t affect ClickHouse server’s performance. If you need to profile each individual query try to use higher sampling frequency.
|
||||
The default sampling frequency is one sample per second and both CPU and real timers are enabled. This frequency allows collecting enough information about ClickHouse cluster. At the same time, working with this frequency, profiler does not affect ClickHouse server’s performance. If you need to profile each individual query try to use higher sampling frequency.
|
||||
|
||||
To analyze the `trace_log` system table:
|
||||
|
||||
|
@ -72,7 +72,7 @@ The resource consumption calculated for each interval is output to the server lo
|
||||
</statbox>
|
||||
```
|
||||
|
||||
For the ‘statbox’ quota, restrictions are set for every hour and for every 24 hours (86,400 seconds). The time interval is counted, starting from an implementation-defined fixed moment in time. In other words, the 24-hour interval doesn’t necessarily begin at midnight.
|
||||
For the ‘statbox’ quota, restrictions are set for every hour and for every 24 hours (86,400 seconds). The time interval is counted, starting from an implementation-defined fixed moment in time. In other words, the 24-hour interval does not necessarily begin at midnight.
|
||||
|
||||
When the interval ends, all collected values are cleared. For the next hour, the quota calculation starts over.
|
||||
|
||||
|
@ -502,12 +502,12 @@ The default `max_server_memory_usage` value is calculated as `memory_amount * ma
|
||||
|
||||
## max_server_memory_usage_to_ram_ratio {#max_server_memory_usage_to_ram_ratio}
|
||||
|
||||
Defines the fraction of total physical RAM amount, available to the Clickhouse server. If the server tries to utilize more, the memory is cut down to the appropriate amount.
|
||||
Defines the fraction of total physical RAM amount, available to the ClickHouse server. If the server tries to utilize more, the memory is cut down to the appropriate amount.
|
||||
|
||||
Possible values:
|
||||
|
||||
- Positive double.
|
||||
- 0 — The Clickhouse server can use all available RAM.
|
||||
- 0 — The ClickHouse server can use all available RAM.
|
||||
|
||||
Default value: `0`.
|
||||
|
||||
@ -826,7 +826,7 @@ Use the following parameters to configure logging:
|
||||
- `engine` - [MergeTree Engine Definition](../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-creating-a-table) for a system table. Can't be used if `partition_by` defined.
|
||||
- `flush_interval_milliseconds` – Interval for flushing data from the buffer in memory to the table.
|
||||
|
||||
If the table doesn’t exist, ClickHouse will create it. If the structure of the query log changed when the ClickHouse server was updated, the table with the old structure is renamed, and a new table is created automatically.
|
||||
If the table does not exist, ClickHouse will create it. If the structure of the query log changed when the ClickHouse server was updated, the table with the old structure is renamed, and a new table is created automatically.
|
||||
|
||||
**Example**
|
||||
|
||||
@ -853,7 +853,7 @@ Use the following parameters to configure logging:
|
||||
- `engine` - [MergeTree Engine Definition](../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-creating-a-table) for a system table. Can't be used if `partition_by` defined.
|
||||
- `flush_interval_milliseconds` – Interval for flushing data from the buffer in memory to the table.
|
||||
|
||||
If the table doesn’t exist, ClickHouse will create it. If the structure of the query thread log changed when the ClickHouse server was updated, the table with the old structure is renamed, and a new table is created automatically.
|
||||
If the table does not exist, ClickHouse will create it. If the structure of the query thread log changed when the ClickHouse server was updated, the table with the old structure is renamed, and a new table is created automatically.
|
||||
|
||||
**Example**
|
||||
|
||||
@ -1155,7 +1155,7 @@ This setting only applies to the `MergeTree` family. It can be specified:
|
||||
If `use_minimalistic_part_header_in_zookeeper = 1`, then [replicated](../../engines/table-engines/mergetree-family/replication.md) tables store the headers of the data parts compactly using a single `znode`. If the table contains many columns, this storage method significantly reduces the volume of the data stored in Zookeeper.
|
||||
|
||||
!!! attention "Attention"
|
||||
After applying `use_minimalistic_part_header_in_zookeeper = 1`, you can’t downgrade the ClickHouse server to a version that doesn’t support this setting. Be careful when upgrading ClickHouse on servers in a cluster. Don’t upgrade all the servers at once. It is safer to test new versions of ClickHouse in a test environment, or on just a few servers of a cluster.
|
||||
After applying `use_minimalistic_part_header_in_zookeeper = 1`, you can’t downgrade the ClickHouse server to a version that does not support this setting. Be careful when upgrading ClickHouse on servers in a cluster. Don’t upgrade all the servers at once. It is safer to test new versions of ClickHouse in a test environment, or on just a few servers of a cluster.
|
||||
|
||||
Data part headers already stored with this setting can't be restored to their previous (non-compact) representation.
|
||||
|
||||
|
@ -251,4 +251,15 @@ Possible values:
|
||||
|
||||
Default value: -1 (unlimited).
|
||||
|
||||
## allow_floating_point_partition_key {#allow_floating_point_partition_key}
|
||||
|
||||
Enables to allow floating-point number as a partition key.
|
||||
|
||||
Possible values:
|
||||
|
||||
- 0 — Floating-point partition key not allowed.
|
||||
- 1 — Floating-point partition key allowed.
|
||||
|
||||
Default value: `0`.
|
||||
|
||||
[Original article](https://clickhouse.tech/docs/en/operations/settings/merge_tree_settings/) <!--hide-->
|
||||
|
@ -19,7 +19,7 @@ It can take one of two values: `throw` or `break`. Restrictions on aggregation (
|
||||
|
||||
`break` – Stop executing the query and return the partial result, as if the source data ran out.
|
||||
|
||||
`any (only for group_by_overflow_mode)` – Continuing aggregation for the keys that got into the set, but don’t add new keys to the set.
|
||||
`any (only for group_by_overflow_mode)` – Continuing aggregation for the keys that got into the set, but do not add new keys to the set.
|
||||
|
||||
## max_memory_usage {#settings_max_memory_usage}
|
||||
|
||||
@ -27,7 +27,7 @@ The maximum amount of RAM to use for running a query on a single server.
|
||||
|
||||
In the default configuration file, the maximum is 10 GB.
|
||||
|
||||
The setting doesn’t consider the volume of available memory or the total volume of memory on the machine.
|
||||
The setting does not consider the volume of available memory or the total volume of memory on the machine.
|
||||
The restriction applies to a single query within a single server.
|
||||
You can use `SHOW PROCESSLIST` to see the current memory consumption for each query.
|
||||
Besides, the peak memory consumption is tracked for each query and written to the log.
|
||||
@ -288,7 +288,7 @@ Defines what action ClickHouse performs when any of the following join limits is
|
||||
Possible values:
|
||||
|
||||
- `THROW` — ClickHouse throws an exception and breaks operation.
|
||||
- `BREAK` — ClickHouse breaks operation and doesn’t throw an exception.
|
||||
- `BREAK` — ClickHouse breaks operation and does not throw an exception.
|
||||
|
||||
Default value: `THROW`.
|
||||
|
||||
|
@ -365,13 +365,37 @@ throws an exception.
|
||||
|
||||
## input_format_null_as_default {#settings-input-format-null-as-default}
|
||||
|
||||
Enables or disables using default values if input data contain `NULL`, but the data type of the corresponding column in not `Nullable(T)` (for text input formats).
|
||||
Enables or disables the initialization of [NULL](../../sql-reference/syntax.md#null-literal) fields with [default values](../../sql-reference/statements/create/table.md#create-default-values), if data type of these fields is not [nullable](../../sql-reference/data-types/nullable.md#data_type-nullable).
|
||||
If column type is not nullable and this setting is disabled, then inserting `NULL` causes an exception. If column type is nullable, then `NULL` values are inserted as is, regardless of this setting.
|
||||
|
||||
This setting is applicable to [INSERT ... VALUES](../../sql-reference/statements/insert-into.md) queries for text input formats.
|
||||
|
||||
Possible values:
|
||||
|
||||
- 0 — Inserting `NULL` into a not nullable column causes an exception.
|
||||
- 1 — `NULL` fields are initialized with default column values.
|
||||
|
||||
Default value: `1`.
|
||||
|
||||
## insert_null_as_default {#insert_null_as_default}
|
||||
|
||||
Enables or disables the insertion of [default values](../../sql-reference/statements/create/table.md#create-default-values) instead of [NULL](../../sql-reference/syntax.md#null-literal) into columns with not [nullable](../../sql-reference/data-types/nullable.md#data_type-nullable) data type.
|
||||
If column type is not nullable and this setting is disabled, then inserting `NULL` causes an exception. If column type is nullable, then `NULL` values are inserted as is, regardless of this setting.
|
||||
|
||||
This setting is applicable to [INSERT ... SELECT](../../sql-reference/statements/insert-into.md#insert_query_insert-select) queries. Note that `SELECT` subqueries may be concatenated with `UNION ALL` clause.
|
||||
|
||||
Possible values:
|
||||
|
||||
- 0 — Inserting `NULL` into a not nullable column causes an exception.
|
||||
- 1 — Default column value is inserted instead of `NULL`.
|
||||
|
||||
Default value: `1`.
|
||||
|
||||
## input_format_skip_unknown_fields {#settings-input-format-skip-unknown-fields}
|
||||
|
||||
Enables or disables skipping insertion of extra data.
|
||||
|
||||
When writing data, ClickHouse throws an exception if input data contain columns that do not exist in the target table. If skipping is enabled, ClickHouse doesn’t insert extra data and doesn’t throw an exception.
|
||||
When writing data, ClickHouse throws an exception if input data contain columns that do not exist in the target table. If skipping is enabled, ClickHouse does not insert extra data and does not throw an exception.
|
||||
|
||||
Supported formats:
|
||||
|
||||
@ -428,7 +452,7 @@ Default value: 1.
|
||||
|
||||
Allows choosing a parser of the text representation of date and time.
|
||||
|
||||
The setting doesn’t apply to [date and time functions](../../sql-reference/functions/date-time-functions.md).
|
||||
The setting does not apply to [date and time functions](../../sql-reference/functions/date-time-functions.md).
|
||||
|
||||
Possible values:
|
||||
|
||||
@ -455,15 +479,15 @@ Possible values:
|
||||
|
||||
- `simple` - Simple output format.
|
||||
|
||||
Clickhouse output date and time `YYYY-MM-DD hh:mm:ss` format. For example, `2019-08-20 10:18:56`. The calculation is performed according to the data type's time zone (if present) or server time zone.
|
||||
ClickHouse output date and time `YYYY-MM-DD hh:mm:ss` format. For example, `2019-08-20 10:18:56`. The calculation is performed according to the data type's time zone (if present) or server time zone.
|
||||
|
||||
- `iso` - ISO output format.
|
||||
|
||||
Clickhouse output date and time in [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) `YYYY-MM-DDThh:mm:ssZ` format. For example, `2019-08-20T10:18:56Z`. Note that output is in UTC (`Z` means UTC).
|
||||
ClickHouse output date and time in [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) `YYYY-MM-DDThh:mm:ssZ` format. For example, `2019-08-20T10:18:56Z`. Note that output is in UTC (`Z` means UTC).
|
||||
|
||||
- `unix_timestamp` - Unix timestamp output format.
|
||||
|
||||
Clickhouse output date and time in [Unix timestamp](https://en.wikipedia.org/wiki/Unix_time) format. For example `1566285536`.
|
||||
ClickHouse output date and time in [Unix timestamp](https://en.wikipedia.org/wiki/Unix_time) format. For example `1566285536`.
|
||||
|
||||
Default value: `simple`.
|
||||
|
||||
@ -662,7 +686,7 @@ Default value: 8.
|
||||
|
||||
## merge_tree_max_rows_to_use_cache {#setting-merge-tree-max-rows-to-use-cache}
|
||||
|
||||
If ClickHouse should read more than `merge_tree_max_rows_to_use_cache` rows in one query, it doesn’t use the cache of uncompressed blocks.
|
||||
If ClickHouse should read more than `merge_tree_max_rows_to_use_cache` rows in one query, it does not use the cache of uncompressed blocks.
|
||||
|
||||
The cache of uncompressed blocks stores data extracted for queries. ClickHouse uses this cache to speed up responses to repeated small queries. This setting protects the cache from trashing by queries that read a large amount of data. The [uncompressed_cache_size](../../operations/server-configuration-parameters/settings.md#server-settings-uncompressed_cache_size) server setting defines the size of the cache of uncompressed blocks.
|
||||
|
||||
@ -674,7 +698,7 @@ Default value: 128 ✕ 8192.
|
||||
|
||||
## merge_tree_max_bytes_to_use_cache {#setting-merge-tree-max-bytes-to-use-cache}
|
||||
|
||||
If ClickHouse should read more than `merge_tree_max_bytes_to_use_cache` bytes in one query, it doesn’t use the cache of uncompressed blocks.
|
||||
If ClickHouse should read more than `merge_tree_max_bytes_to_use_cache` bytes in one query, it does not use the cache of uncompressed blocks.
|
||||
|
||||
The cache of uncompressed blocks stores data extracted for queries. ClickHouse uses this cache to speed up responses to repeated small queries. This setting protects the cache from trashing by queries that read a large amount of data. The [uncompressed_cache_size](../../operations/server-configuration-parameters/settings.md#server-settings-uncompressed_cache_size) server setting defines the size of the cache of uncompressed blocks.
|
||||
|
||||
@ -816,8 +840,8 @@ Result:
|
||||
The size of blocks (in a count of rows) to form for insertion into a table.
|
||||
This setting only applies in cases when the server forms the blocks.
|
||||
For example, for an INSERT via the HTTP interface, the server parses the data format and forms blocks of the specified size.
|
||||
But when using clickhouse-client, the client parses the data itself, and the ‘max_insert_block_size’ setting on the server doesn’t affect the size of the inserted blocks.
|
||||
The setting also doesn’t have a purpose when using INSERT SELECT, since data is inserted using the same blocks that are formed after SELECT.
|
||||
But when using clickhouse-client, the client parses the data itself, and the ‘max_insert_block_size’ setting on the server does not affect the size of the inserted blocks.
|
||||
The setting also does not have a purpose when using INSERT SELECT, since data is inserted using the same blocks that are formed after SELECT.
|
||||
|
||||
Default value: 1,048,576.
|
||||
|
||||
@ -887,7 +911,7 @@ Higher values will lead to higher memory usage.
|
||||
The maximum size of blocks of uncompressed data before compressing for writing to a table. By default, 1,048,576 (1 MiB). Specifying smaller block size generally leads to slightly reduced compression ratio, the compression and decompression speed increases slightly due to cache locality, and memory consumption is reduced.
|
||||
|
||||
!!! note "Warning"
|
||||
This is an expert-level setting, and you shouldn't change it if you're just getting started with Clickhouse.
|
||||
This is an expert-level setting, and you shouldn't change it if you're just getting started with ClickHouse.
|
||||
|
||||
Don’t confuse blocks for compression (a chunk of memory consisting of bytes) with blocks for query processing (a set of rows from a table).
|
||||
|
||||
@ -904,7 +928,7 @@ We are writing a UInt32-type column (4 bytes per value). When writing 8192 rows,
|
||||
We are writing a URL column with the String type (average size of 60 bytes per value). When writing 8192 rows, the average will be slightly less than 500 KB of data. Since this is more than 65,536, a compressed block will be formed for each mark. In this case, when reading data from the disk in the range of a single mark, extra data won’t be decompressed.
|
||||
|
||||
!!! note "Warning"
|
||||
This is an expert-level setting, and you shouldn't change it if you're just getting started with Clickhouse.
|
||||
This is an expert-level setting, and you shouldn't change it if you're just getting started with ClickHouse.
|
||||
|
||||
## max_query_size {#settings-max_query_size}
|
||||
|
||||
@ -1018,7 +1042,7 @@ For queries that read at least a somewhat large volume of data (one million rows
|
||||
When using the HTTP interface, the ‘query_id’ parameter can be passed. This is any string that serves as the query identifier.
|
||||
If a query from the same user with the same ‘query_id’ already exists at this time, the behaviour depends on the ‘replace_running_query’ parameter.
|
||||
|
||||
`0` (default) – Throw an exception (don’t allow the query to run if a query with the same ‘query_id’ is already running).
|
||||
`0` (default) – Throw an exception (do not allow the query to run if a query with the same ‘query_id’ is already running).
|
||||
|
||||
`1` – Cancel the old query and start running the new one.
|
||||
|
||||
@ -1077,7 +1101,7 @@ load_balancing = nearest_hostname
|
||||
The number of errors is counted for each replica. Every 5 minutes, the number of errors is integrally divided by 2. Thus, the number of errors is calculated for a recent time with exponential smoothing. If there is one replica with a minimal number of errors (i.e. errors occurred recently on the other replicas), the query is sent to it. If there are multiple replicas with the same minimal number of errors, the query is sent to the replica with a hostname that is most similar to the server’s hostname in the config file (for the number of different characters in identical positions, up to the minimum length of both hostnames).
|
||||
|
||||
For instance, example01-01-1 and example01-01-2.yandex.ru are different in one position, while example01-01-1 and example01-02-2 differ in two places.
|
||||
This method might seem primitive, but it doesn’t require external data about network topology, and it doesn’t compare IP addresses, which would be complicated for our IPv6 addresses.
|
||||
This method might seem primitive, but it does not require external data about network topology, and it does not compare IP addresses, which would be complicated for our IPv6 addresses.
|
||||
|
||||
Thus, if there are equivalent replicas, the closest one by name is preferred.
|
||||
We can also assume that when sending a query to the same server, in the absence of failures, a distributed query will also go to the same servers. So even if different data is placed on the replicas, the query will return mostly the same results.
|
||||
@ -1149,7 +1173,7 @@ Default value: `1`.
|
||||
|
||||
This setting is useful for replicated tables with a sampling key. A query may be processed faster if it is executed on several servers in parallel. But the query performance may degrade in the following cases:
|
||||
|
||||
- The position of the sampling key in the partitioning key doesn't allow efficient range scans.
|
||||
- The position of the sampling key in the partitioning key does not allow efficient range scans.
|
||||
- Adding a sampling key to the table makes filtering by other columns less efficient.
|
||||
- The sampling key is an expression that is expensive to calculate.
|
||||
- The cluster latency distribution has a long tail, so that querying more servers increases the query overall latency.
|
||||
@ -1171,7 +1195,7 @@ For testing, the value can be set to 0: compilation runs synchronously and the q
|
||||
If the value is 1 or more, compilation occurs asynchronously in a separate thread. The result will be used as soon as it is ready, including queries that are currently running.
|
||||
|
||||
Compiled code is required for each different combination of aggregate functions used in the query and the type of keys in the GROUP BY clause.
|
||||
The results of the compilation are saved in the build directory in the form of .so files. There is no restriction on the number of compilation results since they don’t use very much space. Old results will be used after server restarts, except in the case of a server upgrade – in this case, the old results are deleted.
|
||||
The results of the compilation are saved in the build directory in the form of .so files. There is no restriction on the number of compilation results since they do not use very much space. Old results will be used after server restarts, except in the case of a server upgrade – in this case, the old results are deleted.
|
||||
|
||||
## output_format_json_quote_64bit_integers {#session_settings-output_format_json_quote_64bit_integers}
|
||||
|
||||
@ -1505,7 +1529,7 @@ Possible values:
|
||||
|
||||
- 1 — skipping enabled.
|
||||
|
||||
If a shard is unavailable, ClickHouse returns a result based on partial data and doesn’t report node availability issues.
|
||||
If a shard is unavailable, ClickHouse returns a result based on partial data and does not report node availability issues.
|
||||
|
||||
- 0 — skipping disabled.
|
||||
|
||||
@ -1520,8 +1544,8 @@ Do not merge aggregation states from different servers for distributed query pro
|
||||
Possible values:
|
||||
|
||||
- 0 — Disabled (final query processing is done on the initiator node).
|
||||
- 1 - Do not merge aggregation states from different servers for distributed query processing (query completelly processed on the shard, initiator only proxy the data).
|
||||
- 2 - Same as 1 but apply `ORDER BY` and `LIMIT` on the initiator (can be used for queries with `ORDER BY` and/or `LIMIT`).
|
||||
- 1 - Do not merge aggregation states from different servers for distributed query processing (query completelly processed on the shard, initiator only proxy the data), can be used in case it is for certain that there are different keys on different shards.
|
||||
- 2 - Same as `1` but applies `ORDER BY` and `LIMIT` (it is not possilbe when the query processed completelly on the remote node, like for `distributed_group_by_no_merge=1`) on the initiator (can be used for queries with `ORDER BY` and/or `LIMIT`).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -1613,7 +1637,7 @@ Enables or disables query execution if [optimize_skip_unused_shards](#optimize-s
|
||||
|
||||
Possible values:
|
||||
|
||||
- 0 — Disabled. ClickHouse doesn’t throw an exception.
|
||||
- 0 — Disabled. ClickHouse does not throw an exception.
|
||||
- 1 — Enabled. Query execution is disabled only if the table has a sharding key.
|
||||
- 2 — Enabled. Query execution is disabled regardless of whether a sharding key is defined for the table.
|
||||
|
||||
@ -1758,7 +1782,7 @@ Default value: 0.
|
||||
Sets the priority ([nice](https://en.wikipedia.org/wiki/Nice_(Unix))) for threads that execute queries. The OS scheduler considers this priority when choosing the next thread to run on each available CPU core.
|
||||
|
||||
!!! warning "Warning"
|
||||
To use this setting, you need to set the `CAP_SYS_NICE` capability. The `clickhouse-server` package sets it up during installation. Some virtual environments don’t allow you to set the `CAP_SYS_NICE` capability. In this case, `clickhouse-server` shows a message about it at the start.
|
||||
To use this setting, you need to set the `CAP_SYS_NICE` capability. The `clickhouse-server` package sets it up during installation. Some virtual environments do not allow you to set the `CAP_SYS_NICE` capability. In this case, `clickhouse-server` shows a message about it at the start.
|
||||
|
||||
Possible values:
|
||||
|
||||
@ -2034,6 +2058,16 @@ Possible values:
|
||||
|
||||
Default value: 16.
|
||||
|
||||
## background_fetches_pool_size {#background_fetches_pool_size}
|
||||
|
||||
Sets the number of threads performing background fetches for [replicated](../../engines/table-engines/mergetree-family/replication.md) tables. This setting is applied at the ClickHouse server start and can’t be changed in a user session. For production usage with frequent small insertions or slow ZooKeeper cluster is recomended to use default value.
|
||||
|
||||
Possible values:
|
||||
|
||||
- Any positive integer.
|
||||
|
||||
Default value: 8.
|
||||
|
||||
## always_fetch_merged_part {#always_fetch_merged_part}
|
||||
|
||||
Prohibits data parts merging in [Replicated\*MergeTree](../../engines/table-engines/mergetree-family/replication.md)-engine tables.
|
||||
@ -2043,7 +2077,7 @@ When merging is prohibited, the replica never merges parts and always downloads
|
||||
Possible values:
|
||||
|
||||
- 0 — `Replicated*MergeTree`-engine tables merge data parts at the replica.
|
||||
- 1 — `Replicated*MergeTree`-engine tables don’t merge data parts at the replica. The tables download merged data parts from other replicas.
|
||||
- 1 — `Replicated*MergeTree`-engine tables do not merge data parts at the replica. The tables download merged data parts from other replicas.
|
||||
|
||||
Default value: 0.
|
||||
|
||||
@ -2174,7 +2208,7 @@ Allows or restricts using the [LowCardinality](../../sql-reference/data-types/lo
|
||||
|
||||
If usage of `LowCardinality` is restricted, ClickHouse server converts `LowCardinality`-columns to ordinary ones for `SELECT` queries, and convert ordinary columns to `LowCardinality`-columns for `INSERT` queries.
|
||||
|
||||
This setting is required mainly for third-party clients which don’t support `LowCardinality` data type.
|
||||
This setting is required mainly for third-party clients which do not support `LowCardinality` data type.
|
||||
|
||||
Possible values:
|
||||
|
||||
@ -2662,7 +2696,7 @@ Possible values:
|
||||
|
||||
- `'DISTINCT'` — ClickHouse outputs rows as a result of combining queries removing duplicate rows.
|
||||
- `'ALL'` — ClickHouse outputs all rows as a result of combining queries including duplicate rows.
|
||||
- `''` — Clickhouse generates an exception when used with `UNION`.
|
||||
- `''` — ClickHouse generates an exception when used with `UNION`.
|
||||
|
||||
Default value: `''`.
|
||||
|
||||
@ -2989,7 +3023,6 @@ SET limit = 5;
|
||||
SET offset = 7;
|
||||
SELECT * FROM test LIMIT 10 OFFSET 100;
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
@ -3000,4 +3033,36 @@ Result:
|
||||
└─────┘
|
||||
```
|
||||
|
||||
## optimize_fuse_sum_count_avg {#optimize_fuse_sum_count_avg}
|
||||
|
||||
Enables to fuse aggregate functions with identical argument. It rewrites query contains at least two aggregate functions from [sum](../../sql-reference/aggregate-functions/reference/sum.md#agg_function-sum), [count](../../sql-reference/aggregate-functions/reference/count.md#agg_function-count) or [avg](../../sql-reference/aggregate-functions/reference/avg.md#agg_function-avg) with identical argument to [sumCount](../../sql-reference/aggregate-functions/reference/sumcount.md#agg_function-sumCount).
|
||||
|
||||
Possible values:
|
||||
|
||||
- 0 — Functions with identical argument are not fused.
|
||||
- 1 — Functions with identical argument are fused.
|
||||
|
||||
Default value: `0`.
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
CREATE TABLE fuse_tbl(a Int8, b Int8) Engine = Log;
|
||||
SET optimize_fuse_sum_count_avg = 1;
|
||||
EXPLAIN SYNTAX SELECT sum(a), sum(b), count(b), avg(b) from fuse_tbl FORMAT TSV;
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
SELECT
|
||||
sum(a),
|
||||
sumCount(b).1,
|
||||
sumCount(b).2,
|
||||
(sumCount(b).1) / (sumCount(b).2)
|
||||
FROM fuse_tbl
|
||||
```
|
||||
|
||||
[Original article](https://clickhouse.tech/docs/en/operations/settings/settings/) <!-- hide -->
|
||||
|
@ -59,7 +59,7 @@ For collecting system metrics ClickHouse server uses:
|
||||
|
||||
**procfs**
|
||||
|
||||
If ClickHouse server doesn’t have `CAP_NET_ADMIN` capability, it tries to fall back to `ProcfsMetricsProvider`. `ProcfsMetricsProvider` allows collecting per-query system metrics (for CPU and I/O).
|
||||
If ClickHouse server does not have `CAP_NET_ADMIN` capability, it tries to fall back to `ProcfsMetricsProvider`. `ProcfsMetricsProvider` allows collecting per-query system metrics (for CPU and I/O).
|
||||
|
||||
If procfs is supported and enabled on the system, ClickHouse server collects these metrics:
|
||||
|
||||
|
@ -2,7 +2,7 @@
|
||||
|
||||
This table contains a single row with a single `dummy` UInt8 column containing the value 0.
|
||||
|
||||
This table is used if a `SELECT` query doesn’t specify the `FROM` clause.
|
||||
This table is used if a `SELECT` query does not specify the `FROM` clause.
|
||||
|
||||
This is similar to the `DUAL` table found in other DBMSs.
|
||||
|
||||
|
@ -26,7 +26,7 @@ Columns:
|
||||
|
||||
- `active` ([UInt8](../../sql-reference/data-types/int-uint.md)) – Flag that indicates whether the data part is active. If a data part is active, it’s used in a table. Otherwise, it’s deleted. Inactive data parts remain after merging.
|
||||
|
||||
- `marks` ([UInt64](../../sql-reference/data-types/int-uint.md)) – The number of marks. To get the approximate number of rows in a data part, multiply `marks` by the index granularity (usually 8192) (this hint doesn’t work for adaptive granularity).
|
||||
- `marks` ([UInt64](../../sql-reference/data-types/int-uint.md)) – The number of marks. To get the approximate number of rows in a data part, multiply `marks` by the index granularity (usually 8192) (this hint does not work for adaptive granularity).
|
||||
|
||||
- `rows` ([UInt64](../../sql-reference/data-types/int-uint.md)) – The number of rows.
|
||||
|
||||
@ -66,7 +66,7 @@ Columns:
|
||||
|
||||
- `primary_key_bytes_in_memory_allocated` ([UInt64](../../sql-reference/data-types/int-uint.md)) – The amount of memory (in bytes) reserved for primary key values.
|
||||
|
||||
- `is_frozen` ([UInt8](../../sql-reference/data-types/int-uint.md)) – Flag that shows that a partition data backup exists. 1, the backup exists. 0, the backup doesn’t exist. For more details, see [FREEZE PARTITION](../../sql-reference/statements/alter/partition.md#alter_freeze-partition)
|
||||
- `is_frozen` ([UInt8](../../sql-reference/data-types/int-uint.md)) – Flag that shows that a partition data backup exists. 1, the backup exists. 0, the backup does not exist. For more details, see [FREEZE PARTITION](../../sql-reference/statements/alter/partition.md#alter_freeze-partition)
|
||||
|
||||
- `database` ([String](../../sql-reference/data-types/string.md)) – Name of the database.
|
||||
|
||||
|
@ -26,7 +26,7 @@ Columns:
|
||||
|
||||
- `active` ([UInt8](../../sql-reference/data-types/int-uint.md)) — Flag that indicates whether the data part is active. If a data part is active, it’s used in a table. Otherwise, it’s deleted. Inactive data parts remain after merging.
|
||||
|
||||
- `marks` ([UInt64](../../sql-reference/data-types/int-uint.md)) — The number of marks. To get the approximate number of rows in a data part, multiply `marks` by the index granularity (usually 8192) (this hint doesn’t work for adaptive granularity).
|
||||
- `marks` ([UInt64](../../sql-reference/data-types/int-uint.md)) — The number of marks. To get the approximate number of rows in a data part, multiply `marks` by the index granularity (usually 8192) (this hint does not work for adaptive granularity).
|
||||
|
||||
- `rows` ([UInt64](../../sql-reference/data-types/int-uint.md)) — The number of rows.
|
||||
|
||||
|
@ -11,7 +11,7 @@ Columns:
|
||||
- `bytes_read` (UInt64) – The number of uncompressed bytes read from the table. For distributed processing, on the requestor server, this is the total for all remote servers.
|
||||
- `total_rows_approx` (UInt64) – The approximation of the total number of rows that should be read. For distributed processing, on the requestor server, this is the total for all remote servers. It can be updated during request processing, when new sources to process become known.
|
||||
- `memory_usage` (UInt64) – Amount of RAM the request uses. It might not include some types of dedicated memory. See the [max_memory_usage](../../operations/settings/query-complexity.md#settings_max_memory_usage) setting.
|
||||
- `query` (String) – The query text. For `INSERT`, it doesn’t include the data to insert.
|
||||
- `query` (String) – The query text. For `INSERT`, it does not include the data to insert.
|
||||
- `query_id` (String) – Query ID, if defined.
|
||||
|
||||
|
||||
|
@ -3,15 +3,15 @@
|
||||
Contains information about executed queries, for example, start time, duration of processing, error messages.
|
||||
|
||||
!!! note "Note"
|
||||
This table doesn’t contain the ingested data for `INSERT` queries.
|
||||
This table does not contain the ingested data for `INSERT` queries.
|
||||
|
||||
You can change settings of queries logging in the [query_log](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-query-log) section of the server configuration.
|
||||
|
||||
You can disable queries logging by setting [log_queries = 0](../../operations/settings/settings.md#settings-log-queries). We don’t recommend to turn off logging because information in this table is important for solving issues.
|
||||
You can disable queries logging by setting [log_queries = 0](../../operations/settings/settings.md#settings-log-queries). We do not recommend to turn off logging because information in this table is important for solving issues.
|
||||
|
||||
The flushing period of data is set in `flush_interval_milliseconds` parameter of the [query_log](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-query-log) server settings section. To force flushing, use the [SYSTEM FLUSH LOGS](../../sql-reference/statements/system.md#query_language-system-flush_logs) query.
|
||||
|
||||
ClickHouse doesn’t delete data from the table automatically. See [Introduction](../../operations/system-tables/index.md#system-tables-introduction) for more details.
|
||||
ClickHouse does not delete data from the table automatically. See [Introduction](../../operations/system-tables/index.md#system-tables-introduction) for more details.
|
||||
|
||||
The `system.query_log` table registers two kinds of queries:
|
||||
|
||||
@ -37,8 +37,8 @@ Columns:
|
||||
- `query_start_time` ([DateTime](../../sql-reference/data-types/datetime.md)) — Start time of query execution.
|
||||
- `query_start_time_microseconds` ([DateTime64](../../sql-reference/data-types/datetime64.md)) — Start time of query execution with microsecond precision.
|
||||
- `query_duration_ms` ([UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Duration of query execution in milliseconds.
|
||||
- `read_rows` ([UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Total number of rows read from all tables and table functions participated in query. It includes usual subqueries, subqueries for `IN` and `JOIN`. For distributed queries `read_rows` includes the total number of rows read at all replicas. Each replica sends it’s `read_rows` value, and the server-initiator of the query summarizes all received and local values. The cache volumes don’t affect this value.
|
||||
- `read_bytes` ([UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Total number of bytes read from all tables and table functions participated in query. It includes usual subqueries, subqueries for `IN` and `JOIN`. For distributed queries `read_bytes` includes the total number of rows read at all replicas. Each replica sends it’s `read_bytes` value, and the server-initiator of the query summarizes all received and local values. The cache volumes don’t affect this value.
|
||||
- `read_rows` ([UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Total number of rows read from all tables and table functions participated in query. It includes usual subqueries, subqueries for `IN` and `JOIN`. For distributed queries `read_rows` includes the total number of rows read at all replicas. Each replica sends it’s `read_rows` value, and the server-initiator of the query summarizes all received and local values. The cache volumes do not affect this value.
|
||||
- `read_bytes` ([UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Total number of bytes read from all tables and table functions participated in query. It includes usual subqueries, subqueries for `IN` and `JOIN`. For distributed queries `read_bytes` includes the total number of rows read at all replicas. Each replica sends it’s `read_bytes` value, and the server-initiator of the query summarizes all received and local values. The cache volumes do not affect this value.
|
||||
- `written_rows` ([UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges)) — For `INSERT` queries, the number of written rows. For other queries, the column value is 0.
|
||||
- `written_bytes` ([UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges)) — For `INSERT` queries, the number of written bytes. For other queries, the column value is 0.
|
||||
- `result_rows` ([UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Number of rows in a result of the `SELECT` query, or a number of rows in the `INSERT` query.
|
||||
|
@ -9,7 +9,7 @@ To start logging:
|
||||
|
||||
The flushing period of data is set in `flush_interval_milliseconds` parameter of the [query_thread_log](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-query_thread_log) server settings section. To force flushing, use the [SYSTEM FLUSH LOGS](../../sql-reference/statements/system.md#query_language-system-flush_logs) query.
|
||||
|
||||
ClickHouse doesn’t delete data from the table automatically. See [Introduction](../../operations/system-tables/index.md#system-tables-introduction) for more details.
|
||||
ClickHouse does not delete data from the table automatically. See [Introduction](../../operations/system-tables/index.md#system-tables-introduction) for more details.
|
||||
|
||||
Columns:
|
||||
|
||||
|
@ -57,7 +57,7 @@ Columns:
|
||||
Note that writes can be performed to any replica that is available and has a session in ZK, regardless of whether it is a leader.
|
||||
- `can_become_leader` (`UInt8`) - Whether the replica can be a leader.
|
||||
- `is_readonly` (`UInt8`) - Whether the replica is in read-only mode.
|
||||
This mode is turned on if the config doesn’t have sections with ZooKeeper, if an unknown error occurred when reinitializing sessions in ZooKeeper, and during session reinitialization in ZooKeeper.
|
||||
This mode is turned on if the config does not have sections with ZooKeeper, if an unknown error occurred when reinitializing sessions in ZooKeeper, and during session reinitialization in ZooKeeper.
|
||||
- `is_session_expired` (`UInt8`) - the session with ZooKeeper has expired. Basically the same as `is_readonly`.
|
||||
- `future_parts` (`UInt32`) - The number of data parts that will appear as the result of INSERTs or merges that haven’t been done yet.
|
||||
- `parts_to_check` (`UInt32`) - The number of data parts in the queue for verification. A part is put in the verification queue if there is suspicion that it might be damaged.
|
||||
@ -84,7 +84,7 @@ The next 4 columns have a non-zero value only where there is an active session w
|
||||
- `active_replicas` (`UInt8`) - The number of replicas of this table that have a session in ZooKeeper (i.e., the number of functioning replicas).
|
||||
|
||||
If you request all the columns, the table may work a bit slowly, since several reads from ZooKeeper are made for each row.
|
||||
If you don’t request the last 4 columns (log_max_index, log_pointer, total_replicas, active_replicas), the table works quickly.
|
||||
If you do not request the last 4 columns (log_max_index, log_pointer, total_replicas, active_replicas), the table works quickly.
|
||||
|
||||
For example, you can check that everything is working correctly like this:
|
||||
|
||||
@ -118,7 +118,7 @@ WHERE
|
||||
OR active_replicas < total_replicas
|
||||
```
|
||||
|
||||
If this query doesn’t return anything, it means that everything is fine.
|
||||
If this query does not return anything, it means that everything is fine.
|
||||
|
||||
[Original article](https://clickhouse.tech/docs/en/operations/system_tables/replicas) <!--hide-->
|
||||
|
||||
|
Some files were not shown because too many files have changed in this diff Show More
Loading…
Reference in New Issue
Block a user