Mirror of https://github.com/ClickHouse/ClickHouse.git (synced 2024-11-10 09:32:06 +00:00)
Merge branch 'master' into feature/create-simple-lambda-function
This commit is contained in: commit 63dfa8559f

4 .github/ISSUE_TEMPLATE/10_question.md vendored
@@ -7,6 +7,6 @@ assignees: ''

---

Make sure to check documentation https://clickhouse.yandex/docs/en/ first. If the question is concise and probably has a short answer, asking it in Telegram chat https://telegram.me/clickhouse_en is probably the fastest way to find the answer. For more complicated questions, consider asking them on StackOverflow with "clickhouse" tag https://stackoverflow.com/questions/tagged/clickhouse
> Make sure to check documentation https://clickhouse.yandex/docs/en/ first. If the question is concise and probably has a short answer, asking it in Telegram chat https://telegram.me/clickhouse_en is probably the fastest way to find the answer. For more complicated questions, consider asking them on StackOverflow with "clickhouse" tag https://stackoverflow.com/questions/tagged/clickhouse

If you still prefer GitHub issues, remove all this text and ask your question here.
> If you still prefer GitHub issues, remove all this text and ask your question here.
14 .github/ISSUE_TEMPLATE/20_feature-request.md vendored
@@ -7,16 +7,20 @@ assignees: ''

---

(you don't have to strictly follow this form)
> (you don't have to strictly follow this form)

**Use case**
A clear and concise description of what is the intended usage scenario is.
> A clear and concise description of what is the intended usage scenario is.

**Describe the solution you'd like**
A clear and concise description of what you want to happen.
> A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.
> A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.
> Add any other context or screenshots about the feature request here.
5 .github/ISSUE_TEMPLATE/50_build-issue.md vendored
@@ -7,10 +7,11 @@ assignees: ''

---

Make sure that `git diff` result is empty and you've just pulled fresh master. Try cleaning up cmake cache. Just in case, official build instructions are published here: https://clickhouse.yandex/docs/en/development/build/
> Make sure that `git diff` result is empty and you've just pulled fresh master. Try cleaning up cmake cache. Just in case, official build instructions are published here: https://clickhouse.yandex/docs/en/development/build/

**Operating system**
OS kind or distribution, specific version/release, non-standard kernel if any. If you are trying to build inside virtual machine, please mention it too.
> OS kind or distribution, specific version/release, non-standard kernel if any. If you are trying to build inside virtual machine, please mention it too.

**Cmake version**
@@ -1,17 +1,17 @@
---
name: Bug report
about: Create a report to help us improve ClickHouse
about: Wrong behaviour (visible to users) in official ClickHouse release.
title: ''
labels: bug
labels: 'potential bug'
assignees: ''

---

You have to provide the following information whenever possible.
> You have to provide the following information whenever possible.

**Describe the bug**

A clear and concise description of what works not as it is supposed to.
> A clear and concise description of what works not as it is supposed to.

**Does it reproduce on recent release?**

@@ -19,7 +19,7 @@ A clear and concise description of what works not as it is supposed to.

**Enable crash reporting**

If possible, change "enabled" to true in "send_crash_reports" section in `config.xml`:
> If possible, change "enabled" to true in "send_crash_reports" section in `config.xml`:

```
<send_crash_reports>

@@ -39,12 +39,12 @@ If possible, change "enabled" to true in "send_crash_reports" section in `config

**Expected behavior**

A clear and concise description of what you expected to happen.
> A clear and concise description of what you expected to happen.

**Error message and/or stacktrace**

If applicable, add screenshots to help explain your problem.
> If applicable, add screenshots to help explain your problem.

**Additional context**

Add any other context about the problem here.
> Add any other context about the problem here.
6 .github/PULL_REQUEST_TEMPLATE.md vendored
@@ -19,9 +19,9 @@ Detailed description / Documentation draft:
...

By adding documentation, you'll allow users to try your new feature immediately, not when someone else will have time to document it later. Documentation is necessary for all features that affect user experience in any way. You can add brief documentation draft above, or add documentation right into your patch as Markdown files in [docs](https://github.com/ClickHouse/ClickHouse/tree/master/docs) folder.
> By adding documentation, you'll allow users to try your new feature immediately, not when someone else will have time to document it later. Documentation is necessary for all features that affect user experience in any way. You can add brief documentation draft above, or add documentation right into your patch as Markdown files in [docs](https://github.com/ClickHouse/ClickHouse/tree/master/docs) folder.

If you are doing this for the first time, it's recommended to read the lightweight [Contributing to ClickHouse Documentation](https://github.com/ClickHouse/ClickHouse/tree/master/docs/README.md) guide first.
> If you are doing this for the first time, it's recommended to read the lightweight [Contributing to ClickHouse Documentation](https://github.com/ClickHouse/ClickHouse/tree/master/docs/README.md) guide first.

Information about CI checks: https://clickhouse.tech/docs/en/development/continuous-integration/
> Information about CI checks: https://clickhouse.tech/docs/en/development/continuous-integration/
12 .gitmodules vendored
@@ -225,6 +225,15 @@
[submodule "contrib/yaml-cpp"]
path = contrib/yaml-cpp
url = https://github.com/ClickHouse-Extras/yaml-cpp.git
[submodule "contrib/libstemmer_c"]
path = contrib/libstemmer_c
url = https://github.com/ClickHouse-Extras/libstemmer_c.git
[submodule "contrib/wordnet-blast"]
path = contrib/wordnet-blast
url = https://github.com/ClickHouse-Extras/wordnet-blast.git
[submodule "contrib/lemmagen-c"]
path = contrib/lemmagen-c
url = https://github.com/ClickHouse-Extras/lemmagen-c.git
[submodule "contrib/libpqxx"]
path = contrib/libpqxx
url = https://github.com/ClickHouse-Extras/libpqxx.git

@@ -234,3 +243,6 @@
[submodule "contrib/s2geometry"]
path = contrib/s2geometry
url = https://github.com/ClickHouse-Extras/s2geometry.git
[submodule "contrib/bzip2"]
path = contrib/bzip2
url = https://github.com/ClickHouse-Extras/bzip2.git
106 CHANGELOG.md
@@ -1,3 +1,102 @@
### ClickHouse release v21.8, 2021-08-12
#### New Features
* Add support for a part of SQL/JSON standard. [#24148](https://github.com/ClickHouse/ClickHouse/pull/24148) ([l1tsolaiki](https://github.com/l1tsolaiki), [Kseniia Sumarokova](https://github.com/kssenii)).
* Collect common system metrics (in `system.asynchronous_metrics` and `system.asynchronous_metric_log`) on CPU usage, disk usage, memory usage, IO, network, files, load average, CPU frequencies, thermal sensors, EDAC counters, system uptime; also added metrics about the scheduling jitter and the time spent collecting the metrics. It works similar to `atop` in ClickHouse and allows access to monitoring data even if you have no additional tools installed. Close [#9430](https://github.com/ClickHouse/ClickHouse/issues/9430). [#24416](https://github.com/ClickHouse/ClickHouse/pull/24416) ([alexey-milovidov](https://github.com/alexey-milovidov), [Yegor Levankov](https://github.com/elevankoff)).
* Add MaterializedPostgreSQL table engine and database engine. This database engine allows replicating a whole database or any subset of database tables. [#20470](https://github.com/ClickHouse/ClickHouse/pull/20470) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Add new functions `leftPad()`, `rightPad()`, `leftPadUTF8()`, `rightPadUTF8()`. [#26075](https://github.com/ClickHouse/ClickHouse/pull/26075) ([Vitaly Baranov](https://github.com/vitlibar)).
* Add the `FIRST` keyword to the `ADD INDEX` command to be able to add the index at the beginning of the indices list. [#25904](https://github.com/ClickHouse/ClickHouse/pull/25904) ([xjewer](https://github.com/xjewer)).
* Introduce `system.data_skipping_indices` table containing information about existing data skipping indices. Close [#7659](https://github.com/ClickHouse/ClickHouse/issues/7659). [#25693](https://github.com/ClickHouse/ClickHouse/pull/25693) ([Dmitry Novik](https://github.com/novikd)).
* Add `bin`/`unbin` functions. [#25609](https://github.com/ClickHouse/ClickHouse/pull/25609) ([zhaoyu](https://github.com/zxc111)).
* Support `Map` and `UInt128`, `Int128`, `UInt256`, `Int256` types in `mapAdd` and `mapSubtract` functions. [#25596](https://github.com/ClickHouse/ClickHouse/pull/25596) ([Ildus Kurbangaliev](https://github.com/ildus)).
* Support `DISTINCT ON (columns)` expression, close [#25404](https://github.com/ClickHouse/ClickHouse/issues/25404). [#25589](https://github.com/ClickHouse/ClickHouse/pull/25589) ([Zijie Lu](https://github.com/TszKitLo40)).
* Add an ability to reset a custom setting to default and remove it from the table's metadata. It allows rolling back the change without knowing the system/config's default. Closes [#14449](https://github.com/ClickHouse/ClickHouse/issues/14449). [#17769](https://github.com/ClickHouse/ClickHouse/pull/17769) ([xjewer](https://github.com/xjewer)).
* Render pipelines as graphs in Web UI if `EXPLAIN PIPELINE graph = 1` query is submitted. [#26067](https://github.com/ClickHouse/ClickHouse/pull/26067) ([alexey-milovidov](https://github.com/alexey-milovidov)).
#### Performance Improvements
* Compile aggregate functions. Use option `compile_aggregate_expressions` to enable it. [#24789](https://github.com/ClickHouse/ClickHouse/pull/24789) ([Maksim Kita](https://github.com/kitaisreal)).
* Improve latency of short queries that require reading from tables with many columns. [#26371](https://github.com/ClickHouse/ClickHouse/pull/26371) ([Anton Popov](https://github.com/CurtizJ)).
#### Improvements
* Use `Map` data type for system logs tables (`system.query_log`, `system.query_thread_log`, `system.processes`, `system.opentelemetry_span_log`). These tables will be auto-created with new data types. Virtual columns are created to support old queries. Closes [#18698](https://github.com/ClickHouse/ClickHouse/issues/18698). [#23934](https://github.com/ClickHouse/ClickHouse/pull/23934), [#25773](https://github.com/ClickHouse/ClickHouse/pull/25773) ([hexiaoting](https://github.com/hexiaoting), [sundy-li](https://github.com/sundy-li), [Maksim Kita](https://github.com/kitaisreal)).
* For a dictionary with a complex key containing only one attribute, allow not wrapping the key expression in tuple for functions `dictGet`, `dictHas`. [#26130](https://github.com/ClickHouse/ClickHouse/pull/26130) ([Maksim Kita](https://github.com/kitaisreal)).
* Implement function `bin`/`hex` from `AggregateFunction` states. [#26094](https://github.com/ClickHouse/ClickHouse/pull/26094) ([zhaoyu](https://github.com/zxc111)).
* Support arguments of `UUID` type for `empty` and `notEmpty` functions. `UUID` is empty if it is all zeros (nil UUID). Closes [#3446](https://github.com/ClickHouse/ClickHouse/issues/3446). [#25974](https://github.com/ClickHouse/ClickHouse/pull/25974) ([zhaoyu](https://github.com/zxc111)).
* Add support for `SET SQL_SELECT_LIMIT` in MySQL protocol. Closes [#17115](https://github.com/ClickHouse/ClickHouse/issues/17115). [#25972](https://github.com/ClickHouse/ClickHouse/pull/25972) ([Kseniia Sumarokova](https://github.com/kssenii)).
* More instrumentation for network interaction: add counters for recv/send bytes; add gauges for recvs/sends. Added missing documentation. Close [#5897](https://github.com/ClickHouse/ClickHouse/issues/5897). [#25962](https://github.com/ClickHouse/ClickHouse/pull/25962) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Add setting `optimize_move_to_prewhere_if_final`. If query has `FINAL`, the optimization `move_to_prewhere` will be enabled only if both `optimize_move_to_prewhere` and `optimize_move_to_prewhere_if_final` are enabled. Closes [#8684](https://github.com/ClickHouse/ClickHouse/issues/8684). [#25940](https://github.com/ClickHouse/ClickHouse/pull/25940) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Allow complex quoted identifiers of JOINed tables. Close [#17861](https://github.com/ClickHouse/ClickHouse/issues/17861). [#25924](https://github.com/ClickHouse/ClickHouse/pull/25924) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Add support for Unicode (e.g. Chinese, Cyrillic) components in `Nested` data types. Close [#25594](https://github.com/ClickHouse/ClickHouse/issues/25594). [#25923](https://github.com/ClickHouse/ClickHouse/pull/25923) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Allow `quantiles*` functions to work with `aggregate_functions_null_for_empty`. Close [#25892](https://github.com/ClickHouse/ClickHouse/issues/25892). [#25919](https://github.com/ClickHouse/ClickHouse/pull/25919) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Allow parameters for parametric aggregate functions to be arbitrary constant expressions (e.g., `1 + 2`), not just literals. It also allows using the query parameters (in parameterized queries like `{param:UInt8}`) inside parametric aggregate functions. Closes [#11607](https://github.com/ClickHouse/ClickHouse/issues/11607). [#25910](https://github.com/ClickHouse/ClickHouse/pull/25910) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Correctly throw the exception on the attempt to parse an invalid `Date`. Closes [#6481](https://github.com/ClickHouse/ClickHouse/issues/6481). [#25909](https://github.com/ClickHouse/ClickHouse/pull/25909) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Support for multiple includes in configuration. It is possible to include users configuration, remote server configuration from multiple sources. Simply place `<include />` element with `from_zk`, `from_env` or `incl` attribute, and it will be replaced with the substitution. [#24404](https://github.com/ClickHouse/ClickHouse/pull/24404) ([nvartolomei](https://github.com/nvartolomei)).
* Support for queries with a column named `"null"` (it must be specified in back-ticks or double quotes) and `ON CLUSTER`. Closes [#24035](https://github.com/ClickHouse/ClickHouse/issues/24035). [#25907](https://github.com/ClickHouse/ClickHouse/pull/25907) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Support `LowCardinality`, `Decimal`, and `UUID` for `JSONExtract`. Closes [#24606](https://github.com/ClickHouse/ClickHouse/issues/24606). [#25900](https://github.com/ClickHouse/ClickHouse/pull/25900) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Convert history file from `readline` format to `replxx` format. [#25888](https://github.com/ClickHouse/ClickHouse/pull/25888) ([Azat Khuzhin](https://github.com/azat)).
* Fix an issue which can lead to intersecting parts after `DROP PART` or background deletion of an empty part. [#25884](https://github.com/ClickHouse/ClickHouse/pull/25884) ([alesapin](https://github.com/alesapin)).
* Better handling of lost parts for `ReplicatedMergeTree` tables. Fixes rare inconsistencies in `ReplicationQueue`. Fixes [#10368](https://github.com/ClickHouse/ClickHouse/issues/10368). [#25820](https://github.com/ClickHouse/ClickHouse/pull/25820) ([alesapin](https://github.com/alesapin)).
* Allow starting clickhouse-client with unreadable working directory. [#25817](https://github.com/ClickHouse/ClickHouse/pull/25817) ([ianton-ru](https://github.com/ianton-ru)).
* Fix "No available columns" error for `Merge` storage. [#25801](https://github.com/ClickHouse/ClickHouse/pull/25801) ([Azat Khuzhin](https://github.com/azat)).
* MySQL Engine now supports the exchange of column comments between MySQL and ClickHouse. [#25795](https://github.com/ClickHouse/ClickHouse/pull/25795) ([Storozhuk Kostiantyn](https://github.com/sand6255)).
* Fix inconsistent behaviour of `GROUP BY` constant on empty set. Closes [#6842](https://github.com/ClickHouse/ClickHouse/issues/6842). [#25786](https://github.com/ClickHouse/ClickHouse/pull/25786) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Cancel already running merges in partition on `DROP PARTITION` and `TRUNCATE` for `ReplicatedMergeTree`. Resolves [#17151](https://github.com/ClickHouse/ClickHouse/issues/17151). [#25684](https://github.com/ClickHouse/ClickHouse/pull/25684) ([tavplubix](https://github.com/tavplubix)).
* Support `ENUM` data type for MaterializeMySQL. [#25676](https://github.com/ClickHouse/ClickHouse/pull/25676) ([Storozhuk Kostiantyn](https://github.com/sand6255)).
* Support materialized and aliased columns in JOIN, close [#13274](https://github.com/ClickHouse/ClickHouse/issues/13274). [#25634](https://github.com/ClickHouse/ClickHouse/pull/25634) ([Vladimir C](https://github.com/vdimir)).
* Fix possible logical race condition between `ALTER TABLE ... DETACH` and background merges. [#25605](https://github.com/ClickHouse/ClickHouse/pull/25605) ([Azat Khuzhin](https://github.com/azat)).
* Make `NetworkReceiveElapsedMicroseconds` metric to correctly include the time spent waiting for data from the client to `INSERT`. Close [#9958](https://github.com/ClickHouse/ClickHouse/issues/9958). [#25602](https://github.com/ClickHouse/ClickHouse/pull/25602) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Support `TRUNCATE TABLE` for S3 and HDFS. Close [#25530](https://github.com/ClickHouse/ClickHouse/issues/25530). [#25550](https://github.com/ClickHouse/ClickHouse/pull/25550) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Support for dynamic reloading of config to change number of threads in pool for background jobs execution (merges, mutations, fetches). [#25548](https://github.com/ClickHouse/ClickHouse/pull/25548) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Allow extracting of non-string element as string using `JSONExtract`. This is for [#25414](https://github.com/ClickHouse/ClickHouse/issues/25414). [#25452](https://github.com/ClickHouse/ClickHouse/pull/25452) ([Amos Bird](https://github.com/amosbird)).
* Support regular expression in `Database` argument for `StorageMerge`. Close [#776](https://github.com/ClickHouse/ClickHouse/issues/776). [#25064](https://github.com/ClickHouse/ClickHouse/pull/25064) ([flynn](https://github.com/ucasfl)).
* Web UI: if the value looks like a URL, automatically generate a link. [#25965](https://github.com/ClickHouse/ClickHouse/pull/25965) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Make `sudo service clickhouse-server start` to work on systems with `systemd` like Centos 8. Close [#14298](https://github.com/ClickHouse/ClickHouse/issues/14298). Close [#17799](https://github.com/ClickHouse/ClickHouse/issues/17799). [#25921](https://github.com/ClickHouse/ClickHouse/pull/25921) ([alexey-milovidov](https://github.com/alexey-milovidov)).
#### Bug Fixes
* Fix incorrect `SET ROLE` in some cases. [#26707](https://github.com/ClickHouse/ClickHouse/pull/26707) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix potential `nullptr` dereference in window functions. Fix [#25276](https://github.com/ClickHouse/ClickHouse/issues/25276). [#26668](https://github.com/ClickHouse/ClickHouse/pull/26668) ([Alexander Kuzmenkov](https://github.com/akuzm)).
* Fix incorrect function names of `groupBitmapAnd/Or/Xor`. Fix [#26557](https://github.com/ClickHouse/ClickHouse/pull/26557) ([Amos Bird](https://github.com/amosbird)).
* Fix crash in RabbitMQ shutdown in case RabbitMQ setup was not started. Closes [#26504](https://github.com/ClickHouse/ClickHouse/issues/26504). [#26529](https://github.com/ClickHouse/ClickHouse/pull/26529) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix issues with `CREATE DICTIONARY` query if dictionary name or database name was quoted. Closes [#26491](https://github.com/ClickHouse/ClickHouse/issues/26491). [#26508](https://github.com/ClickHouse/ClickHouse/pull/26508) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix broken name resolution after rewriting column aliases. Fix [#26432](https://github.com/ClickHouse/ClickHouse/issues/26432). [#26475](https://github.com/ClickHouse/ClickHouse/pull/26475) ([Amos Bird](https://github.com/amosbird)).
* Fix infinite non-joined block stream in `partial_merge_join` close [#26325](https://github.com/ClickHouse/ClickHouse/issues/26325). [#26374](https://github.com/ClickHouse/ClickHouse/pull/26374) ([Vladimir C](https://github.com/vdimir)).
* Fix possible crash when login as dropped user. Fix [#26073](https://github.com/ClickHouse/ClickHouse/issues/26073). [#26363](https://github.com/ClickHouse/ClickHouse/pull/26363) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix `optimize_distributed_group_by_sharding_key` for multiple columns (leads to incorrect result w/ `optimize_skip_unused_shards=1`/`allow_nondeterministic_optimize_skip_unused_shards=1` and multiple columns in sharding key expression). [#26353](https://github.com/ClickHouse/ClickHouse/pull/26353) ([Azat Khuzhin](https://github.com/azat)).
* `CAST` from `Date` to `DateTime` (or `DateTime64`) was not using the timezone of the `DateTime` type. It can also affect the comparison between `Date` and `DateTime`. Inference of the common type for `Date` and `DateTime` also was not using the corresponding timezone. It affected the results of function `if` and array construction. Closes [#24128](https://github.com/ClickHouse/ClickHouse/issues/24128). [#24129](https://github.com/ClickHouse/ClickHouse/pull/24129) ([Maksim Kita](https://github.com/kitaisreal)).
* Fixed rare bug in lost replica recovery that may cause replicas to diverge. [#26321](https://github.com/ClickHouse/ClickHouse/pull/26321) ([tavplubix](https://github.com/tavplubix)).
* Fix zstd decompression in case there are escape sequences at the end of internal buffer. Closes [#26013](https://github.com/ClickHouse/ClickHouse/issues/26013). [#26314](https://github.com/ClickHouse/ClickHouse/pull/26314) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix logical error on join with totals, close [#26017](https://github.com/ClickHouse/ClickHouse/issues/26017). [#26250](https://github.com/ClickHouse/ClickHouse/pull/26250) ([Vladimir C](https://github.com/vdimir)).
* Remove excessive newline in `thread_name` column in `system.stack_trace` table. Fix [#24124](https://github.com/ClickHouse/ClickHouse/issues/24124). [#26210](https://github.com/ClickHouse/ClickHouse/pull/26210) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix `joinGet` with `LowCardinality` columns, close [#25993](https://github.com/ClickHouse/ClickHouse/issues/25993). [#26118](https://github.com/ClickHouse/ClickHouse/pull/26118) ([Vladimir C](https://github.com/vdimir)).
* Fix possible crash in `pointInPolygon` if the setting `validate_polygons` is turned off. [#26113](https://github.com/ClickHouse/ClickHouse/pull/26113) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix throwing exception when iterate over non-existing remote directory. [#26087](https://github.com/ClickHouse/ClickHouse/pull/26087) ([ianton-ru](https://github.com/ianton-ru)).
* Fix rare server crash because of `abort` in ZooKeeper client. Fixes [#25813](https://github.com/ClickHouse/ClickHouse/issues/25813). [#26079](https://github.com/ClickHouse/ClickHouse/pull/26079) ([alesapin](https://github.com/alesapin)).
* Fix wrong thread count estimation for right subquery join in some cases. Close [#24075](https://github.com/ClickHouse/ClickHouse/issues/24075). [#26052](https://github.com/ClickHouse/ClickHouse/pull/26052) ([Vladimir C](https://github.com/vdimir)).
* Fixed incorrect `sequence_id` in MySQL protocol packets that ClickHouse sends on exception during query execution. It might cause MySQL client to reset connection to ClickHouse server. Fixes [#21184](https://github.com/ClickHouse/ClickHouse/issues/21184). [#26051](https://github.com/ClickHouse/ClickHouse/pull/26051) ([tavplubix](https://github.com/tavplubix)).
* Fix possible mismatched header when using normal projection with `PREWHERE`. Fix [#26020](https://github.com/ClickHouse/ClickHouse/issues/26020). [#26038](https://github.com/ClickHouse/ClickHouse/pull/26038) ([Amos Bird](https://github.com/amosbird)).
* Fix formatting of type `Map` with integer keys to `JSON`. [#25982](https://github.com/ClickHouse/ClickHouse/pull/25982) ([Anton Popov](https://github.com/CurtizJ)).
* Fix possible deadlock during query profiler stack unwinding. Fix [#25968](https://github.com/ClickHouse/ClickHouse/issues/25968). [#25970](https://github.com/ClickHouse/ClickHouse/pull/25970) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix crash on call `dictGet()` with bad arguments. [#25913](https://github.com/ClickHouse/ClickHouse/pull/25913) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fixed `scram-sha-256` authentication for PostgreSQL engines. Closes [#24516](https://github.com/ClickHouse/ClickHouse/issues/24516). [#25906](https://github.com/ClickHouse/ClickHouse/pull/25906) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix extremely long backoff for background tasks when the background pool is full. Fixes [#25836](https://github.com/ClickHouse/ClickHouse/issues/25836). [#25893](https://github.com/ClickHouse/ClickHouse/pull/25893) ([alesapin](https://github.com/alesapin)).
* Fix ARM exception handling with non default page size. Fixes [#25512](https://github.com/ClickHouse/ClickHouse/issues/25512), [#25044](https://github.com/ClickHouse/ClickHouse/issues/25044), [#24901](https://github.com/ClickHouse/ClickHouse/issues/24901), [#23183](https://github.com/ClickHouse/ClickHouse/issues/23183), [#20221](https://github.com/ClickHouse/ClickHouse/issues/20221), [#19703](https://github.com/ClickHouse/ClickHouse/issues/19703), [#19028](https://github.com/ClickHouse/ClickHouse/issues/19028), [#18391](https://github.com/ClickHouse/ClickHouse/issues/18391), [#18121](https://github.com/ClickHouse/ClickHouse/issues/18121), [#17994](https://github.com/ClickHouse/ClickHouse/issues/17994), [#12483](https://github.com/ClickHouse/ClickHouse/issues/12483). [#25854](https://github.com/ClickHouse/ClickHouse/pull/25854) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix sharding_key from column w/o function for `remote()` (before `select * from remote('127.1', system.one, dummy)` leads to `Unknown column: dummy, there are only columns .` error). [#25824](https://github.com/ClickHouse/ClickHouse/pull/25824) ([Azat Khuzhin](https://github.com/azat)).
* Fixed `Not found column ...` and `Missing column ...` errors when selecting from `MaterializeMySQL`. Fixes [#23708](https://github.com/ClickHouse/ClickHouse/issues/23708), [#24830](https://github.com/ClickHouse/ClickHouse/issues/24830), [#25794](https://github.com/ClickHouse/ClickHouse/issues/25794). [#25822](https://github.com/ClickHouse/ClickHouse/pull/25822) ([tavplubix](https://github.com/tavplubix)).
* Fix `optimize_skip_unused_shards_rewrite_in` for non-UInt64 types (may select incorrect shards eventually or throw `Cannot infer type of an empty tuple` or `Function tuple requires at least one argument`). [#25798](https://github.com/ClickHouse/ClickHouse/pull/25798) ([Azat Khuzhin](https://github.com/azat)).
* Fix rare bug with `DROP PART` query for `ReplicatedMergeTree` tables which can lead to error message `Unexpected merged part intersecting drop range`. [#25783](https://github.com/ClickHouse/ClickHouse/pull/25783) ([alesapin](https://github.com/alesapin)).
* Fix bug in `TTL` with `GROUP BY` expression which refuses to execute `TTL` after first execution in part. [#25743](https://github.com/ClickHouse/ClickHouse/pull/25743) ([alesapin](https://github.com/alesapin)).
* Allow StorageMerge to access tables with aliases. Closes [#6051](https://github.com/ClickHouse/ClickHouse/issues/6051). [#25694](https://github.com/ClickHouse/ClickHouse/pull/25694) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix slow dict join in some cases, close [#24209](https://github.com/ClickHouse/ClickHouse/issues/24209). [#25618](https://github.com/ClickHouse/ClickHouse/pull/25618) ([Vladimir C](https://github.com/vdimir)).
* Fix `ALTER MODIFY COLUMN` of columns, which participates in TTL expressions. [#25554](https://github.com/ClickHouse/ClickHouse/pull/25554) ([Anton Popov](https://github.com/CurtizJ)).
* Fix assertion in `PREWHERE` with non-UInt8 type, close [#19589](https://github.com/ClickHouse/ClickHouse/issues/19589). [#25484](https://github.com/ClickHouse/ClickHouse/pull/25484) ([Vladimir C](https://github.com/vdimir)).
* Fix some fuzzed msan crash. Fixes [#22517](https://github.com/ClickHouse/ClickHouse/issues/22517). [#26428](https://github.com/ClickHouse/ClickHouse/pull/26428) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Update `chown` cmd check in `clickhouse-server` docker entrypoint. It fixes error 'cluster pod restart failed (or timeout)' on kubernetes. [#26545](https://github.com/ClickHouse/ClickHouse/pull/26545) ([Ky Li](https://github.com/Kylinrix)).
### ClickHouse release v21.7, 2021-07-09
#### Backward Incompatible Change
@@ -1183,13 +1282,6 @@
* PODArray: Avoid call to memcpy with (nullptr, 0) arguments (Fix UBSan report). This fixes [#18525](https://github.com/ClickHouse/ClickHouse/issues/18525). [#18526](https://github.com/ClickHouse/ClickHouse/pull/18526) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Minor improvement for path concatenation of zookeeper paths inside DDLWorker. [#17767](https://github.com/ClickHouse/ClickHouse/pull/17767) ([Bharat Nallan](https://github.com/bharatnc)).
* Allow to reload symbols from debug file. This PR also fixes a build-id issue. [#17637](https://github.com/ClickHouse/ClickHouse/pull/17637) ([Amos Bird](https://github.com/amosbird)).
* TestFlows: fixes to LDAP tests that fail due to slow test execution. [#18790](https://github.com/ClickHouse/ClickHouse/pull/18790) ([vzakaznikov](https://github.com/vzakaznikov)).
* TestFlows: Merging requirements for AES encryption functions. Updating aes_encryption tests to use new requirements. Updating TestFlows version to 1.6.72. [#18221](https://github.com/ClickHouse/ClickHouse/pull/18221) ([vzakaznikov](https://github.com/vzakaznikov)).
* TestFlows: Updating TestFlows version to the latest 1.6.72. Re-generating requirements.py. [#18208](https://github.com/ClickHouse/ClickHouse/pull/18208) ([vzakaznikov](https://github.com/vzakaznikov)).
* TestFlows: Updating TestFlows README.md to include "How To Debug Why Test Failed" section. [#17808](https://github.com/ClickHouse/ClickHouse/pull/17808) ([vzakaznikov](https://github.com/vzakaznikov)).
* TestFlows: tests for RBAC [ACCESS MANAGEMENT](https://clickhouse.tech/docs/en/sql-reference/statements/grant/#grant-access-management) privileges. [#17804](https://github.com/ClickHouse/ClickHouse/pull/17804) ([MyroTk](https://github.com/MyroTk)).
* TestFlows: RBAC tests for SHOW, TRUNCATE, KILL, and OPTIMIZE. - Updates to old tests. - Resolved comments from #https://github.com/ClickHouse/ClickHouse/pull/16977. [#17657](https://github.com/ClickHouse/ClickHouse/pull/17657) ([MyroTk](https://github.com/MyroTk)).
* TestFlows: Added RBAC tests for `ATTACH`, `CREATE`, `DROP`, and `DETACH`. [#16977](https://github.com/ClickHouse/ClickHouse/pull/16977) ([MyroTk](https://github.com/MyroTk)).
## [Changelog for 2020](https://github.com/ClickHouse/ClickHouse/blob/master/docs/en/whats-new/changelog/2020.md)
@@ -271,12 +271,6 @@ endif()

include(cmake/cpu_features.cmake)

option(ARCH_NATIVE "Add -march=native compiler flag. This makes your binaries non-portable but more performant code may be generated.")

if (ARCH_NATIVE)
    set (COMPILER_FLAGS "${COMPILER_FLAGS} -march=native")
endif ()

# Asynchronous unwind tables are needed for Query Profiler.
# They are already by default on some platforms but possibly not on all platforms.
# Enable it explicitly.

@@ -401,9 +395,10 @@ endif ()
# Turns on all external libs like s3, kafka, ODBC, ...
option(ENABLE_LIBRARIES "Enable all external libraries by default" ON)

# We recommend avoiding this mode for production builds because we can't guarantee all needed libraries exist in your
# system.
# We recommend avoiding this mode for production builds because we can't guarantee
# all needed libraries exist in your system.
# This mode exists for enthusiastic developers who are searching for trouble.
# The whole idea of using unknown version of libraries from the OS distribution is deeply flawed.
# Useful for maintainers of OS packages.
option (UNBUNDLED "Use system libraries instead of ones in contrib/" OFF)

@@ -542,6 +537,8 @@ include (cmake/find/libpqxx.cmake)
include (cmake/find/nuraft.cmake)
include (cmake/find/yaml-cpp.cmake)
include (cmake/find/s2geometry.cmake)
include (cmake/find/nlp.cmake)
include (cmake/find/bzip2.cmake)

if(NOT USE_INTERNAL_PARQUET_LIBRARY)
    set (ENABLE_ORC OFF CACHE INTERNAL "")
@@ -13,3 +13,6 @@ ClickHouse® is an open-source column-oriented database management system that a
* [Code Browser](https://clickhouse.tech/codebrowser/html_report/ClickHouse/index.html) with syntax highlight and navigation.
* [Contacts](https://clickhouse.tech/#contacts) can help to get your questions answered if there are any.
* You can also [fill this form](https://clickhouse.tech/#meet) to meet Yandex ClickHouse team in person.

## Upcoming Events
* [SF Bay Area ClickHouse August Community Meetup (online)](https://www.meetup.com/San-Francisco-Bay-Area-ClickHouse-Meetup/events/279109379/) on 25 August 2021.
@@ -60,6 +60,7 @@ DateLUTImpl::DateLUTImpl(const std::string & time_zone_)
    offset_at_start_of_epoch = cctz_time_zone.lookup(cctz_time_zone.lookup(epoch).pre).offset;
    offset_at_start_of_lut = cctz_time_zone.lookup(cctz_time_zone.lookup(lut_start).pre).offset;
    offset_is_whole_number_of_hours_during_epoch = true;
    offset_is_whole_number_of_minutes_during_epoch = true;

    cctz::civil_day date = lut_start;

@@ -108,6 +109,9 @@ DateLUTImpl::DateLUTImpl(const std::string & time_zone_)
        if (offset_is_whole_number_of_hours_during_epoch && start_of_day > 0 && start_of_day % 3600)
            offset_is_whole_number_of_hours_during_epoch = false;

        if (offset_is_whole_number_of_minutes_during_epoch && start_of_day > 0 && start_of_day % 60)
            offset_is_whole_number_of_minutes_during_epoch = false;

        /// If UTC offset was changed this day.
        /// Change in time zone without transition is possible, e.g. Moscow 1991 Sun, 31 Mar, 02:00 MSK to EEST
        cctz::time_zone::civil_transition transition{};
@@ -193,6 +193,7 @@ private:
    /// UTC offset at the beginning of the first supported year.
    Time offset_at_start_of_lut;
    bool offset_is_whole_number_of_hours_during_epoch;
    bool offset_is_whole_number_of_minutes_during_epoch;

    /// Time zone name.
    std::string time_zone;

@@ -251,20 +252,25 @@ private:
    }

    template <typename T, typename Divisor>
    static inline T roundDown(T x, Divisor divisor)
    inline T roundDown(T x, Divisor divisor) const
    {
        static_assert(std::is_integral_v<T> && std::is_integral_v<Divisor>);
        assert(divisor > 0);

        if (likely(offset_is_whole_number_of_hours_during_epoch))
        {
            if (likely(x >= 0))
                return x / divisor * divisor;

            /// Integer division for negative numbers rounds them towards zero (up).
            /// We will shift the number so it will be rounded towards -inf (down).
            return (x + 1 - divisor) / divisor * divisor;
        }

        Time date = find(x).date;
        return date + (x - date) / divisor * divisor;
    }

public:
    const std::string & getTimeZone() const { return time_zone; }
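The two comment lines in the `roundDown` change above are the crux of it: integer division truncates towards zero, so negative Unix timestamps (dates before 1970) would otherwise round up instead of down to the start of the interval. A minimal, self-contained sketch of the same floor-division trick (illustration only, not the ClickHouse implementation, which additionally falls back to the timezone LUT when the offset is not a whole number of hours):

```cpp
#include <cassert>
#include <cstdint>

/// Round a timestamp down to the start of an interval of `divisor` seconds,
/// rounding towards -inf so that pre-1970 (negative) timestamps behave the
/// same way as positive ones. Assumes divisor > 0.
inline int64_t floorToInterval(int64_t x, int64_t divisor)
{
    if (x >= 0)
        return x / divisor * divisor;
    /// Plain integer division truncates towards zero, so shift negative values first.
    return (x + 1 - divisor) / divisor * divisor;
}

int main()
{
    assert(floorToInterval(125, 60) == 120);    // 1970-01-01 00:02:05 -> 00:02:00
    assert(floorToInterval(-125, 60) == -180);  // 1969-12-31 23:57:55 -> 23:57:00
    assert(floorToInterval(-120, 60) == -120);  // already-aligned values stay put
    return 0;
}
```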
@@ -459,12 +465,23 @@ public:

    inline unsigned toSecond(Time t) const
    {
        auto res = t % 60;
        if (likely(offset_is_whole_number_of_minutes_during_epoch))
        {
            Time res = t % 60;
            if (likely(res >= 0))
                return res;
            return res + 60;
        }

        LUTIndex index = findIndex(t);
        Time time = t - lut[index].date;

        if (time >= lut[index].time_at_offset_change())
            time += lut[index].amount_of_offset_change();

        return time % 60;
    }

    inline unsigned toMinute(Time t) const
    {
        if (t >= 0 && offset_is_whole_number_of_hours_during_epoch)

@@ -483,29 +500,11 @@ public:
    }

    /// NOTE: Assuming timezone offset is a multiple of 15 minutes.
    inline Time toStartOfMinute(Time t) const { return roundDown(t, 60); }
    inline Time toStartOfFiveMinute(Time t) const { return roundDown(t, 300); }
    inline Time toStartOfFifteenMinutes(Time t) const { return roundDown(t, 900); }

    inline Time toStartOfTenMinutes(Time t) const
    {
        if (t >= 0 && offset_is_whole_number_of_hours_during_epoch)
            return t / 600 * 600;

        /// More complex logic is for Nepal - it has offset 05:45. Australia/Eucla is also unfortunate.
        Time date = find(t).date;
        return date + (t - date) / 600 * 600;
    }

    /// NOTE: Assuming timezone transitions are multiple of hours. Lord Howe Island in Australia is a notable exception.
    inline Time toStartOfHour(Time t) const
    {
        if (t >= 0 && offset_is_whole_number_of_hours_during_epoch)
            return t / 3600 * 3600;

        Time date = find(t).date;
        return date + (t - date) / 3600 * 3600;
    }
    inline Time toStartOfMinute(Time t) const { return toStartOfMinuteInterval(t, 1); }
    inline Time toStartOfFiveMinute(Time t) const { return toStartOfMinuteInterval(t, 5); }
    inline Time toStartOfFifteenMinutes(Time t) const { return toStartOfMinuteInterval(t, 15); }
    inline Time toStartOfTenMinutes(Time t) const { return toStartOfMinuteInterval(t, 10); }
    inline Time toStartOfHour(Time t) const { return roundDown(t, 3600); }

    /** Number of calendar day since the beginning of UNIX epoch (1970-01-01 is zero)
      * We use just two bytes for it. It covers the range up to 2105 and slightly more.
@@ -903,25 +902,24 @@ public:

    inline Time toStartOfMinuteInterval(Time t, UInt64 minutes) const
    {
        if (minutes == 1)
            return toStartOfMinute(t);
        UInt64 divisor = 60 * minutes;
        if (likely(offset_is_whole_number_of_minutes_during_epoch))
        {
            if (likely(t >= 0))
                return t / divisor * divisor;
            return (t + 1 - divisor) / divisor * divisor;
        }

        /** In contrast to "toStartOfHourInterval" function above,
          * the minute intervals are not aligned to the midnight.
          * You will get unexpected results if for example, you round down to 60 minute interval
          * and there was a time shift to 30 minutes.
          *
          * But this is not specified in docs and can be changed in future.
          */

        UInt64 seconds = 60 * minutes;
        return roundDown(t, seconds);
        Time date = find(t).date;
        return date + (t - date) / divisor * divisor;
    }

    inline Time toStartOfSecondInterval(Time t, UInt64 seconds) const
    {
        if (seconds == 1)
            return t;
        if (seconds % 60 == 0)
            return toStartOfMinuteInterval(t, seconds / 60);

        return roundDown(t, seconds);
    }
@@ -955,7 +953,7 @@ public:
    inline Time makeDateTime(Int16 year, UInt8 month, UInt8 day_of_month, UInt8 hour, UInt8 minute, UInt8 second) const
    {
        size_t index = makeLUTIndex(year, month, day_of_month);
        UInt32 time_offset = hour * 3600 + minute * 60 + second;
        Time time_offset = hour * 3600 + minute * 60 + second;

        if (time_offset >= lut[index].time_at_offset_change())
            time_offset -= lut[index].amount_of_offset_change();
@@ -1,57 +0,0 @@
#pragma once

#include <new>
#include "defines.h"

#if USE_JEMALLOC
# include <jemalloc/jemalloc.h>
#endif

#if !USE_JEMALLOC || JEMALLOC_VERSION_MAJOR < 4
# include <cstdlib>
#endif

namespace Memory
{

inline ALWAYS_INLINE void * newImpl(std::size_t size)
{
    auto * ptr = malloc(size);
    if (likely(ptr != nullptr))
        return ptr;

    /// @note no std::get_new_handler logic implemented
    throw std::bad_alloc{};
}

inline ALWAYS_INLINE void * newNoExept(std::size_t size) noexcept
{
    return malloc(size);
}

inline ALWAYS_INLINE void deleteImpl(void * ptr) noexcept
{
    free(ptr);
}

#if USE_JEMALLOC && JEMALLOC_VERSION_MAJOR >= 4

inline ALWAYS_INLINE void deleteSized(void * ptr, std::size_t size) noexcept
{
    if (unlikely(ptr == nullptr))
        return;

    sdallocx(ptr, size, 0);
}

#else

inline ALWAYS_INLINE void deleteSized(void * ptr, std::size_t size [[maybe_unused]]) noexcept
{
    free(ptr);
}

#endif

}
24 base/common/removeDuplicates.h Normal file
@@ -0,0 +1,24 @@
#pragma once
#include <vector>

/// Removes duplicates from a container without changing the order of its elements.
/// Keeps the last occurrence of each element.
/// Should NOT be used for containers with a lot of elements because it has O(N^2) complexity.
template <typename T>
void removeDuplicatesKeepLast(std::vector<T> & vec)
{
    auto begin = vec.begin();
    auto end = vec.end();
    auto new_begin = end;
    for (auto current = end; current != begin;)
    {
        --current;
        if (std::find(new_begin, end, *current) == end)
        {
            --new_begin;
            if (new_begin != current)
                *new_begin = *current;
        }
    }
    vec.erase(begin, new_begin);
}
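A quick usage sketch of the new helper; the include path and the values are hypothetical, it only illustrates the "keep the last occurrence, preserve order" semantics documented above:

```cpp
#include <cassert>
#include <vector>
#include <common/removeDuplicates.h>   // assumed include path for base/common/removeDuplicates.h

int main()
{
    std::vector<int> values{1, 2, 1, 3, 2, 4};

    /// Duplicates are dropped, the *last* occurrence of each element survives,
    /// and the relative order of the survivors is preserved.
    removeDuplicatesKeepLast(values);

    assert((values == std::vector<int>{1, 3, 2, 4}));
    return 0;
}
```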
@@ -259,10 +259,25 @@ private:
    Poco::Logger * log;
    BaseDaemon & daemon;

    void onTerminate(const std::string & message, UInt32 thread_num) const
    void onTerminate(std::string_view message, UInt32 thread_num) const
    {
        size_t pos = message.find('\n');

        LOG_FATAL(log, "(version {}{}, {}) (from thread {}) {}",
            VERSION_STRING, VERSION_OFFICIAL, daemon.build_id_info, thread_num, message);
            VERSION_STRING, VERSION_OFFICIAL, daemon.build_id_info, thread_num, message.substr(0, pos));

        /// Print trace from std::terminate exception line-by-line to make it easy for grep.
        while (pos != std::string_view::npos)
        {
            ++pos;
            size_t next_pos = message.find('\n', pos);
            size_t size = next_pos;
            if (next_pos != std::string_view::npos)
                size = next_pos - pos;

            LOG_FATAL(log, "{}", message.substr(pos, size));
            pos = next_pos;
        }
    }

    void onFault(
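The intent of that rewrite, sketched in isolation: log the first line of the std::terminate message together with the header record, then emit every following line as its own record so individual stack frames can be matched by grep. A simplified stand-in that uses printf instead of LOG_FATAL (illustration only, not the daemon code itself):

```cpp
#include <cstdio>
#include <string_view>

/// Emit the first line as the "header" record, then each following line
/// of a multi-line message as its own record (grep-friendly).
void logLineByLine(std::string_view message)
{
    auto emit = [](std::string_view line) { std::printf("%.*s\n", int(line.size()), line.data()); };

    size_t pos = message.find('\n');
    emit(message.substr(0, pos));   // header line (the whole message if there is no '\n')

    while (pos != std::string_view::npos)
    {
        ++pos;                      // skip the '\n' itself
        size_t next_pos = message.find('\n', pos);
        size_t size = next_pos == std::string_view::npos ? next_pos : next_pos - pos;
        emit(message.substr(pos, size));
        pos = next_pos;
    }
}

int main()
{
    logLineByLine("std::terminate called\n#0 frame_one\n#1 frame_two");
    return 0;
}
```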
@@ -9,10 +9,6 @@ if (GLIBC_COMPATIBILITY)

    check_include_file("sys/random.h" HAVE_SYS_RANDOM_H)

    if(COMPILER_CLANG)
        set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-builtin-requires-header")
    endif()

    add_headers_and_sources(glibc_compatibility .)
    add_headers_and_sources(glibc_compatibility musl)
    if (ARCH_AARCH64)

@@ -35,11 +31,9 @@ if (GLIBC_COMPATIBILITY)

    add_library(glibc-compatibility STATIC ${glibc_compatibility_sources})

    if (COMPILER_CLANG)
        target_compile_options(glibc-compatibility PRIVATE -Wno-unused-command-line-argument)
    elseif (COMPILER_GCC)
        target_compile_options(glibc-compatibility PRIVATE -Wno-unused-but-set-variable)
    endif ()
    target_no_warning(glibc-compatibility unused-command-line-argument)
    target_no_warning(glibc-compatibility unused-but-set-variable)
    target_no_warning(glibc-compatibility builtin-requires-header)

    target_include_directories(glibc-compatibility PRIVATE libcxxabi ${musl_arch_include_dir})
@@ -1,4 +1,5 @@
#include <sys/auxv.h>
#include "atomic.h"
#include <unistd.h> // __environ
#include <errno.h>

@@ -17,18 +18,7 @@ static size_t __find_auxv(unsigned long type)
    return (size_t) -1;
}

__attribute__((constructor)) static void __auxv_init()
{
    size_t i;
    for (i = 0; __environ[i]; i++);
    __auxv = (unsigned long *) (__environ + i + 1);

    size_t secure_idx = __find_auxv(AT_SECURE);
    if (secure_idx != ((size_t) -1))
        __auxv_secure = __auxv[secure_idx];
}

unsigned long getauxval(unsigned long type)
unsigned long __getauxval(unsigned long type)
{
    if (type == AT_SECURE)
        return __auxv_secure;

@@ -43,3 +33,38 @@ unsigned long getauxval(unsigned long type)
    errno = ENOENT;
    return 0;
}

static void * volatile getauxval_func;

static unsigned long __auxv_init(unsigned long type)
{
    if (!__environ)
    {
        // __environ is not initialized yet so we can't initialize __auxv right now.
        // That's normally occurred only when getauxval() is called from some sanitizer's internal code.
        errno = ENOENT;
        return 0;
    }

    // Initialize __auxv and __auxv_secure.
    size_t i;
    for (i = 0; __environ[i]; i++);
    __auxv = (unsigned long *) (__environ + i + 1);

    size_t secure_idx = __find_auxv(AT_SECURE);
    if (secure_idx != ((size_t) -1))
        __auxv_secure = __auxv[secure_idx];

    // Now we've initialized __auxv, next time getauxval() will only call __get_auxval().
    a_cas_p(&getauxval_func, (void *)__auxv_init, (void *)__getauxval);

    return __getauxval(type);
}

// First time getauxval() will call __auxv_init().
static void * volatile getauxval_func = (void *)__auxv_init;

unsigned long getauxval(unsigned long type)
{
    return ((unsigned long (*)(unsigned long))getauxval_func)(type);
}
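The pattern behind the new getauxval(), reduced to its essence: instead of relying on a constructor that may run before __environ is ready, the first call goes through an init routine that fills the cached data and then swaps the dispatch function pointer to the fast path, so later calls skip initialization entirely. A generic, hedged C++ sketch of such a self-replacing function pointer (it uses std::atomic and a plain store instead of musl's a_cas_p; all names and values here are made up for illustration):

```cpp
#include <atomic>
#include <cstdio>

static unsigned long cached_value = 0;

static unsigned long fastPath(unsigned long key)
{
    return cached_value + key;   // cheap lookup once initialization has happened
}

static unsigned long initThenDispatch(unsigned long key);

/// First call lands in initThenDispatch(); afterwards every call goes straight to fastPath().
static std::atomic<unsigned long (*)(unsigned long)> dispatch{initThenDispatch};

static unsigned long initThenDispatch(unsigned long key)
{
    cached_value = 42;            // one-time (idempotent) initialization
    dispatch.store(fastPath);     // self-replace, analogous to the a_cas_p() call above
    return fastPath(key);
}

unsigned long lookup(unsigned long key)
{
    return dispatch.load()(key);
}

int main()
{
    std::printf("%lu %lu\n", lookup(1), lookup(2));   // init happens only on the first call
    return 0;
}
```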
@@ -296,7 +296,7 @@ void Pool::initialize()

Pool::Connection * Pool::allocConnection(bool dont_throw_if_failed_first_time)
{
    std::unique_ptr<Connection> conn_ptr{new Connection};
    std::unique_ptr conn_ptr = std::make_unique<Connection>();

    try
    {
@@ -4,13 +4,24 @@ QUERIES_FILE="queries.sql"
TABLE=$1
TRIES=3

if [ -x ./clickhouse ]
then
    CLICKHOUSE_CLIENT="./clickhouse client"
elif command -v clickhouse-client >/dev/null 2>&1
then
    CLICKHOUSE_CLIENT="clickhouse-client"
else
    echo "clickhouse-client is not found"
    exit 1
fi

cat "$QUERIES_FILE" | sed "s/{table}/${TABLE}/g" | while read query; do
    sync
    echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null

    echo -n "["
    for i in $(seq 1 $TRIES); do
        RES=$(clickhouse-client --time --format=Null --query="$query" 2>&1)
        RES=$(${CLICKHOUSE_CLIENT} --time --format=Null --max_memory_usage=100G --query="$query" 2>&1)
        [[ "$?" == "0" ]] && echo -n "${RES}" || echo -n "null"
        [[ "$i" != $TRIES ]] && echo -n ", "
    done
@@ -11,8 +11,8 @@ DATASET="${TABLE}_v1.tar.xz"
QUERIES_FILE="queries.sql"
TRIES=3

AMD64_BIN_URL="https://clickhouse-builds.s3.yandex.net/0/e29c4c3cc47ab2a6c4516486c1b77d57e7d42643/clickhouse_build_check/gcc-10_relwithdebuginfo_none_bundled_unsplitted_disable_False_binary/clickhouse"
AARCH64_BIN_URL="https://clickhouse-builds.s3.yandex.net/0/e29c4c3cc47ab2a6c4516486c1b77d57e7d42643/clickhouse_special_build_check/clang-10-aarch64_relwithdebuginfo_none_bundled_unsplitted_disable_False_binary/clickhouse"
AMD64_BIN_URL="https://builds.clickhouse.tech/master/amd64/clickhouse"
AARCH64_BIN_URL="https://builds.clickhouse.tech/master/aarch64/clickhouse"

# Note: on older Ubuntu versions, 'axel' does not support IPv6. If you are using IPv6-only servers on very old Ubuntu, just don't install 'axel'.

@@ -89,7 +89,7 @@ cat "$QUERIES_FILE" | sed "s/{table}/${TABLE}/g" | while read query; do

    echo -n "["
    for i in $(seq 1 $TRIES); do
        RES=$(./clickhouse client --max_memory_usage 100000000000 --time --format=Null --query="$query" 2>&1 ||:)
        RES=$(./clickhouse client --max_memory_usage 100G --time --format=Null --query="$query" 2>&1 ||:)
        [[ "$?" == "0" ]] && echo -n "${RES}" || echo -n "null"
        [[ "$i" != $TRIES ]] && echo -n ", "
    done
@@ -27,3 +27,22 @@ endmacro ()
macro (no_warning flag)
    add_warning(no-${flag})
endmacro ()

# The same but only for specified target.
macro (target_add_warning target flag)
    string (REPLACE "-" "_" underscored_flag ${flag})
    string (REPLACE "+" "x" underscored_flag ${underscored_flag})

    check_cxx_compiler_flag("-W${flag}" SUPPORTS_CXXFLAG_${underscored_flag})

    if (SUPPORTS_CXXFLAG_${underscored_flag})
        target_compile_options (${target} PRIVATE "-W${flag}")
    else ()
        message (WARNING "Flag -W${flag} is unsupported")
    endif ()
endmacro ()

macro (target_no_warning target flag)
    target_add_warning(${target} no-${flag})
endmacro ()
@@ -6,7 +6,7 @@ SET(VERSION_REVISION 54454)
SET(VERSION_MAJOR 21)
SET(VERSION_MINOR 9)
SET(VERSION_PATCH 1)
SET(VERSION_GITHASH f48c5af90c2ad51955d1ee3b6b05d006b03e4238)
SET(VERSION_DESCRIBE v21.9.1.1-prestable)
SET(VERSION_STRING 21.9.1.1)
SET(VERSION_GITHASH f063e44131a048ba2d9af8075f03700fd5ec3e69)
SET(VERSION_DESCRIBE v21.9.1.7770-prestable)
SET(VERSION_STRING 21.9.1.7770)
# end of autochange
@@ -5,13 +5,40 @@ include (CMakePushCheckState)

cmake_push_check_state ()

# gcc -dM -E -mno-sse2 - < /dev/null | sort > gcc-dump-nosse2
# gcc -dM -E -msse2 - < /dev/null | sort > gcc-dump-sse2
#define __SSE2__ 1
#define __SSE2_MATH__ 1
# The variables HAVE_* determine if compiler has support for the flag to use the corresponding instruction set.
# The options ENABLE_* determine if we will tell compiler to actually use the corresponding instruction set if compiler can do it.

# All of them are unrelated to the instruction set at the host machine
# (you can compile for newer instruction set on old machines and vice versa).

option (ENABLE_SSSE3 "Use SSSE3 instructions on x86_64" 1)
option (ENABLE_SSE41 "Use SSE4.1 instructions on x86_64" 1)
option (ENABLE_SSE42 "Use SSE4.2 instructions on x86_64" 1)
option (ENABLE_PCLMULQDQ "Use pclmulqdq instructions on x86_64" 1)
option (ENABLE_POPCNT "Use popcnt instructions on x86_64" 1)
option (ENABLE_AVX "Use AVX instructions on x86_64" 0)
option (ENABLE_AVX2 "Use AVX2 instructions on x86_64" 0)

option (ARCH_NATIVE "Add -march=native compiler flag. This makes your binaries non-portable but more performant code may be generated. This option overrides ENABLE_* options for specific instruction set. Highly not recommended to use." 0)

if (ARCH_NATIVE)
set (COMPILER_FLAGS "${COMPILER_FLAGS} -march=native")

else ()
set (TEST_FLAG "-mssse3")
set (CMAKE_REQUIRED_FLAGS "${TEST_FLAG} -O0")
check_cxx_source_compiles("
#include <tmmintrin.h>
int main() {
__m64 a = _mm_abs_pi8(__m64());
(void)a;
return 0;
}
" HAVE_SSSE3)
if (HAVE_SSSE3 AND ENABLE_SSSE3)
set (COMPILER_FLAGS "${COMPILER_FLAGS} ${TEST_FLAG}")
endif ()

# gcc -dM -E -msse4.1 - < /dev/null | sort > gcc-dump-sse41
#define __SSE4_1__ 1

set (TEST_FLAG "-msse4.1")
set (CMAKE_REQUIRED_FLAGS "${TEST_FLAG} -O0")

@@ -23,7 +50,7 @@ check_cxx_source_compiles("
return 0;
}
" HAVE_SSE41)
if (HAVE_SSE41)
if (HAVE_SSE41 AND ENABLE_SSE41)
set (COMPILER_FLAGS "${COMPILER_FLAGS} ${TEST_FLAG}")
endif ()

@@ -31,9 +58,6 @@ if (ARCH_PPC64LE)
set (COMPILER_FLAGS "${COMPILER_FLAGS} -maltivec -D__SSE2__=1 -DNO_WARN_X86_INTRINSICS")
endif ()

# gcc -dM -E -msse4.2 - < /dev/null | sort > gcc-dump-sse42
#define __SSE4_2__ 1

set (TEST_FLAG "-msse4.2")
set (CMAKE_REQUIRED_FLAGS "${TEST_FLAG} -O0")
check_cxx_source_compiles("
@@ -44,43 +68,10 @@ check_cxx_source_compiles("
return 0;
}
" HAVE_SSE42)
if (HAVE_SSE42)
if (HAVE_SSE42 AND ENABLE_SSE42)
set (COMPILER_FLAGS "${COMPILER_FLAGS} ${TEST_FLAG}")
endif ()

set (TEST_FLAG "-mssse3")
set (CMAKE_REQUIRED_FLAGS "${TEST_FLAG} -O0")
check_cxx_source_compiles("
#include <tmmintrin.h>
int main() {
__m64 a = _mm_abs_pi8(__m64());
(void)a;
return 0;
}
" HAVE_SSSE3)

set (TEST_FLAG "-mavx")
set (CMAKE_REQUIRED_FLAGS "${TEST_FLAG} -O0")
check_cxx_source_compiles("
#include <immintrin.h>
int main() {
auto a = _mm256_insert_epi8(__m256i(), 0, 0);
(void)a;
return 0;
}
" HAVE_AVX)

set (TEST_FLAG "-mavx2")
set (CMAKE_REQUIRED_FLAGS "${TEST_FLAG} -O0")
check_cxx_source_compiles("
#include <immintrin.h>
int main() {
auto a = _mm256_add_epi16(__m256i(), __m256i());
(void)a;
return 0;
}
" HAVE_AVX2)

set (TEST_FLAG "-mpclmul")
set (CMAKE_REQUIRED_FLAGS "${TEST_FLAG} -O0")
check_cxx_source_compiles("
@ -91,9 +82,9 @@ check_cxx_source_compiles("
|
||||
return 0;
|
||||
}
|
||||
" HAVE_PCLMULQDQ)
|
||||
|
||||
# gcc -dM -E -mpopcnt - < /dev/null | sort > gcc-dump-popcnt
|
||||
#define __POPCNT__ 1
|
||||
if (HAVE_PCLMULQDQ AND ENABLE_PCLMULQDQ)
|
||||
set (COMPILER_FLAGS "${COMPILER_FLAGS} ${TEST_FLAG}")
|
||||
endif ()
|
||||
|
||||
set (TEST_FLAG "-mpopcnt")
|
||||
|
||||
@ -105,9 +96,37 @@ check_cxx_source_compiles("
|
||||
return 0;
|
||||
}
|
||||
" HAVE_POPCNT)
|
||||
|
||||
if (HAVE_POPCNT AND NOT ARCH_AARCH64)
|
||||
if (HAVE_POPCNT AND ENABLE_POPCNT)
|
||||
set (COMPILER_FLAGS "${COMPILER_FLAGS} ${TEST_FLAG}")
|
||||
endif ()
|
||||
|
||||
set (TEST_FLAG "-mavx")
|
||||
set (CMAKE_REQUIRED_FLAGS "${TEST_FLAG} -O0")
|
||||
check_cxx_source_compiles("
|
||||
#include <immintrin.h>
|
||||
int main() {
|
||||
auto a = _mm256_insert_epi8(__m256i(), 0, 0);
|
||||
(void)a;
|
||||
return 0;
|
||||
}
|
||||
" HAVE_AVX)
|
||||
if (HAVE_AVX AND ENABLE_AVX)
|
||||
set (COMPILER_FLAGS "${COMPILER_FLAGS} ${TEST_FLAG}")
|
||||
endif ()
|
||||
|
||||
set (TEST_FLAG "-mavx2")
|
||||
set (CMAKE_REQUIRED_FLAGS "${TEST_FLAG} -O0")
|
||||
check_cxx_source_compiles("
|
||||
#include <immintrin.h>
|
||||
int main() {
|
||||
auto a = _mm256_add_epi16(__m256i(), __m256i());
|
||||
(void)a;
|
||||
return 0;
|
||||
}
|
||||
" HAVE_AVX2)
|
||||
if (HAVE_AVX2 AND ENABLE_AVX2)
|
||||
set (COMPILER_FLAGS "${COMPILER_FLAGS} ${TEST_FLAG}")
|
||||
endif ()
|
||||
endif ()
|
||||
|
||||
cmake_pop_check_state ()
|
||||
|
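The `ENABLE_*` switches above are plain CMake cache options, so an instruction set can be turned on or off at configure time without touching this file. A minimal sketch (the out-of-source build directory is an assumption):

```bash
# Configure a build that skips AVX/AVX2 even if the compiler supports them.
cmake .. -DENABLE_AVX=0 -DENABLE_AVX2=0

# Or generate non-portable code tuned for the build host only (overrides the ENABLE_* options).
cmake .. -DARCH_NATIVE=1
```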
19 cmake/find/bzip2.cmake Normal file
@@ -0,0 +1,19 @@
option(ENABLE_BZIP2 "Enable bzip2 compression support" ${ENABLE_LIBRARIES})

if (NOT ENABLE_BZIP2)
    message (STATUS "bzip2 compression disabled")
    return()
endif()

if (NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/bzip2/bzlib.h")
    message (WARNING "submodule contrib/bzip2 is missing. to fix try run: \n git submodule update --init --recursive")
    message (${RECONFIGURE_MESSAGE_LEVEL} "Can't find internal bzip2 library")
    set (USE_BZIP2 0)
    return()
endif ()

set (USE_BZIP2 1)
set (BZIP2_INCLUDE_DIR "${ClickHouse_SOURCE_DIR}/contrib/bzip2")
set (BZIP2_LIBRARY bzip2)

message (STATUS "Using bzip2=${USE_BZIP2}: ${BZIP2_INCLUDE_DIR} : ${BZIP2_LIBRARY}")
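If configuration stops at the submodule check above, the fix is the one the warning prints; a rough sequence, assuming an out-of-source build directory next to the checkout:

```bash
# Fetch the missing contrib/bzip2 sources, then re-run configuration.
git submodule update --init --recursive contrib/bzip2
cmake .. -DENABLE_BZIP2=1   # follows ENABLE_LIBRARIES by default
```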
32 cmake/find/nlp.cmake Normal file
@@ -0,0 +1,32 @@
option(ENABLE_NLP "Enable NLP functions support" ${ENABLE_LIBRARIES})

if (NOT ENABLE_NLP)

    message (STATUS "NLP functions disabled")
    return()
endif()

if (NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/libstemmer_c/Makefile")
    message (WARNING "submodule contrib/libstemmer_c is missing. to fix try run: \n git submodule update --init --recursive")
    message (${RECONFIGURE_MESSAGE_LEVEL} "Can't find internal libstemmer_c library, NLP functions will be disabled")
    set (USE_NLP 0)
    return()
endif ()

if (NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/wordnet-blast/CMakeLists.txt")
    message (WARNING "submodule contrib/wordnet-blast is missing. to fix try run: \n git submodule update --init --recursive")
    message (${RECONFIGURE_MESSAGE_LEVEL} "Can't find internal wordnet-blast library, NLP functions will be disabled")
    set (USE_NLP 0)
    return()
endif ()

if (NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/lemmagen-c/README.md")
    message (WARNING "submodule contrib/lemmagen-c is missing. to fix try run: \n git submodule update --init --recursive")
    message (${RECONFIGURE_MESSAGE_LEVEL} "Can't find internal lemmagen-c library, NLP functions will be disabled")
    set (USE_NLP 0)
    return()
endif ()

set (USE_NLP 1)

message (STATUS "Using Libraries for NLP functions: contrib/wordnet-blast, contrib/libstemmer_c, contrib/lemmagen-c")
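`ENABLE_NLP` gates three contrib submodules at once (libstemmer_c, wordnet-blast, lemmagen-c), so opting out also skips those dependencies. A hedged configure example, build directory assumed:

```bash
# Skip the NLP functions and their three contrib dependencies entirely.
cmake .. -DENABLE_NLP=0
```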
2
contrib/AMQP-CPP
vendored
2
contrib/AMQP-CPP
vendored
@ -1 +1 @@
|
||||
Subproject commit 03781aaff0f10ef41f902b8cf865fe0067180c10
|
||||
Subproject commit 1a6c51f4ac51ac56610fa95081bd2f349911375a
|
10 contrib/CMakeLists.txt vendored
@@ -328,6 +328,16 @@ endif()

add_subdirectory(fast_float)

if (USE_NLP)
    add_subdirectory(libstemmer-c-cmake)
    add_subdirectory(wordnet-blast-cmake)
    add_subdirectory(lemmagen-c-cmake)
endif()

if (USE_BZIP2)
    add_subdirectory(bzip2-cmake)
endif()

if (USE_SQLITE)
    add_subdirectory(sqlite-cmake)
endif()

2
contrib/NuRaft
vendored
2
contrib/NuRaft
vendored
@ -1 +1 @@
|
||||
Subproject commit 976874b7aa7f422bf4ea595bb7d1166c617b1c26
|
||||
Subproject commit 7ecb16844af6a9c283ad432d85ecc2e7d1544676
|
@ -10,11 +10,12 @@ set (SRCS
|
||||
"${LIBRARY_DIR}/src/deferredconsumer.cpp"
|
||||
"${LIBRARY_DIR}/src/deferredextreceiver.cpp"
|
||||
"${LIBRARY_DIR}/src/deferredget.cpp"
|
||||
"${LIBRARY_DIR}/src/deferredpublisher.cpp"
|
||||
"${LIBRARY_DIR}/src/deferredrecall.cpp"
|
||||
"${LIBRARY_DIR}/src/deferredreceiver.cpp"
|
||||
"${LIBRARY_DIR}/src/field.cpp"
|
||||
"${LIBRARY_DIR}/src/flags.cpp"
|
||||
"${LIBRARY_DIR}/src/linux_tcp/openssl.cpp"
|
||||
"${LIBRARY_DIR}/src/linux_tcp/sslerrorprinter.cpp"
|
||||
"${LIBRARY_DIR}/src/linux_tcp/tcpconnection.cpp"
|
||||
"${LIBRARY_DIR}/src/inbuffer.cpp"
|
||||
"${LIBRARY_DIR}/src/receivedframe.cpp"
|
||||
|
2
contrib/arrow
vendored
2
contrib/arrow
vendored
@ -1 +1 @@
|
||||
Subproject commit debf751a129bdda9ff4d1e895e08957ff77000a1
|
||||
Subproject commit 078e21bad344747b7656ef2d7a4f7410a0a303eb
|
@ -119,12 +119,9 @@ set(ORC_SRCS
|
||||
"${ORC_SOURCE_SRC_DIR}/ColumnWriter.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/Common.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/Compression.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/Exceptions.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/Int128.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/LzoDecompressor.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/MemoryPool.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/OrcFile.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/Reader.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/RLE.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/RLEv1.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/RLEv2.cc"
|
||||
@ -194,9 +191,18 @@ set(ARROW_SRCS
|
||||
"${LIBRARY_DIR}/compute/cast.cc"
|
||||
"${LIBRARY_DIR}/compute/exec.cc"
|
||||
"${LIBRARY_DIR}/compute/function.cc"
|
||||
"${LIBRARY_DIR}/compute/function_internal.cc"
|
||||
"${LIBRARY_DIR}/compute/kernel.cc"
|
||||
"${LIBRARY_DIR}/compute/registry.cc"
|
||||
|
||||
"${LIBRARY_DIR}/compute/exec/exec_plan.cc"
|
||||
"${LIBRARY_DIR}/compute/exec/expression.cc"
|
||||
"${LIBRARY_DIR}/compute/exec/key_compare.cc"
|
||||
"${LIBRARY_DIR}/compute/exec/key_encode.cc"
|
||||
"${LIBRARY_DIR}/compute/exec/key_hash.cc"
|
||||
"${LIBRARY_DIR}/compute/exec/key_map.cc"
|
||||
"${LIBRARY_DIR}/compute/exec/util.cc"
|
||||
|
||||
"${LIBRARY_DIR}/compute/kernels/aggregate_basic.cc"
|
||||
"${LIBRARY_DIR}/compute/kernels/aggregate_mode.cc"
|
||||
"${LIBRARY_DIR}/compute/kernels/aggregate_quantile.cc"
|
||||
@ -207,6 +213,7 @@ set(ARROW_SRCS
|
||||
"${LIBRARY_DIR}/compute/kernels/scalar_arithmetic.cc"
|
||||
"${LIBRARY_DIR}/compute/kernels/scalar_boolean.cc"
|
||||
"${LIBRARY_DIR}/compute/kernels/scalar_cast_boolean.cc"
|
||||
"${LIBRARY_DIR}/compute/kernels/scalar_cast_dictionary.cc"
|
||||
"${LIBRARY_DIR}/compute/kernels/scalar_cast_internal.cc"
|
||||
"${LIBRARY_DIR}/compute/kernels/scalar_cast_nested.cc"
|
||||
"${LIBRARY_DIR}/compute/kernels/scalar_cast_numeric.cc"
|
||||
@ -214,15 +221,18 @@ set(ARROW_SRCS
|
||||
"${LIBRARY_DIR}/compute/kernels/scalar_cast_temporal.cc"
|
||||
"${LIBRARY_DIR}/compute/kernels/scalar_compare.cc"
|
||||
"${LIBRARY_DIR}/compute/kernels/scalar_fill_null.cc"
|
||||
"${LIBRARY_DIR}/compute/kernels/scalar_if_else.cc"
|
||||
"${LIBRARY_DIR}/compute/kernels/scalar_nested.cc"
|
||||
"${LIBRARY_DIR}/compute/kernels/scalar_set_lookup.cc"
|
||||
"${LIBRARY_DIR}/compute/kernels/scalar_string.cc"
|
||||
"${LIBRARY_DIR}/compute/kernels/scalar_temporal.cc"
|
||||
"${LIBRARY_DIR}/compute/kernels/scalar_validity.cc"
|
||||
"${LIBRARY_DIR}/compute/kernels/util_internal.cc"
|
||||
"${LIBRARY_DIR}/compute/kernels/vector_hash.cc"
|
||||
"${LIBRARY_DIR}/compute/kernels/vector_nested.cc"
|
||||
"${LIBRARY_DIR}/compute/kernels/vector_replace.cc"
|
||||
"${LIBRARY_DIR}/compute/kernels/vector_selection.cc"
|
||||
"${LIBRARY_DIR}/compute/kernels/vector_sort.cc"
|
||||
"${LIBRARY_DIR}/compute/kernels/util_internal.cc"
|
||||
|
||||
"${LIBRARY_DIR}/csv/chunker.cc"
|
||||
"${LIBRARY_DIR}/csv/column_builder.cc"
|
||||
@ -231,6 +241,7 @@ set(ARROW_SRCS
|
||||
"${LIBRARY_DIR}/csv/options.cc"
|
||||
"${LIBRARY_DIR}/csv/parser.cc"
|
||||
"${LIBRARY_DIR}/csv/reader.cc"
|
||||
"${LIBRARY_DIR}/csv/writer.cc"
|
||||
|
||||
"${LIBRARY_DIR}/ipc/dictionary.cc"
|
||||
"${LIBRARY_DIR}/ipc/feather.cc"
|
||||
@ -247,6 +258,7 @@ set(ARROW_SRCS
|
||||
"${LIBRARY_DIR}/io/interfaces.cc"
|
||||
"${LIBRARY_DIR}/io/memory.cc"
|
||||
"${LIBRARY_DIR}/io/slow.cc"
|
||||
"${LIBRARY_DIR}/io/stdio.cc"
|
||||
"${LIBRARY_DIR}/io/transform.cc"
|
||||
|
||||
"${LIBRARY_DIR}/tensor/coo_converter.cc"
|
||||
@ -257,9 +269,9 @@ set(ARROW_SRCS
|
||||
"${LIBRARY_DIR}/util/bit_block_counter.cc"
|
||||
"${LIBRARY_DIR}/util/bit_run_reader.cc"
|
||||
"${LIBRARY_DIR}/util/bit_util.cc"
|
||||
"${LIBRARY_DIR}/util/bitmap.cc"
|
||||
"${LIBRARY_DIR}/util/bitmap_builders.cc"
|
||||
"${LIBRARY_DIR}/util/bitmap_ops.cc"
|
||||
"${LIBRARY_DIR}/util/bitmap.cc"
|
||||
"${LIBRARY_DIR}/util/bpacking.cc"
|
||||
"${LIBRARY_DIR}/util/cancel.cc"
|
||||
"${LIBRARY_DIR}/util/compression.cc"
|
||||
|
2
contrib/boost
vendored
2
contrib/boost
vendored
@ -1 +1 @@
|
||||
Subproject commit 1ccbb5a522a571ce83b606dbc2e1011c42ecccfb
|
||||
Subproject commit 9cf09dbfd55a5c6202dedbdf40781a51b02c2675
|
@ -13,11 +13,12 @@ if (NOT USE_INTERNAL_BOOST_LIBRARY)
|
||||
regex
|
||||
context
|
||||
coroutine
|
||||
graph
|
||||
)
|
||||
|
||||
if(Boost_INCLUDE_DIR AND Boost_FILESYSTEM_LIBRARY AND Boost_FILESYSTEM_LIBRARY AND
|
||||
Boost_PROGRAM_OPTIONS_LIBRARY AND Boost_REGEX_LIBRARY AND Boost_SYSTEM_LIBRARY AND Boost_CONTEXT_LIBRARY AND
|
||||
Boost_COROUTINE_LIBRARY)
|
||||
Boost_COROUTINE_LIBRARY AND Boost_GRAPH_LIBRARY)
|
||||
|
||||
set(EXTERNAL_BOOST_FOUND 1)
|
||||
|
||||
@ -32,6 +33,7 @@ if (NOT USE_INTERNAL_BOOST_LIBRARY)
|
||||
add_library (_boost_system INTERFACE)
|
||||
add_library (_boost_context INTERFACE)
|
||||
add_library (_boost_coroutine INTERFACE)
|
||||
add_library (_boost_graph INTERFACE)
|
||||
|
||||
target_link_libraries (_boost_filesystem INTERFACE ${Boost_FILESYSTEM_LIBRARY})
|
||||
target_link_libraries (_boost_iostreams INTERFACE ${Boost_IOSTREAMS_LIBRARY})
|
||||
@ -40,6 +42,7 @@ if (NOT USE_INTERNAL_BOOST_LIBRARY)
|
||||
target_link_libraries (_boost_system INTERFACE ${Boost_SYSTEM_LIBRARY})
|
||||
target_link_libraries (_boost_context INTERFACE ${Boost_CONTEXT_LIBRARY})
|
||||
target_link_libraries (_boost_coroutine INTERFACE ${Boost_COROUTINE_LIBRARY})
|
||||
target_link_libraries (_boost_graph INTERFACE ${Boost_GRAPH_LIBRARY})
|
||||
|
||||
add_library (boost::filesystem ALIAS _boost_filesystem)
|
||||
add_library (boost::iostreams ALIAS _boost_iostreams)
|
||||
@ -48,6 +51,7 @@ if (NOT USE_INTERNAL_BOOST_LIBRARY)
|
||||
add_library (boost::system ALIAS _boost_system)
|
||||
add_library (boost::context ALIAS _boost_context)
|
||||
add_library (boost::coroutine ALIAS _boost_coroutine)
|
||||
add_library (boost::graph ALIAS _boost_graph)
|
||||
else()
|
||||
set(EXTERNAL_BOOST_FOUND 0)
|
||||
message (${RECONFIGURE_MESSAGE_LEVEL} "Can't find system boost")
|
||||
@ -221,4 +225,17 @@ if (NOT EXTERNAL_BOOST_FOUND)
|
||||
add_library (boost::coroutine ALIAS _boost_coroutine)
|
||||
target_include_directories (_boost_coroutine PRIVATE ${LIBRARY_DIR})
|
||||
target_link_libraries(_boost_coroutine PRIVATE _boost_context)
|
||||
|
||||
# graph
|
||||
|
||||
set (SRCS_GRAPH
|
||||
"${LIBRARY_DIR}/libs/graph/src/graphml.cpp"
|
||||
"${LIBRARY_DIR}/libs/graph/src/read_graphviz_new.cpp"
|
||||
)
|
||||
|
||||
add_library (_boost_graph ${SRCS_GRAPH})
|
||||
add_library (boost::graph ALIAS _boost_graph)
|
||||
target_include_directories (_boost_graph PRIVATE ${LIBRARY_DIR})
|
||||
target_link_libraries(_boost_graph PRIVATE _boost_regex)
|
||||
|
||||
endif ()
|
||||
|
1
contrib/bzip2
vendored
Submodule
1
contrib/bzip2
vendored
Submodule
@ -0,0 +1 @@
|
||||
Subproject commit bf905ea2251191ff9911ae7ec0cfc35d41f9f7f6
|
23
contrib/bzip2-cmake/CMakeLists.txt
Normal file
23
contrib/bzip2-cmake/CMakeLists.txt
Normal file
@ -0,0 +1,23 @@
|
||||
set(BZIP2_SOURCE_DIR "${ClickHouse_SOURCE_DIR}/contrib/bzip2")
|
||||
set(BZIP2_BINARY_DIR "${ClickHouse_BINARY_DIR}/contrib/bzip2")
|
||||
|
||||
set(SRCS
|
||||
"${BZIP2_SOURCE_DIR}/blocksort.c"
|
||||
"${BZIP2_SOURCE_DIR}/huffman.c"
|
||||
"${BZIP2_SOURCE_DIR}/crctable.c"
|
||||
"${BZIP2_SOURCE_DIR}/randtable.c"
|
||||
"${BZIP2_SOURCE_DIR}/compress.c"
|
||||
"${BZIP2_SOURCE_DIR}/decompress.c"
|
||||
"${BZIP2_SOURCE_DIR}/bzlib.c"
|
||||
)
|
||||
|
||||
# From bzip2/CMakeLists.txt
|
||||
set(BZ_VERSION "1.0.7")
|
||||
configure_file (
|
||||
"${BZIP2_SOURCE_DIR}/bz_version.h.in"
|
||||
"${BZIP2_BINARY_DIR}/bz_version.h"
|
||||
)
|
||||
|
||||
add_library(bzip2 ${SRCS})
|
||||
|
||||
target_include_directories(bzip2 PUBLIC "${BZIP2_SOURCE_DIR}" "${BZIP2_BINARY_DIR}")
|
@ -24,3 +24,19 @@ add_library(roaring ${SRCS})
|
||||
target_include_directories(roaring PRIVATE "${LIBRARY_DIR}/include/roaring")
|
||||
target_include_directories(roaring SYSTEM BEFORE PUBLIC "${LIBRARY_DIR}/include")
|
||||
target_include_directories(roaring SYSTEM BEFORE PUBLIC "${LIBRARY_DIR}/cpp")
|
||||
|
||||
# We redirect malloc/free family of functions to different functions that will track memory in ClickHouse.
|
||||
# Also note that we exploit implicit function declarations.
|
||||
# Also it is disabled on Mac OS because it fails.
|
||||
|
||||
if (NOT OS_DARWIN)
|
||||
target_compile_definitions(roaring PRIVATE
|
||||
-Dmalloc=clickhouse_malloc
|
||||
-Dcalloc=clickhouse_calloc
|
||||
-Drealloc=clickhouse_realloc
|
||||
-Dreallocarray=clickhouse_reallocarray
|
||||
-Dfree=clickhouse_free
|
||||
-Dposix_memalign=clickhouse_posix_memalign)
|
||||
|
||||
target_link_libraries(roaring PUBLIC clickhouse_common_io)
|
||||
endif ()
|
||||
|
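One way to confirm that the redirection above took effect is to look for the replacement symbols in the built archive; a rough check, where the archive path is an assumption that depends on the build layout:

```bash
# The roaring objects should reference clickhouse_malloc/clickhouse_free instead of malloc/free.
nm -C contrib/croaring-cmake/libroaring.a | grep -i clickhouse_
```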
1
contrib/lemmagen-c
vendored
Submodule
1
contrib/lemmagen-c
vendored
Submodule
@ -0,0 +1 @@
|
||||
Subproject commit 59537bdcf57bbed17913292cb4502d15657231f1
|
9
contrib/lemmagen-c-cmake/CMakeLists.txt
Normal file
9
contrib/lemmagen-c-cmake/CMakeLists.txt
Normal file
@ -0,0 +1,9 @@
|
||||
set(LIBRARY_DIR "${ClickHouse_SOURCE_DIR}/contrib/lemmagen-c")
|
||||
set(LEMMAGEN_INCLUDE_DIR "${LIBRARY_DIR}/include")
|
||||
|
||||
set(SRCS
|
||||
"${LIBRARY_DIR}/src/RdrLemmatizer.cpp"
|
||||
)
|
||||
|
||||
add_library(lemmagen STATIC ${SRCS})
|
||||
target_include_directories(lemmagen PUBLIC "${LEMMAGEN_INCLUDE_DIR}")
|
@ -2,9 +2,5 @@ set (SRCS
|
||||
src/metrohash64.cpp
|
||||
src/metrohash128.cpp
|
||||
)
|
||||
if (HAVE_SSE42) # Not used. Pretty easy to port.
|
||||
list (APPEND SRCS src/metrohash128crc.cpp)
|
||||
endif ()
|
||||
|
||||
add_library(metrohash ${SRCS})
|
||||
target_include_directories(metrohash PUBLIC src)
|
||||
|
31
contrib/libstemmer-c-cmake/CMakeLists.txt
Normal file
31
contrib/libstemmer-c-cmake/CMakeLists.txt
Normal file
@ -0,0 +1,31 @@
|
||||
set(LIBRARY_DIR "${ClickHouse_SOURCE_DIR}/contrib/libstemmer_c")
|
||||
set(STEMMER_INCLUDE_DIR "${LIBRARY_DIR}/include")
|
||||
|
||||
FILE ( READ "${LIBRARY_DIR}/mkinc.mak" _CONTENT )
|
||||
# replace '\ ' into one big line
|
||||
STRING ( REGEX REPLACE "\\\\\n " " ${LIBRARY_DIR}/" _CONTENT "${_CONTENT}" )
|
||||
# escape ';' (if any)
|
||||
STRING ( REGEX REPLACE ";" "\\\\;" _CONTENT "${_CONTENT}" )
|
||||
# now replace lf into ';' (it makes list from the line)
|
||||
STRING ( REGEX REPLACE "\n" ";" _CONTENT "${_CONTENT}" )
|
||||
FOREACH ( LINE ${_CONTENT} )
|
||||
# skip comments (beginning with #)
|
||||
IF ( NOT "${LINE}" MATCHES "^#.*" )
|
||||
# parse 'name=value1 value2..." - extract the 'name' part
|
||||
STRING ( REGEX REPLACE "=.*$" "" _NAME "${LINE}" )
|
||||
# extract the list of values part
|
||||
STRING ( REGEX REPLACE "^.*=" "" _LIST "${LINE}" )
|
||||
# replace (multi)spaces into ';' (it makes list from the line)
|
||||
STRING ( REGEX REPLACE " +" ";" _LIST "${_LIST}" )
|
||||
# finally get our two variables
|
||||
IF ( "${_NAME}" MATCHES "snowball_sources" )
|
||||
SET ( _SOURCES "${_LIST}" )
|
||||
ELSEIF ( "${_NAME}" MATCHES "snowball_headers" )
|
||||
SET ( _HEADERS "${_LIST}" )
|
||||
ENDIF ()
|
||||
endif ()
|
||||
endforeach ()
|
||||
|
||||
# all the sources parsed. Now just add the lib
|
||||
add_library ( stemmer STATIC ${_SOURCES} ${_HEADERS} )
|
||||
target_include_directories (stemmer PUBLIC "${STEMMER_INCLUDE_DIR}")
|
1
contrib/libstemmer_c
vendored
Submodule
1
contrib/libstemmer_c
vendored
Submodule
@ -0,0 +1 @@
|
||||
Subproject commit c753054304d87daf460057c1a649c482aa094835
|
@ -22,6 +22,7 @@ set(SRCS
|
||||
"${LIBRARY_DIR}/src/launcher.cxx"
|
||||
"${LIBRARY_DIR}/src/srv_config.cxx"
|
||||
"${LIBRARY_DIR}/src/snapshot_sync_req.cxx"
|
||||
"${LIBRARY_DIR}/src/snapshot_sync_ctx.cxx"
|
||||
"${LIBRARY_DIR}/src/handle_timeout.cxx"
|
||||
"${LIBRARY_DIR}/src/handle_append_entries.cxx"
|
||||
"${LIBRARY_DIR}/src/cluster_config.cxx"
|
||||
|
2
contrib/protobuf
vendored
2
contrib/protobuf
vendored
@ -1 +1 @@
|
||||
Subproject commit 73b12814204ad9068ba352914d0dc244648b48ee
|
||||
Subproject commit 75601841d172c73ae6bf4ce8121f42b875cdbabd
|
2
contrib/rocksdb
vendored
2
contrib/rocksdb
vendored
@ -1 +1 @@
|
||||
Subproject commit dac0e9a68080c837d6b6223921f3fc151abbfcdc
|
||||
Subproject commit b6480c69bf3ab6e298e0d019a07fd4f69029b26a
|
@ -4,3 +4,6 @@ set(SIMDJSON_SRC "${SIMDJSON_SRC_DIR}/simdjson.cpp")
|
||||
|
||||
add_library(simdjson ${SIMDJSON_SRC})
|
||||
target_include_directories(simdjson SYSTEM PUBLIC "${SIMDJSON_INCLUDE_DIR}" PRIVATE "${SIMDJSON_SRC_DIR}")
|
||||
|
||||
# simdjson is using its own CPU dispatching and get confused if we enable AVX/AVX2 flags.
|
||||
target_compile_options(simdjson PRIVATE -mno-avx -mno-avx2)
|
||||
|
1
contrib/wordnet-blast
vendored
Submodule
1
contrib/wordnet-blast
vendored
Submodule
@ -0,0 +1 @@
|
||||
Subproject commit 1d16ac28036e19fe8da7ba72c16a307fbdf8c87e
|
13
contrib/wordnet-blast-cmake/CMakeLists.txt
Normal file
13
contrib/wordnet-blast-cmake/CMakeLists.txt
Normal file
@ -0,0 +1,13 @@
|
||||
set(LIBRARY_DIR "${ClickHouse_SOURCE_DIR}/contrib/wordnet-blast")
|
||||
|
||||
set(SRCS
|
||||
"${LIBRARY_DIR}/wnb/core/info_helper.cc"
|
||||
"${LIBRARY_DIR}/wnb/core/load_wordnet.cc"
|
||||
"${LIBRARY_DIR}/wnb/core/wordnet.cc"
|
||||
)
|
||||
|
||||
add_library(wnb ${SRCS})
|
||||
|
||||
target_link_libraries(wnb PRIVATE boost::headers_only boost::graph)
|
||||
|
||||
target_include_directories(wnb PUBLIC "${LIBRARY_DIR}")
|
2
contrib/zlib-ng
vendored
2
contrib/zlib-ng
vendored
@ -1 +1 @@
|
||||
Subproject commit db232d30b4c72fd58e6d7eae2d12cebf9c3d90db
|
||||
Subproject commit 6a5e93b9007782115f7f7e5235dedc81c4f1facb
|
@ -151,8 +151,14 @@ def parse_env_variables(build_type, compiler, sanitizer, package_type, image_typ
|
||||
cmake_flags.append('-DENABLE_TESTS=1')
|
||||
cmake_flags.append('-DUSE_GTEST=1')
|
||||
|
||||
# "Unbundled" build is not suitable for any production usage.
|
||||
# But it is occasionally used by some developers.
|
||||
# The whole idea of using unknown version of libraries from the OS distribution is deeply flawed.
|
||||
# We wish these developers good luck.
|
||||
if unbundled:
|
||||
cmake_flags.append('-DUNBUNDLED=1 -DUSE_INTERNAL_RDKAFKA_LIBRARY=1 -DENABLE_ARROW=0 -DENABLE_AVRO=0 -DENABLE_ORC=0 -DENABLE_PARQUET=0')
|
||||
# We also disable all CPU features except basic x86_64.
|
||||
# It is only slightly related to "unbundled" build, but it is a good place to test if code compiles without these instruction sets.
|
||||
cmake_flags.append('-DUNBUNDLED=1 -DUSE_INTERNAL_RDKAFKA_LIBRARY=1 -DENABLE_ARROW=0 -DENABLE_AVRO=0 -DENABLE_ORC=0 -DENABLE_PARQUET=0 -DENABLE_SSSE3=0 -DENABLE_SSE41=0 -DENABLE_SSE42=0 -DENABLE_PCLMULQDQ=0 -DENABLE_POPCNT=0 -DENABLE_AVX=0 -DENABLE_AVX2=0')
|
||||
|
||||
if split_binary:
|
||||
cmake_flags.append('-DUSE_STATIC_LIBRARIES=0 -DSPLIT_SHARED_LIBRARIES=1 -DCLICKHOUSE_SPLIT_BINARY=1')
|
||||
|
@ -23,6 +23,7 @@ RUN apt-get update \
|
||||
libboost-regex-dev \
|
||||
libboost-context-dev \
|
||||
libboost-coroutine-dev \
|
||||
libboost-graph-dev \
|
||||
zlib1g-dev \
|
||||
liblz4-dev \
|
||||
libdouble-conversion-dev \
|
||||
|
@ -164,6 +164,10 @@ fi
|
||||
|
||||
# if no args passed to `docker run` or first argument start with `--`, then the user is passing clickhouse-server arguments
|
||||
if [[ $# -lt 1 ]] || [[ "$1" == "--"* ]]; then
|
||||
# Watchdog is launched by default, but does not send SIGINT to the main process,
|
||||
# so the container can't be finished by ctrl+c
|
||||
CLICKHOUSE_WATCHDOG_ENABLE=${CLICKHOUSE_WATCHDOG_ENABLE:-0}
|
||||
export CLICKHOUSE_WATCHDOG_ENABLE
|
||||
exec $gosu /usr/bin/clickhouse-server --config-file="$CLICKHOUSE_CONFIG" "$@"
|
||||
fi
|
||||
|
||||
|
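With this change the watchdog defaults to off when the container runs the server directly, but it can still be re-enabled per container. A usage sketch (the image name is an assumption):

```bash
# Re-enable the watchdog for a single container run.
docker run -d -e CLICKHOUSE_WATCHDOG_ENABLE=1 yandex/clickhouse-server
```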
@ -61,4 +61,7 @@ ENV TSAN_OPTIONS='halt_on_error=1 history_size=7'
|
||||
ENV UBSAN_OPTIONS='print_stacktrace=1'
|
||||
ENV MSAN_OPTIONS='abort_on_error=1 poison_in_dtor=1'
|
||||
|
||||
ENV TZ=Europe/Moscow
|
||||
RUN ln -snf "/usr/share/zoneinfo/$TZ" /etc/localtime && echo "$TZ" > /etc/timezone
|
||||
|
||||
CMD sleep 1
|
||||
|
@ -279,6 +279,7 @@ function run_tests
|
||||
00926_multimatch
|
||||
00929_multi_match_edit_distance
|
||||
01681_hyperscan_debug_assertion
|
||||
02004_max_hyperscan_regex_length
|
||||
|
||||
01176_mysql_client_interactive # requires mysql client
|
||||
01031_mutations_interpreter_and_context
|
||||
@ -299,6 +300,7 @@ function run_tests
|
||||
01318_decrypt # Depends on OpenSSL
|
||||
01663_aes_msan # Depends on OpenSSL
|
||||
01667_aes_args_check # Depends on OpenSSL
|
||||
01683_codec_encrypted # Depends on OpenSSL
|
||||
01776_decrypt_aead_size_check # Depends on OpenSSL
|
||||
01811_filter_by_null # Depends on OpenSSL
|
||||
01281_unsucceeded_insert_select_queries_counter
|
||||
@ -310,6 +312,8 @@ function run_tests
|
||||
01411_bayesian_ab_testing
|
||||
01798_uniq_theta_sketch
|
||||
01799_long_uniq_theta_sketch
|
||||
01890_stem # depends on libstemmer_c
|
||||
02003_compress_bz2 # depends on bzip2
|
||||
collate
|
||||
collation
|
||||
_orc_
|
||||
|
@ -194,6 +194,10 @@ continue
|
||||
jobs
|
||||
pstree -aspgT
|
||||
|
||||
server_exit_code=0
|
||||
wait $server_pid || server_exit_code=$?
|
||||
echo "Server exit code is $server_exit_code"
|
||||
|
||||
# Make files with status and description we'll show for this check on Github.
|
||||
task_exit_code=$fuzzer_exit_code
|
||||
if [ "$server_died" == 1 ]
|
||||
@ -222,7 +226,7 @@ continue
|
||||
task_exit_code=$fuzzer_exit_code
|
||||
echo "failure" > status.txt
|
||||
{ grep --text -o "Found error:.*" fuzzer.log \
|
||||
|| grep --text -o "Exception.*" fuzzer.log \
|
||||
|| grep --text -ao "Exception:.*" fuzzer.log \
|
||||
|| echo "Fuzzer failed ($fuzzer_exit_code). See the logs." ; } \
|
||||
| tail -1 > description.txt
|
||||
fi
|
||||
|
@ -14,10 +14,14 @@ services:
|
||||
}
|
||||
EOF
|
||||
./docker-entrypoint.sh'
|
||||
ports:
|
||||
- 9020:9019
|
||||
expose:
|
||||
- 9019
|
||||
healthcheck:
|
||||
test: ["CMD", "curl", "-s", "localhost:9019/ping"]
|
||||
interval: 5s
|
||||
timeout: 3s
|
||||
retries: 30
|
||||
volumes:
|
||||
- type: ${JDBC_BRIDGE_FS:-tmpfs}
|
||||
source: ${JDBC_BRIDGE_LOGS:-}
|
||||
target: /app/logs
|
@ -0,0 +1,13 @@
|
||||
version: '2.3'
|
||||
services:
|
||||
mongo1:
|
||||
image: mongo:3.6
|
||||
restart: always
|
||||
environment:
|
||||
MONGO_INITDB_ROOT_USERNAME: root
|
||||
MONGO_INITDB_ROOT_PASSWORD: clickhouse
|
||||
volumes:
|
||||
- ${MONGO_CONFIG_PATH}:/mongo/
|
||||
ports:
|
||||
- ${MONGO_EXTERNAL_PORT}:${MONGO_INTERNAL_PORT}
|
||||
command: --config /mongo/mongo_secure.conf --profile=2 --verbose
|
@ -2,7 +2,7 @@ version: '2.3'
|
||||
|
||||
services:
|
||||
rabbitmq1:
|
||||
image: rabbitmq:3-management-alpine
|
||||
image: rabbitmq:3.8-management-alpine
|
||||
hostname: rabbitmq1
|
||||
expose:
|
||||
- ${RABBITMQ_PORT}
|
||||
|
@ -1196,7 +1196,7 @@ create table changes engine File(TSV, 'metrics/changes.tsv') as
|
||||
if(left > right, left / right, right / left) times_diff
|
||||
from metrics
|
||||
group by metric
|
||||
having abs(diff) > 0.05 and isFinite(diff)
|
||||
having abs(diff) > 0.05 and isFinite(diff) and isFinite(times_diff)
|
||||
)
|
||||
order by diff desc
|
||||
;
|
||||
|
@ -183,6 +183,10 @@ for conn_index, c in enumerate(all_connections):
|
||||
# requires clickhouse-driver >= 1.1.5 to accept arbitrary new settings
|
||||
# (https://github.com/mymarilyn/clickhouse-driver/pull/142)
|
||||
c.settings[s.tag] = s.text
|
||||
# We have to perform a query to make sure the settings work. Otherwise an
|
||||
# unknown setting will lead to failing precondition check, and we will skip
|
||||
# the test, which is wrong.
|
||||
c.execute("select 1")
|
||||
|
||||
reportStageEnd('settings')
|
||||
|
||||
|
@ -28,7 +28,7 @@ RUN apt-get update --yes \
|
||||
ENV PKG_VERSION="pvs-studio-latest"
|
||||
|
||||
RUN set -x \
|
||||
&& export PUBKEY_HASHSUM="486a0694c7f92e96190bbfac01c3b5ac2cb7823981db510a28f744c99eabbbf17a7bcee53ca42dc6d84d4323c2742761" \
|
||||
&& export PUBKEY_HASHSUM="686e5eb8b3c543a5c54442c39ec876b6c2d912fe8a729099e600017ae53c877dda3368fe38ed7a66024fe26df6b5892a" \
|
||||
&& wget -nv https://files.viva64.com/etc/pubkey.txt -O /tmp/pubkey.txt \
|
||||
&& echo "${PUBKEY_HASHSUM} /tmp/pubkey.txt" | sha384sum -c \
|
||||
&& apt-key add /tmp/pubkey.txt \
|
||||
|
@ -2,6 +2,11 @@
|
||||
|
||||
set -e -x
|
||||
|
||||
# Choose random timezone for this test run
|
||||
TZ="$(grep -v '#' /usr/share/zoneinfo/zone.tab | awk '{print $3}' | shuf | head -n1)"
|
||||
echo "Choosen random timezone $TZ"
|
||||
ln -snf "/usr/share/zoneinfo/$TZ" /etc/localtime && echo "$TZ" > /etc/timezone
|
||||
|
||||
dpkg -i package_folder/clickhouse-common-static_*.deb;
|
||||
dpkg -i package_folder/clickhouse-common-static-dbg_*.deb
|
||||
dpkg -i package_folder/clickhouse-server_*.deb
|
||||
|
@ -105,6 +105,10 @@ def process_result(result_path):
|
||||
description += ", skipped: {}".format(skipped)
|
||||
if unknown != 0:
|
||||
description += ", unknown: {}".format(unknown)
|
||||
|
||||
# Temporary green for tests with DatabaseReplicated:
|
||||
if 1 == int(os.environ.get('USE_DATABASE_REPLICATED', 0)):
|
||||
state = "success"
|
||||
else:
|
||||
state = "failure"
|
||||
description = "Output log doesn't exist"
|
||||
|
@ -3,6 +3,11 @@
|
||||
# fail on errors, verbose and export all env variables
|
||||
set -e -x -a
|
||||
|
||||
# Choose random timezone for this test run.
|
||||
TZ="$(grep -v '#' /usr/share/zoneinfo/zone.tab | awk '{print $3}' | shuf | head -n1)"
|
||||
echo "Choosen random timezone $TZ"
|
||||
ln -snf "/usr/share/zoneinfo/$TZ" /etc/localtime && echo "$TZ" > /etc/timezone
|
||||
|
||||
dpkg -i package_folder/clickhouse-common-static_*.deb
|
||||
dpkg -i package_folder/clickhouse-common-static-dbg_*.deb
|
||||
dpkg -i package_folder/clickhouse-server_*.deb
|
||||
@ -138,6 +143,7 @@ if [[ -n "$WITH_COVERAGE" ]] && [[ "$WITH_COVERAGE" -eq 1 ]]; then
|
||||
fi
|
||||
tar -chf /test_output/text_log_dump.tar /var/lib/clickhouse/data/system/text_log ||:
|
||||
tar -chf /test_output/query_log_dump.tar /var/lib/clickhouse/data/system/query_log ||:
|
||||
tar -chf /test_output/zookeeper_log_dump.tar /var/lib/clickhouse/data/system/zookeeper_log ||:
|
||||
tar -chf /test_output/coordination.tar /var/lib/clickhouse/coordination ||:
|
||||
|
||||
if [[ -n "$USE_DATABASE_REPLICATED" ]] && [[ "$USE_DATABASE_REPLICATED" -eq 1 ]]; then
|
||||
@ -147,6 +153,8 @@ if [[ -n "$USE_DATABASE_REPLICATED" ]] && [[ "$USE_DATABASE_REPLICATED" -eq 1 ]]
|
||||
pigz < /var/log/clickhouse-server/clickhouse-server2.log > /test_output/clickhouse-server2.log.gz ||:
|
||||
mv /var/log/clickhouse-server/stderr1.log /test_output/ ||:
|
||||
mv /var/log/clickhouse-server/stderr2.log /test_output/ ||:
|
||||
tar -chf /test_output/zookeeper_log_dump1.tar /var/lib/clickhouse1/data/system/zookeeper_log ||:
|
||||
tar -chf /test_output/zookeeper_log_dump2.tar /var/lib/clickhouse2/data/system/zookeeper_log ||:
|
||||
tar -chf /test_output/coordination1.tar /var/lib/clickhouse1/coordination ||:
|
||||
tar -chf /test_output/coordination2.tar /var/lib/clickhouse2/coordination ||:
|
||||
fi
|
||||
|
@ -77,9 +77,6 @@ RUN mkdir -p /tmp/clickhouse-odbc-tmp \
|
||||
&& odbcinst -i -s -l -f /tmp/clickhouse-odbc-tmp/share/doc/clickhouse-odbc/config/odbc.ini.sample \
|
||||
&& rm -rf /tmp/clickhouse-odbc-tmp
|
||||
|
||||
ENV TZ=Europe/Moscow
|
||||
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
|
||||
|
||||
COPY run.sh /
|
||||
CMD ["/bin/bash", "/run.sh"]
|
||||
|
||||
|
@ -20,6 +20,7 @@ def get_skip_list_cmd(path):
|
||||
|
||||
def get_options(i):
|
||||
options = []
|
||||
client_options = []
|
||||
if 0 < i:
|
||||
options.append("--order=random")
|
||||
|
||||
@ -27,25 +28,29 @@ def get_options(i):
|
||||
options.append("--db-engine=Ordinary")
|
||||
|
||||
if i % 3 == 2:
|
||||
options.append('''--client-option='allow_experimental_database_replicated=1' --db-engine="Replicated('/test/db/test_{}', 's1', 'r1')"'''.format(i))
|
||||
options.append('''--db-engine="Replicated('/test/db/test_{}', 's1', 'r1')"'''.format(i))
|
||||
client_options.append('allow_experimental_database_replicated=1')
|
||||
|
||||
# If database name is not specified, new database is created for each functional test.
|
||||
# Run some threads with one database for all tests.
|
||||
if i % 2 == 1:
|
||||
options.append(" --database=test_{}".format(i))
|
||||
|
||||
if i % 7 == 0:
|
||||
options.append(" --client-option='join_use_nulls=1'")
|
||||
if i % 5 == 1:
|
||||
client_options.append("join_use_nulls=1")
|
||||
|
||||
if i % 14 == 0:
|
||||
options.append(' --client-option="join_algorithm=\'partial_merge\'"')
|
||||
if i % 15 == 6:
|
||||
client_options.append("join_algorithm='partial_merge'")
|
||||
|
||||
if i % 21 == 0:
|
||||
options.append(' --client-option="join_algorithm=\'auto\'"')
|
||||
options.append(' --client-option="max_rows_in_join=1000"')
|
||||
if i % 15 == 11:
|
||||
client_options.append("join_algorithm='auto'")
|
||||
client_options.append('max_rows_in_join=1000')
|
||||
|
||||
if i == 13:
|
||||
options.append(" --client-option='memory_tracker_fault_probability=0.00001'")
|
||||
client_options.append('memory_tracker_fault_probability=0.001')
|
||||
|
||||
if client_options:
|
||||
options.append(" --client-option " + ' '.join(client_options))
|
||||
|
||||
return ' '.join(options)
|
||||
|
||||
|
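The fragments assembled by `get_options` end up on a single `clickhouse-test` command line, with all client settings grouped after one `--client-option` flag. A rough illustration of one rotated configuration (values are examples, not a captured invocation):

```bash
./clickhouse-test --order=random --db-engine=Ordinary --database=test_1 \
    --client-option join_use_nulls=1 "join_algorithm='partial_merge'"
```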
@ -35,7 +35,7 @@ RUN apt-get update \
|
||||
ENV TZ=Europe/Moscow
|
||||
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
|
||||
|
||||
RUN pip3 install urllib3 testflows==1.6.90 docker-compose==1.29.1 docker==5.0.0 dicttoxml kazoo tzlocal python-dateutil numpy
|
||||
RUN pip3 install urllib3 testflows==1.7.20 docker-compose==1.29.1 docker==5.0.0 dicttoxml kazoo tzlocal python-dateutil numpy
|
||||
|
||||
ENV DOCKER_CHANNEL stable
|
||||
ENV DOCKER_VERSION 20.10.6
|
||||
|
@ -1,8 +1,6 @@
|
||||
# docker build -t yandex/clickhouse-unit-test .
|
||||
FROM yandex/clickhouse-stateless-test
|
||||
|
||||
ENV TZ=Europe/Moscow
|
||||
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
|
||||
RUN apt-get install gdb
|
||||
|
||||
COPY run.sh /
|
||||
|
@ -155,6 +155,10 @@ Normally ClickHouse is statically linked into a single static `clickhouse` binar
|
||||
-DUSE_STATIC_LIBRARIES=0 -DSPLIT_SHARED_LIBRARIES=1 -DCLICKHOUSE_SPLIT_BINARY=1
|
||||
```
|
||||
|
||||
Note that in this configuration there is no single `clickhouse` binary, and you have to run `clickhouse-server`, `clickhouse-client` etc.
|
||||
Note that the split build has several drawbacks:
|
||||
* There is no single `clickhouse` binary, and you have to run `clickhouse-server`, `clickhouse-client`, etc.
|
||||
* Risk of segfault if you run any of the programs while rebuilding the project.
|
||||
* You cannot run the integration tests since they only work with a single complete binary.
|
||||
* You can't easily copy the binaries elsewhere. Instead of moving a single binary you'll need to copy all binaries and libraries.
|
||||
|
||||
[Original article](https://clickhouse.tech/docs/en/development/build/) <!--hide-->
|
||||
|
@ -8,7 +8,7 @@ toc_title: Third-Party Libraries Used
|
||||
The list of third-party libraries can be obtained by the following query:
|
||||
|
||||
``` sql
|
||||
SELECT library_name, license_type, license_path FROM system.licenses ORDER BY library_name COLLATE 'en'
|
||||
SELECT library_name, license_type, license_path FROM system.licenses ORDER BY library_name COLLATE 'en';
|
||||
```
|
||||
|
||||
[Example](https://gh-api.clickhouse.tech/play?user=play#U0VMRUNUIGxpYnJhcnlfbmFtZSwgbGljZW5zZV90eXBlLCBsaWNlbnNlX3BhdGggRlJPTSBzeXN0ZW0ubGljZW5zZXMgT1JERVIgQlkgbGlicmFyeV9uYW1lIENPTExBVEUgJ2VuJw==)
|
||||
|
@ -749,7 +749,7 @@ If your code in the `master` branch is not buildable yet, exclude it from the bu
|
||||
|
||||
**1.** The C++20 standard library is used (experimental extensions are allowed), as well as `boost` and `Poco` frameworks.
|
||||
|
||||
**2.** It is not allowed to use libraries from OS packages. It is also not allowed to use pre-installed libraries. All libraries should be placed in form of source code in `contrib` directory and built with ClickHouse.
|
||||
**2.** It is not allowed to use libraries from OS packages. It is also not allowed to use pre-installed libraries. All libraries should be placed in form of source code in `contrib` directory and built with ClickHouse. See [Guidelines for adding new third-party libraries](contrib.md#adding-third-party-libraries) for details.
|
||||
|
||||
**3.** Preference is always given to libraries that are already in use.
|
||||
|
||||
|
@ -70,7 +70,13 @@ Note that integration of ClickHouse with third-party drivers is not tested. Also
|
||||
|
||||
Unit tests are useful when you want to test not the ClickHouse as a whole, but a single isolated library or class. You can enable or disable build of tests with `ENABLE_TESTS` CMake option. Unit tests (and other test programs) are located in `tests` subdirectories across the code. To run unit tests, type `ninja test`. Some tests use `gtest`, but some are just programs that return non-zero exit code on test failure.
|
||||
|
||||
It’s not necessarily to have unit tests if the code is already covered by functional tests (and functional tests are usually much more simple to use).
|
||||
It’s not necessary to have unit tests if the code is already covered by functional tests (and functional tests are usually much more simple to use).
|
||||
|
||||
You can run individual gtest checks by calling the executable directly, for example:
|
||||
|
||||
```bash
|
||||
$ ./src/unit_tests_dbms --gtest_filter=LocalAddress*
|
||||
```
|
||||
|
||||
## Performance Tests {#performance-tests}
|
||||
|
||||
|
@ -5,7 +5,7 @@ toc_title: Atomic
|
||||
|
||||
# Atomic {#atomic}
|
||||
|
||||
It supports non-blocking [DROP TABLE](#drop-detach-table) and [RENAME TABLE](#rename-table) queries and atomic [EXCHANGE TABLES t1 AND t2](#exchange-tables) queries. `Atomic` database engine is used by default.
|
||||
It supports non-blocking [DROP TABLE](#drop-detach-table) and [RENAME TABLE](#rename-table) queries and atomic [EXCHANGE TABLES](#exchange-tables) queries. `Atomic` database engine is used by default.
|
||||
|
||||
## Creating a Database {#creating-a-database}
|
||||
|
||||
@ -25,16 +25,16 @@ CREATE TABLE name UUID '28f1c61c-2970-457a-bffe-454156ddcfef' (n UInt64) ENGINE
|
||||
```
|
||||
### RENAME TABLE {#rename-table}
|
||||
|
||||
`RENAME` queries are performed without changing UUID and moving table data. These queries do not wait for the completion of queries using the table and will be executed instantly.
|
||||
[RENAME](../../sql-reference/statements/rename.md) queries are performed without changing UUID and moving table data. These queries do not wait for the completion of queries using the table and are executed instantly.
|
||||
|
||||
### DROP/DETACH TABLE {#drop-detach-table}
|
||||
|
||||
On `DROP TABLE` no data is removed, database `Atomic` just marks table as dropped by moving metadata to `/clickhouse_path/metadata_dropped/` and notifies background thread. Delay before final table data deletion is specify by [database_atomic_delay_before_drop_table_sec](../../operations/server-configuration-parameters/settings.md#database_atomic_delay_before_drop_table_sec) setting.
|
||||
On `DROP TABLE` no data is removed, database `Atomic` just marks table as dropped by moving metadata to `/clickhouse_path/metadata_dropped/` and notifies background thread. Delay before final table data deletion is specified by the [database_atomic_delay_before_drop_table_sec](../../operations/server-configuration-parameters/settings.md#database_atomic_delay_before_drop_table_sec) setting.
|
||||
You can specify synchronous mode using `SYNC` modifier. Use the [database_atomic_wait_for_drop_and_detach_synchronously](../../operations/settings/settings.md#database_atomic_wait_for_drop_and_detach_synchronously) setting to do this. In this case `DROP` waits for running `SELECT`, `INSERT` and other queries which are using the table to finish. Table will be actually removed when it's not in use.
|
||||
|
||||
### EXCHANGE TABLES {#exchange-tables}
|
||||
### EXCHANGE TABLES/DICTIONARIES {#exchange-tables}
|
||||
|
||||
`EXCHANGE` query swaps tables atomically. So instead of this non-atomic operation:
|
||||
[EXCHANGE](../../sql-reference/statements/exchange.md) query swaps tables or dictionaries atomically. For instance, instead of this non-atomic operation:
|
||||
|
||||
```sql
|
||||
RENAME TABLE new_table TO tmp, old_table TO new_table, tmp TO old_table;
|
||||
@ -47,7 +47,7 @@ EXCHANGE TABLES new_table AND old_table;
|
||||
|
||||
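Since the renamed section covers dictionaries as well as tables, a usage sketch run through clickhouse-client (object names are hypothetical):

```bash
clickhouse-client --query "EXCHANGE DICTIONARIES new_dict AND old_dict"
```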
### ReplicatedMergeTree in Atomic Database {#replicatedmergetree-in-atomic-database}
|
||||
|
||||
For [ReplicatedMergeTree](../table-engines/mergetree-family/replication.md#table_engines-replication) tables, it is recommended to not specify engine parameters - path in ZooKeeper and replica name. In this case, configuration parameters will be used [default_replica_path](../../operations/server-configuration-parameters/settings.md#default_replica_path) and [default_replica_name](../../operations/server-configuration-parameters/settings.md#default_replica_name). If you want to specify engine parameters explicitly, it is recommended to use `{uuid}` macros. This is useful so that unique paths are automatically generated for each table in ZooKeeper.
|
||||
For [ReplicatedMergeTree](../table-engines/mergetree-family/replication.md#table_engines-replication) tables, it is recommended not to specify engine parameters - path in ZooKeeper and replica name. In this case, configuration parameters [default_replica_path](../../operations/server-configuration-parameters/settings.md#default_replica_path) and [default_replica_name](../../operations/server-configuration-parameters/settings.md#default_replica_name) will be used. If you want to specify engine parameters explicitly, it is recommended to use `{uuid}` macros. This is useful so that unique paths are automatically generated for each table in ZooKeeper.
|
||||
|
||||
## See Also
|
||||
|
||||
|
@ -8,13 +8,13 @@ toc_title: Introduction
|
||||
|
||||
Database engines allow you to work with tables.
|
||||
|
||||
By default, ClickHouse uses database engine [Atomic](../../engines/database-engines/atomic.md). It is provides configurable [table engines](../../engines/table-engines/index.md) and an [SQL dialect](../../sql-reference/syntax.md).
|
||||
By default, ClickHouse uses database engine [Atomic](../../engines/database-engines/atomic.md). It provides configurable [table engines](../../engines/table-engines/index.md) and an [SQL dialect](../../sql-reference/syntax.md).
|
||||
|
||||
You can also use the following database engines:
|
||||
|
||||
- [MySQL](../../engines/database-engines/mysql.md)
|
||||
|
||||
- [MaterializeMySQL](../../engines/database-engines/materialize-mysql.md)
|
||||
- [MaterializedMySQL](../../engines/database-engines/materialized-mysql.md)
|
||||
|
||||
- [Lazy](../../engines/database-engines/lazy.md)
|
||||
|
||||
|
@ -1,21 +1,22 @@
|
||||
---
|
||||
toc_priority: 29
|
||||
toc_title: MaterializeMySQL
|
||||
toc_title: "[experimental] MaterializedMySQL"
|
||||
---
|
||||
|
||||
# MaterializeMySQL {#materialize-mysql}
|
||||
# [experimental] MaterializedMySQL {#materialized-mysql}
|
||||
|
||||
!!! warning "Warning"
|
||||
This is an experimental feature that should not be used in production.
|
||||
|
||||
Creates ClickHouse database with all the tables existing in MySQL, and all the data in those tables.
|
||||
|
||||
ClickHouse server works as MySQL replica. It reads binlog and performs DDL and DML queries.
|
||||
|
||||
This feature is experimental.
|
||||
|
||||
## Creating a Database {#creating-a-database}
|
||||
|
||||
``` sql
|
||||
CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster]
|
||||
ENGINE = MaterializeMySQL('host:port', ['database' | database], 'user', 'password') [SETTINGS ...]
|
||||
ENGINE = MaterializedMySQL('host:port', ['database' | database], 'user', 'password') [SETTINGS ...]
|
||||
```
|
||||
|
||||
**Engine Parameters**
|
||||
@ -25,9 +26,36 @@ ENGINE = MaterializeMySQL('host:port', ['database' | database], 'user', 'passwor
|
||||
- `user` — MySQL user.
|
||||
- `password` — User password.
|
||||
|
||||
## Virtual columns {#virtual-columns}
|
||||
**Engine Settings**
|
||||
|
||||
When working with the `MaterializeMySQL` database engine, [ReplacingMergeTree](../../engines/table-engines/mergetree-family/replacingmergetree.md) tables are used with virtual `_sign` and `_version` columns.
|
||||
- `max_rows_in_buffer` — Maximum number of rows that data is allowed to cache in memory (for a single table; cached data cannot be queried yet). When this number is exceeded, the data will be materialized. Default: `65 505`.
- `max_bytes_in_buffer` — Maximum number of bytes that data is allowed to cache in memory (for a single table; cached data cannot be queried yet). When this number is exceeded, the data will be materialized. Default: `1 048 576`.
- `max_rows_in_buffers` — Maximum number of rows that data is allowed to cache in memory (for the whole database; cached data cannot be queried yet). When this number is exceeded, the data will be materialized. Default: `65 505`.
- `max_bytes_in_buffers` — Maximum number of bytes that data is allowed to cache in memory (for the whole database; cached data cannot be queried yet). When this number is exceeded, the data will be materialized. Default: `1 048 576`.
- `max_flush_data_time` — Maximum number of milliseconds that data is allowed to stay in the cache (for the whole database; cached data cannot be queried yet). When this time is exceeded, the data will be materialized. Default: `1000`.
- `max_wait_time_when_mysql_unavailable` — Retry interval when MySQL is not available, in milliseconds. A negative value disables retries. Default: `1000`.
- `allows_query_when_mysql_lost` — Allows querying a materialized table when the MySQL connection is lost. Default: `0` (`false`).
|
||||
|
||||
```sql
|
||||
CREATE DATABASE mysql ENGINE = MaterializedMySQL('localhost:3306', 'db', 'user', '***')
|
||||
SETTINGS
|
||||
allows_query_when_mysql_lost=true,
|
||||
max_wait_time_when_mysql_unavailable=10000;
|
||||
```
|
||||
|
||||
**Settings on MySQL-server Side**
|
||||
|
||||
For `MaterializedMySQL` to work correctly, there are a few mandatory `MySQL`-side configuration settings that must be set:
|
||||
|
||||
- `default_authentication_plugin = mysql_native_password` since `MaterializedMySQL` can only authorize with this method.
|
||||
- `gtid_mode = on` since GTID-based logging is mandatory for providing correct `MaterializedMySQL` replication.
|
||||
|
||||
!!! attention "Attention"
|
||||
While turning on `gtid_mode` you should also specify `enforce_gtid_consistency = on`.
|
||||
|
||||
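The same prerequisites can be expressed as `mysqld` start-up flags, which is sometimes easier to try out than editing `my.cnf`; the flags below are the standard MySQL server options of the same names:

```bash
mysqld --default-authentication-plugin=mysql_native_password \
       --gtid-mode=ON \
       --enforce-gtid-consistency=ON
```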
## Virtual Columns {#virtual-columns}
|
||||
|
||||
When working with the `MaterializedMySQL` database engine, [ReplacingMergeTree](../../engines/table-engines/mergetree-family/replacingmergetree.md) tables are used with virtual `_sign` and `_version` columns.
|
||||
|
||||
- `_version` — Transaction counter. Type [UInt64](../../sql-reference/data-types/int-uint.md).
|
||||
- `_sign` — Deletion mark. Type [Int8](../../sql-reference/data-types/int-uint.md). Possible values:
|
||||
@ -53,20 +81,29 @@ When working with the `MaterializeMySQL` database engine, [ReplacingMergeTree](.
|
||||
| STRING | [String](../../sql-reference/data-types/string.md) |
|
||||
| VARCHAR, VAR_STRING | [String](../../sql-reference/data-types/string.md) |
|
||||
| BLOB | [String](../../sql-reference/data-types/string.md) |
|
||||
|
||||
Other types are not supported. If MySQL table contains a column of such type, ClickHouse throws exception "Unhandled data type" and stops replication.
|
||||
| BINARY | [FixedString](../../sql-reference/data-types/fixedstring.md) |
|
||||
|
||||
[Nullable](../../sql-reference/data-types/nullable.md) is supported.
|
||||
|
||||
Other types are not supported. If MySQL table contains a column of such type, ClickHouse throws exception "Unhandled data type" and stops replication.
|
||||
|
||||
## Specifics and Recommendations {#specifics-and-recommendations}
|
||||
|
||||
### Compatibility Restrictions {#compatibility-restrictions}
|
||||
|
||||
Apart from the data type limitations, there are a few restrictions compared to `MySQL` databases that should be resolved before replication is possible:
|
||||
|
||||
- Each table in `MySQL` should contain `PRIMARY KEY`.
|
||||
|
||||
- Replication for tables containing rows with `ENUM` field values out of range (specified in the `ENUM` signature) will not work.
|
||||
|
||||
### DDL Queries {#ddl-queries}
|
||||
|
||||
MySQL DDL queries are converted into the corresponding ClickHouse DDL queries ([ALTER](../../sql-reference/statements/alter/index.md), [CREATE](../../sql-reference/statements/create/index.md), [DROP](../../sql-reference/statements/drop.md), [RENAME](../../sql-reference/statements/rename.md)). If ClickHouse cannot parse some DDL query, the query is ignored.
|
||||
|
||||
### Data Replication {#data-replication}
|
||||
|
||||
`MaterializeMySQL` does not support direct `INSERT`, `DELETE` and `UPDATE` queries. However, they are supported in terms of data replication:
|
||||
`MaterializedMySQL` does not support direct `INSERT`, `DELETE` and `UPDATE` queries. However, they are supported in terms of data replication:
|
||||
|
||||
- MySQL `INSERT` query is converted into `INSERT` with `_sign=1`.
|
||||
|
||||
@ -74,9 +111,9 @@ MySQL DDL queries are converted into the corresponding ClickHouse DDL queries ([
|
||||
|
||||
- MySQL `UPDATE` query is converted into `INSERT` with `_sign=-1` and `INSERT` with `_sign=1`.
|
||||
|
||||
### Selecting from MaterializeMySQL Tables {#select}
|
||||
### Selecting from MaterializedMySQL Tables {#select}
|
||||
|
||||
`SELECT` query from `MaterializeMySQL` tables has some specifics:
|
||||
`SELECT` query from `MaterializedMySQL` tables has some specifics:
|
||||
|
||||
- If `_version` is not specified in the `SELECT` query, [FINAL](../../sql-reference/statements/select/from.md#select-from-final) modifier is used. So only rows with `MAX(_version)` are selected.
|
||||
|
||||
@ -93,10 +130,10 @@ ClickHouse has only one physical order, which is determined by `ORDER BY` clause
|
||||
**Notes**
|
||||
|
||||
- Rows with `_sign=-1` are not deleted physically from the tables.
|
||||
- Cascade `UPDATE/DELETE` queries are not supported by the `MaterializeMySQL` engine.
|
||||
- Cascade `UPDATE/DELETE` queries are not supported by the `MaterializedMySQL` engine.
|
||||
- Replication can be easily broken.
|
||||
- Manual operations on database and tables are forbidden.
|
||||
- `MaterializeMySQL` is influenced by [optimize_on_insert](../../operations/settings/settings.md#optimize-on-insert) setting. The data is merged in the corresponding table in the `MaterializeMySQL` database when a table in the MySQL server changes.
|
||||
- `MaterializedMySQL` is influenced by [optimize_on_insert](../../operations/settings/settings.md#optimize-on-insert) setting. The data is merged in the corresponding table in the `MaterializedMySQL` database when a table in the MySQL server changes.
|
||||
|
||||
## Examples of Use {#examples-of-use}
|
||||
|
||||
@ -125,7 +162,7 @@ Database in ClickHouse, exchanging data with the MySQL server:
|
||||
The database and the table created:
|
||||
|
||||
``` sql
|
||||
CREATE DATABASE mysql ENGINE = MaterializeMySQL('localhost:3306', 'db', 'user', '***');
|
||||
CREATE DATABASE mysql ENGINE = MaterializedMySQL('localhost:3306', 'db', 'user', '***');
|
||||
SHOW TABLES FROM mysql;
|
||||
```
|
||||
|
||||
@ -160,4 +197,4 @@ SELECT * FROM mysql.test;
|
||||
└───┴─────┴──────┘
|
||||
```
|
||||
|
||||
[Original article](https://clickhouse.tech/docs/en/engines/database-engines/materialize-mysql/) <!--hide-->
|
||||
[Original article](https://clickhouse.tech/docs/en/engines/database-engines/materialized-mysql/) <!--hide-->
|
@ -3,45 +3,52 @@ toc_priority: 30
|
||||
toc_title: MaterializedPostgreSQL
|
||||
---
|
||||
|
||||
# MaterializedPostgreSQL {#materialize-postgresql}
|
||||
# [experimental] MaterializedPostgreSQL {#materialize-postgresql}
|
||||
|
||||
Creates a ClickHouse database with an initial data dump of PostgreSQL database tables and starts the replication process, i.e. it executes a background job to apply new changes as they happen on the PostgreSQL database tables in the remote PostgreSQL database.
|
||||
|
||||
ClickHouse server works as PostgreSQL replica. It reads WAL and performs DML queries. DDL is not replicated, but can be handled (described below).
|
||||
|
||||
## Creating a Database {#creating-a-database}
|
||||
|
||||
``` sql
|
||||
CREATE DATABASE test_database
|
||||
ENGINE = MaterializedPostgreSQL('postgres1:5432', 'postgres_database', 'postgres_user', 'postgres_password'
|
||||
|
||||
SELECT * FROM test_database.postgres_table;
|
||||
CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster]
|
||||
ENGINE = MaterializedPostgreSQL('host:port', ['database' | database], 'user', 'password') [SETTINGS ...]
|
||||
```
|
||||
|
||||
**Engine Parameters**
|
||||
|
||||
- `host:port` — PostgreSQL server endpoint.
|
||||
- `database` — PostgreSQL database name.
|
||||
- `user` — PostgreSQL user.
|
||||
- `password` — User password.
|
||||
|
||||
## Settings {#settings}
|
||||
|
||||
1. `materialized_postgresql_max_block_size` - Number of rows collected before flushing data into table. Default: `65536`.
|
||||
- [materialized_postgresql_max_block_size](../../operations/settings/settings.md#materialized-postgresql-max-block-size)
|
||||
|
||||
2. `materialized_postgresql_tables_list` - List of tables for MaterializedPostgreSQL database engine. Default: `whole database`.
|
||||
- [materialized_postgresql_tables_list](../../operations/settings/settings.md#materialized-postgresql-tables-list)
|
||||
|
||||
3. `materialized_postgresql_allow_automatic_update` - Allow to reload table in the background, when schema changes are detected. Default: `0` (`false`).
|
||||
- [materialized_postgresql_allow_automatic_update](../../operations/settings/settings.md#materialized-postgresql-allow-automatic-update)
|
||||
|
||||
``` sql
|
||||
CREATE DATABASE test_database
|
||||
ENGINE = MaterializedPostgreSQL('postgres1:5432', 'postgres_database', 'postgres_user', 'postgres_password'
|
||||
CREATE DATABASE database1
|
||||
ENGINE = MaterializedPostgreSQL('postgres1:5432', 'postgres_database', 'postgres_user', 'postgres_password')
|
||||
SETTINGS materialized_postgresql_max_block_size = 65536,
|
||||
materialized_postgresql_tables_list = 'table1,table2,table3';
|
||||
|
||||
SELECT * FROM test_database.table1;
|
||||
SELECT * FROM database1.table1;
|
||||
```
|
||||
|
||||
|
||||
## Requirements {#requirements}
|
||||
|
||||
- Setting `wal_level`to `logical` and `max_replication_slots` to at least `2` in the postgresql config file.
|
||||
1. The [wal_level](https://www.postgresql.org/docs/current/runtime-config-wal.html) setting must have the value `logical` and the `max_replication_slots` parameter must have a value of at least `2` in the PostgreSQL config file (a configuration sketch for these two settings follows the replica-identity examples below).
|
||||
|
||||
- Each replicated table must have one of the following **replica identity**:
|
||||
2. Each replicated table must have one of the following [replica identity](https://www.postgresql.org/docs/10/sql-altertable.html#SQL-CREATETABLE-REPLICA-IDENTITY):
|
||||
|
||||
1. **default** (primary key)
|
||||
- primary key (by default)
|
||||
|
||||
2. **index**
|
||||
- index
|
||||
|
||||
``` bash
|
||||
postgres# CREATE TABLE postgres_table (a Integer NOT NULL, b Integer, c Integer NOT NULL, d Integer, e Integer NOT NULL);
|
||||
@ -49,9 +56,8 @@ postgres# CREATE unique INDEX postgres_table_index on postgres_table(a, c, e);
|
||||
postgres# ALTER TABLE postgres_table REPLICA IDENTITY USING INDEX postgres_table_index;
|
||||
```
|
||||
|
||||
|
||||
Primary key is always checked first. If it is absent, then index, defined as replica identity index, is checked.
|
||||
If index is used as replica identity, there has to be only one such index in a table.
|
||||
The primary key is always checked first. If it is absent, then the index, defined as replica identity index, is checked.
|
||||
If the index is used as a replica identity, there has to be only one such index in a table.
|
||||
You can check what type is used for a specific table with the following command:
|
||||
|
||||
``` bash
|
||||
@ -65,7 +71,14 @@ FROM pg_class
|
||||
WHERE oid = 'postgres_table'::regclass;
|
||||
```
|
||||
|
||||
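A sketch of applying requirement 1 on the PostgreSQL side; the connection details and restart method are assumptions, and both settings need a server restart:

```bash
psql -c "ALTER SYSTEM SET wal_level = 'logical'"
psql -c "ALTER SYSTEM SET max_replication_slots = 2"
pg_ctl restart -D "$PGDATA"
```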
!!! warning "Warning"
|
||||
Replication of [**TOAST**](https://www.postgresql.org/docs/9.5/storage-toast.html) values is not supported. The default value for the data type will be used.
|
||||
|
||||
## Warning {#warning}
|
||||
## Example of Use {#example-of-use}
|
||||
|
||||
1. **TOAST** values convertion is not supported. Default value for the data type will be used.
|
||||
``` sql
|
||||
CREATE DATABASE postgresql_db
|
||||
ENGINE = MaterializedPostgreSQL('postgres1:5432', 'postgres_database', 'postgres_user', 'postgres_password');
|
||||
|
||||
SELECT * FROM postgresql_db.postgres_table;
|
||||
```
|
||||
|
@ -15,7 +15,7 @@ Supports table structure modifications (`ALTER TABLE ... ADD|DROP COLUMN`). If `
|
||||
|
||||
``` sql
|
||||
CREATE DATABASE test_database
|
||||
ENGINE = PostgreSQL('host:port', 'database', 'user', 'password'[, `use_table_cache`]);
|
||||
ENGINE = PostgreSQL('host:port', 'database', 'user', 'password'[, `schema`, `use_table_cache`]);
|
||||
```
|
||||
|
||||
**Engine Parameters**
|
||||
@ -24,6 +24,7 @@ ENGINE = PostgreSQL('host:port', 'database', 'user', 'password'[, `use_table_cac
|
||||
- `database` — Remote database name.
|
||||
- `user` — PostgreSQL user.
|
||||
- `password` — User password.
|
||||
- `schema` — PostgreSQL schema.
|
||||
- `use_table_cache` — Defines if the database table structure is cached or not. Optional. Default value: `0`.
|
||||
|
||||
## Data Types Support {#data_types-support}
|
||||
|
@ -39,4 +39,46 @@ ENGINE = EmbeddedRocksDB
|
||||
PRIMARY KEY key
|
||||
```
|
||||
|
||||
## Metrics
|
||||
|
||||
There is also a `system.rocksdb` table that exposes RocksDB statistics:
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
name,
|
||||
value
|
||||
FROM system.rocksdb
|
||||
|
||||
┌─name──────────────────────┬─value─┐
|
||||
│ no.file.opens │ 1 │
|
||||
│ number.block.decompressed │ 1 │
|
||||
└───────────────────────────┴───────┘
|
||||
```
|
||||
|
||||
## Configuration
|
||||
|
||||
You can also change any [rocksdb options](https://github.com/facebook/rocksdb/wiki/Option-String-and-Option-Map) using config:
|
||||
|
||||
```xml
|
||||
<rocksdb>
|
||||
<options>
|
||||
<max_background_jobs>8</max_background_jobs>
|
||||
</options>
|
||||
<column_family_options>
|
||||
<num_levels>2</num_levels>
|
||||
</column_family_options>
|
||||
<tables>
|
||||
<table>
|
||||
<name>TABLE</name>
|
||||
<options>
|
||||
<max_background_jobs>8</max_background_jobs>
|
||||
</options>
|
||||
<column_family_options>
|
||||
<num_levels>2</num_levels>
|
||||
</column_family_options>
|
||||
</table>
|
||||
</tables>
|
||||
</rocksdb>
|
||||
```
|
||||
|
||||
[Original article](https://clickhouse.tech/docs/en/engines/table-engines/integrations/embedded-rocksdb/) <!--hide-->
|
||||
|
## Implementation Details {#implementation-details}

- Reads and writes can be parallel.
- [Zero-copy](../../../operations/storing-data.md#zero-copy) replication is supported.
- Not supported:
    - `ALTER` and `SELECT...SAMPLE` operations.
    - Indexes.
    - Replication.

**Globs in path**

1. Suppose we have several files in TSV format with the following URIs on HDFS:

- 'hdfs://hdfs1:9000/some_dir/some_file_1'
- 'hdfs://hdfs1:9000/some_dir/some_file_2'
- 'hdfs://hdfs1:9000/some_dir/some_file_3'
- 'hdfs://hdfs1:9000/another_dir/some_file_1'
- 'hdfs://hdfs1:9000/another_dir/some_file_2'
- 'hdfs://hdfs1:9000/another_dir/some_file_3'

1. There are several ways to make a table consisting of all six files; one option is sketched below.
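For instance, a single table over all six files can be declared with glob patterns in the URI. This is a sketch: the table name and column set are illustrative, and the two-argument `HDFS(URI, format)` form is assumed:

``` sql
CREATE TABLE hdfs_glob_table (name String, value UInt32)
ENGINE = HDFS('hdfs://hdfs1:9000/{some,another}_dir/some_file_{1..3}', 'TSV');

SELECT count() FROM hdfs_glob_table;
```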
### Configuration Options {#configuration-options}

#### Supported by libhdfs3 {#supported-by-libhdfs3}

| **parameter**                    | **default value** |
|----------------------------------|-------------------|
| hadoop\_kerberos\_principal      | ""                |
| hadoop\_kerberos\_kinit\_command | kinit             |

### Limitations {#limitations}
* hadoop\_security\_kerberos\_ticket\_cache\_path can be global only, not user specific

## Kerberos support {#kerberos-support}
# MaterializedPostgreSQL {#materialize-postgresql}

Creates a ClickHouse table with an initial data dump of a PostgreSQL table and starts the replication process, i.e. it executes a background job to apply new changes as they happen on the PostgreSQL table in the remote PostgreSQL database.

If more than one table is required, it is highly recommended to use the [MaterializedPostgreSQL](../../../engines/database-engines/materialized-postgresql.md) database engine instead of the table engine and use the [materialized_postgresql_tables_list](../../../operations/settings/settings.md#materialized-postgresql-tables-list) setting, which specifies the tables to be replicated. It is much better in terms of CPU usage and results in fewer connections and fewer replication slots inside the remote PostgreSQL database.
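For illustration, replicating a fixed set of tables through the database engine might look like the sketch below; the database name and the table list are placeholders, and applying the setting via a `SETTINGS` clause on `CREATE DATABASE` is assumed here:

``` sql
CREATE DATABASE postgres_replica_db
ENGINE = MaterializedPostgreSQL('postgres1:5432', 'postgres_database', 'postgres_user', 'postgres_password')
SETTINGS materialized_postgresql_tables_list = 'table1,table2';
```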
## Creating a Table {#creating-a-table}

``` sql
CREATE TABLE postgresql_db.postgresql_replica (key UInt64, value UInt64)
ENGINE = MaterializedPostgreSQL('postgres1:5432', 'postgres_database', 'postgresql_replica', 'postgres_user', 'postgres_password')
PRIMARY KEY key;
```

**Engine Parameters**

- `host:port` — PostgreSQL server address.
- `database` — Remote database name.
- `table` — Remote table name.
- `user` — PostgreSQL user.
- `password` — User password.

## Requirements {#requirements}

1. The [wal_level](https://www.postgresql.org/docs/current/runtime-config-wal.html) setting must have a value `logical` and `max_replication_slots` parameter must have a value at least `2` in the PostgreSQL config file.

2. A table with `MaterializedPostgreSQL` engine must have a primary key — the same as a replica identity index (by default: primary key) of a PostgreSQL table (see [details on replica identity index](../../../engines/database-engines/materialized-postgresql.md#requirements)).

3. Only database [Atomic](https://en.wikipedia.org/wiki/Atomicity_(database_systems)) is allowed.

## Virtual columns {#virtual-columns}

- `_version` — Transaction counter. Type: [UInt64](../../../sql-reference/data-types/int-uint.md).

- `_sign` — Deletion mark. Type: [Int8](../../../sql-reference/data-types/int-uint.md). Possible values:
    - `1` — Row is not deleted,
    - `-1` — Row is deleted.

These columns do not need to be added when a table is created. They are always accessible in a `SELECT` query.
The `_version` column equals the `LSN` position in `WAL`, so it might be used to check how up-to-date the replication is.

``` sql
CREATE TABLE postgresql_db.postgresql_replica (key UInt64, value UInt64)
ENGINE = MaterializedPostgreSQL('postgres1:5432', 'postgres_database', 'postgresql_replica', 'postgres_user', 'postgres_password')
PRIMARY KEY key;

SELECT key, value, _version FROM postgresql_db.postgresql_replica;
```

!!! warning "Warning"
    Replication of [**TOAST**](https://www.postgresql.org/docs/9.5/storage-toast.html) values is not supported. The default value for the data type will be used.
``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name
(
    name1 [type1],
    name2 [type2],
    ...
) ENGINE = MongoDB(host:port, database, collection, user, password [, options]);
```

**Engine Parameters**

- `password` — User password.
- `options` — MongoDB connection string options (optional parameter).

## Usage Example {#usage-example}

Create a table in ClickHouse which allows reading data from a MongoDB collection:

``` text
CREATE TABLE mongo_table
(
    key UInt64,
    data String
) ENGINE = MongoDB('mongo1:27017', 'test', 'simple_table', 'testuser', 'clickhouse');
```

To read from an SSL secured MongoDB server:

``` text
CREATE TABLE mongo_table_ssl
(
    key UInt64,
    data String
) ENGINE = MongoDB('mongo2:27017', 'test', 'simple_table', 'testuser', 'clickhouse', 'ssl=true');
```
## Implementation Details {#implementation-details}

- Reads and writes can be parallel.
- [Zero-copy](../../../operations/storing-data.md#zero-copy) replication is supported.
- Not supported:
    - `ALTER` and `SELECT...SAMPLE` operations.
    - Indexes.
    - Replication.

## Wildcards In Path {#wildcards-in-path}

1. Suppose we have several files in CSV format with the following URIs on S3:

- 'https://storage.yandexcloud.net/my-test-bucket-768/some_prefix/some_file_1.csv'
- 'https://storage.yandexcloud.net/my-test-bucket-768/some_prefix/some_file_2.csv'
- 'https://storage.yandexcloud.net/my-test-bucket-768/some_prefix/some_file_3.csv'
- 'https://storage.yandexcloud.net/my-test-bucket-768/another_prefix/some_file_1.csv'
- 'https://storage.yandexcloud.net/my-test-bucket-768/another_prefix/some_file_2.csv'
- 'https://storage.yandexcloud.net/my-test-bucket-768/another_prefix/some_file_3.csv'

There are several ways to make a table consisting of all six files; one option is sketched below.
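One option is to use brace expansions in the path, as in the sketch below; the table name and column set are illustrative, and the two-argument `S3(path, format)` form is assumed:

``` sql
CREATE TABLE s3_glob_table (name String, value UInt32)
ENGINE = S3('https://storage.yandexcloud.net/my-test-bucket-768/{some,another}_prefix/some_file_{1..3}.csv', 'CSV');

SELECT count() FROM s3_glob_table;
```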
Engines of the family:

- [Log](../../../engines/table-engines/log-family/log.md)
- [TinyLog](../../../engines/table-engines/log-family/tinylog.md)

`Log` family table engines can store data to [HDFS](../../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-hdfs) or [S3](../../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-s3) distributed file systems.

## Common Properties {#common-properties}

Engines:
# Log {#log}

The engine belongs to the family of `Log` engines. See the common properties of `Log` engines and their differences in the [Log Engine Family](../../../engines/table-engines/log-family/index.md) article.

`Log` differs from [TinyLog](../../../engines/table-engines/log-family/tinylog.md) in that a small file of "marks" resides with the column files. These marks are written on every data block and contain offsets that indicate where to start reading the file in order to skip the specified number of rows. This makes it possible to read table data in multiple threads.
For concurrent data access, the read operations can be performed simultaneously, while write operations block reads and each other.

The `Log` engine does not support indexes. Similarly, if writing to a table failed, the table is broken, and reading from it returns an error. The `Log` engine is appropriate for temporary data, write-once tables, and for testing or demonstration purposes.
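A minimal sketch of how such a table is typically created and used (the table and column names are illustrative, not taken from this page):

``` sql
CREATE TABLE log_example (timestamp DateTime, message String)
ENGINE = Log;

INSERT INTO log_example VALUES (now(), 'first row');
SELECT * FROM log_example;
```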
``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1] [TTL expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [TTL expr2],
    ...
    INDEX index_name1 expr1 TYPE type1(...) GRANULARITY value1,
    INDEX index_name2 expr2 TYPE type2(...) GRANULARITY value2,
    ...
    PROJECTION projection_name_1 (SELECT <COLUMN LIST EXPR> [GROUP BY] [ORDER BY]),
    PROJECTION projection_name_2 (SELECT <COLUMN LIST EXPR> [GROUP BY] [ORDER BY])
) ENGINE = MergeTree()
ORDER BY expr
[PARTITION BY expr]
```

- `SAMPLE BY` — An expression for sampling. Optional.

    If a sampling expression is used, the primary key must contain it. The result of a sampling expression must be an unsigned integer. Example: `SAMPLE BY intHash32(UserID) ORDER BY (CounterID, EventDate, intHash32(UserID))`.

- `TTL` — A list of rules specifying storage duration of rows and defining logic of automatic parts movement [between disks and volumes](#table_engine-mergetree-multiple-volumes). Optional.

- `s != 1`
- `NOT startsWith(s, 'test')`

### Projections {#projections}
Projections are like materialized views but defined at the part level. They provide consistency guarantees along with automatic usage in queries.

#### Query {#projection-query}
A projection query is what defines a projection. It has the following grammar:

`SELECT <COLUMN LIST EXPR> [GROUP BY] [ORDER BY]`

It implicitly selects data from the parent table.
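As a sketch of how this grammar is used in practice (the table, columns, and projection name below are illustrative, and older ClickHouse versions may additionally require enabling the experimental projection optimization setting):

``` sql
CREATE TABLE visits
(
    user_id UInt64,
    event_date Date,
    duration UInt32,
    PROJECTION daily_totals
    (
        SELECT event_date, sum(duration) GROUP BY event_date
    )
) ENGINE = MergeTree()
ORDER BY user_id;

-- a matching aggregation can be answered from the projection's pre-aggregated parts
SELECT event_date, sum(duration) FROM visits GROUP BY event_date;
```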
#### Storage {#projection-storage}
Projections are stored inside the part directory. It's similar to an index but contains a subdirectory that stores an anonymous MergeTree table's part. The table is induced by the definition query of the projection. If there is a GROUP BY clause, the underlying storage engine becomes AggregatingMergeTree, and all aggregate functions are converted to AggregateFunction. If there is an ORDER BY clause, the MergeTree table will use it as its primary key expression. During the merge process, the projection part will be merged via its storage's merge routine. The checksum of the parent table's part will combine the projection's part. Other maintenance jobs are similar to skip indices.

#### Query Analysis {#projection-query-analysis}
1. Check if the projection can be used to answer the given query, that is, it generates the same answer as querying the base table.
2. Select the best feasible match, which contains the least granules to read.
3. The query pipeline which uses projections will be different from the one that uses the original parts. If the projection is absent in some parts, we can add the pipeline to "project" it on the fly.

## Concurrent Data Access {#concurrent-data-access}

For concurrent table access, we use multi-versioning. In other words, when a table is simultaneously read and updated, data is read from a set of parts that is current at the time of the query. There are no lengthy locks. Inserts do not get in the way of read operations.
## Using S3 for Data Storage {#table_engine-mergetree-s3}

`MergeTree` family table engines can store data to [S3](https://aws.amazon.com/s3/) using a disk with type `s3`.

This feature is under development and not ready for production. There are known drawbacks such as very low performance.

Required parameters:

- `endpoint` — S3 endpoint URL in `path` or `virtual hosted` [styles](https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html). Endpoint URL should contain a bucket and root path to store data.
- `access_key_id` — S3 access key id.
- `secret_access_key` — S3 secret access key.

Optional parameters:

- `region` — S3 region name.
- `use_environment_credentials` — Reads AWS credentials from the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN if they exist. Default value is `false`.
- `use_insecure_imds_request` — If set to `true`, the S3 client will use an insecure IMDS request while obtaining credentials from Amazon EC2 metadata. Default value is `false`.
- `skip_access_check` — If true, disk access checks will not be performed on disk start-up. Default value is `false`.
- `server_side_encryption_customer_key_base64` — If specified, required headers for accessing S3 objects with SSE-C encryption will be set.

S3 disk can be configured as `main` or `cold` storage.

In case of the `cold` option, data can be moved to S3 if the local disk free size is smaller than `move_factor * disk_size`, or by a TTL move rule.
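As an illustration of the TTL-based movement, a table can reference such a policy and push older rows to the `cold` volume. This is a sketch: the policy and table names are placeholders, and the standard `TTL ... TO VOLUME` and `storage_policy` syntax is assumed:

``` sql
CREATE TABLE events (d DateTime, payload String)
ENGINE = MergeTree()
ORDER BY d
TTL d + INTERVAL 1 MONTH TO VOLUME 'cold'
SETTINGS storage_policy = 's3_cold_policy';
```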
## Using HDFS for Data Storage {#table_engine-mergetree-hdfs}

[HDFS](https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html) is a distributed file system for remote data storage.

`MergeTree` family table engines can store data to HDFS using a disk with type `HDFS`.

Configuration markup:
``` xml
<yandex>
    <storage_configuration>
        <disks>
            <hdfs>
                <type>hdfs</type>
                <endpoint>hdfs://hdfs1:9000/clickhouse/</endpoint>
            </hdfs>
        </disks>
        <policies>
            <hdfs>
                <volumes>
                    <main>
                        <disk>hdfs</disk>
                    </main>
                </volumes>
            </hdfs>
        </policies>
    </storage_configuration>

    <merge_tree>
        <min_bytes_for_wide_part>0</min_bytes_for_wide_part>
    </merge_tree>
</yandex>
```

Required parameters:

- `endpoint` — HDFS endpoint URL in `path` format. Endpoint URL should contain a root path to store data.

Optional parameters:

- `min_bytes_for_seek` — The minimal number of bytes to use seek operation instead of sequential read. Default value: `1 Mb`.
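A table can then be placed on this storage through the policy name defined in the markup above. This is a sketch with an illustrative table, while `storage_policy = 'hdfs'` matches the policy name from the configuration:

``` sql
CREATE TABLE hdfs_backed_table (id UInt64, value String)
ENGINE = MergeTree()
ORDER BY id
SETTINGS storage_policy = 'hdfs';
```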
Enter `join_strictness` and `join_type` parameters without quotes, for example, `Join(ANY, LEFT, col1)`. They must match the `JOIN` operation that the table will be used for. If the parameters do not match, ClickHouse does not throw an exception and may return incorrect data.

## Specifics and Recommendations {#specifics-and-recommendations}

### Data Storage {#data-storage}

`Join` table data is always located in the RAM. When inserting rows into a table, ClickHouse writes data blocks to the directory on the disk so that they can be restored when the server restarts.

If the server restarts incorrectly, the data block on the disk might get lost or damaged. In this case, you may need to manually delete the file with damaged data.

### Selecting and Inserting Data {#selecting-and-inserting-data}

You can use `INSERT` queries to add data to the `Join`-engine tables. If the table was created with the `ANY` strictness, data for duplicate keys are ignored. With the `ALL` strictness, all rows are added.

Main use-cases for `Join`-engine tables are the following:

- Place the table to the right side in a `JOIN` clause.
- Call the [joinGet](../../../sql-reference/functions/other-functions.md#joinget) function, which lets you extract data from the table the same way as from a dictionary.

### Deleting Data {#deleting-data}

`ALTER DELETE` queries for `Join`-engine tables are implemented as [mutations](../../../sql-reference/statements/alter/index.md#mutations). The `DELETE` mutation reads filtered data and overwrites data of memory and disk.

### Limitations and Settings {#join-limitations-and-settings}

When creating a table, the following settings are applied:

The `Join`-engine tables can’t be used in `GLOBAL JOIN` operations.

The `Join`-engine allows specifying the [join_use_nulls](../../../operations/settings/settings.md#join_use_nulls) setting in the `CREATE TABLE` statement. The [SELECT](../../../sql-reference/statements/select/index.md) query should have the same `join_use_nulls` value.

## Usage Examples {#example}

Creating the left-side table:

``` sql
CREATE TABLE id_val(`id` UInt32, `val` UInt32) ENGINE = TinyLog;
```

``` sql
INSERT INTO id_val VALUES (1,11)(2,12)(3,13);
```

Creating the right-side `Join` table:

``` sql
CREATE TABLE id_val_join(`id` UInt32, `val` UInt8) ENGINE = Join(ANY, LEFT, id);
```

``` sql
INSERT INTO id_val_join VALUES (1,21)(1,22)(3,23);
```

Joining the tables:

``` sql
SELECT * FROM id_val ANY LEFT JOIN id_val_join USING (id);
```

``` text
┌─id─┬─val─┬─id_val_join.val─┐
│  1 │  11 │              21 │
│  2 │  12 │               0 │
│  3 │  13 │              23 │
└────┴─────┴─────────────────┘
```

As an alternative, you can retrieve data from the `Join` table, specifying the join key value:

``` sql
SELECT joinGet('id_val_join', 'val', toUInt32(1));
```

``` text
┌─joinGet('id_val_join', 'val', toUInt32(1))─┐
│                                         21 │
└────────────────────────────────────────────┘
```

Deleting a row from the `Join` table:

```sql
ALTER TABLE id_val_join DELETE WHERE id = 3;
```

```text
┌─id─┬─val─┐
│  1 │  21 │
└────┴─────┘
```
The list of documented datasets:

- [Anonymized Yandex.Metrica Dataset](../../getting-started/example-datasets/metrica.md)
- [Recipes](../../getting-started/example-datasets/recipes.md)
- [OnTime](../../getting-started/example-datasets/ontime.md)
- [OpenSky](../../getting-started/example-datasets/opensky.md)
- [New York Taxi Data](../../getting-started/example-datasets/nyc-taxi.md)
- [UK Property Price Paid](../../getting-started/example-datasets/uk-price-paid.md)
- [What's on the Menu?](../../getting-started/example-datasets/menus.md)
- [Star Schema Benchmark](../../getting-started/example-datasets/star-schema.md)
- [WikiStat](../../getting-started/example-datasets/wikistat.md)
- [Terabyte of Click Logs from Criteo](../../getting-started/example-datasets/criteo.md)
docs/en/getting-started/example-datasets/menus.md

---
toc_priority: 21
toc_title: Menus
---

# New York Public Library "What's on the Menu?" Dataset

The dataset is created by the New York Public Library. It contains historical data on the menus of hotels, restaurants and cafes with the dishes along with their prices.

Source: http://menus.nypl.org/data
The data is in the public domain.

The data is from the library's archive and it may be incomplete and difficult for statistical analysis. Nevertheless it is also very yummy.
The size is just 1.3 million records about dishes in the menus (a very small data volume for ClickHouse, but it's still a good example).

## Download the Dataset

```
wget https://s3.amazonaws.com/menusdata.nypl.org/gzips/2021_08_01_07_01_17_data.tgz
```

Replace the link with the up-to-date link from http://menus.nypl.org/data if needed.
Download size is about 35 MB.

## Unpack the Dataset

```
tar xvf 2021_08_01_07_01_17_data.tgz
```

Uncompressed size is about 150 MB.

The data is normalized and consists of four tables:
- Menu: information about menus: the name of the restaurant, the date when the menu was seen, etc.;
- Dish: information about dishes: the name of the dish along with some characteristic;
- MenuPage: information about the pages in the menus; every page belongs to some menu;
- MenuItem: an item of the menu - a dish along with its price on some menu page: links to the dish and the menu page.

## Create the Tables
```
CREATE TABLE dish
(
    id UInt32,
    name String,
    description String,
    menus_appeared UInt32,
    times_appeared Int32,
    first_appeared UInt16,
    last_appeared UInt16,
    lowest_price Decimal64(3),
    highest_price Decimal64(3)
) ENGINE = MergeTree ORDER BY id;

CREATE TABLE menu
(
    id UInt32,
    name String,
    sponsor String,
    event String,
    venue String,
    place String,
    physical_description String,
    occasion String,
    notes String,
    call_number String,
    keywords String,
    language String,
    date String,
    location String,
    location_type String,
    currency String,
    currency_symbol String,
    status String,
    page_count UInt16,
    dish_count UInt16
) ENGINE = MergeTree ORDER BY id;

CREATE TABLE menu_page
(
    id UInt32,
    menu_id UInt32,
    page_number UInt16,
    image_id String,
    full_height UInt16,
    full_width UInt16,
    uuid UUID
) ENGINE = MergeTree ORDER BY id;

CREATE TABLE menu_item
(
    id UInt32,
    menu_page_id UInt32,
    price Decimal64(3),
    high_price Decimal64(3),
    dish_id UInt32,
    created_at DateTime,
    updated_at DateTime,
    xpos Float64,
    ypos Float64
) ENGINE = MergeTree ORDER BY id;
```

We use the `Decimal` data type to store prices. Everything else is quite straightforward.
## Import Data

Upload data into ClickHouse in parallel:

```
clickhouse-client --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --query "INSERT INTO dish FORMAT CSVWithNames" < Dish.csv
clickhouse-client --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --query "INSERT INTO menu FORMAT CSVWithNames" < Menu.csv
clickhouse-client --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --query "INSERT INTO menu_page FORMAT CSVWithNames" < MenuPage.csv
clickhouse-client --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --date_time_input_format best_effort --query "INSERT INTO menu_item FORMAT CSVWithNames" < MenuItem.csv
```

We use the `CSVWithNames` format as the data is represented by CSV with a header.

We disable `format_csv_allow_single_quotes` as only double quotes are used for data fields and single quotes can be inside the values and should not confuse the CSV parser.

We disable `input_format_null_as_default` as our data does not have NULLs. Otherwise ClickHouse will try to parse `\N` sequences and can get confused by `\` in the data.

The setting `--date_time_input_format best_effort` allows parsing `DateTime` fields in a wide variety of formats. For example, ISO-8601 without seconds like '2000-01-01 01:02' will be recognized. Without this setting, only the fixed DateTime format is allowed.
## Denormalize the Data

Data is presented in multiple tables in normalized form. It means you have to perform JOINs if you want to query, e.g. dish names from menu items.
For typical analytical tasks it is way more efficient to deal with pre-JOINed data to avoid doing JOIN every time. It is called "denormalized" data.

We will create a table that will contain all the data JOINed together:

```
CREATE TABLE menu_item_denorm
ENGINE = MergeTree ORDER BY (dish_name, created_at)
AS SELECT
    price,
    high_price,
    created_at,
    updated_at,
    xpos,
    ypos,
    dish.id AS dish_id,
    dish.name AS dish_name,
    dish.description AS dish_description,
    dish.menus_appeared AS dish_menus_appeared,
    dish.times_appeared AS dish_times_appeared,
    dish.first_appeared AS dish_first_appeared,
    dish.last_appeared AS dish_last_appeared,
    dish.lowest_price AS dish_lowest_price,
    dish.highest_price AS dish_highest_price,
    menu.id AS menu_id,
    menu.name AS menu_name,
    menu.sponsor AS menu_sponsor,
    menu.event AS menu_event,
    menu.venue AS menu_venue,
    menu.place AS menu_place,
    menu.physical_description AS menu_physical_description,
    menu.occasion AS menu_occasion,
    menu.notes AS menu_notes,
    menu.call_number AS menu_call_number,
    menu.keywords AS menu_keywords,
    menu.language AS menu_language,
    menu.date AS menu_date,
    menu.location AS menu_location,
    menu.location_type AS menu_location_type,
    menu.currency AS menu_currency,
    menu.currency_symbol AS menu_currency_symbol,
    menu.status AS menu_status,
    menu.page_count AS menu_page_count,
    menu.dish_count AS menu_dish_count
FROM menu_item
    JOIN dish ON menu_item.dish_id = dish.id
    JOIN menu_page ON menu_item.menu_page_id = menu_page.id
    JOIN menu ON menu_page.menu_id = menu.id
```

## Validate the Data

```
SELECT count() FROM menu_item_denorm
1329175
```
## Run Some Queries

Averaged historical prices of dishes:

```
SELECT
    round(toUInt32OrZero(extract(menu_date, '^\\d{4}')), -1) AS d,
    count(),
    round(avg(price), 2),
    bar(avg(price), 0, 100, 100)
FROM menu_item_denorm
WHERE (menu_currency = 'Dollars') AND (d > 0) AND (d < 2022)
GROUP BY d
ORDER BY d ASC

┌────d─┬─count()─┬─round(avg(price), 2)─┬─bar(avg(price), 0, 100, 100)─┐
│ 1850 │ 618 │ 1.5 │ █▍ │
│ 1860 │ 1634 │ 1.29 │ █▎ │
│ 1870 │ 2215 │ 1.36 │ █▎ │
│ 1880 │ 3909 │ 1.01 │ █ │
│ 1890 │ 8837 │ 1.4 │ █▍ │
│ 1900 │ 176292 │ 0.68 │ ▋ │
│ 1910 │ 212196 │ 0.88 │ ▊ │
│ 1920 │ 179590 │ 0.74 │ ▋ │
│ 1930 │ 73707 │ 0.6 │ ▌ │
│ 1940 │ 58795 │ 0.57 │ ▌ │
│ 1950 │ 41407 │ 0.95 │ ▊ │
│ 1960 │ 51179 │ 1.32 │ █▎ │
│ 1970 │ 12914 │ 1.86 │ █▋ │
│ 1980 │ 7268 │ 4.35 │ ████▎ │
│ 1990 │ 11055 │ 6.03 │ ██████ │
│ 2000 │ 2467 │ 11.85 │ ███████████▋ │
│ 2010 │ 597 │ 25.66 │ █████████████████████████▋ │
└──────┴─────────┴──────────────────────┴──────────────────────────────┘

17 rows in set. Elapsed: 0.044 sec. Processed 1.33 million rows, 54.62 MB (30.00 million rows/s., 1.23 GB/s.)
```

Take it with a grain of salt.
### Burger Prices:

```
SELECT
    round(toUInt32OrZero(extract(menu_date, '^\\d{4}')), -1) AS d,
    count(),
    round(avg(price), 2),
    bar(avg(price), 0, 50, 100)
FROM menu_item_denorm
WHERE (menu_currency = 'Dollars') AND (d > 0) AND (d < 2022) AND (dish_name ILIKE '%burger%')
GROUP BY d
ORDER BY d ASC

┌────d─┬─count()─┬─round(avg(price), 2)─┬─bar(avg(price), 0, 50, 100)───────────┐
│ 1880 │ 2 │ 0.42 │ ▋ │
│ 1890 │ 7 │ 0.85 │ █▋ │
│ 1900 │ 399 │ 0.49 │ ▊ │
│ 1910 │ 589 │ 0.68 │ █▎ │
│ 1920 │ 280 │ 0.56 │ █ │
│ 1930 │ 74 │ 0.42 │ ▋ │
│ 1940 │ 119 │ 0.59 │ █▏ │
│ 1950 │ 134 │ 1.09 │ ██▏ │
│ 1960 │ 272 │ 0.92 │ █▋ │
│ 1970 │ 108 │ 1.18 │ ██▎ │
│ 1980 │ 88 │ 2.82 │ █████▋ │
│ 1990 │ 184 │ 3.68 │ ███████▎ │
│ 2000 │ 21 │ 7.14 │ ██████████████▎ │
│ 2010 │ 6 │ 18.42 │ ████████████████████████████████████▋ │
└──────┴─────────┴──────────────────────┴───────────────────────────────────────┘

14 rows in set. Elapsed: 0.052 sec. Processed 1.33 million rows, 94.15 MB (25.48 million rows/s., 1.80 GB/s.)
```
### Vodka:

```
SELECT
    round(toUInt32OrZero(extract(menu_date, '^\\d{4}')), -1) AS d,
    count(),
    round(avg(price), 2),
    bar(avg(price), 0, 50, 100)
FROM menu_item_denorm
WHERE (menu_currency IN ('Dollars', '')) AND (d > 0) AND (d < 2022) AND (dish_name ILIKE '%vodka%')
GROUP BY d
ORDER BY d ASC

┌────d─┬─count()─┬─round(avg(price), 2)─┬─bar(avg(price), 0, 50, 100)─┐
│ 1910 │ 2 │ 0 │ │
│ 1920 │ 1 │ 0.3 │ ▌ │
│ 1940 │ 21 │ 0.42 │ ▋ │
│ 1950 │ 14 │ 0.59 │ █▏ │
│ 1960 │ 113 │ 2.17 │ ████▎ │
│ 1970 │ 37 │ 0.68 │ █▎ │
│ 1980 │ 19 │ 2.55 │ █████ │
│ 1990 │ 86 │ 3.6 │ ███████▏ │
│ 2000 │ 2 │ 3.98 │ ███████▊ │
└──────┴─────────┴──────────────────────┴─────────────────────────────┘
```

To get vodka we have to write `ILIKE '%vodka%'` and this definitely makes a statement.
### Caviar:

Let's print caviar prices. Also let's print the name of any dish with caviar.

```
SELECT
    round(toUInt32OrZero(extract(menu_date, '^\\d{4}')), -1) AS d,
    count(),
    round(avg(price), 2),
    bar(avg(price), 0, 50, 100),
    any(dish_name)
FROM menu_item_denorm
WHERE (menu_currency IN ('Dollars', '')) AND (d > 0) AND (d < 2022) AND (dish_name ILIKE '%caviar%')
GROUP BY d
ORDER BY d ASC

┌────d─┬─count()─┬─round(avg(price), 2)─┬─bar(avg(price), 0, 50, 100)──────┬─any(dish_name)──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ 1090 │ 1 │ 0 │ │ Caviar │
│ 1880 │ 3 │ 0 │ │ Caviar │
│ 1890 │ 39 │ 0.59 │ █▏ │ Butter and caviar │
│ 1900 │ 1014 │ 0.34 │ ▋ │ Anchovy Caviar on Toast │
│ 1910 │ 1588 │ 1.35 │ ██▋ │ 1/1 Brötchen Caviar │
│ 1920 │ 927 │ 1.37 │ ██▋ │ ASTRAKAN CAVIAR │
│ 1930 │ 289 │ 1.91 │ ███▋ │ Astrachan caviar │
│ 1940 │ 201 │ 0.83 │ █▋ │ (SPECIAL) Domestic Caviar Sandwich │
│ 1950 │ 81 │ 2.27 │ ████▌ │ Beluga Caviar │
│ 1960 │ 126 │ 2.21 │ ████▍ │ Beluga Caviar │
│ 1970 │ 105 │ 0.95 │ █▊ │ BELUGA MALOSSOL CAVIAR AMERICAN DRESSING │
│ 1980 │ 12 │ 7.22 │ ██████████████▍ │ Authentic Iranian Beluga Caviar the world's finest black caviar presented in ice garni and a sampling of chilled 100° Russian vodka │
│ 1990 │ 74 │ 14.42 │ ████████████████████████████▋ │ Avocado Salad, Fresh cut avocado with caviare │
│ 2000 │ 3 │ 7.82 │ ███████████████▋ │ Aufgeschlagenes Kartoffelsueppchen mit Forellencaviar │
│ 2010 │ 6 │ 15.58 │ ███████████████████████████████▏ │ "OYSTERS AND PEARLS" "Sabayon" of Pearl Tapioca with Island Creek Oysters and Russian Sevruga Caviar │
└──────┴─────────┴──────────────────────┴──────────────────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```

At least they have caviar with vodka. Very nice.

### Test it in Playground

The data is uploaded to ClickHouse Playground, [example](https://gh-api.clickhouse.tech/play?user=play#U0VMRUNUCiAgICByb3VuZCh0b1VJbnQzMk9yWmVybyhleHRyYWN0KG1lbnVfZGF0ZSwgJ15cXGR7NH0nKSksIC0xKSBBUyBkLAogICAgY291bnQoKSwKICAgIHJvdW5kKGF2ZyhwcmljZSksIDIpLAogICAgYmFyKGF2ZyhwcmljZSksIDAsIDUwLCAxMDApLAogICAgYW55KGRpc2hfbmFtZSkKRlJPTSBtZW51X2l0ZW1fZGVub3JtCldIRVJFIChtZW51X2N1cnJlbmN5IElOICgnRG9sbGFycycsICcnKSkgQU5EIChkID4gMCkgQU5EIChkIDwgMjAyMikgQU5EIChkaXNoX25hbWUgSUxJS0UgJyVjYXZpYXIlJykKR1JPVVAgQlkgZApPUkRFUiBCWSBkIEFTQw==).
docs/en/getting-started/example-datasets/opensky.md

---
toc_priority: 20
toc_title: OpenSky
---

# Crowdsourced air traffic data from The OpenSky Network 2020

"The data in this dataset is derived and cleaned from the full OpenSky dataset to illustrate the development of air traffic during the COVID-19 pandemic. It spans all flights seen by the network's more than 2500 members since 1 January 2019. More data will be periodically included in the dataset until the end of the COVID-19 pandemic".

Source: https://zenodo.org/record/5092942#.YRBCyTpRXYd

Martin Strohmeier, Xavier Olive, Jannis Lübbe, Matthias Schäfer, and Vincent Lenders
"Crowdsourced air traffic data from the OpenSky Network 2019–2020"
Earth System Science Data 13(2), 2021
https://doi.org/10.5194/essd-13-357-2021

## Download the Dataset

```
wget -O- https://zenodo.org/record/5092942 | grep -oP 'https://zenodo.org/record/5092942/files/flightlist_\d+_\d+\.csv\.gz' | xargs wget
```

The download will take about 2 minutes with a good internet connection. There are 30 files with a total size of 4.3 GB.
## Create the Table

```
CREATE TABLE opensky
(
    callsign String,
    number String,
    icao24 String,
    registration String,
    typecode String,
    origin String,
    destination String,
    firstseen DateTime,
    lastseen DateTime,
    day DateTime,
    latitude_1 Float64,
    longitude_1 Float64,
    altitude_1 Float64,
    latitude_2 Float64,
    longitude_2 Float64,
    altitude_2 Float64
) ENGINE = MergeTree ORDER BY (origin, destination, callsign);
```

## Import Data

Upload data into ClickHouse in parallel:

```
ls -1 flightlist_*.csv.gz | xargs -P100 -I{} bash -c '
gzip -c -d "{}" | clickhouse-client --date_time_input_format best_effort --query "INSERT INTO opensky FORMAT CSVWithNames"'
```

Here we pass the list of files (`ls -1 flightlist_*.csv.gz`) to `xargs` for parallel processing.
`xargs -P100` specifies to use up to 100 parallel workers, but as we only have 30 files, the number of workers will be only 30.

For every file, `xargs` will run a script with `bash -c`. The script uses `{}` as a substitution marker, and `xargs` substitutes the filename into it (we asked for this with `-I{}`).

The script will decompress the file (`gzip -c -d "{}"`) to standard output (`-c` parameter) and the output is redirected to `clickhouse-client`.

Finally, `clickhouse-client` will do the insertion. It will read the input data in `CSVWithNames` format. We also ask it to parse DateTime fields with the extended parser (`--date_time_input_format best_effort`) to recognize the ISO-8601 format with timezone offsets.

Parallel upload takes 24 seconds.

If you don't like parallel upload, here is a sequential variant:
```
for file in flightlist_*.csv.gz; do gzip -c -d "$file" | clickhouse-client --date_time_input_format best_effort --query "INSERT INTO opensky FORMAT CSVWithNames"; done
```
## Validate the Data

```
SELECT count() FROM opensky
66010819
```

The size of the dataset in ClickHouse is just 2.64 GiB:

```
SELECT formatReadableSize(total_bytes) FROM system.tables WHERE name = 'opensky'
2.64 GiB
```

## Run Some Queries

Total distance travelled is 68 billion kilometers:

```
SELECT formatReadableQuantity(sum(geoDistance(longitude_1, latitude_1, longitude_2, latitude_2)) / 1000) FROM opensky

┌─formatReadableQuantity(divide(sum(geoDistance(longitude_1, latitude_1, longitude_2, latitude_2)), 1000))─┐
│ 68.72 billion                                                                                             │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```

Average flight distance is around 1000 km.
```
SELECT avg(geoDistance(longitude_1, latitude_1, longitude_2, latitude_2)) FROM opensky

┌─avg(geoDistance(longitude_1, latitude_1, longitude_2, latitude_2))─┐
│                                                  1041090.6465708319 │
└─────────────────────────────────────────────────────────────────────┘
```
### Most busy origin airports and the average distance seen:
|
||||
|
||||
```
|
||||
SELECT
|
||||
origin,
|
||||
count(),
|
||||
round(avg(geoDistance(longitude_1, latitude_1, longitude_2, latitude_2))) AS distance,
|
||||
bar(distance, 0, 10000000, 100) AS bar
|
||||
FROM opensky
|
||||
WHERE origin != ''
|
||||
GROUP BY origin
|
||||
ORDER BY count() DESC
|
||||
LIMIT 100
|
||||
|
||||
Query id: f9010ea5-97d0-45a3-a5bd-9657906cd105
|
||||
|
||||
┌─origin─┬─count()─┬─distance─┬─bar────────────────────────────────────┐
|
||||
1. │ KORD │ 745007 │ 1546108 │ ███████████████▍ │
|
||||
2. │ KDFW │ 696702 │ 1358721 │ █████████████▌ │
|
||||
3. │ KATL │ 667286 │ 1169661 │ ███████████▋ │
|
||||
4. │ KDEN │ 582709 │ 1287742 │ ████████████▊ │
|
||||
5. │ KLAX │ 581952 │ 2628393 │ ██████████████████████████▎ │
|
||||
6. │ KLAS │ 447789 │ 1336967 │ █████████████▎ │
|
||||
7. │ KPHX │ 428558 │ 1345635 │ █████████████▍ │
|
||||
8. │ KSEA │ 412592 │ 1757317 │ █████████████████▌ │
|
||||
9. │ KCLT │ 404612 │ 880355 │ ████████▋ │
|
||||
10. │ VIDP │ 363074 │ 1445052 │ ██████████████▍ │
|
||||
11. │ EDDF │ 362643 │ 2263960 │ ██████████████████████▋ │
|
||||
12. │ KSFO │ 361869 │ 2445732 │ ████████████████████████▍ │
|
||||
13. │ KJFK │ 349232 │ 2996550 │ █████████████████████████████▊ │
|
||||
14. │ KMSP │ 346010 │ 1287328 │ ████████████▋ │
|
||||
15. │ LFPG │ 344748 │ 2206203 │ ██████████████████████ │
|
||||
16. │ EGLL │ 341370 │ 3216593 │ ████████████████████████████████▏ │
|
||||
17. │ EHAM │ 340272 │ 2116425 │ █████████████████████▏ │
|
||||
18. │ KEWR │ 337696 │ 1826545 │ ██████████████████▎ │
|
||||
19. │ KPHL │ 320762 │ 1291761 │ ████████████▊ │
|
||||
20. │ OMDB │ 308855 │ 2855706 │ ████████████████████████████▌ │
|
||||
21. │ UUEE │ 307098 │ 1555122 │ ███████████████▌ │
|
||||
22. │ KBOS │ 304416 │ 1621675 │ ████████████████▏ │
|
||||
23. │ LEMD │ 291787 │ 1695097 │ ████████████████▊ │
|
||||
24. │ YSSY │ 272979 │ 1875298 │ ██████████████████▋ │
|
||||
25. │ KMIA │ 265121 │ 1923542 │ ███████████████████▏ │
|
||||
26. │ ZGSZ │ 263497 │ 745086 │ ███████▍ │
|
||||
27. │ EDDM │ 256691 │ 1361453 │ █████████████▌ │
|
||||
28. │ WMKK │ 254264 │ 1626688 │ ████████████████▎ │
|
||||
29. │ CYYZ │ 251192 │ 2175026 │ █████████████████████▋ │
|
||||
30. │ KLGA │ 248699 │ 1106935 │ ███████████ │
|
||||
31. │ VHHH │ 248473 │ 3457658 │ ██████████████████████████████████▌ │
|
||||
32. │ RJTT │ 243477 │ 1272744 │ ████████████▋ │
|
||||
33. │ KBWI │ 241440 │ 1187060 │ ███████████▋ │
|
||||
34. │ KIAD │ 239558 │ 1683485 │ ████████████████▋ │
|
||||
35. │ KIAH │ 234202 │ 1538335 │ ███████████████▍ │
|
||||
36. │ KFLL │ 223447 │ 1464410 │ ██████████████▋ │
|
||||
37. │ KDAL │ 212055 │ 1082339 │ ██████████▋ │
|
||||
38. │ KDCA │ 207883 │ 1013359 │ ██████████▏ │
|
||||
39. │ LIRF │ 207047 │ 1427965 │ ██████████████▎ │
|
||||
40. │ PANC │ 206007 │ 2525359 │ █████████████████████████▎ │
|
||||
41. │ LTFJ │ 205415 │ 860470 │ ████████▌ │
|
||||
42. │ KDTW │ 204020 │ 1106716 │ ███████████ │
|
||||
43. │ VABB │ 201679 │ 1300865 │ █████████████ │
|
||||
44. │ OTHH │ 200797 │ 3759544 │ █████████████████████████████████████▌ │
|
||||
45. │ KMDW │ 200796 │ 1232551 │ ████████████▎ │
|
||||
46. │ KSAN │ 198003 │ 1495195 │ ██████████████▊ │
|
||||
47. │ KPDX │ 197760 │ 1269230 │ ████████████▋ │
|
||||
48. │ SBGR │ 197624 │ 2041697 │ ████████████████████▍ │
|
||||
49. │ VOBL │ 189011 │ 1040180 │ ██████████▍ │
|
||||
50. │ LEBL │ 188956 │ 1283190 │ ████████████▋ │
|
||||
51. │ YBBN │ 188011 │ 1253405 │ ████████████▌ │
|
||||
52. │ LSZH │ 187934 │ 1572029 │ ███████████████▋ │
|
||||
53. │ YMML │ 187643 │ 1870076 │ ██████████████████▋ │
|
||||
54. │ RCTP │ 184466 │ 2773976 │ ███████████████████████████▋ │
|
||||
55. │ KSNA │ 180045 │ 778484 │ ███████▋ │
|
||||
56. │ EGKK │ 176420 │ 1694770 │ ████████████████▊ │
|
||||
57. │ LOWW │ 176191 │ 1274833 │ ████████████▋ │
|
||||
58. │ UUDD │ 176099 │ 1368226 │ █████████████▋ │
|
||||
59. │ RKSI │ 173466 │ 3079026 │ ██████████████████████████████▋ │
|
||||
60. │ EKCH │ 172128 │ 1229895 │ ████████████▎ │
|
||||
61. │ KOAK │ 171119 │ 1114447 │ ███████████▏ │
|
||||
62. │ RPLL │ 170122 │ 1440735 │ ██████████████▍ │
|
||||
63. │ KRDU │ 167001 │ 830521 │ ████████▎ │
|
||||
64. │ KAUS │ 164524 │ 1256198 │ ████████████▌ │
|
||||
65. │ KBNA │ 163242 │ 1022726 │ ██████████▏ │
|
||||
66. │ KSDF │ 162655 │ 1380867 │ █████████████▋ │
|
||||
67. │ ENGM │ 160732 │ 910108 │ █████████ │
|
||||
68. │ LIMC │ 160696 │ 1564620 │ ███████████████▋ │
|
||||
69. │ KSJC │ 159278 │ 1081125 │ ██████████▋ │
|
||||
70. │ KSTL │ 157984 │ 1026699 │ ██████████▎ │
|
||||
71. │ UUWW │ 156811 │ 1261155 │ ████████████▌ │
|
||||
72. │ KIND │ 153929 │ 987944 │ █████████▊ │
|
||||
73. │ ESSA │ 153390 │ 1203439 │ ████████████ │
|
||||
74. │ KMCO │ 153351 │ 1508657 │ ███████████████ │
|
||||
75. │ KDVT │ 152895 │ 74048 │ ▋ │
|
||||
76. │ VTBS │ 152645 │ 2255591 │ ██████████████████████▌ │
|
||||
77. │ CYVR │ 149574 │ 2027413 │ ████████████████████▎ │
|
||||
78. │ EIDW │ 148723 │ 1503985 │ ███████████████ │
|
||||
79. │ LFPO │ 143277 │ 1152964 │ ███████████▌ │
|
||||
80. │ EGSS │ 140830 │ 1348183 │ █████████████▍ │
|
||||
81. │ KAPA │ 140776 │ 420441 │ ████▏ │
|
||||
82. │ KHOU │ 138985 │ 1068806 │ ██████████▋ │
|
||||
83. │ KTPA │ 138033 │ 1338223 │ █████████████▍ │
|
||||
84. │ KFFZ │ 137333 │ 55397 │ ▌ │
|
||||
85. │ NZAA │ 136092 │ 1581264 │ ███████████████▋ │
|
||||
86. │ YPPH │ 133916 │ 1271550 │ ████████████▋ │
|
||||
87. │ RJBB │ 133522 │ 1805623 │ ██████████████████ │
|
||||
88. │ EDDL │ 133018 │ 1265919 │ ████████████▋ │
|
||||
89. │ ULLI │ 130501 │ 1197108 │ ███████████▊ │
|
||||
90. │ KIWA │ 127195 │ 250876 │ ██▌ │
|
||||
91. │ KTEB │ 126969 │ 1189414 │ ███████████▊ │
|
||||
92. │ VOMM │ 125616 │ 1127757 │ ███████████▎ │
|
||||
93. │ LSGG │ 123998 │ 1049101 │ ██████████▍ │
|
||||
94. │ LPPT │ 122733 │ 1779187 │ █████████████████▋ │
|
||||
95. │ WSSS │ 120493 │ 3264122 │ ████████████████████████████████▋ │
|
||||
96. │ EBBR │ 118539 │ 1579939 │ ███████████████▋ │
|
||||
97. │ VTBD │ 118107 │ 661627 │ ██████▌ │
|
||||
98. │ KVNY │ 116326 │ 692960 │ ██████▊ │
|
||||
99. │ EDDT │ 115122 │ 941740 │ █████████▍ │
|
||||
100. │ EFHK │ 114860 │ 1629143 │ ████████████████▎ │
|
||||
└────────┴─────────┴──────────┴────────────────────────────────────────┘
|
||||
|
||||
100 rows in set. Elapsed: 0.186 sec. Processed 48.31 million rows, 2.17 GB (259.27 million rows/s., 11.67 GB/s.)
|
||||
```
|
||||
|
||||
### Number of flights from three major Moscow airports, weekly:
|
||||
|
||||
```
|
||||
SELECT
|
||||
toMonday(day) AS k,
|
||||
count() AS c,
|
||||
bar(c, 0, 10000, 100) AS bar
|
||||
FROM opensky
|
||||
WHERE origin IN ('UUEE', 'UUDD', 'UUWW')
|
||||
GROUP BY k
|
||||
ORDER BY k ASC
|
||||
|
||||
Query id: 1b446157-9519-4cc4-a1cb-178dfcc15a8e
|
||||
|
||||
┌──────────k─┬────c─┬─bar──────────────────────────────────────────────────────────────────────────┐
|
||||
1. │ 2018-12-31 │ 5248 │ ████████████████████████████████████████████████████▍ │
|
||||
2. │ 2019-01-07 │ 6302 │ ███████████████████████████████████████████████████████████████ │
|
||||
3. │ 2019-01-14 │ 5701 │ █████████████████████████████████████████████████████████ │
|
||||
4. │ 2019-01-21 │ 5638 │ ████████████████████████████████████████████████████████▍ │
|
||||
5. │ 2019-01-28 │ 5731 │ █████████████████████████████████████████████████████████▎ │
|
||||
6. │ 2019-02-04 │ 5683 │ ████████████████████████████████████████████████████████▋ │
|
||||
7. │ 2019-02-11 │ 5759 │ █████████████████████████████████████████████████████████▌ │
|
||||
8. │ 2019-02-18 │ 5736 │ █████████████████████████████████████████████████████████▎ │
|
||||
9. │ 2019-02-25 │ 5873 │ ██████████████████████████████████████████████████████████▋ │
|
||||
10. │ 2019-03-04 │ 5965 │ ███████████████████████████████████████████████████████████▋ │
|
||||
11. │ 2019-03-11 │ 5900 │ ███████████████████████████████████████████████████████████ │
|
||||
12. │ 2019-03-18 │ 5823 │ ██████████████████████████████████████████████████████████▏ │
|
||||
13. │ 2019-03-25 │ 5899 │ ██████████████████████████████████████████████████████████▊ │
|
||||
14. │ 2019-04-01 │ 6043 │ ████████████████████████████████████████████████████████████▍ │
|
||||
15. │ 2019-04-08 │ 6098 │ ████████████████████████████████████████████████████████████▊ │
|
||||
16. │ 2019-04-15 │ 6196 │ █████████████████████████████████████████████████████████████▊ │
|
||||
17. │ 2019-04-22 │ 6486 │ ████████████████████████████████████████████████████████████████▋ │
|
||||
18. │ 2019-04-29 │ 6682 │ ██████████████████████████████████████████████████████████████████▋ │
|
||||
19. │ 2019-05-06 │ 6739 │ ███████████████████████████████████████████████████████████████████▍ │
|
||||
20. │ 2019-05-13 │ 6600 │ ██████████████████████████████████████████████████████████████████ │
|
||||
21. │ 2019-05-20 │ 6575 │ █████████████████████████████████████████████████████████████████▋ │
|
||||
22. │ 2019-05-27 │ 6786 │ ███████████████████████████████████████████████████████████████████▋ │
|
||||
23. │ 2019-06-03 │ 6872 │ ████████████████████████████████████████████████████████████████████▋ │
|
||||
24. │ 2019-06-10 │ 7045 │ ██████████████████████████████████████████████████████████████████████▍ │
|
||||
25. │ 2019-06-17 │ 7045 │ ██████████████████████████████████████████████████████████████████████▍ │
|
||||
26. │ 2019-06-24 │ 6852 │ ████████████████████████████████████████████████████████████████████▌ │
|
||||
27. │ 2019-07-01 │ 7248 │ ████████████████████████████████████████████████████████████████████████▍ │
|
||||
28. │ 2019-07-08 │ 7284 │ ████████████████████████████████████████████████████████████████████████▋ │
|
||||
29. │ 2019-07-15 │ 7142 │ ███████████████████████████████████████████████████████████████████████▍ │
|
||||
30. │ 2019-07-22 │ 7108 │ ███████████████████████████████████████████████████████████████████████ │
|
||||
31. │ 2019-07-29 │ 7251 │ ████████████████████████████████████████████████████████████████████████▌ │
|
||||
32. │ 2019-08-05 │ 7403 │ ██████████████████████████████████████████████████████████████████████████ │
|
||||
33. │ 2019-08-12 │ 7457 │ ██████████████████████████████████████████████████████████████████████████▌ │
|
||||
34. │ 2019-08-19 │ 7502 │ ███████████████████████████████████████████████████████████████████████████ │
|
||||
35. │ 2019-08-26 │ 7540 │ ███████████████████████████████████████████████████████████████████████████▍ │
|
||||
36. │ 2019-09-02 │ 7237 │ ████████████████████████████████████████████████████████████████████████▎ │
|
||||
37. │ 2019-09-09 │ 7328 │ █████████████████████████████████████████████████████████████████████████▎ │
|
||||
38. │ 2019-09-16 │ 5566 │ ███████████████████████████████████████████████████████▋ │
|
||||
39. │ 2019-09-23 │ 7049 │ ██████████████████████████████████████████████████████████████████████▍ │
|
||||
40. │ 2019-09-30 │ 6880 │ ████████████████████████████████████████████████████████████████████▋ │
|
||||
41. │ 2019-10-07 │ 6518 │ █████████████████████████████████████████████████████████████████▏ │
|
||||
42. │ 2019-10-14 │ 6688 │ ██████████████████████████████████████████████████████████████████▊ │
|
||||
43. │ 2019-10-21 │ 6667 │ ██████████████████████████████████████████████████████████████████▋ │
|
||||
44. │ 2019-10-28 │ 6303 │ ███████████████████████████████████████████████████████████████ │
|
||||
45. │ 2019-11-04 │ 6298 │ ██████████████████████████████████████████████████████████████▊ │
|
||||
46. │ 2019-11-11 │ 6137 │ █████████████████████████████████████████████████████████████▎ │
|
||||
47. │ 2019-11-18 │ 6051 │ ████████████████████████████████████████████████████████████▌ │
|
||||
48. │ 2019-11-25 │ 5820 │ ██████████████████████████████████████████████████████████▏ │
|
||||
49. │ 2019-12-02 │ 5942 │ ███████████████████████████████████████████████████████████▍ │
|
||||
50. │ 2019-12-09 │ 4891 │ ████████████████████████████████████████████████▊ │
|
||||
51. │ 2019-12-16 │ 5682 │ ████████████████████████████████████████████████████████▋ │
|
||||
52. │ 2019-12-23 │ 6111 │ █████████████████████████████████████████████████████████████ │
|
||||
53. │ 2019-12-30 │ 5870 │ ██████████████████████████████████████████████████████████▋ │
|
||||
54. │ 2020-01-06 │ 5953 │ ███████████████████████████████████████████████████████████▌ │
|
||||
55. │ 2020-01-13 │ 5698 │ ████████████████████████████████████████████████████████▊ │
|
||||
56. │ 2020-01-20 │ 5339 │ █████████████████████████████████████████████████████▍ │
|
||||
57. │ 2020-01-27 │ 5566 │ ███████████████████████████████████████████████████████▋ │
|
||||
58. │ 2020-02-03 │ 5801 │ ██████████████████████████████████████████████████████████ │
|
||||
59. │ 2020-02-10 │ 5692 │ ████████████████████████████████████████████████████████▊ │
|
||||
60. │ 2020-02-17 │ 5912 │ ███████████████████████████████████████████████████████████ │
|
||||
61. │ 2020-02-24 │ 6031 │ ████████████████████████████████████████████████████████████▎ │
|
||||
62. │ 2020-03-02 │ 6105 │ █████████████████████████████████████████████████████████████ │
|
||||
63. │ 2020-03-09 │ 5823 │ ██████████████████████████████████████████████████████████▏ │
|
||||
64. │ 2020-03-16 │ 4659 │ ██████████████████████████████████████████████▌ │
|
||||
65. │ 2020-03-23 │ 3720 │ █████████████████████████████████████▏ │
|
||||
66. │ 2020-03-30 │ 1720 │ █████████████████▏ │
|
||||
67. │ 2020-04-06 │ 849 │ ████████▍ │
|
||||
68. │ 2020-04-13 │ 710 │ ███████ │
|
||||
69. │ 2020-04-20 │ 725 │ ███████▏ │
|
||||
70. │ 2020-04-27 │ 920 │ █████████▏ │
|
||||
71. │ 2020-05-04 │ 859 │ ████████▌ │
|
||||
72. │ 2020-05-11 │ 1047 │ ██████████▍ │
|
||||
73. │ 2020-05-18 │ 1135 │ ███████████▎ │
|
||||
74. │ 2020-05-25 │ 1266 │ ████████████▋ │
|
||||
75. │ 2020-06-01 │ 1793 │ █████████████████▊ │
|
||||
76. │ 2020-06-08 │ 1979 │ ███████████████████▋ │
|
||||
77. │ 2020-06-15 │ 2297 │ ██████████████████████▊ │
|
||||
78. │ 2020-06-22 │ 2788 │ ███████████████████████████▊ │
|
||||
79. │ 2020-06-29 │ 3389 │ █████████████████████████████████▊ │
|
||||
80. │ 2020-07-06 │ 3545 │ ███████████████████████████████████▍ │
|
||||
81. │ 2020-07-13 │ 3569 │ ███████████████████████████████████▋ │
|
||||
82. │ 2020-07-20 │ 3784 │ █████████████████████████████████████▋ │
|
||||
83. │ 2020-07-27 │ 3960 │ ███████████████████████████████████████▌ │
|
||||
84. │ 2020-08-03 │ 4323 │ ███████████████████████████████████████████▏ │
|
||||
85. │ 2020-08-10 │ 4581 │ █████████████████████████████████████████████▋ │
|
||||
86. │ 2020-08-17 │ 4791 │ ███████████████████████████████████████████████▊ │
|
||||
87. │ 2020-08-24 │ 4928 │ █████████████████████████████████████████████████▎ │
|
||||
88. │ 2020-08-31 │ 4687 │ ██████████████████████████████████████████████▋ │
|
||||
89. │ 2020-09-07 │ 4643 │ ██████████████████████████████████████████████▍ │
|
||||
90. │ 2020-09-14 │ 4594 │ █████████████████████████████████████████████▊ │
|
||||
91. │ 2020-09-21 │ 4478 │ ████████████████████████████████████████████▋ │
|
||||
92. │ 2020-09-28 │ 4382 │ ███████████████████████████████████████████▋ │
|
||||
93. │ 2020-10-05 │ 4261 │ ██████████████████████████████████████████▌ │
|
||||
94. │ 2020-10-12 │ 4243 │ ██████████████████████████████████████████▍ │
|
||||
95. │ 2020-10-19 │ 3941 │ ███████████████████████████████████████▍ │
|
||||
96. │ 2020-10-26 │ 3616 │ ████████████████████████████████████▏ │
|
||||
97. │ 2020-11-02 │ 3586 │ ███████████████████████████████████▋ │
|
||||
98. │ 2020-11-09 │ 3403 │ ██████████████████████████████████ │
|
||||
99. │ 2020-11-16 │ 3336 │ █████████████████████████████████▎ │
|
||||
100. │ 2020-11-23 │ 3230 │ ████████████████████████████████▎ │
|
||||
101. │ 2020-11-30 │ 3183 │ ███████████████████████████████▋ │
|
||||
102. │ 2020-12-07 │ 3285 │ ████████████████████████████████▋ │
|
||||
103. │ 2020-12-14 │ 3367 │ █████████████████████████████████▋ │
|
||||
104. │ 2020-12-21 │ 3748 │ █████████████████████████████████████▍ │
|
||||
105. │ 2020-12-28 │ 3986 │ ███████████████████████████████████████▋ │
|
||||
106. │ 2021-01-04 │ 3906 │ ███████████████████████████████████████ │
|
||||
107. │ 2021-01-11 │ 3425 │ ██████████████████████████████████▎ │
|
||||
108. │ 2021-01-18 │ 3144 │ ███████████████████████████████▍ │
|
||||
109. │ 2021-01-25 │ 3115 │ ███████████████████████████████▏ │
|
||||
110. │ 2021-02-01 │ 3285 │ ████████████████████████████████▋ │
|
||||
111. │ 2021-02-08 │ 3321 │ █████████████████████████████████▏ │
|
||||
112. │ 2021-02-15 │ 3475 │ ██████████████████████████████████▋ │
|
||||
113. │ 2021-02-22 │ 3549 │ ███████████████████████████████████▍ │
|
||||
114. │ 2021-03-01 │ 3755 │ █████████████████████████████████████▌ │
|
||||
115. │ 2021-03-08 │ 3080 │ ██████████████████████████████▋ │
|
||||
116. │ 2021-03-15 │ 3789 │ █████████████████████████████████████▊ │
|
||||
117. │ 2021-03-22 │ 3804 │ ██████████████████████████████████████ │
|
||||
118. │ 2021-03-29 │ 4238 │ ██████████████████████████████████████████▍ │
|
||||
119. │ 2021-04-05 │ 4307 │ ███████████████████████████████████████████ │
|
||||
120. │ 2021-04-12 │ 4225 │ ██████████████████████████████████████████▎ │
|
||||
121. │ 2021-04-19 │ 4391 │ ███████████████████████████████████████████▊ │
|
||||
122. │ 2021-04-26 │ 4868 │ ████████████████████████████████████████████████▋ │
|
||||
123. │ 2021-05-03 │ 4977 │ █████████████████████████████████████████████████▋ │
|
||||
124. │ 2021-05-10 │ 5164 │ ███████████████████████████████████████████████████▋ │
|
||||
125. │ 2021-05-17 │ 4986 │ █████████████████████████████████████████████████▋ │
|
||||
126. │ 2021-05-24 │ 5024 │ ██████████████████████████████████████████████████▏ │
|
||||
127. │ 2021-05-31 │ 4824 │ ████████████████████████████████████████████████▏ │
|
||||
128. │ 2021-06-07 │ 5652 │ ████████████████████████████████████████████████████████▌ │
|
||||
129. │ 2021-06-14 │ 5613 │ ████████████████████████████████████████████████████████▏ │
|
||||
130. │ 2021-06-21 │ 6061 │ ████████████████████████████████████████████████████████████▌ │
|
||||
131. │ 2021-06-28 │ 2554 │ █████████████████████████▌ │
|
||||
└────────────┴──────┴──────────────────────────────────────────────────────────────────────────────┘
|
||||
|
||||
131 rows in set. Elapsed: 0.014 sec. Processed 655.36 thousand rows, 11.14 MB (47.56 million rows/s., 808.48 MB/s.)
|
||||
```
|
||||
|
||||
### Test it in Playground
|
||||
|
||||
The data is uploaded to ClickHouse Playground, [example](https://gh-api.clickhouse.tech/play?user=play#U0VMRUNUCiAgICBvcmlnaW4sCiAgICBjb3VudCgpLAogICAgcm91bmQoYXZnKGdlb0Rpc3RhbmNlKGxvbmdpdHVkZV8xLCBsYXRpdHVkZV8xLCBsb25naXR1ZGVfMiwgbGF0aXR1ZGVfMikpKSBBUyBkaXN0YW5jZSwKICAgIGJhcihkaXN0YW5jZSwgMCwgMTAwMDAwMDAsIDEwMCkgQVMgYmFyCkZST00gb3BlbnNreQpXSEVSRSBvcmlnaW4gIT0gJycKR1JPVVAgQlkgb3JpZ2luCk9SREVSIEJZIGNvdW50KCkgREVTQwpMSU1JVCAxMDA=).
|
581 docs/en/getting-started/example-datasets/uk-price-paid.md Normal file
@ -0,0 +1,581 @@
|
||||
---
|
||||
toc_priority: 20
|
||||
toc_title: UK Property Price Paid
|
||||
---
|
||||
|
||||
# UK Property Price Paid
|
||||
|
||||
The dataset contains data about prices paid for real-estate property in England and Wales. The data is available since 1995.
|
||||
The size of the dataset in uncompressed form is about 4 GiB, and it takes about 226 MiB in ClickHouse.
|
||||
|
||||
Source: https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads
|
||||
Description of the fields: https://www.gov.uk/guidance/about-the-price-paid-data
|
||||
|
||||
Contains HM Land Registry data © Crown copyright and database right 2021. This data is licensed under the Open Government Licence v3.0.
|
||||
|
||||
## Download the Dataset
|
||||
|
||||
```
|
||||
wget http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-complete.csv
|
||||
```
|
||||
|
||||
The download takes about 2 minutes with a good internet connection.
|
||||
|
||||
## Create the Table
|
||||
|
||||
```
|
||||
CREATE TABLE uk_price_paid
|
||||
(
|
||||
price UInt32,
|
||||
date Date,
|
||||
postcode1 LowCardinality(String),
|
||||
postcode2 LowCardinality(String),
|
||||
type Enum8('terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4, 'other' = 0),
|
||||
is_new UInt8,
|
||||
duration Enum8('freehold' = 1, 'leasehold' = 2, 'unknown' = 0),
|
||||
addr1 String,
|
||||
addr2 String,
|
||||
street LowCardinality(String),
|
||||
locality LowCardinality(String),
|
||||
town LowCardinality(String),
|
||||
district LowCardinality(String),
|
||||
county LowCardinality(String),
|
||||
category UInt8
|
||||
) ENGINE = MergeTree ORDER BY (postcode1, postcode2, addr1, addr2);
|
||||
```
|
||||
|
||||
## Preprocess and Import Data
|
||||
|
||||
We will use the `clickhouse-local` tool for data preprocessing and `clickhouse-client` to upload the data.
|
||||
|
||||
In this example, we define the structure of the source data from the CSV file and specify a query to preprocess the data with `clickhouse-local`.
|
||||
|
||||
The preprocessing is:
|
||||
- splitting the postcode into two columns, `postcode1` and `postcode2`, which is better for storage and queries;
|
||||
- converting the `time` field to a date, as it only contains 00:00 time;
|
||||
- ignoring the `uuid` field because we don't need it for analysis;
|
||||
- transforming `type` and `duration` to more readable Enum fields with the `transform` function;
|
||||
- transforming the `is_new` and `category` fields from single-character strings (`Y`/`N` and `A`/`B`) to UInt8 fields with 0 and 1 (see the small illustrative query after this list).
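If you are curious what these functions do, here is a small standalone query that can be run in `clickhouse-client` or `clickhouse-local`; the postcode and field values are made up purely for illustration:

```
SELECT
    splitByChar(' ', 'SW1A 1AA') AS postcode_parts,  -- ['SW1A', '1AA']
    transform('T', ['T', 'S', 'D', 'F', 'O'], ['terraced', 'semi-detached', 'detached', 'flat', 'other']) AS type,
    toDate(toDateTime('1995-01-01 00:00:00')) AS date,  -- the 00:00 time part is dropped
    'Y' = 'Y' AS is_new  -- a single-character flag becomes a UInt8 0/1
```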
|
||||
|
||||
The preprocessed data is piped directly to `clickhouse-client` and inserted into the ClickHouse table in a streaming fashion.
|
||||
|
||||
```
|
||||
clickhouse-local --input-format CSV --structure '
|
||||
uuid String,
|
||||
price UInt32,
|
||||
time DateTime,
|
||||
postcode String,
|
||||
a String,
|
||||
b String,
|
||||
c String,
|
||||
addr1 String,
|
||||
addr2 String,
|
||||
street String,
|
||||
locality String,
|
||||
town String,
|
||||
district String,
|
||||
county String,
|
||||
d String,
|
||||
e String
|
||||
' --query "
|
||||
WITH splitByChar(' ', postcode) AS p
|
||||
SELECT
|
||||
price,
|
||||
toDate(time) AS date,
|
||||
p[1] AS postcode1,
|
||||
p[2] AS postcode2,
|
||||
transform(a, ['T', 'S', 'D', 'F', 'O'], ['terraced', 'semi-detached', 'detached', 'flat', 'other']) AS type,
|
||||
b = 'Y' AS is_new,
|
||||
transform(c, ['F', 'L', 'U'], ['freehold', 'leasehold', 'unknown']) AS duration,
|
||||
addr1,
|
||||
addr2,
|
||||
street,
|
||||
locality,
|
||||
town,
|
||||
district,
|
||||
county,
|
||||
d = 'B' AS category
|
||||
FROM table" --date_time_input_format best_effort < pp-complete.csv | clickhouse-client --query "INSERT INTO uk_price_paid FORMAT TSV"
|
||||
```
|
||||
|
||||
It will take about 40 seconds.
|
||||
|
||||
## Validate the Data
|
||||
|
||||
```
|
||||
SELECT count() FROM uk_price_paid
|
||||
26248711
|
||||
```
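You can also eyeball a few recent rows; this is an optional sanity check rather than part of the original walkthrough:

```
SELECT * FROM uk_price_paid ORDER BY date DESC LIMIT 5 FORMAT Vertical
```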
|
||||
|
||||
The size of the dataset in ClickHouse is just 226 MiB:
|
||||
|
||||
```
|
||||
SELECT formatReadableSize(total_bytes) FROM system.tables WHERE name = 'uk_price_paid'
|
||||
226.40 MiB
|
||||
```
|
||||
|
||||
## Run Some Queries
|
||||
|
||||
### Average price per year:
|
||||
|
||||
```
|
||||
SELECT toYear(date) AS year, round(avg(price)) AS price, bar(price, 0, 1000000, 80) FROM uk_price_paid GROUP BY year ORDER BY year
|
||||
|
||||
┌─year─┬──price─┬─bar(round(avg(price)), 0, 1000000, 80)─┐
|
||||
│ 1995 │ 67932 │ █████▍ │
|
||||
│ 1996 │ 71505 │ █████▋ │
|
||||
│ 1997 │ 78532 │ ██████▎ │
|
||||
│ 1998 │ 85435 │ ██████▋ │
|
||||
│ 1999 │ 96036 │ ███████▋ │
|
||||
│ 2000 │ 107478 │ ████████▌ │
|
||||
│ 2001 │ 118886 │ █████████▌ │
|
||||
│ 2002 │ 137940 │ ███████████ │
|
||||
│ 2003 │ 155888 │ ████████████▍ │
|
||||
│ 2004 │ 178885 │ ██████████████▎ │
|
||||
│ 2005 │ 189350 │ ███████████████▏ │
|
||||
│ 2006 │ 203528 │ ████████████████▎ │
|
||||
│ 2007 │ 219377 │ █████████████████▌ │
|
||||
│ 2008 │ 217056 │ █████████████████▎ │
|
||||
│ 2009 │ 213419 │ █████████████████ │
|
||||
│ 2010 │ 236110 │ ██████████████████▊ │
|
||||
│ 2011 │ 232804 │ ██████████████████▌ │
|
||||
│ 2012 │ 238366 │ ███████████████████ │
|
||||
│ 2013 │ 256931 │ ████████████████████▌ │
|
||||
│ 2014 │ 279917 │ ██████████████████████▍ │
|
||||
│ 2015 │ 297264 │ ███████████████████████▋ │
|
||||
│ 2016 │ 313197 │ █████████████████████████ │
|
||||
│ 2017 │ 346070 │ ███████████████████████████▋ │
|
||||
│ 2018 │ 350117 │ ████████████████████████████ │
|
||||
│ 2019 │ 351010 │ ████████████████████████████ │
|
||||
│ 2020 │ 368974 │ █████████████████████████████▌ │
|
||||
│ 2021 │ 384351 │ ██████████████████████████████▋ │
|
||||
└──────┴────────┴────────────────────────────────────────┘
|
||||
|
||||
27 rows in set. Elapsed: 0.027 sec. Processed 26.25 million rows, 157.49 MB (955.96 million rows/s., 5.74 GB/s.)
|
||||
```
|
||||
|
||||
### Average price per year in London:
|
||||
|
||||
```
|
||||
SELECT toYear(date) AS year, round(avg(price)) AS price, bar(price, 0, 2000000, 100) FROM uk_price_paid WHERE town = 'LONDON' GROUP BY year ORDER BY year
|
||||
|
||||
┌─year─┬───price─┬─bar(round(avg(price)), 0, 2000000, 100)───────────────┐
|
||||
│ 1995 │ 109112 │ █████▍ │
|
||||
│ 1996 │ 118667 │ █████▊ │
|
||||
│ 1997 │ 136518 │ ██████▋ │
|
||||
│ 1998 │ 152983 │ ███████▋ │
|
||||
│ 1999 │ 180633 │ █████████ │
|
||||
│ 2000 │ 215830 │ ██████████▋ │
|
||||
│ 2001 │ 232996 │ ███████████▋ │
|
||||
│ 2002 │ 263672 │ █████████████▏ │
|
||||
│ 2003 │ 278394 │ █████████████▊ │
|
||||
│ 2004 │ 304665 │ ███████████████▏ │
|
||||
│ 2005 │ 322875 │ ████████████████▏ │
|
||||
│ 2006 │ 356192 │ █████████████████▋ │
|
||||
│ 2007 │ 404055 │ ████████████████████▏ │
|
||||
│ 2008 │ 420741 │ █████████████████████ │
|
||||
│ 2009 │ 427754 │ █████████████████████▍ │
|
||||
│ 2010 │ 480306 │ ████████████████████████ │
|
||||
│ 2011 │ 496274 │ ████████████████████████▋ │
|
||||
│ 2012 │ 519441 │ █████████████████████████▊ │
|
||||
│ 2013 │ 616209 │ ██████████████████████████████▋ │
|
||||
│ 2014 │ 724144 │ ████████████████████████████████████▏ │
|
||||
│ 2015 │ 792112 │ ███████████████████████████████████████▌ │
|
||||
│ 2016 │ 843568 │ ██████████████████████████████████████████▏ │
|
||||
│ 2017 │ 982566 │ █████████████████████████████████████████████████▏ │
|
||||
│ 2018 │ 1016845 │ ██████████████████████████████████████████████████▋ │
|
||||
│ 2019 │ 1043277 │ ████████████████████████████████████████████████████▏ │
|
||||
│ 2020 │ 1003963 │ ██████████████████████████████████████████████████▏ │
|
||||
│ 2021 │ 940794 │ ███████████████████████████████████████████████ │
|
||||
└──────┴─────────┴───────────────────────────────────────────────────────┘
|
||||
|
||||
27 rows in set. Elapsed: 0.024 sec. Processed 26.25 million rows, 76.88 MB (1.08 billion rows/s., 3.15 GB/s.)
|
||||
```
|
||||
|
||||
Something happened in 2013. I don't have a clue. Maybe you have a clue what happened in 2020?
|
||||
|
||||
### The most expensive neighborhoods:
|
||||
|
||||
```
|
||||
SELECT
|
||||
town,
|
||||
district,
|
||||
count() AS c,
|
||||
round(avg(price)) AS price,
|
||||
bar(price, 0, 5000000, 100)
|
||||
FROM uk_price_paid
|
||||
WHERE date >= '2020-01-01'
|
||||
GROUP BY
|
||||
town,
|
||||
district
|
||||
HAVING c >= 100
|
||||
ORDER BY price DESC
|
||||
LIMIT 100
|
||||
|
||||
┌─town─────────────────┬─district───────────────┬────c─┬───price─┬─bar(round(avg(price)), 0, 5000000, 100)────────────────────────────┐
|
||||
│ LONDON │ CITY OF WESTMINSTER │ 3372 │ 3305225 │ ██████████████████████████████████████████████████████████████████ │
|
||||
│ LONDON │ CITY OF LONDON │ 257 │ 3294478 │ █████████████████████████████████████████████████████████████████▊ │
|
||||
│ LONDON │ KENSINGTON AND CHELSEA │ 2367 │ 2342422 │ ██████████████████████████████████████████████▋ │
|
||||
│ LEATHERHEAD │ ELMBRIDGE │ 108 │ 1927143 │ ██████████████████████████████████████▌ │
|
||||
│ VIRGINIA WATER │ RUNNYMEDE │ 142 │ 1868819 │ █████████████████████████████████████▍ │
|
||||
│ LONDON │ CAMDEN │ 2815 │ 1736788 │ ██████████████████████████████████▋ │
|
||||
│ THORNTON HEATH │ CROYDON │ 521 │ 1733051 │ ██████████████████████████████████▋ │
|
||||
│ WINDLESHAM │ SURREY HEATH │ 103 │ 1717255 │ ██████████████████████████████████▎ │
|
||||
│ BARNET │ ENFIELD │ 115 │ 1503458 │ ██████████████████████████████ │
|
||||
│ OXFORD │ SOUTH OXFORDSHIRE │ 298 │ 1275200 │ █████████████████████████▌ │
|
||||
│ LONDON │ ISLINGTON │ 2458 │ 1274308 │ █████████████████████████▍ │
|
||||
│ COBHAM │ ELMBRIDGE │ 364 │ 1260005 │ █████████████████████████▏ │
|
||||
│ LONDON │ HOUNSLOW │ 618 │ 1215682 │ ████████████████████████▎ │
|
||||
│ ASCOT │ WINDSOR AND MAIDENHEAD │ 379 │ 1215146 │ ████████████████████████▎ │
|
||||
│ LONDON │ RICHMOND UPON THAMES │ 654 │ 1207551 │ ████████████████████████▏ │
|
||||
│ BEACONSFIELD │ BUCKINGHAMSHIRE │ 307 │ 1186220 │ ███████████████████████▋ │
|
||||
│ RICHMOND │ RICHMOND UPON THAMES │ 805 │ 1100420 │ ██████████████████████ │
|
||||
│ LONDON │ HAMMERSMITH AND FULHAM │ 2888 │ 1062959 │ █████████████████████▎ │
|
||||
│ WEYBRIDGE │ ELMBRIDGE │ 607 │ 1027161 │ ████████████████████▌ │
|
||||
│ RADLETT │ HERTSMERE │ 265 │ 1015896 │ ████████████████████▎ │
|
||||
│ SALCOMBE │ SOUTH HAMS │ 124 │ 1014393 │ ████████████████████▎ │
|
||||
│ BURFORD │ WEST OXFORDSHIRE │ 102 │ 993100 │ ███████████████████▋ │
|
||||
│ ESHER │ ELMBRIDGE │ 454 │ 969770 │ ███████████████████▍ │
|
||||
│ HINDHEAD │ WAVERLEY │ 128 │ 967786 │ ███████████████████▎ │
|
||||
│ BROCKENHURST │ NEW FOREST │ 121 │ 967046 │ ███████████████████▎ │
|
||||
│ LEATHERHEAD │ GUILDFORD │ 191 │ 964489 │ ███████████████████▎ │
|
||||
│ GERRARDS CROSS │ BUCKINGHAMSHIRE │ 376 │ 958555 │ ███████████████████▏ │
|
||||
│ EAST MOLESEY │ ELMBRIDGE │ 181 │ 943457 │ ██████████████████▋ │
|
||||
│ OLNEY │ MILTON KEYNES │ 220 │ 942892 │ ██████████████████▋ │
|
||||
│ CHALFONT ST GILES │ BUCKINGHAMSHIRE │ 135 │ 926950 │ ██████████████████▌ │
|
||||
│ HENLEY-ON-THAMES │ SOUTH OXFORDSHIRE │ 509 │ 905732 │ ██████████████████ │
|
||||
│ KINGSTON UPON THAMES │ KINGSTON UPON THAMES │ 889 │ 899689 │ █████████████████▊ │
|
||||
│ BELVEDERE │ BEXLEY │ 313 │ 895336 │ █████████████████▊ │
|
||||
│ CRANBROOK │ TUNBRIDGE WELLS │ 404 │ 888190 │ █████████████████▋ │
|
||||
│ LONDON │ EALING │ 2460 │ 865893 │ █████████████████▎ │
|
||||
│ MAIDENHEAD │ BUCKINGHAMSHIRE │ 114 │ 863814 │ █████████████████▎ │
|
||||
│ LONDON │ MERTON │ 1958 │ 857192 │ █████████████████▏ │
|
||||
│ GUILDFORD │ WAVERLEY │ 131 │ 854447 │ █████████████████ │
|
||||
│ LONDON │ HACKNEY │ 3088 │ 846571 │ ████████████████▊ │
|
||||
│ LYMM │ WARRINGTON │ 285 │ 839920 │ ████████████████▋ │
|
||||
│ HARPENDEN │ ST ALBANS │ 606 │ 836994 │ ████████████████▋ │
|
||||
│ LONDON │ WANDSWORTH │ 6113 │ 832292 │ ████████████████▋ │
|
||||
│ LONDON │ SOUTHWARK │ 3612 │ 831319 │ ████████████████▋ │
|
||||
│ BERKHAMSTED │ DACORUM │ 502 │ 830356 │ ████████████████▌ │
|
||||
│ KINGS LANGLEY │ DACORUM │ 137 │ 821358 │ ████████████████▍ │
|
||||
│ TONBRIDGE │ TUNBRIDGE WELLS │ 339 │ 806736 │ ████████████████▏ │
|
||||
│ EPSOM │ REIGATE AND BANSTEAD │ 157 │ 805903 │ ████████████████ │
|
||||
│ WOKING │ GUILDFORD │ 161 │ 803283 │ ████████████████ │
|
||||
│ STOCKBRIDGE │ TEST VALLEY │ 168 │ 801973 │ ████████████████ │
|
||||
│ TEDDINGTON │ RICHMOND UPON THAMES │ 539 │ 798591 │ ███████████████▊ │
|
||||
│ OXFORD │ VALE OF WHITE HORSE │ 329 │ 792907 │ ███████████████▋ │
|
||||
│ LONDON │ BARNET │ 3624 │ 789583 │ ███████████████▋ │
|
||||
│ TWICKENHAM │ RICHMOND UPON THAMES │ 1090 │ 787760 │ ███████████████▋ │
|
||||
│ LUTON │ CENTRAL BEDFORDSHIRE │ 196 │ 786051 │ ███████████████▋ │
|
||||
│ TONBRIDGE │ MAIDSTONE │ 277 │ 785746 │ ███████████████▋ │
|
||||
│ TOWCESTER │ WEST NORTHAMPTONSHIRE │ 186 │ 783532 │ ███████████████▋ │
|
||||
│ LONDON │ LAMBETH │ 4832 │ 783422 │ ███████████████▋ │
|
||||
│ LUTTERWORTH │ HARBOROUGH │ 515 │ 781775 │ ███████████████▋ │
|
||||
│ WOODSTOCK │ WEST OXFORDSHIRE │ 135 │ 777499 │ ███████████████▌ │
|
||||
│ ALRESFORD │ WINCHESTER │ 196 │ 775577 │ ███████████████▌ │
|
||||
│ LONDON │ NEWHAM │ 2942 │ 768551 │ ███████████████▎ │
|
||||
│ ALDERLEY EDGE │ CHESHIRE EAST │ 168 │ 768280 │ ███████████████▎ │
|
||||
│ MARLOW │ BUCKINGHAMSHIRE │ 301 │ 762784 │ ███████████████▎ │
|
||||
│ BILLINGSHURST │ CHICHESTER │ 134 │ 760920 │ ███████████████▏ │
|
||||
│ LONDON │ TOWER HAMLETS │ 4183 │ 759635 │ ███████████████▏ │
|
||||
│ MIDHURST │ CHICHESTER │ 245 │ 759101 │ ███████████████▏ │
|
||||
│ THAMES DITTON │ ELMBRIDGE │ 227 │ 753347 │ ███████████████ │
|
||||
│ POTTERS BAR │ WELWYN HATFIELD │ 163 │ 752926 │ ███████████████ │
|
||||
│ REIGATE │ REIGATE AND BANSTEAD │ 555 │ 740961 │ ██████████████▋ │
|
||||
│ TADWORTH │ REIGATE AND BANSTEAD │ 477 │ 738997 │ ██████████████▋ │
|
||||
│ SEVENOAKS │ SEVENOAKS │ 1074 │ 734658 │ ██████████████▋ │
|
||||
│ PETWORTH │ CHICHESTER │ 138 │ 732432 │ ██████████████▋ │
|
||||
│ BOURNE END │ BUCKINGHAMSHIRE │ 127 │ 730742 │ ██████████████▌ │
|
||||
│ PURLEY │ CROYDON │ 540 │ 727721 │ ██████████████▌ │
|
||||
│ OXTED │ TANDRIDGE │ 320 │ 726078 │ ██████████████▌ │
|
||||
│ LONDON │ HARINGEY │ 2988 │ 724573 │ ██████████████▍ │
|
||||
│ BANSTEAD │ REIGATE AND BANSTEAD │ 373 │ 713834 │ ██████████████▎ │
|
||||
│ PINNER │ HARROW │ 480 │ 712166 │ ██████████████▏ │
|
||||
│ MALMESBURY │ WILTSHIRE │ 293 │ 707747 │ ██████████████▏ │
|
||||
│ RICKMANSWORTH │ THREE RIVERS │ 732 │ 705400 │ ██████████████ │
|
||||
│ SLOUGH │ BUCKINGHAMSHIRE │ 359 │ 705002 │ ██████████████ │
|
||||
│ GREAT MISSENDEN │ BUCKINGHAMSHIRE │ 214 │ 704904 │ ██████████████ │
|
||||
│ READING │ SOUTH OXFORDSHIRE │ 295 │ 701697 │ ██████████████ │
|
||||
│ HYTHE │ FOLKESTONE AND HYTHE │ 457 │ 700334 │ ██████████████ │
|
||||
│ WELWYN │ WELWYN HATFIELD │ 217 │ 699649 │ █████████████▊ │
|
||||
│ CHIGWELL │ EPPING FOREST │ 242 │ 697869 │ █████████████▊ │
|
||||
│ BARNET │ BARNET │ 906 │ 695680 │ █████████████▊ │
|
||||
│ HASLEMERE │ CHICHESTER │ 120 │ 694028 │ █████████████▊ │
|
||||
│ LEATHERHEAD │ MOLE VALLEY │ 748 │ 692026 │ █████████████▋ │
|
||||
│ LONDON │ BRENT │ 1945 │ 690799 │ █████████████▋ │
|
||||
│ HASLEMERE │ WAVERLEY │ 258 │ 690765 │ █████████████▋ │
|
||||
│ NORTHWOOD │ HILLINGDON │ 252 │ 690753 │ █████████████▋ │
|
||||
│ WALTON-ON-THAMES │ ELMBRIDGE │ 871 │ 689431 │ █████████████▋ │
|
||||
│ INGATESTONE │ BRENTWOOD │ 150 │ 688345 │ █████████████▋ │
|
||||
│ OXFORD │ OXFORD │ 1761 │ 686114 │ █████████████▋ │
|
||||
│ CHISLEHURST │ BROMLEY │ 410 │ 682892 │ █████████████▋ │
|
||||
│ KINGS LANGLEY │ THREE RIVERS │ 109 │ 682320 │ █████████████▋ │
|
||||
│ ASHTEAD │ MOLE VALLEY │ 280 │ 680483 │ █████████████▌ │
|
||||
│ WOKING │ SURREY HEATH │ 269 │ 679035 │ █████████████▌ │
|
||||
│ ASCOT │ BRACKNELL FOREST │ 160 │ 678632 │ █████████████▌ │
|
||||
└──────────────────────┴────────────────────────┴──────┴─────────┴────────────────────────────────────────────────────────────────────┘
|
||||
|
||||
100 rows in set. Elapsed: 0.039 sec. Processed 26.25 million rows, 278.03 MB (674.32 million rows/s., 7.14 GB/s.)
|
||||
```
|
||||
|
||||
### Test it in Playground
|
||||
|
||||
The data is uploaded to ClickHouse Playground, [example](https://gh-api.clickhouse.tech/play?user=play#U0VMRUNUIHRvd24sIGRpc3RyaWN0LCBjb3VudCgpIEFTIGMsIHJvdW5kKGF2ZyhwcmljZSkpIEFTIHByaWNlLCBiYXIocHJpY2UsIDAsIDUwMDAwMDAsIDEwMCkgRlJPTSB1a19wcmljZV9wYWlkIFdIRVJFIGRhdGUgPj0gJzIwMjAtMDEtMDEnIEdST1VQIEJZIHRvd24sIGRpc3RyaWN0IEhBVklORyBjID49IDEwMCBPUkRFUiBCWSBwcmljZSBERVNDIExJTUlUIDEwMA==).
|
||||
|
||||
## Let's speed up queries using projections
|
||||
|
||||
[Projections](../../sql-reference/statements/alter/projection.md) allow improving query speed by storing pre-aggregated data.
|
||||
|
||||
### Build a projection
|
||||
|
||||
```
|
||||
-- create an aggregate projection by dimensions (toYear(date), district, town)
|
||||
|
||||
ALTER TABLE uk_price_paid
|
||||
ADD PROJECTION projection_by_year_district_town
|
||||
(
|
||||
SELECT
|
||||
toYear(date),
|
||||
district,
|
||||
town,
|
||||
avg(price),
|
||||
sum(price),
|
||||
count()
|
||||
GROUP BY
|
||||
toYear(date),
|
||||
district,
|
||||
town
|
||||
);
|
||||
|
||||
-- populate the projection for existing data (without it, the projection will be
-- created only for newly inserted data)
|
||||
|
||||
ALTER TABLE uk_price_paid
|
||||
MATERIALIZE PROJECTION projection_by_year_district_town
|
||||
SETTINGS mutations_sync = 1;
|
||||
```
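As a quick sanity check (not required by the guide), you can confirm that the projection is now part of the table definition; the output should contain a `PROJECTION projection_by_year_district_town` clause:

```
SHOW CREATE TABLE uk_price_paid
```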
|
||||
|
||||
## Test performance
|
||||
|
||||
Let's run the same 3 queries.
|
||||
|
||||
```
|
||||
-- enable projections for selects
|
||||
SET allow_experimental_projection_optimization = 1;
|
||||
|
||||
-- Q1) Average price per year:
|
||||
|
||||
SELECT
|
||||
toYear(date) AS year,
|
||||
round(avg(price)) AS price,
|
||||
bar(price, 0, 1000000, 80)
|
||||
FROM uk_price_paid
|
||||
GROUP BY year
|
||||
ORDER BY year ASC;
|
||||
|
||||
┌─year─┬──price─┬─bar(round(avg(price)), 0, 1000000, 80)─┐
|
||||
│ 1995 │ 67932 │ █████▍ │
|
||||
│ 1996 │ 71505 │ █████▋ │
|
||||
│ 1997 │ 78532 │ ██████▎ │
|
||||
│ 1998 │ 85435 │ ██████▋ │
|
||||
│ 1999 │ 96036 │ ███████▋ │
|
||||
│ 2000 │ 107478 │ ████████▌ │
|
||||
│ 2001 │ 118886 │ █████████▌ │
|
||||
│ 2002 │ 137940 │ ███████████ │
|
||||
│ 2003 │ 155888 │ ████████████▍ │
|
||||
│ 2004 │ 178885 │ ██████████████▎ │
|
||||
│ 2005 │ 189350 │ ███████████████▏ │
|
||||
│ 2006 │ 203528 │ ████████████████▎ │
|
||||
│ 2007 │ 219377 │ █████████████████▌ │
|
||||
│ 2008 │ 217056 │ █████████████████▎ │
|
||||
│ 2009 │ 213419 │ █████████████████ │
|
||||
│ 2010 │ 236110 │ ██████████████████▊ │
|
||||
│ 2011 │ 232804 │ ██████████████████▌ │
|
||||
│ 2012 │ 238366 │ ███████████████████ │
|
||||
│ 2013 │ 256931 │ ████████████████████▌ │
|
||||
│ 2014 │ 279917 │ ██████████████████████▍ │
|
||||
│ 2015 │ 297264 │ ███████████████████████▋ │
|
||||
│ 2016 │ 313197 │ █████████████████████████ │
|
||||
│ 2017 │ 346070 │ ███████████████████████████▋ │
|
||||
│ 2018 │ 350117 │ ████████████████████████████ │
|
||||
│ 2019 │ 351010 │ ████████████████████████████ │
|
||||
│ 2020 │ 368974 │ █████████████████████████████▌ │
|
||||
│ 2021 │ 384351 │ ██████████████████████████████▋ │
|
||||
└──────┴────────┴────────────────────────────────────────┘
|
||||
|
||||
27 rows in set. Elapsed: 0.003 sec. Processed 106.87 thousand rows, 3.21 MB (31.92 million rows/s., 959.03 MB/s.)
|
||||
|
||||
-- Q2) Average price per year in London:
|
||||
|
||||
SELECT
|
||||
toYear(date) AS year,
|
||||
round(avg(price)) AS price,
|
||||
bar(price, 0, 2000000, 100)
|
||||
FROM uk_price_paid
|
||||
WHERE town = 'LONDON'
|
||||
GROUP BY year
|
||||
ORDER BY year ASC;
|
||||
|
||||
┌─year─┬───price─┬─bar(round(avg(price)), 0, 2000000, 100)───────────────┐
|
||||
│ 1995 │ 109112 │ █████▍ │
|
||||
│ 1996 │ 118667 │ █████▊ │
|
||||
│ 1997 │ 136518 │ ██████▋ │
|
||||
│ 1998 │ 152983 │ ███████▋ │
|
||||
│ 1999 │ 180633 │ █████████ │
|
||||
│ 2000 │ 215830 │ ██████████▋ │
|
||||
│ 2001 │ 232996 │ ███████████▋ │
|
||||
│ 2002 │ 263672 │ █████████████▏ │
|
||||
│ 2003 │ 278394 │ █████████████▊ │
|
||||
│ 2004 │ 304665 │ ███████████████▏ │
|
||||
│ 2005 │ 322875 │ ████████████████▏ │
|
||||
│ 2006 │ 356192 │ █████████████████▋ │
|
||||
│ 2007 │ 404055 │ ████████████████████▏ │
|
||||
│ 2008 │ 420741 │ █████████████████████ │
|
||||
│ 2009 │ 427754 │ █████████████████████▍ │
|
||||
│ 2010 │ 480306 │ ████████████████████████ │
|
||||
│ 2011 │ 496274 │ ████████████████████████▋ │
|
||||
│ 2012 │ 519441 │ █████████████████████████▊ │
|
||||
│ 2013 │ 616209 │ ██████████████████████████████▋ │
|
||||
│ 2014 │ 724144 │ ████████████████████████████████████▏ │
|
||||
│ 2015 │ 792112 │ ███████████████████████████████████████▌ │
|
||||
│ 2016 │ 843568 │ ██████████████████████████████████████████▏ │
|
||||
│ 2017 │ 982566 │ █████████████████████████████████████████████████▏ │
|
||||
│ 2018 │ 1016845 │ ██████████████████████████████████████████████████▋ │
|
||||
│ 2019 │ 1043277 │ ████████████████████████████████████████████████████▏ │
|
||||
│ 2020 │ 1003963 │ ██████████████████████████████████████████████████▏ │
|
||||
│ 2021 │ 940794 │ ███████████████████████████████████████████████ │
|
||||
└──────┴─────────┴───────────────────────────────────────────────────────┘
|
||||
|
||||
27 rows in set. Elapsed: 0.005 sec. Processed 106.87 thousand rows, 3.53 MB (23.49 million rows/s., 775.95 MB/s.)
|
||||
|
||||
-- Q3) The most expensive neighborhoods:
|
||||
-- the condition (date >= '2020-01-01') needs to be modified to match projection dimension (toYear(date) >= 2020)
|
||||
|
||||
SELECT
|
||||
town,
|
||||
district,
|
||||
count() AS c,
|
||||
round(avg(price)) AS price,
|
||||
bar(price, 0, 5000000, 100)
|
||||
FROM uk_price_paid
|
||||
WHERE toYear(date) >= 2020
|
||||
GROUP BY
|
||||
town,
|
||||
district
|
||||
HAVING c >= 100
|
||||
ORDER BY price DESC
|
||||
LIMIT 100
|
||||
|
||||
┌─town─────────────────┬─district───────────────┬────c─┬───price─┬─bar(round(avg(price)), 0, 5000000, 100)────────────────────────────┐
|
||||
│ LONDON │ CITY OF WESTMINSTER │ 3372 │ 3305225 │ ██████████████████████████████████████████████████████████████████ │
|
||||
│ LONDON │ CITY OF LONDON │ 257 │ 3294478 │ █████████████████████████████████████████████████████████████████▊ │
|
||||
│ LONDON │ KENSINGTON AND CHELSEA │ 2367 │ 2342422 │ ██████████████████████████████████████████████▋ │
|
||||
│ LEATHERHEAD │ ELMBRIDGE │ 108 │ 1927143 │ ██████████████████████████████████████▌ │
|
||||
│ VIRGINIA WATER │ RUNNYMEDE │ 142 │ 1868819 │ █████████████████████████████████████▍ │
|
||||
│ LONDON │ CAMDEN │ 2815 │ 1736788 │ ██████████████████████████████████▋ │
|
||||
│ THORNTON HEATH │ CROYDON │ 521 │ 1733051 │ ██████████████████████████████████▋ │
|
||||
│ WINDLESHAM │ SURREY HEATH │ 103 │ 1717255 │ ██████████████████████████████████▎ │
|
||||
│ BARNET │ ENFIELD │ 115 │ 1503458 │ ██████████████████████████████ │
|
||||
│ OXFORD │ SOUTH OXFORDSHIRE │ 298 │ 1275200 │ █████████████████████████▌ │
|
||||
│ LONDON │ ISLINGTON │ 2458 │ 1274308 │ █████████████████████████▍ │
|
||||
│ COBHAM │ ELMBRIDGE │ 364 │ 1260005 │ █████████████████████████▏ │
|
||||
│ LONDON │ HOUNSLOW │ 618 │ 1215682 │ ████████████████████████▎ │
|
||||
│ ASCOT │ WINDSOR AND MAIDENHEAD │ 379 │ 1215146 │ ████████████████████████▎ │
|
||||
│ LONDON │ RICHMOND UPON THAMES │ 654 │ 1207551 │ ████████████████████████▏ │
|
||||
│ BEACONSFIELD │ BUCKINGHAMSHIRE │ 307 │ 1186220 │ ███████████████████████▋ │
|
||||
│ RICHMOND │ RICHMOND UPON THAMES │ 805 │ 1100420 │ ██████████████████████ │
|
||||
│ LONDON │ HAMMERSMITH AND FULHAM │ 2888 │ 1062959 │ █████████████████████▎ │
|
||||
│ WEYBRIDGE │ ELMBRIDGE │ 607 │ 1027161 │ ████████████████████▌ │
|
||||
│ RADLETT │ HERTSMERE │ 265 │ 1015896 │ ████████████████████▎ │
|
||||
│ SALCOMBE │ SOUTH HAMS │ 124 │ 1014393 │ ████████████████████▎ │
|
||||
│ BURFORD │ WEST OXFORDSHIRE │ 102 │ 993100 │ ███████████████████▋ │
|
||||
│ ESHER │ ELMBRIDGE │ 454 │ 969770 │ ███████████████████▍ │
|
||||
│ HINDHEAD │ WAVERLEY │ 128 │ 967786 │ ███████████████████▎ │
|
||||
│ BROCKENHURST │ NEW FOREST │ 121 │ 967046 │ ███████████████████▎ │
|
||||
│ LEATHERHEAD │ GUILDFORD │ 191 │ 964489 │ ███████████████████▎ │
|
||||
│ GERRARDS CROSS │ BUCKINGHAMSHIRE │ 376 │ 958555 │ ███████████████████▏ │
|
||||
│ EAST MOLESEY │ ELMBRIDGE │ 181 │ 943457 │ ██████████████████▋ │
|
||||
│ OLNEY │ MILTON KEYNES │ 220 │ 942892 │ ██████████████████▋ │
|
||||
│ CHALFONT ST GILES │ BUCKINGHAMSHIRE │ 135 │ 926950 │ ██████████████████▌ │
|
||||
│ HENLEY-ON-THAMES │ SOUTH OXFORDSHIRE │ 509 │ 905732 │ ██████████████████ │
|
||||
│ KINGSTON UPON THAMES │ KINGSTON UPON THAMES │ 889 │ 899689 │ █████████████████▊ │
|
||||
│ BELVEDERE │ BEXLEY │ 313 │ 895336 │ █████████████████▊ │
|
||||
│ CRANBROOK │ TUNBRIDGE WELLS │ 404 │ 888190 │ █████████████████▋ │
|
||||
│ LONDON │ EALING │ 2460 │ 865893 │ █████████████████▎ │
|
||||
│ MAIDENHEAD │ BUCKINGHAMSHIRE │ 114 │ 863814 │ █████████████████▎ │
|
||||
│ LONDON │ MERTON │ 1958 │ 857192 │ █████████████████▏ │
|
||||
│ GUILDFORD │ WAVERLEY │ 131 │ 854447 │ █████████████████ │
|
||||
│ LONDON │ HACKNEY │ 3088 │ 846571 │ ████████████████▊ │
|
||||
│ LYMM │ WARRINGTON │ 285 │ 839920 │ ████████████████▋ │
|
||||
│ HARPENDEN │ ST ALBANS │ 606 │ 836994 │ ████████████████▋ │
|
||||
│ LONDON │ WANDSWORTH │ 6113 │ 832292 │ ████████████████▋ │
|
||||
│ LONDON │ SOUTHWARK │ 3612 │ 831319 │ ████████████████▋ │
|
||||
│ BERKHAMSTED │ DACORUM │ 502 │ 830356 │ ████████████████▌ │
|
||||
│ KINGS LANGLEY │ DACORUM │ 137 │ 821358 │ ████████████████▍ │
|
||||
│ TONBRIDGE │ TUNBRIDGE WELLS │ 339 │ 806736 │ ████████████████▏ │
|
||||
│ EPSOM │ REIGATE AND BANSTEAD │ 157 │ 805903 │ ████████████████ │
|
||||
│ WOKING │ GUILDFORD │ 161 │ 803283 │ ████████████████ │
|
||||
│ STOCKBRIDGE │ TEST VALLEY │ 168 │ 801973 │ ████████████████ │
|
||||
│ TEDDINGTON │ RICHMOND UPON THAMES │ 539 │ 798591 │ ███████████████▊ │
|
||||
│ OXFORD │ VALE OF WHITE HORSE │ 329 │ 792907 │ ███████████████▋ │
|
||||
│ LONDON │ BARNET │ 3624 │ 789583 │ ███████████████▋ │
|
||||
│ TWICKENHAM │ RICHMOND UPON THAMES │ 1090 │ 787760 │ ███████████████▋ │
|
||||
│ LUTON │ CENTRAL BEDFORDSHIRE │ 196 │ 786051 │ ███████████████▋ │
|
||||
│ TONBRIDGE │ MAIDSTONE │ 277 │ 785746 │ ███████████████▋ │
|
||||
│ TOWCESTER │ WEST NORTHAMPTONSHIRE │ 186 │ 783532 │ ███████████████▋ │
|
||||
│ LONDON │ LAMBETH │ 4832 │ 783422 │ ███████████████▋ │
|
||||
│ LUTTERWORTH │ HARBOROUGH │ 515 │ 781775 │ ███████████████▋ │
|
||||
│ WOODSTOCK │ WEST OXFORDSHIRE │ 135 │ 777499 │ ███████████████▌ │
|
||||
│ ALRESFORD │ WINCHESTER │ 196 │ 775577 │ ███████████████▌ │
|
||||
│ LONDON │ NEWHAM │ 2942 │ 768551 │ ███████████████▎ │
|
||||
│ ALDERLEY EDGE │ CHESHIRE EAST │ 168 │ 768280 │ ███████████████▎ │
|
||||
│ MARLOW │ BUCKINGHAMSHIRE │ 301 │ 762784 │ ███████████████▎ │
|
||||
│ BILLINGSHURST │ CHICHESTER │ 134 │ 760920 │ ███████████████▏ │
|
||||
│ LONDON │ TOWER HAMLETS │ 4183 │ 759635 │ ███████████████▏ │
|
||||
│ MIDHURST │ CHICHESTER │ 245 │ 759101 │ ███████████████▏ │
|
||||
│ THAMES DITTON │ ELMBRIDGE │ 227 │ 753347 │ ███████████████ │
|
||||
│ POTTERS BAR │ WELWYN HATFIELD │ 163 │ 752926 │ ███████████████ │
|
||||
│ REIGATE │ REIGATE AND BANSTEAD │ 555 │ 740961 │ ██████████████▋ │
|
||||
│ TADWORTH │ REIGATE AND BANSTEAD │ 477 │ 738997 │ ██████████████▋ │
|
||||
│ SEVENOAKS │ SEVENOAKS │ 1074 │ 734658 │ ██████████████▋ │
|
||||
│ PETWORTH │ CHICHESTER │ 138 │ 732432 │ ██████████████▋ │
|
||||
│ BOURNE END │ BUCKINGHAMSHIRE │ 127 │ 730742 │ ██████████████▌ │
|
||||
│ PURLEY │ CROYDON │ 540 │ 727721 │ ██████████████▌ │
|
||||
│ OXTED │ TANDRIDGE │ 320 │ 726078 │ ██████████████▌ │
|
||||
│ LONDON │ HARINGEY │ 2988 │ 724573 │ ██████████████▍ │
|
||||
│ BANSTEAD │ REIGATE AND BANSTEAD │ 373 │ 713834 │ ██████████████▎ │
|
||||
│ PINNER │ HARROW │ 480 │ 712166 │ ██████████████▏ │
|
||||
│ MALMESBURY │ WILTSHIRE │ 293 │ 707747 │ ██████████████▏ │
|
||||
│ RICKMANSWORTH │ THREE RIVERS │ 732 │ 705400 │ ██████████████ │
|
||||
│ SLOUGH │ BUCKINGHAMSHIRE │ 359 │ 705002 │ ██████████████ │
|
||||
│ GREAT MISSENDEN │ BUCKINGHAMSHIRE │ 214 │ 704904 │ ██████████████ │
|
||||
│ READING │ SOUTH OXFORDSHIRE │ 295 │ 701697 │ ██████████████ │
|
||||
│ HYTHE │ FOLKESTONE AND HYTHE │ 457 │ 700334 │ ██████████████ │
|
||||
│ WELWYN │ WELWYN HATFIELD │ 217 │ 699649 │ █████████████▊ │
|
||||
│ CHIGWELL │ EPPING FOREST │ 242 │ 697869 │ █████████████▊ │
|
||||
│ BARNET │ BARNET │ 906 │ 695680 │ █████████████▊ │
|
||||
│ HASLEMERE │ CHICHESTER │ 120 │ 694028 │ █████████████▊ │
|
||||
│ LEATHERHEAD │ MOLE VALLEY │ 748 │ 692026 │ █████████████▋ │
|
||||
│ LONDON │ BRENT │ 1945 │ 690799 │ █████████████▋ │
|
||||
│ HASLEMERE │ WAVERLEY │ 258 │ 690765 │ █████████████▋ │
|
||||
│ NORTHWOOD │ HILLINGDON │ 252 │ 690753 │ █████████████▋ │
|
||||
│ WALTON-ON-THAMES │ ELMBRIDGE │ 871 │ 689431 │ █████████████▋ │
|
||||
│ INGATESTONE │ BRENTWOOD │ 150 │ 688345 │ █████████████▋ │
|
||||
│ OXFORD │ OXFORD │ 1761 │ 686114 │ █████████████▋ │
|
||||
│ CHISLEHURST │ BROMLEY │ 410 │ 682892 │ █████████████▋ │
|
||||
│ KINGS LANGLEY │ THREE RIVERS │ 109 │ 682320 │ █████████████▋ │
|
||||
│ ASHTEAD │ MOLE VALLEY │ 280 │ 680483 │ █████████████▌ │
|
||||
│ WOKING │ SURREY HEATH │ 269 │ 679035 │ █████████████▌ │
|
||||
│ ASCOT │ BRACKNELL FOREST │ 160 │ 678632 │ █████████████▌ │
|
||||
└──────────────────────┴────────────────────────┴──────┴─────────┴────────────────────────────────────────────────────────────────────┘
|
||||
|
||||
100 rows in set. Elapsed: 0.005 sec. Processed 12.85 thousand rows, 813.40 KB (2.73 million rows/s., 172.95 MB/s.)
|
||||
```
|
||||
|
||||
All 3 queries work much faster and read fewer rows.
|
||||
|
||||
```
|
||||
Q1)
|
||||
no projection: 27 rows in set. Elapsed: 0.027 sec. Processed 26.25 million rows, 157.49 MB (955.96 million rows/s., 5.74 GB/s.)
|
||||
projection: 27 rows in set. Elapsed: 0.003 sec. Processed 106.87 thousand rows, 3.21 MB (31.92 million rows/s., 959.03 MB/s.)
|
||||
```
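If you want to verify that the speed-up really comes from the projection, one simple check (not shown above) is to toggle the optimization off and on again and compare the number of processed rows reported by the client:

```
-- without the projection optimization: scans the whole table
SET allow_experimental_projection_optimization = 0;
SELECT toYear(date) AS year, round(avg(price)) AS price FROM uk_price_paid GROUP BY year ORDER BY year;
-- Processed 26.25 million rows

-- with the projection optimization: reads only the pre-aggregated data
SET allow_experimental_projection_optimization = 1;
SELECT toYear(date) AS year, round(avg(price)) AS price FROM uk_price_paid GROUP BY year ORDER BY year;
-- Processed 106.87 thousand rows
```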
|
2 docs/en/interfaces/third-party/gui.md vendored
@ -84,6 +84,8 @@ Features:
|
||||
- Table data preview.
|
||||
- Full-text search.
|
||||
|
||||
By default, DBeaver does not connect using a session (the CLI, for example, does). If you require session support (for example, to set settings for your session), edit the driver connection properties and set session_id to a random string (it uses the HTTP connection under the hood). Then you can use any setting from the query window.
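For example (an illustrative pair of statements, not taken from the DBeaver documentation), once the session is established, a setting issued in one query is visible in the next:

```
SET max_threads = 2;
SELECT name, value FROM system.settings WHERE name = 'max_threads';  -- shows 2 within the same session
```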
|
||||
|
||||
### clickhouse-cli {#clickhouse-cli}
|
||||
|
||||
[clickhouse-cli](https://github.com/hatarist/clickhouse-cli) is an alternative command-line client for ClickHouse, written in Python 3.
|
||||
|
@ -43,7 +43,7 @@ toc_title: Integrations
|
||||
- Monitoring
|
||||
- [Graphite](https://graphiteapp.org)
|
||||
- [graphouse](https://github.com/yandex/graphouse)
|
||||
- [carbon-clickhouse](https://github.com/lomik/carbon-clickhouse) +
|
||||
- [carbon-clickhouse](https://github.com/lomik/carbon-clickhouse)
|
||||
- [graphite-clickhouse](https://github.com/lomik/graphite-clickhouse)
|
||||
- [graphite-ch-optimizer](https://github.com/innogames/graphite-ch-optimizer) - optimizes staled partitions in [\*GraphiteMergeTree](../../engines/table-engines/mergetree-family/graphitemergetree.md#graphitemergetree) if rules from [rollup configuration](../../engines/table-engines/mergetree-family/graphitemergetree.md#rollup-configuration) could be applied
|
||||
- [Grafana](https://grafana.com/)
|
||||
|
@ -115,6 +115,7 @@ toc_title: Adopters
|
||||
| <a href="http://english.sina.com/index.html" class="favicon">Sina</a> | News | — | — | — | [Slides in Chinese, October 2018](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup19/6.%20ClickHouse最佳实践%20高鹏_新浪.pdf) |
|
||||
| <a href="https://smi2.ru/" class="favicon">SMI2</a> | News | Analytics | — | — | [Blog Post in Russian, November 2017](https://habr.com/ru/company/smi2/blog/314558/) |
|
||||
| <a href="https://www.spark.co.nz/" class="favicon">Spark New Zealand</a> | Telecommunications | Security Operations | — | — | [Blog Post, Feb 2020](https://blog.n0p.me/2020/02/2020-02-05-dnsmonster/) |
|
||||
| <a href="https://splitbee.io" class="favicon">Splitbee</a> | Analytics | Main Product | — | — | [Blog Post, Mai 2021](https://splitbee.io/blog/new-pricing) |
|
||||
| <a href="https://www.splunk.com/" class="favicon">Splunk</a> | Business Analytics | Main product | — | — | [Slides in English, January 2018](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup12/splunk.pdf) |
|
||||
| <a href="https://www.spotify.com" class="favicon">Spotify</a> | Music | Experimentation | — | — | [Slides, July 2018](https://www.slideshare.net/glebus/using-clickhouse-for-experimentation-104247173) |
|
||||
| <a href="https://www.staffcop.ru/" class="favicon">Staffcop</a> | Information Security | Main Product | — | — | [Official website, Documentation](https://www.staffcop.ru/sce43) |
|
||||
|
@ -30,21 +30,25 @@ Other common parameters are inherited from clickhouse-server config (`listen_hos
|
||||
|
||||
Internal coordination settings are located in `<keeper_server>.<coordination_settings>` section:
|
||||
|
||||
- `operation_timeout_ms` — timeout for a single client operation
|
||||
- `session_timeout_ms` — timeout for client session
|
||||
- `dead_session_check_period_ms` — how often clickhouse-keeper check dead sessions and remove them
|
||||
- `heart_beat_interval_ms` — how often a clickhouse-keeper leader will send heartbeats to followers
|
||||
- `election_timeout_lower_bound_ms` — if follower didn't receive heartbeats from the leader in this interval, then it can initiate leader election
|
||||
- `election_timeout_upper_bound_ms` — if follower didn't receive heartbeats from the leader in this interval, then it must initiate leader election
|
||||
- `rotate_log_storage_interval` — how many logs to store in a single file
|
||||
- `reserved_log_items` — how many coordination logs to store before compaction
|
||||
- `snapshot_distance` — how often clickhouse-keeper will create new snapshots (in the number of logs)
|
||||
- `snapshots_to_keep` — how many snapshots to keep
|
||||
- `stale_log_gap` — the threshold when leader consider follower as stale and send snapshot to it instead of logs
|
||||
- `force_sync` — call `fsync` on each write to coordination log
|
||||
- `raft_logs_level` — text logging level about coordination (trace, debug, and so on)
|
||||
- `shutdown_timeout` — wait to finish internal connections and shutdown
|
||||
- `startup_timeout` — if the server doesn't connect to other quorum participants in the specified timeout it will terminate
|
||||
- `operation_timeout_ms` — timeout for a single client operation (default: 10000)
|
||||
- `session_timeout_ms` — timeout for client session (default: 30000)
|
||||
- `dead_session_check_period_ms` — how often clickhouse-keeper checks dead sessions and removes them (default: 500)
|
||||
- `heart_beat_interval_ms` — how often a clickhouse-keeper leader will send heartbeats to followers (default: 500)
|
||||
- `election_timeout_lower_bound_ms` — if follower didn't receive heartbeats from the leader in this interval, then it can initiate leader election (default: 1000)
|
||||
- `election_timeout_upper_bound_ms` — if follower didn't receive heartbeats from the leader in this interval, then it must initiate leader election (default: 2000)
|
||||
- `rotate_log_storage_interval` — how many log records to store in a single file (default: 100000)
|
||||
- `reserved_log_items` — how many coordination log records to store before compaction (default: 100000)
|
||||
- `snapshot_distance` — how often clickhouse-keeper will create new snapshots (in the number of records in logs) (default: 100000)
|
||||
- `snapshots_to_keep` — how many snapshots to keep (default: 3)
|
||||
- `stale_log_gap` — the threshold at which the leader considers a follower stale and sends it a snapshot instead of logs (default: 10000)
|
||||
- `fresh_log_gap` - when a node becomes fresh (default: 200)
|
||||
- `max_requests_batch_size` - max size of batch in requests count before it will be sent to RAFT (default: 100)
|
||||
- `force_sync` — call `fsync` on each write to coordination log (default: true)
|
||||
- `quorum_reads` - execute read requests as writes through the whole RAFT consensus with similar speed (default: false)
|
||||
- `raft_logs_level` — text logging level about coordination (trace, debug, and so on) (default: system default)
|
||||
- `auto_forwarding` - allow forwarding write requests from followers to the leader (default: true)
|
||||
- `shutdown_timeout` — wait to finish internal connections and shutdown (ms) (default: 5000)
|
||||
- `startup_timeout` — if the server doesn't connect to other quorum participants in the specified timeout it will terminate (ms) (default: 30000)
|
||||
|
||||
Quorum configuration is located in `<keeper_server>.<raft_configuration>` section and contain servers description. The only parameter for the whole quorum is `secure`, which enables encrypted connection for communication between quorum participants. The main parameters for each `<server>` are:
|
||||
|
||||
|
@ -5,50 +5,67 @@ toc_title: Testing Hardware
|
||||
|
||||
# How to Test Your Hardware with ClickHouse {#how-to-test-your-hardware-with-clickhouse}
|
||||
|
||||
With this instruction you can run basic ClickHouse performance test on any server without installation of ClickHouse packages.
|
||||
You can run a basic ClickHouse performance test on any server without installing ClickHouse packages.
|
||||
|
||||
1. Go to “commits” page: https://github.com/ClickHouse/ClickHouse/commits/master
|
||||
2. Click on the first green check mark or red cross with green “ClickHouse Build Check” and click on the “Details” link near “ClickHouse Build Check”. There is no such link in some commits, for example commits with documentation. In this case, choose the nearest commit having this link.
|
||||
3. Copy the link to `clickhouse` binary for amd64 or aarch64.
|
||||
4. ssh to the server and download it with wget:
|
||||
|
||||
## Automated Run
|
||||
|
||||
You can run the benchmark with a single script.
|
||||
|
||||
1. Download the script.
|
||||
```
|
||||
wget https://raw.githubusercontent.com/ClickHouse/ClickHouse/master/benchmark/hardware.sh
|
||||
```
|
||||
|
||||
2. Run the script.
|
||||
```
|
||||
chmod a+x ./hardware.sh
|
||||
./hardware.sh
|
||||
```
|
||||
|
||||
3. Copy the output and send it to clickhouse-feedback@yandex-team.com
|
||||
|
||||
All the results are published here: https://clickhouse.tech/benchmark/hardware/
|
||||
|
||||
|
||||
## Manual Run
|
||||
|
||||
Alternatively, you can run the benchmark with the following steps.
|
||||
|
||||
1. ssh to the server and download the binary with wget:
|
||||
```bash
|
||||
# These links are outdated, please obtain the fresh link from the "commits" page.
|
||||
# For amd64:
|
||||
wget https://clickhouse-builds.s3.yandex.net/0/e29c4c3cc47ab2a6c4516486c1b77d57e7d42643/clickhouse_build_check/gcc-10_relwithdebuginfo_none_bundled_unsplitted_disable_False_binary/clickhouse
|
||||
wget https://builds.clickhouse.tech/master/amd64/clickhouse
|
||||
# For aarch64:
|
||||
wget https://clickhouse-builds.s3.yandex.net/0/e29c4c3cc47ab2a6c4516486c1b77d57e7d42643/clickhouse_special_build_check/clang-10-aarch64_relwithdebuginfo_none_bundled_unsplitted_disable_False_binary/clickhouse
|
||||
wget https://builds.clickhouse.tech/master/aarch64/clickhouse
|
||||
# Then do:
|
||||
chmod a+x clickhouse
|
||||
```
|
||||
5. Download benchmark files:
|
||||
2. Download benchmark files:
|
||||
```bash
|
||||
wget https://raw.githubusercontent.com/ClickHouse/ClickHouse/master/benchmark/clickhouse/benchmark-new.sh
|
||||
chmod a+x benchmark-new.sh
|
||||
wget https://raw.githubusercontent.com/ClickHouse/ClickHouse/master/benchmark/clickhouse/queries.sql
|
||||
```
|
||||
6. Download test data according to the [Yandex.Metrica dataset](../getting-started/example-datasets/metrica.md) instruction (“hits” table containing 100 million rows).
|
||||
3. Download test data according to the [Yandex.Metrica dataset](../getting-started/example-datasets/metrica.md) instruction (“hits” table containing 100 million rows).
|
||||
```bash
|
||||
wget https://datasets.clickhouse.tech/hits/partitions/hits_100m_obfuscated_v1.tar.xz
|
||||
tar xvf hits_100m_obfuscated_v1.tar.xz -C .
|
||||
mv hits_100m_obfuscated_v1/* .
|
||||
```
|
||||
7. Run the server:
|
||||
4. Run the server:
|
||||
```bash
|
||||
./clickhouse server
|
||||
```
|
||||
8. Check the data: ssh to the server in another terminal
|
||||
5. Check the data: ssh to the server in another terminal
|
||||
```bash
|
||||
./clickhouse client --query "SELECT count() FROM hits_100m_obfuscated"
|
||||
100000000
|
||||
```
|
||||
9. Edit the benchmark-new.sh, change `clickhouse-client` to `./clickhouse client` and add `--max_memory_usage 100000000000` parameter.
|
||||
```bash
|
||||
mcedit benchmark-new.sh
|
||||
```
|
||||
10. Run the benchmark:
|
||||
6. Run the benchmark:
|
||||
```bash
|
||||
./benchmark-new.sh hits_100m_obfuscated
|
||||
```
|
||||
11. Send the numbers and the info about your hardware configuration to clickhouse-feedback@yandex-team.com
|
||||
7. Send the numbers and the info about your hardware configuration to clickhouse-feedback@yandex-team.com
|
||||
|
||||
All the results are published here: https://clickhouse.tech/benchmark/hardware/
|
||||
|
@ -34,6 +34,7 @@ Configuration template:
|
||||
<min_part_size>...</min_part_size>
|
||||
<min_part_size_ratio>...</min_part_size_ratio>
|
||||
<method>...</method>
|
||||
<level>...</level>
|
||||
</case>
|
||||
...
|
||||
</compression>
|
||||
@ -43,7 +44,8 @@ Configuration template:
|
||||
|
||||
- `min_part_size` – The minimum size of a data part.
|
||||
- `min_part_size_ratio` – The ratio of the data part size to the table size.
|
||||
- `method` – Compression method. Acceptable values: `lz4` or `zstd`.
|
||||
- `method` – Compression method. Acceptable values: `lz4`, `lz4hc`, `zstd`.
|
||||
- `level` – Compression level. See [Codecs](../../sql-reference/statements/create/table/#create-query-general-purpose-codecs).
|
||||
|
||||
You can configure multiple `<case>` sections.
|
||||
|
||||
@ -62,10 +64,34 @@ If no conditions met for a data part, ClickHouse uses the `lz4` compression.
|
||||
<min_part_size>10000000000</min_part_size>
|
||||
<min_part_size_ratio>0.01</min_part_size_ratio>
|
||||
<method>zstd</method>
|
||||
<level>1</level>
|
||||
</case>
|
||||
</compression>
|
||||
```
|
||||
|
||||
<!--
|
||||
## encryption {#server-settings-encryption}
|
||||
|
||||
Configures a command to obtain a key to be used by [encryption codecs](../../sql-reference/statements/create/table.md#create-query-encryption-codecs). The command, or a shell script, is expected to write a Base64-encoded key of any length to the stdout.
|
||||
|
||||
**Example**
|
||||
|
||||
For Linux with systemd:
|
||||
|
||||
```xml
|
||||
<encryption>
|
||||
<key_command>/usr/bin/systemd-ask-password --id="clickhouse-server" --timeout=0 "Enter the ClickHouse encryption passphrase:" | base64</key_command>
|
||||
</encryption>
|
||||
```
|
||||
|
||||
For other systems:
|
||||
|
||||
```xml
|
||||
<encryption>
|
||||
<key_command><![CDATA[IFS=; echo -n >/dev/tty "Enter the ClickHouse encryption passphrase: "; stty=`stty -F /dev/tty -g`; stty -F /dev/tty -echo; read k </dev/tty; stty -F /dev/tty "$stty"; echo -n $k | base64]]></key_command>
|
||||
</encryption>
|
||||
```
|
||||
-->
|
||||
## custom_settings_prefixes {#custom_settings_prefixes}
|
||||
|
||||
List of prefixes for [custom settings](../../operations/settings/index.md#custom_settings). The prefixes must be separated with commas.
|
||||
@ -713,7 +739,7 @@ Keys for server/client settings:
|
||||
- extendedVerification – Automatically extended verification of certificates after the session ends. Acceptable values: `true`, `false`.
|
||||
- requireTLSv1 – Require a TLSv1 connection. Acceptable values: `true`, `false`.
|
||||
- requireTLSv1_1 – Require a TLSv1.1 connection. Acceptable values: `true`, `false`.
|
||||
- requireTLSv1 – Require a TLSv1.2 connection. Acceptable values: `true`, `false`.
|
||||
- requireTLSv1_2 – Require a TLSv1.2 connection. Acceptable values: `true`, `false`.
|
||||
- fips – Activates OpenSSL FIPS mode. Supported if the library’s OpenSSL version supports FIPS.
|
||||
- privateKeyPassphraseHandler – Class (PrivateKeyPassphraseHandler subclass) that requests the passphrase for accessing the private key. For example: `<privateKeyPassphraseHandler>`, `<name>KeyFileHandler</name>`, `<options><password>test</password></options>`, `</privateKeyPassphraseHandler>`.
|
||||
- invalidCertificateHandler – Class (a subclass of CertificateHandler) for verifying invalid certificates. For example: `<invalidCertificateHandler> <name>ConsoleCertificateHandler</name> </invalidCertificateHandler>` .
|
||||
@ -866,6 +892,33 @@ If the table does not exist, ClickHouse will create it. If the structure of the
|
||||
</query_thread_log>
|
||||
```
|
||||
|
||||
## query_views_log {#server_configuration_parameters-query_views_log}
|
||||
|
||||
Setting for logging views that are dependent on queries received with the [log_query_views=1](../../operations/settings/settings.md#settings-log-query-views) setting.
|
||||
|
||||
Queries are logged in the [system.query_views_log](../../operations/system-tables/query_thread_log.md#system_tables-query_views_log) table, not in a separate file. You can change the name of the table in the `table` parameter (see below).
|
||||
|
||||
Use the following parameters to configure logging:
|
||||
|
||||
- `database` – Name of the database.
|
||||
- `table` – Name of the system table the queries will be logged in.
|
||||
- `partition_by` — [Custom partitioning key](../../engines/table-engines/mergetree-family/custom-partitioning-key.md) for a system table. Can't be used if `engine` defined.
|
||||
- `engine` - [MergeTree Engine Definition](../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-creating-a-table) for a system table. Can't be used if `partition_by` defined.
|
||||
- `flush_interval_milliseconds` – Interval for flushing data from the buffer in memory to the table.
|
||||
|
||||
If the table does not exist, ClickHouse will create it. If the structure of the query views log changed when the ClickHouse server was updated, the table with the old structure is renamed, and a new table is created automatically.
|
||||
|
||||
**Example**
|
||||
|
||||
``` xml
|
||||
<query_views_log>
|
||||
<database>system</database>
|
||||
<table>query_views_log</table>
|
||||
<partition_by>toYYYYMM(event_date)</partition_by>
|
||||
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
|
||||
</query_views_log>
|
||||
```
|
||||
|
||||
## text_log {#server_configuration_parameters-text_log}
|
||||
|
||||
Settings for the [text_log](../../operations/system-tables/text_log.md#system_tables-text_log) system table for logging text messages.
|
||||
|
@ -280,14 +280,13 @@ Default value: `0`.
|
||||
|
||||
## check_sample_column_is_correct {#check_sample_column_is_correct}
|
||||
|
||||
Enables to check column for sampling or sampling expression is correct at table creation.
|
||||
Enables the check at table creation, that the data type of a column for sampling or sampling expression is correct. The data type must be one of unsigned [integer types](../../sql-reference/data-types/int-uint.md): `UInt8`, `UInt16`, `UInt32`, `UInt64`.
|
||||
|
||||
Possible values:
|
||||
|
||||
- true — Check column or sampling expression is correct at table creation.
|
||||
- false — Do not check column or sampling expression is correct at table creation.
|
||||
- true — The check is enabled.
|
||||
- false — The check is disabled at table creation.
|
||||
|
||||
Default value: `true`.
|
||||
|
||||
By default, the ClickHouse server check column for sampling or sampling expression at table creation. If you already had tables with incorrect sampling expression, set value `false` to make ClickHouse server do not raise exception when ClickHouse server is starting.
|
||||
[Original article](https://clickhouse.tech/docs/en/operations/settings/merge_tree_settings/) <!--hide-->
|
||||
By default, the ClickHouse server checks at table creation the data type of a column for sampling or sampling expression. If you already have tables with incorrect sampling expression and do not want the server to raise an exception during startup, set `check_sample_column_is_correct` to `false`.
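For example, a hypothetical table like the following passes the check, because the sampling expression `intHash32(user_id)` has an unsigned integer type and is part of the primary key:

```
CREATE TABLE sampled_events
(
    user_id UInt64,
    event_date Date
)
ENGINE = MergeTree
ORDER BY (event_date, intHash32(user_id))
SAMPLE BY intHash32(user_id)
```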
|
||||
|
@ -28,7 +28,7 @@ Structure of the `users` section:
|
||||
<profile>profile_name</profile>
|
||||
|
||||
<quota>default</quota>
|
||||
|
||||
<default_database>default<default_database>
|
||||
<databases>
|
||||
<database_name>
|
||||
<table_name>
|
||||
|
@ -20,6 +20,29 @@ Possible values:
- `global` — Replaces the `IN`/`JOIN` query with `GLOBAL IN`/`GLOBAL JOIN`.
- `allow` — Allows the use of these types of subqueries.

## prefer_global_in_and_join {#prefer-global-in-and-join}

Enables the replacement of `IN`/`JOIN` operators with `GLOBAL IN`/`GLOBAL JOIN`.

Possible values:

- 0 — Disabled. `IN`/`JOIN` operators are not replaced with `GLOBAL IN`/`GLOBAL JOIN`.
- 1 — Enabled. `IN`/`JOIN` operators are replaced with `GLOBAL IN`/`GLOBAL JOIN`.

Default value: `0`.

**Usage**

Although `SET distributed_product_mode=global` can change the behavior of queries over distributed tables, it is not suitable for local tables or tables from external resources. This is where the `prefer_global_in_and_join` setting comes into play.

For example, query-serving nodes may contain local tables that are not suitable for distribution. Their data has to be scattered on the fly during distributed processing with the `GLOBAL` keyword — `GLOBAL IN`/`GLOBAL JOIN`.

Another use case of `prefer_global_in_and_join` is accessing tables created by external engines. This setting helps to reduce the number of calls to external sources while joining such tables: only one call per query.
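A minimal sketch of the scenario above, with hypothetical table names: `distributed_hits` is a distributed table and `local_visits` exists only on the initiator node.

``` sql
SET prefer_global_in_and_join = 1;

-- With the setting enabled, the IN below is executed as GLOBAL IN: the subquery
-- runs once on the initiator and its result is sent to the remote servers.
SELECT count()
FROM distributed_hits
WHERE UserID IN (SELECT UserID FROM local_visits);
```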
**See also:**

- [Distributed subqueries](../../sql-reference/operators/in.md#select-distributed-subqueries) for more information on how to use `GLOBAL IN`/`GLOBAL JOIN`

## enable_optimize_predicate_expression {#enable-optimize-predicate-expression}

Turns on predicate pushdown in `SELECT` queries.
@ -109,6 +132,21 @@ Enables or disables [fsync](http://pubs.opengroup.org/onlinepubs/9699919799/func

It makes sense to disable it if the server has millions of tiny tables that are constantly being created and destroyed.
## function_range_max_elements_in_block {#settings-function_range_max_elements_in_block}

Sets the safety threshold for the data volume generated by the [range](../../sql-reference/functions/array-functions.md#range) function. Defines the maximum number of values generated by the function per block of data (the sum of array sizes for every row in a block).

Possible values:

- Positive integer.

Default value: `500,000,000`.
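A sketch of how the threshold applies per block of data; the low limit below is hypothetical and chosen only to trigger the exception.

``` sql
-- Each row generates an array of up to 9 elements; a block of rows therefore sums
-- to far more than 1000 generated values, so this query is expected to throw.
SET function_range_max_elements_in_block = 1000;
SELECT range(number % 10) FROM numbers(100000) FORMAT Null;

-- Restoring a larger threshold lets the same query run.
SET function_range_max_elements_in_block = 500000000;
SELECT range(number % 10) FROM numbers(100000) FORMAT Null;
```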
**See Also**

- [max_block_size](#setting-max_block_size)
- [min_insert_block_size_rows](#min-insert-block-size-rows)

## enable_http_compression {#settings-enable_http_compression}

Enables or disables data compression in the response to an HTTP request.
@ -153,6 +191,26 @@ Possible values:

Default value: 1048576.
## table_function_remote_max_addresses {#table_function_remote_max_addresses}

Sets the maximum number of addresses generated from patterns for the [remote](../../sql-reference/table-functions/remote.md) function.

Possible values:

- Positive integer.

Default value: `1000`.
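A sketch with hypothetical host and table names: the range pattern below expands to 1500 addresses, which exceeds the default limit, so the limit is raised first.

``` sql
SET table_function_remote_max_addresses = 2000;

-- The {0001..1500} range expands to 1500 addresses (hypothetical hosts and table).
SELECT count() FROM remote('node{0001..1500}.example.com', default.hits);
```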
## glob_expansion_max_elements {#glob_expansion_max_elements}

Sets the maximum number of addresses generated from patterns for external storages and table functions (like [url](../../sql-reference/table-functions/url.md)) except the `remote` function.

Possible values:

- Positive integer.

Default value: `1000`.
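A sketch with hypothetical URLs and schema: the range pattern expands to 1200 addresses, so the default limit of 1000 is raised before running the query.

``` sql
SET glob_expansion_max_elements = 1500;

-- The {0001..1200} range expands to 1200 URLs (hypothetical addresses and structure).
SELECT count()
FROM url('https://example.com/data_{0001..1200}.csv', 'CSV', 'id UInt32, value String');
```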
## send_progress_in_http_headers {#settings-send_progress_in_http_headers}

Enables or disables `X-ClickHouse-Progress` HTTP response headers in `clickhouse-server` responses.
@ -725,6 +783,26 @@ Possible value:

Default value: 2013265920.
## merge_tree_clear_old_temporary_directories_interval_seconds {#setting-merge-tree-clear-old-temporary-directories-interval-seconds}

The interval in seconds for ClickHouse to execute the cleanup of old temporary directories.

Possible value:

- Any positive integer.

Default value: 60.

## merge_tree_clear_old_parts_interval_seconds {#setting-merge-tree-clear-old-parts-interval-seconds}

The interval in seconds for ClickHouse to execute the cleanup of old parts, WALs, and mutations.

Possible value:

- Any positive integer.

Default value: 1.
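A minimal sketch, assuming the intervals can be adjusted per session like the other settings on this page; the values are hypothetical.

``` sql
-- Run the cleanup of old temporary directories every 5 minutes instead of every minute,
-- and the cleanup of old parts, WALs, and mutations every 30 seconds instead of every second.
SET merge_tree_clear_old_temporary_directories_interval_seconds = 300;
SET merge_tree_clear_old_parts_interval_seconds = 30;
```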
## min_bytes_to_use_direct_io {#settings-min-bytes-to-use-direct-io}

The minimum data volume required for using direct I/O access to the storage disk.
@ -812,7 +890,7 @@ log_queries_min_type='EXCEPTION_WHILE_PROCESSING'

Setting up query threads logging.

Queries’ threads runned by ClickHouse with this setup are logged according to the rules in the [query_thread_log](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-query_thread_log) server configuration parameter.
Queries’ threads run by ClickHouse with this setup are logged according to the rules in the [query_thread_log](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-query_thread_log) server configuration parameter.

Example:
@ -820,6 +898,19 @@ Example:
log_query_threads=1
```

## log_query_views {#settings-log-query-views}

Setting up query views logging.

When a query run by ClickHouse with this setting enabled has associated views (materialized or live views), they are logged in the table specified by the [query_views_log](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-query_views_log) server configuration parameter.

Example:

``` text
log_query_views=1
```
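Once logging is enabled, the collected entries can be inspected in the corresponding system table. A minimal sketch; the column names are assumed from the query_views_log table description and may differ between versions.

``` sql
-- Most recently logged view executions (column names assumed, see system.query_views_log).
SELECT view_name, view_type, view_duration_ms, status
FROM system.query_views_log
ORDER BY event_time DESC
LIMIT 5;
```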
## log_comment {#settings-log-comment}

Specifies the value for the `log_comment` field of the [system.query_log](../system-tables/query_log.md) table and comment text for the server log.
@ -2024,13 +2115,13 @@ Default value: 16.

## merge_selecting_sleep_ms {#merge_selecting_sleep_ms}

Sleep time for merge selecting when no part selected, a lower setting will trigger selecting tasks in background_schedule_pool frequently which result in large amount of requests to zookeeper in large-scale clusters
Sleep time for merge selecting when no part is selected. A lower setting triggers selecting tasks in `background_schedule_pool` frequently, which results in a large number of requests to Zookeeper in large-scale clusters.

Possible values:

- Any positive integer.

Default value: 5000
Default value: `5000`.

## parallel_distributed_insert_select {#parallel_distributed_insert_select}
@ -2907,7 +2998,7 @@ Result:
└─────────────┘
```

Note that this setting influences [Materialized view](../../sql-reference/statements/create/view.md#materialized) and [MaterializeMySQL](../../engines/database-engines/materialize-mysql.md) behaviour.
Note that this setting influences [Materialized view](../../sql-reference/statements/create/view.md#materialized) and [MaterializedMySQL](../../engines/database-engines/materialized-mysql.md) behaviour.

## engine_file_empty_if_not_exists {#engine-file-empty_if-not-exists}
@ -3302,3 +3393,30 @@ Possible values:
- 1 — The `LowCardinality` type is converted to the `DICTIONARY` type.

Default value: `0`.

## materialized_postgresql_max_block_size {#materialized-postgresql-max-block-size}

Sets the number of rows collected in memory before flushing data into a PostgreSQL database table.

Possible values:

- Positive integer.

Default value: `65536`.

## materialized_postgresql_tables_list {#materialized-postgresql-tables-list}

Sets a comma-separated list of PostgreSQL database tables, which will be replicated via the [MaterializedPostgreSQL](../../engines/database-engines/materialized-postgresql.md) database engine.

Default value: empty list — the whole PostgreSQL database will be replicated.

## materialized_postgresql_allow_automatic_update {#materialized-postgresql-allow-automatic-update}

Allows reloading a table in the background when schema changes are detected. DDL queries on the PostgreSQL side are not replicated via the ClickHouse [MaterializedPostgreSQL](../../engines/database-engines/materialized-postgresql.md) engine, because the PostgreSQL logical replication protocol does not allow it, but the fact of DDL changes is detected transactionally. In this case, the default behaviour is to stop replicating those tables once DDL is detected. However, if this setting is enabled, then, instead of stopping the replication of those tables, they will be reloaded in the background via a database snapshot without data loss, and replication will continue for them.

Possible values:

- 0 — The table is not automatically updated in the background when schema changes are detected.
- 1 — The table is automatically updated in the background when schema changes are detected.

Default value: `0`.
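The three settings above are typically applied when the database is created. A minimal sketch with hypothetical connection parameters and table names:

``` sql
CREATE DATABASE postgres_replica
ENGINE = MaterializedPostgreSQL('postgres-host:5432', 'shop_db', 'postgres_user', 'postgres_password')
SETTINGS materialized_postgresql_tables_list = 'orders,customers',
         materialized_postgresql_max_block_size = 8192,
         materialized_postgresql_allow_automatic_update = 1;
```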
14
docs/en/operations/storing-data.md
Normal file
14
docs/en/operations/storing-data.md
Normal file
@ -0,0 +1,14 @@
---
toc_priority: 68
toc_title: External Disks for Storing Data
---

# External Disks for Storing Data {#external-disks}

Data processed in ClickHouse is usually stored in the local file system — on the same machine as the ClickHouse server. That requires large-capacity disks, which can be quite expensive. To avoid that, you can store the data remotely — on [Amazon s3](https://aws.amazon.com/s3/) disks or in the Hadoop Distributed File System ([HDFS](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html)).

To work with data stored on `Amazon s3` disks use the [s3](../engines/table-engines/integrations/s3.md) table engine, and to work with data in the Hadoop Distributed File System — the [HDFS](../engines/table-engines/integrations/hdfs.md) table engine.
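For instance, a minimal sketch of the s3 table engine with a hypothetical bucket URL; the `S3(path, format)` signature is described on the linked page.

``` sql
-- Hypothetical public object; credentials can be passed as extra engine arguments if needed.
CREATE TABLE s3_events (name String, value UInt32)
    ENGINE = S3('https://storage.example.com/my-bucket/events.csv', 'CSV');

SELECT * FROM s3_events LIMIT 10;
```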
## Zero-copy Replication {#zero-copy}

ClickHouse supports zero-copy replication for `s3` and `HDFS` disks, which means that if the data is stored remotely on several machines and needs to be synchronized, then only the metadata is replicated (paths to the data parts), but not the data itself.
@ -62,4 +62,3 @@ exception_code: ZOK

```

[Original article](https://clickhouse.tech/docs/en/operations/system_tables/distributed_ddl_queuedistributed_ddl_queue.md) <!--hide-->