Merge branch 'master' into fix-odbc-invalid-cursor

Commit 86a0e3e332 by mergify[bot] 2021-09-30 06:17:49 +00:00, committed by GitHub
620 changed files with 11835 additions and 10940 deletions


@ -1,12 +0,0 @@
# .arcignore is the same as .gitignore but for Arc VCS.
# Arc VCS is a proprietary VCS in Yandex that is very similar to Git
# from the user perspective but with the following differences:
# 1. Data is stored in distributed object storage.
# 2. Local copy works via FUSE without downloading all the objects.
# For this reason, it is better suited for huge monorepositories that can be found in large companies (e.g. Yandex, Google).
# As ClickHouse developers, we don't use Arc as a VCS (we use Git).
# But the ClickHouse source code is also mirrored into the internal monorepository and our colleagues are using Arc.
# You can read more about Arc here: https://habr.com/en/company/yandex/blog/482926/
# Repository is synchronized without 3rd-party submodules.
contrib


@ -1,5 +1,3 @@
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category (leave one):
- New Feature
- Improvement


@ -7,6 +7,7 @@
* Under clickhouse-local, always treat local addresses with a port as remote. [#26736](https://github.com/ClickHouse/ClickHouse/pull/26736) ([Raúl Marín](https://github.com/Algunenano)).
* Fix the issue that in case of some sophisticated query with column aliases identical to the names of expressions, bad cast may happen. This fixes [#25447](https://github.com/ClickHouse/ClickHouse/issues/25447). This fixes [#26914](https://github.com/ClickHouse/ClickHouse/issues/26914). This fix may introduce backward incompatibility: if there are different expressions with identical names, exception will be thrown. It may break some rare cases when `enable_optimize_predicate_expression` is set. [#26639](https://github.com/ClickHouse/ClickHouse/pull/26639) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Now, scalar subquery always returns `Nullable` result if its type can be `Nullable`. It is needed because in case of empty subquery its result should be `Null`. Previously, it was possible to get an error about incompatible types (type deduction does not execute scalar subquery, and it could use a not-nullable type). Scalar subquery with empty result which can't be converted to `Nullable` (like `Array` or `Tuple`) now throws an error. Fixes [#25411](https://github.com/ClickHouse/ClickHouse/issues/25411). [#26423](https://github.com/ClickHouse/ClickHouse/pull/26423) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Introduce syntax for here documents. Example `SELECT $doc$ VALUE $doc$`. [#26671](https://github.com/ClickHouse/ClickHouse/pull/26671) ([Maksim Kita](https://github.com/kitaisreal)). This change is backward incompatible if a query contains identifiers with `$` [#28768](https://github.com/ClickHouse/ClickHouse/issues/28768).
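A minimal sketch of the new syntax (any tag can replace `doc`; per the linked issue, unquoted identifiers containing `$` may no longer parse):

```sql
-- Everything between the matching $doc$ tags is one string literal:
SELECT $doc$ VALUE $doc$;
-- returns the string ' VALUE '
```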
#### New Feature
@ -17,7 +18,6 @@
* Added integration with S2 geometry library. [#24980](https://github.com/ClickHouse/ClickHouse/pull/24980) ([Andr0901](https://github.com/Andr0901)). ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Add SQLite table engine, table function, database engine. [#24194](https://github.com/ClickHouse/ClickHouse/pull/24194) ([Arslan Gumerov](https://github.com/g-arslan)). ([Kseniia Sumarokova](https://github.com/kssenii)).
* Added support for custom query for `MySQL`, `PostgreSQL`, `ClickHouse`, `JDBC`, `Cassandra` dictionary source. Closes [#1270](https://github.com/ClickHouse/ClickHouse/issues/1270). [#26995](https://github.com/ClickHouse/ClickHouse/pull/26995) ([Maksim Kita](https://github.com/kitaisreal)).
* Introduce syntax for here documents. Example `SELECT $doc$ VALUE $doc$`. [#26671](https://github.com/ClickHouse/ClickHouse/pull/26671) ([Maksim Kita](https://github.com/kitaisreal)).
* Add shared (replicated) storage of users, roles, row policies, quotas and settings profiles through ZooKeeper. [#27426](https://github.com/ClickHouse/ClickHouse/pull/27426) ([Kevin Michel](https://github.com/kmichel-aiven)).
* Add compression for `INTO OUTFILE` that automatically chooses the compression algorithm. Closes [#3473](https://github.com/ClickHouse/ClickHouse/issues/3473). [#27134](https://github.com/ClickHouse/ClickHouse/pull/27134) ([Filatenkov Artur](https://github.com/FArthur-cmd)).
* Add `INSERT ... FROM INFILE` similarly to `SELECT ... INTO OUTFILE`. [#27655](https://github.com/ClickHouse/ClickHouse/pull/27655) ([Filatenkov Artur](https://github.com/FArthur-cmd)).
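A hedged sketch of the two file I/O items above, run in `clickhouse-client` (the table name is hypothetical; for `INTO OUTFILE` the compression algorithm is inferred from the file extension):

```sql
-- The .gz extension selects gzip compression automatically:
SELECT number FROM system.numbers LIMIT 10 INTO OUTFILE 'numbers.tsv.gz';

-- The symmetric import from a local file (my_table is hypothetical):
INSERT INTO my_table FROM INFILE 'numbers.tsv' FORMAT TSV;
```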
@ -34,7 +34,6 @@
* New functions `currentProfiles()`, `enabledProfiles()`, `defaultProfiles()`. [#26714](https://github.com/ClickHouse/ClickHouse/pull/26714) ([Vitaly Baranov](https://github.com/vitlibar)).
* Add functions that return (initial_)query_id of the current query. This closes [#23682](https://github.com/ClickHouse/ClickHouse/issues/23682). [#26410](https://github.com/ClickHouse/ClickHouse/pull/26410) ([Alexey Boykov](https://github.com/mathalex)).
* Add `REPLACE GRANT` feature. [#26384](https://github.com/ClickHouse/ClickHouse/pull/26384) ([Caspian](https://github.com/Cas-pian)).
* Implement window function `nth_value(expr, N)` that returns the value of the Nth row of the window frame (see the sketch after this list). [#26334](https://github.com/ClickHouse/ClickHouse/pull/26334) ([Zuo, RuoYu](https://github.com/ryzuo)).
* `EXPLAIN` query now has `EXPLAIN ESTIMATE ...` mode that will show information about read rows, marks and parts from MergeTree tables (also sketched after this list). Closes [#23941](https://github.com/ClickHouse/ClickHouse/issues/23941). [#26131](https://github.com/ClickHouse/ClickHouse/pull/26131) ([fastio](https://github.com/fastio)).
* Added `system.zookeeper_log` table. All actions of ZooKeeper client are logged into this table. Implements [#25449](https://github.com/ClickHouse/ClickHouse/issues/25449). [#26129](https://github.com/ClickHouse/ClickHouse/pull/26129) ([tavplubix](https://github.com/tavplubix)).
* Zero-copy replication for `ReplicatedMergeTree` over `HDFS` storage. [#25918](https://github.com/ClickHouse/ClickHouse/pull/25918) ([Zhichang Yu](https://github.com/yuzhichang)).
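A hedged sketch of the `nth_value` and `EXPLAIN ESTIMATE` items above (the MergeTree table in the second query is hypothetical; on older releases window functions may require `allow_experimental_window_functions = 1`):

```sql
-- nth_value(expr, N) returns expr evaluated at the Nth row of the window frame:
SELECT
    number,
    nth_value(number, 2) OVER (ORDER BY number ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS second_value
FROM numbers(5);

-- EXPLAIN ESTIMATE reports estimated rows, marks and parts to be read:
EXPLAIN ESTIMATE SELECT count() FROM my_mergetree_table WHERE id > 100;
```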


@ -1,4 +1,4 @@
cmake_minimum_required(VERSION 3.3)
cmake_minimum_required(VERSION 3.14)
foreach(policy
CMP0023


@ -6,38 +6,6 @@ Thank you.
## Technical Info
We have a [developer's guide](https://clickhouse.yandex/docs/en/development/developer_instruction/) for writing code for ClickHouse. Besides this guide, you can find [Overview of ClickHouse Architecture](https://clickhouse.yandex/docs/en/development/architecture/) and instructions on how to build ClickHouse in different environments.
We have a [developer's guide](https://clickhouse.com/docs/en/development/developer_instruction/) for writing code for ClickHouse. Besides this guide, you can find [Overview of ClickHouse Architecture](https://clickhouse.com/docs/en/development/architecture/) and instructions on how to build ClickHouse in different environments.
If you want to contribute to documentation, read the [Contributing to ClickHouse Documentation](docs/README.md) guide.
## Legal Info
In order for us (YANDEX LLC) to accept patches and other contributions from you, you may adopt our Yandex Contributor License Agreement (the "**CLA**"). You may find the current version of the CLA here:
1) https://yandex.ru/legal/cla/?lang=en (in English) and
2) https://yandex.ru/legal/cla/?lang=ru (in Russian).
By adopting the CLA, you state the following:
* You obviously wish and are willingly licensing your contributions to us for our open source projects under the terms of the CLA,
* You have read the terms and conditions of the CLA and agree with them in full,
* You are legally able to provide and license your contributions as stated,
* We may use your contributions for our open source projects and for any other our project too,
* We rely on your assurances concerning the rights of third parties in relation to your contributions.
If you agree with these principles, please read and adopt our CLA. By providing us your contributions, you hereby declare that you have already read and adopted our CLA, and we may freely merge your contributions with our corresponding open source project and use them further in accordance with the terms and conditions of the CLA.
If you have already adopted the terms and conditions of the CLA, you are able to provide your contributions. When you submit your pull request, please add the following information to it:
```
I hereby agree to the terms of the CLA available at: [link].
```
Replace the bracketed text as follows:
* [link] is the link to the current version of the CLA (you may add a link to https://yandex.ru/legal/cla/?lang=en (in English) or to https://yandex.ru/legal/cla/?lang=ru (in Russian)).
It is enough to provide us with such a notification once.
As an alternative, you can provide DCO instead of CLA. You can find the text of DCO here: https://developercertificate.org/
It is enough to read and copy it verbatim to your pull request.
If you don't agree with the CLA and don't want to provide DCO, you still can open a pull request to provide your contributions.


@ -1,63 +0,0 @@
# This file is generated automatically, do not edit. See 'ya.make.in' and use 'utils/generate-ya-make' to regenerate it.
OWNER(g:clickhouse)
LIBRARY()
ADDINCL(
GLOBAL clickhouse/base
)
CFLAGS (GLOBAL -DARCADIA_BUILD)
CFLAGS (GLOBAL -DUSE_CPUID=1)
CFLAGS (GLOBAL -DUSE_JEMALLOC=0)
CFLAGS (GLOBAL -DUSE_RAPIDJSON=1)
CFLAGS (GLOBAL -DUSE_SSL=1)
IF (OS_DARWIN)
CFLAGS (GLOBAL -DOS_DARWIN)
ELSEIF (OS_FREEBSD)
CFLAGS (GLOBAL -DOS_FREEBSD)
ELSEIF (OS_LINUX)
CFLAGS (GLOBAL -DOS_LINUX)
ENDIF ()
PEERDIR(
contrib/libs/cctz
contrib/libs/cxxsupp/libcxx-filesystem
contrib/libs/poco/Net
contrib/libs/poco/Util
contrib/libs/poco/NetSSL_OpenSSL
contrib/libs/fmt
contrib/restricted/boost
contrib/restricted/cityhash-1.0.2
)
CFLAGS(-g0)
SRCS(
DateLUT.cpp
DateLUTImpl.cpp
JSON.cpp
LineReader.cpp
StringRef.cpp
argsToConfig.cpp
coverage.cpp
demangle.cpp
errnoToString.cpp
getFQDNOrHostName.cpp
getMemoryAmount.cpp
getPageSize.cpp
getResource.cpp
getThreadId.cpp
mremap.cpp
phdr_cache.cpp
preciseExp10.cpp
setTerminalEcho.cpp
shift10.cpp
sleep.cpp
terminalColors.cpp
)
END()


@ -1,41 +0,0 @@
OWNER(g:clickhouse)
LIBRARY()
ADDINCL(
GLOBAL clickhouse/base
)
CFLAGS (GLOBAL -DARCADIA_BUILD)
CFLAGS (GLOBAL -DUSE_CPUID=1)
CFLAGS (GLOBAL -DUSE_JEMALLOC=0)
CFLAGS (GLOBAL -DUSE_RAPIDJSON=1)
CFLAGS (GLOBAL -DUSE_SSL=1)
IF (OS_DARWIN)
CFLAGS (GLOBAL -DOS_DARWIN)
ELSEIF (OS_FREEBSD)
CFLAGS (GLOBAL -DOS_FREEBSD)
ELSEIF (OS_LINUX)
CFLAGS (GLOBAL -DOS_LINUX)
ENDIF ()
PEERDIR(
contrib/libs/cctz
contrib/libs/cxxsupp/libcxx-filesystem
contrib/libs/poco/Net
contrib/libs/poco/Util
contrib/libs/poco/NetSSL_OpenSSL
contrib/libs/fmt
contrib/restricted/boost
contrib/restricted/cityhash-1.0.2
)
CFLAGS(-g0)
SRCS(
<? find . -name '*.cpp' | grep -v -F tests/ | grep -v -F examples | grep -v -F Replxx | grep -v -F Readline | sed 's/^\.\// /' | sort ?>
)
END()


@ -1,19 +0,0 @@
OWNER(g:clickhouse)
LIBRARY()
NO_COMPILER_WARNINGS()
PEERDIR(
clickhouse/src/Common
)
CFLAGS(-g0)
SRCS(
BaseDaemon.cpp
GraphiteWriter.cpp
SentryWriter.cpp
)
END()


@ -1,19 +0,0 @@
OWNER(g:clickhouse)
LIBRARY()
PEERDIR(
clickhouse/src/Common
)
CFLAGS(-g0)
SRCS(
ExtendedLogChannel.cpp
Loggers.cpp
OwnFormattingChannel.cpp
OwnPatternFormatter.cpp
OwnSplitChannel.cpp
)
END()


@ -1,39 +0,0 @@
# This file is generated automatically, do not edit. See 'ya.make.in' and use 'utils/generate-ya-make' to regenerate it.
LIBRARY()
OWNER(g:clickhouse)
CFLAGS(-g0)
PEERDIR(
contrib/restricted/boost/libs
contrib/libs/libmysql_r
contrib/libs/poco/Foundation
contrib/libs/poco/Util
)
ADDINCL(
GLOBAL clickhouse/base
clickhouse/base
contrib/libs/libmysql_r
)
NO_COMPILER_WARNINGS()
NO_UTIL()
SRCS(
Connection.cpp
Exception.cpp
Pool.cpp
PoolFactory.cpp
PoolWithFailover.cpp
Query.cpp
ResultBase.cpp
Row.cpp
UseQueryResult.cpp
Value.cpp
)
END()


@ -1,28 +0,0 @@
LIBRARY()
OWNER(g:clickhouse)
CFLAGS(-g0)
PEERDIR(
contrib/restricted/boost/libs
contrib/libs/libmysql_r
contrib/libs/poco/Foundation
contrib/libs/poco/Util
)
ADDINCL(
GLOBAL clickhouse/base
clickhouse/base
contrib/libs/libmysql_r
)
NO_COMPILER_WARNINGS()
NO_UTIL()
SRCS(
<? find . -name '*.cpp' | grep -v -F tests/ | grep -v -F examples | sed 's/^\.\// /' | sort ?>
)
END()


@ -1,7 +0,0 @@
OWNER(g:clickhouse)
LIBRARY()
ADDINCL (GLOBAL clickhouse/base/pcg-random)
END()


@ -1,11 +0,0 @@
OWNER(g:clickhouse)
LIBRARY()
CFLAGS(-g0)
SRCS(
readpassphrase.c
)
END()


@ -1,13 +0,0 @@
OWNER(g:clickhouse)
LIBRARY()
ADDINCL(GLOBAL clickhouse/base/widechar_width)
CFLAGS(-g0)
SRCS(
widechar_width.cpp
)
END()


@ -1,11 +0,0 @@
OWNER(g:clickhouse)
RECURSE(
common
daemon
loggers
mysqlxx
pcg-random
widechar_width
readpassphrase
)


@ -1,25 +0,0 @@
INCLUDE(${ARCADIA_ROOT}/clickhouse/cmake/autogenerated_versions.txt)
# TODO: not sure if this is customizable per-binary
SET(VERSION_NAME "ClickHouse")
# TODO: not quite sure how to replace dash with space in ya.make
SET(VERSION_FULL "${VERSION_NAME}-${VERSION_STRING}")
CFLAGS (GLOBAL -DDBMS_NAME=\"ClickHouse\")
CFLAGS (GLOBAL -DDBMS_VERSION_MAJOR=${VERSION_MAJOR})
CFLAGS (GLOBAL -DDBMS_VERSION_MINOR=${VERSION_MINOR})
CFLAGS (GLOBAL -DDBMS_VERSION_PATCH=${VERSION_PATCH})
CFLAGS (GLOBAL -DVERSION_FULL=\"\\\"${VERSION_FULL}\\\"\")
CFLAGS (GLOBAL -DVERSION_MAJOR=${VERSION_MAJOR})
CFLAGS (GLOBAL -DVERSION_MINOR=${VERSION_MINOR})
CFLAGS (GLOBAL -DVERSION_PATCH=${VERSION_PATCH})
# TODO: not supported yet, not sure if ya.make supports arithmetic.
CFLAGS (GLOBAL -DVERSION_INTEGER=0)
CFLAGS (GLOBAL -DVERSION_NAME=\"\\\"${VERSION_NAME}\\\"\")
CFLAGS (GLOBAL -DVERSION_OFFICIAL=\"-arcadia\")
CFLAGS (GLOBAL -DVERSION_REVISION=${VERSION_REVISION})
CFLAGS (GLOBAL -DVERSION_STRING=\"\\\"${VERSION_STRING}\\\"\")

contrib/rocksdb vendored

@ -1 +1 @@
Subproject commit 5ea892c8673e6c5a052887653673b967d44cc59b
Subproject commit 296c1b8b95fd448b8097a1b2cc9f704ff4a73a2c

debian/control vendored

@ -7,6 +7,7 @@ Build-Depends: debhelper (>= 9),
ninja-build,
clang-13,
llvm-13,
lld-13,
libc6-dev,
tzdata
Standards-Version: 3.9.8


@ -92,7 +92,7 @@ if __name__ == "__main__":
logging.info("Some exception occurred %s", str(ex))
raise
finally:
logging.info("Will remove dowloaded file %s from filesystem if it exists", temp_archive_path)
logging.info("Will remove downloaded file %s from filesystem if it exists", temp_archive_path)
if os.path.exists(temp_archive_path):
os.remove(temp_archive_path)
logging.info("Processing of %s finished", dataset)


@ -92,7 +92,7 @@ if __name__ == "__main__":
logging.info("Some exception occurred %s", str(ex))
raise
finally:
logging.info("Will remove dowloaded file %s from filesystem if it exists", temp_archive_path)
logging.info("Will remove downloaded file %s from filesystem if it exists", temp_archive_path)
if os.path.exists(temp_archive_path):
os.remove(temp_archive_path)
logging.info("Processing of %s finished", dataset)


@ -1,7 +1,7 @@
sudo apt-get install apt-transport-https ca-certificates dirmngr
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E0C56BD4
echo "deb https://repo.clickhouse.tech/deb/stable/ main/" | sudo tee \
echo "deb https://repo.clickhouse.com/deb/stable/ main/" | sudo tee \
/etc/apt/sources.list.d/clickhouse.list
sudo apt-get update


@ -1,6 +1,6 @@
sudo yum install yum-utils
sudo rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/clickhouse.repo
sudo rpm --import https://repo.clickhouse.com/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.com/rpm/clickhouse.repo
sudo yum install clickhouse-server clickhouse-client
sudo /etc/init.d/clickhouse-server start


@ -1,9 +1,9 @@
export LATEST_VERSION=$(curl -s https://repo.clickhouse.tech/tgz/stable/ | \
export LATEST_VERSION=$(curl -s https://repo.clickhouse.com/tgz/stable/ | \
grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | sort -V -r | head -n 1)
curl -O https://repo.clickhouse.tech/tgz/stable/clickhouse-common-static-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/stable/clickhouse-common-static-dbg-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/stable/clickhouse-server-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/stable/clickhouse-client-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-common-static-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-common-static-dbg-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-server-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-client-$LATEST_VERSION.tgz
tar -xzvf clickhouse-common-static-$LATEST_VERSION.tgz
sudo clickhouse-common-static-$LATEST_VERSION/install/doinst.sh


@ -13,16 +13,16 @@ The list of documented datasets:
- [GitHub Events](../../getting-started/example-datasets/github-events.md)
- [Anonymized Yandex.Metrica Dataset](../../getting-started/example-datasets/metrica.md)
- [Recipes](../../getting-started/example-datasets/recipes.md)
- [OnTime](../../getting-started/example-datasets/ontime.md)
- [OpenSky](../../getting-started/example-datasets/opensky.md)
- [New York Taxi Data](../../getting-started/example-datasets/nyc-taxi.md)
- [UK Property Price Paid](../../getting-started/example-datasets/uk-price-paid.md)
- [What's on the Menu?](../../getting-started/example-datasets/menus.md)
- [Star Schema Benchmark](../../getting-started/example-datasets/star-schema.md)
- [WikiStat](../../getting-started/example-datasets/wikistat.md)
- [Terabyte of Click Logs from Criteo](../../getting-started/example-datasets/criteo.md)
- [AMPLab Big Data Benchmark](../../getting-started/example-datasets/amplab-benchmark.md)
- [Brown University Benchmark](../../getting-started/example-datasets/brown-benchmark.md)
- [New York Taxi Data](../../getting-started/example-datasets/nyc-taxi.md)
- [OpenSky](../../getting-started/example-datasets/opensky.md)
- [UK Property Price Paid](../../getting-started/example-datasets/uk-price-paid.md)
- [Cell Towers](../../getting-started/example-datasets/cell-towers.md)
- [What's on the Menu?](../../getting-started/example-datasets/menus.md)
- [OnTime](../../getting-started/example-datasets/ontime.md)
[Original article](https://clickhouse.com/docs/en/getting_started/example_datasets) <!--hide-->


@ -3,7 +3,7 @@ toc_priority: 21
toc_title: Menus
---
# New York Public Library "What's on the Menu?" Dataset
# New York Public Library "What's on the Menu?" Dataset {#menus-dataset}
The dataset is created by the New York Public Library. It contains historical data on the menus of hotels, restaurants and cafes with the dishes along with their prices.
@ -11,34 +11,38 @@ Source: http://menus.nypl.org/data
The data is in public domain.
The data is from the library's archive and it may be incomplete and difficult for statistical analysis. Nevertheless it is also very yummy.
The size is just 1.3 million records about dishes in the menus (a very small data volume for ClickHouse, but it's still a good example).
The size is just 1.3 million records about dishes in the menus — it's a very small data volume for ClickHouse, but it's still a good example.
## Download the Dataset
## Download the Dataset {#download-dataset}
```
Run the command:
```bash
wget https://s3.amazonaws.com/menusdata.nypl.org/gzips/2021_08_01_07_01_17_data.tgz
```
Replace the link with an up-to-date link from http://menus.nypl.org/data if needed.
Download size is about 35 MB.
## Unpack the Dataset
## Unpack the Dataset {#unpack-dataset}
```
```bash
tar xvf 2021_08_01_07_01_17_data.tgz
```
Uncompressed size is about 150 MB.
The data is normalized and consists of four tables:
- Menu: information about menus: the name of the restaurant, the date when menu was seen, etc;
- Dish: information about dishes: the name of the dish along with some characteristic;
- MenuPage: information about the pages in the menus; every page belongs to some menu;
- MenuItem: an item of the menu - a dish along with its price on some menu page: links to dish and menu page.
- `Menu` — Information about menus: the name of the restaurant, the date when menu was seen, etc.
- `Dish` — Information about dishes: the name of the dish along with some characteristic.
- `MenuPage` — Information about the pages in the menus, because every page belongs to some menu.
- `MenuItem` — An item of the menu. A dish along with its price on some menu page: links to dish and menu page.
## Create the Tables
## Create the Tables {#create-tables}
```
We use the [Decimal](../../sql-reference/data-types/decimal.md) data type to store prices.
```sql
CREATE TABLE dish
(
id UInt32,
@ -101,35 +105,33 @@ CREATE TABLE menu_item
) ENGINE = MergeTree ORDER BY id;
```
We use `Decimal` data type to store prices. Everything else is quite straightforward.
## Import the Data {#import-data}
## Import Data
Upload data into ClickHouse, run:
Upload data into ClickHouse:
```
```bash
clickhouse-client --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --query "INSERT INTO dish FORMAT CSVWithNames" < Dish.csv
clickhouse-client --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --query "INSERT INTO menu FORMAT CSVWithNames" < Menu.csv
clickhouse-client --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --query "INSERT INTO menu_page FORMAT CSVWithNames" < MenuPage.csv
clickhouse-client --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --date_time_input_format best_effort --query "INSERT INTO menu_item FORMAT CSVWithNames" < MenuItem.csv
```
We use `CSVWithNames` format as the data is represented by CSV with header.
We use [CSVWithNames](../../interfaces/formats.md#csvwithnames) format as the data is represented by CSV with header.
We disable `format_csv_allow_single_quotes` as only double quotes are used for data fields and single quotes can be inside the values and should not confuse the CSV parser.
We disable `input_format_null_as_default` as our data does not have NULLs. Otherwise ClickHouse will try to parse `\N` sequences and can be confused with `\` in data.
We disable [input_format_null_as_default](../../operations/settings/settings.md#settings-input-format-null-as-default) as our data does not have [NULL](../../sql-reference/syntax.md#null-literal). Otherwise ClickHouse would try to parse `\N` sequences and could get confused by `\` in the data.
The setting `--date_time_input_format best_effort` allows to parse `DateTime` fields in wide variety of formats. For example, ISO-8601 without seconds like '2000-01-01 01:02' will be recognized. Without this setting only fixed DateTime format is allowed.
The setting [date_time_input_format best_effort](../../operations/settings/settings.md#settings-date_time_input_format) allows parsing [DateTime](../../sql-reference/data-types/datetime.md) fields in a wide variety of formats. For example, ISO-8601 without seconds like '2000-01-01 01:02' will be recognized. Without this setting only the fixed DateTime format is allowed.
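The same relaxed parser is also exposed as a function, which is a quick way to check how a value will be interpreted (a minimal sketch):

```sql
-- parseDateTimeBestEffort applies the same rules as --date_time_input_format best_effort:
SELECT parseDateTimeBestEffort('2000-01-01 01:02');
-- 2000-01-01 01:02:00
```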
## Denormalize the Data
## Denormalize the Data {#denormalize-data}
Data is presented in multiple tables in normalized form. It means you have to perform JOINs if you want to query, e.g. dish names from menu items.
For typical analytical tasks it is way more efficient to deal with pre-JOINed data to avoid doing JOIN every time. It is called "denormalized" data.
Data is presented in multiple tables in [normalized form](https://en.wikipedia.org/wiki/Database_normalization#Normal_forms). It means you have to perform [JOIN](../../sql-reference/statements/select/join.md#select-join) if you want to query, e.g. dish names from menu items.
For typical analytical tasks it is way more efficient to deal with pre-JOINed data to avoid doing `JOIN` every time. It is called "denormalized" data.
We will create a table that will contain all the data JOINed together:
We will create a table `menu_item_denorm` which will contain all the data JOINed together:
```
```sql
CREATE TABLE menu_item_denorm
ENGINE = MergeTree ORDER BY (dish_name, created_at)
AS SELECT
@ -171,21 +173,32 @@ AS SELECT
FROM menu_item
JOIN dish ON menu_item.dish_id = dish.id
JOIN menu_page ON menu_item.menu_page_id = menu_page.id
JOIN menu ON menu_page.menu_id = menu.id
JOIN menu ON menu_page.menu_id = menu.id;
```
## Validate the Data
## Validate the Data {#validate-data}
```
SELECT count() FROM menu_item_denorm
1329175
Query:
```sql
SELECT count() FROM menu_item_denorm;
```
## Run Some Queries
Averaged historical prices of dishes:
Result:
```text
┌─count()─┐
│ 1329175 │
└─────────┘
```
## Run Some Queries {#run-queries}
### Averaged historical prices of dishes {#query-averaged-historical-prices}
Query:
```sql
SELECT
round(toUInt32OrZero(extract(menu_date, '^\\d{4}')), -1) AS d,
count(),
@ -194,8 +207,12 @@ SELECT
FROM menu_item_denorm
WHERE (menu_currency = 'Dollars') AND (d > 0) AND (d < 2022)
GROUP BY d
ORDER BY d ASC
ORDER BY d ASC;
```
Result:
```text
┌────d─┬─count()─┬─round(avg(price), 2)─┬─bar(avg(price), 0, 100, 100)─┐
│ 1850 │ 618 │ 1.5 │ █▍ │
│ 1860 │ 1634 │ 1.29 │ █▎ │
@ -215,15 +232,15 @@ ORDER BY d ASC
│ 2000 │ 2467 │ 11.85 │ ███████████▋ │
│ 2010 │ 597 │ 25.66 │ █████████████████████████▋ │
└──────┴─────────┴──────────────────────┴──────────────────────────────┘
17 rows in set. Elapsed: 0.044 sec. Processed 1.33 million rows, 54.62 MB (30.00 million rows/s., 1.23 GB/s.)
```
Take it with a grain of salt.
### Burger Prices:
### Burger Prices {#query-burger-prices}
```
Query:
```sql
SELECT
round(toUInt32OrZero(extract(menu_date, '^\\d{4}')), -1) AS d,
count(),
@ -232,8 +249,12 @@ SELECT
FROM menu_item_denorm
WHERE (menu_currency = 'Dollars') AND (d > 0) AND (d < 2022) AND (dish_name ILIKE '%burger%')
GROUP BY d
ORDER BY d ASC
ORDER BY d ASC;
```
Result:
```text
┌────d─┬─count()─┬─round(avg(price), 2)─┬─bar(avg(price), 0, 50, 100)───────────┐
│ 1880 │ 2 │ 0.42 │ ▋ │
│ 1890 │ 7 │ 0.85 │ █▋ │
@ -250,13 +271,13 @@ ORDER BY d ASC
│ 2000 │ 21 │ 7.14 │ ██████████████▎ │
│ 2010 │ 6 │ 18.42 │ ████████████████████████████████████▋ │
└──────┴─────────┴──────────────────────┴───────────────────────────────────────┘
14 rows in set. Elapsed: 0.052 sec. Processed 1.33 million rows, 94.15 MB (25.48 million rows/s., 1.80 GB/s.)
```
### Vodka:
### Vodka {#query-vodka}
```
Query:
```sql
SELECT
round(toUInt32OrZero(extract(menu_date, '^\\d{4}')), -1) AS d,
count(),
@ -265,8 +286,12 @@ SELECT
FROM menu_item_denorm
WHERE (menu_currency IN ('Dollars', '')) AND (d > 0) AND (d < 2022) AND (dish_name ILIKE '%vodka%')
GROUP BY d
ORDER BY d ASC
ORDER BY d ASC;
```
Result:
```text
┌────d─┬─count()─┬─round(avg(price), 2)─┬─bar(avg(price), 0, 50, 100)─┐
│ 1910 │ 2 │ 0 │ │
│ 1920 │ 1 │ 0.3 │ ▌ │
@ -282,11 +307,13 @@ ORDER BY d ASC
To get vodka we have to write `ILIKE '%vodka%'` and this definitely makes a statement.
### Caviar:
### Caviar {#query-caviar}
Let's print caviar prices. Also let's print the name of any dish with caviar.
```
Query:
```sql
SELECT
round(toUInt32OrZero(extract(menu_date, '^\\d{4}')), -1) AS d,
count(),
@ -296,8 +323,12 @@ SELECT
FROM menu_item_denorm
WHERE (menu_currency IN ('Dollars', '')) AND (d > 0) AND (d < 2022) AND (dish_name ILIKE '%caviar%')
GROUP BY d
ORDER BY d ASC
ORDER BY d ASC;
```
Result:
```text
┌────d─┬─count()─┬─round(avg(price), 2)─┬─bar(avg(price), 0, 50, 100)──────┬─any(dish_name)──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ 1090 │ 1 │ 0 │ │ Caviar │
│ 1880 │ 3 │ 0 │ │ Caviar │
@ -319,6 +350,6 @@ ORDER BY d ASC
At least they have caviar with vodka. Very nice.
### Test it in Playground
## Online Playground {#playground}
The data is uploaded to ClickHouse Playground, [example](https://gh-api.clickhouse.tech/play?user=play#U0VMRUNUCiAgICByb3VuZCh0b1VJbnQzMk9yWmVybyhleHRyYWN0KG1lbnVfZGF0ZSwgJ15cXGR7NH0nKSksIC0xKSBBUyBkLAogICAgY291bnQoKSwKICAgIHJvdW5kKGF2ZyhwcmljZSksIDIpLAogICAgYmFyKGF2ZyhwcmljZSksIDAsIDUwLCAxMDApLAogICAgYW55KGRpc2hfbmFtZSkKRlJPTSBtZW51X2l0ZW1fZGVub3JtCldIRVJFIChtZW51X2N1cnJlbmN5IElOICgnRG9sbGFycycsICcnKSkgQU5EIChkID4gMCkgQU5EIChkIDwgMjAyMikgQU5EIChkaXNoX25hbWUgSUxJS0UgJyVjYXZpYXIlJykKR1JPVVAgQlkgZApPUkRFUiBCWSBkIEFTQw==).


@ -42,7 +42,11 @@ md5sum hits_v1.tsv
# Checksum should be equal to: f3631b6295bf06989c1437491f7592cb
# now create table
clickhouse-client --query "CREATE DATABASE IF NOT EXISTS datasets"
# for hits_v1
clickhouse-client --query "CREATE TABLE datasets.hits_v1 ( WatchID UInt64, JavaEnable UInt8, Title String, GoodEvent Int16, EventTime DateTime, EventDate Date, CounterID UInt32, ClientIP UInt32, ClientIP6 FixedString(16), RegionID UInt32, UserID UInt64, CounterClass Int8, OS UInt8, UserAgent UInt8, URL String, Referer String, URLDomain String, RefererDomain String, Refresh UInt8, IsRobot UInt8, RefererCategories Array(UInt16), URLCategories Array(UInt16), URLRegions Array(UInt32), RefererRegions Array(UInt32), ResolutionWidth UInt16, ResolutionHeight UInt16, ResolutionDepth UInt8, FlashMajor UInt8, FlashMinor UInt8, FlashMinor2 String, NetMajor UInt8, NetMinor UInt8, UserAgentMajor UInt16, UserAgentMinor FixedString(2), CookieEnable UInt8, JavascriptEnable UInt8, IsMobile UInt8, MobilePhone UInt8, MobilePhoneModel String, Params String, IPNetworkID UInt32, TraficSourceID Int8, SearchEngineID UInt16, SearchPhrase String, AdvEngineID UInt8, IsArtifical UInt8, WindowClientWidth UInt16, WindowClientHeight UInt16, ClientTimeZone Int16, ClientEventTime DateTime, SilverlightVersion1 UInt8, SilverlightVersion2 UInt8, SilverlightVersion3 UInt32, SilverlightVersion4 UInt16, PageCharset String, CodeVersion UInt32, IsLink UInt8, IsDownload UInt8, IsNotBounce UInt8, FUniqID UInt64, HID UInt32, IsOldCounter UInt8, IsEvent UInt8, IsParameter UInt8, DontCountHits UInt8, WithHash UInt8, HitColor FixedString(1), UTCEventTime DateTime, Age UInt8, Sex UInt8, Income UInt8, Interests UInt16, Robotness UInt8, GeneralInterests Array(UInt16), RemoteIP UInt32, RemoteIP6 FixedString(16), WindowName Int32, OpenerName Int32, HistoryLength Int16, BrowserLanguage FixedString(2), BrowserCountry FixedString(2), SocialNetwork String, SocialAction String, HTTPError UInt16, SendTiming Int32, DNSTiming Int32, ConnectTiming Int32, ResponseStartTiming Int32, ResponseEndTiming Int32, FetchTiming Int32, RedirectTiming Int32, DOMInteractiveTiming Int32, DOMContentLoadedTiming Int32, DOMCompleteTiming Int32, LoadEventStartTiming Int32, LoadEventEndTiming Int32, NSToDOMContentLoadedTiming Int32, FirstPaintTiming Int32, RedirectCount Int8, SocialSourceNetworkID UInt8, SocialSourcePage String, ParamPrice Int64, ParamOrderID String, ParamCurrency FixedString(3), ParamCurrencyID UInt16, GoalsReached Array(UInt32), OpenstatServiceName String, OpenstatCampaignID String, OpenstatAdID String, OpenstatSourceID String, UTMSource String, UTMMedium String, UTMCampaign String, UTMContent String, UTMTerm String, FromTag String, HasGCLID UInt8, RefererHash UInt64, URLHash UInt64, CLID UInt32, YCLID UInt64, ShareService String, ShareURL String, ShareTitle String, ParsedParams Nested(Key1 String, Key2 String, Key3 String, Key4 String, Key5 String, ValueDouble Float64), IslandID FixedString(16), RequestNum UInt32, RequestTry UInt8) ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID)) SAMPLE BY intHash32(UserID) SETTINGS index_granularity = 8192"
# for hits_100m_obfuscated
clickhouse-client --query="CREATE TABLE hits_100m_obfuscated (WatchID UInt64, JavaEnable UInt8, Title String, GoodEvent Int16, EventTime DateTime, EventDate Date, CounterID UInt32, ClientIP UInt32, RegionID UInt32, UserID UInt64, CounterClass Int8, OS UInt8, UserAgent UInt8, URL String, Referer String, Refresh UInt8, RefererCategoryID UInt16, RefererRegionID UInt32, URLCategoryID UInt16, URLRegionID UInt32, ResolutionWidth UInt16, ResolutionHeight UInt16, ResolutionDepth UInt8, FlashMajor UInt8, FlashMinor UInt8, FlashMinor2 String, NetMajor UInt8, NetMinor UInt8, UserAgentMajor UInt16, UserAgentMinor FixedString(2), CookieEnable UInt8, JavascriptEnable UInt8, IsMobile UInt8, MobilePhone UInt8, MobilePhoneModel String, Params String, IPNetworkID UInt32, TraficSourceID Int8, SearchEngineID UInt16, SearchPhrase String, AdvEngineID UInt8, IsArtifical UInt8, WindowClientWidth UInt16, WindowClientHeight UInt16, ClientTimeZone Int16, ClientEventTime DateTime, SilverlightVersion1 UInt8, SilverlightVersion2 UInt8, SilverlightVersion3 UInt32, SilverlightVersion4 UInt16, PageCharset String, CodeVersion UInt32, IsLink UInt8, IsDownload UInt8, IsNotBounce UInt8, FUniqID UInt64, OriginalURL String, HID UInt32, IsOldCounter UInt8, IsEvent UInt8, IsParameter UInt8, DontCountHits UInt8, WithHash UInt8, HitColor FixedString(1), LocalEventTime DateTime, Age UInt8, Sex UInt8, Income UInt8, Interests UInt16, Robotness UInt8, RemoteIP UInt32, WindowName Int32, OpenerName Int32, HistoryLength Int16, BrowserLanguage FixedString(2), BrowserCountry FixedString(2), SocialNetwork String, SocialAction String, HTTPError UInt16, SendTiming UInt32, DNSTiming UInt32, ConnectTiming UInt32, ResponseStartTiming UInt32, ResponseEndTiming UInt32, FetchTiming UInt32, SocialSourceNetworkID UInt8, SocialSourcePage String, ParamPrice Int64, ParamOrderID String, ParamCurrency FixedString(3), ParamCurrencyID UInt16, OpenstatServiceName String, OpenstatCampaignID String, OpenstatAdID String, OpenstatSourceID String, UTMSource String, UTMMedium String, UTMCampaign String, UTMContent String, UTMTerm String, FromTag String, HasGCLID UInt8, RefererHash UInt64, URLHash UInt64, CLID UInt32) ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID)) SAMPLE BY intHash32(UserID) SETTINGS index_granularity = 8192"
# import data
cat hits_v1.tsv | clickhouse-client --query "INSERT INTO datasets.hits_v1 FORMAT TSV" --max_insert_block_size=100000
# optionally you can optimize table


@ -3,7 +3,7 @@ toc_priority: 20
toc_title: OpenSky
---
# Crowdsourced air traffic data from The OpenSky Network 2020
# Crowdsourced air traffic data from The OpenSky Network 2020 {#opensky}
"The data in this dataset is derived and cleaned from the full OpenSky dataset to illustrate the development of air traffic during the COVID-19 pandemic. It spans all flights seen by the network's more than 2500 members since 1 January 2019. More data will be periodically included in the dataset until the end of the COVID-19 pandemic".
@ -14,17 +14,19 @@ Martin Strohmeier, Xavier Olive, Jannis Lübbe, Matthias Schäfer, and Vincent L
Earth System Science Data 13(2), 2021
https://doi.org/10.5194/essd-13-357-2021
## Download the Dataset
## Download the Dataset {#download-dataset}
```
Run the command:
```bash
wget -O- https://zenodo.org/record/5092942 | grep -oP 'https://zenodo.org/record/5092942/files/flightlist_\d+_\d+\.csv\.gz' | xargs wget
```
Download will take about 2 minutes with a good internet connection. There are 30 files with a total size of 4.3 GB.
## Create the Table
## Create the Table {#create-table}
```
```sql
CREATE TABLE opensky
(
callsign String,
@ -46,69 +48,101 @@ CREATE TABLE opensky
) ENGINE = MergeTree ORDER BY (origin, destination, callsign);
```
## Import Data
## Import Data {#import-data}
Upload data into ClickHouse in parallel:
```
ls -1 flightlist_*.csv.gz | xargs -P100 -I{} bash -c '
gzip -c -d "{}" | clickhouse-client --date_time_input_format best_effort --query "INSERT INTO opensky FORMAT CSVWithNames"'
```bash
ls -1 flightlist_*.csv.gz | xargs -P100 -I{} bash -c 'gzip -c -d "{}" | clickhouse-client --date_time_input_format best_effort --query "INSERT INTO opensky FORMAT CSVWithNames"'
```
Here we pass the list of files (`ls -1 flightlist_*.csv.gz`) to `xargs` for parallel processing.
- Here we pass the list of files (`ls -1 flightlist_*.csv.gz`) to `xargs` for parallel processing.
`xargs -P100` specifies using up to 100 parallel workers, but as we only have 30 files, the number of workers will be only 30.
- For every file, `xargs` will run a script with `bash -c`. The script has substitution in form of `{}` and the `xargs` command will substitute the filename to it (we have asked it for `xargs` with `-I{}`).
- The script will decompress the file (`gzip -c -d "{}"`) to standard output (`-c` parameter) and the output is redirected to `clickhouse-client`.
- We also asked to parse [DateTime](../../sql-reference/data-types/datetime.md) fields with the extended parser ([--date_time_input_format best_effort](../../operations/settings/settings.md#settings-date_time_input_format)) to recognize ISO-8601 format with timezone offsets.
For every file, `xargs` will run a script with `bash -c`. The script has substitution in form of `{}` and the `xargs` command will substitute the filename to it (we have asked it for xargs with `-I{}`).
The script will decompress the file (`gzip -c -d "{}"`) to standard output (`-c` parameter) and the output is redirected to `clickhouse-client`.
Finally, `clickhouse-client` will do insertion. It will read input data in `CSVWithNames` format. We also asked to parse DateTime fields with extended parser (`--date_time_input_format best_effort`) to recognize ISO-8601 format with timezone offsets.
Finally, `clickhouse-client` will do insertion. It will read input data in [CSVWithNames](../../interfaces/formats.md#csvwithnames) format.
Parallel upload takes 24 seconds.
If you don't like parallel upload, here is a sequential variant:
```
```bash
for file in flightlist_*.csv.gz; do gzip -c -d "$file" | clickhouse-client --date_time_input_format best_effort --query "INSERT INTO opensky FORMAT CSVWithNames"; done
```
## Validate the Data
## Validate the Data {#validate-data}
```
SELECT count() FROM opensky
66010819
Query:
```sql
SELECT count() FROM opensky;
```
The size of dataset in ClickHouse is just 2.64 GiB:
Result:
```
SELECT formatReadableSize(total_bytes) FROM system.tables WHERE name = 'opensky'
2.64 GiB
```text
┌──count()─┐
│ 66010819 │
└──────────┘
```
## Run Some Queries
The size of the dataset in ClickHouse is just 2.66 GiB; check it with the following query.
Total distance travelled is 68 billion kilometers:
Query:
```sql
SELECT formatReadableSize(total_bytes) FROM system.tables WHERE name = 'opensky';
```
SELECT formatReadableQuantity(sum(geoDistance(longitude_1, latitude_1, longitude_2, latitude_2)) / 1000) FROM opensky
Result:
```text
┌─formatReadableSize(total_bytes)─┐
│ 2.66 GiB │
└─────────────────────────────────┘
```
## Run Some Queries {#run-queries}
Total distance travelled is 68 billion kilometers.
Query:
```sql
SELECT formatReadableQuantity(sum(geoDistance(longitude_1, latitude_1, longitude_2, latitude_2)) / 1000) FROM opensky;
```
Result:
```text
┌─formatReadableQuantity(divide(sum(geoDistance(longitude_1, latitude_1, longitude_2, latitude_2)), 1000))─┐
│ 68.72 billion │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```
Average flight distance is around 1000 km.
```
SELECT avg(geoDistance(longitude_1, latitude_1, longitude_2, latitude_2)) FROM opensky
Query:
```sql
SELECT avg(geoDistance(longitude_1, latitude_1, longitude_2, latitude_2)) FROM opensky;
```
Result:
```text
┌─avg(geoDistance(longitude_1, latitude_1, longitude_2, latitude_2))─┐
│ 1041090.6465708319 │
└────────────────────────────────────────────────────────────────────┘
```
### Most busy origin airports and the average distance seen:
### Most busy origin airports and the average distance seen {#busy-airports-average-distance}
```
Query:
```sql
SELECT
origin,
count(),
@ -118,10 +152,12 @@ FROM opensky
WHERE origin != ''
GROUP BY origin
ORDER BY count() DESC
LIMIT 100
LIMIT 100;
```
Query id: f9010ea5-97d0-45a3-a5bd-9657906cd105
Result:
```text
┌─origin─┬─count()─┬─distance─┬─bar────────────────────────────────────┐
1. │ KORD │ 745007 │ 1546108 │ ███████████████▍ │
2. │ KDFW │ 696702 │ 1358721 │ █████████████▌ │
@ -224,13 +260,13 @@ Query id: f9010ea5-97d0-45a3-a5bd-9657906cd105
99. │ EDDT │ 115122 │ 941740 │ █████████▍ │
100. │ EFHK │ 114860 │ 1629143 │ ████████████████▎ │
└────────┴─────────┴──────────┴────────────────────────────────────────┘
100 rows in set. Elapsed: 0.186 sec. Processed 48.31 million rows, 2.17 GB (259.27 million rows/s., 11.67 GB/s.)
```
### Number of flights from three major Moscow airports, weekly:
### Number of flights from three major Moscow airports, weekly {#flights-from-moscow}
```
Query:
```sql
SELECT
toMonday(day) AS k,
count() AS c,
@ -238,10 +274,12 @@ SELECT
FROM opensky
WHERE origin IN ('UUEE', 'UUDD', 'UUWW')
GROUP BY k
ORDER BY k ASC
ORDER BY k ASC;
```
Query id: 1b446157-9519-4cc4-a1cb-178dfcc15a8e
Result:
```text
┌──────────k─┬────c─┬─bar──────────────────────────────────────────────────────────────────────────┐
1. │ 2018-12-31 │ 5248 │ ████████████████████████████████████████████████████▍ │
2. │ 2019-01-07 │ 6302 │ ███████████████████████████████████████████████████████████████ │
@ -375,10 +413,8 @@ Query id: 1b446157-9519-4cc4-a1cb-178dfcc15a8e
130. │ 2021-06-21 │ 6061 │ ████████████████████████████████████████████████████████████▌ │
131. │ 2021-06-28 │ 2554 │ █████████████████████████▌ │
└────────────┴──────┴──────────────────────────────────────────────────────────────────────────────┘
131 rows in set. Elapsed: 0.014 sec. Processed 655.36 thousand rows, 11.14 MB (47.56 million rows/s., 808.48 MB/s.)
```
### Test it in Playground
### Online Playground {#playground}
The data is uploaded to ClickHouse Playground, [example](https://gh-api.clickhouse.tech/play?user=play#U0VMRUNUCiAgICBvcmlnaW4sCiAgICBjb3VudCgpLAogICAgcm91bmQoYXZnKGdlb0Rpc3RhbmNlKGxvbmdpdHVkZV8xLCBsYXRpdHVkZV8xLCBsb25naXR1ZGVfMiwgbGF0aXR1ZGVfMikpKSBBUyBkaXN0YW5jZSwKICAgIGJhcihkaXN0YW5jZSwgMCwgMTAwMDAwMDAsIDEwMCkgQVMgYmFyCkZST00gb3BlbnNreQpXSEVSRSBvcmlnaW4gIT0gJycKR1JPVVAgQlkgb3JpZ2luCk9SREVSIEJZIGNvdW50KCkgREVTQwpMSU1JVCAxMDA=).
You can test other queries to this data set using the interactive resource [Online Playground](https://gh-api.clickhouse.tech/play?user=play). For example, [like this](https://gh-api.clickhouse.tech/play?user=play#U0VMRUNUCiAgICBvcmlnaW4sCiAgICBjb3VudCgpLAogICAgcm91bmQoYXZnKGdlb0Rpc3RhbmNlKGxvbmdpdHVkZV8xLCBsYXRpdHVkZV8xLCBsb25naXR1ZGVfMiwgbGF0aXR1ZGVfMikpKSBBUyBkaXN0YW5jZSwKICAgIGJhcihkaXN0YW5jZSwgMCwgMTAwMDAwMDAsIDEwMCkgQVMgYmFyCkZST00gb3BlbnNreQpXSEVSRSBvcmlnaW4gIT0gJycKR1JPVVAgQlkgb3JpZ2luCk9SREVSIEJZIGNvdW50KCkgREVTQwpMSU1JVCAxMDA=). However, please note that you cannot create temporary tables here.


@ -3,27 +3,29 @@ toc_priority: 20
toc_title: UK Property Price Paid
---
# UK Property Price Paid
# UK Property Price Paid {#uk-property-price-paid}
The dataset contains data about prices paid for real-estate property in England and Wales. The data is available since 1995.
The size of the dataset in uncompressed form is about 4 GiB and it will take about 226 MiB in ClickHouse.
The size of the dataset in uncompressed form is about 4 GiB and it will take about 278 MiB in ClickHouse.
Source: https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads
Description of the fields: https://www.gov.uk/guidance/about-the-price-paid-data
Contains HM Land Registry data © Crown copyright and database right 2021. This data is licensed under the Open Government Licence v3.0.
## Download the Dataset
## Download the Dataset {#download-dataset}
```
Run the command:
```bash
wget http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-complete.csv
```
Download will take about 2 minutes with a good internet connection.
## Create the Table
## Create the Table {#create-table}
```
```sql
CREATE TABLE uk_price_paid
(
price UInt32,
@ -44,7 +46,7 @@ CREATE TABLE uk_price_paid
) ENGINE = MergeTree ORDER BY (postcode1, postcode2, addr1, addr2);
```
## Preprocess and Import Data
## Preprocess and Import Data {#preprocess-import-data}
We will use the `clickhouse-local` tool for data preprocessing and `clickhouse-client` to upload it.
@ -53,13 +55,13 @@ In this example, we define the structure of source data from the CSV file and sp
The preprocessing is:
- splitting the postcode into two different columns `postcode1` and `postcode2`, which is better for storage and queries;
- converting the `time` field to date, as it only contains 00:00 time;
- ignoring the `uuid` field because we don't need it for analysis;
- transforming `type` and `duration` to more readable Enum fields with function `transform`;
- transforming `is_new` and `category` fields from single-character string (`Y`/`N` and `A`/`B`) to UInt8 field with 0 and 1.
- ignoring the [UUID](../../sql-reference/data-types/uuid.md) field because we don't need it for analysis;
- transforming `type` and `duration` to more readable Enum fields with the function [transform](../../sql-reference/functions/other-functions.md#transform) (see the sketch below);
- transforming `is_new` and `category` fields from single-character string (`Y`/`N` and `A`/`B`) to [UInt8](../../sql-reference/data-types/int-uint.md#uint8-uint16-uint32-uint64-uint256-int8-int16-int32-int64-int128-int256) field with 0 and 1.
Preprocessed data is piped directly to `clickhouse-client` to be inserted into ClickHouse table in streaming fashion.
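For reference, a minimal sketch of the `transform` function mentioned above (the letter codes and labels here are illustrative, not necessarily the exact mapping used in the full command below):

```sql
-- transform(value, from_array, to_array, default) maps values elementwise:
SELECT transform('D',
                 ['D', 'S', 'T', 'F', 'O'],
                 ['detached', 'semi-detached', 'terraced', 'flat', 'other'],
                 'unknown') AS property_type;
-- returns 'detached'
```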
```
```bash
clickhouse-local --input-format CSV --structure '
uuid String,
price UInt32,
@ -100,103 +102,131 @@ clickhouse-local --input-format CSV --structure '
It will take about 40 seconds.
## Validate the Data
## Validate the Data {#validate-data}
```
SELECT count() FROM uk_price_paid
26248711
Query:
```sql
SELECT count() FROM uk_price_paid;
```
The size of dataset in ClickHouse is just 226 MiB:
Result:
```
SELECT formatReadableSize(total_bytes) FROM system.tables WHERE name = 'uk_price_paid'
226.40 MiB
```text
┌──count()─┐
│ 26321785 │
└──────────┘
```
## Run Some Queries
The size of the dataset in ClickHouse is just 278 MiB; check it with the following query.
### Average price per year:
Query:
```sql
SELECT formatReadableSize(total_bytes) FROM system.tables WHERE name = 'uk_price_paid';
```
SELECT toYear(date) AS year, round(avg(price)) AS price, bar(price, 0, 1000000, 80) FROM uk_price_paid GROUP BY year ORDER BY year
Result:
```text
┌─formatReadableSize(total_bytes)─┐
│ 278.80 MiB │
└─────────────────────────────────┘
```
## Run Some Queries {#run-queries}
### Query 1. Average Price Per Year {#average-price}
Query:
```sql
SELECT toYear(date) AS year, round(avg(price)) AS price, bar(price, 0, 1000000, 80) FROM uk_price_paid GROUP BY year ORDER BY year;
```
Result:
```text
┌─year─┬──price─┬─bar(round(avg(price)), 0, 1000000, 80)─┐
│ 1995 │ 67932 │ █████▍ │
│ 1996 │ 71505 │ █████▋ │
│ 1997 │ 78532 │ ██████▎ │
│ 1998 │ 85435 │ ██████▋ │
│ 1999 │ 96036 │ ███████▋ │
│ 2000 │ 107478 │ ████████▌ │
│ 2001 │ 118886 │ █████████▌ │
│ 2002 │ 137940 │ ███████████ │
│ 2003 │ 155888 │ ████████████▍ │
│ 1998 │ 85436 │ ██████▋ │
│ 1999 │ 96037 │ ███████▋ │
│ 2000 │ 107479 │ ████████▌ │
│ 2001 │ 118885 │ █████████▌ │
│ 2002 │ 137941 │ ███████████ │
│ 2003 │ 155889 │ ████████████▍ │
│ 2004 │ 178885 │ ██████████████▎ │
│ 2005 │ 189350 │ ███████████████▏ │
│ 2005 │ 189351 │ ███████████████▏ │
│ 2006 │ 203528 │ ████████████████▎ │
│ 2007 │ 219377 │ █████████████████▌ │
│ 2007 │ 219378 │ █████████████████▌ │
│ 2008 │ 217056 │ █████████████████▎ │
│ 2009 │ 213419 │ █████████████████ │
│ 2010 │ 236110 │ ██████████████████▊ │
│ 2011 │ 232804 │ ██████████████████▌ │
│ 2012 │ 238366 │ ███████████████████ │
│ 2010 │ 236109 │ ██████████████████▊ │
│ 2011 │ 232805 │ ██████████████████▌ │
│ 2012 │ 238367 │ ███████████████████ │
│ 2013 │ 256931 │ ████████████████████▌ │
│ 2014 │ 279917 │ ██████████████████████▍ │
│ 2015 │ 297264 │ ███████████████████████▋ │
│ 2016 │ 313197 │ █████████████████████████ │
│ 2017 │ 346070 │ ███████████████████████████▋ │
│ 2018 │ 350117 │ ████████████████████████████ │
│ 2019 │ 351010 │ ████████████████████████████ │
│ 2020 │ 368974 │ █████████████████████████████▌ │
│ 2021 │ 384351 │ ██████████████████████████████▋
│ 2014 │ 279915 │ ██████████████████████▍ │
│ 2015 │ 297266 │ ███████████████████████▋ │
│ 2016 │ 313201 │ █████████████████████████ │
│ 2017 │ 346097 │ ███████████████████████████▋ │
│ 2018 │ 350116 │ ████████████████████████████ │
│ 2019 │ 351013 │ ████████████████████████████ │
│ 2020 │ 369420 │ █████████████████████████████▌ │
│ 2021 │ 386903 │ ██████████████████████████████▊
└──────┴────────┴────────────────────────────────────────┘
27 rows in set. Elapsed: 0.027 sec. Processed 26.25 million rows, 157.49 MB (955.96 million rows/s., 5.74 GB/s.)
```
### Average price per year in London:
### Query 2. Average Price per Year in London {#average-price-london}
Query:
```sql
SELECT toYear(date) AS year, round(avg(price)) AS price, bar(price, 0, 2000000, 100) FROM uk_price_paid WHERE town = 'LONDON' GROUP BY year ORDER BY year;
```
SELECT toYear(date) AS year, round(avg(price)) AS price, bar(price, 0, 2000000, 100) FROM uk_price_paid WHERE town = 'LONDON' GROUP BY year ORDER BY year
Result:
```text
┌─year─┬───price─┬─bar(round(avg(price)), 0, 2000000, 100)───────────────┐
│ 1995 │ 109112 │ █████▍ │
│ 1995 │ 109116 │ █████▍ │
│ 1996 │ 118667 │ █████▊ │
│ 1997 │ 136518 │ ██████▋ │
│ 1998 │ 152983 │ ███████▋ │
│ 1999 │ 180633 │ █████████ │
│ 2000 │ 215830 │ ██████████▋ │
│ 2001 │ 232996 │ ███████████▋ │
│ 2002 │ 263672 │ █████████████▏ │
│ 1999 │ 180637 │ █████████ │
│ 2000 │ 215838 │ ██████████▋ │
│ 2001 │ 232994 │ ███████████▋ │
│ 2002 │ 263670 │ █████████████▏ │
│ 2003 │ 278394 │ █████████████▊ │
│ 2004 │ 304665 │ ███████████████▏ │
│ 2004 │ 304666 │ ███████████████▏ │
│ 2005 │ 322875 │ ████████████████▏ │
│ 2006 │ 356192 │ █████████████████▋ │
│ 2007 │ 404055 │ ████████████████████▏ │
│ 2006 │ 356191 │ █████████████████▋ │
│ 2007 │ 404054 │ ████████████████████▏ │
│ 2008 │ 420741 │ █████████████████████ │
│ 2009 │ 427754 │ █████████████████████▍ │
│ 2009 │ 427753 │ █████████████████████▍ │
│ 2010 │ 480306 │ ████████████████████████ │
│ 2011 │ 496274 │ ████████████████████████▋ │
│ 2012 │ 519441 │ █████████████████████████▊ │
│ 2013 │ 616209 │ ██████████████████████████████▋ │
│ 2014 │ 724144 │ ████████████████████████████████████▏ │
│ 2015 │ 792112 │ ███████████████████████████████████████▌ │
│ 2016 │ 843568 │ ██████████████████████████████████████████▏ │
│ 2017 │ 982566 │ █████████████████████████████████████████████████▏ │
│ 2018 │ 1016845 │ ██████████████████████████████████████████████████▋ │
│ 2019 │ 1043277 │ ████████████████████████████████████████████████████▏ │
│ 2020 │ 1003963 │ ██████████████████████████████████████████████████
│ 2021 │ 940794 │ ███████████████████████████████████████████████
│ 2012 │ 519442 │ █████████████████████████▊ │
│ 2013 │ 616212 │ ██████████████████████████████▋ │
│ 2014 │ 724154 │ ████████████████████████████████████▏ │
│ 2015 │ 792129 │ ███████████████████████████████████████▌ │
│ 2016 │ 843655 │ ██████████████████████████████████████████▏ │
│ 2017 │ 982642 │ █████████████████████████████████████████████████▏ │
│ 2018 │ 1016835 │ ██████████████████████████████████████████████████▋ │
│ 2019 │ 1042849 │ ████████████████████████████████████████████████████▏ │
│ 2020 │ 1011889 │ ██████████████████████████████████████████████████
│ 2021 │ 960343 │ ███████████████████████████████████████████████
└──────┴─────────┴───────────────────────────────────────────────────────┘
27 rows in set. Elapsed: 0.024 sec. Processed 26.25 million rows, 76.88 MB (1.08 billion rows/s., 3.15 GB/s.)
```
Something happened in 2013. I don't have a clue. Maybe you have a clue what happened in 2020?
### The most expensive neighborhoods:
### Query 3. The Most Expensive Neighborhoods {#most-expensive-neighborhoods}
```
Query:
```sql
SELECT
town,
district,
@ -210,127 +240,126 @@ GROUP BY
district
HAVING c >= 100
ORDER BY price DESC
LIMIT 100
LIMIT 100;
```
Result:
```text
┌─town─────────────────┬─district───────────────┬────c─┬───price─┬─bar(round(avg(price)), 0, 5000000, 100)────────────────────────────┐
│ LONDON │ CITY OF WESTMINSTER │ 3372 │ 3305225 │ ██████████████████████████████████████████████████████████████████ │
│ LONDON │ CITY OF LONDON │ 257 │ 3294478 │ █████████████████████████████████████████████████████████████████▊ │
│ LONDON │ KENSINGTON AND CHELSEA │ 2367 │ 2342422 │ ██████████████████████████████████████████████▋ │
│ LEATHERHEAD │ ELMBRIDGE │ 108 │ 1927143 │ ██████████████████████████████████████▌
VIRGINIA WATER │ RUNNYMEDE │ 142 │ 1868819 │ █████████████████████████████████████▍
LONDON │ CAMDEN │ 2815 │ 1736788 │ ██████████████████████████████████
THORNTON HEATH │ CROYDON │ 521 │ 1733051 │ ██████████████████████████████████▋
WINDLESHAM │ SURREY HEATH │ 103 │ 1717255 │ ██████████████████████████████████▎
│ BARNET │ ENFIELD │ 115 │ 1503458 │ ██████████████████████████████ │
OXFORD │ SOUTH OXFORDSHIRE │ 298 │ 1275200 │ █████████████████████████▌
│ LONDON │ ISLINGTON │ 2458 │ 1274308 │ █████████████████████████▍
COBHAM │ ELMBRIDGE │ 364 │ 1260005 │ █████████████████████████▏
│ LONDON               │ HOUNSLOW               │  618 │ 1215682 │ ████████████████████████▎ │
│ ASCOT                │ WINDSOR AND MAIDENHEAD │  379 │ 1215146 │ ████████████████████████▎ │
│ LONDON               │ RICHMOND UPON THAMES   │  654 │ 1207551 │ ████████████████████████▏ │
│ BEACONSFIELD         │ BUCKINGHAMSHIRE        │  307 │ 1186220 │ ███████████████████████▋ │
│ RICHMOND             │ RICHMOND UPON THAMES   │  805 │ 1100420 │ ██████████████████████ │
│ LONDON               │ HAMMERSMITH AND FULHAM │ 2888 │ 1062959 │ █████████████████████▎ │
│ WEYBRIDGE            │ ELMBRIDGE              │  607 │ 1027161 │ ████████████████████▌ │
│ RADLETT              │ HERTSMERE              │  265 │ 1015896 │ ████████████████████▎ │
│ SALCOMBE             │ SOUTH HAMS             │  124 │ 1014393 │ ████████████████████▎ │
│ BURFORD              │ WEST OXFORDSHIRE       │  102 │  993100 │ ███████████████████▋ │
│ ESHER                │ ELMBRIDGE              │  454 │  969770 │ ███████████████████▍ │
│ HINDHEAD             │ WAVERLEY               │  128 │  967786 │ ███████████████████▎ │
│ BROCKENHURST         │ NEW FOREST             │  121 │  967046 │ ███████████████████▎ │
│ LEATHERHEAD          │ GUILDFORD              │  191 │  964489 │ ███████████████████▎ │
│ GERRARDS CROSS       │ BUCKINGHAMSHIRE        │  376 │  958555 │ ███████████████████▏ │
│ EAST MOLESEY         │ ELMBRIDGE              │  181 │  943457 │ ██████████████████▋ │
│ OLNEY                │ MILTON KEYNES          │  220 │  942892 │ ██████████████████▋ │
│ CHALFONT ST GILES    │ BUCKINGHAMSHIRE        │  135 │  926950 │ ██████████████████▌ │
│ HENLEY-ON-THAMES     │ SOUTH OXFORDSHIRE      │  509 │  905732 │ ██████████████████ │
│ KINGSTON UPON THAMES │ KINGSTON UPON THAMES   │  889 │  899689 │ █████████████████▊ │
│ BELVEDERE            │ BEXLEY                 │  313 │  895336 │ █████████████████▊ │
│ CRANBROOK            │ TUNBRIDGE WELLS        │  404 │  888190 │ █████████████████▋ │
│ LONDON               │ EALING                 │ 2460 │  865893 │ █████████████████▎ │
│ MAIDENHEAD           │ BUCKINGHAMSHIRE        │  114 │  863814 │ █████████████████▎ │
│ LONDON               │ MERTON                 │ 1958 │  857192 │ █████████████████▏ │
│ GUILDFORD            │ WAVERLEY               │  131 │  854447 │ █████████████████ │
│ LONDON               │ HACKNEY                │ 3088 │  846571 │ ████████████████▊ │
│ LYMM                 │ WARRINGTON             │  285 │  839920 │ ████████████████▋ │
│ HARPENDEN            │ ST ALBANS              │  606 │  836994 │ ████████████████▋ │
│ LONDON               │ WANDSWORTH             │ 6113 │  832292 │ ████████████████▋ │
│ LONDON               │ SOUTHWARK              │ 3612 │  831319 │ ████████████████▋ │
│ BERKHAMSTED          │ DACORUM                │  502 │  830356 │ ████████████████▌ │
│ KINGS LANGLEY        │ DACORUM                │  137 │  821358 │ ████████████████▍ │
│ TONBRIDGE            │ TUNBRIDGE WELLS        │  339 │  806736 │ ████████████████▏ │
│ EPSOM                │ REIGATE AND BANSTEAD   │  157 │  805903 │ ████████████████ │
│ WOKING               │ GUILDFORD              │  161 │  803283 │ ████████████████ │
│ STOCKBRIDGE          │ TEST VALLEY            │  168 │  801973 │ ████████████████ │
│ TEDDINGTON           │ RICHMOND UPON THAMES   │  539 │  798591 │ ███████████████▊ │
│ OXFORD               │ VALE OF WHITE HORSE    │  329 │  792907 │ ███████████████▋ │
│ LONDON               │ BARNET                 │ 3624 │  789583 │ ███████████████▋ │
│ TWICKENHAM           │ RICHMOND UPON THAMES   │ 1090 │  787760 │ ███████████████▋ │
│ LUTON                │ CENTRAL BEDFORDSHIRE   │  196 │  786051 │ ███████████████▋ │
│ TONBRIDGE            │ MAIDSTONE              │  277 │  785746 │ ███████████████▋ │
│ TOWCESTER            │ WEST NORTHAMPTONSHIRE  │  186 │  783532 │ ███████████████▋ │
│ LONDON               │ LAMBETH                │ 4832 │  783422 │ ███████████████▋ │
│ LUTTERWORTH          │ HARBOROUGH             │  515 │  781775 │ ███████████████▋ │
│ WOODSTOCK            │ WEST OXFORDSHIRE       │  135 │  777499 │ ███████████████▌ │
│ ALRESFORD            │ WINCHESTER             │  196 │  775577 │ ███████████████▌ │
│ LONDON               │ NEWHAM                 │ 2942 │  768551 │ ███████████████▎ │
│ ALDERLEY EDGE        │ CHESHIRE EAST          │  168 │  768280 │ ███████████████▎ │
│ MARLOW               │ BUCKINGHAMSHIRE        │  301 │  762784 │ ███████████████▎ │
│ BILLINGSHURST        │ CHICHESTER             │  134 │  760920 │ ███████████████▏ │
│ LONDON               │ TOWER HAMLETS          │ 4183 │  759635 │ ███████████████▏ │
│ MIDHURST             │ CHICHESTER             │  245 │  759101 │ ███████████████▏ │
│ THAMES DITTON        │ ELMBRIDGE              │  227 │  753347 │ ███████████████ │
│ POTTERS BAR          │ WELWYN HATFIELD        │  163 │  752926 │ ███████████████ │
│ REIGATE              │ REIGATE AND BANSTEAD   │  555 │  740961 │ ██████████████▋ │
│ TADWORTH             │ REIGATE AND BANSTEAD   │  477 │  738997 │ ██████████████▋ │
│ SEVENOAKS            │ SEVENOAKS              │ 1074 │  734658 │ ██████████████▋ │
│ PETWORTH             │ CHICHESTER             │  138 │  732432 │ ██████████████▋ │
│ BOURNE END           │ BUCKINGHAMSHIRE        │  127 │  730742 │ ██████████████▌ │
│ PURLEY               │ CROYDON                │  540 │  727721 │ ██████████████▌ │
│ OXTED                │ TANDRIDGE              │  320 │  726078 │ ██████████████▌ │
│ LONDON               │ HARINGEY               │ 2988 │  724573 │ ██████████████▍ │
│ BANSTEAD             │ REIGATE AND BANSTEAD   │  373 │  713834 │ ██████████████▎ │
│ PINNER               │ HARROW                 │  480 │  712166 │ ██████████████▏ │
│ MALMESBURY           │ WILTSHIRE              │  293 │  707747 │ ██████████████▏ │
│ RICKMANSWORTH        │ THREE RIVERS           │  732 │  705400 │ ██████████████ │
│ SLOUGH               │ BUCKINGHAMSHIRE        │  359 │  705002 │ ██████████████ │
│ GREAT MISSENDEN      │ BUCKINGHAMSHIRE        │  214 │  704904 │ ██████████████ │
│ READING              │ SOUTH OXFORDSHIRE      │  295 │  701697 │ ██████████████ │
│ HYTHE                │ FOLKESTONE AND HYTHE   │  457 │  700334 │ ██████████████ │
│ WELWYN               │ WELWYN HATFIELD        │  217 │  699649 │ █████████████▊ │
│ CHIGWELL             │ EPPING FOREST          │  242 │  697869 │ █████████████▊ │
│ BARNET               │ BARNET                 │  906 │  695680 │ █████████████▊ │
│ HASLEMERE            │ CHICHESTER             │  120 │  694028 │ █████████████▊ │
│ LEATHERHEAD          │ MOLE VALLEY            │  748 │  692026 │ █████████████▋ │
│ LONDON               │ BRENT                  │ 1945 │  690799 │ █████████████▋ │
│ HASLEMERE            │ WAVERLEY               │  258 │  690765 │ █████████████▋ │
│ NORTHWOOD            │ HILLINGDON             │  252 │  690753 │ █████████████▋ │
│ WALTON-ON-THAMES     │ ELMBRIDGE              │  871 │  689431 │ █████████████▋ │
│ INGATESTONE          │ BRENTWOOD              │  150 │  688345 │ █████████████▋ │
│ OXFORD               │ OXFORD                 │ 1761 │  686114 │ █████████████▋ │
│ CHISLEHURST          │ BROMLEY                │  410 │  682892 │ █████████████▋ │
│ KINGS LANGLEY        │ THREE RIVERS           │  109 │  682320 │ █████████████▋ │
│ ASHTEAD              │ MOLE VALLEY            │  280 │  680483 │ █████████████▌ │
│ WOKING               │ SURREY HEATH           │  269 │  679035 │ █████████████▌ │
│ ASCOT                │ BRACKNELL FOREST       │  160 │  678632 │ █████████████▌ │
│ LONDON               │ CITY OF WESTMINSTER    │ 3606 │ 3280239 │ █████████████████████████████████████████████████████████████████▌ │
│ LONDON               │ CITY OF LONDON         │  274 │ 3160502 │ ███████████████████████████████████████████████████████████████▏ │
│ LONDON               │ KENSINGTON AND CHELSEA │ 2550 │ 2308478 │ ██████████████████████████████████████████████▏ │
│ LEATHERHEAD          │ ELMBRIDGE              │  114 │ 1897407 │ █████████████████████████████████████▊ │
│ LONDON               │ CAMDEN                 │ 3033 │ 1805404 │ ████████████████████████████████████ │
│ VIRGINIA WATER       │ RUNNYMEDE              │  156 │ 1753247 │ ███████████████████████████████████ │
│ WINDLESHAM           │ SURREY HEATH           │  108 │ 1677613 │ █████████████████████████████████▌ │
│ THORNTON HEATH       │ CROYDON                │  546 │ 1671721 │ █████████████████████████████████▍ │
│ BARNET               │ ENFIELD                │  124 │ 1505840 │ ██████████████████████████████ │
│ COBHAM               │ ELMBRIDGE              │  387 │ 1237250 │ ████████████████████████▋ │
│ LONDON               │ ISLINGTON              │ 2668 │ 1236980 │ ████████████████████████▋ │
│ OXFORD               │ SOUTH OXFORDSHIRE      │  321 │ 1220907 │ ████████████████████████▍ │
│ LONDON               │ RICHMOND UPON THAMES   │  704 │ 1215551 │ ████████████████████████▎ │
│ LONDON               │ HOUNSLOW               │  671 │ 1207493 │ ████████████████████████▏ │
│ ASCOT                │ WINDSOR AND MAIDENHEAD │  407 │ 1183299 │ ███████████████████████▋ │
│ BEACONSFIELD         │ BUCKINGHAMSHIRE        │  330 │ 1175615 │ ███████████████████████▌ │
│ RICHMOND             │ RICHMOND UPON THAMES   │  874 │ 1110444 │ ██████████████████████▏ │
│ LONDON               │ HAMMERSMITH AND FULHAM │ 3086 │ 1053983 │ █████████████████████ │
│ SURBITON             │ ELMBRIDGE              │  100 │ 1011800 │ ████████████████████▏ │
│ RADLETT              │ HERTSMERE              │  283 │ 1011712 │ ████████████████████▏ │
│ SALCOMBE             │ SOUTH HAMS             │  127 │ 1011624 │ ████████████████████▏ │
│ WEYBRIDGE            │ ELMBRIDGE              │  655 │ 1007265 │ ████████████████████▏ │
│ ESHER                │ ELMBRIDGE              │  485 │  986581 │ ███████████████████▋ │
│ LEATHERHEAD          │ GUILDFORD              │  202 │  977320 │ ███████████████████▌ │
│ BURFORD              │ WEST OXFORDSHIRE       │  111 │  966893 │ ███████████████████▎ │
│ BROCKENHURST         │ NEW FOREST             │  129 │  956675 │ ███████████████████▏ │
│ HINDHEAD             │ WAVERLEY               │  137 │  953753 │ ███████████████████ │
│ GERRARDS CROSS       │ BUCKINGHAMSHIRE        │  419 │  951121 │ ███████████████████ │
│ EAST MOLESEY         │ ELMBRIDGE              │  192 │  936769 │ ██████████████████▋ │
│ CHALFONT ST GILES    │ BUCKINGHAMSHIRE        │  146 │  925515 │ ██████████████████▌ │
│ LONDON               │ TOWER HAMLETS          │ 4388 │  918304 │ ██████████████████▎ │
│ OLNEY                │ MILTON KEYNES          │  235 │  910646 │ ██████████████████▏ │
│ HENLEY-ON-THAMES     │ SOUTH OXFORDSHIRE      │  540 │  902418 │ ██████████████████ │
│ LONDON               │ SOUTHWARK              │ 3885 │  892997 │ █████████████████▋ │
│ KINGSTON UPON THAMES │ KINGSTON UPON THAMES   │  960 │  885969 │ █████████████████▋ │
│ LONDON               │ EALING                 │ 2658 │  871755 │ █████████████████▍ │
│ CRANBROOK            │ TUNBRIDGE WELLS        │  431 │  862348 │ █████████████████▏ │
│ LONDON               │ MERTON                 │ 2099 │  859118 │ █████████████████▏ │
│ BELVEDERE            │ BEXLEY                 │  346 │  842423 │ ████████████████▋ │
│ GUILDFORD            │ WAVERLEY               │  143 │  841277 │ ████████████████▋ │
│ HARPENDEN            │ ST ALBANS              │  657 │  841216 │ ████████████████▋ │
│ LONDON               │ HACKNEY                │ 3307 │  837090 │ ████████████████▋ │
│ LONDON               │ WANDSWORTH             │ 6566 │  832663 │ ████████████████▋ │
│ MAIDENHEAD           │ BUCKINGHAMSHIRE        │  123 │  824299 │ ████████████████▍ │
│ KINGS LANGLEY        │ DACORUM                │  145 │  821331 │ ████████████████▍ │
│ BERKHAMSTED          │ DACORUM                │  543 │  818415 │ ████████████████▎ │
│ GREAT MISSENDEN      │ BUCKINGHAMSHIRE        │  226 │  802807 │ ████████████████ │
│ BILLINGSHURST        │ CHICHESTER             │  144 │  797829 │ ███████████████▊ │
│ WOKING               │ GUILDFORD              │  176 │  793494 │ ███████████████▋ │
│ STOCKBRIDGE          │ TEST VALLEY            │  178 │  793269 │ ███████████████▋ │
│ EPSOM                │ REIGATE AND BANSTEAD   │  172 │  791862 │ ███████████████▋ │
│ TONBRIDGE            │ TUNBRIDGE WELLS        │  360 │  787876 │ ███████████████▋ │
│ TEDDINGTON           │ RICHMOND UPON THAMES   │  595 │  786492 │ ███████████████▋ │
│ TWICKENHAM           │ RICHMOND UPON THAMES   │ 1155 │  786193 │ ███████████████▋ │
│ LYNDHURST            │ NEW FOREST             │  102 │  785593 │ ███████████████▋ │
│ LONDON               │ LAMBETH                │ 5228 │  774574 │ ███████████████▍ │
│ LONDON               │ BARNET                 │ 3955 │  773259 │ ███████████████▍ │
│ OXFORD               │ VALE OF WHITE HORSE    │  353 │  772088 │ ███████████████▍ │
│ TONBRIDGE            │ MAIDSTONE              │  305 │  770740 │ ███████████████▍ │
│ LUTTERWORTH          │ HARBOROUGH             │  538 │  768634 │ ███████████████▎ │
│ WOODSTOCK            │ WEST OXFORDSHIRE       │  140 │  766037 │ ███████████████▎ │
│ MIDHURST             │ CHICHESTER             │  257 │  764815 │ ███████████████▎ │
│ MARLOW               │ BUCKINGHAMSHIRE        │  327 │  761876 │ ███████████████▏ │
│ LONDON               │ NEWHAM                 │ 3237 │  761784 │ ███████████████▏ │
│ ALDERLEY EDGE        │ CHESHIRE EAST          │  178 │  757318 │ ███████████████▏ │
│ LUTON                │ CENTRAL BEDFORDSHIRE   │  212 │  754283 │ ███████████████ │
│ PETWORTH             │ CHICHESTER             │  154 │  754220 │ ███████████████ │
│ ALRESFORD            │ WINCHESTER             │  219 │  752718 │ ███████████████ │
│ POTTERS BAR          │ WELWYN HATFIELD        │  174 │  748465 │ ██████████████▊ │
│ HASLEMERE            │ CHICHESTER             │  128 │  746907 │ ██████████████▊ │
│ TADWORTH             │ REIGATE AND BANSTEAD   │  502 │  743252 │ ██████████████▋ │
│ THAMES DITTON        │ ELMBRIDGE              │  244 │  741913 │ ██████████████▋ │
│ REIGATE              │ REIGATE AND BANSTEAD   │  581 │  738198 │ ██████████████▋ │
│ BOURNE END           │ BUCKINGHAMSHIRE        │  138 │  735190 │ ██████████████▋ │
│ SEVENOAKS            │ SEVENOAKS              │ 1156 │  730018 │ ██████████████▌ │
│ OXTED                │ TANDRIDGE              │  336 │  729123 │ ██████████████▌ │
│ INGATESTONE          │ BRENTWOOD              │  166 │  728103 │ ██████████████▌ │
│ LONDON               │ BRENT                  │ 2079 │  720605 │ ██████████████▍ │
│ LONDON               │ HARINGEY               │ 3216 │  717780 │ ██████████████▎ │
│ PURLEY               │ CROYDON                │  575 │  716108 │ ██████████████▎ │
│ WELWYN               │ WELWYN HATFIELD        │  222 │  710603 │ ██████████████▏ │
│ RICKMANSWORTH        │ THREE RIVERS           │  798 │  704571 │ ██████████████ │
│ BANSTEAD             │ REIGATE AND BANSTEAD   │  401 │  701293 │ ██████████████ │
│ CHIGWELL             │ EPPING FOREST          │  261 │  701203 │ ██████████████ │
│ PINNER               │ HARROW                 │  528 │  698885 │ █████████████▊ │
│ HASLEMERE            │ WAVERLEY               │  280 │  696659 │ █████████████▊ │
│ SLOUGH               │ BUCKINGHAMSHIRE        │  396 │  694917 │ █████████████▊ │
│ WALTON-ON-THAMES     │ ELMBRIDGE              │  946 │  692395 │ █████████████▋ │
│ READING              │ SOUTH OXFORDSHIRE      │  318 │  691988 │ █████████████▋ │
│ NORTHWOOD            │ HILLINGDON             │  271 │  690643 │ █████████████▋ │
│ FELTHAM              │ HOUNSLOW               │  763 │  688595 │ █████████████▋ │
│ ASHTEAD              │ MOLE VALLEY            │  303 │  687923 │ █████████████▋ │
│ BARNET               │ BARNET                 │  975 │  686980 │ █████████████▋ │
│ WOKING               │ SURREY HEATH           │  283 │  686669 │ █████████████▋ │
│ MALMESBURY           │ WILTSHIRE              │  323 │  683324 │ █████████████▋ │
│ AMERSHAM             │ BUCKINGHAMSHIRE        │  496 │  680962 │ █████████████▌ │
│ CHISLEHURST          │ BROMLEY                │  430 │  680209 │ █████████████▌ │
│ HYTHE                │ FOLKESTONE AND HYTHE   │  490 │  676908 │ █████████████▌ │
│ MAYFIELD             │ WEALDEN                │  101 │  676210 │ █████████████▌ │
│ ASCOT                │ BRACKNELL FOREST       │  168 │  676004 │ █████████████▌ │
└──────────────────────┴────────────────────────┴──────┴─────────┴────────────────────────────────────────────────────────────────────┘
100 rows in set. Elapsed: 0.039 sec. Processed 26.25 million rows, 278.03 MB (674.32 million rows/s., 7.14 GB/s.)
```
### Test it in Playground
## Let's Speed Up Queries Using Projections {#speedup-with-projections}
The data is uploaded to ClickHouse Playground, [example](https://gh-api.clickhouse.tech/play?user=play#U0VMRUNUIHRvd24sIGRpc3RyaWN0LCBjb3VudCgpIEFTIGMsIHJvdW5kKGF2ZyhwcmljZSkpIEFTIHByaWNlLCBiYXIocHJpY2UsIDAsIDUwMDAwMDAsIDEwMCkgRlJPTSB1a19wcmljZV9wYWlkIFdIRVJFIGRhdGUgPj0gJzIwMjAtMDEtMDEnIEdST1VQIEJZIHRvd24sIGRpc3RyaWN0IEhBVklORyBjID49IDEwMCBPUkRFUiBCWSBwcmljZSBERVNDIExJTUlUIDEwMA==).
[Projections](../../sql-reference/statements/alter/projection.md) allow you to improve query speed by storing pre-aggregated data.
## Let's speed up queries using projections
### Build a Projection {#build-projection}
[Projections](../../sql-reference/statements/alter/projection/) allow you to improve query speed by storing pre-aggregated data.
### Build a projection
```
-- create an aggregate projection by dimensions (toYear(date), district, town)
Create an aggregate projection by dimensions `toYear(date)`, `district`, `town`:
```sql
ALTER TABLE uk_price_paid
ADD PROJECTION projection_by_year_district_town
(
@ -346,25 +375,31 @@ ALTER TABLE uk_price_paid
district,
town
);
```
-- populate the projection for existing data (without it, the projection will be
-- created only for newly inserted data)
Populate the projection for existing data (without it, the projection will be created only for newly inserted data):
```sql
ALTER TABLE uk_price_paid
MATERIALIZE PROJECTION projection_by_year_district_town
SETTINGS mutations_sync = 1;
```
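With `mutations_sync = 1` the statement waits for the materialization to finish. If you drop that setting, the projection is materialized in the background, and you can watch its progress in the `system.mutations` table. A minimal sketch, assuming the `uk_price_paid` table from this guide:
```sql
SELECT command, is_done
FROM system.mutations
WHERE database = currentDatabase() AND table = 'uk_price_paid';
```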
## Test performance
## Test Performance {#test-performance}
Let's run the same 3 queries.
[Enable](../../operations/settings/settings.md#allow-experimental-projection-optimization) projections for selects:
```sql
SET allow_experimental_projection_optimization = 1;
```
-- enable projections for selects
set allow_experimental_projection_optimization=1;
-- Q1) Average price per year:
### Query 1. Average Price Per Year {#average-price-projections}
Query:
```sql
SELECT
toYear(date) AS year,
round(avg(price)) AS price,
@ -372,41 +407,47 @@ SELECT
FROM uk_price_paid
GROUP BY year
ORDER BY year ASC;
```
Result:
```text
┌─year─┬──price─┬─bar(round(avg(price)), 0, 1000000, 80)─┐
│ 1995 │ 67932 │ █████▍ │
│ 1996 │ 71505 │ █████▋ │
│ 1997 │ 78532 │ ██████▎ │
│ 1998 │ 85435 │ ██████▋ │
│ 1999 │ 96036 │ ███████▋ │
│ 2000 │ 107478 │ ████████▌ │
│ 2001 │ 118886 │ █████████▌ │
│ 2002 │ 137940 │ ███████████ │
│ 2003 │ 155888 │ ████████████▍ │
│ 1998 │ 85436 │ ██████▋ │
│ 1999 │ 96037 │ ███████▋ │
│ 2000 │ 107479 │ ████████▌ │
│ 2001 │ 118885 │ █████████▌ │
│ 2002 │ 137941 │ ███████████ │
│ 2003 │ 155889 │ ████████████▍ │
│ 2004 │ 178885 │ ██████████████▎ │
│ 2005 │ 189350 │ ███████████████▏ │
│ 2005 │ 189351 │ ███████████████▏ │
│ 2006 │ 203528 │ ████████████████▎ │
│ 2007 │ 219377 │ █████████████████▌ │
│ 2007 │ 219378 │ █████████████████▌ │
│ 2008 │ 217056 │ █████████████████▎ │
│ 2009 │ 213419 │ █████████████████ │
│ 2010 │ 236110 │ ██████████████████▊ │
│ 2011 │ 232804 │ ██████████████████▌ │
│ 2012 │ 238366 │ ███████████████████ │
│ 2010 │ 236109 │ ██████████████████▊ │
│ 2011 │ 232805 │ ██████████████████▌ │
│ 2012 │ 238367 │ ███████████████████ │
│ 2013 │ 256931 │ ████████████████████▌ │
│ 2014 │ 279917 │ ██████████████████████▍ │
│ 2015 │ 297264 │ ███████████████████████▋ │
│ 2016 │ 313197 │ █████████████████████████ │
│ 2017 │ 346070 │ ███████████████████████████▋ │
│ 2018 │ 350117 │ ████████████████████████████ │
│ 2019 │ 351010 │ ████████████████████████████ │
│ 2020 │ 368974 │ █████████████████████████████▌ │
│ 2021 │ 384351 │ ██████████████████████████████▋        │
│ 2014 │ 279915 │ ██████████████████████▍ │
│ 2015 │ 297266 │ ███████████████████████▋ │
│ 2016 │ 313201 │ █████████████████████████ │
│ 2017 │ 346097 │ ███████████████████████████▋ │
│ 2018 │ 350116 │ ████████████████████████████ │
│ 2019 │ 351013 │ ████████████████████████████ │
│ 2020 │ 369420 │ █████████████████████████████▌ │
│ 2021 │ 386903 │ ██████████████████████████████▊        │
└──────┴────────┴────────────────────────────────────────┘
```
27 rows in set. Elapsed: 0.003 sec. Processed 106.87 thousand rows, 3.21 MB (31.92 million rows/s., 959.03 MB/s.)
### Query 2. Average Price Per Year in London {#average-price-london-projections}
-- Q2) Average price per year in London:
Query:
```sql
SELECT
toYear(date) AS year,
round(avg(price)) AS price,
@ -415,42 +456,49 @@ FROM uk_price_paid
WHERE town = 'LONDON'
GROUP BY year
ORDER BY year ASC;
```
Result:
```text
┌─year─┬───price─┬─bar(round(avg(price)), 0, 2000000, 100)───────────────┐
│ 1995 │ 109112 │ █████▍ │
│ 1995 │ 109116 │ █████▍ │
│ 1996 │ 118667 │ █████▊ │
│ 1997 │ 136518 │ ██████▋ │
│ 1998 │ 152983 │ ███████▋ │
│ 1999 │ 180633 │ █████████ │
│ 2000 │ 215830 │ ██████████▋ │
│ 2001 │ 232996 │ ███████████▋ │
│ 2002 │ 263672 │ █████████████▏ │
│ 1999 │ 180637 │ █████████ │
│ 2000 │ 215838 │ ██████████▋ │
│ 2001 │ 232994 │ ███████████▋ │
│ 2002 │ 263670 │ █████████████▏ │
│ 2003 │ 278394 │ █████████████▊ │
│ 2004 │ 304665 │ ███████████████▏ │
│ 2004 │ 304666 │ ███████████████▏ │
│ 2005 │ 322875 │ ████████████████▏ │
│ 2006 │ 356192 │ █████████████████▋ │
│ 2007 │ 404055 │ ████████████████████▏ │
│ 2006 │ 356191 │ █████████████████▋ │
│ 2007 │ 404054 │ ████████████████████▏ │
│ 2008 │ 420741 │ █████████████████████ │
│ 2009 │ 427754 │ █████████████████████▍ │
│ 2009 │ 427753 │ █████████████████████▍ │
│ 2010 │ 480306 │ ████████████████████████ │
│ 2011 │ 496274 │ ████████████████████████▋ │
│ 2012 │ 519441 │ █████████████████████████▊ │
│ 2013 │ 616209 │ ██████████████████████████████▋ │
│ 2014 │ 724144 │ ████████████████████████████████████▏ │
│ 2015 │ 792112 │ ███████████████████████████████████████▌ │
│ 2016 │ 843568 │ ██████████████████████████████████████████▏ │
│ 2017 │ 982566 │ █████████████████████████████████████████████████▏ │
│ 2018 │ 1016845 │ ██████████████████████████████████████████████████▋ │
│ 2019 │ 1043277 │ ████████████████████████████████████████████████████▏ │
│ 2020 │ 1003963 │ ██████████████████████████████████████████████████    │
│ 2021 │  940794 │ ███████████████████████████████████████████████       │
│ 2012 │ 519442 │ █████████████████████████▊ │
│ 2013 │ 616212 │ ██████████████████████████████▋ │
│ 2014 │ 724154 │ ████████████████████████████████████▏ │
│ 2015 │ 792129 │ ███████████████████████████████████████▌ │
│ 2016 │ 843655 │ ██████████████████████████████████████████▏ │
│ 2017 │ 982642 │ █████████████████████████████████████████████████▏ │
│ 2018 │ 1016835 │ ██████████████████████████████████████████████████▋ │
│ 2019 │ 1042849 │ ████████████████████████████████████████████████████▏ │
│ 2020 │ 1011889 │ ██████████████████████████████████████████████████    │
│ 2021 │  960343 │ ████████████████████████████████████████████████      │
└──────┴─────────┴───────────────────────────────────────────────────────┘
```
27 rows in set. Elapsed: 0.005 sec. Processed 106.87 thousand rows, 3.53 MB (23.49 million rows/s., 775.95 MB/s.)
### Query 3. The Most Expensive Neighborhoods {#most-expensive-neighborhoods-projections}
-- Q3) The most expensive neighborhoods:
-- the condition (date >= '2020-01-01') needs to be modified to match projection dimension (toYear(date) >= 2020)
The condition `date >= '2020-01-01'` needs to be modified to match the projection dimension `toYear(date) >= 2020`; the sketch below shows the rewrite.
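A quick sketch of the rewrite, assuming the `uk_price_paid` table from this guide:
```sql
-- The projection is keyed by toYear(date), so this filter cannot use it:
SELECT count() FROM uk_price_paid WHERE date >= '2020-01-01';
-- This equivalent filter matches the projection dimension:
SELECT count() FROM uk_price_paid WHERE toYear(date) >= 2020;
```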
Query:
```sql
SELECT
town,
district,
@ -464,118 +512,138 @@ GROUP BY
district
HAVING c >= 100
ORDER BY price DESC
LIMIT 100
┌─town─────────────────┬─district───────────────┬────c─┬───price─┬─bar(round(avg(price)), 0, 5000000, 100)────────────────────────────┐
│ LONDON │ CITY OF WESTMINSTER │ 3372 │ 3305225 │ ██████████████████████████████████████████████████████████████████ │
│ LONDON │ CITY OF LONDON │ 257 │ 3294478 │ █████████████████████████████████████████████████████████████████▊ │
│ LONDON │ KENSINGTON AND CHELSEA │ 2367 │ 2342422 │ ██████████████████████████████████████████████▋ │
│ LEATHERHEAD │ ELMBRIDGE │ 108 │ 1927143 │ ██████████████████████████████████████▌ │
│ VIRGINIA WATER │ RUNNYMEDE │ 142 │ 1868819 │ █████████████████████████████████████▍ │
│ LONDON │ CAMDEN │ 2815 │ 1736788 │ ██████████████████████████████████▋ │
│ THORNTON HEATH │ CROYDON │ 521 │ 1733051 │ ██████████████████████████████████▋ │
│ WINDLESHAM │ SURREY HEATH │ 103 │ 1717255 │ ██████████████████████████████████▎ │
│ BARNET │ ENFIELD │ 115 │ 1503458 │ ██████████████████████████████ │
│ OXFORD │ SOUTH OXFORDSHIRE │ 298 │ 1275200 │ █████████████████████████▌ │
│ LONDON │ ISLINGTON │ 2458 │ 1274308 │ █████████████████████████▍ │
│ COBHAM │ ELMBRIDGE │ 364 │ 1260005 │ █████████████████████████▏ │
│ LONDON │ HOUNSLOW │ 618 │ 1215682 │ ████████████████████████▎ │
│ ASCOT │ WINDSOR AND MAIDENHEAD │ 379 │ 1215146 │ ████████████████████████▎ │
│ LONDON │ RICHMOND UPON THAMES │ 654 │ 1207551 │ ████████████████████████▏ │
│ BEACONSFIELD │ BUCKINGHAMSHIRE │ 307 │ 1186220 │ ███████████████████████▋ │
│ RICHMOND │ RICHMOND UPON THAMES │ 805 │ 1100420 │ ██████████████████████ │
│ LONDON │ HAMMERSMITH AND FULHAM │ 2888 │ 1062959 │ █████████████████████▎ │
│ WEYBRIDGE │ ELMBRIDGE │ 607 │ 1027161 │ ████████████████████▌ │
│ RADLETT │ HERTSMERE │ 265 │ 1015896 │ ████████████████████▎ │
│ SALCOMBE │ SOUTH HAMS │ 124 │ 1014393 │ ████████████████████▎ │
│ BURFORD │ WEST OXFORDSHIRE │ 102 │ 993100 │ ███████████████████▋ │
│ ESHER │ ELMBRIDGE │ 454 │ 969770 │ ███████████████████▍ │
│ HINDHEAD │ WAVERLEY │ 128 │ 967786 │ ███████████████████▎ │
│ BROCKENHURST │ NEW FOREST │ 121 │ 967046 │ ███████████████████▎ │
│ LEATHERHEAD │ GUILDFORD │ 191 │ 964489 │ ███████████████████▎ │
│ GERRARDS CROSS │ BUCKINGHAMSHIRE │ 376 │ 958555 │ ███████████████████▏ │
│ EAST MOLESEY │ ELMBRIDGE │ 181 │ 943457 │ ██████████████████▋ │
│ OLNEY │ MILTON KEYNES │ 220 │ 942892 │ ██████████████████▋ │
│ CHALFONT ST GILES │ BUCKINGHAMSHIRE │ 135 │ 926950 │ ██████████████████▌ │
│ HENLEY-ON-THAMES │ SOUTH OXFORDSHIRE │ 509 │ 905732 │ ██████████████████ │
│ KINGSTON UPON THAMES │ KINGSTON UPON THAMES │ 889 │ 899689 │ █████████████████▊ │
│ BELVEDERE │ BEXLEY │ 313 │ 895336 │ █████████████████▊ │
│ CRANBROOK │ TUNBRIDGE WELLS │ 404 │ 888190 │ █████████████████▋ │
│ LONDON │ EALING │ 2460 │ 865893 │ █████████████████▎ │
│ MAIDENHEAD │ BUCKINGHAMSHIRE │ 114 │ 863814 │ █████████████████▎ │
│ LONDON │ MERTON │ 1958 │ 857192 │ █████████████████▏ │
│ GUILDFORD │ WAVERLEY │ 131 │ 854447 │ █████████████████ │
│ LONDON │ HACKNEY │ 3088 │ 846571 │ ████████████████▊ │
│ LYMM │ WARRINGTON │ 285 │ 839920 │ ████████████████▋ │
│ HARPENDEN │ ST ALBANS │ 606 │ 836994 │ ████████████████▋ │
│ LONDON │ WANDSWORTH │ 6113 │ 832292 │ ████████████████▋ │
│ LONDON │ SOUTHWARK │ 3612 │ 831319 │ ████████████████▋ │
│ BERKHAMSTED │ DACORUM │ 502 │ 830356 │ ████████████████▌ │
│ KINGS LANGLEY │ DACORUM │ 137 │ 821358 │ ████████████████▍ │
│ TONBRIDGE │ TUNBRIDGE WELLS │ 339 │ 806736 │ ████████████████▏ │
│ EPSOM │ REIGATE AND BANSTEAD │ 157 │ 805903 │ ████████████████ │
│ WOKING │ GUILDFORD │ 161 │ 803283 │ ████████████████ │
│ STOCKBRIDGE │ TEST VALLEY │ 168 │ 801973 │ ████████████████ │
│ TEDDINGTON │ RICHMOND UPON THAMES │ 539 │ 798591 │ ███████████████▊ │
│ OXFORD │ VALE OF WHITE HORSE │ 329 │ 792907 │ ███████████████▋ │
│ LONDON │ BARNET │ 3624 │ 789583 │ ███████████████▋ │
│ TWICKENHAM │ RICHMOND UPON THAMES │ 1090 │ 787760 │ ███████████████▋ │
│ LUTON │ CENTRAL BEDFORDSHIRE │ 196 │ 786051 │ ███████████████▋ │
│ TONBRIDGE │ MAIDSTONE │ 277 │ 785746 │ ███████████████▋ │
│ TOWCESTER │ WEST NORTHAMPTONSHIRE │ 186 │ 783532 │ ███████████████▋ │
│ LONDON │ LAMBETH │ 4832 │ 783422 │ ███████████████▋ │
│ LUTTERWORTH │ HARBOROUGH │ 515 │ 781775 │ ███████████████▋ │
│ WOODSTOCK │ WEST OXFORDSHIRE │ 135 │ 777499 │ ███████████████▌ │
│ ALRESFORD │ WINCHESTER │ 196 │ 775577 │ ███████████████▌ │
│ LONDON │ NEWHAM │ 2942 │ 768551 │ ███████████████▎ │
│ ALDERLEY EDGE │ CHESHIRE EAST │ 168 │ 768280 │ ███████████████▎ │
│ MARLOW │ BUCKINGHAMSHIRE │ 301 │ 762784 │ ███████████████▎ │
│ BILLINGSHURST │ CHICHESTER │ 134 │ 760920 │ ███████████████▏ │
│ LONDON │ TOWER HAMLETS │ 4183 │ 759635 │ ███████████████▏ │
│ MIDHURST │ CHICHESTER │ 245 │ 759101 │ ███████████████▏ │
│ THAMES DITTON │ ELMBRIDGE │ 227 │ 753347 │ ███████████████ │
│ POTTERS BAR │ WELWYN HATFIELD │ 163 │ 752926 │ ███████████████ │
│ REIGATE │ REIGATE AND BANSTEAD │ 555 │ 740961 │ ██████████████▋ │
│ TADWORTH │ REIGATE AND BANSTEAD │ 477 │ 738997 │ ██████████████▋ │
│ SEVENOAKS │ SEVENOAKS │ 1074 │ 734658 │ ██████████████▋ │
│ PETWORTH │ CHICHESTER │ 138 │ 732432 │ ██████████████▋ │
│ BOURNE END │ BUCKINGHAMSHIRE │ 127 │ 730742 │ ██████████████▌ │
│ PURLEY │ CROYDON │ 540 │ 727721 │ ██████████████▌ │
│ OXTED │ TANDRIDGE │ 320 │ 726078 │ ██████████████▌ │
│ LONDON │ HARINGEY │ 2988 │ 724573 │ ██████████████▍ │
│ BANSTEAD │ REIGATE AND BANSTEAD │ 373 │ 713834 │ ██████████████▎ │
│ PINNER │ HARROW │ 480 │ 712166 │ ██████████████▏ │
│ MALMESBURY │ WILTSHIRE │ 293 │ 707747 │ ██████████████▏ │
│ RICKMANSWORTH │ THREE RIVERS │ 732 │ 705400 │ ██████████████ │
│ SLOUGH │ BUCKINGHAMSHIRE │ 359 │ 705002 │ ██████████████ │
│ GREAT MISSENDEN │ BUCKINGHAMSHIRE │ 214 │ 704904 │ ██████████████ │
│ READING │ SOUTH OXFORDSHIRE │ 295 │ 701697 │ ██████████████ │
│ HYTHE │ FOLKESTONE AND HYTHE │ 457 │ 700334 │ ██████████████ │
│ WELWYN │ WELWYN HATFIELD │ 217 │ 699649 │ █████████████▊ │
│ CHIGWELL │ EPPING FOREST │ 242 │ 697869 │ █████████████▊ │
│ BARNET │ BARNET │ 906 │ 695680 │ █████████████▊ │
│ HASLEMERE │ CHICHESTER │ 120 │ 694028 │ █████████████▊ │
│ LEATHERHEAD │ MOLE VALLEY │ 748 │ 692026 │ █████████████▋ │
│ LONDON │ BRENT │ 1945 │ 690799 │ █████████████▋ │
│ HASLEMERE │ WAVERLEY │ 258 │ 690765 │ █████████████▋ │
│ NORTHWOOD │ HILLINGDON │ 252 │ 690753 │ █████████████▋ │
│ WALTON-ON-THAMES │ ELMBRIDGE │ 871 │ 689431 │ █████████████▋ │
│ INGATESTONE │ BRENTWOOD │ 150 │ 688345 │ █████████████▋ │
│ OXFORD │ OXFORD │ 1761 │ 686114 │ █████████████▋ │
│ CHISLEHURST │ BROMLEY │ 410 │ 682892 │ █████████████▋ │
│ KINGS LANGLEY │ THREE RIVERS │ 109 │ 682320 │ █████████████▋ │
│ ASHTEAD │ MOLE VALLEY │ 280 │ 680483 │ █████████████▌ │
│ WOKING │ SURREY HEATH │ 269 │ 679035 │ █████████████▌ │
│ ASCOT │ BRACKNELL FOREST │ 160 │ 678632 │ █████████████▌ │
└──────────────────────┴────────────────────────┴──────┴─────────┴────────────────────────────────────────────────────────────────────┘
100 rows in set. Elapsed: 0.005 sec. Processed 12.85 thousand rows, 813.40 KB (2.73 million rows/s., 172.95 MB/s.)
LIMIT 100;
```
Result:
```text
┌─town─────────────────┬─district───────────────┬────c─┬───price─┬─bar(round(avg(price)), 0, 5000000, 100)────────────────────────────┐
│ LONDON │ CITY OF WESTMINSTER │ 3606 │ 3280239 │ █████████████████████████████████████████████████████████████████▌ │
│ LONDON │ CITY OF LONDON │ 274 │ 3160502 │ ███████████████████████████████████████████████████████████████▏ │
│ LONDON │ KENSINGTON AND CHELSEA │ 2550 │ 2308478 │ ██████████████████████████████████████████████▏ │
│ LEATHERHEAD │ ELMBRIDGE │ 114 │ 1897407 │ █████████████████████████████████████▊ │
│ LONDON │ CAMDEN │ 3033 │ 1805404 │ ████████████████████████████████████ │
│ VIRGINIA WATER │ RUNNYMEDE │ 156 │ 1753247 │ ███████████████████████████████████ │
│ WINDLESHAM │ SURREY HEATH │ 108 │ 1677613 │ █████████████████████████████████▌ │
│ THORNTON HEATH │ CROYDON │ 546 │ 1671721 │ █████████████████████████████████▍ │
│ BARNET │ ENFIELD │ 124 │ 1505840 │ ██████████████████████████████ │
│ COBHAM │ ELMBRIDGE │ 387 │ 1237250 │ ████████████████████████▋ │
│ LONDON │ ISLINGTON │ 2668 │ 1236980 │ ████████████████████████▋ │
│ OXFORD │ SOUTH OXFORDSHIRE │ 321 │ 1220907 │ ████████████████████████▍ │
│ LONDON │ RICHMOND UPON THAMES │ 704 │ 1215551 │ ████████████████████████▎ │
│ LONDON │ HOUNSLOW │ 671 │ 1207493 │ ████████████████████████▏ │
│ ASCOT │ WINDSOR AND MAIDENHEAD │ 407 │ 1183299 │ ███████████████████████▋ │
│ BEACONSFIELD │ BUCKINGHAMSHIRE │ 330 │ 1175615 │ ███████████████████████▌ │
│ RICHMOND │ RICHMOND UPON THAMES │ 874 │ 1110444 │ ██████████████████████▏ │
│ LONDON │ HAMMERSMITH AND FULHAM │ 3086 │ 1053983 │ █████████████████████ │
│ SURBITON │ ELMBRIDGE │ 100 │ 1011800 │ ████████████████████▏ │
│ RADLETT │ HERTSMERE │ 283 │ 1011712 │ ████████████████████▏ │
│ SALCOMBE │ SOUTH HAMS │ 127 │ 1011624 │ ████████████████████▏ │
│ WEYBRIDGE │ ELMBRIDGE │ 655 │ 1007265 │ ████████████████████▏ │
│ ESHER │ ELMBRIDGE │ 485 │ 986581 │ ███████████████████▋ │
│ LEATHERHEAD │ GUILDFORD │ 202 │ 977320 │ ███████████████████▌ │
│ BURFORD │ WEST OXFORDSHIRE │ 111 │ 966893 │ ███████████████████▎ │
│ BROCKENHURST │ NEW FOREST │ 129 │ 956675 │ ███████████████████▏ │
│ HINDHEAD │ WAVERLEY │ 137 │ 953753 │ ███████████████████ │
│ GERRARDS CROSS │ BUCKINGHAMSHIRE │ 419 │ 951121 │ ███████████████████ │
│ EAST MOLESEY │ ELMBRIDGE │ 192 │ 936769 │ ██████████████████▋ │
│ CHALFONT ST GILES │ BUCKINGHAMSHIRE │ 146 │ 925515 │ ██████████████████▌ │
│ LONDON │ TOWER HAMLETS │ 4388 │ 918304 │ ██████████████████▎ │
│ OLNEY │ MILTON KEYNES │ 235 │ 910646 │ ██████████████████▏ │
│ HENLEY-ON-THAMES │ SOUTH OXFORDSHIRE │ 540 │ 902418 │ ██████████████████ │
│ LONDON │ SOUTHWARK │ 3885 │ 892997 │ █████████████████▋ │
│ KINGSTON UPON THAMES │ KINGSTON UPON THAMES │ 960 │ 885969 │ █████████████████▋ │
│ LONDON │ EALING │ 2658 │ 871755 │ █████████████████▍ │
│ CRANBROOK │ TUNBRIDGE WELLS │ 431 │ 862348 │ █████████████████▏ │
│ LONDON │ MERTON │ 2099 │ 859118 │ █████████████████▏ │
│ BELVEDERE │ BEXLEY │ 346 │ 842423 │ ████████████████▋ │
│ GUILDFORD │ WAVERLEY │ 143 │ 841277 │ ████████████████▋ │
│ HARPENDEN │ ST ALBANS │ 657 │ 841216 │ ████████████████▋ │
│ LONDON │ HACKNEY │ 3307 │ 837090 │ ████████████████▋ │
│ LONDON │ WANDSWORTH │ 6566 │ 832663 │ ████████████████▋ │
│ MAIDENHEAD │ BUCKINGHAMSHIRE │ 123 │ 824299 │ ████████████████▍ │
│ KINGS LANGLEY │ DACORUM │ 145 │ 821331 │ ████████████████▍ │
│ BERKHAMSTED │ DACORUM │ 543 │ 818415 │ ████████████████▎ │
│ GREAT MISSENDEN │ BUCKINGHAMSHIRE │ 226 │ 802807 │ ████████████████ │
│ BILLINGSHURST │ CHICHESTER │ 144 │ 797829 │ ███████████████▊ │
│ WOKING │ GUILDFORD │ 176 │ 793494 │ ███████████████▋ │
│ STOCKBRIDGE │ TEST VALLEY │ 178 │ 793269 │ ███████████████▋ │
│ EPSOM │ REIGATE AND BANSTEAD │ 172 │ 791862 │ ███████████████▋ │
│ TONBRIDGE │ TUNBRIDGE WELLS │ 360 │ 787876 │ ███████████████▋ │
│ TEDDINGTON │ RICHMOND UPON THAMES │ 595 │ 786492 │ ███████████████▋ │
│ TWICKENHAM │ RICHMOND UPON THAMES │ 1155 │ 786193 │ ███████████████▋ │
│ LYNDHURST │ NEW FOREST │ 102 │ 785593 │ ███████████████▋ │
│ LONDON │ LAMBETH │ 5228 │ 774574 │ ███████████████▍ │
│ LONDON │ BARNET │ 3955 │ 773259 │ ███████████████▍ │
│ OXFORD │ VALE OF WHITE HORSE │ 353 │ 772088 │ ███████████████▍ │
│ TONBRIDGE │ MAIDSTONE │ 305 │ 770740 │ ███████████████▍ │
│ LUTTERWORTH │ HARBOROUGH │ 538 │ 768634 │ ███████████████▎ │
│ WOODSTOCK │ WEST OXFORDSHIRE │ 140 │ 766037 │ ███████████████▎ │
│ MIDHURST │ CHICHESTER │ 257 │ 764815 │ ███████████████▎ │
│ MARLOW │ BUCKINGHAMSHIRE │ 327 │ 761876 │ ███████████████▏ │
│ LONDON │ NEWHAM │ 3237 │ 761784 │ ███████████████▏ │
│ ALDERLEY EDGE │ CHESHIRE EAST │ 178 │ 757318 │ ███████████████▏ │
│ LUTON │ CENTRAL BEDFORDSHIRE │ 212 │ 754283 │ ███████████████ │
│ PETWORTH │ CHICHESTER │ 154 │ 754220 │ ███████████████ │
│ ALRESFORD │ WINCHESTER │ 219 │ 752718 │ ███████████████ │
│ POTTERS BAR │ WELWYN HATFIELD │ 174 │ 748465 │ ██████████████▊ │
│ HASLEMERE │ CHICHESTER │ 128 │ 746907 │ ██████████████▊ │
│ TADWORTH │ REIGATE AND BANSTEAD │ 502 │ 743252 │ ██████████████▋ │
│ THAMES DITTON │ ELMBRIDGE │ 244 │ 741913 │ ██████████████▋ │
│ REIGATE │ REIGATE AND BANSTEAD │ 581 │ 738198 │ ██████████████▋ │
│ BOURNE END │ BUCKINGHAMSHIRE │ 138 │ 735190 │ ██████████████▋ │
│ SEVENOAKS │ SEVENOAKS │ 1156 │ 730018 │ ██████████████▌ │
│ OXTED │ TANDRIDGE │ 336 │ 729123 │ ██████████████▌ │
│ INGATESTONE │ BRENTWOOD │ 166 │ 728103 │ ██████████████▌ │
│ LONDON │ BRENT │ 2079 │ 720605 │ ██████████████▍ │
│ LONDON │ HARINGEY │ 3216 │ 717780 │ ██████████████▎ │
│ PURLEY │ CROYDON │ 575 │ 716108 │ ██████████████▎ │
│ WELWYN │ WELWYN HATFIELD │ 222 │ 710603 │ ██████████████▏ │
│ RICKMANSWORTH │ THREE RIVERS │ 798 │ 704571 │ ██████████████ │
│ BANSTEAD │ REIGATE AND BANSTEAD │ 401 │ 701293 │ ██████████████ │
│ CHIGWELL │ EPPING FOREST │ 261 │ 701203 │ ██████████████ │
│ PINNER │ HARROW │ 528 │ 698885 │ █████████████▊ │
│ HASLEMERE │ WAVERLEY │ 280 │ 696659 │ █████████████▊ │
│ SLOUGH │ BUCKINGHAMSHIRE │ 396 │ 694917 │ █████████████▊ │
│ WALTON-ON-THAMES │ ELMBRIDGE │ 946 │ 692395 │ █████████████▋ │
│ READING │ SOUTH OXFORDSHIRE │ 318 │ 691988 │ █████████████▋ │
│ NORTHWOOD │ HILLINGDON │ 271 │ 690643 │ █████████████▋ │
│ FELTHAM │ HOUNSLOW │ 763 │ 688595 │ █████████████▋ │
│ ASHTEAD │ MOLE VALLEY │ 303 │ 687923 │ █████████████▋ │
│ BARNET │ BARNET │ 975 │ 686980 │ █████████████▋ │
│ WOKING │ SURREY HEATH │ 283 │ 686669 │ █████████████▋ │
│ MALMESBURY │ WILTSHIRE │ 323 │ 683324 │ █████████████▋ │
│ AMERSHAM │ BUCKINGHAMSHIRE │ 496 │ 680962 │ █████████████▌ │
│ CHISLEHURST │ BROMLEY │ 430 │ 680209 │ █████████████▌ │
│ HYTHE │ FOLKESTONE AND HYTHE │ 490 │ 676908 │ █████████████▌ │
│ MAYFIELD │ WEALDEN │ 101 │ 676210 │ █████████████▌ │
│ ASCOT │ BRACKNELL FOREST │ 168 │ 676004 │ █████████████▌ │
└──────────────────────┴────────────────────────┴──────┴─────────┴────────────────────────────────────────────────────────────────────┘
```
### Summary {#summary}
All 3 queries work much faster and read fewer rows.
```text
Query 1
no projection: 27 rows in set. Elapsed: 0.158 sec. Processed 26.32 million rows, 157.93 MB (166.57 million rows/s., 999.39 MB/s.)
projection: 27 rows in set. Elapsed: 0.007 sec. Processed 105.96 thousand rows, 3.33 MB (14.58 million rows/s., 458.13 MB/s.)
Query 2
no projection: 27 rows in set. Elapsed: 0.163 sec. Processed 26.32 million rows, 80.01 MB (161.75 million rows/s., 491.64 MB/s.)
projection: 27 rows in set. Elapsed: 0.008 sec. Processed 105.96 thousand rows, 3.67 MB (13.29 million rows/s., 459.89 MB/s.)
Query 3
no projection: 100 rows in set. Elapsed: 0.069 sec. Processed 26.32 million rows, 62.47 MB (382.13 million rows/s., 906.93 MB/s.)
projection: 100 rows in set. Elapsed: 0.029 sec. Processed 8.08 thousand rows, 511.08 KB (276.06 thousand rows/s., 17.47 MB/s.)
```
Q1)
no projection: 27 rows in set. Elapsed: 0.027 sec. Processed 26.25 million rows, 157.49 MB (955.96 million rows/s., 5.74 GB/s.)
projection: 27 rows in set. Elapsed: 0.003 sec. Processed 106.87 thousand rows, 3.21 MB (31.92 million rows/s., 959.03 MB/s.)
```
### Test It in Playground {#playground}
The dataset is also available in the [Online Playground](https://gh-api.clickhouse.tech/play?user=play#U0VMRUNUIHRvd24sIGRpc3RyaWN0LCBjb3VudCgpIEFTIGMsIHJvdW5kKGF2ZyhwcmljZSkpIEFTIHByaWNlLCBiYXIocHJpY2UsIDAsIDUwMDAwMDAsIDEwMCkgRlJPTSB1a19wcmljZV9wYWlkIFdIRVJFIGRhdGUgPj0gJzIwMjAtMDEtMDEnIEdST1VQIEJZIHRvd24sIGRpc3RyaWN0IEhBVklORyBjID49IDEwMCBPUkRFUiBCWSBwcmljZSBERVNDIExJTUlUIDEwMA==).
View File
@ -29,7 +29,7 @@ It is recommended to use official pre-compiled `deb` packages for Debian or Ubun
If you want to use the most recent version, replace `stable` with `testing` (this is recommended for your testing environments).
You can also download and install packages manually from [here](https://repo.clickhouse.tech/deb/stable/main/).
You can also download and install packages manually from [here](https://repo.clickhouse.com/deb/stable/main/).
#### Packages {#packages}
@ -50,8 +50,8 @@ First, you need to add the official repository:
``` bash
sudo yum install yum-utils
sudo rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_64
sudo rpm --import https://repo.clickhouse.com/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.com/rpm/stable/x86_64
```
If you want to use the most recent version, replace `stable` with `testing` (this is recommended for your testing environments). `prestable` is sometimes also available.
@ -62,21 +62,21 @@ Then run these commands to install packages:
sudo yum install clickhouse-server clickhouse-client
```
You can also download and install packages manually from [here](https://repo.clickhouse.tech/rpm/stable/x86_64).
You can also download and install packages manually from [here](https://repo.clickhouse.com/rpm/stable/x86_64).
### From Tgz Archives {#from-tgz-archives}
It is recommended to use official pre-compiled `tgz` archives for all Linux distributions where installation of `deb` or `rpm` packages is not possible.
The required version can be downloaded with `curl` or `wget` from the repository https://repo.clickhouse.tech/tgz/.
The required version can be downloaded with `curl` or `wget` from the repository https://repo.clickhouse.com/tgz/.
After that, the downloaded archives should be unpacked and installed with the installation scripts. Example for the latest version:
``` bash
export LATEST_VERSION=`curl https://api.github.com/repos/ClickHouse/ClickHouse/tags 2>/dev/null | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | head -n 1`
curl -O https://repo.clickhouse.tech/tgz/clickhouse-common-static-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/clickhouse-common-static-dbg-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/clickhouse-server-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/clickhouse-client-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.com/tgz/clickhouse-common-static-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.com/tgz/clickhouse-common-static-dbg-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.com/tgz/clickhouse-server-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.com/tgz/clickhouse-client-$LATEST_VERSION.tgz
tar -xzvf clickhouse-common-static-$LATEST_VERSION.tgz
sudo clickhouse-common-static-$LATEST_VERSION/install/doinst.sh
View File
@ -386,7 +386,7 @@ The CSV format supports the output of totals and extremes the same way as `TabSe
## CSVWithNames {#csvwithnames}
Also prints the header row, similar to `TabSeparatedWithNames`.
Also prints the header row, similar to [TabSeparatedWithNames](#tabseparatedwithnames).
## CustomSeparated {#format-customseparated}
View File
@ -56,6 +56,7 @@ toc_title: Adopters
| <a href="https://geniee.co.jp" class="favicon">Geniee</a> | Ad network | Main product | — | — | [Blog post in Japanese, July 2017](https://tech.geniee.co.jp/entry/2017/07/20/160100) |
| <a href="https://www.genotek.ru/" class="favicon">Genotek</a> | Bioinformatics | Main product | — | — | [Video, August 2020](https://youtu.be/v3KyZbz9lEE) |
| <a href="https://glaber.io/" class="favicon">Glaber</a> | Monitoring | Main product | — | — | [Website](https://glaber.io/) |
| <a href="https://graphcdn.io/" class="favicon">GraphCDN</a> | CDN | Traffic Analytics | — | — | [Blog Post in English, August 2021](https://altinity.com/blog/delivering-insight-on-graphql-apis-with-clickhouse-at-graphcdn/) |
| <a href="https://www.huya.com/" class="favicon">HUYA</a> | Video Streaming | Analytics | — | — | [Slides in Chinese, October 2018](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup19/7.%20ClickHouse万亿数据分析实践%20李本旺(sundy-li)%20虎牙.pdf) |
| <a href="https://www.the-ica.com/" class="favicon">ICA</a> | FinTech | Risk Management | — | — | [Blog Post in English, Sep 2020](https://altinity.com/blog/clickhouse-vs-redshift-performance-for-fintech-risk-management?utm_campaign=ClickHouse%20vs%20RedShift&utm_content=143520807&utm_medium=social&utm_source=twitter&hss_channel=tw-3894792263) |
| <a href="https://www.idealista.com" class="favicon">Idealista</a> | Real Estate | Analytics | — | — | [Blog Post in English, April 2019](https://clickhouse.com/blog/en/clickhouse-meetup-in-madrid-on-april-2-2019) |
View File
@ -45,7 +45,7 @@ Configuration template:
- `min_part_size` - The minimum size of a data part.
- `min_part_size_ratio` - The ratio of the data part size to the table size.
- `method` - Compression method. Acceptable values: `lz4`, `lz4hc`, `zstd`.
- `level` - Compression level. See [Codecs](../../sql-reference/statements/create/table/#create-query-general-purpose-codecs).
- `level` - Compression level. See [Codecs](../../sql-reference/statements/create/table.md#create-query-general-purpose-codecs).
You can configure multiple `<case>` sections.
View File
@ -93,6 +93,17 @@ Works with tables in the MergeTree family.
If `force_primary_key=1`, ClickHouse checks to see if the query has a primary key condition that can be used for restricting data ranges. If there is no suitable condition, it throws an exception. However, it does not check whether the condition reduces the amount of data to read. For more information about data ranges in MergeTree tables, see [MergeTree](../../engines/table-engines/mergetree-family/mergetree.md).
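A minimal sketch of the effect; the table `hits` and its primary key on `EventDate` are assumptions for illustration:
```sql
SET force_primary_key = 1;
-- OK: the condition restricts the primary key range.
SELECT count() FROM hits WHERE EventDate = today();
-- Throws an exception: no condition on the primary key.
SELECT count() FROM hits WHERE Referer != '';
```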
## use_skip_indexes {#settings-use_skip_indexes}
Use data skipping indexes during query execution.
Possible values:
- 0 — Disabled.
- 1 — Enabled.
Default value: 1.
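For example, to compare query behavior with and without skip indexes in a session (a quick sketch):
```sql
SET use_skip_indexes = 0; -- ignore data skipping indexes
SET use_skip_indexes = 1; -- default: use them again
```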
## force_data_skipping_indices {#settings-force_data_skipping_indices}
Disables query execution if the passed data skipping indices weren't used.
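A rough sketch of how this can be used; the table `data` and the index name `ix_d1` are placeholders:
```sql
SET force_data_skipping_indices = 'ix_d1';
-- Throws an exception unless the skip index ix_d1 is actually used.
SELECT * FROM data WHERE d1 = 0;
```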
@ -3630,7 +3641,7 @@ Default value: `enable`.
## max_hyperscan_regexp_length {#max-hyperscan-regexp-length}
Defines the maximum length for each regular expression in the [hyperscan multi-match functions](../../sql-reference/functions/string-search-functions.md#multimatchanyhaystack-pattern1-pattern2-patternn).
Possible values:
@ -3673,7 +3684,7 @@ Exception: Regexp length too large.
## max_hyperscan_regexp_total_length {#max-hyperscan-regexp-total-length}
Sets the maximum length total of all regular expressions in each [hyperscan multi-match function](../../sql-reference/functions/string-search-functions.md#multimatchanyhaystack-pattern1-pattern2-patternn).
Possible values:
View File
@ -7,7 +7,8 @@ toc_title: DateTime64
Allows storing an instant in time that can be expressed as a calendar date and a time of day, with defined sub-second precision.
Tick size (precision): 10<sup>-precision</sup> seconds
Tick size (precision): 10<sup>-precision</sup> seconds. Valid range: [ 0 : 9 ].
Typical precisions are 3 (milliseconds), 6 (microseconds), and 9 (nanoseconds).
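For instance, microsecond precision can be expressed like this (a quick sketch; the full syntax follows below):
```sql
SELECT toDateTime64('2019-01-01 00:00:00.123456', 6, 'UTC') AS dt64, toTypeName(dt64);
```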
**Syntax:**
View File
@ -594,14 +594,14 @@ Result:
└─────┘
```
## h3ResIsClassIII {#h3resisclassIII}
## h3IsResClassIII {#h3isresclassIII}
Returns whether the [H3](#h3index) index has a resolution with Class III orientation.
**Syntax**
``` sql
h3ResIsClassIII(index)
h3IsResClassIII(index)
```
**Parameter**
@ -620,7 +620,7 @@ Type: [UInt8](../../../sql-reference/data-types/int-uint.md).
Query:
``` sql
SELECT h3ResIsClassIII(617420388352917503) as res;
SELECT h3IsResClassIII(617420388352917503) as res;
```
Result:
View File
@ -0,0 +1,386 @@
---
toc_title: S2 Geometry
---
# Functions for Working with S2 Index {#s2Index}
[S2](https://s2geometry.io/) is a geographical indexing system where all geographical data is represented on a three-dimensional sphere (similar to a globe).
In the S2 library, points are represented as unit-length vectors called S2 point indices (points on the surface of a three-dimensional unit sphere), as opposed to traditional (latitude, longitude) pairs.
## geoToS2 {#geoToS2}
Returns [S2](#s2index) point index corresponding to the provided coordinates `(longitude, latitude)`.
**Syntax**
``` sql
geoToS2(lon, lat)
```
**Arguments**
- `lon` — Longitude. [Float64](../../../sql-reference/data-types/float.md).
- `lat` — Latitude. [Float64](../../../sql-reference/data-types/float.md).
**Returned values**
- S2 point index.
Type: [UInt64](../../../sql-reference/data-types/int-uint.md).
**Example**
Query:
``` sql
SELECT geoToS2(37.79506683, 55.71290588) as s2Index;
```
Result:
``` text
┌─────────────s2Index─┐
│ 4704772434919038107 │
└─────────────────────┘
```
## s2ToGeo {#s2ToGeo}
Returns geo coordinates `(longitude, latitude)` corresponding to the provided [S2](#s2index) point index.
**Syntax**
``` sql
s2ToGeo(s2index)
```
**Arguments**
- `s2Index` — S2 Index. [UInt64](../../../sql-reference/data-types/int-uint.md).
**Returned values**
- A tuple consisting of two values: `tuple(lon,lat)`.
Type: `lon` - [Float64](../../../sql-reference/data-types/float.md). `lat` - [Float64](../../../sql-reference/data-types/float.md).
**Example**
Query:
``` sql
SELECT s2ToGeo(4704772434919038107) as s2Coordinates;
```
Result:
``` text
┌─s2Coordinates────────────────────────┐
│ (37.79506681471008,55.7129059052841) │
└──────────────────────────────────────┘
```
## s2GetNeighbors {#s2GetNeighbors}
Returns S2 neighbor indices corresponding to the provided [S2](#s2index) index. Each cell in the S2 system is a quadrilateral bounded by four geodesics, so each cell has 4 neighbors.
**Syntax**
``` sql
s2GetNeighbors(s2index)
```
**Arguments**
- `s2index` — S2 Index. [UInt64](../../../sql-reference/data-types/int-uint.md).
**Returned values**
- An array consisting of the 4 neighbor indices: `array[s2index1, s2index3, s2index2, s2index4]`.
Type: Each S2 index is [UInt64](../../../sql-reference/data-types/int-uint.md).
**Example**
Query:
``` sql
SELECT s2GetNeighbors(5074766849661468672) AS s2Neighbors;
```
Result:
``` text
┌─s2Neighbors───────────────────────────────────────────────────────────────────────┐
│ [5074766987100422144,5074766712222515200,5074767536856236032,5074767261978329088] │
└───────────────────────────────────────────────────────────────────────────────────┘
```
## s2CellsIntersect {#s2CellsIntersect}
Determines if the two provided [S2](#s2index) cell indices intersect or not.
**Syntax**
``` sql
s2CellsIntersect(s2index1, s2index2)
```
**Arguments**
- `s2index1`, `s2index2` — S2 Index. [UInt64](../../../sql-reference/data-types/int-uint.md).
**Returned values**
- 1 — If the S2 cell indices intersect.
- 0 — If the S2 cell indices don't intersect.
Type: [UInt8](../../../sql-reference/data-types/int-uint.md).
**Example**
Query:
``` sql
SELECT s2CellsIntersect(9926595209846587392, 9926594385212866560) as intersect;
```
Result:
``` text
┌─intersect─┐
│ 1 │
└───────────┘
```
## s2CapContains {#s2CapContains}
A cap represents a portion of the sphere that has been cut off by a plane. It is defined by a point on a sphere and a radius in degrees.
Determines if a cap contains an S2 point index.
**Syntax**
``` sql
s2CapContains(center, degrees, point)
```
**Arguments**
- `center` - S2 point index corresponding to the cap. [UInt64](../../../sql-reference/data-types/int-uint.md).
- `degrees` - Radius of the cap in degrees. [Float64](../../../sql-reference/data-types/float.md).
- `point` - S2 point index. [UInt64](../../../sql-reference/data-types/int-uint.md).
**Returned values**
- 1 — If the cap contains the S2 point index.
- 0 — If the cap doesn't contain the S2 point index.
Type: [UInt8](../../../sql-reference/data-types/int-uint.md).
**Example**
Query:
``` sql
SELECT s2CapContains(1157339245694594829, 1.0, 1157347770437378819) as capContains;
```
Result:
``` text
┌─capContains─┐
│ 1 │
└─────────────┘
```
## s2CapUnion {#s2CapUnion}
A cap represents a portion of the sphere that has been cut off by a plane. It is defined by a point on a sphere and a radius in degrees.
Determines the smallest cap that contains the given two input caps.
**Syntax**
``` sql
s2CapUnion(center1, radius1, center2, radius2)
```
**Arguments**
- `center1`, `center2` - S2 point indices corresponding to the two input caps. [UInt64](../../../sql-reference/data-types/int-uint.md).
- `radius1`, `radius2` - Radii of the two input caps in degrees. [Float64](../../../sql-reference/data-types/float.md).
**Returned values**
- `center` - S2 point index corresponding to the center of the smallest cap containing the two input caps. Type: [UInt64](../../../sql-reference/data-types/int-uint.md).
- `radius` - Radius of the smallest cap containing the two input caps. Type: [Float64](../../../sql-reference/data-types/float.md).
**Example**
Query:
``` sql
SELECT s2CapUnion(3814912406305146967, 1.0, 1157347770437378819, 1.0) AS capUnion;
```
Result:
``` text
┌─capUnion───────────────────────────────┐
│ (4534655147792050737,60.2088283994957) │
└────────────────────────────────────────┘
```
## s2RectAdd {#s2RectAdd}
In the S2 system, a rectangle is represented by a type of S2Region called an S2LatLngRect that represents a rectangle in latitude-longitude space.
Increases the size of the bounding rectangle to include the given S2 point index.
**Syntax**
``` sql
s2RectAdd(s2pointLow, s2pointHigh, s2Point)
```
**Arguments**
- `s2PointLow` - Low S2 point index corresponding to the rectangle. [UInt64](../../../sql-reference/data-types/int-uint.md).
- `s2PointHigh` - High S2 point index corresponding to the rectangle. [UInt64](../../../sql-reference/data-types/int-uint.md).
- `s2Point` - Target S2 point index that the bound rectangle should be grown to include. [UInt64](../../../sql-reference/data-types/int-uint.md).
**Returned values**
- `s2PointLow` - Low S2 cell id corresponding to the grown rectangle. Type: [UInt64](../../../sql-reference/data-types/int-uint.md).
- `s2PointHigh` - High S2 cell id corresponding to the grown rectangle. Type: [UInt64](../../../sql-reference/data-types/int-uint.md).
**Example**
Query:
``` sql
SELECT s2RectAdd(5178914411069187297, 5177056748191934217, 5179056748191934217) as rectAdd;
```
Result:
``` text
┌─rectAdd───────────────────────────────────┐
│ (5179062030687166815,5177056748191934217) │
└───────────────────────────────────────────┘
```
## s2RectContains {#s2RectContains}
In the S2 system, a rectangle is represented by a type of S2Region called an S2LatLngRect that represents a rectangle in latitude-longitude space.
Determines if a given rectangle contains a S2 point index.
**Syntax**
``` sql
s2RectContains(s2PointLow, s2PointHigh, s2Point)
```
**Arguments**
- `s2PointLow` - Low S2 point index corresponding to the rectangle. [UInt64](../../../sql-reference/data-types/int-uint.md).
- `s2PointHigh` - High S2 point index corresponding to the rectangle. [UInt64](../../../sql-reference/data-types/int-uint.md).
- `s2Point` - Target S2 point index. [UInt64](../../../sql-reference/data-types/int-uint.md).
**Returned values**
- 1 — If the rectangle contains the given S2 point.
- 0 — If the rectangle doesn't contain the given S2 point.
**Example**
Query:
``` sql
SELECT s2RectContains(5179062030687166815, 5177056748191934217, 5177914411069187297) AS rectContains
```
Result:
``` text
┌─rectContains─┐
│ 0 │
└──────────────┘
```
## s2RectUnion {#s2RectUnion}
In the S2 system, a rectangle is represented by a type of S2Region called an S2LatLngRect that represents a rectangle in latitude-longitude space.
Returns the smallest rectangle containing the union of this rectangle and the given rectangle.
**Syntax**
``` sql
s2RectUnion(s2Rect1PointLow, s2Rect1PointHi, s2Rect2PointLow, s2Rect2PointHi)
```
**Arguments**
- `s2Rect1PointLow`, `s2Rect1PointHi` - Low and High S2 point indices corresponding to the first rectangle. [UInt64](../../../sql-reference/data-types/int-uint.md).
- `s2Rect2PointLow`, `s2Rect2PointHi` - Low and High S2 point indices corresponding to the second rectangle. [UInt64](../../../sql-reference/data-types/int-uint.md).
**Returned values**
- `s2UnionRect2PointLow` - Low S2 cell id corresponding to the union rectangle. Type: [UInt64](../../../sql-reference/data-types/int-uint.md).
- `s2UnionRect2PointHi` - High S2 cell id corresponding to the union rectangle. Type: [UInt64](../../../sql-reference/data-types/int-uint.md).
**Example**
Query:
``` sql
SELECT s2RectUnion(5178914411069187297, 5177056748191934217, 5179062030687166815, 5177056748191934217) AS rectUnion
```
Result:
``` text
┌─rectUnion─────────────────────────────────┐
│ (5179062030687166815,5177056748191934217) │
└───────────────────────────────────────────┘
```
## s2RectIntersection {#s2RectIntersection}
Returns the smallest rectangle containing the intersection of this rectangle and the given rectangle.
**Syntax**
``` sql
s2RectIntersection(s2Rect1PointLow, s2Rect1PointHi, s2Rect2PointLow, s2Rect2PointHi)
```
**Arguments**
- `s2Rect1PointLow`, `s2Rect1PointHi` - Low and High S2 point indices corresponding to the first rectangle. [UInt64](../../../sql-reference/data-types/int-uint.md).
- `s2Rect2PointLow`, `s2Rect2PointHi` - Low and High S2 point indices corresponding to the second rectangle. [UInt64](../../../sql-reference/data-types/int-uint.md).
**Returned values**
- `s2UnionRect2PointLow` - Low S2 cell id corresponding to the rectangle containing the intersection of the given rectangles. Type: [UInt64](../../../sql-reference/data-types/int-uint.md).
- `s2UnionRect2PointHi` - High S2 cell id corresponding to the rectangle containing the intersection of the given rectangles. Type: [UInt64](../../../sql-reference/data-types/int-uint.md).
**Example**
Query:
``` sql
SELECT s2RectIntersection(5178914411069187297, 5177056748191934217, 5179062030687166815, 5177056748191934217) AS rectIntersection
```
Result:
``` text
┌─rectIntersection──────────────────────────┐
│ (5178914411069187297,5177056748191934217) │
└───────────────────────────────────────────┘
```
View File
@ -59,9 +59,68 @@ A lambda function that accepts multiple arguments can also be passed to a higher
For some functions the first argument (the lambda function) can be omitted. In this case, identical mapping is assumed.
## User Defined Functions {#user-defined-functions}
## SQL User Defined Functions {#user-defined-functions}
Custom functions from lambda expressions can be created using the [CREATE FUNCTION](../statements/create/function.md) statement. To delete these functions use the [DROP FUNCTION](../statements/drop.md#drop-function) statement.
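A minimal sketch (the function name `linear_equation` is just an example):
```sql
CREATE FUNCTION linear_equation AS (x, k, b) -> k*x + b;
SELECT linear_equation(number, 2, 1) FROM numbers(3);
DROP FUNCTION linear_equation;
```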
## Executable User Defined Functions {#executable-user-defined-functions}
ClickHouse can call any external executable program or script to process data. Describe such functions in a [configuration file](../../operations/configuration-files.md) and add the path of that file to the main configuration in the `user_defined_executable_functions_config` setting. If a wildcard symbol `*` is used in the path, then all files matching the pattern are loaded. Example:
``` xml
<user_defined_executable_functions_config>*_function.xml</user_defined_executable_functions_config>
```
User defined function configurations are searched relative to the path specified in the `user_files_path` setting.
A function configuration contains the following settings:
- `name` - a function name.
- `command` - a command or a script to execute.
- `argument` - argument description with the `type` of an argument. Each argument is described in a separate setting.
- `format` - a [format](../../interfaces/formats.md) in which arguments are passed to the command.
- `return_type` - the type of a returned value.
- `type` - an executable type. If `type` is set to `executable` then a single command is started. If it is set to `executable_pool` then a pool of commands is created.
- `max_command_execution_time` - maximum execution time in seconds for processing a block of data. This setting is valid for `executable_pool` commands only. Optional. Default value is `10`.
- `command_termination_timeout` - time in seconds during which a command should finish after its pipe is closed. After that time `SIGTERM` is sent to the process executing the command. This setting is valid for `executable_pool` commands only. Optional. Default value is `10`.
- `pool_size` - the size of a command pool. Optional. Default value is `16`.
- `lifetime` - the reload interval of a function in seconds. If it is set to `0` then the function is not reloaded.
- `send_chunk_header` - controls whether to send row count before sending a chunk of data to process. Optional. Default value is `false`.
The command must read arguments from `STDIN` and must output the result to `STDOUT`. The command must process arguments iteratively: after processing a chunk of arguments, it must wait for the next chunk.
**Example**
Creating `test_function` using XML configuration:
``` xml
<functions>
<function>
<type>executable</type>
<name>test_function</name>
<return_type>UInt64</return_type>
<argument>
<type>UInt64</type>
</argument>
<argument>
<type>UInt64</type>
</argument>
<format>TabSeparated</format>
<command>cd /; clickhouse-local --input-format TabSeparated --output-format TabSeparated --structure 'x UInt64, y UInt64' --query "SELECT x + y FROM table"</command>
<lifetime>0</lifetime>
</function>
</functions>
```
Query:
``` sql
SELECT test_function(toUInt64(2), toUInt64(2));
```
Result:
``` text
┌─test_function(toUInt64(2), toUInt64(2))─┐
│ 4 │
└─────────────────────────────────────────┘
```
Custom functions can be created using the [CREATE FUNCTION](../statements/create/function.md) statement. To delete these functions use the [DROP FUNCTION](../statements/drop.md#drop-function) statement.
## Error Handling {#error-handling}
View File
@ -155,6 +155,8 @@ Hierarchy of privileges:
- `SYSTEM RELOAD CONFIG`
- `SYSTEM RELOAD DICTIONARY`
- `SYSTEM RELOAD EMBEDDED DICTIONARIES`
- `SYSTEM RELOAD FUNCTION`
- `SYSTEM RELOAD FUNCTIONS`
- `SYSTEM MERGES`
- `SYSTEM TTL MERGES`
- `SYSTEM FETCHES`
View File
@ -12,6 +12,8 @@ The list of available `SYSTEM` statements:
- [RELOAD DICTIONARY](#query_language-system-reload-dictionary)
- [RELOAD MODELS](#query_language-system-reload-models)
- [RELOAD MODEL](#query_language-system-reload-model)
- [RELOAD FUNCTIONS](#query_language-system-reload-functions)
- [RELOAD FUNCTION](#query_language-system-reload-functions)
- [DROP DNS CACHE](#query_language-system-drop-dns-cache)
- [DROP MARK CACHE](#query_language-system-drop-mark-cache)
- [DROP UNCOMPRESSED CACHE](#query_language-system-drop-uncompressed-cache)
@ -83,6 +85,17 @@ Completely reloads a CatBoost model `model_name` if the configuration was update
SYSTEM RELOAD MODEL <model_name>
```
## RELOAD FUNCTIONS {#query_language-system-reload-functions}
Reloads all registered [executable user defined functions](../functions/index.md#executable-user-defined-functions) or one of them from a configuration file.
**Syntax**
```sql
RELOAD FUNCTIONS
RELOAD FUNCTION function_name
```
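As full statements these are issued with the `SYSTEM` keyword, for example (a sketch; `test_function` is a placeholder name):
```sql
SYSTEM RELOAD FUNCTION test_function;
```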
## DROP DNS CACHE {#query_language-system-drop-dns-cache}
Resets ClickHouse's internal DNS cache. Sometimes (for old ClickHouse versions) it is necessary to use this command when changing the infrastructure (changing the IP address of another ClickHouse server or the server used by dictionaries).
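For example (the statement takes no arguments):
```sql
SYSTEM DROP DNS CACHE;
```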
View File
@ -30,7 +30,7 @@ It is recommended to use the official pre-compiled `deb` packages for Debian or Ubuntu
If you want to use the most recent version, replace `stable` with `testing` (this is recommended for your testing environments).
You can also download and install packages manually from [here](https://repo.clickhouse.tech/deb/stable/main/).
You can also download and install packages manually from [here](https://repo.clickhouse.com/deb/stable/main/).
#### Packages {#packages}
@ -47,8 +47,8 @@ For CentOS, RedHat, and all other rpm-based Linux distributions
``` bash
sudo yum install yum-utils
sudo rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_64
sudo rpm --import https://repo.clickhouse.com/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.com/rpm/stable/x86_64
```
If you want to use the most recent version, replace `stable` with `testing` (this is recommended for your testing environments). `prestable` is sometimes also available.
@ -59,20 +59,20 @@ sudo yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_6
sudo yum install clickhouse-server clickhouse-client
```
You can also download and install packages manually from [here](https://repo.clickhouse.tech/rpm/stable/x86_64).
You can also download and install packages manually from [here](https://repo.clickhouse.com/rpm/stable/x86_64).
### From Tgz Archives {#from-tgz-archives}
It is recommended to use the official pre-compiled `tgz` archives for all Linux distributions where installation of `deb` or `rpm` packages is not possible.
The required version can be downloaded with `curl` or `wget` from the repository https://repo.clickhouse.tech/tgz/. After that, the downloaded archive should be unpacked and installed with the installation scripts. Example for the latest version:
The required version can be downloaded with `curl` or `wget` from the repository https://repo.clickhouse.com/tgz/. After that, the downloaded archive should be unpacked and installed with the installation scripts. Example for the latest version:
``` bash
export LATEST_VERSION=`curl https://api.github.com/repos/ClickHouse/ClickHouse/tags 2>/dev/null | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | head -n 1`
curl -O https://repo.clickhouse.tech/tgz/clickhouse-common-static-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/clickhouse-common-static-dbg-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/clickhouse-server-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/clickhouse-client-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.com/tgz/clickhouse-common-static-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.com/tgz/clickhouse-common-static-dbg-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.com/tgz/clickhouse-server-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.com/tgz/clickhouse-client-$LATEST_VERSION.tgz
tar -xzvf clickhouse-common-static-$LATEST_VERSION.tgz
sudo clickhouse-common-static-$LATEST_VERSION/install/doinst.sh

View File

@ -9,12 +9,16 @@ toc_title: "Introduction"
This section describes how to obtain example datasets and load them into ClickHouse.
For some example datasets, test queries are also available.
- [Anonymized Yandex.Metrica Data](metrica.md)
- [Star Schema Benchmark](star-schema.md)
- [WikiStat](wikistat.md)
- [Terabyte of Click Logs from Criteo](criteo.md)
- [AMPLab Big Data Benchmark](amplab-benchmark.md)
- [New York Taxi Data](nyc-taxi.md)
- [OnTime](ontime.md)
- [Anonymized Yandex.Metrica Data](../../getting-started/example-datasets/metrica.md)
- [Star Schema Benchmark](../../getting-started/example-datasets/star-schema.md)
- [Recipes Dataset](../../getting-started/example-datasets/recipes.md)
- [WikiStat](../../getting-started/example-datasets/wikistat.md)
- [Terabyte of Click Logs from Criteo](../../getting-started/example-datasets/criteo.md)
- [AMPLab Big Data Benchmark](../../getting-started/example-datasets/amplab-benchmark.md)
- [New York Taxi Data](../../getting-started/example-datasets/nyc-taxi.md)
- [OpenSky Network 2020 Air Traffic Dataset](../../getting-started/example-datasets/opensky.md)
- [UK Property Price Paid Data](../../getting-started/example-datasets/uk-price-paid.md)
- [OnTime](../../getting-started/example-datasets/ontime.md)
- [Cell Towers](../../getting-started/example-datasets/cell-towers.md)
[Original article](https://clickhouse.tech/docs/ru/getting_started/example_datasets) <!--hide-->

View File

@ -1 +0,0 @@
../../../en/getting-started/example-datasets/menus.md

View File

@ -0,0 +1,360 @@
---
toc_priority: 21
toc_title: Menus
---
# New York Public Library "What's on the Menu?" Dataset {#menus-dataset}
The dataset was created by the New York Public Library. It contains historical data on the menus of hotels, restaurants, and cafés, with dishes along with their prices.
Source: http://menus.nypl.org/data
The data is publicly available.
The data comes from the library's archive, so it may be incomplete and hard to analyze statistically. Nevertheless, it is still very interesting.
There are only 1.3 million records about menu dishes in total — a very small data volume for ClickHouse, but it is still a good example.
## Download the Dataset {#download-dataset}
Run the command:
```bash
wget https://s3.amazonaws.com/menusdata.nypl.org/gzips/2021_08_01_07_01_17_data.tgz
```
Replace the link with an up-to-date one from http://menus.nypl.org/data if needed.
The archive size is about 35 MB.
## Unpack the Dataset {#unpack-dataset}
```bash
tar xvf 2021_08_01_07_01_17_data.tgz
```
The uncompressed data size is about 150 MB.
The data is normalized and consists of four tables:
- `Menu` — information about menus: the name of the restaurant, the date when the menu was seen, etc.
- `Dish` — information about dishes: the name of the dish along with some characteristics.
- `MenuPage` — information about the pages in the menus, because every page belongs to some menu.
- `MenuItem` — a single menu item: a dish along with its price on some menu page, with links to the dish and to the menu page.
## Create the Tables {#create-tables}
The [Decimal](../../sql-reference/data-types/decimal.md) data type is used to store prices.
```sql
CREATE TABLE dish
(
id UInt32,
name String,
description String,
menus_appeared UInt32,
times_appeared Int32,
first_appeared UInt16,
last_appeared UInt16,
lowest_price Decimal64(3),
highest_price Decimal64(3)
) ENGINE = MergeTree ORDER BY id;
CREATE TABLE menu
(
id UInt32,
name String,
sponsor String,
event String,
venue String,
place String,
physical_description String,
occasion String,
notes String,
call_number String,
keywords String,
language String,
date String,
location String,
location_type String,
currency String,
currency_symbol String,
status String,
page_count UInt16,
dish_count UInt16
) ENGINE = MergeTree ORDER BY id;
CREATE TABLE menu_page
(
id UInt32,
menu_id UInt32,
page_number UInt16,
image_id String,
full_height UInt16,
full_width UInt16,
uuid UUID
) ENGINE = MergeTree ORDER BY id;
CREATE TABLE menu_item
(
id UInt32,
menu_page_id UInt32,
price Decimal64(3),
high_price Decimal64(3),
dish_id UInt32,
created_at DateTime,
updated_at DateTime,
xpos Float64,
ypos Float64
) ENGINE = MergeTree ORDER BY id;
```
## Import the Data {#import-data}
To import the data into ClickHouse, run the commands:
```bash
clickhouse-client --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --query "INSERT INTO dish FORMAT CSVWithNames" < Dish.csv
clickhouse-client --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --query "INSERT INTO menu FORMAT CSVWithNames" < Menu.csv
clickhouse-client --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --query "INSERT INTO menu_page FORMAT CSVWithNames" < MenuPage.csv
clickhouse-client --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --date_time_input_format best_effort --query "INSERT INTO menu_item FORMAT CSVWithNames" < MenuItem.csv
```
Since the data is presented in CSV with a header, it is imported with the [CSVWithNames](../../interfaces/formats.md#csvwithnames) format.
`format_csv_allow_single_quotes` is disabled because only double quotes are used in the data, while single quotes can appear inside values and should not confuse the CSV parser.
[input_format_null_as_default](../../operations/settings/settings.md#settings-input-format-null-as-default) is disabled because there are no [NULL](../../sql-reference/syntax.md#null-literal) values in the data.
Otherwise ClickHouse would try to parse `\N` sequences and could confuse them with `\` in the data.
The [date_time_input_format best_effort](../../operations/settings/settings.md#settings-date_time_input_format) setting allows parsing [DateTime](../../sql-reference/data-types/datetime.md) fields in a wide variety of formats. For example, ISO-8601 without seconds, such as '2000-01-01 01:02', will be recognized. Without this setting, only a fixed date and time format is allowed.
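As a standalone illustration of the same relaxed parsing rules (not part of the import; `parseDateTimeBestEffort` applies the equivalent best-effort logic):
```sql
SELECT parseDateTimeBestEffort('2000-01-01 01:02') AS t;
-- 2000-01-01 01:02:00
```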
## Denormalize the Data {#denormalize-data}
The data is presented in multiple tables in [normalized form](https://ru.wikipedia.org/wiki/%D0%9D%D0%BE%D1%80%D0%BC%D0%B0%D0%BB%D1%8C%D0%BD%D0%B0%D1%8F_%D1%84%D0%BE%D1%80%D0%BC%D0%B0).
This means you have to use a [JOIN](../../sql-reference/statements/select/join.md#select-join) if you want to query, for example, dish names from menu items.
For typical analytical tasks it is far more efficient to work with pre-JOINed data, so that a `JOIN` is not needed for every query. Such data is called denormalized.
Create the table `menu_item_denorm` that will contain all the data JOINed together:
```sql
CREATE TABLE menu_item_denorm
ENGINE = MergeTree ORDER BY (dish_name, created_at)
AS SELECT
price,
high_price,
created_at,
updated_at,
xpos,
ypos,
dish.id AS dish_id,
dish.name AS dish_name,
dish.description AS dish_description,
dish.menus_appeared AS dish_menus_appeared,
dish.times_appeared AS dish_times_appeared,
dish.first_appeared AS dish_first_appeared,
dish.last_appeared AS dish_last_appeared,
dish.lowest_price AS dish_lowest_price,
dish.highest_price AS dish_highest_price,
menu.id AS menu_id,
menu.name AS menu_name,
menu.sponsor AS menu_sponsor,
menu.event AS menu_event,
menu.venue AS menu_venue,
menu.place AS menu_place,
menu.physical_description AS menu_physical_description,
menu.occasion AS menu_occasion,
menu.notes AS menu_notes,
menu.call_number AS menu_call_number,
menu.keywords AS menu_keywords,
menu.language AS menu_language,
menu.date AS menu_date,
menu.location AS menu_location,
menu.location_type AS menu_location_type,
menu.currency AS menu_currency,
menu.currency_symbol AS menu_currency_symbol,
menu.status AS menu_status,
menu.page_count AS menu_page_count,
menu.dish_count AS menu_dish_count
FROM menu_item
JOIN dish ON menu_item.dish_id = dish.id
JOIN menu_page ON menu_item.menu_page_id = menu_page.id
JOIN menu ON menu_page.menu_id = menu.id;
```
## Validate the Data {#validate-data}
Query:
```sql
SELECT count() FROM menu_item_denorm;
```
Result:
```text
┌─count()─┐
│ 1329175 │
└─────────┘
```
## Run Some Queries {#run-queries}
### Averaged Historical Prices of Dishes {#query-averaged-historical-prices}
Query:
```sql
SELECT
round(toUInt32OrZero(extract(menu_date, '^\\d{4}')), -1) AS d,
count(),
round(avg(price), 2),
bar(avg(price), 0, 100, 100)
FROM menu_item_denorm
WHERE (menu_currency = 'Dollars') AND (d > 0) AND (d < 2022)
GROUP BY d
ORDER BY d ASC;
```
Result:
```text
┌────d─┬─count()─┬─round(avg(price), 2)─┬─bar(avg(price), 0, 100, 100)─┐
│ 1850 │ 618 │ 1.5 │ █▍ │
│ 1860 │ 1634 │ 1.29 │ █▎ │
│ 1870 │ 2215 │ 1.36 │ █▎ │
│ 1880 │ 3909 │ 1.01 │ █ │
│ 1890 │ 8837 │ 1.4 │ █▍ │
│ 1900 │ 176292 │ 0.68 │ ▋ │
│ 1910 │ 212196 │ 0.88 │ ▊ │
│ 1920 │ 179590 │ 0.74 │ ▋ │
│ 1930 │ 73707 │ 0.6 │ ▌ │
│ 1940 │ 58795 │ 0.57 │ ▌ │
│ 1950 │ 41407 │ 0.95 │ ▊ │
│ 1960 │ 51179 │ 1.32 │ █▎ │
│ 1970 │ 12914 │ 1.86 │ █▋ │
│ 1980 │ 7268 │ 4.35 │ ████▎ │
│ 1990 │ 11055 │ 6.03 │ ██████ │
│ 2000 │ 2467 │ 11.85 │ ███████████▋ │
│ 2010 │ 597 │ 25.66 │ █████████████████████████▋ │
└──────┴─────────┴──────────────────────┴──────────────────────────────┘
```
Just do not take it too seriously.
### Burger Prices {#query-burger-prices}
Query:
```sql
SELECT
round(toUInt32OrZero(extract(menu_date, '^\\d{4}')), -1) AS d,
count(),
round(avg(price), 2),
bar(avg(price), 0, 50, 100)
FROM menu_item_denorm
WHERE (menu_currency = 'Dollars') AND (d > 0) AND (d < 2022) AND (dish_name ILIKE '%burger%')
GROUP BY d
ORDER BY d ASC;
```
Result:
```text
┌────d─┬─count()─┬─round(avg(price), 2)─┬─bar(avg(price), 0, 50, 100)───────────┐
│ 1880 │ 2 │ 0.42 │ ▋ │
│ 1890 │ 7 │ 0.85 │ █▋ │
│ 1900 │ 399 │ 0.49 │ ▊ │
│ 1910 │ 589 │ 0.68 │ █▎ │
│ 1920 │ 280 │ 0.56 │ █ │
│ 1930 │ 74 │ 0.42 │ ▋ │
│ 1940 │ 119 │ 0.59 │ █▏ │
│ 1950 │ 134 │ 1.09 │ ██▏ │
│ 1960 │ 272 │ 0.92 │ █▋ │
│ 1970 │ 108 │ 1.18 │ ██▎ │
│ 1980 │ 88 │ 2.82 │ █████▋ │
│ 1990 │ 184 │ 3.68 │ ███████▎ │
│ 2000 │ 21 │ 7.14 │ ██████████████▎ │
│ 2010 │ 6 │ 18.42 │ ████████████████████████████████████▋ │
└──────┴─────────┴──────────────────────┴───────────────────────────────────────┘
```
### Vodka {#query-vodka}
Query:
```sql
SELECT
round(toUInt32OrZero(extract(menu_date, '^\\d{4}')), -1) AS d,
count(),
round(avg(price), 2),
bar(avg(price), 0, 50, 100)
FROM menu_item_denorm
WHERE (menu_currency IN ('Dollars', '')) AND (d > 0) AND (d < 2022) AND (dish_name ILIKE '%vodka%')
GROUP BY d
ORDER BY d ASC;
```
Result:
```text
┌────d─┬─count()─┬─round(avg(price), 2)─┬─bar(avg(price), 0, 50, 100)─┐
│ 1910 │ 2 │ 0 │ │
│ 1920 │ 1 │ 0.3 │ ▌ │
│ 1940 │ 21 │ 0.42 │ ▋ │
│ 1950 │ 14 │ 0.59 │ █▏ │
│ 1960 │ 113 │ 2.17 │ ████▎ │
│ 1970 │ 37 │ 0.68 │ █▎ │
│ 1980 │ 19 │ 2.55 │ █████ │
│ 1990 │ 86 │ 3.6 │ ███████▏ │
│ 2000 │ 2 │ 3.98 │ ███████▊ │
└──────┴─────────┴──────────────────────┴─────────────────────────────┘
```
To get vodka, we have to write `ILIKE '%vodka%'`, and that is a good idea.
### Caviar {#query-caviar}
Look at the caviar prices. Also get the name of any dish with caviar.
Query:
```sql
SELECT
round(toUInt32OrZero(extract(menu_date, '^\\d{4}')), -1) AS d,
count(),
round(avg(price), 2),
bar(avg(price), 0, 50, 100),
any(dish_name)
FROM menu_item_denorm
WHERE (menu_currency IN ('Dollars', '')) AND (d > 0) AND (d < 2022) AND (dish_name ILIKE '%caviar%')
GROUP BY d
ORDER BY d ASC;
```
Result:
```text
┌────d─┬─count()─┬─round(avg(price), 2)─┬─bar(avg(price), 0, 50, 100)──────┬─any(dish_name)──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ 1090 │ 1 │ 0 │ │ Caviar │
│ 1880 │ 3 │ 0 │ │ Caviar │
│ 1890 │ 39 │ 0.59 │ █▏ │ Butter and caviar │
│ 1900 │ 1014 │ 0.34 │ ▋ │ Anchovy Caviar on Toast │
│ 1910 │ 1588 │ 1.35 │ ██▋ │ 1/1 Brötchen Caviar │
│ 1920 │ 927 │ 1.37 │ ██▋ │ ASTRAKAN CAVIAR │
│ 1930 │ 289 │ 1.91 │ ███▋ │ Astrachan caviar │
│ 1940 │ 201 │ 0.83 │ █▋ │ (SPECIAL) Domestic Caviar Sandwich │
│ 1950 │ 81 │ 2.27 │ ████▌ │ Beluga Caviar │
│ 1960 │ 126 │ 2.21 │ ████▍ │ Beluga Caviar │
│ 1970 │ 105 │ 0.95 │ █▊ │ BELUGA MALOSSOL CAVIAR AMERICAN DRESSING │
│ 1980 │ 12 │ 7.22 │ ██████████████▍ │ Authentic Iranian Beluga Caviar the world's finest black caviar presented in ice garni and a sampling of chilled 100° Russian vodka │
│ 1990 │ 74 │ 14.42 │ ████████████████████████████▋ │ Avocado Salad, Fresh cut avocado with caviare │
│ 2000 │ 3 │ 7.82 │ ███████████████▋ │ Aufgeschlagenes Kartoffelsueppchen mit Forellencaviar │
│ 2010 │ 6 │ 15.58 │ ███████████████████████████████▏ │ "OYSTERS AND PEARLS" "Sabayon" of Pearl Tapioca with Island Creek Oysters and Russian Sevruga Caviar │
└──────┴─────────┴──────────────────────┴──────────────────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```
At least they have caviar with vodka. Very nice.
## Online Playground {#playground}
The dataset is also available in the interactive [Online Playground](https://gh-api.clickhouse.tech/play?user=play#U0VMRUNUCiAgICByb3VuZCh0b1VJbnQzMk9yWmVybyhleHRyYWN0KG1lbnVfZGF0ZSwgJ15cXGR7NH0nKSksIC0xKSBBUyBkLAogICAgY291bnQoKSwKICAgIHJvdW5kKGF2ZyhwcmljZSksIDIpLAogICAgYmFyKGF2ZyhwcmljZSksIDAsIDUwLCAxMDApLAogICAgYW55KGRpc2hfbmFtZSkKRlJPTSBtZW51X2l0ZW1fZGVub3JtCldIRVJFIChtZW51X2N1cnJlbmN5IElOICgnRG9sbGFycycsICcnKSkgQU5EIChkID4gMCkgQU5EIChkIDwgMjAyMikgQU5EIChkaXNoX25hbWUgSUxJS0UgJyVjYXZpYXIlJykKR1JPVVAgQlkgZApPUkRFUiBCWSBkIEFTQw==).

View File

@ -1 +0,0 @@
../../../en/getting-started/example-datasets/opensky.md

View File

@ -0,0 +1,422 @@
---
toc_priority: 20
toc_title: OpenSky Network 2020 Air Traffic Dataset
---
# OpenSky Network 2020 Air Traffic Dataset {#opensky}
"The data in this dataset is derived and filtered from the full OpenSky dataset to illustrate the development of air traffic during the COVID-19 pandemic. It spans all flights seen by the network's more than 2500 members since 1 January 2019. More data will be periodically included in the dataset until the end of the COVID-19 pandemic."
Source: https://zenodo.org/record/5092942#.YRBCyTpRXYd
Martin Strohmeier, Xavier Olive, Jannis Lübbe, Matthias Schäfer, and Vincent Lenders
"Crowdsourced air traffic data from the OpenSky Network 20192020"
Earth System Science Data 13(2), 2021
https://doi.org/10.5194/essd-13-357-2021
## Download the Dataset {#download-dataset}
Run the command:
```bash
wget -O- https://zenodo.org/record/5092942 | grep -oP 'https://zenodo.org/record/5092942/files/flightlist_\d+_\d+\.csv\.gz' | xargs wget
```
The download will take about 2 minutes with a good internet connection. There are 30 files, 4.3 GB in total.
## Create the Table {#create-table}
```sql
CREATE TABLE opensky
(
callsign String,
number String,
icao24 String,
registration String,
typecode String,
origin String,
destination String,
firstseen DateTime,
lastseen DateTime,
day DateTime,
latitude_1 Float64,
longitude_1 Float64,
altitude_1 Float64,
latitude_2 Float64,
longitude_2 Float64,
altitude_2 Float64
) ENGINE = MergeTree ORDER BY (origin, destination, callsign);
```
## Import the Data into ClickHouse {#import-data}
Load the data into ClickHouse in parallel streams:
```bash
ls -1 flightlist_*.csv.gz | xargs -P100 -I{} bash -c 'gzip -c -d "{}" | clickhouse-client --date_time_input_format best_effort --query "INSERT INTO opensky FORMAT CSVWithNames"'
```
- The list of files (`ls -1 flightlist_*.csv.gz`) is passed to `xargs` for parallel processing.
- `xargs -P100` allows up to 100 parallel workers, but since there are only 30 files, the actual number of workers is just 30.
- For every file, `xargs` runs a script via `bash -c`. The script uses `{}` as a placeholder, and `xargs` substitutes the file name for it (we requested this from `xargs` with `-I{}`).
- The script decompresses the file (`gzip -c -d "{}"`) to standard output (the `-c` parameter) and redirects the output to `clickhouse-client`.
- The parser option [--date_time_input_format best_effort](../../operations/settings/settings.md#settings-date_time_input_format) is specified so that [DateTime](../../sql-reference/data-types/datetime.md) fields in ISO-8601 format with time zone offsets are recognized.
In the end, `clickhouse-client` inserts the data into the `opensky` table. The input data is imported in the [CSVWithNames](../../interfaces/formats.md#csvwithnames) format.
Parallel loading takes about 24 seconds.
Alternatively, you can use the sequential variant:
```bash
for file in flightlist_*.csv.gz; do gzip -c -d "$file" | clickhouse-client --date_time_input_format best_effort --query "INSERT INTO opensky FORMAT CSVWithNames"; done
```
## Validate the Data {#validate-data}
Query:
```sql
SELECT count() FROM opensky;
```
Result:
```text
┌──count()─┐
│ 66010819 │
└──────────┘
```
Check that the size of the dataset in ClickHouse is only 2.66 GiB.
Query:
```sql
SELECT formatReadableSize(total_bytes) FROM system.tables WHERE name = 'opensky';
```
Result:
```text
┌─formatReadableSize(total_bytes)─┐
│ 2.66 GiB │
└─────────────────────────────────┘
```
## Run Some Queries {#run-queries}
The total distance traveled is 68 billion kilometers.
Query:
```sql
SELECT formatReadableQuantity(sum(geoDistance(longitude_1, latitude_1, longitude_2, latitude_2)) / 1000) FROM opensky;
```
Result:
```text
┌─formatReadableQuantity(divide(sum(geoDistance(longitude_1, latitude_1, longitude_2, latitude_2)), 1000))─┐
│ 68.72 billion │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```
The average flight distance is around 1000 km.
Query:
```sql
SELECT avg(geoDistance(longitude_1, latitude_1, longitude_2, latitude_2)) FROM opensky;
```
Result:
```text
┌─avg(geoDistance(longitude_1, latitude_1, longitude_2, latitude_2))─┐
│ 1041090.6465708319 │
└────────────────────────────────────────────────────────────────────┘
```
### The Busiest Origin Airports and the Average Distance Traveled {#busy-airports-average-distance}
Query:
```sql
SELECT
origin,
count(),
round(avg(geoDistance(longitude_1, latitude_1, longitude_2, latitude_2))) AS distance,
bar(distance, 0, 10000000, 100) AS bar
FROM opensky
WHERE origin != ''
GROUP BY origin
ORDER BY count() DESC
LIMIT 100;
```
Result:
```text
┌─origin─┬─count()─┬─distance─┬─bar────────────────────────────────────┐
1. │ KORD │ 745007 │ 1546108 │ ███████████████▍ │
2. │ KDFW │ 696702 │ 1358721 │ █████████████▌ │
3. │ KATL │ 667286 │ 1169661 │ ███████████▋ │
4. │ KDEN │ 582709 │ 1287742 │ ████████████▊ │
5. │ KLAX │ 581952 │ 2628393 │ ██████████████████████████▎ │
6. │ KLAS │ 447789 │ 1336967 │ █████████████▎ │
7. │ KPHX │ 428558 │ 1345635 │ █████████████▍ │
8. │ KSEA │ 412592 │ 1757317 │ █████████████████▌ │
9. │ KCLT │ 404612 │ 880355 │ ████████▋ │
10. │ VIDP │ 363074 │ 1445052 │ ██████████████▍ │
11. │ EDDF │ 362643 │ 2263960 │ ██████████████████████▋ │
12. │ KSFO │ 361869 │ 2445732 │ ████████████████████████▍ │
13. │ KJFK │ 349232 │ 2996550 │ █████████████████████████████▊ │
14. │ KMSP │ 346010 │ 1287328 │ ████████████▋ │
15. │ LFPG │ 344748 │ 2206203 │ ██████████████████████ │
16. │ EGLL │ 341370 │ 3216593 │ ████████████████████████████████▏ │
17. │ EHAM │ 340272 │ 2116425 │ █████████████████████▏ │
18. │ KEWR │ 337696 │ 1826545 │ ██████████████████▎ │
19. │ KPHL │ 320762 │ 1291761 │ ████████████▊ │
20. │ OMDB │ 308855 │ 2855706 │ ████████████████████████████▌ │
21. │ UUEE │ 307098 │ 1555122 │ ███████████████▌ │
22. │ KBOS │ 304416 │ 1621675 │ ████████████████▏ │
23. │ LEMD │ 291787 │ 1695097 │ ████████████████▊ │
24. │ YSSY │ 272979 │ 1875298 │ ██████████████████▋ │
25. │ KMIA │ 265121 │ 1923542 │ ███████████████████▏ │
26. │ ZGSZ │ 263497 │ 745086 │ ███████▍ │
27. │ EDDM │ 256691 │ 1361453 │ █████████████▌ │
28. │ WMKK │ 254264 │ 1626688 │ ████████████████▎ │
29. │ CYYZ │ 251192 │ 2175026 │ █████████████████████▋ │
30. │ KLGA │ 248699 │ 1106935 │ ███████████ │
31. │ VHHH │ 248473 │ 3457658 │ ██████████████████████████████████▌ │
32. │ RJTT │ 243477 │ 1272744 │ ████████████▋ │
33. │ KBWI │ 241440 │ 1187060 │ ███████████▋ │
34. │ KIAD │ 239558 │ 1683485 │ ████████████████▋ │
35. │ KIAH │ 234202 │ 1538335 │ ███████████████▍ │
36. │ KFLL │ 223447 │ 1464410 │ ██████████████▋ │
37. │ KDAL │ 212055 │ 1082339 │ ██████████▋ │
38. │ KDCA │ 207883 │ 1013359 │ ██████████▏ │
39. │ LIRF │ 207047 │ 1427965 │ ██████████████▎ │
40. │ PANC │ 206007 │ 2525359 │ █████████████████████████▎ │
41. │ LTFJ │ 205415 │ 860470 │ ████████▌ │
42. │ KDTW │ 204020 │ 1106716 │ ███████████ │
43. │ VABB │ 201679 │ 1300865 │ █████████████ │
44. │ OTHH │ 200797 │ 3759544 │ █████████████████████████████████████▌ │
45. │ KMDW │ 200796 │ 1232551 │ ████████████▎ │
46. │ KSAN │ 198003 │ 1495195 │ ██████████████▊ │
47. │ KPDX │ 197760 │ 1269230 │ ████████████▋ │
48. │ SBGR │ 197624 │ 2041697 │ ████████████████████▍ │
49. │ VOBL │ 189011 │ 1040180 │ ██████████▍ │
50. │ LEBL │ 188956 │ 1283190 │ ████████████▋ │
51. │ YBBN │ 188011 │ 1253405 │ ████████████▌ │
52. │ LSZH │ 187934 │ 1572029 │ ███████████████▋ │
53. │ YMML │ 187643 │ 1870076 │ ██████████████████▋ │
54. │ RCTP │ 184466 │ 2773976 │ ███████████████████████████▋ │
55. │ KSNA │ 180045 │ 778484 │ ███████▋ │
56. │ EGKK │ 176420 │ 1694770 │ ████████████████▊ │
57. │ LOWW │ 176191 │ 1274833 │ ████████████▋ │
58. │ UUDD │ 176099 │ 1368226 │ █████████████▋ │
59. │ RKSI │ 173466 │ 3079026 │ ██████████████████████████████▋ │
60. │ EKCH │ 172128 │ 1229895 │ ████████████▎ │
61. │ KOAK │ 171119 │ 1114447 │ ███████████▏ │
62. │ RPLL │ 170122 │ 1440735 │ ██████████████▍ │
63. │ KRDU │ 167001 │ 830521 │ ████████▎ │
64. │ KAUS │ 164524 │ 1256198 │ ████████████▌ │
65. │ KBNA │ 163242 │ 1022726 │ ██████████▏ │
66. │ KSDF │ 162655 │ 1380867 │ █████████████▋ │
67. │ ENGM │ 160732 │ 910108 │ █████████ │
68. │ LIMC │ 160696 │ 1564620 │ ███████████████▋ │
69. │ KSJC │ 159278 │ 1081125 │ ██████████▋ │
70. │ KSTL │ 157984 │ 1026699 │ ██████████▎ │
71. │ UUWW │ 156811 │ 1261155 │ ████████████▌ │
72. │ KIND │ 153929 │ 987944 │ █████████▊ │
73. │ ESSA │ 153390 │ 1203439 │ ████████████ │
74. │ KMCO │ 153351 │ 1508657 │ ███████████████ │
75. │ KDVT │ 152895 │ 74048 │ ▋ │
76. │ VTBS │ 152645 │ 2255591 │ ██████████████████████▌ │
77. │ CYVR │ 149574 │ 2027413 │ ████████████████████▎ │
78. │ EIDW │ 148723 │ 1503985 │ ███████████████ │
79. │ LFPO │ 143277 │ 1152964 │ ███████████▌ │
80. │ EGSS │ 140830 │ 1348183 │ █████████████▍ │
81. │ KAPA │ 140776 │ 420441 │ ████▏ │
82. │ KHOU │ 138985 │ 1068806 │ ██████████▋ │
83. │ KTPA │ 138033 │ 1338223 │ █████████████▍ │
84. │ KFFZ │ 137333 │ 55397 │ ▌ │
85. │ NZAA │ 136092 │ 1581264 │ ███████████████▋ │
86. │ YPPH │ 133916 │ 1271550 │ ████████████▋ │
87. │ RJBB │ 133522 │ 1805623 │ ██████████████████ │
88. │ EDDL │ 133018 │ 1265919 │ ████████████▋ │
89. │ ULLI │ 130501 │ 1197108 │ ███████████▊ │
90. │ KIWA │ 127195 │ 250876 │ ██▌ │
91. │ KTEB │ 126969 │ 1189414 │ ███████████▊ │
92. │ VOMM │ 125616 │ 1127757 │ ███████████▎ │
93. │ LSGG │ 123998 │ 1049101 │ ██████████▍ │
94. │ LPPT │ 122733 │ 1779187 │ █████████████████▋ │
95. │ WSSS │ 120493 │ 3264122 │ ████████████████████████████████▋ │
96. │ EBBR │ 118539 │ 1579939 │ ███████████████▋ │
97. │ VTBD │ 118107 │ 661627 │ ██████▌ │
98. │ KVNY │ 116326 │ 692960 │ ██████▊ │
99. │ EDDT │ 115122 │ 941740 │ █████████▍ │
100. │ EFHK │ 114860 │ 1629143 │ ████████████████▎ │
└────────┴─────────┴──────────┴────────────────────────────────────────┘
```
### Number of Flights from Three Major Moscow Airports, Weekly {#flights-from-moscow}
Query:
```sql
SELECT
toMonday(day) AS k,
count() AS c,
bar(c, 0, 10000, 100) AS bar
FROM opensky
WHERE origin IN ('UUEE', 'UUDD', 'UUWW')
GROUP BY k
ORDER BY k ASC;
```
Result:
```text
┌──────────k─┬────c─┬─bar──────────────────────────────────────────────────────────────────────────┐
1. │ 2018-12-31 │ 5248 │ ████████████████████████████████████████████████████▍ │
2. │ 2019-01-07 │ 6302 │ ███████████████████████████████████████████████████████████████ │
3. │ 2019-01-14 │ 5701 │ █████████████████████████████████████████████████████████ │
4. │ 2019-01-21 │ 5638 │ ████████████████████████████████████████████████████████▍ │
5. │ 2019-01-28 │ 5731 │ █████████████████████████████████████████████████████████▎ │
6. │ 2019-02-04 │ 5683 │ ████████████████████████████████████████████████████████▋ │
7. │ 2019-02-11 │ 5759 │ █████████████████████████████████████████████████████████▌ │
8. │ 2019-02-18 │ 5736 │ █████████████████████████████████████████████████████████▎ │
9. │ 2019-02-25 │ 5873 │ ██████████████████████████████████████████████████████████▋ │
10. │ 2019-03-04 │ 5965 │ ███████████████████████████████████████████████████████████▋ │
11. │ 2019-03-11 │ 5900 │ ███████████████████████████████████████████████████████████ │
12. │ 2019-03-18 │ 5823 │ ██████████████████████████████████████████████████████████▏ │
13. │ 2019-03-25 │ 5899 │ ██████████████████████████████████████████████████████████▊ │
14. │ 2019-04-01 │ 6043 │ ████████████████████████████████████████████████████████████▍ │
15. │ 2019-04-08 │ 6098 │ ████████████████████████████████████████████████████████████▊ │
16. │ 2019-04-15 │ 6196 │ █████████████████████████████████████████████████████████████▊ │
17. │ 2019-04-22 │ 6486 │ ████████████████████████████████████████████████████████████████▋ │
18. │ 2019-04-29 │ 6682 │ ██████████████████████████████████████████████████████████████████▋ │
19. │ 2019-05-06 │ 6739 │ ███████████████████████████████████████████████████████████████████▍ │
20. │ 2019-05-13 │ 6600 │ ██████████████████████████████████████████████████████████████████ │
21. │ 2019-05-20 │ 6575 │ █████████████████████████████████████████████████████████████████▋ │
22. │ 2019-05-27 │ 6786 │ ███████████████████████████████████████████████████████████████████▋ │
23. │ 2019-06-03 │ 6872 │ ████████████████████████████████████████████████████████████████████▋ │
24. │ 2019-06-10 │ 7045 │ ██████████████████████████████████████████████████████████████████████▍ │
25. │ 2019-06-17 │ 7045 │ ██████████████████████████████████████████████████████████████████████▍ │
26. │ 2019-06-24 │ 6852 │ ████████████████████████████████████████████████████████████████████▌ │
27. │ 2019-07-01 │ 7248 │ ████████████████████████████████████████████████████████████████████████▍ │
28. │ 2019-07-08 │ 7284 │ ████████████████████████████████████████████████████████████████████████▋ │
29. │ 2019-07-15 │ 7142 │ ███████████████████████████████████████████████████████████████████████▍ │
30. │ 2019-07-22 │ 7108 │ ███████████████████████████████████████████████████████████████████████ │
31. │ 2019-07-29 │ 7251 │ ████████████████████████████████████████████████████████████████████████▌ │
32. │ 2019-08-05 │ 7403 │ ██████████████████████████████████████████████████████████████████████████ │
33. │ 2019-08-12 │ 7457 │ ██████████████████████████████████████████████████████████████████████████▌ │
34. │ 2019-08-19 │ 7502 │ ███████████████████████████████████████████████████████████████████████████ │
35. │ 2019-08-26 │ 7540 │ ███████████████████████████████████████████████████████████████████████████▍ │
36. │ 2019-09-02 │ 7237 │ ████████████████████████████████████████████████████████████████████████▎ │
37. │ 2019-09-09 │ 7328 │ █████████████████████████████████████████████████████████████████████████▎ │
38. │ 2019-09-16 │ 5566 │ ███████████████████████████████████████████████████████▋ │
39. │ 2019-09-23 │ 7049 │ ██████████████████████████████████████████████████████████████████████▍ │
40. │ 2019-09-30 │ 6880 │ ████████████████████████████████████████████████████████████████████▋ │
41. │ 2019-10-07 │ 6518 │ █████████████████████████████████████████████████████████████████▏ │
42. │ 2019-10-14 │ 6688 │ ██████████████████████████████████████████████████████████████████▊ │
43. │ 2019-10-21 │ 6667 │ ██████████████████████████████████████████████████████████████████▋ │
44. │ 2019-10-28 │ 6303 │ ███████████████████████████████████████████████████████████████ │
45. │ 2019-11-04 │ 6298 │ ██████████████████████████████████████████████████████████████▊ │
46. │ 2019-11-11 │ 6137 │ █████████████████████████████████████████████████████████████▎ │
47. │ 2019-11-18 │ 6051 │ ████████████████████████████████████████████████████████████▌ │
48. │ 2019-11-25 │ 5820 │ ██████████████████████████████████████████████████████████▏ │
49. │ 2019-12-02 │ 5942 │ ███████████████████████████████████████████████████████████▍ │
50. │ 2019-12-09 │ 4891 │ ████████████████████████████████████████████████▊ │
51. │ 2019-12-16 │ 5682 │ ████████████████████████████████████████████████████████▋ │
52. │ 2019-12-23 │ 6111 │ █████████████████████████████████████████████████████████████ │
53. │ 2019-12-30 │ 5870 │ ██████████████████████████████████████████████████████████▋ │
54. │ 2020-01-06 │ 5953 │ ███████████████████████████████████████████████████████████▌ │
55. │ 2020-01-13 │ 5698 │ ████████████████████████████████████████████████████████▊ │
56. │ 2020-01-20 │ 5339 │ █████████████████████████████████████████████████████▍ │
57. │ 2020-01-27 │ 5566 │ ███████████████████████████████████████████████████████▋ │
58. │ 2020-02-03 │ 5801 │ ██████████████████████████████████████████████████████████ │
59. │ 2020-02-10 │ 5692 │ ████████████████████████████████████████████████████████▊ │
60. │ 2020-02-17 │ 5912 │ ███████████████████████████████████████████████████████████ │
61. │ 2020-02-24 │ 6031 │ ████████████████████████████████████████████████████████████▎ │
62. │ 2020-03-02 │ 6105 │ █████████████████████████████████████████████████████████████ │
63. │ 2020-03-09 │ 5823 │ ██████████████████████████████████████████████████████████▏ │
64. │ 2020-03-16 │ 4659 │ ██████████████████████████████████████████████▌ │
65. │ 2020-03-23 │ 3720 │ █████████████████████████████████████▏ │
66. │ 2020-03-30 │ 1720 │ █████████████████▏ │
67. │ 2020-04-06 │ 849 │ ████████▍ │
68. │ 2020-04-13 │ 710 │ ███████ │
69. │ 2020-04-20 │ 725 │ ███████▏ │
70. │ 2020-04-27 │ 920 │ █████████▏ │
71. │ 2020-05-04 │ 859 │ ████████▌ │
72. │ 2020-05-11 │ 1047 │ ██████████▍ │
73. │ 2020-05-18 │ 1135 │ ███████████▎ │
74. │ 2020-05-25 │ 1266 │ ████████████▋ │
75. │ 2020-06-01 │ 1793 │ █████████████████▊ │
76. │ 2020-06-08 │ 1979 │ ███████████████████▋ │
77. │ 2020-06-15 │ 2297 │ ██████████████████████▊ │
78. │ 2020-06-22 │ 2788 │ ███████████████████████████▊ │
79. │ 2020-06-29 │ 3389 │ █████████████████████████████████▊ │
80. │ 2020-07-06 │ 3545 │ ███████████████████████████████████▍ │
81. │ 2020-07-13 │ 3569 │ ███████████████████████████████████▋ │
82. │ 2020-07-20 │ 3784 │ █████████████████████████████████████▋ │
83. │ 2020-07-27 │ 3960 │ ███████████████████████████████████████▌ │
84. │ 2020-08-03 │ 4323 │ ███████████████████████████████████████████▏ │
85. │ 2020-08-10 │ 4581 │ █████████████████████████████████████████████▋ │
86. │ 2020-08-17 │ 4791 │ ███████████████████████████████████████████████▊ │
87. │ 2020-08-24 │ 4928 │ █████████████████████████████████████████████████▎ │
88. │ 2020-08-31 │ 4687 │ ██████████████████████████████████████████████▋ │
89. │ 2020-09-07 │ 4643 │ ██████████████████████████████████████████████▍ │
90. │ 2020-09-14 │ 4594 │ █████████████████████████████████████████████▊ │
91. │ 2020-09-21 │ 4478 │ ████████████████████████████████████████████▋ │
92. │ 2020-09-28 │ 4382 │ ███████████████████████████████████████████▋ │
93. │ 2020-10-05 │ 4261 │ ██████████████████████████████████████████▌ │
94. │ 2020-10-12 │ 4243 │ ██████████████████████████████████████████▍ │
95. │ 2020-10-19 │ 3941 │ ███████████████████████████████████████▍ │
96. │ 2020-10-26 │ 3616 │ ████████████████████████████████████▏ │
97. │ 2020-11-02 │ 3586 │ ███████████████████████████████████▋ │
98. │ 2020-11-09 │ 3403 │ ██████████████████████████████████ │
99. │ 2020-11-16 │ 3336 │ █████████████████████████████████▎ │
100. │ 2020-11-23 │ 3230 │ ████████████████████████████████▎ │
101. │ 2020-11-30 │ 3183 │ ███████████████████████████████▋ │
102. │ 2020-12-07 │ 3285 │ ████████████████████████████████▋ │
103. │ 2020-12-14 │ 3367 │ █████████████████████████████████▋ │
104. │ 2020-12-21 │ 3748 │ █████████████████████████████████████▍ │
105. │ 2020-12-28 │ 3986 │ ███████████████████████████████████████▋ │
106. │ 2021-01-04 │ 3906 │ ███████████████████████████████████████ │
107. │ 2021-01-11 │ 3425 │ ██████████████████████████████████▎ │
108. │ 2021-01-18 │ 3144 │ ███████████████████████████████▍ │
109. │ 2021-01-25 │ 3115 │ ███████████████████████████████▏ │
110. │ 2021-02-01 │ 3285 │ ████████████████████████████████▋ │
111. │ 2021-02-08 │ 3321 │ █████████████████████████████████▏ │
112. │ 2021-02-15 │ 3475 │ ██████████████████████████████████▋ │
113. │ 2021-02-22 │ 3549 │ ███████████████████████████████████▍ │
114. │ 2021-03-01 │ 3755 │ █████████████████████████████████████▌ │
115. │ 2021-03-08 │ 3080 │ ██████████████████████████████▋ │
116. │ 2021-03-15 │ 3789 │ █████████████████████████████████████▊ │
117. │ 2021-03-22 │ 3804 │ ██████████████████████████████████████ │
118. │ 2021-03-29 │ 4238 │ ██████████████████████████████████████████▍ │
119. │ 2021-04-05 │ 4307 │ ███████████████████████████████████████████ │
120. │ 2021-04-12 │ 4225 │ ██████████████████████████████████████████▎ │
121. │ 2021-04-19 │ 4391 │ ███████████████████████████████████████████▊ │
122. │ 2021-04-26 │ 4868 │ ████████████████████████████████████████████████▋ │
123. │ 2021-05-03 │ 4977 │ █████████████████████████████████████████████████▋ │
124. │ 2021-05-10 │ 5164 │ ███████████████████████████████████████████████████▋ │
125. │ 2021-05-17 │ 4986 │ █████████████████████████████████████████████████▋ │
126. │ 2021-05-24 │ 5024 │ ██████████████████████████████████████████████████▏ │
127. │ 2021-05-31 │ 4824 │ ████████████████████████████████████████████████▏ │
128. │ 2021-06-07 │ 5652 │ ████████████████████████████████████████████████████████▌ │
129. │ 2021-06-14 │ 5613 │ ████████████████████████████████████████████████████████▏ │
130. │ 2021-06-21 │ 6061 │ ████████████████████████████████████████████████████████████▌ │
131. │ 2021-06-28 │ 2554 │ █████████████████████████▌ │
└────────────┴──────┴──────────────────────────────────────────────────────────────────────────────┘
```
### Online Playground {#playground}
You can test other queries against this dataset using the interactive [Online Playground](https://gh-api.clickhouse.tech/play?user=play). For example, [like this](https://gh-api.clickhouse.tech/play?user=play#U0VMRUNUCiAgICBvcmlnaW4sCiAgICBjb3VudCgpLAogICAgcm91bmQoYXZnKGdlb0Rpc3RhbmNlKGxvbmdpdHVkZV8xLCBsYXRpdHVkZV8xLCBsb25naXR1ZGVfMiwgbGF0aXR1ZGVfMikpKSBBUyBkaXN0YW5jZSwKICAgIGJhcihkaXN0YW5jZSwgMCwgMTAwMDAwMDAsIDEwMCkgQVMgYmFyCkZST00gb3BlbnNreQpXSEVSRSBvcmlnaW4gIT0gJycKR1JPVVAgQlkgb3JpZ2luCk9SREVSIEJZIGNvdW50KCkgREVTQwpMSU1JVCAxMDA=). Note, however, that you cannot create temporary tables there.

View File

@ -1 +0,0 @@
../../../en/getting-started/example-datasets/uk-price-paid.md

View File

@ -0,0 +1,650 @@
---
toc_priority: 20
toc_title: UK Property Price Paid
---
# UK Property Price Paid Dataset {#uk-property-price-paid}
The dataset contains data about prices paid for real estate in England and Wales. The data is available since 1995.
The size of the dataset in uncompressed form is about 4 GiB, and it will take about 278 MiB in ClickHouse.
Source: https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads
Description of the table fields: https://www.gov.uk/guidance/about-the-price-paid-data
The dataset contains HM Land Registry data © Crown copyright and database right 2021. This data is licensed under the Open Government Licence v3.0.
## Download the Dataset {#download-dataset}
Run the command:
```bash
wget http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-complete.csv
```
The download will take about 2 minutes with a good internet connection.
## Create the Table {#create-table}
```sql
CREATE TABLE uk_price_paid
(
price UInt32,
date Date,
postcode1 LowCardinality(String),
postcode2 LowCardinality(String),
type Enum8('terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4, 'other' = 0),
is_new UInt8,
duration Enum8('freehold' = 1, 'leasehold' = 2, 'unknown' = 0),
addr1 String,
addr2 String,
street LowCardinality(String),
locality LowCardinality(String),
town LowCardinality(String),
district LowCardinality(String),
county LowCardinality(String),
category UInt8
) ENGINE = MergeTree ORDER BY (postcode1, postcode2, addr1, addr2);
```
## Preprocess and Import the Data {#preprocess-import-data}
In this example, `clickhouse-local` is used to preprocess the data and `clickhouse-client` is used to import it.
The structure of the source CSV file and a query for preprocessing the data are specified for `clickhouse-local`.
The preprocessing includes:
- splitting the postcode into two different columns, `postcode1` and `postcode2`, which is better for storage and queries;
- converting the `time` field to date, because it only contains 00:00 time;
- ignoring the [UUid](../../sql-reference/data-types/uuid.md) field, because it is not needed for analysis;
- converting the `type` and `duration` fields to more readable `Enum` fields with the [transform](../../sql-reference/functions/other-functions.md#transform) function (see the sketch after this list);
- converting the `is_new` and `category` fields from single-character strings (`Y`/`N` and `A`/`B`) to a [UInt8](../../sql-reference/data-types/int-uint.md#uint8-uint16-uint32-uint64-uint256-int8-int16-int32-int64-int128-int256) field with values 0 and 1 respectively.
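As a standalone sketch of how `transform` maps the single-character codes (illustrative input only, independent of the full command below):
```sql
SELECT transform('T',
    ['T', 'S', 'D', 'F', 'O'],
    ['terraced', 'semi-detached', 'detached', 'flat', 'other']) AS type;
-- Returns 'terraced'; unknown codes pass through unchanged.
```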
The preprocessed data is piped to `clickhouse-client` and inserted into the ClickHouse table in streaming fashion:
```bash
clickhouse-local --input-format CSV --structure '
uuid String,
price UInt32,
time DateTime,
postcode String,
a String,
b String,
c String,
addr1 String,
addr2 String,
street String,
locality String,
town String,
district String,
county String,
d String,
e String
' --query "
WITH splitByChar(' ', postcode) AS p
SELECT
price,
toDate(time) AS date,
p[1] AS postcode1,
p[2] AS postcode2,
transform(a, ['T', 'S', 'D', 'F', 'O'], ['terraced', 'semi-detached', 'detached', 'flat', 'other']) AS type,
b = 'Y' AS is_new,
transform(c, ['F', 'L', 'U'], ['freehold', 'leasehold', 'unknown']) AS duration,
addr1,
addr2,
street,
locality,
town,
district,
county,
d = 'B' AS category
FROM table" --date_time_input_format best_effort < pp-complete.csv | clickhouse-client --query "INSERT INTO uk_price_paid FORMAT TSV"
```
The query takes about 40 seconds to complete.
## Validate the Data {#validate-data}
Query:
```sql
SELECT count() FROM uk_price_paid;
```
Result:
```text
┌──count()─┐
│ 26321785 │
└──────────┘
```
The size of the dataset in ClickHouse is just 278 MiB; check it.
Query:
```sql
SELECT formatReadableSize(total_bytes) FROM system.tables WHERE name = 'uk_price_paid';
```
Result:
```text
┌─formatReadableSize(total_bytes)─┐
│ 278.80 MiB │
└─────────────────────────────────┘
```
## Run Some Queries {#run-queries}
### Query 1. Average Price per Year {#average-price}
Query:
```sql
SELECT toYear(date) AS year, round(avg(price)) AS price, bar(price, 0, 1000000, 80) FROM uk_price_paid GROUP BY year ORDER BY year;
```
Result:
```text
┌─year─┬──price─┬─bar(round(avg(price)), 0, 1000000, 80)─┐
│ 1995 │ 67932 │ █████▍ │
│ 1996 │ 71505 │ █████▋ │
│ 1997 │ 78532 │ ██████▎ │
│ 1998 │ 85436 │ ██████▋ │
│ 1999 │ 96037 │ ███████▋ │
│ 2000 │ 107479 │ ████████▌ │
│ 2001 │ 118885 │ █████████▌ │
│ 2002 │ 137941 │ ███████████ │
│ 2003 │ 155889 │ ████████████▍ │
│ 2004 │ 178885 │ ██████████████▎ │
│ 2005 │ 189351 │ ███████████████▏ │
│ 2006 │ 203528 │ ████████████████▎ │
│ 2007 │ 219378 │ █████████████████▌ │
│ 2008 │ 217056 │ █████████████████▎ │
│ 2009 │ 213419 │ █████████████████ │
│ 2010 │ 236109 │ ██████████████████▊ │
│ 2011 │ 232805 │ ██████████████████▌ │
│ 2012 │ 238367 │ ███████████████████ │
│ 2013 │ 256931 │ ████████████████████▌ │
│ 2014 │ 279915 │ ██████████████████████▍ │
│ 2015 │ 297266 │ ███████████████████████▋ │
│ 2016 │ 313201 │ █████████████████████████ │
│ 2017 │ 346097 │ ███████████████████████████▋ │
│ 2018 │ 350116 │ ████████████████████████████ │
│ 2019 │ 351013 │ ████████████████████████████ │
│ 2020 │ 369420 │ █████████████████████████████▌ │
│ 2021 │ 386903 │ ██████████████████████████████▊ │
└──────┴────────┴────────────────────────────────────────┘
```
### Query 2. Average Price per Year in London {#average-price-london}
Query:
```sql
SELECT toYear(date) AS year, round(avg(price)) AS price, bar(price, 0, 2000000, 100) FROM uk_price_paid WHERE town = 'LONDON' GROUP BY year ORDER BY year;
```
Result:
```text
┌─year─┬───price─┬─bar(round(avg(price)), 0, 2000000, 100)───────────────┐
│ 1995 │ 109116 │ █████▍ │
│ 1996 │ 118667 │ █████▊ │
│ 1997 │ 136518 │ ██████▋ │
│ 1998 │ 152983 │ ███████▋ │
│ 1999 │ 180637 │ █████████ │
│ 2000 │ 215838 │ ██████████▋ │
│ 2001 │ 232994 │ ███████████▋ │
│ 2002 │ 263670 │ █████████████▏ │
│ 2003 │ 278394 │ █████████████▊ │
│ 2004 │ 304666 │ ███████████████▏ │
│ 2005 │ 322875 │ ████████████████▏ │
│ 2006 │ 356191 │ █████████████████▋ │
│ 2007 │ 404054 │ ████████████████████▏ │
│ 2008 │ 420741 │ █████████████████████ │
│ 2009 │ 427753 │ █████████████████████▍ │
│ 2010 │ 480306 │ ████████████████████████ │
│ 2011 │ 496274 │ ████████████████████████▋ │
│ 2012 │ 519442 │ █████████████████████████▊ │
│ 2013 │ 616212 │ ██████████████████████████████▋ │
│ 2014 │ 724154 │ ████████████████████████████████████▏ │
│ 2015 │ 792129 │ ███████████████████████████████████████▌ │
│ 2016 │ 843655 │ ██████████████████████████████████████████▏ │
│ 2017 │ 982642 │ █████████████████████████████████████████████████▏ │
│ 2018 │ 1016835 │ ██████████████████████████████████████████████████▋ │
│ 2019 │ 1042849 │ ████████████████████████████████████████████████████▏ │
│ 2020 │ 1011889 │ ██████████████████████████████████████████████████▌ │
│ 2021 │ 960343 │ ████████████████████████████████████████████████ │
└──────┴─────────┴───────────────────────────────────────────────────────┘
```
Something happened in 2013. I have no idea. Maybe you have an idea of what happened in 2020?
### Query 3. The Most Expensive Neighborhoods {#most-expensive-neighborhoods}
Query:
```sql
SELECT
town,
district,
count() AS c,
round(avg(price)) AS price,
bar(price, 0, 5000000, 100)
FROM uk_price_paid
WHERE date >= '2020-01-01'
GROUP BY
town,
district
HAVING c >= 100
ORDER BY price DESC
LIMIT 100;
```
Result:
```text
┌─town─────────────────┬─district───────────────┬────c─┬───price─┬─bar(round(avg(price)), 0, 5000000, 100)────────────────────────────┐
│ LONDON │ CITY OF WESTMINSTER │ 3606 │ 3280239 │ █████████████████████████████████████████████████████████████████▌ │
│ LONDON │ CITY OF LONDON │ 274 │ 3160502 │ ███████████████████████████████████████████████████████████████▏ │
│ LONDON │ KENSINGTON AND CHELSEA │ 2550 │ 2308478 │ ██████████████████████████████████████████████▏ │
│ LEATHERHEAD │ ELMBRIDGE │ 114 │ 1897407 │ █████████████████████████████████████▊ │
│ LONDON │ CAMDEN │ 3033 │ 1805404 │ ████████████████████████████████████ │
│ VIRGINIA WATER │ RUNNYMEDE │ 156 │ 1753247 │ ███████████████████████████████████ │
│ WINDLESHAM │ SURREY HEATH │ 108 │ 1677613 │ █████████████████████████████████▌ │
│ THORNTON HEATH │ CROYDON │ 546 │ 1671721 │ █████████████████████████████████▍ │
│ BARNET │ ENFIELD │ 124 │ 1505840 │ ██████████████████████████████ │
│ COBHAM │ ELMBRIDGE │ 387 │ 1237250 │ ████████████████████████▋ │
│ LONDON │ ISLINGTON │ 2668 │ 1236980 │ ████████████████████████▋ │
│ OXFORD │ SOUTH OXFORDSHIRE │ 321 │ 1220907 │ ████████████████████████▍ │
│ LONDON │ RICHMOND UPON THAMES │ 704 │ 1215551 │ ████████████████████████▎ │
│ LONDON │ HOUNSLOW │ 671 │ 1207493 │ ████████████████████████▏ │
│ ASCOT │ WINDSOR AND MAIDENHEAD │ 407 │ 1183299 │ ███████████████████████▋ │
│ BEACONSFIELD │ BUCKINGHAMSHIRE │ 330 │ 1175615 │ ███████████████████████▌ │
│ RICHMOND │ RICHMOND UPON THAMES │ 874 │ 1110444 │ ██████████████████████▏ │
│ LONDON │ HAMMERSMITH AND FULHAM │ 3086 │ 1053983 │ █████████████████████ │
│ SURBITON │ ELMBRIDGE │ 100 │ 1011800 │ ████████████████████▏ │
│ RADLETT │ HERTSMERE │ 283 │ 1011712 │ ████████████████████▏ │
│ SALCOMBE │ SOUTH HAMS │ 127 │ 1011624 │ ████████████████████▏ │
│ WEYBRIDGE │ ELMBRIDGE │ 655 │ 1007265 │ ████████████████████▏ │
│ ESHER │ ELMBRIDGE │ 485 │ 986581 │ ███████████████████▋ │
│ LEATHERHEAD │ GUILDFORD │ 202 │ 977320 │ ███████████████████▌ │
│ BURFORD │ WEST OXFORDSHIRE │ 111 │ 966893 │ ███████████████████▎ │
│ BROCKENHURST │ NEW FOREST │ 129 │ 956675 │ ███████████████████▏ │
│ HINDHEAD │ WAVERLEY │ 137 │ 953753 │ ███████████████████ │
│ GERRARDS CROSS │ BUCKINGHAMSHIRE │ 419 │ 951121 │ ███████████████████ │
│ EAST MOLESEY │ ELMBRIDGE │ 192 │ 936769 │ ██████████████████▋ │
│ CHALFONT ST GILES │ BUCKINGHAMSHIRE │ 146 │ 925515 │ ██████████████████▌ │
│ LONDON │ TOWER HAMLETS │ 4388 │ 918304 │ ██████████████████▎ │
│ OLNEY │ MILTON KEYNES │ 235 │ 910646 │ ██████████████████▏ │
│ HENLEY-ON-THAMES │ SOUTH OXFORDSHIRE │ 540 │ 902418 │ ██████████████████ │
│ LONDON │ SOUTHWARK │ 3885 │ 892997 │ █████████████████▋ │
│ KINGSTON UPON THAMES │ KINGSTON UPON THAMES │ 960 │ 885969 │ █████████████████▋ │
│ LONDON │ EALING │ 2658 │ 871755 │ █████████████████▍ │
│ CRANBROOK │ TUNBRIDGE WELLS │ 431 │ 862348 │ █████████████████▏ │
│ LONDON │ MERTON │ 2099 │ 859118 │ █████████████████▏ │
│ BELVEDERE │ BEXLEY │ 346 │ 842423 │ ████████████████▋ │
│ GUILDFORD │ WAVERLEY │ 143 │ 841277 │ ████████████████▋ │
│ HARPENDEN │ ST ALBANS │ 657 │ 841216 │ ████████████████▋ │
│ LONDON │ HACKNEY │ 3307 │ 837090 │ ████████████████▋ │
│ LONDON │ WANDSWORTH │ 6566 │ 832663 │ ████████████████▋ │
│ MAIDENHEAD │ BUCKINGHAMSHIRE │ 123 │ 824299 │ ████████████████▍ │
│ KINGS LANGLEY │ DACORUM │ 145 │ 821331 │ ████████████████▍ │
│ BERKHAMSTED │ DACORUM │ 543 │ 818415 │ ████████████████▎ │
│ GREAT MISSENDEN │ BUCKINGHAMSHIRE │ 226 │ 802807 │ ████████████████ │
│ BILLINGSHURST │ CHICHESTER │ 144 │ 797829 │ ███████████████▊ │
│ WOKING │ GUILDFORD │ 176 │ 793494 │ ███████████████▋ │
│ STOCKBRIDGE │ TEST VALLEY │ 178 │ 793269 │ ███████████████▋ │
│ EPSOM │ REIGATE AND BANSTEAD │ 172 │ 791862 │ ███████████████▋ │
│ TONBRIDGE │ TUNBRIDGE WELLS │ 360 │ 787876 │ ███████████████▋ │
│ TEDDINGTON │ RICHMOND UPON THAMES │ 595 │ 786492 │ ███████████████▋ │
│ TWICKENHAM │ RICHMOND UPON THAMES │ 1155 │ 786193 │ ███████████████▋ │
│ LYNDHURST │ NEW FOREST │ 102 │ 785593 │ ███████████████▋ │
│ LONDON │ LAMBETH │ 5228 │ 774574 │ ███████████████▍ │
│ LONDON │ BARNET │ 3955 │ 773259 │ ███████████████▍ │
│ OXFORD │ VALE OF WHITE HORSE │ 353 │ 772088 │ ███████████████▍ │
│ TONBRIDGE │ MAIDSTONE │ 305 │ 770740 │ ███████████████▍ │
│ LUTTERWORTH │ HARBOROUGH │ 538 │ 768634 │ ███████████████▎ │
│ WOODSTOCK │ WEST OXFORDSHIRE │ 140 │ 766037 │ ███████████████▎ │
│ MIDHURST │ CHICHESTER │ 257 │ 764815 │ ███████████████▎ │
│ MARLOW │ BUCKINGHAMSHIRE │ 327 │ 761876 │ ███████████████▏ │
│ LONDON │ NEWHAM │ 3237 │ 761784 │ ███████████████▏ │
│ ALDERLEY EDGE │ CHESHIRE EAST │ 178 │ 757318 │ ███████████████▏ │
│ LUTON │ CENTRAL BEDFORDSHIRE │ 212 │ 754283 │ ███████████████ │
│ PETWORTH │ CHICHESTER │ 154 │ 754220 │ ███████████████ │
│ ALRESFORD │ WINCHESTER │ 219 │ 752718 │ ███████████████ │
│ POTTERS BAR │ WELWYN HATFIELD │ 174 │ 748465 │ ██████████████▊ │
│ HASLEMERE │ CHICHESTER │ 128 │ 746907 │ ██████████████▊ │
│ TADWORTH │ REIGATE AND BANSTEAD │ 502 │ 743252 │ ██████████████▋ │
│ THAMES DITTON │ ELMBRIDGE │ 244 │ 741913 │ ██████████████▋ │
│ REIGATE │ REIGATE AND BANSTEAD │ 581 │ 738198 │ ██████████████▋ │
│ BOURNE END │ BUCKINGHAMSHIRE │ 138 │ 735190 │ ██████████████▋ │
│ SEVENOAKS │ SEVENOAKS │ 1156 │ 730018 │ ██████████████▌ │
│ OXTED │ TANDRIDGE │ 336 │ 729123 │ ██████████████▌ │
│ INGATESTONE │ BRENTWOOD │ 166 │ 728103 │ ██████████████▌ │
│ LONDON │ BRENT │ 2079 │ 720605 │ ██████████████▍ │
│ LONDON │ HARINGEY │ 3216 │ 717780 │ ██████████████▎ │
│ PURLEY │ CROYDON │ 575 │ 716108 │ ██████████████▎ │
│ WELWYN │ WELWYN HATFIELD │ 222 │ 710603 │ ██████████████▏ │
│ RICKMANSWORTH │ THREE RIVERS │ 798 │ 704571 │ ██████████████ │
│ BANSTEAD │ REIGATE AND BANSTEAD │ 401 │ 701293 │ ██████████████ │
│ CHIGWELL │ EPPING FOREST │ 261 │ 701203 │ ██████████████ │
│ PINNER │ HARROW │ 528 │ 698885 │ █████████████▊ │
│ HASLEMERE │ WAVERLEY │ 280 │ 696659 │ █████████████▊ │
│ SLOUGH │ BUCKINGHAMSHIRE │ 396 │ 694917 │ █████████████▊ │
│ WALTON-ON-THAMES │ ELMBRIDGE │ 946 │ 692395 │ █████████████▋ │
│ READING │ SOUTH OXFORDSHIRE │ 318 │ 691988 │ █████████████▋ │
│ NORTHWOOD │ HILLINGDON │ 271 │ 690643 │ █████████████▋ │
│ FELTHAM │ HOUNSLOW │ 763 │ 688595 │ █████████████▋ │
│ ASHTEAD │ MOLE VALLEY │ 303 │ 687923 │ █████████████▋ │
│ BARNET │ BARNET │ 975 │ 686980 │ █████████████▋ │
│ WOKING │ SURREY HEATH │ 283 │ 686669 │ █████████████▋ │
│ MALMESBURY │ WILTSHIRE │ 323 │ 683324 │ █████████████▋ │
│ AMERSHAM │ BUCKINGHAMSHIRE │ 496 │ 680962 │ █████████████▌ │
│ CHISLEHURST │ BROMLEY │ 430 │ 680209 │ █████████████▌ │
│ HYTHE │ FOLKESTONE AND HYTHE │ 490 │ 676908 │ █████████████▌ │
│ MAYFIELD │ WEALDEN │ 101 │ 676210 │ █████████████▌ │
│ ASCOT │ BRACKNELL FOREST │ 168 │ 676004 │ █████████████▌ │
└──────────────────────┴────────────────────────┴──────┴─────────┴────────────────────────────────────────────────────────────────────┘
```
## Speed Up Queries Using Projections {#speedup-with-projections}
[Projections](../../sql-reference/statements/alter/projection.md) improve query speed by storing pre-aggregated data.
### Build a Projection {#build-projection}
Create an aggregating projection by the dimensions `toYear(date)`, `district`, `town`:
```sql
ALTER TABLE uk_price_paid
ADD PROJECTION projection_by_year_district_town
(
SELECT
toYear(date),
district,
town,
avg(price),
sum(price),
count()
GROUP BY
toYear(date),
district,
town
);
```
Populate the projection for the existing data (otherwise the projection would be created only for newly inserted data):
```sql
ALTER TABLE uk_price_paid
MATERIALIZE PROJECTION projection_by_year_district_town
SETTINGS mutations_sync = 1;
```
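With `mutations_sync = 1` the statement waits for the mutation to finish. As an optional sanity check (a sketch, not part of the original walkthrough), you can verify that nothing is left pending:
```sql
SELECT command, is_done
FROM system.mutations
WHERE table = 'uk_price_paid' AND is_done = 0;
-- An empty result means the projection is fully materialized.
```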
## Test Performance {#test-performance}
Let's run the same 3 queries again.
[Enable](../../operations/settings/settings.md#allow-experimental-projection-optimization) projection support:
```sql
SET allow_experimental_projection_optimization = 1;
```
### Query 1. Average Price per Year {#average-price-projections}
Query:
```sql
SELECT
toYear(date) AS year,
round(avg(price)) AS price,
bar(price, 0, 1000000, 80)
FROM uk_price_paid
GROUP BY year
ORDER BY year ASC;
```
Result:
```text
┌─year─┬──price─┬─bar(round(avg(price)), 0, 1000000, 80)─┐
│ 1995 │ 67932 │ █████▍ │
│ 1996 │ 71505 │ █████▋ │
│ 1997 │ 78532 │ ██████▎ │
│ 1998 │ 85436 │ ██████▋ │
│ 1999 │ 96037 │ ███████▋ │
│ 2000 │ 107479 │ ████████▌ │
│ 2001 │ 118885 │ █████████▌ │
│ 2002 │ 137941 │ ███████████ │
│ 2003 │ 155889 │ ████████████▍ │
│ 2004 │ 178885 │ ██████████████▎ │
│ 2005 │ 189351 │ ███████████████▏ │
│ 2006 │ 203528 │ ████████████████▎ │
│ 2007 │ 219378 │ █████████████████▌ │
│ 2008 │ 217056 │ █████████████████▎ │
│ 2009 │ 213419 │ █████████████████ │
│ 2010 │ 236109 │ ██████████████████▊ │
│ 2011 │ 232805 │ ██████████████████▌ │
│ 2012 │ 238367 │ ███████████████████ │
│ 2013 │ 256931 │ ████████████████████▌ │
│ 2014 │ 279915 │ ██████████████████████▍ │
│ 2015 │ 297266 │ ███████████████████████▋ │
│ 2016 │ 313201 │ █████████████████████████ │
│ 2017 │ 346097 │ ███████████████████████████▋ │
│ 2018 │ 350116 │ ████████████████████████████ │
│ 2019 │ 351013 │ ████████████████████████████ │
│ 2020 │ 369420 │ █████████████████████████████▌ │
│ 2021 │ 386903 │ ██████████████████████████████▊ │
└──────┴────────┴────────────────────────────────────────┘
```
### Query 2. Average Price per Year in London {#average-price-london-projections}
Query:
```sql
SELECT
toYear(date) AS year,
round(avg(price)) AS price,
bar(price, 0, 2000000, 100)
FROM uk_price_paid
WHERE town = 'LONDON'
GROUP BY year
ORDER BY year ASC;
```
Result:
```text
┌─year─┬───price─┬─bar(round(avg(price)), 0, 2000000, 100)───────────────┐
│ 1995 │ 109116 │ █████▍ │
│ 1996 │ 118667 │ █████▊ │
│ 1997 │ 136518 │ ██████▋ │
│ 1998 │ 152983 │ ███████▋ │
│ 1999 │ 180637 │ █████████ │
│ 2000 │ 215838 │ ██████████▋ │
│ 2001 │ 232994 │ ███████████▋ │
│ 2002 │ 263670 │ █████████████▏ │
│ 2003 │ 278394 │ █████████████▊ │
│ 2004 │ 304666 │ ███████████████▏ │
│ 2005 │ 322875 │ ████████████████▏ │
│ 2006 │ 356191 │ █████████████████▋ │
│ 2007 │ 404054 │ ████████████████████▏ │
│ 2008 │ 420741 │ █████████████████████ │
│ 2009 │ 427753 │ █████████████████████▍ │
│ 2010 │ 480306 │ ████████████████████████ │
│ 2011 │ 496274 │ ████████████████████████▋ │
│ 2012 │ 519442 │ █████████████████████████▊ │
│ 2013 │ 616212 │ ██████████████████████████████▋ │
│ 2014 │ 724154 │ ████████████████████████████████████▏ │
│ 2015 │ 792129 │ ███████████████████████████████████████▌ │
│ 2016 │ 843655 │ ██████████████████████████████████████████▏ │
│ 2017 │ 982642 │ █████████████████████████████████████████████████▏ │
│ 2018 │ 1016835 │ ██████████████████████████████████████████████████▋ │
│ 2019 │ 1042849 │ ████████████████████████████████████████████████████▏ │
│ 2020 │ 1011889 │ ██████████████████████████████████████████████████▌ │
│ 2021 │ 960343 │ ████████████████████████████████████████████████ │
└──────┴─────────┴───────────────────────────────────────────────────────┘
```
### Query 3. The Most Expensive Neighborhoods {#most-expensive-neighborhoods-projections}
The condition (date >= '2020-01-01') needs to be modified to match the projection dimension (toYear(date) >= 2020).
Query:
```sql
SELECT
town,
district,
count() AS c,
round(avg(price)) AS price,
bar(price, 0, 5000000, 100)
FROM uk_price_paid
WHERE toYear(date) >= 2020
GROUP BY
town,
district
HAVING c >= 100
ORDER BY price DESC
LIMIT 100;
```
Result:
```text
┌─town─────────────────┬─district───────────────┬────c─┬───price─┬─bar(round(avg(price)), 0, 5000000, 100)────────────────────────────┐
│ LONDON │ CITY OF WESTMINSTER │ 3606 │ 3280239 │ █████████████████████████████████████████████████████████████████▌ │
│ LONDON │ CITY OF LONDON │ 274 │ 3160502 │ ███████████████████████████████████████████████████████████████▏ │
│ LONDON │ KENSINGTON AND CHELSEA │ 2550 │ 2308478 │ ██████████████████████████████████████████████▏ │
│ LEATHERHEAD │ ELMBRIDGE │ 114 │ 1897407 │ █████████████████████████████████████▊ │
│ LONDON │ CAMDEN │ 3033 │ 1805404 │ ████████████████████████████████████ │
│ VIRGINIA WATER │ RUNNYMEDE │ 156 │ 1753247 │ ███████████████████████████████████ │
│ WINDLESHAM │ SURREY HEATH │ 108 │ 1677613 │ █████████████████████████████████▌ │
│ THORNTON HEATH │ CROYDON │ 546 │ 1671721 │ █████████████████████████████████▍ │
│ BARNET │ ENFIELD │ 124 │ 1505840 │ ██████████████████████████████ │
│ COBHAM │ ELMBRIDGE │ 387 │ 1237250 │ ████████████████████████▋ │
│ LONDON │ ISLINGTON │ 2668 │ 1236980 │ ████████████████████████▋ │
│ OXFORD │ SOUTH OXFORDSHIRE │ 321 │ 1220907 │ ████████████████████████▍ │
│ LONDON │ RICHMOND UPON THAMES │ 704 │ 1215551 │ ████████████████████████▎ │
│ LONDON │ HOUNSLOW │ 671 │ 1207493 │ ████████████████████████▏ │
│ ASCOT │ WINDSOR AND MAIDENHEAD │ 407 │ 1183299 │ ███████████████████████▋ │
│ BEACONSFIELD │ BUCKINGHAMSHIRE │ 330 │ 1175615 │ ███████████████████████▌ │
│ RICHMOND │ RICHMOND UPON THAMES │ 874 │ 1110444 │ ██████████████████████▏ │
│ LONDON │ HAMMERSMITH AND FULHAM │ 3086 │ 1053983 │ █████████████████████ │
│ SURBITON │ ELMBRIDGE │ 100 │ 1011800 │ ████████████████████▏ │
│ RADLETT │ HERTSMERE │ 283 │ 1011712 │ ████████████████████▏ │
│ SALCOMBE │ SOUTH HAMS │ 127 │ 1011624 │ ████████████████████▏ │
│ WEYBRIDGE │ ELMBRIDGE │ 655 │ 1007265 │ ████████████████████▏ │
│ ESHER │ ELMBRIDGE │ 485 │ 986581 │ ███████████████████▋ │
│ LEATHERHEAD │ GUILDFORD │ 202 │ 977320 │ ███████████████████▌ │
│ BURFORD │ WEST OXFORDSHIRE │ 111 │ 966893 │ ███████████████████▎ │
│ BROCKENHURST │ NEW FOREST │ 129 │ 956675 │ ███████████████████▏ │
│ HINDHEAD │ WAVERLEY │ 137 │ 953753 │ ███████████████████ │
│ GERRARDS CROSS │ BUCKINGHAMSHIRE │ 419 │ 951121 │ ███████████████████ │
│ EAST MOLESEY │ ELMBRIDGE │ 192 │ 936769 │ ██████████████████▋ │
│ CHALFONT ST GILES │ BUCKINGHAMSHIRE │ 146 │ 925515 │ ██████████████████▌ │
│ LONDON │ TOWER HAMLETS │ 4388 │ 918304 │ ██████████████████▎ │
│ OLNEY │ MILTON KEYNES │ 235 │ 910646 │ ██████████████████▏ │
│ HENLEY-ON-THAMES │ SOUTH OXFORDSHIRE │ 540 │ 902418 │ ██████████████████ │
│ LONDON │ SOUTHWARK │ 3885 │ 892997 │ █████████████████▋ │
│ KINGSTON UPON THAMES │ KINGSTON UPON THAMES │ 960 │ 885969 │ █████████████████▋ │
│ LONDON │ EALING │ 2658 │ 871755 │ █████████████████▍ │
│ CRANBROOK │ TUNBRIDGE WELLS │ 431 │ 862348 │ █████████████████▏ │
│ LONDON │ MERTON │ 2099 │ 859118 │ █████████████████▏ │
│ BELVEDERE │ BEXLEY │ 346 │ 842423 │ ████████████████▋ │
│ GUILDFORD │ WAVERLEY │ 143 │ 841277 │ ████████████████▋ │
│ HARPENDEN │ ST ALBANS │ 657 │ 841216 │ ████████████████▋ │
│ LONDON │ HACKNEY │ 3307 │ 837090 │ ████████████████▋ │
│ LONDON │ WANDSWORTH │ 6566 │ 832663 │ ████████████████▋ │
│ MAIDENHEAD │ BUCKINGHAMSHIRE │ 123 │ 824299 │ ████████████████▍ │
│ KINGS LANGLEY │ DACORUM │ 145 │ 821331 │ ████████████████▍ │
│ BERKHAMSTED │ DACORUM │ 543 │ 818415 │ ████████████████▎ │
│ GREAT MISSENDEN │ BUCKINGHAMSHIRE │ 226 │ 802807 │ ████████████████ │
│ BILLINGSHURST │ CHICHESTER │ 144 │ 797829 │ ███████████████▊ │
│ WOKING │ GUILDFORD │ 176 │ 793494 │ ███████████████▋ │
│ STOCKBRIDGE │ TEST VALLEY │ 178 │ 793269 │ ███████████████▋ │
│ EPSOM │ REIGATE AND BANSTEAD │ 172 │ 791862 │ ███████████████▋ │
│ TONBRIDGE │ TUNBRIDGE WELLS │ 360 │ 787876 │ ███████████████▋ │
│ TEDDINGTON │ RICHMOND UPON THAMES │ 595 │ 786492 │ ███████████████▋ │
│ TWICKENHAM │ RICHMOND UPON THAMES │ 1155 │ 786193 │ ███████████████▋ │
│ LYNDHURST │ NEW FOREST │ 102 │ 785593 │ ███████████████▋ │
│ LONDON │ LAMBETH │ 5228 │ 774574 │ ███████████████▍ │
│ LONDON │ BARNET │ 3955 │ 773259 │ ███████████████▍ │
│ OXFORD │ VALE OF WHITE HORSE │ 353 │ 772088 │ ███████████████▍ │
│ TONBRIDGE │ MAIDSTONE │ 305 │ 770740 │ ███████████████▍ │
│ LUTTERWORTH │ HARBOROUGH │ 538 │ 768634 │ ███████████████▎ │
│ WOODSTOCK │ WEST OXFORDSHIRE │ 140 │ 766037 │ ███████████████▎ │
│ MIDHURST │ CHICHESTER │ 257 │ 764815 │ ███████████████▎ │
│ MARLOW │ BUCKINGHAMSHIRE │ 327 │ 761876 │ ███████████████▏ │
│ LONDON │ NEWHAM │ 3237 │ 761784 │ ███████████████▏ │
│ ALDERLEY EDGE │ CHESHIRE EAST │ 178 │ 757318 │ ███████████████▏ │
│ LUTON │ CENTRAL BEDFORDSHIRE │ 212 │ 754283 │ ███████████████ │
│ PETWORTH │ CHICHESTER │ 154 │ 754220 │ ███████████████ │
│ ALRESFORD │ WINCHESTER │ 219 │ 752718 │ ███████████████ │
│ POTTERS BAR │ WELWYN HATFIELD │ 174 │ 748465 │ ██████████████▊ │
│ HASLEMERE │ CHICHESTER │ 128 │ 746907 │ ██████████████▊ │
│ TADWORTH │ REIGATE AND BANSTEAD │ 502 │ 743252 │ ██████████████▋ │
│ THAMES DITTON │ ELMBRIDGE │ 244 │ 741913 │ ██████████████▋ │
│ REIGATE │ REIGATE AND BANSTEAD │ 581 │ 738198 │ ██████████████▋ │
│ BOURNE END │ BUCKINGHAMSHIRE │ 138 │ 735190 │ ██████████████▋ │
│ SEVENOAKS │ SEVENOAKS │ 1156 │ 730018 │ ██████████████▌ │
│ OXTED │ TANDRIDGE │ 336 │ 729123 │ ██████████████▌ │
│ INGATESTONE │ BRENTWOOD │ 166 │ 728103 │ ██████████████▌ │
│ LONDON │ BRENT │ 2079 │ 720605 │ ██████████████▍ │
│ LONDON │ HARINGEY │ 3216 │ 717780 │ ██████████████▎ │
│ PURLEY │ CROYDON │ 575 │ 716108 │ ██████████████▎ │
│ WELWYN │ WELWYN HATFIELD │ 222 │ 710603 │ ██████████████▏ │
│ RICKMANSWORTH │ THREE RIVERS │ 798 │ 704571 │ ██████████████ │
│ BANSTEAD │ REIGATE AND BANSTEAD │ 401 │ 701293 │ ██████████████ │
│ CHIGWELL │ EPPING FOREST │ 261 │ 701203 │ ██████████████ │
│ PINNER │ HARROW │ 528 │ 698885 │ █████████████▊ │
│ HASLEMERE │ WAVERLEY │ 280 │ 696659 │ █████████████▊ │
│ SLOUGH │ BUCKINGHAMSHIRE │ 396 │ 694917 │ █████████████▊ │
│ WALTON-ON-THAMES │ ELMBRIDGE │ 946 │ 692395 │ █████████████▋ │
│ READING │ SOUTH OXFORDSHIRE │ 318 │ 691988 │ █████████████▋ │
│ NORTHWOOD │ HILLINGDON │ 271 │ 690643 │ █████████████▋ │
│ FELTHAM │ HOUNSLOW │ 763 │ 688595 │ █████████████▋ │
│ ASHTEAD │ MOLE VALLEY │ 303 │ 687923 │ █████████████▋ │
│ BARNET │ BARNET │ 975 │ 686980 │ █████████████▋ │
│ WOKING │ SURREY HEATH │ 283 │ 686669 │ █████████████▋ │
│ MALMESBURY │ WILTSHIRE │ 323 │ 683324 │ █████████████▋ │
│ AMERSHAM │ BUCKINGHAMSHIRE │ 496 │ 680962 │ █████████████▌ │
│ CHISLEHURST │ BROMLEY │ 430 │ 680209 │ █████████████▌ │
│ HYTHE │ FOLKESTONE AND HYTHE │ 490 │ 676908 │ █████████████▌ │
│ MAYFIELD │ WEALDEN │ 101 │ 676210 │ █████████████▌ │
│ ASCOT │ BRACKNELL FOREST │ 168 │ 676004 │ █████████████▌ │
└──────────────────────┴────────────────────────┴──────┴─────────┴────────────────────────────────────────────────────────────────────┘
```
### Summary {#summary}
All three queries run much faster and read far fewer rows.
```text
Query 1
no projection: 27 rows in set. Elapsed: 0.158 sec. Processed 26.32 million rows, 157.93 MB (166.57 million rows/s., 999.39 MB/s.)
projection: 27 rows in set. Elapsed: 0.007 sec. Processed 105.96 thousand rows, 3.33 MB (14.58 million rows/s., 458.13 MB/s.)
Query 2
no projection: 27 rows in set. Elapsed: 0.163 sec. Processed 26.32 million rows, 80.01 MB (161.75 million rows/s., 491.64 MB/s.)
projection: 27 rows in set. Elapsed: 0.008 sec. Processed 105.96 thousand rows, 3.67 MB (13.29 million rows/s., 459.89 MB/s.)
Query 3
no projection: 100 rows in set. Elapsed: 0.069 sec. Processed 26.32 million rows, 62.47 MB (382.13 million rows/s., 906.93 MB/s.)
projection: 100 rows in set. Elapsed: 0.029 sec. Processed 8.08 thousand rows, 511.08 KB (276.06 thousand rows/s., 17.47 MB/s.)
```
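One hedged way to confirm that the projection was actually used (assuming a ClickHouse version where `system.query_log` exposes the `projections` column) is a query along these lines:
```sql
-- Sketch: inspect which projections recent queries used.
SELECT query, projections
FROM system.query_log
WHERE type = 'QueryFinish' AND query LIKE '%uk_price_paid%'
ORDER BY event_time DESC
LIMIT 5;
```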
### Online Playground {#playground}
This dataset is available in the [Online Playground](https://gh-api.clickhouse.tech/play?user=play#U0VMRUNUIHRvd24sIGRpc3RyaWN0LCBjb3VudCgpIEFTIGMsIHJvdW5kKGF2ZyhwcmljZSkpIEFTIHByaWNlLCBiYXIocHJpY2UsIDAsIDUwMDAwMDAsIDEwMCkgRlJPTSB1a19wcmljZV9wYWlkIFdIRVJFIGRhdGUgPj0gJzIwMjAtMDEtMDEnIEdST1VQIEJZIHRvd24sIGRpc3RyaWN0IEhBVklORyBjID49IDEwMCBPUkRFUiBCWSBwcmljZSBERVNDIExJTUlUIDEwMA==).

View File

@ -27,11 +27,11 @@ grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not su
{% include 'install/deb.sh' %}
```
These packages can also be downloaded and installed manually from here: https://repo.clickhouse.tech/deb/stable/main/.
These packages can also be downloaded and installed manually from here: https://repo.clickhouse.com/deb/stable/main/.
If you want to use the most recent version, replace `stable` with `testing` (recommended for testing environments).
You can also download and install the packages manually from the [repository](https://repo.clickhouse.tech/deb/stable/main/).
You can also download and install the packages manually from the [repository](https://repo.clickhouse.com/deb/stable/main/).
#### Packages {#packages}
@ -52,8 +52,8 @@ grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not su
``` bash
sudo yum install yum-utils
sudo rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_64
sudo rpm --import https://repo.clickhouse.com/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.com/rpm/stable/x86_64
```
To use the most recent versions, replace `stable` with `testing` (recommended for testing environments). `prestable` is also sometimes available.
@ -64,21 +64,21 @@ sudo yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_6
sudo yum install clickhouse-server clickhouse-client
```
It is also possible to install the packages manually by downloading them from here: https://repo.clickhouse.tech/rpm/stable/x86_64.
It is also possible to install the packages manually by downloading them from here: https://repo.clickhouse.com/rpm/stable/x86_64.
### From Tgz Archives {#from-tgz-archives}
The ClickHouse team at Yandex recommends using precompiled binaries from `tgz` archives for all distributions where `deb` and `rpm` packages cannot be installed.
The desired version of the archives can be downloaded manually with `curl` or `wget` from the repository https://repo.clickhouse.tech/tgz/.
The desired version of the archives can be downloaded manually with `curl` or `wget` from the repository https://repo.clickhouse.com/tgz/.
After that, unpack the archives and run the installation scripts. Example of installing the latest version:
``` bash
export LATEST_VERSION=`curl https://api.github.com/repos/ClickHouse/ClickHouse/tags 2>/dev/null | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | head -n 1`
curl -O https://repo.clickhouse.tech/tgz/clickhouse-common-static-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/clickhouse-common-static-dbg-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/clickhouse-server-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/clickhouse-client-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.com/tgz/clickhouse-common-static-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.com/tgz/clickhouse-common-static-dbg-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.com/tgz/clickhouse-server-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.com/tgz/clickhouse-client-$LATEST_VERSION.tgz
tar -xzvf clickhouse-common-static-$LATEST_VERSION.tgz
sudo clickhouse-common-static-$LATEST_VERSION/install/doinst.sh

View File

@ -364,7 +364,7 @@ $ clickhouse-client --format_csv_delimiter="|" --query="INSERT INTO test.csv FOR
## CSVWithNames {#csvwithnames}
Also prints the header row, similar to `TabSeparatedWithNames`.
Also prints the header row, similar to [TabSeparatedWithNames](#tabseparatedwithnames).
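A minimal sketch (column names are illustrative):
```sql
-- Prints the header "n","doubled" before the two data rows.
SELECT number AS n, number * 2 AS doubled
FROM numbers(2)
FORMAT CSVWithNames;
```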
## CustomSeparated {#format-customseparated}

View File

@ -7,7 +7,8 @@ toc_title: DateTime64
Allows storing an instant in time that can be expressed as a calendar date and a time of day, with a specified sub-second precision.
Tick size (precision): 10<sup>-precision</sup> seconds, where precision is an integer parameter.
Tick size (precision): 10<sup>-precision</sup> seconds, where precision is an integer parameter. Possible values: [ 0 : 9 ].
Typical values are 3 (milliseconds), 6 (microseconds), and 9 (nanoseconds).
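A minimal sketch of sub-second precision in practice (table and column names are illustrative):
```sql
CREATE TABLE dt64_demo (ts DateTime64(3)) ENGINE = Memory;
INSERT INTO dt64_demo VALUES ('2021-09-30 06:17:49.123');
SELECT ts FROM dt64_demo; -- 2021-09-30 06:17:49.123
```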
**Syntax:**

View File

@ -58,9 +58,68 @@ str -> str != Referer
For some functions the first argument (the lambda function) can be omitted. In this case the identity mapping is assumed.
## User Defined Functions {#user-defined-functions}
## SQL User Defined Functions {#user-defined-functions}
Functions can be created with the [CREATE FUNCTION](../statements/create/function.md) statement. To delete such functions, use the [DROP FUNCTION](../statements/drop.md#drop-function) statement.
Functions can be created from lambda expressions with [CREATE FUNCTION](../statements/create/function.md). To delete such functions, use the [DROP FUNCTION](../statements/drop.md#drop-function) statement.
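For example, a minimal SQL UDF lifecycle might look like this (the function name is illustrative):
```sql
CREATE FUNCTION linear_equation AS (x, k, b) -> k*x + b;
SELECT linear_equation(number, 2, 1) FROM numbers(3); -- returns 1, 3, 5
DROP FUNCTION linear_equation;
```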
## Executable User Defined Functions {#executable-user-defined-functions}
ClickHouse can call an external program or script to process data. Such functions are described in a [configuration file](../../operations/configuration-files.md). The path to it must be specified in the `user_defined_executable_functions_config` setting in the main configuration. The path may use the `*` wildcard, in which case all files matching the pattern are loaded. Example:
``` xml
<user_defined_executable_functions_config>*_function.xml</user_defined_executable_functions_config>
```
Function definition files are searched for relative to the directory specified in the `user_files_path` setting.
A function configuration contains the following settings:
- `name` - the function name.
- `command` - the command or script to execute.
- `argument` - an argument description containing its type in the nested `type` setting. Each argument is described separately.
- `format` - the [format](../../interfaces/formats.md) in which arguments are passed.
- `return_type` - the type of the returned value.
- `type` - how the command is run. If `executable` is set, a single command is started. If `executable_pool` is set, a pool of commands is created.
- `max_command_execution_time` - the maximum time in seconds allowed for processing a block of data. Applies only to commands with the `executable_pool` type. Optional. Default value is `10`.
- `command_termination_timeout` - the maximum time in seconds for the command to terminate after its pipe is closed. If the command does not terminate, the `SIGTERM` signal is sent to the process. Applies only to commands with the `executable_pool` type. Optional. Default value is `10`.
- `pool_size` - the size of the command pool. Optional. Default value is `16`.
- `lifetime` - the reload interval of the function in seconds. If set to `0`, the function is not reloaded.
- `send_chunk_header` - controls whether the number of rows is sent before a block of data to process. Optional. Default value is `false`.
The command must read arguments from `STDIN` and output the result to `STDOUT`. It must process data in a loop: after handling one group of arguments, it must wait for the next group.
**Example**
XML configuration describing the function `test_function`:
```
<functions>
<function>
<type>executable</type>
<name>test_function</name>
<return_type>UInt64</return_type>
<argument>
<type>UInt64</type>
</argument>
<argument>
<type>UInt64</type>
</argument>
<format>TabSeparated</format>
<command>cd /; clickhouse-local --input-format TabSeparated --output-format TabSeparated --structure 'x UInt64, y UInt64' --query "SELECT x + y FROM table"</command>
<lifetime>0</lifetime>
</function>
</functions>
```
Query:
``` sql
SELECT test_function(toUInt64(2), toUInt64(2));
```
Result:
``` text
┌─test_function(toUInt64(2), toUInt64(2))─┐
│ 4 │
└─────────────────────────────────────────┘
```
## Error Handling {#obrabotka-oshibok}

View File

@ -157,6 +157,8 @@ GRANT SELECT(x,y) ON db.table TO john WITH GRANT OPTION
- `SYSTEM RELOAD CONFIG`
- `SYSTEM RELOAD DICTIONARY`
- `SYSTEM RELOAD EMBEDDED DICTIONARIES`
- `SYSTEM RELOAD FUNCTION`
- `SYSTEM RELOAD FUNCTIONS`
- `SYSTEM MERGES`
- `SYSTEM TTL MERGES`
- `SYSTEM FETCHES`

View File

@ -10,6 +10,8 @@ toc_title: SYSTEM
- [RELOAD DICTIONARY](#query_language-system-reload-dictionary)
- [RELOAD MODELS](#query_language-system-reload-models)
- [RELOAD MODEL](#query_language-system-reload-model)
- [RELOAD FUNCTIONS](#query_language-system-reload-functions)
- [RELOAD FUNCTION](#query_language-system-reload-functions)
- [DROP DNS CACHE](#query_language-system-drop-dns-cache)
- [DROP MARK CACHE](#query_language-system-drop-mark-cache)
- [DROP UNCOMPRESSED CACHE](#query_language-system-drop-uncompressed-cache)
@ -80,6 +82,17 @@ SYSTEM RELOAD MODELS
SYSTEM RELOAD MODEL <model_name>
```
## RELOAD FUNCTIONS {#query_language-system-reload-functions}
Reloads all registered [executable user defined functions](../functions/index.md#executable-user-defined-functions), or one of them, from the configuration file.
**Syntax**
```sql
RELOAD FUNCTIONS
RELOAD FUNCTION function_name
```
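A usage sketch (the function name `test_function` is assumed to be configured as in the functions guide):
```sql
SYSTEM RELOAD FUNCTION test_function;
SYSTEM RELOAD FUNCTIONS;
```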
## DROP DNS CACHE {#query_language-system-drop-dns-cache}
Resets ClickHouse's internal DNS cache. Sometimes (for older ClickHouse versions) this command must be used when the infrastructure changes (when the IP address of another ClickHouse server, or of a server used by dictionaries, changes).
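Usage is a single statement:
```sql
SYSTEM DROP DNS CACHE;
```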

View File

@ -4,8 +4,8 @@ set -ex
BASE_DIR=$(dirname $(readlink -f $0))
BUILD_DIR="${BASE_DIR}/../build"
PUBLISH_DIR="${BASE_DIR}/../publish"
BASE_DOMAIN="${BASE_DOMAIN:-content.clickhouse.tech}"
GIT_TEST_URI="${GIT_TEST_URI:-git@github.com:ClickHouse/clickhouse-website-content.git}"
BASE_DOMAIN="${BASE_DOMAIN:-content.clickhouse.com}"
GIT_TEST_URI="${GIT_TEST_URI:-git@github.com:ClickHouse/clickhouse-com-content.git}"
GIT_PROD_URI="git@github.com:ClickHouse/clickhouse-website-content.git"
EXTRA_BUILD_ARGS="${EXTRA_BUILD_ARGS:---minify --verbose}"

View File

@ -29,7 +29,7 @@ $ grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not
If you want to use the latest version, replace `stable` with `testing` (we recommend this only for testing environments).
You can also download and install the packages manually from here: [download](https://repo.clickhouse.tech/deb/stable/main/).
You can also download and install the packages manually from here: [download](https://repo.clickhouse.com/deb/stable/main/).
Package list:
@ -46,8 +46,8 @@ $ grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not
``` bash
sudo yum install yum-utils
sudo rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_64
sudo rpm --import https://repo.clickhouse.com/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.com/rpm/stable/x86_64
```
If you want to use the latest version, replace `stable` with `testing` (we recommend this only for testing environments). `prestable` is also sometimes available.
@ -58,22 +58,22 @@ sudo yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_6
sudo yum install clickhouse-server clickhouse-client
```
You can also download and install the packages manually from here: [download](https://repo.clickhouse.tech/rpm/stable/x86_64).
You can also download and install the packages manually from here: [download](https://repo.clickhouse.com/rpm/stable/x86_64).
### `Tgz` Packages {#from-tgz-archives}
If your operating system does not support installing `deb` or `rpm` packages, we recommend using the official precompiled `tgz` packages.
The required version can be downloaded with `curl` or `wget` from the repository `https://repo.clickhouse.tech/tgz/`.
The required version can be downloaded with `curl` or `wget` from the repository `https://repo.clickhouse.com/tgz/`.
After downloading, unpack the files and run the installation scripts. Here is an example of installing the latest version:
``` bash
export LATEST_VERSION=`curl https://api.github.com/repos/ClickHouse/ClickHouse/tags 2>/dev/null | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | head -n 1`
curl -O https://repo.clickhouse.tech/tgz/clickhouse-common-static-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/clickhouse-common-static-dbg-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/clickhouse-server-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/clickhouse-client-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.com/tgz/clickhouse-common-static-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.com/tgz/clickhouse-common-static-dbg-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.com/tgz/clickhouse-server-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.com/tgz/clickhouse-client-$LATEST_VERSION.tgz
tar -xzvf clickhouse-common-static-$LATEST_VERSION.tgz
sudo clickhouse-common-static-$LATEST_VERSION/install/doinst.sh

View File

@ -34,7 +34,7 @@
#include <Poco/Util/Application.h>
#include <Processors/Formats/IInputFormat.h>
#include <Processors/Executors/PullingAsyncPipelineExecutor.h>
#include <Processors/QueryPipeline.h>
#include <Processors/QueryPipelineBuilder.h>
#include <Columns/ColumnString.h>
#include <common/find_symbols.h>
#include <common/LineReader.h>
@ -2050,8 +2050,7 @@ private:
});
}
QueryPipeline pipeline;
pipeline.init(std::move(pipe));
QueryPipeline pipeline(std::move(pipe));
PullingAsyncPipelineExecutor executor(pipeline);
Block block;

View File

@ -9,6 +9,11 @@
#include <IO/ConnectionTimeoutsContext.h>
#include <Interpreters/InterpreterInsertQuery.h>
#include <Processors/Transforms/ExpressionTransform.h>
#include <Processors/QueryPipelineBuilder.h>
#include <Processors/Chain.h>
#include <Processors/Executors/PullingPipelineExecutor.h>
#include <Processors/Executors/PushingPipelineExecutor.h>
#include <Processors/Sources/RemoteSource.h>
#include <DataStreams/ExpressionBlockInputStream.h>
namespace DB
@ -1446,7 +1451,7 @@ TaskStatus ClusterCopier::processPartitionPieceTaskImpl(
local_context->setSettings(task_cluster->settings_pull);
local_context->setSetting("skip_unavailable_shards", true);
Block block = getBlockWithAllStreamData(InterpreterFactory::get(query_select_ast, local_context)->execute().getInputStream());
Block block = getBlockWithAllStreamData(InterpreterFactory::get(query_select_ast, local_context)->execute().pipeline);
count = (block) ? block.safeGetByPosition(0).column->getUInt(0) : 0;
}
@ -1524,25 +1529,30 @@ TaskStatus ClusterCopier::processPartitionPieceTaskImpl(
context_insert->setSettings(task_cluster->settings_push);
/// Custom INSERT SELECT implementation
BlockInputStreamPtr input;
BlockOutputStreamPtr output;
QueryPipeline input;
QueryPipeline output;
{
BlockIO io_select = InterpreterFactory::get(query_select_ast, context_select)->execute();
BlockIO io_insert = InterpreterFactory::get(query_insert_ast, context_insert)->execute();
auto pure_input = io_select.getInputStream();
output = io_insert.out;
output = std::move(io_insert.pipeline);
/// Add converting actions to make it possible to copy blocks with slightly different schema
const auto & select_block = pure_input->getHeader();
const auto & insert_block = output->getHeader();
const auto & select_block = io_select.pipeline.getHeader();
const auto & insert_block = output.getHeader();
auto actions_dag = ActionsDAG::makeConvertingActions(
select_block.getColumnsWithTypeAndName(),
insert_block.getColumnsWithTypeAndName(),
ActionsDAG::MatchColumnsMode::Position);
auto actions = std::make_shared<ExpressionActions>(actions_dag, ExpressionActionsSettings::fromContext(getContext()));
input = std::make_shared<ExpressionBlockInputStream>(pure_input, actions);
QueryPipelineBuilder builder;
builder.init(std::move(io_select.pipeline));
builder.addSimpleTransform([&](const Block & header)
{
return std::make_shared<ExpressionTransform>(header, actions);
});
input = QueryPipelineBuilder::getPipeline(std::move(builder));
}
/// Fail-fast optimization to abort copying when the current clean state expires
@ -1588,7 +1598,26 @@ TaskStatus ClusterCopier::processPartitionPieceTaskImpl(
};
/// Main work is here
copyData(*input, *output, cancel_check, update_stats);
PullingPipelineExecutor pulling_executor(input);
PushingPipelineExecutor pushing_executor(output);
Block data;
bool is_cancelled = false;
while (pulling_executor.pull(data))
{
if (cancel_check())
{
is_cancelled = true;
pushing_executor.cancel();
break;
}
pushing_executor.push(data);
update_stats(data);
}
if (!is_cancelled)
pushing_executor.finish();
// Just in case
if (future_is_dirty_checker.valid())
@ -1711,7 +1740,8 @@ String ClusterCopier::getRemoteCreateTable(
String query = "SHOW CREATE TABLE " + getQuotedTable(table);
Block block = getBlockWithAllStreamData(
std::make_shared<RemoteBlockInputStream>(connection, query, InterpreterShowCreateQuery::getSampleBlock(), remote_context));
QueryPipeline(std::make_shared<RemoteSource>(
std::make_shared<RemoteQueryExecutor>(connection, query, InterpreterShowCreateQuery::getSampleBlock(), remote_context), false, false)));
return typeid_cast<const ColumnString &>(*block.safeGetByPosition(0).column).getDataAt(0).toString();
}
@ -1824,7 +1854,7 @@ std::set<String> ClusterCopier::getShardPartitions(const ConnectionTimeouts & ti
auto local_context = Context::createCopy(context);
local_context->setSettings(task_cluster->settings_pull);
Block block = getBlockWithAllStreamData(InterpreterFactory::get(query_ast, local_context)->execute().getInputStream());
Block block = getBlockWithAllStreamData(InterpreterFactory::get(query_ast, local_context)->execute().pipeline);
if (block)
{
@ -1869,7 +1899,11 @@ bool ClusterCopier::checkShardHasPartition(const ConnectionTimeouts & timeouts,
auto local_context = Context::createCopy(context);
local_context->setSettings(task_cluster->settings_pull);
return InterpreterFactory::get(query_ast, local_context)->execute().getInputStream()->read().rows() != 0;
auto pipeline = InterpreterFactory::get(query_ast, local_context)->execute().pipeline;
PullingPipelineExecutor executor(pipeline);
Block block;
executor.pull(block);
return block.rows() != 0;
}
bool ClusterCopier::checkPresentPartitionPiecesOnCurrentShard(const ConnectionTimeouts & timeouts,
@ -1910,12 +1944,15 @@ bool ClusterCopier::checkPresentPartitionPiecesOnCurrentShard(const ConnectionTi
auto local_context = Context::createCopy(context);
local_context->setSettings(task_cluster->settings_pull);
auto result = InterpreterFactory::get(query_ast, local_context)->execute().getInputStream()->read().rows();
if (result != 0)
auto pipeline = InterpreterFactory::get(query_ast, local_context)->execute().pipeline;
PullingPipelineExecutor executor(pipeline);
Block result;
executor.pull(result);
if (result.rows() != 0)
LOG_INFO(log, "Partition {} piece number {} is PRESENT on shard {}", partition_quoted_name, std::to_string(current_piece_number), task_shard.getDescription());
else
LOG_INFO(log, "Partition {} piece number {} is ABSENT on shard {}", partition_quoted_name, std::to_string(current_piece_number), task_shard.getDescription());
return result != 0;
return result.rows() != 0;
}

View File

@ -1,6 +1,8 @@
#include "Internals.h"
#include <Storages/MergeTree/MergeTreeData.h>
#include <Storages/extractKeyExpressionList.h>
#include <Processors/Executors/PullingPipelineExecutor.h>
#include <Processors/Transforms/SquashingChunksTransform.h>
namespace DB
{
@ -63,9 +65,21 @@ BlockInputStreamPtr squashStreamIntoOneBlock(const BlockInputStreamPtr & stream)
std::numeric_limits<size_t>::max());
}
Block getBlockWithAllStreamData(const BlockInputStreamPtr & stream)
Block getBlockWithAllStreamData(QueryPipeline pipeline)
{
return squashStreamIntoOneBlock(stream)->read();
QueryPipelineBuilder builder;
builder.init(std::move(pipeline));
builder.addTransform(std::make_shared<SquashingChunksTransform>(
builder.getHeader(),
std::numeric_limits<size_t>::max(),
std::numeric_limits<size_t>::max()));
auto cur_pipeline = QueryPipelineBuilder::getPipeline(std::move(builder));
Block block;
PullingPipelineExecutor executor(cur_pipeline);
executor.pull(block);
return block;
}

View File

@ -165,10 +165,7 @@ std::shared_ptr<ASTStorage> createASTStorageDistributed(
const String & cluster_name, const String & database, const String & table,
const ASTPtr & sharding_key_ast = nullptr);
BlockInputStreamPtr squashStreamIntoOneBlock(const BlockInputStreamPtr & stream);
Block getBlockWithAllStreamData(const BlockInputStreamPtr & stream);
Block getBlockWithAllStreamData(QueryPipeline pipeline);
bool isExtendedDefinitionStorage(const ASTPtr & storage_ast);

View File

@ -962,7 +962,7 @@ namespace
if (isRunning(pid_file))
{
throw Exception(ErrorCodes::CANNOT_KILL,
"The server process still exists after %zu ms",
"The server process still exists after {} tries (delay: {} ms)",
num_kill_check_tries, kill_check_delay_ms);
}
}

View File

@ -26,7 +26,7 @@
#include <Formats/registerFormats.h>
#include <Formats/FormatFactory.h>
#include <Processors/Formats/IInputFormat.h>
#include <Processors/QueryPipeline.h>
#include <Processors/QueryPipelineBuilder.h>
#include <Processors/Executors/PullingPipelineExecutor.h>
#include <Core/Block.h>
#include <common/StringRef.h>
@ -1162,8 +1162,7 @@ try
Pipe pipe(FormatFactory::instance().getInput(input_format, file_in, header, context, max_block_size));
QueryPipeline pipeline;
pipeline.init(std::move(pipe));
QueryPipeline pipeline(std::move(pipe));
PullingPipelineExecutor executor(pipeline);
Block block;
@ -1200,8 +1199,7 @@ try
});
}
QueryPipeline pipeline;
pipeline.init(std::move(pipe));
QueryPipeline pipeline(std::move(pipe));
BlockOutputStreamPtr output = context->getOutputStreamParallelIfPossible(output_format, file_out, header);

View File

@ -859,6 +859,9 @@ if (ThreadFuzzer::instance().isEffective())
if (config->has("max_partition_size_to_drop"))
global_context->setMaxPartitionSizeToDrop(config->getUInt64("max_partition_size_to_drop"));
if (config->has("max_concurrent_queries"))
global_context->getProcessList().setMaxSize(config->getInt("max_concurrent_queries", 0));
if (!initial_loading)
{
/// We do not load ZooKeeper configuration on the first config loading

View File

@ -1,22 +0,0 @@
OWNER(g:clickhouse)
PROGRAM(clickhouse-server)
PEERDIR(
clickhouse/base/common
clickhouse/base/daemon
clickhouse/base/loggers
clickhouse/src
contrib/libs/poco/NetSSL_OpenSSL
)
CFLAGS(-g0)
SRCS(
clickhouse-server.cpp
MetricsTransmitter.cpp
Server.cpp
)
END()

View File

@ -1,32 +0,0 @@
OWNER(g:clickhouse)
PROGRAM(clickhouse)
CFLAGS(
-DENABLE_CLICKHOUSE_CLIENT
-DENABLE_CLICKHOUSE_EXTRACT_FROM_CONFIG
-DENABLE_CLICKHOUSE_SERVER
)
PEERDIR(
clickhouse/base/daemon
clickhouse/base/loggers
clickhouse/src
)
CFLAGS(-g0)
SRCS(
main.cpp
client/Client.cpp
client/QueryFuzzer.cpp
client/ConnectionParameters.cpp
client/Suggest.cpp
client/TestHint.cpp
extract-from-config/ExtractFromConfig.cpp
server/Server.cpp
server/MetricsTransmitter.cpp
)
END()

View File

@ -45,6 +45,7 @@ enum class AccessType
M(ALTER_RENAME_COLUMN, "RENAME COLUMN", COLUMN, ALTER_COLUMN) \
M(ALTER_MATERIALIZE_COLUMN, "MATERIALIZE COLUMN", COLUMN, ALTER_COLUMN) \
M(ALTER_COLUMN, "", GROUP, ALTER_TABLE) /* allow to execute ALTER {ADD|DROP|MODIFY...} COLUMN */\
M(ALTER_MODIFY_COMMENT, "MODIFY COMMENT", TABLE, ALTER_TABLE) /* modify table comment */\
\
M(ALTER_ORDER_BY, "ALTER MODIFY ORDER BY, MODIFY ORDER BY", TABLE, ALTER_INDEX) \
M(ALTER_SAMPLE_BY, "ALTER MODIFY SAMPLE BY, MODIFY SAMPLE BY", TABLE, ALTER_INDEX) \

View File

@ -14,6 +14,7 @@
#include <Interpreters/InterpreterCreateUserQuery.h>
#include <Interpreters/InterpreterShowGrantsQuery.h>
#include <Common/quoteString.h>
#include <common/logger_useful.h>
#include <Poco/JSON/JSON.h>
#include <Poco/JSON/Object.h>
#include <Poco/JSON/Stringifier.h>

View File

@ -63,9 +63,12 @@ void ReplicatedAccessStorage::shutdown()
bool prev_stop_flag = stop_flag.exchange(true);
if (!prev_stop_flag)
{
/// Notify the worker thread to stop waiting for new queue items
refresh_queue.push(UUIDHelpers::Nil);
worker_thread.join();
if (worker_thread.joinable())
{
/// Notify the worker thread to stop waiting for new queue items
refresh_queue.push(UUIDHelpers::Nil);
worker_thread.join();
}
}
}

View File

@ -0,0 +1,46 @@
#include <gtest/gtest.h>
#include <Access/ReplicatedAccessStorage.h>
using namespace DB;
namespace DB
{
namespace ErrorCodes
{
extern const int NO_ZOOKEEPER;
}
}
TEST(ReplicatedAccessStorage, ShutdownWithoutStartup)
{
auto get_zk = []()
{
return std::shared_ptr<zkutil::ZooKeeper>();
};
auto storage = ReplicatedAccessStorage("replicated", "/clickhouse/access", get_zk);
storage.shutdown();
}
TEST(ReplicatedAccessStorage, ShutdownWithFailedStartup)
{
auto get_zk = []()
{
return std::shared_ptr<zkutil::ZooKeeper>();
};
auto storage = ReplicatedAccessStorage("replicated", "/clickhouse/access", get_zk);
try
{
storage.startup();
}
catch (Exception & e)
{
if (e.code() != ErrorCodes::NO_ZOOKEEPER)
throw;
}
storage.shutdown();
}

View File

@ -1,54 +0,0 @@
# This file is generated automatically, do not edit. See 'ya.make.in' and use 'utils/generate-ya-make' to regenerate it.
OWNER(g:clickhouse)
LIBRARY()
PEERDIR(
clickhouse/src/Common
)
SRCS(
AccessControlManager.cpp
AccessEntityIO.cpp
AccessRights.cpp
AccessRightsElement.cpp
AllowedClientHosts.cpp
Authentication.cpp
ContextAccess.cpp
Credentials.cpp
DiskAccessStorage.cpp
EnabledQuota.cpp
EnabledRoles.cpp
EnabledRolesInfo.cpp
EnabledRowPolicies.cpp
EnabledSettings.cpp
ExternalAuthenticators.cpp
GSSAcceptor.cpp
GrantedRoles.cpp
IAccessEntity.cpp
IAccessStorage.cpp
LDAPAccessStorage.cpp
LDAPClient.cpp
MemoryAccessStorage.cpp
MultipleAccessStorage.cpp
Quota.cpp
QuotaCache.cpp
QuotaUsage.cpp
ReplicatedAccessStorage.cpp
Role.cpp
RoleCache.cpp
RolesOrUsersSet.cpp
RowPolicy.cpp
RowPolicyCache.cpp
SettingsConstraints.cpp
SettingsProfile.cpp
SettingsProfileElement.cpp
SettingsProfilesCache.cpp
SettingsProfilesInfo.cpp
User.cpp
UsersConfigAccessStorage.cpp
)
END()

View File

@ -1,14 +0,0 @@
OWNER(g:clickhouse)
LIBRARY()
PEERDIR(
clickhouse/src/Common
)
SRCS(
<? find . -name '*.cpp' | grep -v -F tests | grep -v -F examples | grep -v -F fuzzers | sed 's/^\.\// /' | sort ?>
)
END()

View File

@ -155,7 +155,7 @@ AggregateFunctionPtr AggregateFunctionFactory::getImpl(
}
/// Combinators of aggregate functions.
/// For every aggregate function 'agg' and combiner '-Comb' there is combined aggregate function with name 'aggComb',
/// For every aggregate function 'agg' and combiner '-Comb' there is a combined aggregate function with the name 'aggComb',
/// that can have different number and/or types of arguments, different result type and different behaviour.
if (AggregateFunctionCombinatorPtr combinator = AggregateFunctionCombinatorFactory::instance().tryFindSuffix(name))
@ -172,13 +172,12 @@ AggregateFunctionPtr AggregateFunctionFactory::getImpl(
String nested_name = name.substr(0, name.size() - combinator_name.size());
/// Nested identical combinators (i.e. uniqCombinedIfIf) is not
/// supported (since they even don't work -- silently).
/// supported (since they don't work -- silently).
///
/// But non-identical does supported and works, for example
/// uniqCombinedIfMergeIf, it is useful in case when the underlying
/// But non-identical is supported and works. For example,
/// uniqCombinedIfMergeIf is useful in cases when the underlying
/// storage stores AggregateFunction(uniqCombinedIf) and in SELECT you
/// need to filter aggregation result based on another column for
/// example.
/// need to filter aggregation result based on another column.
if (!combinator->supportsNesting() && nested_name.ends_with(combinator_name))
{
throw Exception(ErrorCodes::ILLEGAL_AGGREGATION,
@ -234,7 +233,7 @@ std::optional<AggregateFunctionProperties> AggregateFunctionFactory::tryGetPrope
return found.properties;
/// Combinators of aggregate functions.
/// For every aggregate function 'agg' and combiner '-Comb' there is combined aggregate function with name 'aggComb',
/// For every aggregate function 'agg' and combiner '-Comb' there is a combined aggregate function with the name 'aggComb',
/// that can have different number and/or types of arguments, different result type and different behaviour.
if (AggregateFunctionCombinatorPtr combinator = AggregateFunctionCombinatorFactory::instance().tryFindSuffix(name))

View File

@ -1,74 +0,0 @@
# This file is generated automatically, do not edit. See 'ya.make.in' and use 'utils/generate-ya-make' to regenerate it.
OWNER(g:clickhouse)
LIBRARY()
PEERDIR(
clickhouse/src/Common
)
SRCS(
AggregateFunctionAggThrow.cpp
AggregateFunctionAny.cpp
AggregateFunctionArray.cpp
AggregateFunctionAvg.cpp
AggregateFunctionAvgWeighted.cpp
AggregateFunctionBitwise.cpp
AggregateFunctionBoundingRatio.cpp
AggregateFunctionCategoricalInformationValue.cpp
AggregateFunctionCombinatorFactory.cpp
AggregateFunctionCount.cpp
AggregateFunctionDeltaSum.cpp
AggregateFunctionDeltaSumTimestamp.cpp
AggregateFunctionDistinct.cpp
AggregateFunctionEntropy.cpp
AggregateFunctionFactory.cpp
AggregateFunctionForEach.cpp
AggregateFunctionGroupArray.cpp
AggregateFunctionGroupArrayInsertAt.cpp
AggregateFunctionGroupArrayMoving.cpp
AggregateFunctionGroupUniqArray.cpp
AggregateFunctionHistogram.cpp
AggregateFunctionIf.cpp
AggregateFunctionIntervalLengthSum.cpp
AggregateFunctionMLMethod.cpp
AggregateFunctionMannWhitney.cpp
AggregateFunctionMax.cpp
AggregateFunctionMaxIntersections.cpp
AggregateFunctionMerge.cpp
AggregateFunctionMin.cpp
AggregateFunctionNull.cpp
AggregateFunctionOrFill.cpp
AggregateFunctionQuantile.cpp
AggregateFunctionRankCorrelation.cpp
AggregateFunctionResample.cpp
AggregateFunctionRetention.cpp
AggregateFunctionSequenceMatch.cpp
AggregateFunctionSequenceNextNode.cpp
AggregateFunctionSimpleLinearRegression.cpp
AggregateFunctionSimpleState.cpp
AggregateFunctionSingleValueOrNull.cpp
AggregateFunctionSparkbar.cpp
AggregateFunctionState.cpp
AggregateFunctionStatistics.cpp
AggregateFunctionStatisticsSimple.cpp
AggregateFunctionStudentTTest.cpp
AggregateFunctionSum.cpp
AggregateFunctionSumCount.cpp
AggregateFunctionSumMap.cpp
AggregateFunctionTopK.cpp
AggregateFunctionUniq.cpp
AggregateFunctionUniqCombined.cpp
AggregateFunctionUniqUpTo.cpp
AggregateFunctionWelchTTest.cpp
AggregateFunctionWindowFunnel.cpp
IAggregateFunction.cpp
UniqCombinedBiasData.cpp
UniqVariadicHash.cpp
parseAggregateFunctionParameters.cpp
registerAggregateFunctions.cpp
)
END()

View File

@ -1,14 +0,0 @@
OWNER(g:clickhouse)
LIBRARY()
PEERDIR(
clickhouse/src/Common
)
SRCS(
<? find . -name '*.cpp' | grep -v -F tests | grep -v -F examples | grep -v -F fuzzers | grep -v -F GroupBitmap | sed 's/^\.\// /' | sort ?>
)
END()

View File

@ -1,27 +0,0 @@
# This file is generated automatically, do not edit. See 'ya.make.in' and use 'utils/generate-ya-make' to regenerate it.
OWNER(g:clickhouse)
LIBRARY()
PEERDIR(
clickhouse/src/Common
)
SRCS(
BackupEntryConcat.cpp
BackupEntryFromAppendOnlyFile.cpp
BackupEntryFromImmutableFile.cpp
BackupEntryFromMemory.cpp
BackupEntryFromSmallFile.cpp
BackupFactory.cpp
BackupInDirectory.cpp
BackupRenamingConfig.cpp
BackupSettings.cpp
BackupUtils.cpp
hasCompatibleDataToRestoreTable.cpp
renameInCreateQuery.cpp
)
END()

View File

@ -1,14 +0,0 @@
OWNER(g:clickhouse)
LIBRARY()
PEERDIR(
clickhouse/src/Common
)
SRCS(
<? find . -name '*.cpp' | grep -v -F tests | grep -v -F examples | grep -v -F fuzzers | sed 's/^\.\// /' | sort ?>
)
END()

View File

@ -1,17 +0,0 @@
# This file is generated automatically, do not edit. See 'ya.make.in' and use 'utils/generate-ya-make' to regenerate it.
OWNER(g:clickhouse)
LIBRARY()
PEERDIR(
clickhouse/src/Common
)
SRCS(
IBridgeHelper.cpp
LibraryBridgeHelper.cpp
)
END()

View File

@ -1,14 +0,0 @@
OWNER(g:clickhouse)
LIBRARY()
PEERDIR(
clickhouse/src/Common
)
SRCS(
<? find . -name '*.cpp' | grep -v -F tests | grep -v -F examples | grep -v -F fuzzers | sed 's/^\.\// /' | sort ?>
)
END()

View File

@ -212,6 +212,7 @@ add_object_library(clickhouse_processors_formats Processors/Formats)
add_object_library(clickhouse_processors_formats_impl Processors/Formats/Impl)
add_object_library(clickhouse_processors_transforms Processors/Transforms)
add_object_library(clickhouse_processors_sources Processors/Sources)
add_object_library(clickhouse_processors_sinks Processors/Sinks)
add_object_library(clickhouse_processors_merges Processors/Merges)
add_object_library(clickhouse_processors_merges_algorithms Processors/Merges/Algorithms)
add_object_library(clickhouse_processors_queryplan Processors/QueryPlan)

View File

@ -23,7 +23,7 @@
#include <Interpreters/ClientInfo.h>
#include <Compression/CompressionFactory.h>
#include <Processors/Pipe.h>
#include <Processors/QueryPipeline.h>
#include <Processors/QueryPipelineBuilder.h>
#include <Processors/ISink.h>
#include <Processors/Executors/PipelineExecutor.h>
#include <pcg_random.hpp>
@ -700,14 +700,14 @@ void Connection::sendExternalTablesData(ExternalTablesData & data)
if (!elem->pipe)
elem->pipe = elem->creating_pipe_callback();
QueryPipeline pipeline;
QueryPipelineBuilder pipeline;
pipeline.init(std::move(*elem->pipe));
elem->pipe.reset();
pipeline.resize(1);
auto sink = std::make_shared<ExternalTableDataSink>(pipeline.getHeader(), *this, *elem, std::move(on_cancel));
pipeline.setSinks([&](const Block &, QueryPipeline::StreamType type) -> ProcessorPtr
pipeline.setSinks([&](const Block &, QueryPipelineBuilder::StreamType type) -> ProcessorPtr
{
if (type != QueryPipeline::StreamType::Main)
if (type != QueryPipelineBuilder::StreamType::Main)
return nullptr;
return sink;
});

View File

@ -1,24 +0,0 @@
# This file is generated automatically, do not edit. See 'ya.make.in' and use 'utils/generate-ya-make' to regenerate it.
OWNER(g:clickhouse)
LIBRARY()
PEERDIR(
clickhouse/src/Common
contrib/libs/poco/NetSSL_OpenSSL
)
SRCS(
Connection.cpp
ConnectionEstablisher.cpp
ConnectionPool.cpp
ConnectionPoolWithFailover.cpp
HedgedConnections.cpp
HedgedConnectionsFactory.cpp
IConnections.cpp
MultiplexedConnections.cpp
)
END()

View File

@ -1,15 +0,0 @@
OWNER(g:clickhouse)
LIBRARY()
PEERDIR(
clickhouse/src/Common
contrib/libs/poco/NetSSL_OpenSSL
)
SRCS(
<? find . -name '*.cpp' | grep -v -F tests | grep -v -F examples | grep -v -F fuzzers | sed 's/^\.\// /' | sort ?>
)
END()

View File

@ -1,43 +0,0 @@
# This file is generated automatically, do not edit. See 'ya.make.in' and use 'utils/generate-ya-make' to regenerate it.
OWNER(g:clickhouse)
LIBRARY()
ADDINCL(
contrib/libs/icu/common
contrib/libs/icu/i18n
contrib/libs/pdqsort
contrib/libs/lz4
)
PEERDIR(
clickhouse/src/Common
contrib/libs/icu
contrib/libs/pdqsort
contrib/libs/lz4
)
SRCS(
Collator.cpp
ColumnAggregateFunction.cpp
ColumnArray.cpp
ColumnCompressed.cpp
ColumnConst.cpp
ColumnDecimal.cpp
ColumnFixedString.cpp
ColumnFunction.cpp
ColumnLowCardinality.cpp
ColumnMap.cpp
ColumnNullable.cpp
ColumnString.cpp
ColumnTuple.cpp
ColumnVector.cpp
ColumnsCommon.cpp
FilterDescription.cpp
IColumn.cpp
MaskOperations.cpp
getLeastSuperColumn.cpp
)
END()

View File

@ -1,23 +0,0 @@
OWNER(g:clickhouse)
LIBRARY()
ADDINCL(
contrib/libs/icu/common
contrib/libs/icu/i18n
contrib/libs/pdqsort
contrib/libs/lz4
)
PEERDIR(
clickhouse/src/Common
contrib/libs/icu
contrib/libs/pdqsort
contrib/libs/lz4
)
SRCS(
<? find . -name '*.cpp' | grep -v -F tests | grep -v -F examples | grep -v -F fuzzers | sed 's/^\.\// /' | sort ?>
)
END()

View File

@ -34,6 +34,10 @@ using RWLock = std::shared_ptr<RWLockImpl>;
/// - SELECT thread 1 locks in the Read mode
/// - ALTER tries to lock in the Write mode (waits for SELECT thread 1)
/// - SELECT thread 2 tries to lock in the Read mode (waits for ALTER)
///
/// NOTE: it is dangerous to acquire the lock with NO_QUERY, because the FastPath
/// does not exist for this case, and the deadlock described in the previous note
/// may occur in the case of recursive locking.
class RWLockImpl : public std::enable_shared_from_this<RWLockImpl>
{
public:

View File

@ -145,7 +145,6 @@ protected:
Poco::Logger * log = nullptr;
friend class CurrentThread;
friend class PushingToViewsBlockOutputStream;
/// Use ptr not to add extra dependencies in the header
std::unique_ptr<RUsageCounters> last_rusage;
@ -188,6 +187,11 @@ public:
return query_context.lock();
}
void disableProfiling()
{
query_profiled_enabled = false;
}
/// Starts new query and create new thread group for it, current thread becomes master thread of the query
void initializeQuery();
@ -222,6 +226,7 @@ public:
/// Detaches thread from the thread group and the query, dumps performance counters if they have not been dumped
void detachQuery(bool exit_if_already_detached = false, bool thread_exits = false);
void logToQueryViewsLog(const ViewRuntimeData & vinfo);
protected:
void applyQuerySettings();
@ -234,7 +239,6 @@ protected:
void logToQueryThreadLog(QueryThreadLog & thread_log, const String & current_database, std::chrono::time_point<std::chrono::system_clock> now);
void logToQueryViewsLog(const ViewRuntimeData & vinfo);
void assertState(const std::initializer_list<int> & permitted_states, const char * description = nullptr) const;

View File

@ -1,133 +0,0 @@
# This file is generated automatically, do not edit. See 'ya.make.in' and use 'utils/generate-ya-make' to regenerate it.
OWNER(g:clickhouse)
LIBRARY()
ADDINCL (
GLOBAL clickhouse/src
contrib/libs/libcpuid
contrib/libs/libunwind/include
GLOBAL contrib/restricted/dragonbox
)
PEERDIR(
clickhouse/base/common
clickhouse/base/pcg-random
clickhouse/base/widechar_width
contrib/libs/libcpuid
contrib/libs/openssl
contrib/libs/poco/NetSSL_OpenSSL
contrib/libs/re2
contrib/restricted/dragonbox
)
INCLUDE(${ARCADIA_ROOT}/clickhouse/cmake/yandex/ya.make.versions.inc)
SRCS(
ActionLock.cpp
AlignedBuffer.cpp
Allocator.cpp
ClickHouseRevision.cpp
Config/AbstractConfigurationComparison.cpp
Config/ConfigProcessor.cpp
Config/ConfigReloader.cpp
Config/YAMLParser.cpp
Config/configReadClient.cpp
CurrentMemoryTracker.cpp
CurrentMetrics.cpp
CurrentThread.cpp
DNSResolver.cpp
Dwarf.cpp
Elf.cpp
Epoll.cpp
ErrorCodes.cpp
Exception.cpp
FieldVisitorDump.cpp
FieldVisitorHash.cpp
FieldVisitorSum.cpp
FieldVisitorToString.cpp
FieldVisitorWriteBinary.cpp
FileChecker.cpp
IO.cpp
IPv6ToBinary.cpp
IntervalKind.cpp
JSONBuilder.cpp
Macros.cpp
MemoryStatisticsOS.cpp
MemoryTracker.cpp
OpenSSLHelpers.cpp
OptimizedRegularExpression.cpp
PODArray.cpp
PipeFDs.cpp
ProcfsMetricsProvider.cpp
ProfileEvents.cpp
ProgressIndication.cpp
QueryProfiler.cpp
RWLock.cpp
RemoteHostFilter.cpp
SensitiveDataMasker.cpp
SettingsChanges.cpp
SharedLibrary.cpp
ShellCommand.cpp
StackTrace.cpp
StatusFile.cpp
StatusInfo.cpp
Stopwatch.cpp
StringUtils/StringUtils.cpp
StudentTTest.cpp
SymbolIndex.cpp
TLDListsHolder.cpp
TaskStatsInfoGetter.cpp
TerminalSize.cpp
ThreadFuzzer.cpp
ThreadPool.cpp
ThreadProfileEvents.cpp
ThreadStatus.cpp
Throttler.cpp
TimerDescriptor.cpp
TraceCollector.cpp
UTF8Helpers.cpp
UnicodeBar.cpp
VersionNumber.cpp
WeakHash.cpp
ZooKeeper/IKeeper.cpp
ZooKeeper/TestKeeper.cpp
ZooKeeper/ZooKeeper.cpp
ZooKeeper/ZooKeeperCommon.cpp
ZooKeeper/ZooKeeperConstants.cpp
ZooKeeper/ZooKeeperIO.cpp
ZooKeeper/ZooKeeperImpl.cpp
ZooKeeper/ZooKeeperNodeCache.cpp
checkStackSize.cpp
clearPasswordFromCommandLine.cpp
clickhouse_malloc.cpp
createHardLink.cpp
escapeForFileName.cpp
filesystemHelpers.cpp
formatIPv6.cpp
formatReadable.cpp
getExecutablePath.cpp
getHashOfLoadedBinary.cpp
getMappedArea.cpp
getMultipleKeysFromConfig.cpp
getNumberOfPhysicalCPUCores.cpp
hasLinuxCapability.cpp
hex.cpp
isLocalAddress.cpp
isValidUTF8.cpp
malloc.cpp
new_delete.cpp
parseAddress.cpp
parseGlobs.cpp
parseRemoteDescription.cpp
quoteString.cpp
randomSeed.cpp
remapExecutable.cpp
renameat2.cpp
setThreadName.cpp
thread_local_rng.cpp
)
END()

View File

@ -1,30 +0,0 @@
OWNER(g:clickhouse)
LIBRARY()
ADDINCL (
GLOBAL clickhouse/src
contrib/libs/libcpuid
contrib/libs/libunwind/include
GLOBAL contrib/restricted/dragonbox
)
PEERDIR(
clickhouse/base/common
clickhouse/base/pcg-random
clickhouse/base/widechar_width
contrib/libs/libcpuid
contrib/libs/openssl
contrib/libs/poco/NetSSL_OpenSSL
contrib/libs/re2
contrib/restricted/dragonbox
)
INCLUDE(${ARCADIA_ROOT}/clickhouse/cmake/yandex/ya.make.versions.inc)
SRCS(
<? find . -name '*.cpp' | grep -v -F tests | grep -v -F examples | grep -v -F fuzzers | sed 's/^\.\// /' | sort ?>
)
END()

View File

@ -1,42 +0,0 @@
# This file is generated automatically, do not edit. See 'ya.make.in' and use 'utils/generate-ya-make' to regenerate it.
OWNER(g:clickhouse)
LIBRARY()
ADDINCL(
contrib/libs/lz4
contrib/libs/zstd/include
)
PEERDIR(
clickhouse/src/Common
contrib/libs/lz4
contrib/libs/zstd
)
SRCS(
CachedCompressedReadBuffer.cpp
CheckingCompressedReadBuffer.cpp
CompressedReadBuffer.cpp
CompressedReadBufferBase.cpp
CompressedReadBufferFromFile.cpp
CompressedWriteBuffer.cpp
CompressionCodecDelta.cpp
CompressionCodecDoubleDelta.cpp
CompressionCodecEncrypted.cpp
CompressionCodecGorilla.cpp
CompressionCodecLZ4.cpp
CompressionCodecMultiple.cpp
CompressionCodecNone.cpp
CompressionCodecT64.cpp
CompressionCodecZSTD.cpp
CompressionFactory.cpp
CompressionFactoryAdditions.cpp
ICompressionCodec.cpp
LZ4_decompress_faster.cpp
getCompressionCodecForFile.cpp
)
END()

View File

@ -1,21 +0,0 @@
OWNER(g:clickhouse)
LIBRARY()
ADDINCL(
contrib/libs/lz4
contrib/libs/zstd/include
)
PEERDIR(
clickhouse/src/Common
contrib/libs/lz4
contrib/libs/zstd
)
SRCS(
<? find . -name '*.cpp' | grep -v -F tests | grep -v -F examples | grep -v -F fuzzers | sed 's/^\.\// /' | sort ?>
)
END()

View File

@ -16,6 +16,7 @@ struct Settings;
* and should not be changed by the user without a reason.
*/
#define LIST_OF_COORDINATION_SETTINGS(M) \
M(Milliseconds, session_timeout_ms, Coordination::DEFAULT_SESSION_TIMEOUT_MS, "Default client session timeout", 0) \
M(Milliseconds, operation_timeout_ms, Coordination::DEFAULT_OPERATION_TIMEOUT_MS, "Default client operation timeout", 0) \
@ -36,7 +37,8 @@ struct Settings;
M(UInt64, max_requests_batch_size, 100, "Max size of batch in requests count before it will be sent to RAFT", 0) \
M(Bool, quorum_reads, false, "Execute read requests as writes through whole RAFT consensus with similar speed", 0) \
M(Bool, force_sync, true, "Call fsync on each change in RAFT changelog", 0) \
M(Bool, compress_logs, true, "Write compressed coordination logs in ZSTD format", 0)
M(Bool, compress_logs, true, "Write compressed coordination logs in ZSTD format", 0) \
M(Bool, compress_snapshots_with_zstd_format, true, "Write compressed snapshots in ZSTD format (instead of custom LZ4)", 0)
DECLARE_SETTINGS_TRAITS(CoordinationSettingsTraits, LIST_OF_COORDINATION_SETTINGS)

View File

@ -10,6 +10,7 @@
#include <IO/ReadBufferFromFile.h>
#include <IO/copyData.h>
#include <filesystem>
#include <memory>
namespace DB
{
@ -32,9 +33,12 @@ namespace
return parse<uint64_t>(name_parts[1]);
}
std::string getSnapshotFileName(uint64_t up_to_log_idx)
std::string getSnapshotFileName(uint64_t up_to_log_idx, bool compress_zstd)
{
return std::string{"snapshot_"} + std::to_string(up_to_log_idx) + ".bin";
auto base = std::string{"snapshot_"} + std::to_string(up_to_log_idx) + ".bin";
if (compress_zstd)
base += ".zstd";
return base;
}
std::string getBaseName(const String & path)
@ -218,6 +222,7 @@ SnapshotMetadataPtr KeeperStorageSnapshot::deserialize(KeeperStorage & storage,
storage.zxid = result->get_last_log_idx();
storage.session_id_counter = session_id;
/// Before V1 we serialized ACL without acl_map
if (current_version >= SnapshotVersion::V1)
{
size_t acls_map_size;
@ -338,9 +343,13 @@ KeeperStorageSnapshot::~KeeperStorageSnapshot()
storage->disableSnapshotMode();
}
KeeperSnapshotManager::KeeperSnapshotManager(const std::string & snapshots_path_, size_t snapshots_to_keep_, const std::string & superdigest_, size_t storage_tick_time_)
KeeperSnapshotManager::KeeperSnapshotManager(
const std::string & snapshots_path_, size_t snapshots_to_keep_,
bool compress_snapshots_zstd_,
const std::string & superdigest_, size_t storage_tick_time_)
: snapshots_path(snapshots_path_)
, snapshots_to_keep(snapshots_to_keep_)
, compress_snapshots_zstd(compress_snapshots_zstd_)
, superdigest(superdigest_)
, storage_tick_time(storage_tick_time_)
{
@ -380,7 +389,7 @@ std::string KeeperSnapshotManager::serializeSnapshotBufferToDisk(nuraft::buffer
{
ReadBufferFromNuraftBuffer reader(buffer);
auto snapshot_file_name = getSnapshotFileName(up_to_log_idx);
auto snapshot_file_name = getSnapshotFileName(up_to_log_idx, compress_snapshots_zstd);
auto tmp_snapshot_file_name = "tmp_" + snapshot_file_name;
std::string tmp_snapshot_path = std::filesystem::path{snapshots_path} / tmp_snapshot_file_name;
std::string new_snapshot_path = std::filesystem::path{snapshots_path} / snapshot_file_name;
@ -426,22 +435,46 @@ nuraft::ptr<nuraft::buffer> KeeperSnapshotManager::deserializeSnapshotBufferFrom
return writer.getBuffer();
}
nuraft::ptr<nuraft::buffer> KeeperSnapshotManager::serializeSnapshotToBuffer(const KeeperStorageSnapshot & snapshot)
nuraft::ptr<nuraft::buffer> KeeperSnapshotManager::serializeSnapshotToBuffer(const KeeperStorageSnapshot & snapshot) const
{
WriteBufferFromNuraftBuffer writer;
CompressedWriteBuffer compressed_writer(writer);
std::unique_ptr<WriteBufferFromNuraftBuffer> writer = std::make_unique<WriteBufferFromNuraftBuffer>();
auto * buffer_raw_ptr = writer.get();
std::unique_ptr<WriteBuffer> compressed_writer;
if (compress_snapshots_zstd)
compressed_writer = wrapWriteBufferWithCompressionMethod(std::move(writer), CompressionMethod::Zstd, 3);
else
compressed_writer = std::make_unique<CompressedWriteBuffer>(*writer);
KeeperStorageSnapshot::serialize(snapshot, compressed_writer);
compressed_writer.finalize();
return writer.getBuffer();
KeeperStorageSnapshot::serialize(snapshot, *compressed_writer);
compressed_writer->finalize();
return buffer_raw_ptr->getBuffer();
}
bool KeeperSnapshotManager::isZstdCompressed(nuraft::ptr<nuraft::buffer> buffer)
{
static constexpr uint32_t ZSTD_COMPRESSED_MAGIC = 0xFD2FB528;
ReadBufferFromNuraftBuffer reader(buffer);
uint32_t magic_from_buffer;
reader.readStrict(reinterpret_cast<char *>(&magic_from_buffer), sizeof(magic_from_buffer));
buffer->pos(0);
return magic_from_buffer == ZSTD_COMPRESSED_MAGIC;
}
SnapshotMetaAndStorage KeeperSnapshotManager::deserializeSnapshotFromBuffer(nuraft::ptr<nuraft::buffer> buffer) const
{
ReadBufferFromNuraftBuffer reader(buffer);
CompressedReadBuffer compressed_reader(reader);
bool is_zstd_compressed = isZstdCompressed(buffer);
std::unique_ptr<ReadBufferFromNuraftBuffer> reader = std::make_unique<ReadBufferFromNuraftBuffer>(buffer);
std::unique_ptr<ReadBuffer> compressed_reader;
if (is_zstd_compressed)
compressed_reader = wrapReadBufferWithCompressionMethod(std::move(reader), CompressionMethod::Zstd);
else
compressed_reader = std::make_unique<CompressedReadBuffer>(*reader);
auto storage = std::make_unique<KeeperStorage>(storage_tick_time, superdigest);
auto snapshot_metadata = KeeperStorageSnapshot::deserialize(*storage, compressed_reader);
auto snapshot_metadata = KeeperStorageSnapshot::deserialize(*storage, *compressed_reader);
return std::make_pair(snapshot_metadata, std::move(storage));
}

View File

@ -15,10 +15,19 @@ enum SnapshotVersion : uint8_t
V0 = 0,
V1 = 1, /// with ACL map
V2 = 2, /// with 64 bit buffer header
V3 = 3, /// compress snapshots with ZSTD codec
};
static constexpr auto CURRENT_SNAPSHOT_VERSION = SnapshotVersion::V2;
static constexpr auto CURRENT_SNAPSHOT_VERSION = SnapshotVersion::V3;
/// In-memory Keeper snapshot. KeeperStorage is based on a hash map which can be
/// switched into snapshot mode. This operation is fast, and the KeeperStorageSnapshot
/// class does it in its constructor. It also copies iterators from the storage hash table
/// up to some log index while holding a lock. In its destructor this class turns off
/// snapshot mode for KeeperStorage.
///
/// This representation of the snapshot has to be serialized into a NuRaft
/// buffer and sent over the network or saved to a file.
struct KeeperStorageSnapshot
{
public:
@ -34,12 +43,20 @@ public:
KeeperStorage * storage;
SnapshotVersion version = CURRENT_SNAPSHOT_VERSION;
/// Snapshot metadata
SnapshotMetadataPtr snapshot_meta;
/// Max session id
int64_t session_id;
/// Size of the snapshot container in nodes after the begin iterator,
/// so we can iterate with: for (i = 0; i < snapshot_container_size; ++i) { doSmth(begin + i); }
size_t snapshot_container_size;
/// Iterator to the start of the storage
KeeperStorage::Container::const_iterator begin;
/// Active sessions and their timeouts
SessionAndTimeout session_and_timeout;
/// Sessions credentials
KeeperStorage::SessionAndAuth session_and_auth;
/// ACL cache for better performance. Without it we cannot deserialize the storage.
std::unordered_map<uint64_t, Coordination::ACLs> acl_map;
};
@ -49,28 +66,42 @@ using CreateSnapshotCallback = std::function<void(KeeperStorageSnapshotPtr &&)>;
using SnapshotMetaAndStorage = std::pair<SnapshotMetadataPtr, KeeperStoragePtr>;
/// Class responsible for snapshot serialization and deserialization. Each snapshot
/// has its own path on disk and a log index.
class KeeperSnapshotManager
{
public:
KeeperSnapshotManager(const std::string & snapshots_path_, size_t snapshots_to_keep_, const std::string & superdigest_ = "", size_t storage_tick_time_ = 500);
KeeperSnapshotManager(
const std::string & snapshots_path_, size_t snapshots_to_keep_,
bool compress_snapshots_zstd_ = true, const std::string & superdigest_ = "", size_t storage_tick_time_ = 500);
/// Restore storage from latest available snapshot
SnapshotMetaAndStorage restoreFromLatestSnapshot();
static nuraft::ptr<nuraft::buffer> serializeSnapshotToBuffer(const KeeperStorageSnapshot & snapshot);
/// Compress snapshot and serialize it to buffer
nuraft::ptr<nuraft::buffer> serializeSnapshotToBuffer(const KeeperStorageSnapshot & snapshot) const;
/// Serialize already compressed snapshot to disk (return path)
std::string serializeSnapshotBufferToDisk(nuraft::buffer & buffer, uint64_t up_to_log_idx);
SnapshotMetaAndStorage deserializeSnapshotFromBuffer(nuraft::ptr<nuraft::buffer> buffer) const;
/// Deserialize snapshot with log index up_to_log_idx from disk into compressed nuraft buffer.
nuraft::ptr<nuraft::buffer> deserializeSnapshotBufferFromDisk(uint64_t up_to_log_idx) const;
/// Deserialize latest snapshot from disk into compressed nuraft buffer.
nuraft::ptr<nuraft::buffer> deserializeLatestSnapshotBufferFromDisk();
/// Remove snapshot with this log_index
void removeSnapshot(uint64_t log_idx);
/// Total number of snapshots
size_t totalSnapshots() const
{
return existing_snapshots.size();
}
/// The log index of the freshest snapshot we have
size_t getLatestSnapshotIndex() const
{
if (!existing_snapshots.empty())
@ -80,13 +111,28 @@ public:
private:
void removeOutdatedSnapshotsIfNeeded();
/// Checks the first 4 buffer bytes to make sure that the snapshot is compressed
/// with the ZSTD codec.
static bool isZstdCompressed(nuraft::ptr<nuraft::buffer> buffer);
const std::string snapshots_path;
/// How many snapshots to keep before removing old ones
const size_t snapshots_to_keep;
/// All existing snapshots in our path (log_index -> path)
std::map<uint64_t, std::string> existing_snapshots;
/// Compress snapshots in the common ZSTD format instead of the custom ClickHouse block LZ4 format
const bool compress_snapshots_zstd;
/// Superdigest for deserialization of storage
const std::string superdigest;
/// Storage session timeout check interval (also used for deserialization)
size_t storage_tick_time;
};
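To make the ZSTD detection above concrete, a hedged sketch of the magic-byte check follows. A ZSTD frame starts with the 4-byte magic 0xFD2FB528, stored little-endian as 28 B5 2F FD; the function name and the raw-pointer interface are illustrative assumptions, not the actual member signature.

#include <cstddef>
#include <cstdint>
#include <cstring>

/// Sketch only: checking the first 4 bytes against the ZSTD frame magic is
/// enough to tell a ZSTD-compressed snapshot apart from the custom LZ4 block format.
static bool looksLikeZstd(const uint8_t * data, size_t size)
{
    static constexpr uint8_t zstd_magic[4] = {0x28, 0xB5, 0x2F, 0xFD};
    return size >= 4 && std::memcmp(data, zstd_magic, sizeof(zstd_magic)) == 0;
}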
/// Keeper creates snapshots in a background thread. KeeperStateMachine just creates
/// an in-memory snapshot from the storage and pushes a serialization task for it into
/// a special task queue. The background thread checks this queue and, after the snapshot
/// is successfully serialized, notifies the state machine.
struct CreateSnapshotTask
{
KeeperStorageSnapshotPtr snapshot;
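The handoff described above is a standard producer/consumer queue. Below is a hedged sketch of the pattern with assumed names; the real code uses the project's own concurrent queue and the CreateSnapshotCallback declared in this header.

#include <condition_variable>
#include <mutex>
#include <queue>

std::mutex task_mutex;
std::condition_variable task_cv;
std::queue<CreateSnapshotTask> snapshot_tasks;

void pushSnapshotTask(CreateSnapshotTask task) /// producer: the state machine
{
    {
        std::lock_guard lock(task_mutex);
        snapshot_tasks.push(std::move(task));
    }
    task_cv.notify_one();
}

void snapshotThread() /// consumer: the background serializer
{
    while (true)
    {
        std::unique_lock lock(task_mutex);
        task_cv.wait(lock, [] { return !snapshot_tasks.empty(); });
        CreateSnapshotTask task = std::move(snapshot_tasks.front());
        snapshot_tasks.pop();
        lock.unlock();
        /// Serialize task.snapshot to disk here, then notify the state machine
        /// through the callback (CreateSnapshotCallback in this header).
    }
}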

View File

@ -46,7 +46,10 @@ KeeperStateMachine::KeeperStateMachine(
const CoordinationSettingsPtr & coordination_settings_,
const std::string & superdigest_)
: coordination_settings(coordination_settings_)
, snapshot_manager(snapshots_path_, coordination_settings->snapshots_to_keep, superdigest_, coordination_settings->dead_session_check_period_ms.totalMicroseconds())
, snapshot_manager(
snapshots_path_, coordination_settings->snapshots_to_keep,
coordination_settings->compress_snapshots_with_zstd_format, superdigest_,
coordination_settings->dead_session_check_period_ms.totalMicroseconds())
, responses_queue(responses_queue_)
, snapshots_queue(snapshots_queue_)
, last_committed_idx(0)

View File

@ -934,8 +934,9 @@ void addNode(DB::KeeperStorage & storage, const std::string & path, const std::s
TEST_P(CoordinationTest, TestStorageSnapshotSimple)
{
auto params = GetParam();
ChangelogDirTest test("./snapshots");
DB::KeeperSnapshotManager manager("./snapshots", 3);
DB::KeeperSnapshotManager manager("./snapshots", 3, params.enable_compression);
DB::KeeperStorage storage(500, "");
addNode(storage, "/hello", "world", 1);
@ -956,7 +957,7 @@ TEST_P(CoordinationTest, TestStorageSnapshotSimple)
auto buf = manager.serializeSnapshotToBuffer(snapshot);
manager.serializeSnapshotBufferToDisk(*buf, 2);
EXPECT_TRUE(fs::exists("./snapshots/snapshot_2.bin"));
EXPECT_TRUE(fs::exists("./snapshots/snapshot_2.bin" + params.extension));
auto debuf = manager.deserializeSnapshotBufferFromDisk(2);
@ -981,8 +982,9 @@ TEST_P(CoordinationTest, TestStorageSnapshotSimple)
TEST_P(CoordinationTest, TestStorageSnapshotMoreWrites)
{
auto params = GetParam();
ChangelogDirTest test("./snapshots");
DB::KeeperSnapshotManager manager("./snapshots", 3);
DB::KeeperSnapshotManager manager("./snapshots", 3, params.enable_compression);
DB::KeeperStorage storage(500, "");
storage.getSessionID(130);
@ -1005,7 +1007,7 @@ TEST_P(CoordinationTest, TestStorageSnapshotMoreWrites)
auto buf = manager.serializeSnapshotToBuffer(snapshot);
manager.serializeSnapshotBufferToDisk(*buf, 50);
EXPECT_TRUE(fs::exists("./snapshots/snapshot_50.bin"));
EXPECT_TRUE(fs::exists("./snapshots/snapshot_50.bin" + params.extension));
auto debuf = manager.deserializeSnapshotBufferFromDisk(50);
@ -1021,8 +1023,9 @@ TEST_P(CoordinationTest, TestStorageSnapshotMoreWrites)
TEST_P(CoordinationTest, TestStorageSnapshotManySnapshots)
{
auto params = GetParam();
ChangelogDirTest test("./snapshots");
DB::KeeperSnapshotManager manager("./snapshots", 3);
DB::KeeperSnapshotManager manager("./snapshots", 3, params.enable_compression);
DB::KeeperStorage storage(500, "");
storage.getSessionID(130);
@ -1037,14 +1040,14 @@ TEST_P(CoordinationTest, TestStorageSnapshotManySnapshots)
DB::KeeperStorageSnapshot snapshot(&storage, j * 50);
auto buf = manager.serializeSnapshotToBuffer(snapshot);
manager.serializeSnapshotBufferToDisk(*buf, j * 50);
EXPECT_TRUE(fs::exists(std::string{"./snapshots/snapshot_"} + std::to_string(j * 50) + ".bin"));
EXPECT_TRUE(fs::exists(std::string{"./snapshots/snapshot_"} + std::to_string(j * 50) + ".bin" + params.extension));
}
EXPECT_FALSE(fs::exists("./snapshots/snapshot_50.bin"));
EXPECT_FALSE(fs::exists("./snapshots/snapshot_100.bin"));
EXPECT_TRUE(fs::exists("./snapshots/snapshot_150.bin"));
EXPECT_TRUE(fs::exists("./snapshots/snapshot_200.bin"));
EXPECT_TRUE(fs::exists("./snapshots/snapshot_250.bin"));
EXPECT_FALSE(fs::exists("./snapshots/snapshot_50.bin" + params.extension));
EXPECT_FALSE(fs::exists("./snapshots/snapshot_100.bin" + params.extension));
EXPECT_TRUE(fs::exists("./snapshots/snapshot_150.bin" + params.extension));
EXPECT_TRUE(fs::exists("./snapshots/snapshot_200.bin" + params.extension));
EXPECT_TRUE(fs::exists("./snapshots/snapshot_250.bin" + params.extension));
auto [meta, restored_storage] = manager.restoreFromLatestSnapshot();
@ -1059,8 +1062,9 @@ TEST_P(CoordinationTest, TestStorageSnapshotManySnapshots)
TEST_P(CoordinationTest, TestStorageSnapshotMode)
{
auto params = GetParam();
ChangelogDirTest test("./snapshots");
DB::KeeperSnapshotManager manager("./snapshots", 3);
DB::KeeperSnapshotManager manager("./snapshots", 3, params.enable_compression);
DB::KeeperStorage storage(500, "");
for (size_t i = 0; i < 50; ++i)
{
@ -1087,7 +1091,7 @@ TEST_P(CoordinationTest, TestStorageSnapshotMode)
auto buf = manager.serializeSnapshotToBuffer(snapshot);
manager.serializeSnapshotBufferToDisk(*buf, 50);
}
EXPECT_TRUE(fs::exists("./snapshots/snapshot_50.bin"));
EXPECT_TRUE(fs::exists("./snapshots/snapshot_50.bin" + params.extension));
EXPECT_EQ(storage.container.size(), 26);
storage.clearGarbageAfterSnapshot();
EXPECT_EQ(storage.container.snapshotSize(), 26);
@ -1110,8 +1114,9 @@ TEST_P(CoordinationTest, TestStorageSnapshotMode)
TEST_P(CoordinationTest, TestStorageSnapshotBroken)
{
auto params = GetParam();
ChangelogDirTest test("./snapshots");
DB::KeeperSnapshotManager manager("./snapshots", 3);
DB::KeeperSnapshotManager manager("./snapshots", 3, params.enable_compression);
DB::KeeperStorage storage(500, "");
for (size_t i = 0; i < 50; ++i)
{
@ -1122,10 +1127,10 @@ TEST_P(CoordinationTest, TestStorageSnapshotBroken)
auto buf = manager.serializeSnapshotToBuffer(snapshot);
manager.serializeSnapshotBufferToDisk(*buf, 50);
}
EXPECT_TRUE(fs::exists("./snapshots/snapshot_50.bin"));
EXPECT_TRUE(fs::exists("./snapshots/snapshot_50.bin" + params.extension));
/// Let's corrupt the file
DB::WriteBufferFromFile plain_buf("./snapshots/snapshot_50.bin", DBMS_DEFAULT_BUFFER_SIZE, O_APPEND | O_CREAT | O_WRONLY);
DB::WriteBufferFromFile plain_buf("./snapshots/snapshot_50.bin" + params.extension, DBMS_DEFAULT_BUFFER_SIZE, O_APPEND | O_CREAT | O_WRONLY);
plain_buf.truncate(34);
plain_buf.sync();
@ -1464,6 +1469,52 @@ TEST_P(CoordinationTest, TestCompressedLogsMultipleRewrite)
}
TEST_P(CoordinationTest, TestStorageSnapshotDifferentCompressions)
{
auto params = GetParam();
ChangelogDirTest test("./snapshots");
DB::KeeperSnapshotManager manager("./snapshots", 3, params.enable_compression);
DB::KeeperStorage storage(500, "");
addNode(storage, "/hello", "world", 1);
addNode(storage, "/hello/somepath", "somedata", 3);
storage.session_id_counter = 5;
storage.zxid = 2;
storage.ephemerals[3] = {"/hello"};
storage.ephemerals[1] = {"/hello/somepath"};
storage.getSessionID(130);
storage.getSessionID(130);
DB::KeeperStorageSnapshot snapshot(&storage, 2);
auto buf = manager.serializeSnapshotToBuffer(snapshot);
manager.serializeSnapshotBufferToDisk(*buf, 2);
EXPECT_TRUE(fs::exists("./snapshots/snapshot_2.bin" + params.extension));
DB::KeeperSnapshotManager new_manager("./snapshots", 3, !params.enable_compression);
auto debuf = new_manager.deserializeSnapshotBufferFromDisk(2);
auto [snapshot_meta, restored_storage] = new_manager.deserializeSnapshotFromBuffer(debuf);
EXPECT_EQ(restored_storage->container.size(), 3);
EXPECT_EQ(restored_storage->container.getValue("/").children.size(), 1);
EXPECT_EQ(restored_storage->container.getValue("/hello").children.size(), 1);
EXPECT_EQ(restored_storage->container.getValue("/hello/somepath").children.size(), 0);
EXPECT_EQ(restored_storage->container.getValue("/").data, "");
EXPECT_EQ(restored_storage->container.getValue("/hello").data, "world");
EXPECT_EQ(restored_storage->container.getValue("/hello/somepath").data, "somedata");
EXPECT_EQ(restored_storage->session_id_counter, 7);
EXPECT_EQ(restored_storage->zxid, 2);
EXPECT_EQ(restored_storage->ephemerals.size(), 2);
EXPECT_EQ(restored_storage->ephemerals[3].size(), 1);
EXPECT_EQ(restored_storage->ephemerals[1].size(), 1);
EXPECT_EQ(restored_storage->session_and_timeout.size(), 2);
}
INSTANTIATE_TEST_SUITE_P(CoordinationTestSuite,
CoordinationTest,
::testing::ValuesIn(std::initializer_list<CompressionParam>{

View File

@ -1,13 +0,0 @@
# This file is generated automatically, do not edit. See 'ya.make.in' and use 'utils/generate-ya-make' to regenerate it.
OWNER(g:clickhouse)
LIBRARY()
PEERDIR(
clickhouse/src/Common
)
SRCS(
)
END()

View File

@ -1,12 +0,0 @@
OWNER(g:clickhouse)
LIBRARY()
PEERDIR(
clickhouse/src/Common
)
SRCS(
)
END()

View File

@ -15,6 +15,7 @@
#include <Processors/Executors/PipelineExecutor.h>
#include <Processors/Sources/SourceFromInputStream.h>
#include <Processors/Sinks/SinkToStorage.h>
#include <Processors/Sinks/EmptySink.h>
#include <Core/ExternalTable.h>
#include <Poco/Net/MessageHeader.h>
@ -160,14 +161,17 @@ void ExternalTablesHandler::handlePart(const Poco::Net::MessageHeader & header,
auto storage = temporary_table.getTable();
getContext()->addExternalTable(data->table_name, std::move(temporary_table));
auto sink = storage->write(ASTPtr(), storage->getInMemoryMetadataPtr(), getContext());
auto exception_handling = std::make_shared<EmptySink>(sink->getOutputPort().getHeader());
/// Write data
data->pipe->resize(1);
connect(*data->pipe->getOutputPort(0), sink->getPort());
connect(*data->pipe->getOutputPort(0), sink->getInputPort());
connect(sink->getOutputPort(), exception_handling->getPort());
auto processors = Pipe::detachProcessors(std::move(*data->pipe));
processors.push_back(std::move(sink));
processors.push_back(std::move(exception_handling));
auto executor = std::make_shared<PipelineExecutor>(processors);
executor->execute(/*num_threads = */ 1);
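For orientation, the wiring above forms a linear processor chain; this diagram only restates the connect calls from the diff:

/// data->pipe (resized to 1 output) --> sink (storage->write) --> exception_handling (EmptySink)
///
/// The EmptySink drains the sink's output port so the PipelineExecutor sees a fully
/// connected graph; judging by the variable name, that port exists to propagate exceptions.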

View File

@ -161,6 +161,7 @@ class IColumn;
\
M(Bool, force_index_by_date, false, "Throw an exception if there is a partition key in a table, and it is not used.", 0) \
M(Bool, force_primary_key, false, "Throw an exception if there is primary key in a table, and it is not used.", 0) \
M(Bool, use_skip_indexes, true, "Use data skipping indexes during query execution.", 0) \
M(String, force_data_skipping_indices, "", "Comma separated list of strings or literals with the name of the data skipping indices that should be used during query execution, otherwise an exception will be thrown.", 0) \
\
M(Float, max_streams_to_max_threads_ratio, 1, "Allows you to use more sources than the number of threads - to more evenly distribute work across threads. It is assumed that this is a temporary solution, since it will be possible in the future to make the number of sources equal to the number of threads, but for each source to dynamically select available work for itself.", 0) \
@ -349,7 +350,7 @@ class IColumn;
M(UInt64, max_memory_usage, 0, "Maximum memory usage for processing of single query. Zero means unlimited.", 0) \
M(UInt64, max_memory_usage_for_user, 0, "Maximum memory usage for processing all concurrently running queries for the user. Zero means unlimited.", 0) \
M(UInt64, max_untracked_memory, (4 * 1024 * 1024), "Small allocations and deallocations are grouped in thread local variable and tracked or profiled only when amount (in absolute value) becomes larger than specified value. If the value is higher than 'memory_profiler_step' it will be effectively lowered to 'memory_profiler_step'.", 0) \
M(UInt64, memory_profiler_step, 0, "Whenever query memory usage becomes larger than every next step in number of bytes the memory profiler will collect the allocating stack trace. Zero means disabled memory profiler. Values lower than a few megabytes will slow down query processing.", 0) \
M(UInt64, memory_profiler_step, (4 * 1024 * 1024), "Whenever query memory usage becomes larger than every next step in number of bytes the memory profiler will collect the allocating stack trace. Zero means disabled memory profiler. Values lower than a few megabytes will slow down query processing.", 0) \
M(Float, memory_profiler_sample_probability, 0., "Collect random allocations and deallocations and write them into system.trace_log with 'MemorySample' trace_type. The probability is for every alloc/free regardless to the size of the allocation. Note that sampling happens only when the amount of untracked memory exceeds 'max_untracked_memory'. You may want to set 'max_untracked_memory' to 0 for extra fine grained sampling.", 0) \
\
M(UInt64, max_network_bandwidth, 0, "The maximum speed of data exchange over the network in bytes per second for a query. Zero means unlimited.", 0) \
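As a worked example of the new memory_profiler_step default (4 * 1024 * 1024 bytes = 4 MiB): a query whose tracked memory grows to 10 MiB collects allocating stack traces roughly when usage crosses the 4 MiB and 8 MiB thresholds, and setting the value to 0 still disables the memory profiler entirely.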

View File

@ -1,51 +0,0 @@
# This file is generated automatically, do not edit. See 'ya.make.in' and use 'utils/generate-ya-make' to regenerate it.
OWNER(g:clickhouse)
LIBRARY()
PEERDIR(
clickhouse/src/Common
contrib/libs/sparsehash
contrib/restricted/boost/libs
)
SRCS(
BackgroundSchedulePool.cpp
BaseSettings.cpp
Block.cpp
BlockInfo.cpp
ColumnWithTypeAndName.cpp
ExternalResultDescription.cpp
ExternalTable.cpp
Field.cpp
MySQL/Authentication.cpp
MySQL/IMySQLReadPacket.cpp
MySQL/IMySQLWritePacket.cpp
MySQL/MySQLClient.cpp
MySQL/MySQLGtid.cpp
MySQL/MySQLReplication.cpp
MySQL/PacketEndpoint.cpp
MySQL/PacketsConnection.cpp
MySQL/PacketsGeneric.cpp
MySQL/PacketsProtocolText.cpp
MySQL/PacketsReplication.cpp
NamesAndTypes.cpp
PostgreSQL/Connection.cpp
PostgreSQL/PoolWithFailover.cpp
PostgreSQL/Utils.cpp
PostgreSQL/insertPostgreSQLValue.cpp
PostgreSQLProtocol.cpp
QueryProcessingStage.cpp
ServerUUID.cpp
Settings.cpp
SettingsEnums.cpp
SettingsFields.cpp
SettingsQuirks.cpp
SortDescription.cpp
UUID.cpp
iostream_debug_helpers.cpp
)
END()

View File

@ -1,16 +0,0 @@
OWNER(g:clickhouse)
LIBRARY()
PEERDIR(
clickhouse/src/Common
contrib/libs/sparsehash
contrib/restricted/boost/libs
)
SRCS(
<? find . -name '*.cpp' | grep -v -F tests | grep -v -F examples | grep -v -F fuzzers | sed 's/^\.\// /' | sort ?>
)
END()

Some files were not shown because too many files have changed in this diff.