Merge branch 'master' into windowview-multi-column-groupby

This commit is contained in:
mergify[bot] 2022-02-26 00:50:49 +00:00 committed by GitHub
commit 8d84d22618
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
96 changed files with 1459 additions and 737 deletions

View File

@ -27,6 +27,8 @@ jobs:
- name: Create Pull Request
uses: peter-evans/create-pull-request@v3
with:
author: "robot-clickhouse <robot-clickhouse@users.noreply.github.com>"
committer: "robot-clickhouse <robot-clickhouse@users.noreply.github.com>"
commit-message: Update version_date.tsv after ${{ env.GITHUB_TAG }}
branch: auto/${{ env.GITHUB_TAG }}
delete-branch: true

View File

@ -3,25 +3,25 @@ compilers and build settings. A correctly configured Docker daemon is the single dependency.
Usage:
Build deb package with `clang-11` in `debug` mode:
Build deb package with `clang-14` in `debug` mode:
```
$ mkdir deb/test_output
$ ./packager --output-dir deb/test_output/ --package-type deb --compiler=clang-11 --build-type=debug
$ ./packager --output-dir deb/test_output/ --package-type deb --compiler=clang-14 --build-type=debug
$ ls -l deb/test_output
-rw-r--r-- 1 root root 3730 clickhouse-client_18.14.2+debug_all.deb
-rw-r--r-- 1 root root 84221888 clickhouse-common-static_18.14.2+debug_amd64.deb
-rw-r--r-- 1 root root 255967314 clickhouse-common-static-dbg_18.14.2+debug_amd64.deb
-rw-r--r-- 1 root root 14940 clickhouse-server_18.14.2+debug_all.deb
-rw-r--r-- 1 root root 340206010 clickhouse-server-base_18.14.2+debug_amd64.deb
-rw-r--r-- 1 root root 7900 clickhouse-server-common_18.14.2+debug_all.deb
-rw-r--r-- 1 root root 3730 clickhouse-client_22.2.2+debug_all.deb
-rw-r--r-- 1 root root 84221888 clickhouse-common-static_22.2.2+debug_amd64.deb
-rw-r--r-- 1 root root 255967314 clickhouse-common-static-dbg_22.2.2+debug_amd64.deb
-rw-r--r-- 1 root root 14940 clickhouse-server_22.2.2+debug_all.deb
-rw-r--r-- 1 root root 340206010 clickhouse-server-base_22.2.2+debug_amd64.deb
-rw-r--r-- 1 root root 7900 clickhouse-server-common_22.2.2+debug_all.deb
```
Build ClickHouse binary with `clang-11` and `address` sanitizer in `relwithdebuginfo`
Build ClickHouse binary with `clang-14` and `address` sanitizer in `relwithdebuginfo`
mode:
```
$ mkdir $HOME/some_clickhouse
$ ./packager --output-dir=$HOME/some_clickhouse --package-type binary --compiler=clang-11 --sanitizer=address
$ ./packager --output-dir=$HOME/some_clickhouse --package-type binary --compiler=clang-14 --sanitizer=address
$ ls -l $HOME/some_clickhouse
-rwxr-xr-x 1 root root 787061952 clickhouse
lrwxrwxrwx 1 root root 10 clickhouse-benchmark -> clickhouse

View File

@ -322,7 +322,7 @@ std::string getName() const override { return "Memory"; }
class StorageMemory : public IStorage
```
**4.** `using` aliases are named the same way as classes, or with `_t` on the end.
**4.** `using` aliases are named the same way as classes.
**5.** Names of template type arguments: in simple cases, use `T`; `T`, `U`; `T1`, `T2`.
@ -490,7 +490,7 @@ if (0 != close(fd))
throwFromErrno("Cannot close file " + file_name, ErrorCodes::CANNOT_CLOSE_FILE);
```
`Do not use assert`.
You can use assert to check invariants in code.
**4.** Exception types.
@ -571,7 +571,7 @@ Don't use these types for numbers: `signed/unsigned long`, `long long`, `short
**13.** Passing arguments.
Pass complex values by reference (including `std::string`).
Pass complex values by value if they are going to be moved, and use `std::move`; pass by reference if you want to update the value in a loop.
If a function captures ownership of an object created in the heap, make the argument type `shared_ptr` or `unique_ptr`.
@ -581,7 +581,7 @@ In most cases, just use `return`. Do not write `return std::move(res)`.
If the function allocates an object on heap and returns it, use `shared_ptr` or `unique_ptr`.
In rare cases you might need to return the value via an argument. In this case, the argument should be a reference.
In rare cases (updating a value in a loop) you might need to return the value via an argument. In this case, the argument should be a reference.
``` cpp
using AggregateFunctionPtr = std::shared_ptr<IAggregateFunction>;

View File

@ -887,6 +887,57 @@ S3 disk can be configured as `main` or `cold` storage:
With the `cold` option, data can be moved to S3 when the free space on the local disk becomes smaller than `move_factor * disk_size`, or by a TTL move rule.
## Using Azure Blob Storage for Data Storage {#table_engine-mergetree-azure-blob-storage}
`MergeTree` family table engines can store data to [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/) using a disk with type `azure_blob_storage`.
As of February 2022, this feature is still a fresh addition, so some Azure Blob Storage functionality may not be implemented yet.
Configuration markup:
``` xml
<storage_configuration>
...
<disks>
<blob_storage_disk>
<type>azure_blob_storage</type>
<storage_account_url>http://account.blob.core.windows.net</storage_account_url>
<container_name>container</container_name>
<account_name>account</account_name>
<account_key>pass123</account_key>
<metadata_path>/var/lib/clickhouse/disks/blob_storage_disk/</metadata_path>
<cache_enabled>true</cache_enabled>
<cache_path>/var/lib/clickhouse/disks/blob_storage_disk/cache/</cache_path>
<skip_access_check>false</skip_access_check>
</blob_storage_disk>
</disks>
...
</storage_configuration>
```
Connection parameters:
* `storage_account_url` - **Required**, Azure Blob Storage account URL, like `http://account.blob.core.windows.net` or `http://azurite1:10000/devstoreaccount1`.
* `container_name` - Target container name, defaults to `default-container`.
* `container_already_exists` - If set to `false`, a new container `container_name` is created in the storage account; if set to `true`, the disk connects to the container directly; if left unset, the disk connects to the account, checks whether the container `container_name` exists, and creates it if it does not yet exist.
Authentication parameters (the disk will try all available methods **and** Managed Identity Credential):
* `connection_string` - For authentication using a connection string.
* `account_name` and `account_key` - For authentication using Shared Key.
Limit parameters (mainly for internal usage):
* `max_single_part_upload_size` - Limits the size of a single block upload to Blob Storage.
* `min_bytes_for_seek` - Limits the size of a seekable region.
* `max_single_read_retries` - Limits the number of attempts to read a chunk of data from Blob Storage.
* `max_single_download_retries` - Limits the number of attempts to download a readable buffer from Blob Storage.
* `thread_pool_size` - Limits the number of threads with which `IDiskRemote` is instantiated.
Other parameters:
* `metadata_path` - Path on the local FS where metadata files for Blob Storage are kept. Default value is `/var/lib/clickhouse/disks/<disk_name>/`.
* `cache_enabled` - Allows caching mark and index files on the local FS. Default value is `true`.
* `cache_path` - Path on the local FS where cached mark and index files are stored. Default value is `/var/lib/clickhouse/disks/<disk_name>/cache/`.
* `skip_access_check` - If true, disk access checks are not performed on disk start-up. Default value is `false`.
Examples of working configurations can be found in the integration tests directory (see e.g. [test_merge_tree_azure_blob_storage](https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_merge_tree_azure_blob_storage/configs/config.d/storage_conf.xml) or [test_azure_blob_storage_zero_copy_replication](https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_azure_blob_storage_zero_copy_replication/configs/config.d/storage_conf.xml)).
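The disk is then referenced from a storage policy in the server configuration, the same way as for S3. As a minimal sketch (assuming a policy named `azure_policy` has been declared over `blob_storage_disk`; the policy and table names below are illustrative, not part of this change), a table can be placed on it like this:
```sql
-- Hypothetical table stored on the azure_blob_storage-backed disk via a storage policy.
CREATE TABLE azure_test
(
    `id` UInt64,
    `value` String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS storage_policy = 'azure_policy';

INSERT INTO azure_test VALUES (1, 'hello');
SELECT * FROM azure_test;
```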
## Virtual Columns {#virtual-columns}
- `_part` — Name of a part.

View File

@ -143,6 +143,10 @@ Features:
- Backup and restore.
- RBAC.
### Zeppelin-Interpreter-for-ClickHouse {#zeppelin-interpreter-for-clickhouse}
[Zeppelin-Interpreter-for-ClickHouse](https://github.com/SiderZhang/Zeppelin-Interpreter-for-ClickHouse) is a [Zeppelin](https://zeppelin.apache.org) interpreter for ClickHouse. Compared with the JDBC interpreter, it provides better timeout control for long-running queries.
## Commercial {#commercial}
### DataGrip {#datagrip}

View File

@ -5,7 +5,7 @@ toc_title: Array(T)
# Array(t) {#data-type-array}
An array of `T`-type items. `T` can be any data type, including an array.
An array of `T`-type items, indexed starting from 1. `T` can be any data type, including an array.
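A quick illustration of the 1-based indexing (a minimal sketch, not taken from the documented examples):
```sql
-- Array elements are addressed starting from index 1.
SELECT [10, 20, 30] AS arr, arr[1] AS first, arr[3] AS last;
-- first = 10, last = 30
```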
## Creating an Array {#creating-an-array}

View File

@ -7,6 +7,8 @@ toc_title: Date
A date. Stored in two bytes as the number of days since 1970-01-01 (unsigned). Allows storing values from just after the beginning of the Unix Epoch to the upper threshold defined by a constant at the compilation stage (currently, this is until the year 2149, but the final fully-supported year is 2148).
Supported range of values: \[1970-01-01, 2149-06-06\].
The date value is stored without the time zone.
**Example**

View File

@ -13,7 +13,7 @@ Syntax:
DateTime([timezone])
```
Supported range of values: \[1970-01-01 00:00:00, 2105-12-31 23:59:59\].
Supported range of values: \[1970-01-01 00:00:00, 2106-02-07 06:28:15\].
Resolution: 1 second.
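The new upper bound matches the maximum of the underlying UInt32 seconds counter (4294967295 seconds after the Unix epoch). A quick check, assuming a UTC time zone:
```sql
-- 4294967295 is the largest value a UInt32 Unix timestamp can hold.
SELECT toDateTime(4294967295, 'UTC') AS max_datetime;
-- Returns 2106-02-07 06:28:15
```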

View File

@ -18,7 +18,7 @@ DateTime64(precision, [timezone])
Internally, stores data as a number of ticks since epoch start (1970-01-01 00:00:00 UTC) as Int64. The tick resolution is determined by the precision parameter. Additionally, the `DateTime64` type can store a time zone that is the same for the entire column, which affects how values of the `DateTime64` type are displayed in text format and how values specified as strings are parsed (2020-01-01 05:00:01.000). The time zone is not stored in the rows of the table (or in the resultset), but is stored in the column metadata. See details in [DateTime](../../sql-reference/data-types/datetime.md).
Supported range from January 1, 1925 till November 11, 2283.
Supported range of values: \[1925-01-01 00:00:00, 2283-11-11 23:59:59.99999999\] (Note: The precision of the maximum value is 8).
## Examples {#examples}

View File

@ -127,7 +127,7 @@ ARRAY JOIN [1, 2, 3] AS arr_external;
└─────────────┴──────────────┘
```
Multiple arrays can be comma-separated in the `ARRAY JOIN` clause. In this case, `JOIN` is performed with them simultaneously (the direct sum, not the cartesian product). Note that all the arrays must have the same size. Example:
Multiple arrays can be comma-separated in the `ARRAY JOIN` clause. In this case, `JOIN` is performed with them simultaneously (the direct sum, not the cartesian product). Note that all the arrays must have the same size by default. Example:
``` sql
SELECT s, arr, a, num, mapped
@ -162,6 +162,25 @@ ARRAY JOIN arr AS a, arrayEnumerate(arr) AS num;
│ World │ [3,4,5] │ 5 │ 3 │ [1,2,3] │
└───────┴─────────┴───┴─────┴─────────────────────┘
```
Multiple arrays with different sizes can be joined by using `SETTINGS enable_unaligned_array_join = 1`. Example:
```sql
SELECT s, arr, a, b
FROM arrays_test ARRAY JOIN arr as a, [['a','b'],['c']] as b
SETTINGS enable_unaligned_array_join = 1;
```
```text
┌─s───────┬─arr─────┬─a─┬─b─────────┐
│ Hello │ [1,2] │ 1 │ ['a','b'] │
│ Hello │ [1,2] │ 2 │ ['c'] │
│ World │ [3,4,5] │ 3 │ ['a','b'] │
│ World │ [3,4,5] │ 4 │ ['c'] │
│ World │ [3,4,5] │ 5 │ [] │
│ Goodbye │ [] │ 0 │ ['a','b'] │
│ Goodbye │ [] │ 0 │ ['c'] │
└─────────┴─────────┴───┴───────────┘
```
## ARRAY JOIN with Nested Data Structure {#array-join-with-nested-data-structure}

View File

@ -7,6 +7,8 @@ toc_title: "\u65E5\u4ED8"
A date. Stored as the number of days since 1970-01-01 in two bytes as an unsigned integer. Allows storing values from just after the start of the Unix epoch up to an upper threshold defined as a constant at the compilation stage (currently up to the year 2106, but the last fully supported year is 2105).
Supported range of values: \[1970-01-01, 2149-06-06\].
The date value is stored without the time zone.
[元の記事](https://clickhouse.com/docs/en/data_types/date/) <!--hide-->

View File

@ -15,7 +15,7 @@ toc_title: DateTime
DateTime([timezone])
```
Supported range of values: \[1970-01-01 00:00:00, 2105-12-31 23:59:59\].
Supported range of values: \[1970-01-01 00:00:00, 2106-02-07 06:28:15\].
Resolution: 1 second.

View File

@ -19,6 +19,8 @@ DateTime64(precision, [timezone])
Internally, stores data as the number of 'ticks' since the start of the epoch (1970-01-01 00:00:00 UTC) as Int64. The tick resolution is determined by the precision parameter. In addition, the `DateTime64` type can store a time zone that is the same for the entire column; it affects how values of the `DateTime64` type are displayed in text format and how values specified as strings are parsed (2020-01-01 05:00:01.000). The time zone is not stored in the rows of the table (or in the result set), but in the column metadata. See [DateTime](datetime.md) for details.
Supported range of values: \[1925-01-01 00:00:00, 2283-11-11 23:59:59.99999999\] (Note: the precision of the maximum value is 8).
## Examples {#examples}
**1.** Creating a table with a `DateTime64`-type column and inserting data into it:

View File

@ -5,7 +5,7 @@ toc_title: Array(T)
# Array(T) {#data-type-array}
An array of elements of type `T`. `T` can be any type, including an array. Multidimensional arrays are thus supported.
An array of elements of type `T`. `T` can be any type, including an array. Multidimensional arrays are thus supported. The first element of the array has index 1.
## Creating an Array {#creating-an-array}

View File

@ -7,6 +7,8 @@ toc_title: Date
A date. Stored in two bytes as the (unsigned) number of days elapsed since 1970-01-01. Allows storing values from just after the start of the Unix epoch up to an upper threshold defined by a constant at the compilation stage (currently, up to the year 2106; the last fully supported year is 2105).
Range of values: \[1970-01-01, 2149-06-06\].
The date is stored without regard to the time zone.
**Example**

View File

@ -13,7 +13,7 @@ toc_title: DateTime
DateTime([timezone])
```
Range of values: \[1970-01-01 00:00:00, 2105-12-31 23:59:59\].
Range of values: \[1970-01-01 00:00:00, 2106-02-07 06:28:15\].
Resolution: 1 second.

View File

@ -18,7 +18,7 @@ DateTime64(precision, [timezone])
Data is stored as the number of 'ticks' elapsed since the start of the epoch (1970-01-01 00:00:00 UTC), as Int64. The tick size is determined by the precision parameter. Additionally, the `DateTime64` type can store a time zone that is the same for the entire column and that affects how values of the `DateTime64` type are displayed in text form and how values specified as strings are parsed (2020-01-01 05:00:01.000). The time zone is not stored in the rows of the table (or in the result set), but in the column metadata. See [DateTime](datetime.md) for details.
Supported values range from January 1, 1925 to November 11, 2283.
Range of values: \[1925-01-01 00:00:00, 2283-11-11 23:59:59.99999999\] (Note: the precision of the maximum value is 8).
## Examples {#examples}

View File

@ -1 +0,0 @@
../../../en/faq/general/who-is-using-clickhouse.md

View File

@ -0,0 +1,19 @@
---
title: Who Is Using ClickHouse?
toc_hidden: true
toc_priority: 9
---
# Who Is Using ClickHouse? {#who-is-using-clickhouse}
As an open-source product, the answer to this question is not that simple. If you want to start using ClickHouse, you do not need to tell anyone; you just grab the source code or pre-compiled packages. There is no contract to sign, and the [Apache 2.0 license](https://github.com/ClickHouse/ClickHouse/blob/master/LICENSE) allows unconstrained software distribution.
Also, the technology stack is often in a gray zone covered by NDAs. Some companies consider the technologies they use a competitive advantage, even if those technologies are open source, and do not allow employees to share any details publicly. Others see PR risks and allow employees to share implementation details only with approval from their PR department.
So how do you tell who is using ClickHouse?
One way is to ask around. If it is not in writing, people are much more willing to share what technologies their companies use, the use cases, the kind of hardware, data volumes, and so on. We talk with users regularly at [ClickHouse meetups](https://www.youtube.com/channel/UChtmrD-dsdpspr42P_PyRAw/playlists) all over the world and have heard stories from more than 1000 companies that use ClickHouse. Unfortunately, that is not reproducible, and we try to treat such stories as if they were told under NDA, to avoid any potential trouble. But you can come to any of our future meetups and talk with other users in person. Meetups are announced in several ways; for example, you can subscribe to [our Twitter](http://twitter.com/ClickHouseDB/).
The second way is to look for companies that **publicly state** that they use ClickHouse, as there is usually hard evidence such as a blog post, a recorded talk, a slide deck, and so on. We collect links to such evidence on our [**Adopters**](../../introduction/adopters.md) page. Feel free to contribute the story of your employer, or just some links you stumbled upon (but try not to violate an NDA in the process).
You can find some very large companies in the adopters list, like Bloomberg, Cisco, China Telecom, Tencent, or Uber, but with the first approach we found that there are many more. For example, if you look at the [list of the largest IT companies published by Forbes (2020)](https://www.forbes.com/sites/hanktucker/2020/05/13/worlds-largest-technology-companies-2020-apple-stays-on-top-zoom-and-uber-debut/), more than half of them use ClickHouse in some way. It would also be unfair not to mention [Yandex](../../introduction/history.md), the company that originally open-sourced ClickHouse in 2016 and happens to be one of the largest IT companies in Europe.

View File

@ -1 +0,0 @@
../../../en/getting-started/example-datasets/github-events.md

View File

@ -0,0 +1,10 @@
---
toc_priority: 11
toc_title: GitHub Events Dataset
---
# GitHub Events Dataset
The dataset contains all events on GitHub from 2011 until December 6, 2020, amounting to 3.1 billion records. The download size is 75 GB, and it requires up to 200 GB of disk space if stored in a table with lz4 compression.
For the full dataset description, insights, download instructions, and interactive queries, see [here](https://ghe.clickhouse.tech/).
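As an illustration only (assuming the dataset has been loaded into a table named `github_events` with the schema published on the linked site; both are assumptions, not part of this page), a simple query might look like this:
```sql
-- Count new stars (WatchEvent) per year; table and column names follow the linked site's schema.
SELECT toYear(created_at) AS year, count() AS stars
FROM github_events
WHERE event_type = 'WatchEvent'
GROUP BY year
ORDER BY year;
```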

View File

@ -1,30 +1,50 @@
---
machine_translated: true
machine_translated_rev: 5decc73b5dc60054f19087d3690c4eb99446a6c3
---
# system.mutations {#system_tables-mutations}
# System. Mutations {#system_tables-mutations}
The table contains information about [mutations](../../sql-reference/statements/alter.md#alter-mutations) of MergeTree tables and their progress. Each mutation command is represented by a single row.
The table contains the following information about [mutations](../../sql-reference/statements/alter.md#alter-mutations) of MergeTree tables and their progress. Each mutation command is represented by one row. The table has the following columns:
The table has the following columns:
**database**, **table** - The name of the database and table to which the mutation was applied.
- `database` ([String](../../sql-reference/data-types/string.md)) — Name of the database to which the mutation was applied.
**mutation_id** - The ID of the mutation. For replicated tables, these IDs correspond to znode names in the `<table_path_in_zookeeper>/mutations/` directory in ZooKeeper. For non-replicated tables, the IDs correspond to file names in the data directory of the table.
- `table` ([String](../../sql-reference/data-types/string.md)) — Name of the table to which the mutation was applied.
**command** - The mutation command string (the part of the query after `ALTER TABLE [db.]table`).
- `mutation_id` ([String](../../sql-reference/data-types/string.md)) — ID of the mutation. For replicated tables, these IDs correspond to znode names in the `<table_path_in_zookeeper>/mutations/` directory in ZooKeeper. For non-replicated tables, the IDs correspond to file names in the data directory of the table.
**create_time** - When this mutation command was submitted for execution.
- `command` ([String](../../sql-reference/data-types/string.md)) — The mutation command string (the part of the query after the `ALTER TABLE [db.]table` statement).
**block_numbers.partition_id**, **block_numbers.number** - Nested column. For mutations of replicated tables, it contains one record for each partition: the partition ID and the block number acquired by the mutation (in each partition, only parts that contain blocks with numbers less than the block number acquired by the mutation in that partition will be mutated). In non-replicated tables, block numbers in all partitions form a single sequence. This means that for mutations of non-replicated tables, the column will contain one record with the single block number acquired by the mutation.
- `create_time` ([Datetime](../../sql-reference/data-types/datetime.md)) — Date and time when the mutation command was submitted for execution.
**parts_to_do** - The number of data parts that need to be mutated for the mutation to finish.
- `block_numbers.partition_id` ([Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md))) — For mutations of replicated tables, the array contains the IDs of the partitions (one record per partition). For mutations of non-replicated tables, the array is empty.
**is_done** - Is the mutation done? Note that even if `parts_to_do = 0`, a mutation of a replicated table may not be done yet, because a long-running INSERT can create a new data part that needs to be mutated.
- `block_numbers.number` ([Array](../../sql-reference/data-types/array.md)([Int64](../../sql-reference/data-types/int-uint.md))) — For mutations of replicated tables, the array contains one record for each partition with the block number acquired by the mutation. Only parts that contain blocks with numbers less than this number will have the mutation applied in that partition.
In non-replicated tables, block numbers in all partitions form a single sequence. This means that for mutations of non-replicated tables, the column will contain one record with the single block number acquired by the mutation.
- `parts_to_do_names` ([Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md))) — An array of names of parts that need to be mutated for the mutation to complete.
If there were problems with mutating some parts, the following columns contain additional information:
- `parts_to_do` ([Int64](../../sql-reference/data-types/int-uint.md)) — The number of parts that need to be mutated for the mutation to complete.
**latest_failed_part** - The name of the most recent part that could not be mutated.
- `is_done` ([UInt8](../../sql-reference/data-types/int-uint.md)) — Flag indicating whether the mutation is done. Possible values:
- 1 — the mutation is completed.
- 0 — the mutation is still in progress.
**latest_fail_time** - The time of the most recent part mutation failure.
**latest_fail_reason** - The exception message that caused the most recent part mutation failure.
!!! info "Note"
    Even if `parts_to_do = 0`, a mutation of a replicated table may not be completed yet, because a long-running `INSERT` query can create a new part that needs to be mutated.
If there were problems mutating some parts, the following columns contain additional information:
- `latest_failed_part` ([String](../../sql-reference/data-types/string.md)) — The name of the most recent part that could not be mutated.
- `latest_fail_time` ([Datetime](../../sql-reference/data-types/datetime.md)) — The time of the most recent part mutation failure.
- `latest_fail_reason` ([String](../../sql-reference/data-types/string.md)) — The exception message that caused the most recent part mutation failure.
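For example, unfinished mutations and their latest failure reasons can be inspected with a query like the following (a minimal sketch using the columns described above):
```sql
SELECT database, table, mutation_id, command, parts_to_do, latest_fail_reason
FROM system.mutations
WHERE is_done = 0;
```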
**See Also**
- Mutations
- [MergeTree](../../engines/table-engines/mergetree-family/mergetree.md) table engine
- [ReplicatedMergeTree](../../engines/table-engines/mergetree-family/replication.md) family
[Original article](https://clickhouse.com/docs/en/operations/system_tables/mutations) <!--hide-->

View File

@ -2,4 +2,6 @@
A date type, stored in two bytes as the (unsigned) number of days since 1970-01-01. Allows storing values from the start of the Unix epoch up to an upper-threshold constant defined at the compilation stage (currently the upper limit is the year 2106, but the last fully supported year is 2105). The minimum value is output as 1970-01-01.
Range of values: \[1970-01-01, 2149-06-06\].
No time zone information is stored with the date.

View File

@ -2,6 +2,8 @@
A timestamp type. Stores a Unix timestamp in four (unsigned) bytes. Allows storing values in the same range as the Date type. The minimum value is 1970-01-01 00:00:00. Timestamp values are precise to the second (without leap seconds).
Range of values: \[1970-01-01 00:00:00, 2106-02-07 06:28:15\].
## Time Zones {#shi-qu}
The system time zone in effect when the client or server starts is used to convert timestamps from text (decomposed into components) to binary and back. In text format, information about daylight saving time is lost.

View File

@ -19,6 +19,8 @@ DateTime64(precision, [timezone])
Internally, this type stores data as the number of time ticks since the start of the Linux epoch (1970-01-01 00:00:00 UTC), as Int64. The tick resolution is determined by the precision parameter. Additionally, the `DateTime64` type can store time zone information for the whole column, just like other data columns; the time zone affects how `DateTime64` values are displayed in text format and how times specified as strings are parsed (2020-01-01 05:00:01.000). The time zone is not stored in the rows of the table or in the result set, but in the column metadata. See the [DateTime](datetime.md) data type for details.
Range of values: \[1925-01-01 00:00:00, 2283-11-11 23:59:59.99999999\] (Note: the precision of the maximum value is 8).
## Examples {#examples}
**1.** Creating a table with a `DateTime64`-type column and inserting data into it:

View File

@ -486,12 +486,19 @@ void Client::connect()
UInt64 server_version_minor = 0;
UInt64 server_version_patch = 0;
if (hosts_and_ports.empty())
{
String host = config().getString("host", "localhost");
UInt16 port = static_cast<UInt16>(ConnectionParameters::getPortFromConfig(config()));
hosts_and_ports.emplace_back(HostAndPort{host, port});
}
for (size_t attempted_address_index = 0; attempted_address_index < hosts_and_ports.size(); ++attempted_address_index)
{
try
{
connection_parameters
= ConnectionParameters(config(), hosts_and_ports[attempted_address_index].host, hosts_and_ports[attempted_address_index].port);
connection_parameters = ConnectionParameters(
config(), hosts_and_ports[attempted_address_index].host, hosts_and_ports[attempted_address_index].port);
if (is_interactive)
std::cout << "Connecting to "
@ -1085,22 +1092,15 @@ void Client::processOptions(const OptionsDescription & options_description,
}
}
if (hosts_and_ports_arguments.empty())
for (const auto & hosts_and_ports_argument : hosts_and_ports_arguments)
{
hosts_and_ports.emplace_back(HostAndPort{"localhost", DBMS_DEFAULT_PORT});
}
else
{
for (const auto & hosts_and_ports_argument : hosts_and_ports_arguments)
{
/// Parse commandline options related to external tables.
po::parsed_options parsed_hosts_and_ports
= po::command_line_parser(hosts_and_ports_argument).options(options_description.hosts_and_ports_description.value()).run();
po::variables_map host_and_port_options;
po::store(parsed_hosts_and_ports, host_and_port_options);
hosts_and_ports.emplace_back(
HostAndPort{host_and_port_options["host"].as<std::string>(), host_and_port_options["port"].as<UInt16>()});
}
/// Parse commandline options related to external tables.
po::parsed_options parsed_hosts_and_ports
= po::command_line_parser(hosts_and_ports_argument).options(options_description.hosts_and_ports_description.value()).run();
po::variables_map host_and_port_options;
po::store(parsed_hosts_and_ports, host_and_port_options);
hosts_and_ports.emplace_back(
HostAndPort{host_and_port_options["host"].as<std::string>(), host_and_port_options["port"].as<UInt16>()});
}
send_external_tables = true;

View File

@ -149,8 +149,6 @@ void ODBCSource::insertValue(
DateTime64 time = 0;
const auto * datetime_type = assert_cast<const DataTypeDateTime64 *>(data_type.get());
readDateTime64Text(time, datetime_type->getScale(), in, datetime_type->getTimeZone());
if (time < 0)
time = 0;
assert_cast<DataTypeDateTime64::ColumnType &>(column).insertValue(time);
break;
}

View File

@ -238,28 +238,39 @@ void ProgressIndication::writeProgress()
/// at right after progress bar or at left on top of the progress bar.
if (width_of_progress_bar <= 1 + 2 * static_cast<int64_t>(profiling_msg.size()))
profiling_msg.clear();
else
width_of_progress_bar -= profiling_msg.size();
if (width_of_progress_bar > 0)
{
double bar_width = UnicodeBar::getWidth(current_count, 0, max_count, width_of_progress_bar);
std::string bar = UnicodeBar::render(bar_width);
size_t bar_width_in_terminal = bar.size() / UNICODE_BAR_CHAR_SIZE;
/// Render profiling_msg at left on top of the progress bar.
bool render_profiling_msg_at_left = current_count * 2 >= max_count;
if (!profiling_msg.empty() && render_profiling_msg_at_left)
message << "\033[30;42m" << profiling_msg << "\033[0m";
if (profiling_msg.empty())
{
message << "\033[0;32m" << bar << "\033[0m"
<< std::string(width_of_progress_bar - bar_width_in_terminal, ' ');
}
else
{
bool render_profiling_msg_at_left = current_count * 2 >= max_count;
message << "\033[0;32m" << bar << "\033[0m";
if (render_profiling_msg_at_left)
{
/// Render profiling_msg at left on top of the progress bar.
/// Whitespaces after the progress bar.
if (width_of_progress_bar > static_cast<int64_t>(bar.size() / UNICODE_BAR_CHAR_SIZE))
message << std::string(width_of_progress_bar - bar.size() / UNICODE_BAR_CHAR_SIZE, ' ');
message << "\033[30;42m" << profiling_msg << "\033[0m"
<< "\033[0;32m" << bar.substr(profiling_msg.size() * UNICODE_BAR_CHAR_SIZE) << "\033[0m"
<< std::string(width_of_progress_bar - bar_width_in_terminal, ' ');
}
else
{
/// Render profiling_msg at right after the progress bar.
/// Render profiling_msg at right after the progress bar.
if (!profiling_msg.empty() && !render_profiling_msg_at_left)
message << "\033[2m" << profiling_msg << "\033[0m";
message << "\033[0;32m" << bar << "\033[0m"
<< std::string(width_of_progress_bar - bar_width_in_terminal - profiling_msg.size(), ' ')
<< "\033[2m" << profiling_msg << "\033[0m";
}
}
}
}
}

View File

@ -51,7 +51,7 @@ public:
/// Seek is lazy. It doesn't move the position anywhere, just remember them and perform actual
/// seek inside nextImpl.
void seek(size_t offset_in_compressed_file, size_t offset_in_decompressed_block);
void seek(size_t offset_in_compressed_file, size_t offset_in_decompressed_block) override;
void setProfileCallback(const ReadBufferFromFileBase::ProfileCallback & profile_callback_, clockid_t clock_type_ = CLOCK_MONOTONIC_COARSE)
{

View File

@ -48,8 +48,8 @@ protected:
public:
/// 'compressed_in' could be initialized lazily, but before first call of 'readCompressedData'.
CompressedReadBufferBase(ReadBuffer * in = nullptr, bool allow_different_codecs_ = false);
~CompressedReadBufferBase();
explicit CompressedReadBufferBase(ReadBuffer * in = nullptr, bool allow_different_codecs_ = false);
virtual ~CompressedReadBufferBase();
/** Disable checksums.
* For example, may be used when
@ -60,7 +60,9 @@ public:
disable_checksum = true;
}
public:
/// Some compressed read buffer can do useful seek operation
virtual void seek(size_t /* offset_in_compressed_file */, size_t /* offset_in_decompressed_block */) {}
CompressionCodecPtr codec;
};

View File

@ -51,7 +51,7 @@ public:
/// Seek is lazy in some sense. We move position in compressed file_in to offset_in_compressed_file, but don't
/// read data into working_buffer and don't shift our position to offset_in_decompressed_block. Instead
/// we store this offset inside nextimpl_working_buffer_offset.
void seek(size_t offset_in_compressed_file, size_t offset_in_decompressed_block);
void seek(size_t offset_in_compressed_file, size_t offset_in_decompressed_block) override;
size_t readBig(char * to, size_t n) override;

View File

@ -293,6 +293,8 @@ Changelog::Changelog(
if (existing_changelogs.empty())
LOG_WARNING(log, "No logs exists in {}. It's Ok if it's the first run of clickhouse-keeper.", changelogs_dir);
clean_log_thread = ThreadFromGlobalPool([this] { cleanLogThread(); });
}
void Changelog::readChangelogAndInitWriter(uint64_t last_commited_log_index, uint64_t logs_to_keep)
@ -581,7 +583,17 @@ void Changelog::compact(uint64_t up_to_log_index)
}
LOG_INFO(log, "Removing changelog {} because of compaction", itr->second.path);
std::filesystem::remove(itr->second.path);
/// If pushing to the queue for background removal fails, remove the file right away
if (!log_files_to_delete_queue.tryPush(itr->second.path, 1))
{
std::error_code ec;
std::filesystem::remove(itr->second.path, ec);
if (ec)
LOG_WARNING(log, "Failed to remove changelog {} in compaction, error message: {}", itr->second.path, ec.message());
else
LOG_INFO(log, "Removed changelog {} because of compaction", itr->second.path);
}
itr = existing_changelogs.erase(itr);
}
else /// Files are ordered, so all subsequent should exist
@ -705,6 +717,9 @@ Changelog::~Changelog()
try
{
flush();
log_files_to_delete_queue.finish();
if (clean_log_thread.joinable())
clean_log_thread.join();
}
catch (...)
{
@ -712,4 +727,20 @@ Changelog::~Changelog()
}
}
void Changelog::cleanLogThread()
{
while (!log_files_to_delete_queue.isFinishedAndEmpty())
{
std::string path;
if (log_files_to_delete_queue.tryPop(path))
{
std::error_code ec;
if (std::filesystem::remove(path, ec))
LOG_INFO(log, "Removed changelog {} because of compaction.", path);
else
LOG_WARNING(log, "Failed to remove changelog {} in compaction, error message: {}", path, ec.message());
}
}
}
}

View File

@ -7,6 +7,7 @@
#include <IO/HashingWriteBuffer.h>
#include <IO/CompressionMethod.h>
#include <Disks/IDisk.h>
#include <Common/ConcurrentBoundedQueue.h>
namespace DB
{
@ -142,6 +143,9 @@ private:
/// Init writer for existing log with some entries already written
void initWriter(const ChangelogFileDescription & description);
/// Clean useless log files in a background thread
void cleanLogThread();
private:
const std::string changelogs_dir;
const uint64_t rotate_interval;
@ -160,6 +164,10 @@ private:
/// min_log_id + 1 == max_log_id means empty log storage for NuRaft
uint64_t min_log_id = 0;
uint64_t max_log_id = 0;
/// For compaction: queue of unused log files to be deleted
/// 128 is enough; even if a log file is not removed right away, it is not a problem
ConcurrentBoundedQueue<std::string> log_files_to_delete_queue{128};
ThreadFromGlobalPool clean_log_thread;
};
}

View File

@ -240,6 +240,8 @@ bool KeeperDispatcher::putRequest(const Coordination::ZooKeeperRequestPtr & requ
KeeperStorage::RequestForSession request_info;
request_info.request = request;
using namespace std::chrono;
request_info.time = duration_cast<milliseconds>(system_clock::now().time_since_epoch()).count();
request_info.session_id = session_id;
std::lock_guard lock(push_request_mutex);
@ -400,6 +402,8 @@ void KeeperDispatcher::sessionCleanerTask()
request->xid = Coordination::CLOSE_XID;
KeeperStorage::RequestForSession request_info;
request_info.request = request;
using namespace std::chrono;
request_info.time = duration_cast<milliseconds>(system_clock::now().time_since_epoch()).count();
request_info.session_id = dead_session;
{
std::lock_guard lock(push_request_mutex);
@ -433,7 +437,7 @@ void KeeperDispatcher::finishSession(int64_t session_id)
void KeeperDispatcher::addErrorResponses(const KeeperStorage::RequestsForSessions & requests_for_sessions, Coordination::Error error)
{
for (const auto & [session_id, request] : requests_for_sessions)
for (const auto & [session_id, time, request] : requests_for_sessions)
{
KeeperStorage::ResponsesForSessions responses;
auto response = request->makeResponse();
@ -477,6 +481,8 @@ int64_t KeeperDispatcher::getSessionID(int64_t session_timeout_ms)
request->server_id = server->getServerID();
request_info.request = request;
using namespace std::chrono;
request_info.time = duration_cast<milliseconds>(system_clock::now().time_since_epoch()).count();
request_info.session_id = -1;
auto promise = std::make_shared<std::promise<int64_t>>();

View File

@ -260,11 +260,12 @@ void KeeperServer::shutdown()
namespace
{
nuraft::ptr<nuraft::buffer> getZooKeeperLogEntry(int64_t session_id, const Coordination::ZooKeeperRequestPtr & request)
nuraft::ptr<nuraft::buffer> getZooKeeperLogEntry(int64_t session_id, int64_t time, const Coordination::ZooKeeperRequestPtr & request)
{
DB::WriteBufferFromNuraftBuffer buf;
DB::writeIntBinary(session_id, buf);
request->write(buf);
DB::writeIntBinary(time, buf);
return buf.getBuffer();
}
@ -283,8 +284,8 @@ RaftAppendResult KeeperServer::putRequestBatch(const KeeperStorage::RequestsForS
{
std::vector<nuraft::ptr<nuraft::buffer>> entries;
for (const auto & [session_id, request] : requests_for_sessions)
entries.push_back(getZooKeeperLogEntry(session_id, request));
for (const auto & [session_id, time, request] : requests_for_sessions)
entries.push_back(getZooKeeperLogEntry(session_id, time, request));
return raft_instance->append_entries(entries);
}

View File

@ -337,8 +337,9 @@ KeeperStorageSnapshot::KeeperStorageSnapshot(KeeperStorage * storage_, uint64_t
, session_id(storage->session_id_counter)
, cluster_config(cluster_config_)
{
snapshot_container_size = storage->container.snapshotSize();
storage->enableSnapshotMode(snapshot_container_size);
auto [size, ver] = storage->container.snapshotSizeWithVersion();
snapshot_container_size = size;
storage->enableSnapshotMode(ver);
begin = storage->getSnapshotIteratorBegin();
session_and_timeout = storage->getActiveSessions();
acl_map = storage->acl_map.getMapping();
@ -351,8 +352,9 @@ KeeperStorageSnapshot::KeeperStorageSnapshot(KeeperStorage * storage_, const Sna
, session_id(storage->session_id_counter)
, cluster_config(cluster_config_)
{
snapshot_container_size = storage->container.snapshotSize();
storage->enableSnapshotMode(snapshot_container_size);
auto [size, ver] = storage->container.snapshotSizeWithVersion();
snapshot_container_size = size;
storage->enableSnapshotMode(ver);
begin = storage->getSnapshotIteratorBegin();
session_and_timeout = storage->getActiveSessions();
acl_map = storage->acl_map.getMapping();

View File

@ -38,6 +38,8 @@ namespace
request_for_session.request = Coordination::ZooKeeperRequestFactory::instance().get(opnum);
request_for_session.request->xid = xid;
request_for_session.request->readImpl(buffer);
readIntBinary(request_for_session.time, buffer);
return request_for_session;
}
}
@ -133,7 +135,7 @@ nuraft::ptr<nuraft::buffer> KeeperStateMachine::commit(const uint64_t log_idx, n
else
{
std::lock_guard lock(storage_and_responses_lock);
KeeperStorage::ResponsesForSessions responses_for_sessions = storage->processRequest(request_for_session.request, request_for_session.session_id, log_idx);
KeeperStorage::ResponsesForSessions responses_for_sessions = storage->processRequest(request_for_session.request, request_for_session.session_id, request_for_session.time, log_idx);
for (auto & response_for_session : responses_for_sessions)
if (!responses_queue.push(response_for_session))
throw Exception(ErrorCodes::SYSTEM_ERROR, "Could not push response with session id {} into responses queue", response_for_session.session_id);
@ -358,7 +360,7 @@ void KeeperStateMachine::processReadRequest(const KeeperStorage::RequestForSessi
{
/// Pure local request, just process it with storage
std::lock_guard lock(storage_and_responses_lock);
auto responses = storage->processRequest(request_for_session.request, request_for_session.session_id, std::nullopt);
auto responses = storage->processRequest(request_for_session.request, request_for_session.session_id, request_for_session.time, std::nullopt);
for (const auto & response : responses)
if (!responses_queue.push(response))
throw Exception(ErrorCodes::SYSTEM_ERROR, "Could not push response with session id {} into responses queue", response.session_id);

View File

@ -191,7 +191,7 @@ struct KeeperStorageRequestProcessor
explicit KeeperStorageRequestProcessor(const Coordination::ZooKeeperRequestPtr & zk_request_)
: zk_request(zk_request_)
{}
virtual std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t zxid, int64_t session_id) const = 0;
virtual std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t zxid, int64_t session_id, int64_t time) const = 0;
virtual KeeperStorage::ResponsesForSessions processWatches(KeeperStorage::Watches & /*watches*/, KeeperStorage::Watches & /*list_watches*/) const { return {}; }
virtual bool checkAuth(KeeperStorage & /*storage*/, int64_t /*session_id*/) const { return true; }
@ -201,7 +201,7 @@ struct KeeperStorageRequestProcessor
struct KeeperStorageHeartbeatRequestProcessor final : public KeeperStorageRequestProcessor
{
using KeeperStorageRequestProcessor::KeeperStorageRequestProcessor;
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & /* storage */, int64_t /* zxid */, int64_t /* session_id */) const override
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & /* storage */, int64_t /* zxid */, int64_t /* session_id */, int64_t /* time */) const override
{
return {zk_request->makeResponse(), {}};
}
@ -210,7 +210,7 @@ struct KeeperStorageHeartbeatRequestProcessor final : public KeeperStorageReques
struct KeeperStorageSyncRequestProcessor final : public KeeperStorageRequestProcessor
{
using KeeperStorageRequestProcessor::KeeperStorageRequestProcessor;
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & /* storage */, int64_t /* zxid */, int64_t /* session_id */) const override
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & /* storage */, int64_t /* zxid */, int64_t /* session_id */, int64_t /* time */) const override
{
auto response = zk_request->makeResponse();
dynamic_cast<Coordination::ZooKeeperSyncResponse &>(*response).path
@ -246,7 +246,7 @@ struct KeeperStorageCreateRequestProcessor final : public KeeperStorageRequestPr
return checkACL(Coordination::ACL::Create, node_acls, session_auths);
}
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t zxid, int64_t session_id) const override
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t zxid, int64_t session_id, int64_t time) const override
{
auto & container = storage.container;
auto & ephemerals = storage.ephemerals;
@ -309,8 +309,8 @@ struct KeeperStorageCreateRequestProcessor final : public KeeperStorageRequestPr
created_node.stat.czxid = zxid;
created_node.stat.mzxid = zxid;
created_node.stat.pzxid = zxid;
created_node.stat.ctime = std::chrono::system_clock::now().time_since_epoch() / std::chrono::milliseconds(1);
created_node.stat.mtime = created_node.stat.ctime;
created_node.stat.ctime = time;
created_node.stat.mtime = time;
created_node.stat.numChildren = 0;
created_node.stat.dataLength = request.data.length();
created_node.stat.ephemeralOwner = request.is_ephemeral ? session_id : 0;
@ -394,7 +394,7 @@ struct KeeperStorageGetRequestProcessor final : public KeeperStorageRequestProce
}
using KeeperStorageRequestProcessor::KeeperStorageRequestProcessor;
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t /* zxid */, int64_t /* session_id */) const override
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t /* zxid */, int64_t /* session_id */, int64_t /* time */) const override
{
auto & container = storage.container;
Coordination::ZooKeeperResponsePtr response_ptr = zk_request->makeResponse();
@ -453,7 +453,7 @@ struct KeeperStorageRemoveRequestProcessor final : public KeeperStorageRequestPr
}
using KeeperStorageRequestProcessor::KeeperStorageRequestProcessor;
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t zxid, int64_t /*session_id*/) const override
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t zxid, int64_t /*session_id*/, int64_t /* time */) const override
{
auto & container = storage.container;
auto & ephemerals = storage.ephemerals;
@ -538,7 +538,7 @@ struct KeeperStorageRemoveRequestProcessor final : public KeeperStorageRequestPr
struct KeeperStorageExistsRequestProcessor final : public KeeperStorageRequestProcessor
{
using KeeperStorageRequestProcessor::KeeperStorageRequestProcessor;
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t /*zxid*/, int64_t /* session_id */) const override
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t /*zxid*/, int64_t /* session_id */, int64_t /* time */) const override
{
auto & container = storage.container;
@ -579,7 +579,7 @@ struct KeeperStorageSetRequestProcessor final : public KeeperStorageRequestProce
}
using KeeperStorageRequestProcessor::KeeperStorageRequestProcessor;
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t zxid, int64_t /* session_id */) const override
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t zxid, int64_t /* session_id */, int64_t time) const override
{
auto & container = storage.container;
@ -598,11 +598,11 @@ struct KeeperStorageSetRequestProcessor final : public KeeperStorageRequestProce
auto prev_node = it->value;
auto itr = container.updateValue(request.path, [zxid, request] (KeeperStorage::Node & value)
auto itr = container.updateValue(request.path, [zxid, request, time] (KeeperStorage::Node & value)
{
value.stat.version++;
value.stat.mzxid = zxid;
value.stat.mtime = std::chrono::system_clock::now().time_since_epoch() / std::chrono::milliseconds(1);
value.stat.mtime = time;
value.stat.dataLength = request.data.length();
value.size_bytes = value.size_bytes + request.data.size() - value.data.size();
value.data = request.data;
@ -657,7 +657,7 @@ struct KeeperStorageListRequestProcessor final : public KeeperStorageRequestProc
}
using KeeperStorageRequestProcessor::KeeperStorageRequestProcessor;
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t /*zxid*/, int64_t /*session_id*/) const override
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t /*zxid*/, int64_t /*session_id*/, int64_t /* time */) const override
{
auto & container = storage.container;
Coordination::ZooKeeperResponsePtr response_ptr = zk_request->makeResponse();
@ -706,7 +706,7 @@ struct KeeperStorageCheckRequestProcessor final : public KeeperStorageRequestPro
}
using KeeperStorageRequestProcessor::KeeperStorageRequestProcessor;
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t /*zxid*/, int64_t /*session_id*/) const override
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t /*zxid*/, int64_t /*session_id*/, int64_t /* time */) const override
{
auto & container = storage.container;
@ -751,7 +751,7 @@ struct KeeperStorageSetACLRequestProcessor final : public KeeperStorageRequestPr
using KeeperStorageRequestProcessor::KeeperStorageRequestProcessor;
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t /*zxid*/, int64_t session_id) const override
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t /*zxid*/, int64_t session_id, int64_t /* time */) const override
{
auto & container = storage.container;
@ -815,7 +815,7 @@ struct KeeperStorageGetACLRequestProcessor final : public KeeperStorageRequestPr
}
using KeeperStorageRequestProcessor::KeeperStorageRequestProcessor;
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t /*zxid*/, int64_t /*session_id*/) const override
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t /*zxid*/, int64_t /*session_id*/, int64_t /* time */) const override
{
Coordination::ZooKeeperResponsePtr response_ptr = zk_request->makeResponse();
Coordination::ZooKeeperGetACLResponse & response = dynamic_cast<Coordination::ZooKeeperGetACLResponse &>(*response_ptr);
@ -877,7 +877,7 @@ struct KeeperStorageMultiRequestProcessor final : public KeeperStorageRequestPro
}
}
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t zxid, int64_t session_id) const override
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t zxid, int64_t session_id, int64_t time) const override
{
Coordination::ZooKeeperResponsePtr response_ptr = zk_request->makeResponse();
Coordination::ZooKeeperMultiResponse & response = dynamic_cast<Coordination::ZooKeeperMultiResponse &>(*response_ptr);
@ -888,7 +888,7 @@ struct KeeperStorageMultiRequestProcessor final : public KeeperStorageRequestPro
size_t i = 0;
for (const auto & concrete_request : concrete_requests)
{
auto [ cur_response, undo_action ] = concrete_request->process(storage, zxid, session_id);
auto [ cur_response, undo_action ] = concrete_request->process(storage, zxid, session_id, time);
response.responses[i] = cur_response;
if (cur_response->error != Coordination::Error::ZOK)
@ -945,7 +945,7 @@ struct KeeperStorageMultiRequestProcessor final : public KeeperStorageRequestPro
struct KeeperStorageCloseRequestProcessor final : public KeeperStorageRequestProcessor
{
using KeeperStorageRequestProcessor::KeeperStorageRequestProcessor;
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage &, int64_t, int64_t) const override
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage &, int64_t, int64_t, int64_t /* time */) const override
{
throw DB::Exception("Called process on close request", ErrorCodes::LOGICAL_ERROR);
}
@ -954,7 +954,7 @@ struct KeeperStorageCloseRequestProcessor final : public KeeperStorageRequestPro
struct KeeperStorageAuthRequestProcessor final : public KeeperStorageRequestProcessor
{
using KeeperStorageRequestProcessor::KeeperStorageRequestProcessor;
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t /*zxid*/, int64_t session_id) const override
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(KeeperStorage & storage, int64_t /*zxid*/, int64_t session_id, int64_t /* time */) const override
{
Coordination::ZooKeeperAuthRequest & auth_request = dynamic_cast<Coordination::ZooKeeperAuthRequest &>(*zk_request);
Coordination::ZooKeeperResponsePtr response_ptr = zk_request->makeResponse();
@ -1067,7 +1067,7 @@ KeeperStorageRequestProcessorsFactory::KeeperStorageRequestProcessorsFactory()
}
KeeperStorage::ResponsesForSessions KeeperStorage::processRequest(const Coordination::ZooKeeperRequestPtr & zk_request, int64_t session_id, std::optional<int64_t> new_last_zxid, bool check_acl)
KeeperStorage::ResponsesForSessions KeeperStorage::processRequest(const Coordination::ZooKeeperRequestPtr & zk_request, int64_t session_id, int64_t time, std::optional<int64_t> new_last_zxid, bool check_acl)
{
KeeperStorage::ResponsesForSessions results;
if (new_last_zxid)
@ -1119,7 +1119,7 @@ KeeperStorage::ResponsesForSessions KeeperStorage::processRequest(const Coordina
else if (zk_request->getOpNum() == Coordination::OpNum::Heartbeat) /// Heartbeat request is also special
{
KeeperStorageRequestProcessorPtr storage_request = KeeperStorageRequestProcessorsFactory::instance().get(zk_request);
auto [response, _] = storage_request->process(*this, zxid, session_id);
auto [response, _] = storage_request->process(*this, zxid, session_id, time);
response->xid = zk_request->xid;
response->zxid = getZXID();
@ -1138,7 +1138,7 @@ KeeperStorage::ResponsesForSessions KeeperStorage::processRequest(const Coordina
}
else
{
std::tie(response, std::ignore) = request_processor->process(*this, zxid, session_id);
std::tie(response, std::ignore) = request_processor->process(*this, zxid, session_id, time);
}
/// Watches for these requests are added to the watches lists

View File

@ -66,6 +66,7 @@ public:
struct RequestForSession
{
int64_t session_id;
int64_t time;
Coordination::ZooKeeperRequestPtr request;
};
@ -153,16 +154,17 @@ public:
/// Process user request and return response.
/// check_acl = false only when converting data from ZooKeeper.
ResponsesForSessions processRequest(const Coordination::ZooKeeperRequestPtr & request, int64_t session_id, std::optional<int64_t> new_last_zxid, bool check_acl = true);
ResponsesForSessions processRequest(const Coordination::ZooKeeperRequestPtr & request, int64_t session_id, int64_t time, std::optional<int64_t> new_last_zxid, bool check_acl = true);
void finalize();
/// Set of methods for creating snapshots
/// Turn on snapshot mode, so data inside Container is not deleted, but replaced with new version.
void enableSnapshotMode(size_t up_to_size)
void enableSnapshotMode(size_t up_to_version)
{
container.enableSnapshotMode(up_to_size);
container.enableSnapshotMode(up_to_version);
}
/// Turn off snapshot mode.

View File

@ -17,6 +17,8 @@ struct ListNode
StringRef key;
V value;
/// Monotonically increasing version info for snapshot
size_t version{0};
bool active_in_map{true};
bool free_key{false};
};
@ -35,7 +37,8 @@ private:
IndexMap map;
bool snapshot_mode{false};
/// Allows to avoid additional copies in updateValue function
size_t snapshot_up_to_size = 0;
size_t current_version{0};
size_t snapshot_up_to_version{0};
ArenaWithFreeLists arena;
/// Collect invalid iterators to avoid traversing the whole list
std::vector<Mapped> snapshot_invalid_iters;
@ -129,8 +132,9 @@ public:
if (!it)
{
ListElem elem{copyStringInArena(arena, key), value, true};
auto itr = list.insert(list.end(), elem);
ListElem elem{copyStringInArena(arena, key), value, current_version};
auto itr = list.insert(list.end(), std::move(elem));
bool inserted;
map.emplace(itr->key, it, inserted, hash_value);
assert(inserted);
@ -151,8 +155,8 @@ public:
if (it == map.end())
{
ListElem elem{copyStringInArena(arena, key), value, true};
auto itr = list.insert(list.end(), elem);
ListElem elem{copyStringInArena(arena, key), value, current_version};
auto itr = list.insert(list.end(), std::move(elem));
bool inserted;
map.emplace(itr->key, it, inserted, hash_value);
assert(inserted);
@ -163,9 +167,9 @@ public:
auto list_itr = it->getMapped();
if (snapshot_mode)
{
ListElem elem{list_itr->key, value, true};
ListElem elem{list_itr->key, value, current_version};
list_itr->active_in_map = false;
auto new_list_itr = list.insert(list.end(), elem);
auto new_list_itr = list.insert(list.end(), std::move(elem));
it->getMapped() = new_list_itr;
snapshot_invalid_iters.push_back(list_itr);
}
@ -224,14 +228,14 @@ public:
/// We in snapshot mode but updating some node which is already more
/// fresh than snapshot distance. So it will not participate in
/// snapshot and we don't need to copy it.
size_t distance = std::distance(list.begin(), list_itr);
if (distance < snapshot_up_to_size)
if (snapshot_mode && list_itr->version <= snapshot_up_to_version)
{
auto elem_copy = *(list_itr);
list_itr->active_in_map = false;
snapshot_invalid_iters.push_back(list_itr);
updater(elem_copy.value);
auto itr = list.insert(list.end(), elem_copy);
elem_copy.version = current_version;
auto itr = list.insert(list.end(), std::move(elem_copy));
it->getMapped() = itr;
ret = itr;
}
@ -289,16 +293,16 @@ public:
updateDataSize(CLEAR, 0, 0, 0);
}
void enableSnapshotMode(size_t up_to_size)
void enableSnapshotMode(size_t version)
{
snapshot_mode = true;
snapshot_up_to_size = up_to_size;
snapshot_up_to_version = version;
++current_version;
}
void disableSnapshotMode()
{
snapshot_mode = false;
snapshot_up_to_size = 0;
}
size_t size() const
@ -306,9 +310,9 @@ public:
return map.size();
}
size_t snapshotSize() const
std::pair<size_t, size_t> snapshotSizeWithVersion() const
{
return list.size();
return std::make_pair(list.size(), current_version);
}
uint64_t getApproximateDataSize() const

View File

@ -518,7 +518,7 @@ bool deserializeTxn(KeeperStorage & storage, ReadBuffer & in, Poco::Logger * /*l
if (request->getOpNum() == Coordination::OpNum::Multi && hasErrorsInMultiRequest(request))
return true;
storage.processRequest(request, session_id, zxid, /* check_acl = */ false);
storage.processRequest(request, session_id, time, zxid, /* check_acl = */ false);
}
}

View File

@ -1,3 +1,4 @@
#include <chrono>
#include <gtest/gtest.h>
#include "config_core.h"
@ -406,6 +407,7 @@ TEST_P(CoordinationTest, ChangelogTestCompaction)
EXPECT_TRUE(fs::exists("./logs/changelog_6_10.bin" + params.extension));
changelog.compact(6);
std::this_thread::sleep_for(std::chrono::microseconds(200));
EXPECT_FALSE(fs::exists("./logs/changelog_1_5.bin" + params.extension));
EXPECT_TRUE(fs::exists("./logs/changelog_6_10.bin" + params.extension));
@ -865,7 +867,7 @@ TEST_P(CoordinationTest, SnapshotableHashMapTrySnapshot)
EXPECT_FALSE(map_snp.insert("/hello", 145).second);
map_snp.updateValue("/hello", [](IntNode & value) { value = 554; });
EXPECT_EQ(map_snp.getValue("/hello"), 554);
EXPECT_EQ(map_snp.snapshotSize(), 2);
EXPECT_EQ(map_snp.snapshotSizeWithVersion().first, 2);
EXPECT_EQ(map_snp.size(), 1);
auto itr = map_snp.begin();
@ -884,7 +886,7 @@ TEST_P(CoordinationTest, SnapshotableHashMapTrySnapshot)
}
EXPECT_EQ(map_snp.getValue("/hello3"), 3);
EXPECT_EQ(map_snp.snapshotSize(), 7);
EXPECT_EQ(map_snp.snapshotSizeWithVersion().first, 7);
EXPECT_EQ(map_snp.size(), 6);
itr = std::next(map_snp.begin(), 2);
for (size_t i = 0; i < 5; ++i)
@ -898,7 +900,7 @@ TEST_P(CoordinationTest, SnapshotableHashMapTrySnapshot)
EXPECT_TRUE(map_snp.erase("/hello3"));
EXPECT_TRUE(map_snp.erase("/hello2"));
EXPECT_EQ(map_snp.snapshotSize(), 7);
EXPECT_EQ(map_snp.snapshotSizeWithVersion().first, 7);
EXPECT_EQ(map_snp.size(), 4);
itr = std::next(map_snp.begin(), 2);
for (size_t i = 0; i < 5; ++i)
@ -910,7 +912,7 @@ TEST_P(CoordinationTest, SnapshotableHashMapTrySnapshot)
}
map_snp.clearOutdatedNodes();
EXPECT_EQ(map_snp.snapshotSize(), 4);
EXPECT_EQ(map_snp.snapshotSizeWithVersion().first, 4);
EXPECT_EQ(map_snp.size(), 4);
itr = map_snp.begin();
EXPECT_EQ(itr->key, "/hello");
@ -1164,14 +1166,15 @@ TEST_P(CoordinationTest, TestStorageSnapshotMode)
storage.container.erase("/hello_" + std::to_string(i));
}
EXPECT_EQ(storage.container.size(), 26);
EXPECT_EQ(storage.container.snapshotSize(), 101);
EXPECT_EQ(storage.container.snapshotSizeWithVersion().first, 101);
EXPECT_EQ(storage.container.snapshotSizeWithVersion().second, 1);
auto buf = manager.serializeSnapshotToBuffer(snapshot);
manager.serializeSnapshotBufferToDisk(*buf, 50);
}
EXPECT_TRUE(fs::exists("./snapshots/snapshot_50.bin" + params.extension));
EXPECT_EQ(storage.container.size(), 26);
storage.clearGarbageAfterSnapshot();
EXPECT_EQ(storage.container.snapshotSize(), 26);
EXPECT_EQ(storage.container.snapshotSizeWithVersion().first, 26);
for (size_t i = 0; i < 50; ++i)
{
if (i % 2 != 0)
@ -1219,6 +1222,9 @@ nuraft::ptr<nuraft::buffer> getBufferFromZKRequest(int64_t session_id, const Coo
DB::WriteBufferFromNuraftBuffer buf;
DB::writeIntBinary(session_id, buf);
request->write(buf);
using namespace std::chrono;
auto time = duration_cast<milliseconds>(system_clock::now().time_since_epoch()).count();
DB::writeIntBinary(time, buf);
return buf.getBuffer();
}
@ -1459,6 +1465,7 @@ TEST_P(CoordinationTest, TestRotateIntervalChanges)
}
changelog_2.compact(105);
std::this_thread::sleep_for(std::chrono::microseconds(200));
EXPECT_FALSE(fs::exists("./logs/changelog_1_100.bin" + params.extension));
EXPECT_TRUE(fs::exists("./logs/changelog_101_110.bin" + params.extension));
@ -1478,6 +1485,7 @@ TEST_P(CoordinationTest, TestRotateIntervalChanges)
}
changelog_3.compact(125);
std::this_thread::sleep_for(std::chrono::microseconds(200));
EXPECT_FALSE(fs::exists("./logs/changelog_101_110.bin" + params.extension));
EXPECT_FALSE(fs::exists("./logs/changelog_111_117.bin" + params.extension));
EXPECT_FALSE(fs::exists("./logs/changelog_118_124.bin" + params.extension));

View File

@ -108,8 +108,6 @@ void insertPostgreSQLValue(
ReadBufferFromString in(value);
DateTime64 time = 0;
readDateTime64Text(time, 6, in, assert_cast<const DataTypeDateTime64 *>(data_type.get())->getTimeZone());
if (time < 0)
time = 0;
assert_cast<DataTypeDateTime64::ColumnType &>(column).insertValue(time);
break;
}

View File

@ -498,7 +498,6 @@ class IColumn;
/** Experimental feature for moving data between shards. */ \
\
M(Bool, allow_experimental_query_deduplication, false, "Experimental data deduplication for SELECT queries based on part UUIDs", 0) \
M(Bool, experimental_query_deduplication_send_all_part_uuids, false, "If false only part UUIDs for currently moving parts are sent. If true all read part UUIDs are sent (useful only for testing).", 0) \
\
M(Bool, engine_file_empty_if_not_exists, false, "Allows to select data from a file engine table without file", 0) \
M(Bool, engine_file_truncate_on_insert, false, "Enables or disables truncate before insert in file engine tables", 0) \

View File

@ -141,9 +141,6 @@ void DatabaseAtomic::dropTable(ContextPtr local_context, const String & table_na
if (table->storesDataOnDisk())
tryRemoveSymlink(table_name);
if (table->dropTableImmediately())
table->drop();
/// Notify DatabaseCatalog that table was dropped. It will remove table data in background.
/// Cleanup is performed outside of database to allow easily DROP DATABASE without waiting for cleanup to complete.
DatabaseCatalog::instance().enqueueDroppedTableCleanup(table->getStorageID(), table, table_metadata_path_drop, no_delay);

View File

@ -110,7 +110,7 @@ std::unique_ptr<WriteBufferFromFileBase> DiskAzureBlobStorage::writeFile(
readOrCreateUpdateAndStoreMetadata(path, mode, false, [blob_path, count] (Metadata & metadata) { metadata.addObject(blob_path, count); return true; });
};
return std::make_unique<WriteIndirectBufferFromRemoteFS<WriteBufferFromAzureBlobStorage>>(std::move(buffer), std::move(create_metadata_callback), path);
return std::make_unique<WriteIndirectBufferFromRemoteFS>(std::move(buffer), std::move(create_metadata_callback), path);
}

View File

@ -20,11 +20,26 @@ public:
RestartAwareReadBuffer(const DiskRestartProxy & disk, std::unique_ptr<ReadBufferFromFileBase> impl_)
: ReadBufferFromFileDecorator(std::move(impl_)), lock(disk.mutex) { }
void prefetch() override { impl->prefetch(); }
void prefetch() override
{
swap(*impl);
impl->prefetch();
swap(*impl);
}
void setReadUntilPosition(size_t position) override { impl->setReadUntilPosition(position); }
void setReadUntilPosition(size_t position) override
{
swap(*impl);
impl->setReadUntilPosition(position);
swap(*impl);
}
void setReadUntilEnd() override { impl->setReadUntilEnd(); }
void setReadUntilEnd() override
{
swap(*impl);
impl->setReadUntilEnd();
swap(*impl);
}
private:
ReadLock lock;

View File

@ -106,8 +106,7 @@ std::unique_ptr<WriteBufferFromFileBase> DiskHDFS::writeFile(const String & path
readOrCreateUpdateAndStoreMetadata(path, mode, false, [file_name, count] (Metadata & metadata) { metadata.addObject(file_name, count); return true; });
};
return std::make_unique<WriteIndirectBufferFromRemoteFS<WriteBufferFromHDFS>>(
std::move(hdfs_buffer), std::move(create_metadata_callback), path);
return std::make_unique<WriteIndirectBufferFromRemoteFS>(std::move(hdfs_buffer), std::move(create_metadata_callback), path);
}

View File

@ -32,7 +32,7 @@ namespace DB
namespace ErrorCodes
{
extern const int LOGICAL_ERROR;
extern const int CANNOT_SEEK_THROUGH_FILE;
extern const int ARGUMENT_OUT_OF_BOUND;
}
@ -146,125 +146,127 @@ bool AsynchronousReadIndirectBufferFromRemoteFS::nextImpl()
return false;
size_t size = 0;
if (prefetch_future.valid())
{
ProfileEvents::increment(ProfileEvents::RemoteFSPrefetchedReads);
CurrentMetrics::Increment metric_increment{CurrentMetrics::AsynchronousReadWait};
Stopwatch watch;
size_t offset = 0;
{
Stopwatch watch;
CurrentMetrics::Increment metric_increment{CurrentMetrics::AsynchronousReadWait};
auto result = prefetch_future.get();
size = result.size;
auto offset = result.offset;
assert(offset < size);
if (size)
{
memory.swap(prefetch_buffer);
size -= offset;
set(memory.data() + offset, size);
working_buffer.resize(size);
file_offset_of_buffer_end += size;
}
offset = result.offset;
/// If prefetch_future is valid, size should always be greater than zero.
assert(offset < size && size > 0);
ProfileEvents::increment(ProfileEvents::AsynchronousReadWaitMicroseconds, watch.elapsedMicroseconds());
}
watch.stop();
ProfileEvents::increment(ProfileEvents::AsynchronousReadWaitMicroseconds, watch.elapsedMicroseconds());
prefetch_buffer.swap(memory);
/// Adjust the working buffer so that it ignores `offset` bytes.
setWithBytesToIgnore(memory.data(), size, offset);
}
else
{
ProfileEvents::increment(ProfileEvents::RemoteFSUnprefetchedReads);
auto result = readInto(memory.data(), memory.size()).get();
size = result.size;
auto offset = result.offset;
assert(offset < size);
assert(offset < size || size == 0);
if (size)
{
size -= offset;
set(memory.data() + offset, size);
working_buffer.resize(size);
file_offset_of_buffer_end += size;
/// Adjust the working buffer so that it ignores `offset` bytes.
setWithBytesToIgnore(memory.data(), size, offset);
}
}
if (file_offset_of_buffer_end != impl->offset())
throw Exception(ErrorCodes::LOGICAL_ERROR, "Expected equality {} == {}. It's a bug", file_offset_of_buffer_end, impl->offset());
file_offset_of_buffer_end = impl->offset();
prefetch_future = {};
return size;
}
off_t AsynchronousReadIndirectBufferFromRemoteFS::seek(off_t offset_, int whence)
off_t AsynchronousReadIndirectBufferFromRemoteFS::seek(off_t offset, int whence)
{
ProfileEvents::increment(ProfileEvents::RemoteFSSeeks);
if (whence == SEEK_CUR)
size_t new_pos;
if (whence == SEEK_SET)
{
/// If position within current working buffer - shift pos.
if (!working_buffer.empty() && static_cast<size_t>(getPosition() + offset_) < file_offset_of_buffer_end)
{
pos += offset_;
return getPosition();
}
else
{
file_offset_of_buffer_end += offset_;
}
assert(offset >= 0);
new_pos = offset;
}
else if (whence == SEEK_SET)
else if (whence == SEEK_CUR)
{
/// If position is within current working buffer - shift pos.
if (!working_buffer.empty()
&& static_cast<size_t>(offset_) >= file_offset_of_buffer_end - working_buffer.size()
&& size_t(offset_) < file_offset_of_buffer_end)
{
pos = working_buffer.end() - (file_offset_of_buffer_end - offset_);
new_pos = file_offset_of_buffer_end - (working_buffer.end() - pos) + offset;
}
else
{
throw Exception("ReadBufferFromFileDescriptor::seek expects SEEK_SET or SEEK_CUR as whence", ErrorCodes::ARGUMENT_OUT_OF_BOUND);
}
/// Position is unchanged.
if (new_pos + (working_buffer.end() - pos) == file_offset_of_buffer_end)
return new_pos;
bool read_from_prefetch = false;
while (true)
{
if (file_offset_of_buffer_end - working_buffer.size() <= new_pos && new_pos <= file_offset_of_buffer_end)
{
/// Position is still inside the buffer.
/// Probably it is at the end of the buffer - then we will load data on the following 'next' call.
pos = working_buffer.end() - file_offset_of_buffer_end + new_pos;
assert(pos >= working_buffer.begin());
assert(pos <= working_buffer.end());
return getPosition();
return new_pos;
}
else
else if (prefetch_future.valid())
{
file_offset_of_buffer_end = offset_;
/// Read from prefetch buffer and recheck if the new position is valid inside.
if (nextImpl())
{
read_from_prefetch = true;
continue;
}
}
}
else
throw Exception("Only SEEK_SET or SEEK_CUR modes are allowed.", ErrorCodes::CANNOT_SEEK_THROUGH_FILE);
if (prefetch_future.valid())
{
ProfileEvents::increment(ProfileEvents::RemoteFSCancelledPrefetches);
prefetch_future.wait();
prefetch_future = {};
/// Prefetch is cancelled because of seek.
if (read_from_prefetch)
ProfileEvents::increment(ProfileEvents::RemoteFSCancelledPrefetches);
break;
}
assert(!prefetch_future.valid());
/// First reset the buffer so the next read will fetch new data to the buffer.
resetWorkingBuffer();
/**
* Lazy ignore. Save the number of bytes to ignore and apply it either to the prefetch buffer or to the current buffer.
* Note: we read in range [file_offset_of_buffer_end, read_until_position).
*/
off_t file_offset_before_seek = impl->offset();
if (impl->initialized()
&& read_until_position && file_offset_of_buffer_end < *read_until_position
&& static_cast<off_t>(file_offset_of_buffer_end) > file_offset_before_seek
&& static_cast<off_t>(file_offset_of_buffer_end) < file_offset_before_seek + static_cast<off_t>(min_bytes_for_seek))
&& read_until_position && new_pos < *read_until_position
&& new_pos > file_offset_of_buffer_end
&& new_pos < file_offset_of_buffer_end + min_bytes_for_seek)
{
ProfileEvents::increment(ProfileEvents::RemoteFSLazySeeks);
bytes_to_ignore = file_offset_of_buffer_end - file_offset_before_seek;
bytes_to_ignore = new_pos - file_offset_of_buffer_end;
}
else
{
ProfileEvents::increment(ProfileEvents::RemoteFSSeeksWithReset);
impl->reset();
file_offset_of_buffer_end = new_pos;
}
return file_offset_of_buffer_end;
return new_pos;
}

View File

@ -157,6 +157,7 @@ bool ReadBufferFromRemoteFSGather::readImpl()
if (bytes_to_ignore)
{
current_buf->ignore(bytes_to_ignore);
file_offset_of_buffer_end += bytes_to_ignore;
bytes_to_ignore = 0;
}

View File

@ -9,9 +9,8 @@
namespace DB
{
template <typename T>
WriteIndirectBufferFromRemoteFS<T>::WriteIndirectBufferFromRemoteFS(
std::unique_ptr<T> impl_,
WriteIndirectBufferFromRemoteFS::WriteIndirectBufferFromRemoteFS(
std::unique_ptr<WriteBuffer> impl_,
CreateMetadataCallback && create_callback_,
const String & metadata_file_path_)
: WriteBufferFromFileDecorator(std::move(impl_))
@ -20,8 +19,8 @@ WriteIndirectBufferFromRemoteFS<T>::WriteIndirectBufferFromRemoteFS(
{
}
template <typename T>
WriteIndirectBufferFromRemoteFS<T>::~WriteIndirectBufferFromRemoteFS()
WriteIndirectBufferFromRemoteFS::~WriteIndirectBufferFromRemoteFS()
{
try
{
@ -33,29 +32,12 @@ WriteIndirectBufferFromRemoteFS<T>::~WriteIndirectBufferFromRemoteFS()
}
}
template <typename T>
void WriteIndirectBufferFromRemoteFS<T>::finalizeImpl()
void WriteIndirectBufferFromRemoteFS::finalizeImpl()
{
WriteBufferFromFileDecorator::finalizeImpl();
create_metadata_callback(count());
}
#if USE_AWS_S3
template
class WriteIndirectBufferFromRemoteFS<WriteBufferFromS3>;
#endif
#if USE_AZURE_BLOB_STORAGE
template
class WriteIndirectBufferFromRemoteFS<WriteBufferFromAzureBlobStorage>;
#endif
#if USE_HDFS
template
class WriteIndirectBufferFromRemoteFS<WriteBufferFromHDFS>;
#endif
template
class WriteIndirectBufferFromRemoteFS<WriteBufferFromHTTP>;
}

View File

@ -12,12 +12,11 @@ namespace DB
using CreateMetadataCallback = std::function<void(size_t bytes_count)>;
/// Stores data in S3/HDFS and adds the object path and object size to metadata file on local FS.
template <typename T>
class WriteIndirectBufferFromRemoteFS final : public WriteBufferFromFileDecorator
{
public:
WriteIndirectBufferFromRemoteFS(
std::unique_ptr<T> impl_,
std::unique_ptr<WriteBuffer> impl_,
CreateMetadataCallback && create_callback_,
const String & metadata_file_path_);

View File

@ -293,7 +293,7 @@ std::unique_ptr<WriteBufferFromFileBase> DiskS3::writeFile(const String & path,
readOrCreateUpdateAndStoreMetadata(path, mode, false, [blob_name, count] (Metadata & metadata) { metadata.addObject(blob_name, count); return true; });
};
return std::make_unique<WriteIndirectBufferFromRemoteFS<WriteBufferFromS3>>(std::move(s3_buffer), std::move(create_metadata_callback), path);
return std::make_unique<WriteIndirectBufferFromRemoteFS>(std::move(s3_buffer), std::move(create_metadata_callback), path);
}
void DiskS3::createHardLink(const String & src_path, const String & dst_path)

View File

@ -121,7 +121,8 @@ public:
auto set = column_set->getData();
auto set_types = set->getDataTypes();
if (tuple && (set_types.size() != 1 || !set_types[0]->equals(*type_tuple)))
if (tuple && set_types.size() != 1 && set_types.size() == tuple->tupleSize())
{
const auto & tuple_columns = tuple->getColumns();
const DataTypes & tuple_types = type_tuple->getElements();

View File

@ -26,6 +26,7 @@ namespace DB
namespace ErrorCodes
{
extern const int ARGUMENT_OUT_OF_BOUND;
extern const int LOGICAL_ERROR;
}
@ -43,6 +44,8 @@ std::future<IAsynchronousReader::Result> AsynchronousReadBufferFromFileDescripto
request.size = size;
request.offset = file_offset_of_buffer_end;
request.priority = priority;
request.ignore = bytes_to_ignore;
bytes_to_ignore = 0;
/// This is a workaround of a read pass EOF bug in linux kernel with pread()
if (file_size.has_value() && file_offset_of_buffer_end >= *file_size)
@ -75,11 +78,14 @@ bool AsynchronousReadBufferFromFileDescriptor::nextImpl()
/// Read request already in flight. Wait for its completion.
size_t size = 0;
size_t offset = 0;
{
Stopwatch watch;
CurrentMetrics::Increment metric_increment{CurrentMetrics::AsynchronousReadWait};
auto result = prefetch_future.get();
size = result.size;
offset = result.offset;
assert(offset < size || size == 0);
ProfileEvents::increment(ProfileEvents::AsynchronousReadWaitMicroseconds, watch.elapsedMicroseconds());
}
@ -89,8 +95,8 @@ bool AsynchronousReadBufferFromFileDescriptor::nextImpl()
if (size)
{
prefetch_buffer.swap(memory);
set(memory.data(), memory.size());
working_buffer.resize(size);
/// Adjust the working buffer so that it ignores `offset` bytes.
setWithBytesToIgnore(memory.data(), size, offset);
return true;
}
@ -100,13 +106,13 @@ bool AsynchronousReadBufferFromFileDescriptor::nextImpl()
{
/// No pending request. Do synchronous read.
auto [size, _] = readInto(memory.data(), memory.size()).get();
auto [size, offset] = readInto(memory.data(), memory.size()).get();
file_offset_of_buffer_end += size;
if (size)
{
set(memory.data(), memory.size());
working_buffer.resize(size);
/// Adjust the working buffer so that it ignores `offset` bytes.
setWithBytesToIgnore(memory.data(), size, offset);
return true;
}
@ -125,6 +131,30 @@ void AsynchronousReadBufferFromFileDescriptor::finalize()
}
AsynchronousReadBufferFromFileDescriptor::AsynchronousReadBufferFromFileDescriptor(
AsynchronousReaderPtr reader_,
Int32 priority_,
int fd_,
size_t buf_size,
char * existing_memory,
size_t alignment,
std::optional<size_t> file_size_)
: ReadBufferFromFileBase(buf_size, existing_memory, alignment, file_size_)
, reader(std::move(reader_))
, priority(priority_)
, required_alignment(alignment)
, fd(fd_)
{
if (required_alignment > buf_size)
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"Too large alignment. Cannot have required_alignment greater than buf_size: {} > {}. It is a bug",
required_alignment,
buf_size);
prefetch_buffer.alignment = alignment;
}
AsynchronousReadBufferFromFileDescriptor::~AsynchronousReadBufferFromFileDescriptor()
{
finalize();
@ -153,46 +183,48 @@ off_t AsynchronousReadBufferFromFileDescriptor::seek(off_t offset, int whence)
if (new_pos + (working_buffer.end() - pos) == file_offset_of_buffer_end)
return new_pos;
if (file_offset_of_buffer_end - working_buffer.size() <= static_cast<size_t>(new_pos)
&& new_pos <= file_offset_of_buffer_end)
while (true)
{
/// Position is still inside the buffer.
/// Probably it is at the end of the buffer - then we will load data on the following 'next' call.
pos = working_buffer.end() - file_offset_of_buffer_end + new_pos;
assert(pos >= working_buffer.begin());
assert(pos <= working_buffer.end());
return new_pos;
}
else
{
if (prefetch_future.valid())
if (file_offset_of_buffer_end - working_buffer.size() <= new_pos && new_pos <= file_offset_of_buffer_end)
{
//std::cerr << "Ignoring prefetched data" << "\n";
prefetch_future.wait();
prefetch_future = {};
/// Position is still inside the buffer.
/// Probably it is at the end of the buffer - then we will load data on the following 'next' call.
pos = working_buffer.end() - file_offset_of_buffer_end + new_pos;
assert(pos >= working_buffer.begin());
assert(pos <= working_buffer.end());
return new_pos;
}
else if (prefetch_future.valid())
{
/// Read from prefetch buffer and recheck if the new position is valid inside.
if (nextImpl())
continue;
}
/// Position is out of the buffer, we need to do real seek.
off_t seek_pos = required_alignment > 1
? new_pos / required_alignment * required_alignment
: new_pos;
off_t offset_after_seek_pos = new_pos - seek_pos;
/// First reset the buffer so the next read will fetch new data to the buffer.
resetWorkingBuffer();
/// Just update the info about the next position in file.
file_offset_of_buffer_end = seek_pos;
if (offset_after_seek_pos > 0)
ignore(offset_after_seek_pos);
return seek_pos;
break;
}
assert(!prefetch_future.valid());
/// Position is out of the buffer, we need to do real seek.
off_t seek_pos = required_alignment > 1
? new_pos / required_alignment * required_alignment
: new_pos;
/// First reset the buffer so the next read will fetch new data to the buffer.
resetWorkingBuffer();
/// Just update the info about the next position in file.
file_offset_of_buffer_end = seek_pos;
bytes_to_ignore = new_pos - seek_pos;
assert(bytes_to_ignore < internal_buffer.size());
return seek_pos;
}

View File

@ -24,6 +24,7 @@ protected:
const size_t required_alignment = 0; /// For O_DIRECT both file offsets and memory addresses have to be aligned.
size_t file_offset_of_buffer_end = 0; /// What offset in file corresponds to working_buffer.end().
size_t bytes_to_ignore = 0; /// How many bytes should we ignore upon a new read request.
int fd;
bool nextImpl() override;
@ -41,15 +42,7 @@ public:
size_t buf_size = DBMS_DEFAULT_BUFFER_SIZE,
char * existing_memory = nullptr,
size_t alignment = 0,
std::optional<size_t> file_size_ = std::nullopt)
: ReadBufferFromFileBase(buf_size, existing_memory, alignment, file_size_)
, reader(std::move(reader_))
, priority(priority_)
, required_alignment(alignment)
, fd(fd_)
{
prefetch_buffer.alignment = alignment;
}
std::optional<size_t> file_size_ = std::nullopt);
~AsynchronousReadBufferFromFileDescriptor() override;

View File

@ -50,6 +50,29 @@ public:
// FIXME: behavior differs greatly from `BufferBase::set()` and it's very confusing.
void set(Position ptr, size_t size) { BufferBase::set(ptr, size, 0); working_buffer.resize(0); }
/// Set buffer to given piece of memory but with certain bytes ignored from beginning.
///
/// internal_buffer: |__________________|
/// working_buffer: |xxxxx|____________|
/// ^ ^
/// bytes_to_ignore
///
/// It's used for lazy seek. We also have another lazy seek mechanism that uses
/// `nextimpl_working_buffer_offset` to set the offset in the `next` method. It's important that we
/// don't do a double lazy seek, which means `nextimpl_working_buffer_offset` should be zero. It's
/// useful to keep internal_buffer pointing to the real span of the underlying memory, because its
/// size might be used to allocate other buffers. It's also important to have pos start at
/// working_buffer.begin(), because some buffers assume this condition to be true and use
/// offset() to check read bytes.
void setWithBytesToIgnore(Position ptr, size_t size, size_t bytes_to_ignore)
{
assert(bytes_to_ignore < size);
assert(nextimpl_working_buffer_offset == 0);
internal_buffer = Buffer(ptr, ptr + size);
working_buffer = Buffer(ptr + bytes_to_ignore, ptr + size);
pos = ptr + bytes_to_ignore;
}
/** read next data and fill a buffer with it; set position to the beginning;
* return `false` in case of end, `true` otherwise; throw an exception, if something is wrong
*/

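For context on the new `setWithBytesToIgnore()` helper above, here is a minimal, hypothetical Python sketch of the idea (names are illustrative, not part of the codebase): the internal buffer keeps covering the whole allocated span, while the working buffer exposed to readers starts `bytes_to_ignore` bytes later.
```
def set_with_bytes_to_ignore(data: bytes, bytes_to_ignore: int):
    assert bytes_to_ignore < len(data)
    internal_buffer = memoryview(data)                  # full span of the underlying memory
    working_buffer = internal_buffer[bytes_to_ignore:]  # the part visible to the reader
    pos = 0                                             # reading starts at working_buffer[0]
    return internal_buffer, working_buffer, pos

internal, working, pos = set_with_bytes_to_ignore(b"XXXXXpayload", 5)
assert bytes(working) == b"payload"           # the ignored prefix is skipped lazily
assert len(internal) == len(b"XXXXXpayload")  # the allocation still spans the whole memory
```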
View File

@ -82,8 +82,7 @@ std::future<IAsynchronousReader::Result> SynchronousReader::submit(Request reque
watch.stop();
ProfileEvents::increment(ProfileEvents::DiskReadElapsedMicroseconds, watch.elapsedMicroseconds());
return Result{ .size = bytes_read, .offset = 0};
return Result{ .size = bytes_read, .offset = request.ignore };
});
}

View File

@ -176,7 +176,7 @@ std::future<IAsynchronousReader::Result> ThreadPoolReader::submit(Request reques
ProfileEvents::increment(ProfileEvents::ThreadPoolReaderPageCacheHitElapsedMicroseconds, watch.elapsedMicroseconds());
ProfileEvents::increment(ProfileEvents::DiskReadElapsedMicroseconds, watch.elapsedMicroseconds());
promise.set_value({bytes_read, 0});
promise.set_value({bytes_read, request.ignore});
return future;
}
}
@ -219,7 +219,7 @@ std::future<IAsynchronousReader::Result> ThreadPoolReader::submit(Request reques
ProfileEvents::increment(ProfileEvents::ThreadPoolReaderPageCacheMissElapsedMicroseconds, watch.elapsedMicroseconds());
ProfileEvents::increment(ProfileEvents::DiskReadElapsedMicroseconds, watch.elapsedMicroseconds());
return Result{ .size = bytes_read, .offset = 0 };
return Result{ .size = bytes_read, .offset = request.ignore };
});
auto future = task->get_future();

View File

@ -95,8 +95,9 @@ StorageFileLog::StorageFileLog(
void StorageFileLog::loadMetaFiles(bool attach)
{
const auto & storage = getStorageID();
/// FIXME Why do we need a separate directory? Why not use the data directory?
root_meta_path
= std::filesystem::path(getContext()->getPath()) / ".filelog_storage_metadata" / storage.getDatabaseName() / storage.getTableName();
= std::filesystem::path(getContext()->getPath()) / "stream_engines/filelog/" / DatabaseCatalog::getPathForUUID(storage.uuid);
/// Attach table
if (attach)

View File

@ -52,12 +52,6 @@ public:
void drop() override;
/// We need to call drop() immediately to remove meta data directory,
/// otherwise, if another filelog table with same name created before
/// the table be dropped finally, then its meta data directory will
/// be deleted by this table drop finally
bool dropTableImmediately() override { return true; }
const auto & getFormatName() const { return format_name; }
enum class FileStatus

View File

@ -598,10 +598,6 @@ public:
/// Does not takes underlying Storage (if any) into account.
virtual std::optional<UInt64> lifetimeBytes() const { return {}; }
/// Should table->drop be called at once or with delay (in case of atomic database engine).
/// Needed for integration engines, when there must be no delay for calling drop() method.
virtual bool dropTableImmediately() { return false; }
private:
/// Lock required for alter queries (lockForAlter).
/// Allows to execute only one simultaneous alter query.

View File

@ -1751,8 +1751,6 @@ void MergeTreeDataSelectExecutor::selectPartsToReadWithUUIDFilter(
PartFilterCounters & counters,
Poco::Logger * log)
{
const Settings & settings = query_context->getSettings();
/// process_parts prepare parts that have to be read for the query,
/// returns false if duplicated parts' UUID have been met
auto select_parts = [&] (MergeTreeData::DataPartsVector & selected_parts) -> bool
@ -1807,14 +1805,11 @@ void MergeTreeDataSelectExecutor::selectPartsToReadWithUUIDFilter(
counters.num_granules_after_partition_pruner += num_granules;
/// populate UUIDs and exclude ignored parts if enabled
if (part->uuid != UUIDHelpers::Nil)
if (part->uuid != UUIDHelpers::Nil && pinned_part_uuids->contains(part->uuid))
{
if (settings.experimental_query_deduplication_send_all_part_uuids || pinned_part_uuids->contains(part->uuid))
{
auto result = temp_part_uuids.insert(part->uuid);
if (!result.second)
throw Exception("Found a part with the same UUID on the same replica.", ErrorCodes::LOGICAL_ERROR);
}
auto result = temp_part_uuids.insert(part->uuid);
if (!result.second)
throw Exception("Found a part with the same UUID on the same replica.", ErrorCodes::LOGICAL_ERROR);
}
selected_parts.push_back(part_or_projection);

View File

@ -54,7 +54,7 @@ MergeTreeIndexReader::MergeTreeIndexReader(
std::move(settings));
version = index_format.version;
stream->adjustForRange(MarkRange(0, getLastMark(all_mark_ranges_)));
stream->adjustRightMark(getLastMark(all_mark_ranges_));
stream->seekToStart();
}

View File

@ -695,10 +695,10 @@ MergeTreeRangeReader::ReadResult MergeTreeRangeReader::read(size_t max_rows, Mar
{
auto block = prev_reader->sample_block.cloneWithColumns(read_result.columns);
auto block_before_prewhere = read_result.block_before_prewhere;
for (auto & ctn : block)
for (const auto & column : block)
{
if (block_before_prewhere.has(ctn.name))
block_before_prewhere.erase(ctn.name);
if (block_before_prewhere.has(column.name))
block_before_prewhere.erase(column.name);
}
if (block_before_prewhere)
@ -710,8 +710,8 @@ MergeTreeRangeReader::ReadResult MergeTreeRangeReader::read(size_t max_rows, Mar
block_before_prewhere.setColumns(std::move(old_columns));
}
for (auto && ctn : block_before_prewhere)
block.insert(std::move(ctn));
for (auto & column : block_before_prewhere)
block.insert(std::move(column));
}
merge_tree_reader->evaluateMissingDefaults(block, columns);
}

View File

@ -96,6 +96,7 @@ MergeTreeReaderCompact::MergeTreeReaderCompact(
cached_buffer = std::move(buffer);
data_buffer = cached_buffer.get();
compressed_data_buffer = cached_buffer.get();
}
else
{
@ -114,6 +115,7 @@ MergeTreeReaderCompact::MergeTreeReaderCompact(
non_cached_buffer = std::move(buffer);
data_buffer = non_cached_buffer.get();
compressed_data_buffer = non_cached_buffer.get();
}
}
catch (...)
@ -260,10 +262,7 @@ void MergeTreeReaderCompact::seekToMark(size_t row_index, size_t column_index)
MarkInCompressedFile mark = marks_loader.getMark(row_index, column_index);
try
{
if (cached_buffer)
cached_buffer->seek(mark.offset_in_compressed_file, mark.offset_in_decompressed_block);
if (non_cached_buffer)
non_cached_buffer->seek(mark.offset_in_compressed_file, mark.offset_in_decompressed_block);
compressed_data_buffer->seek(mark.offset_in_compressed_file, mark.offset_in_decompressed_block);
}
catch (Exception & e)
{
@ -288,10 +287,7 @@ void MergeTreeReaderCompact::adjustUpperBound(size_t last_mark)
return;
last_right_offset = 0; // Zero value means the end of file.
if (cached_buffer)
cached_buffer->setReadUntilEnd();
if (non_cached_buffer)
non_cached_buffer->setReadUntilEnd();
data_buffer->setReadUntilEnd();
}
else
{
@ -299,10 +295,7 @@ void MergeTreeReaderCompact::adjustUpperBound(size_t last_mark)
return;
last_right_offset = right_offset;
if (cached_buffer)
cached_buffer->setReadUntilPosition(right_offset);
if (non_cached_buffer)
non_cached_buffer->setReadUntilPosition(right_offset);
data_buffer->setReadUntilPosition(right_offset);
}
}

View File

@ -41,6 +41,7 @@ private:
bool isContinuousReading(size_t mark, size_t column_position);
ReadBuffer * data_buffer;
CompressedReadBufferBase * compressed_data_buffer;
std::unique_ptr<CachedCompressedReadBuffer> cached_buffer;
std::unique_ptr<CompressedReadBufferFromFile> non_cached_buffer;

View File

@ -42,7 +42,8 @@ MergeTreeReaderStream::MergeTreeReaderStream(
{
size_t left_mark = mark_range.begin;
size_t right_mark = mark_range.end;
auto [_, mark_range_bytes] = getRightOffsetAndBytesRange(left_mark, right_mark);
size_t left_offset = left_mark < marks_count ? marks_loader.getMark(left_mark).offset_in_compressed_file : 0;
auto mark_range_bytes = getRightOffset(right_mark) - left_offset;
max_mark_range_bytes = std::max(max_mark_range_bytes, mark_range_bytes);
sum_mark_range_bytes += mark_range_bytes;
@ -85,6 +86,7 @@ MergeTreeReaderStream::MergeTreeReaderStream(
cached_buffer = std::move(buffer);
data_buffer = cached_buffer.get();
compressed_data_buffer = cached_buffer.get();
}
else
{
@ -102,22 +104,21 @@ MergeTreeReaderStream::MergeTreeReaderStream(
non_cached_buffer = std::move(buffer);
data_buffer = non_cached_buffer.get();
compressed_data_buffer = non_cached_buffer.get();
}
}
std::pair<size_t, size_t> MergeTreeReaderStream::getRightOffsetAndBytesRange(size_t left_mark, size_t right_mark_non_included)
size_t MergeTreeReaderStream::getRightOffset(size_t right_mark_non_included)
{
/// NOTE: if we are reading the whole file, then right_mark == marks_count
/// and we will use max_read_buffer_size for buffer size, thus avoiding the need to load marks.
/// Special case, can happen in Collapsing/Replacing engines
if (marks_count == 0)
return std::make_pair(0, 0);
return 0;
assert(left_mark < marks_count);
assert(right_mark_non_included <= marks_count);
assert(left_mark <= right_mark_non_included);
size_t result_right_offset;
if (0 < right_mark_non_included && right_mark_non_included < marks_count)
@ -177,30 +178,20 @@ std::pair<size_t, size_t> MergeTreeReaderStream::getRightOffsetAndBytesRange(siz
}
}
else if (right_mark_non_included == 0)
{
result_right_offset = marks_loader.getMark(right_mark_non_included).offset_in_compressed_file;
}
else
{
result_right_offset = file_size;
}
size_t mark_range_bytes = result_right_offset - (left_mark < marks_count ? marks_loader.getMark(left_mark).offset_in_compressed_file : 0);
return std::make_pair(result_right_offset, mark_range_bytes);
return result_right_offset;
}
void MergeTreeReaderStream::seekToMark(size_t index)
{
MarkInCompressedFile mark = marks_loader.getMark(index);
try
{
if (cached_buffer)
cached_buffer->seek(mark.offset_in_compressed_file, mark.offset_in_decompressed_block);
if (non_cached_buffer)
non_cached_buffer->seek(mark.offset_in_compressed_file, mark.offset_in_decompressed_block);
compressed_data_buffer->seek(mark.offset_in_compressed_file, mark.offset_in_decompressed_block);
}
catch (Exception & e)
{
@ -220,10 +211,7 @@ void MergeTreeReaderStream::seekToStart()
{
try
{
if (cached_buffer)
cached_buffer->seek(0, 0);
if (non_cached_buffer)
non_cached_buffer->seek(0, 0);
compressed_data_buffer->seek(0, 0);
}
catch (Exception & e)
{
@ -236,24 +224,21 @@ void MergeTreeReaderStream::seekToStart()
}
void MergeTreeReaderStream::adjustForRange(MarkRange range)
void MergeTreeReaderStream::adjustRightMark(size_t right_mark)
{
/**
* Note: this method is called multiple times for the same range of marks -- each time we
* read from stream, but we must update last_right_offset only if it is bigger than
* the last one to avoid redundantly cancelling prefetches.
*/
auto [right_offset, _] = getRightOffsetAndBytesRange(range.begin, range.end);
auto right_offset = getRightOffset(right_mark);
if (!right_offset)
{
if (last_right_offset && *last_right_offset == 0)
return;
last_right_offset = 0; // Zero value means the end of file.
if (cached_buffer)
cached_buffer->setReadUntilEnd();
if (non_cached_buffer)
non_cached_buffer->setReadUntilEnd();
data_buffer->setReadUntilEnd();
}
else
{
@ -261,10 +246,7 @@ void MergeTreeReaderStream::adjustForRange(MarkRange range)
return;
last_right_offset = right_offset;
if (cached_buffer)
cached_buffer->setReadUntilPosition(right_offset);
if (non_cached_buffer)
non_cached_buffer->setReadUntilPosition(right_offset);
data_buffer->setReadUntilPosition(right_offset);
}
}

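A rough sketch of what `getRightOffset()` and the new byte-range computation above amount to, assuming marks are plain offsets into the compressed file (simplified; the real implementation goes through the marks loader and handles more cases):
```
def right_offset(marks, right_mark_non_included, file_size):
    if not marks:                        # special case, e.g. Collapsing/Replacing engines
        return 0
    if right_mark_non_included < len(marks):
        return marks[right_mark_non_included]
    return file_size                     # reading up to the end of the file

def mark_range_bytes(marks, left_mark, right_mark_non_included, file_size):
    left_offset = marks[left_mark] if left_mark < len(marks) else 0
    return right_offset(marks, right_mark_non_included, file_size) - left_offset

marks = [0, 100, 250, 400]   # offset_in_compressed_file of each mark
assert mark_range_bytes(marks, 1, 3, file_size=1000) == 300    # marks 1..2 cover [100, 400)
assert mark_range_bytes(marks, 0, 4, file_size=1000) == 1000   # the whole file
```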
View File

@ -34,12 +34,13 @@ public:
* Does the buffer need to know anything about the bounds of the mark ranges it is going to read?
* (In case of MergeTree* tables.) Mostly needed for reading from a remote fs.
*/
void adjustForRange(MarkRange range);
void adjustRightMark(size_t right_mark);
ReadBuffer * data_buffer;
CompressedReadBufferBase * compressed_data_buffer;
private:
std::pair<size_t, size_t> getRightOffsetAndBytesRange(size_t left_mark, size_t right_mark_non_included);
size_t getRightOffset(size_t right_mark_non_included);
DiskPtr disk;
std::string path_prefix;

View File

@ -212,7 +212,7 @@ static ReadBuffer * getStream(
return nullptr;
MergeTreeReaderStream & stream = *it->second;
stream.adjustForRange(MarkRange(seek_to_start ? 0 : from_mark, current_task_last_mark));
stream.adjustRightMark(current_task_last_mark);
if (seek_to_start)
stream.seekToStart();

View File

@ -349,14 +349,20 @@ def parse_args() -> argparse.Namespace:
help="list of image paths to build instead of using pr_info + diff URL, "
"e.g. 'docker/packager/binary'",
)
parser.add_argument("--reports", default=True, help=argparse.SUPPRESS)
parser.add_argument(
"--no-reports",
action="store_true",
action="store_false",
dest="reports",
default=argparse.SUPPRESS,
help="don't push reports to S3 and github",
)
parser.add_argument("--push", default=True, help=argparse.SUPPRESS)
parser.add_argument(
"--no-push-images",
action="store_true",
action="store_false",
dest="push",
default=argparse.SUPPRESS,
help="don't push images to docker hub",
)
@ -375,8 +381,7 @@ def main():
else:
changed_json = os.path.join(TEMP_PATH, "changed_images.json")
push = not args.no_push_images
if push:
if args.push:
subprocess.check_output( # pylint: disable=unexpected-keyword-arg
"docker login --username 'robotclickhouse' --password-stdin",
input=get_parameter_from_ssm("dockerhub_robot_password"),
@ -408,7 +413,7 @@ def main():
images_processing_result = []
for image in changed_images:
images_processing_result += process_image_with_parents(
image, image_versions, push
image, image_versions, args.push
)
result_images[image.repo] = result_version
@ -437,7 +442,7 @@ def main():
print(f"::notice ::Report url: {url}")
print(f'::set-output name=url_output::"{url}"')
if args.no_reports:
if not args.reports:
return
gh = Github(get_best_robot_token())

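The argparse change above relies on a paired-flag idiom: a hidden positive option carries the default, and the visible `--no-*` option flips the same `dest` with `action="store_false"`. A small self-contained sketch of the pattern (option names are illustrative):
```
#!/usr/bin/env python3
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--push", default=True, help=argparse.SUPPRESS)
parser.add_argument(
    "--no-push-images",
    dest="push",
    action="store_false",
    default=argparse.SUPPRESS,   # don't override the default set just above
    help="don't push images to docker hub",
)

assert parser.parse_args([]).push is True
assert parser.parse_args(["--no-push-images"]).push is False
```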
View File

@ -44,14 +44,20 @@ def parse_args() -> argparse.Namespace:
default=RUNNER_TEMP,
help="path to changed_images_*.json files",
)
parser.add_argument("--reports", default=True, help=argparse.SUPPRESS)
parser.add_argument(
"--no-reports",
action="store_true",
action="store_false",
dest="reports",
default=argparse.SUPPRESS,
help="don't push reports to S3 and github",
)
parser.add_argument("--push", default=True, help=argparse.SUPPRESS)
parser.add_argument(
"--no-push-images",
action="store_true",
action="store_false",
dest="push",
default=argparse.SUPPRESS,
help="don't push images to docker hub",
)
@ -63,7 +69,7 @@ def parse_args() -> argparse.Namespace:
def load_images(path: str, suffix: str) -> Images:
with open(os.path.join(path, CHANGED_IMAGES.format(suffix)), "r") as images:
with open(os.path.join(path, CHANGED_IMAGES.format(suffix)), "rb") as images:
return json.load(images)
@ -125,39 +131,37 @@ def merge_images(to_merge: Dict[str, Images]) -> Dict[str, List[List[str]]]:
def create_manifest(image: str, tags: List[str], push: bool) -> Tuple[str, str]:
tag = tags[0]
manifest = f"{image}:{tag}"
cmd = "docker manifest create --amend {}".format(
" ".join((f"{image}:{t}" for t in tags))
)
cmd = "docker manifest create --amend " + " ".join((f"{image}:{t}" for t in tags))
logging.info("running: %s", cmd)
popen = subprocess.Popen(
with subprocess.Popen(
cmd,
shell=True,
stderr=subprocess.STDOUT,
stdout=subprocess.PIPE,
universal_newlines=True,
)
retcode = popen.wait()
if retcode != 0:
output = popen.stdout.read() # type: ignore
logging.error("failed to create manifest for %s:\n %s\n", manifest, output)
return manifest, "FAIL"
if not push:
return manifest, "OK"
) as popen:
retcode = popen.wait()
if retcode != 0:
output = popen.stdout.read() # type: ignore
logging.error("failed to create manifest for %s:\n %s\n", manifest, output)
return manifest, "FAIL"
if not push:
return manifest, "OK"
cmd = f"docker manifest push {manifest}"
logging.info("running: %s", cmd)
popen = subprocess.Popen(
with subprocess.Popen(
cmd,
shell=True,
stderr=subprocess.STDOUT,
stdout=subprocess.PIPE,
universal_newlines=True,
)
retcode = popen.wait()
if retcode != 0:
output = popen.stdout.read() # type: ignore
logging.error("failed to push %s:\n %s\n", manifest, output)
return manifest, "FAIL"
) as popen:
retcode = popen.wait()
if retcode != 0:
output = popen.stdout.read() # type: ignore
logging.error("failed to push %s:\n %s\n", manifest, output)
return manifest, "FAIL"
return manifest, "OK"
@ -167,8 +171,7 @@ def main():
stopwatch = Stopwatch()
args = parse_args()
push = not args.no_push_images
if push:
if args.push:
subprocess.check_output( # pylint: disable=unexpected-keyword-arg
"docker login --username 'robotclickhouse' --password-stdin",
input=get_parameter_from_ssm("dockerhub_robot_password"),
@ -189,12 +192,14 @@ def main():
test_results = [] # type: List[Tuple[str, str]]
for image, versions in merged.items():
for tags in versions:
manifest, test_result = create_manifest(image, tags, push)
manifest, test_result = create_manifest(image, tags, args.push)
test_results.append((manifest, test_result))
if test_result != "OK":
status = "failure"
with open(os.path.join(args.path, "changed_images.json"), "w") as ci:
with open(
os.path.join(args.path, "changed_images.json"), "w", encoding="utf-8"
) as ci:
json.dump(changed_images, ci)
pr_info = PRInfo()
@ -202,10 +207,10 @@ def main():
url = upload_results(s3_helper, pr_info.number, pr_info.sha, test_results, [], NAME)
print("::notice ::Report url: {}".format(url))
print('::set-output name=url_output::"{}"'.format(url))
print(f"::notice ::Report url: {url}")
print(f'::set-output name=url_output::"{url}"')
if args.no_reports:
if not args.reports:
return
if changed_images:

View File
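The switch to `with subprocess.Popen(...)` above uses the process object as a context manager, so its pipes are closed and the child is reaped even if an exception is raised while handling the output. A minimal sketch of the pattern (the command is a placeholder):
```
import subprocess

def run_and_capture(cmd: str):
    with subprocess.Popen(
        cmd,
        shell=True,
        stderr=subprocess.STDOUT,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    ) as popen:
        output = popen.stdout.read()   # read first to avoid blocking on a full pipe
        retcode = popen.wait()
    return retcode, output

retcode, output = run_and_capture("echo hello")
assert retcode == 0 and output.strip() == "hello"
```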

@ -5,8 +5,13 @@ import re
import subprocess
from typing import Optional
TAG_REGEXP = r"^v\d{2}[.][1-9]\d*[.][1-9]\d*[.][1-9]\d*-(testing|prestable|stable|lts)$"
SHA_REGEXP = r"^([0-9]|[a-f]){40}$"
# ^ and $ match subline in `multiple\nlines`
# \A and \Z match only start and end of the whole string
RELEASE_BRANCH_REGEXP = r"\A\d+[.]\d+\Z"
TAG_REGEXP = (
r"\Av\d{2}[.][1-9]\d*[.][1-9]\d*[.][1-9]\d*-(testing|prestable|stable|lts)\Z"
)
SHA_REGEXP = r"\A([0-9]|[a-f]){40}\Z"
# Py 3.8 removeprefix and removesuffix
@ -31,6 +36,13 @@ def commit(name: str):
return name
def release_branch(name: str):
r = re.compile(RELEASE_BRANCH_REGEXP)
if not r.match(name):
raise argparse.ArgumentTypeError("release branch should look like '12.1'")
return name
class Runner:
"""lightweight check_output wrapper that strips the trailing NEW_LINE"""

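The move from `^`/`$` to `\A`/`\Z` above matters because, even without `re.MULTILINE`, `$` also matches just before a trailing newline. A short illustration with a simplified tag pattern (not the exact TAG_REGEXP):
```
import re

caret_dollar = re.compile(r"^v\d{2}[.]\d+[.]\d+[.]\d+-stable$")
anchored = re.compile(r"\Av\d{2}[.]\d+[.]\d+[.]\d+-stable\Z")

tag_with_newline = "v21.12.333.22222-stable\n"
assert caret_dollar.match(tag_with_newline)        # accepted: $ tolerates the trailing newline
assert anchored.match(tag_with_newline) is None    # rejected: \Z requires the true end of string
assert anchored.match("v21.12.333.22222-stable")   # a clean tag still matches
```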
68
tests/ci/git_test.py Normal file
View File

@ -0,0 +1,68 @@
#!/usr/bin/env python
from unittest.mock import patch
import os.path as p
import unittest
from git_helper import Git, Runner
class TestRunner(unittest.TestCase):
def test_init(self):
runner = Runner()
self.assertEqual(runner.cwd, p.realpath(p.dirname(__file__)))
runner = Runner("/")
self.assertEqual(runner.cwd, "/")
def test_run(self):
runner = Runner()
output = runner.run("echo 1")
self.assertEqual(output, "1")
class TestGit(unittest.TestCase):
def setUp(self):
"""we use dummy git object"""
run_patcher = patch("git_helper.Runner.run", return_value="")
self.run_mock = run_patcher.start()
self.addCleanup(run_patcher.stop)
update_patcher = patch("git_helper.Git.update")
update_mock = update_patcher.start()
self.addCleanup(update_patcher.stop)
self.git = Git()
update_mock.assert_called_once()
self.run_mock.assert_called_once()
self.git.new_branch = "NEW_BRANCH_NAME"
self.git.new_tag = "v21.12.333.22222-stable"
self.git.branch = "old_branch"
self.git.sha = ""
self.git.sha_short = ""
self.git.latest_tag = ""
self.git.description = ""
self.git.commits_since_tag = 0
def test_tags(self):
self.git.new_tag = "v21.12.333.22222-stable"
self.git.latest_tag = "v21.12.333.22222-stable"
for tag_attr in ("new_tag", "latest_tag"):
self.assertEqual(getattr(self.git, tag_attr), "v21.12.333.22222-stable")
setattr(self.git, tag_attr, "")
self.assertEqual(getattr(self.git, tag_attr), "")
for tag in (
"v21.12.333-stable",
"v21.12.333-prestable",
"21.12.333.22222-stable",
"v21.12.333.22222-production",
):
with self.assertRaises(Exception):
setattr(self.git, tag_attr, tag)
def test_tweak(self):
self.git.commits_since_tag = 0
self.assertEqual(self.git.tweak, 1)
self.git.commits_since_tag = 2
self.assertEqual(self.git.tweak, 2)
self.git.latest_tag = "v21.12.333.22222-testing"
self.assertEqual(self.git.tweak, 22224)
self.git.commits_since_tag = 0
self.assertEqual(self.git.tweak, 22222)

View File

@ -253,15 +253,21 @@ def parse_args() -> argparse.Namespace:
default="https://clickhousedb.jfrog.io/artifactory",
help="SaaS Artifactory url",
)
parser.add_argument("--artifactory", default=True, help=argparse.SUPPRESS)
parser.add_argument(
"-n",
"--no-artifactory",
action="store_true",
action="store_false",
dest="artifactory",
default=argparse.SUPPRESS,
help="do not push packages to artifactory",
)
parser.add_argument("--force-download", default=True, help=argparse.SUPPRESS)
parser.add_argument(
"--no-force-download",
action="store_true",
action="store_false",
dest="force_download",
default=argparse.SUPPRESS,
help="do not download packages again if they exist already",
)
@ -303,10 +309,10 @@ def main():
args.commit,
args.check_name,
args.release.version,
not args.no_force_download,
args.force_download,
)
art_client = None
if not args.no_artifactory:
if args.artifactory:
art_client = Artifactory(args.artifactory_url, args.release.type)
if args.deb:

View File

@ -6,7 +6,7 @@ from typing import List, Optional
import argparse
import logging
from git_helper import commit
from git_helper import commit, release_branch
from version_helper import (
FILE_WITH_VERSION_PATH,
ClickHouseVersion,
@ -18,14 +18,45 @@ from version_helper import (
)
class Repo:
VALID = ("ssh", "https", "origin")
def __init__(self, repo: str, protocol: str):
self._repo = repo
self._url = ""
self.url = protocol
@property
def url(self) -> str:
return self._url
@url.setter
def url(self, protocol: str):
if protocol == "ssh":
self._url = f"git@github.com:{self}.git"
elif protocol == "https":
self._url = f"https://github.com/{self}.git"
elif protocol == "origin":
self._url = protocol
else:
raise Exception(f"protocol must be in {self.VALID}")
def __str__(self):
return self._repo
class Release:
BIG = ("major", "minor")
SMALL = ("patch",)
def __init__(self, version: ClickHouseVersion):
self._version = version
self._git = version._git
def __init__(self, repo: Repo, release_commit: str, release_type: str):
self.repo = repo
self._release_commit = ""
self.release_commit = release_commit
self.release_type = release_type
self._version = get_version_from_repo()
self._git = self._version._git
self._release_branch = ""
self._rollback_stack = [] # type: List[str]
def run(self, cmd: str, cwd: Optional[str] = None) -> str:
@ -35,32 +66,45 @@ class Release:
logging.info("Running command%s:\n %s", cwd_text, cmd)
return self._git.run(cmd, cwd)
def update(self):
def set_release_branch(self):
# Get the actual version for the commit before check
with self._checkout(self.release_commit, True):
self.read_version()
self.release_branch = f"{self.version.major}.{self.version.minor}"
self.read_version()
def read_version(self):
self._git.update()
self.version = get_version_from_repo()
def do(self, args: argparse.Namespace):
self.release_commit = args.commit
def do(self, check_dirty: bool, check_branch: bool, with_prestable: bool):
if not args.no_check_dirty:
if check_dirty:
logging.info("Checking if repo is clean")
self.run("git diff HEAD --exit-code")
if not args.no_check_branch:
self.check_branch(args.release_type)
self.set_release_branch()
if args.release_type in self.BIG:
# Checkout to the commit, it will provide the correct current version
with self._checkout(self.release_commit, True):
if args.no_prestable:
if check_branch:
self.check_branch()
with self._checkout(self.release_commit, True):
if self.release_type in self.BIG:
# Checkout to the commit, it will provide the correct current version
if with_prestable:
logging.info("Skipping prestable stage")
else:
with self.prestable(args):
with self.prestable():
logging.info("Prestable part of the releasing is done")
with self.testing(args):
with self.testing():
logging.info("Testing part of the releasing is done")
elif self.release_type in self.SMALL:
with self.stable():
logging.info("Stable part of the releasing is done")
self.log_rollback()
def check_no_tags_after(self):
@ -71,19 +115,27 @@ class Release:
f"{tags_after_commit}\nChoose another commit"
)
def check_branch(self, release_type: str):
if release_type in self.BIG:
def check_branch(self):
if self.release_type in self.BIG:
# Commit to spin up the release must belong to a main branch
output = self.run(f"git branch --contains={self.release_commit} master")
if "master" not in output:
branch = "master"
output = self.run(f"git branch --contains={self.release_commit} {branch}")
if branch not in output:
raise Exception(
f"commit {self.release_commit} must belong to 'master' for "
f"{release_type} release"
f"commit {self.release_commit} must belong to {branch} for "
f"{self.release_type} release"
)
if release_type in self.SMALL:
branch = f"{self.version.major}.{self.version.minor}"
if self._git.branch != branch:
raise Exception(f"branch must be '{branch}' for {release_type} release")
return
elif self.release_type in self.SMALL:
output = self.run(
f"git branch --contains={self.release_commit} {self.release_branch}"
)
if self.release_branch not in output:
raise Exception(
f"commit {self.release_commit} must be in "
f"'{self.release_branch}' branch for {self.release_type} release"
)
return
def log_rollback(self):
if self._rollback_stack:
@ -95,31 +147,60 @@ class Release:
)
@contextmanager
def prestable(self, args: argparse.Namespace):
def prestable(self):
self.check_no_tags_after()
# Create release branch
self.update()
release_branch = f"{self.version.major}.{self.version.minor}"
with self._create_branch(release_branch, self.release_commit):
with self._checkout(release_branch, True):
self.update()
self.read_version()
with self._create_branch(self.release_branch, self.release_commit):
with self._checkout(self.release_branch, True):
self.read_version()
self.version.with_description(VersionType.PRESTABLE)
with self._create_gh_release(args):
with self._bump_prestable_version(release_branch, args):
with self._create_gh_release(True):
with self._bump_prestable_version():
# At this point everything will rollback automatically
yield
@contextmanager
def testing(self, args: argparse.Namespace):
def stable(self):
self.check_no_tags_after()
self.read_version()
version_type = VersionType.STABLE
if self.version.minor % 5 == 3: # our 3 and 8 are LTS
version_type = VersionType.LTS
self.version.with_description(version_type)
with self._create_gh_release(False):
self.version = self.version.update(self.release_type)
self.version.with_description(version_type)
update_cmake_version(self.version)
cmake_path = get_abs_path(FILE_WITH_VERSION_PATH)
# Checking out the commit of the branch, and not the branch itself,
# so we are able to skip the rollback
with self._checkout(f"{self.release_branch}@{{0}}", False):
current_commit = self.run("git rev-parse HEAD")
self.run(
f"git commit -m "
f"'Update version to {self.version.string}' '{cmake_path}'"
)
with self._push(
"HEAD", with_rollback_on_fail=False, remote_ref=self.release_branch
):
# DO NOT PUT ANYTHING ELSE HERE
# The push must be the last action and mean the successful release
self._rollback_stack.append(
f"git push {self.repo.url} "
f"+{current_commit}:{self.release_branch}"
)
yield
@contextmanager
def testing(self):
# Create branch for a version bump
self.update()
self.version = self.version.update(args.release_type)
self.read_version()
self.version = self.version.update(self.release_type)
helper_branch = f"{self.version.major}.{self.version.minor}-prepare"
with self._create_branch(helper_branch, self.release_commit):
with self._checkout(helper_branch, True):
self.update()
self.version = self.version.update(args.release_type)
with self._bump_testing_version(helper_branch, args):
with self._bump_testing_version(helper_branch):
yield
@property
@ -132,6 +213,14 @@ class Release:
raise ValueError(f"version must be ClickHouseVersion, not {type(version)}")
self._version = version
@property
def release_branch(self) -> str:
return self._release_branch
@release_branch.setter
def release_branch(self, branch: str):
self._release_branch = release_branch(branch)
@property
def release_commit(self) -> str:
return self._release_commit
@ -141,7 +230,8 @@ class Release:
self._release_commit = commit(release_commit)
@contextmanager
def _bump_prestable_version(self, release_branch: str, args: argparse.Namespace):
def _bump_prestable_version(self):
# Update only git, the original version stays the same
self._git.update()
new_version = self.version.patch_update()
new_version.with_description("prestable")
@ -150,35 +240,38 @@ class Release:
self.run(
f"git commit -m 'Update version to {new_version.string}' '{cmake_path}'"
)
with self._push(release_branch, args):
with self._push(self.release_branch):
with self._create_gh_label(
f"v{release_branch}-must-backport", "10dbed", args
f"v{self.release_branch}-must-backport", "10dbed"
):
with self._create_gh_label(
f"v{release_branch}-affected", "c2bfff", args
f"v{self.release_branch}-affected", "c2bfff"
):
self.run(
f"gh pr create --repo {args.repo} --title 'Release pull "
f"request for branch {release_branch}' --head {release_branch} "
f"gh pr create --repo {self.repo} --title "
f"'Release pull request for branch {self.release_branch}' "
f"--head {self.release_branch} --label release "
"--body 'This PullRequest is a part of ClickHouse release "
"cycle. It is used by CI system only. Do not perform any "
"changes with it.' --label release"
"changes with it.'"
)
# Here the prestable part is done
yield
@contextmanager
def _bump_testing_version(self, helper_branch: str, args: argparse.Namespace):
def _bump_testing_version(self, helper_branch: str):
self.read_version()
self.version = self.version.update(self.release_type)
self.version.with_description("testing")
update_cmake_version(self.version)
cmake_path = get_abs_path(FILE_WITH_VERSION_PATH)
self.run(
f"git commit -m 'Update version to {self.version.string}' '{cmake_path}'"
)
with self._push(helper_branch, args):
with self._push(helper_branch):
body_file = get_abs_path(".github/PULL_REQUEST_TEMPLATE.md")
self.run(
f"gh pr create --repo {args.repo} --title 'Update version after "
f"gh pr create --repo {self.repo} --title 'Update version after "
f"release' --head {helper_branch} --body-file '{body_file}'"
)
# Here the prestable part is done
@ -216,9 +309,12 @@ class Release:
raise
@contextmanager
def _create_gh_label(self, label: str, color: str, args: argparse.Namespace):
self.run(f"gh api repos/{args.repo}/labels -f name={label} -f color={color}")
rollback_cmd = f"gh api repos/{args.repo}/labels/{label} -X DELETE"
def _create_gh_label(self, label: str, color_hex: str):
# API call, https://docs.github.com/en/rest/reference/issues#create-a-label
self.run(
f"gh api repos/{self.repo}/labels -f name={label} -f color={color_hex}"
)
rollback_cmd = f"gh api repos/{self.repo}/labels/{label} -X DELETE"
self._rollback_stack.append(rollback_cmd)
try:
yield
@ -228,15 +324,18 @@ class Release:
raise
@contextmanager
def _create_gh_release(self, args: argparse.Namespace):
with self._create_tag(args):
def _create_gh_release(self, as_prerelease: bool):
with self._create_tag():
# Preserve tag if version is changed
tag = self.version.describe
prerelease = ""
if as_prerelease:
prerelease = "--prerelease"
self.run(
f"gh release create --prerelease --draft --repo {args.repo} "
f"gh release create {prerelease} --draft --repo {self.repo} "
f"--title 'Release {tag}' '{tag}'"
)
rollback_cmd = f"gh release delete --yes --repo {args.repo} '{tag}'"
rollback_cmd = f"gh release delete --yes --repo {self.repo} '{tag}'"
self._rollback_stack.append(rollback_cmd)
try:
yield
@ -246,13 +345,13 @@ class Release:
raise
@contextmanager
def _create_tag(self, args: argparse.Namespace):
def _create_tag(self):
tag = self.version.describe
self.run(f"git tag -a -m 'Release {tag}' '{tag}'")
rollback_cmd = f"git tag -d '{tag}'"
self._rollback_stack.append(rollback_cmd)
try:
with self._push(f"'{tag}'", args):
with self._push(f"'{tag}'"):
yield
except BaseException:
logging.warning("Rolling back tag %s", tag)
@ -260,15 +359,22 @@ class Release:
raise
@contextmanager
def _push(self, ref: str, args: argparse.Namespace):
self.run(f"git push git@github.com:{args.repo}.git {ref}")
rollback_cmd = f"git push -d git@github.com:{args.repo}.git {ref}"
self._rollback_stack.append(rollback_cmd)
def _push(self, ref: str, with_rollback_on_fail: bool = True, remote_ref: str = ""):
if remote_ref == "":
remote_ref = ref
self.run(f"git push {self.repo.url} {ref}:{remote_ref}")
if with_rollback_on_fail:
rollback_cmd = f"git push -d {self.repo.url} {remote_ref}"
self._rollback_stack.append(rollback_cmd)
try:
yield
except BaseException:
logging.warning("Rolling back pushed ref %s", ref)
self.run(rollback_cmd)
if with_rollback_on_fail:
logging.warning("Rolling back pushed ref %s", ref)
self.run(rollback_cmd)
raise
@ -284,52 +390,66 @@ def parse_args() -> argparse.Namespace:
default="ClickHouse/ClickHouse",
help="repository to create the release",
)
parser.add_argument(
"--remote-protocol",
"-p",
default="ssh",
choices=Repo.VALID,
help="protocol for the git commands' remote; 'origin' is a special case and "
"uses 'origin' as the remote",
)
parser.add_argument(
"--type",
default="minor",
# choices=Release.BIG+Release.SMALL, # add support later
choices=Release.BIG + Release.SMALL,
dest="release_type",
help="a release type, new branch is created only for 'major' and 'minor'",
)
parser.add_argument(
"--no-prestable",
action="store_true",
help=f"for release types in {Release.BIG} skip creating prestable release and "
"release branch",
)
parser.add_argument(
"--commit",
default=git.sha,
type=commit,
help="commit create a release, default to HEAD",
)
parser.add_argument("--with-prestable", default=True, help=argparse.SUPPRESS)
parser.add_argument(
"--no-prestable",
dest="with_prestable",
action="store_false",
default=argparse.SUPPRESS,
help=f"if set, for release types in {Release.BIG} skip creating prestable "
"release and release branch",
)
parser.add_argument("--check-dirty", default=True, help=argparse.SUPPRESS)
parser.add_argument(
"--no-check-dirty",
action="store_true",
help="skip check repository for uncommited changes",
dest="check_dirty",
action="store_false",
default=argparse.SUPPRESS,
help="(dangerous) if set, skip checking the repository for uncommitted changes",
)
parser.add_argument("--check-branch", default=True, help=argparse.SUPPRESS)
parser.add_argument(
"--no-check-branch",
action="store_true",
help="by default, 'major' and 'minor' types work only for master, and 'patch' "
"works only for a release branches, that name should be the same as "
"'$MAJOR.$MINOR' version, e.g. 22.2",
dest="check_branch",
action="store_false",
default=argparse.SUPPRESS,
help="(debug or development only) if set, skip the branch check for a run. "
"By default, 'major' and 'minor' types work only for master, and 'patch' works "
"only for release branches, whose name "
"should be the same as the '$MAJOR.$MINOR' version, e.g. 22.2",
)
return parser.parse_args()
def prestable():
pass
def main():
logging.basicConfig(level=logging.INFO)
args = parse_args()
release = Release(get_version_from_repo())
repo = Repo(args.repo, args.remote_protocol)
release = Release(repo, args.commit, args.release_type)
release.do(args)
release.do(args.check_dirty, args.check_branch, args.with_prestable)
if __name__ == "__main__":

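The `_rollback_stack` plus nested context managers in release.py implement a simple undo stack: each successful step registers a rollback command, and a failure unwinds the registered commands in reverse order. A condensed, self-contained sketch with dummy commands (not the actual Release class):
```
from contextlib import contextmanager

class RollbackStack:
    def __init__(self):
        self.stack = []          # rollback commands, most recent last
        self.log = []            # what actually ran (for the demo)

    def run(self, cmd: str):
        self.log.append(cmd)

    @contextmanager
    def step(self, do_cmd: str, undo_cmd: str):
        self.run(do_cmd)
        self.stack.append(undo_cmd)
        try:
            yield
        except BaseException:
            self.run(self.stack.pop())   # roll back this step, then re-raise
            raise

r = RollbackStack()
try:
    with r.step("git tag v1", "git tag -d v1"):
        with r.step("git push origin v1", "git push -d origin v1"):
            raise RuntimeError("release failed")
except RuntimeError:
    pass

assert r.log == [
    "git tag v1",
    "git push origin v1",
    "git push -d origin v1",   # the inner step is rolled back first
    "git tag -d v1",           # then the outer one
]
```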
View File

@ -105,4 +105,4 @@ if __name__ == "__main__":
args = parser.parse_args()
keys = main(args.token, args.organization, args.team)
print(f"Just showing off the keys:\n{keys}")
print(f"# Just showing off the keys:\n{keys}")

View File

@ -0,0 +1 @@
#!/usr/bin/env python3

View File

@ -0,0 +1,41 @@
<clickhouse>
<keeper_server>
<tcp_port>9181</tcp_port>
<server_id>1</server_id>
<log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
<snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
<coordination_settings>
<operation_timeout_ms>5000</operation_timeout_ms>
<session_timeout_ms>10000</session_timeout_ms>
<snapshot_distance>75</snapshot_distance>
<raft_logs_level>trace</raft_logs_level>
</coordination_settings>
<raft_configuration>
<server>
<id>1</id>
<hostname>node1</hostname>
<port>9234</port>
<can_become_leader>true</can_become_leader>
<priority>3</priority>
</server>
<server>
<id>2</id>
<hostname>node2</hostname>
<port>9234</port>
<can_become_leader>true</can_become_leader>
<start_as_follower>true</start_as_follower>
<priority>2</priority>
</server>
<server>
<id>3</id>
<hostname>node3</hostname>
<port>9234</port>
<can_become_leader>true</can_become_leader>
<start_as_follower>true</start_as_follower>
<priority>1</priority>
</server>
</raft_configuration>
</keeper_server>
</clickhouse>

View File

@ -0,0 +1,41 @@
<clickhouse>
<keeper_server>
<tcp_port>9181</tcp_port>
<server_id>2</server_id>
<log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
<snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
<coordination_settings>
<operation_timeout_ms>5000</operation_timeout_ms>
<session_timeout_ms>10000</session_timeout_ms>
<snapshot_distance>75</snapshot_distance>
<raft_logs_level>trace</raft_logs_level>
</coordination_settings>
<raft_configuration>
<server>
<id>1</id>
<hostname>node1</hostname>
<port>9234</port>
<can_become_leader>true</can_become_leader>
<priority>3</priority>
</server>
<server>
<id>2</id>
<hostname>node2</hostname>
<port>9234</port>
<can_become_leader>true</can_become_leader>
<start_as_follower>true</start_as_follower>
<priority>2</priority>
</server>
<server>
<id>3</id>
<hostname>node3</hostname>
<port>9234</port>
<can_become_leader>true</can_become_leader>
<start_as_follower>true</start_as_follower>
<priority>1</priority>
</server>
</raft_configuration>
</keeper_server>
</clickhouse>

View File

@ -0,0 +1,41 @@
<clickhouse>
<keeper_server>
<tcp_port>9181</tcp_port>
<server_id>3</server_id>
<log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
<snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
<coordination_settings>
<operation_timeout_ms>5000</operation_timeout_ms>
<session_timeout_ms>10000</session_timeout_ms>
<snapshot_distance>75</snapshot_distance>
<raft_logs_level>trace</raft_logs_level>
</coordination_settings>
<raft_configuration>
<server>
<id>1</id>
<hostname>node1</hostname>
<port>9234</port>
<can_become_leader>true</can_become_leader>
<priority>3</priority>
</server>
<server>
<id>2</id>
<hostname>node2</hostname>
<port>9234</port>
<can_become_leader>true</can_become_leader>
<start_as_follower>true</start_as_follower>
<priority>2</priority>
</server>
<server>
<id>3</id>
<hostname>node3</hostname>
<port>9234</port>
<can_become_leader>true</can_become_leader>
<start_as_follower>true</start_as_follower>
<priority>1</priority>
</server>
</raft_configuration>
</keeper_server>
</clickhouse>

View File

@ -0,0 +1,16 @@
<clickhouse>
<zookeeper>
<node index="1">
<host>node1</host>
<port>9181</port>
</node>
<node index="2">
<host>node2</host>
<port>9181</port>
</node>
<node index="3">
<host>node3</host>
<port>9181</port>
</node>
</zookeeper>
</clickhouse>

View File

@ -0,0 +1,129 @@
import pytest
from helpers.cluster import ClickHouseCluster
import random
import string
import os
import time
from multiprocessing.dummy import Pool
from helpers.network import PartitionManager
from helpers.test_tools import assert_eq_with_retry
cluster = ClickHouseCluster(__file__)
node1 = cluster.add_instance('node1', main_configs=['configs/enable_keeper1.xml', 'configs/use_keeper.xml'], stay_alive=True)
node2 = cluster.add_instance('node2', main_configs=['configs/enable_keeper2.xml', 'configs/use_keeper.xml'], stay_alive=True)
node3 = cluster.add_instance('node3', main_configs=['configs/enable_keeper3.xml', 'configs/use_keeper.xml'], stay_alive=True)
from kazoo.client import KazooClient, KazooState
@pytest.fixture(scope="module")
def started_cluster():
try:
cluster.start()
yield cluster
finally:
cluster.shutdown()
def smaller_exception(ex):
return '\n'.join(str(ex).split('\n')[0:2])
def wait_node(node):
for _ in range(100):
zk = None
try:
node.query("SELECT * FROM system.zookeeper WHERE path = '/'")
zk = get_fake_zk(node.name, timeout=30.0)
zk.create("/test", sequence=True)
print("node", node.name, "ready")
break
except Exception as ex:
time.sleep(0.2)
print("Waiting until", node.name, "will be ready, exception", ex)
finally:
if zk:
zk.stop()
zk.close()
else:
raise Exception("Can't wait node", node.name, "to become ready")
def wait_nodes():
for node in [node1, node2, node3]:
wait_node(node)
def get_fake_zk(nodename, timeout=30.0):
_fake_zk_instance = KazooClient(hosts=cluster.get_instance_ip(nodename) + ":9181", timeout=timeout)
_fake_zk_instance.start()
return _fake_zk_instance
def assert_eq_stats(stat1, stat2):
assert stat1.version == stat2.version
assert stat1.cversion == stat2.cversion
assert stat1.aversion == stat2.aversion
assert stat1.dataLength == stat2.dataLength
assert stat1.numChildren == stat2.numChildren
assert stat1.ctime == stat2.ctime
assert stat1.mtime == stat2.mtime
def test_between_servers(started_cluster):
try:
wait_nodes()
node1_zk = get_fake_zk("node1")
node2_zk = get_fake_zk("node2")
node3_zk = get_fake_zk("node3")
node1_zk.create("/test_between_servers")
for child_node in range(1000):
node1_zk.create("/test_between_servers/" + str(child_node))
for child_node in range(1000):
node1_zk.set("/test_between_servers/" + str(child_node), b"somevalue")
for child_node in range(1000):
stats1 = node1_zk.exists("/test_between_servers/" + str(child_node))
stats2 = node2_zk.exists("/test_between_servers/" + str(child_node))
stats3 = node3_zk.exists("/test_between_servers/" + str(child_node))
assert_eq_stats(stats1, stats2)
assert_eq_stats(stats2, stats3)
finally:
try:
for zk_conn in [node1_zk, node2_zk, node3_zk]:
zk_conn.stop()
zk_conn.close()
except:
pass
def test_server_restart(started_cluster):
try:
wait_nodes()
node1_zk = get_fake_zk("node1")
node1_zk.create("/test_server_restart")
for child_node in range(1000):
node1_zk.create("/test_server_restart/" + str(child_node))
for child_node in range(1000):
node1_zk.set("/test_server_restart/" + str(child_node), b"somevalue")
node3.restart_clickhouse(kill=True)
node2_zk = get_fake_zk("node2")
node3_zk = get_fake_zk("node3")
for child_node in range(1000):
stats1 = node1_zk.exists("/test_server_restart/" + str(child_node))
stats2 = node2_zk.exists("/test_server_restart/" + str(child_node))
stats3 = node3_zk.exists("/test_server_restart/" + str(child_node))
assert_eq_stats(stats1, stats2)
assert_eq_stats(stats2, stats3)
finally:
try:
for zk_conn in [node1_zk, node2_zk, node3_zk]:
zk_conn.stop()
zk_conn.close()
except:
pass

View File

@ -1,5 +0,0 @@
<clickhouse>
<merge_tree>
<assign_part_uuids>1</assign_part_uuids>
</merge_tree>
</clickhouse>

View File

@ -1,7 +0,0 @@
<clickhouse>
<profiles>
<default>
<experimental_query_deduplication_send_all_part_uuids>1</experimental_query_deduplication_send_all_part_uuids>
</default>
</profiles>
</clickhouse>

View File

@ -1,24 +0,0 @@
<clickhouse>
<remote_servers>
<test_cluster>
<shard>
<replica>
<host>node1</host>
<port>9000</port>
</replica>
</shard>
<shard>
<replica>
<host>node2</host>
<port>9000</port>
</replica>
</shard>
<shard>
<replica>
<host>node3</host>
<port>9000</port>
</replica>
</shard>
</test_cluster>
</remote_servers>
</clickhouse>

View File

@ -1,168 +0,0 @@
import uuid
import pytest
from helpers.cluster import ClickHouseCluster
from helpers.test_tools import TSV
DUPLICATED_UUID = uuid.uuid4()
cluster = ClickHouseCluster(__file__)
node1 = cluster.add_instance(
'node1',
main_configs=['configs/remote_servers.xml', 'configs/deduplication_settings.xml'],
user_configs=['configs/profiles.xml'])
node2 = cluster.add_instance(
'node2',
main_configs=['configs/remote_servers.xml', 'configs/deduplication_settings.xml'],
user_configs=['configs/profiles.xml'])
node3 = cluster.add_instance(
'node3',
main_configs=['configs/remote_servers.xml', 'configs/deduplication_settings.xml'],
user_configs=['configs/profiles.xml'])
@pytest.fixture(scope="module")
def started_cluster():
try:
cluster.start()
yield cluster
finally:
cluster.shutdown()
def prepare_node(node, parts_uuid=None):
node.query("""
CREATE TABLE t(_prefix UInt8 DEFAULT 0, key UInt64, value UInt64)
ENGINE MergeTree()
ORDER BY tuple()
PARTITION BY _prefix
SETTINGS index_granularity = 1
""")
node.query("""
CREATE TABLE d AS t ENGINE=Distributed(test_cluster, default, t)
""")
# Stop merges while populating test data
node.query("SYSTEM STOP MERGES")
# Create 5 parts
for i in range(1, 6):
node.query("INSERT INTO t VALUES ({}, {}, {})".format(i, i, i))
node.query("DETACH TABLE t")
if parts_uuid:
for part, part_uuid in parts_uuid:
script = """
echo -n '{}' > /var/lib/clickhouse/data/default/t/{}/uuid.txt
""".format(part_uuid, part)
node.exec_in_container(["bash", "-c", script])
# Attach table back
node.query("ATTACH TABLE t")
# NOTE:
# due to the absence of the ability to lock parts, we need to operate on parts with merges prevented
# node.query("SYSTEM START MERGES")
# node.query("OPTIMIZE TABLE t FINAL")
print(node.name)
print(node.query("SELECT name, uuid, partition FROM system.parts WHERE table = 't' AND active ORDER BY name"))
assert '5' == node.query("SELECT count() FROM system.parts WHERE table = 't' AND active").strip()
if parts_uuid:
for part, part_uuid in parts_uuid:
assert '1' == node.query(
"SELECT count() FROM system.parts WHERE table = 't' AND uuid = '{}' AND active".format(
part_uuid)).strip()
@pytest.fixture(scope="module")
def prepared_cluster(started_cluster):
print("duplicated UUID: {}".format(DUPLICATED_UUID))
prepare_node(node1, parts_uuid=[("3_3_3_0", DUPLICATED_UUID)])
prepare_node(node2, parts_uuid=[("3_3_3_0", DUPLICATED_UUID)])
prepare_node(node3)
def test_virtual_column(prepared_cluster):
# Part containing `key=3` has the same fingerprint on both nodes,
# we expect it to be included only once in the end result.
# The select query uses the virtual column _part_uuid to filter out the duplicated part in one shard.
expected = """
1 2
2 2
3 1
4 2
5 2
"""
assert TSV(expected) == TSV(node1.query("""
SELECT
key,
count() AS c
FROM d
WHERE ((_shard_num = 1) AND (_part_uuid != '{}')) OR (_shard_num = 2)
GROUP BY key
ORDER BY
key ASC
""".format(DUPLICATED_UUID)))
def test_with_deduplication(prepared_cluster):
# Part containing `key=3` has the same fingerprint on both nodes,
# we expect it to be included only once in the end result
expected = """
1 3
2 3
3 2
4 3
5 3
"""
assert TSV(expected) == TSV(node1.query(
"SET allow_experimental_query_deduplication=1; SELECT key, count() c FROM d GROUP BY key ORDER BY key"))
def test_no_merge_with_deduplication(prepared_cluster):
# Part containing `key=3` has the same fingerprint on both nodes,
# we expect it to be included only once in the end result.
# even with distributed_group_by_no_merge=1 the duplicated part should be excluded from the final result
expected = """
1 1
2 1
3 1
4 1
5 1
1 1
2 1
3 1
4 1
5 1
1 1
2 1
4 1
5 1
"""
assert TSV(expected) == TSV(node1.query("SELECT key, count() c FROM d GROUP BY key ORDER BY key", settings={
"allow_experimental_query_deduplication": 1,
"distributed_group_by_no_merge": 1,
}))
def test_without_deduplication(prepared_cluster):
# Part containing `key=3` has the same fingerprint on both nodes,
# but allow_experimental_query_deduplication is disabled,
# so it will not be excluded
expected = """
1 3
2 3
3 3
4 3
5 3
"""
assert TSV(expected) == TSV(node1.query(
"SET allow_experimental_query_deduplication=0; SELECT key, count() c FROM d GROUP BY key ORDER BY key"))

View File

@ -447,6 +447,16 @@ def test_where_false(started_cluster):
cursor.execute("DROP TABLE test")
def test_datetime64(started_cluster):
cursor = started_cluster.postgres_conn.cursor()
cursor.execute("drop table if exists test")
cursor.execute("create table test (ts timestamp)")
cursor.execute("insert into test select '1960-01-01 20:00:00';")
result = node1.query("select * from postgresql(postgres1, table='test')")
assert(result.strip() == '1960-01-01 20:00:00.000000')
if __name__ == '__main__':
cluster.start()
input("Cluster created, press any key to destroy...")

View File

@ -1,14 +1,29 @@
1
1
1
1
1
1
1
1
1
1
1
1
=== Backward compatibility test
1
=== Cannot resolve host
1
1
=== Bad arguments
1
1
=== Not alive host
1
1
1
1
1
=== Code 210 with ipv6
1
1
1
1
1
1
=== Values from config
1
1
===
1
1
1

View File

@ -7,55 +7,75 @@ CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
# default values test
${CLICKHOUSE_CLIENT} --query "SELECT 1"
# backward compatibility test
echo '=== Backward compatibility test'
${CLICKHOUSE_CLIENT} --host "${CLICKHOUSE_HOST}" --port "${CLICKHOUSE_PORT_TCP}" --query "SELECT 1";
echo '=== Cannot resolve host'
not_resolvable_host="notlocalhost"
exception_msg="Cannot resolve host (${not_resolvable_host}), error 0: ${not_resolvable_host}.
Code: 198. DB::Exception: Not found address of host: ${not_resolvable_host}. (DNS_ERROR)
"
error="$(${CLICKHOUSE_CLIENT} --host "${CLICKHOUSE_HOST}" --host "${not_resolvable_host}" --query "SELECT 1" 2>&1 > /dev/null)";
[ "${error}" == "${exception_msg}" ]; echo "$?"
error="$(${CLICKHOUSE_CLIENT} --host "${not_resolvable_host}" --query "SELECT 1" 2>&1 > /dev/null)";
echo "${error}" | grep -Fc "DNS_ERROR"
echo "${error}" | grep -Fq "${not_resolvable_host}" && echo 1 || echo 0
echo '=== Bad arguments'
not_number_port="abc"
exception_msg="Bad arguments: the argument ('${CLICKHOUSE_HOST}:${not_number_port}') for option '--host' is invalid."
error="$(${CLICKHOUSE_CLIENT} --host "${CLICKHOUSE_HOST}" --port "${not_number_port}" --query "SELECT 1" 2>&1 > /dev/null)";
[ "${error}" == "${exception_msg}" ]; echo "$?"
error="$(${CLICKHOUSE_CLIENT} --host "${CLICKHOUSE_HOST}" --port "${not_number_port}" --query "SELECT 1" 2>&1 > /dev/null)";
echo "${error}" | grep -Fc "Bad arguments"
echo "${error}" | grep -Fc "${not_number_port}"
echo '=== Not alive host'
not_alive_host="10.100.0.0"
${CLICKHOUSE_CLIENT} --host "${not_alive_host}" --host "${CLICKHOUSE_HOST}" --query "SELECT 1";
not_alive_port="1"
exception_msg="Code: 210. DB::NetException: Connection refused (${CLICKHOUSE_HOST}:${not_alive_port}). (NETWORK_ERROR)
"
error="$(${CLICKHOUSE_CLIENT} --host "${CLICKHOUSE_HOST}" --port "${not_alive_port}" --query "SELECT 1" 2>&1 > /dev/null)"
[ "${error}" == "${exception_msg}" ]; echo "$?"
echo "${error}" | grep -Fc "Code: 210"
echo "${error}" | grep -Fc "${CLICKHOUSE_HOST}:${not_alive_port}"
${CLICKHOUSE_CLIENT} --host "${CLICKHOUSE_HOST}" --port "${not_alive_port}" --host "${CLICKHOUSE_HOST}" --query "SELECT 1";
${CLICKHOUSE_CLIENT} --host "${CLICKHOUSE_HOST}" --port "${CLICKHOUSE_PORT_TCP}" --port "${not_alive_port}" --query "SELECT 1";
echo '=== Code 210 with ipv6'
ipv6_host_without_brackets="2001:3984:3989::1:1000"
exception_msg="Code: 210. DB::NetException: Connection refused (${ipv6_host_without_brackets}). (NETWORK_ERROR)
"
error="$(${CLICKHOUSE_CLIENT} --host "${ipv6_host_without_brackets}" --query "SELECT 1" 2>&1 > /dev/null)"
[ "${error}" == "${exception_msg}" ]; echo "$?"
echo "${error}" | grep -Fc "Code: 210"
echo "${error}" | grep -Fc "${ipv6_host_without_brackets}"
ipv6_host_with_brackets="[2001:3984:3989::1:1000]"
exception_msg="Code: 210. DB::NetException: Connection refused (${ipv6_host_with_brackets}). (NETWORK_ERROR)
"
error="$(${CLICKHOUSE_CLIENT} --host "${ipv6_host_with_brackets}" --query "SELECT 1" 2>&1 > /dev/null)"
[ "${error}" == "${exception_msg}" ]; echo "$?"
echo "${error}" | grep -Fc "Code: 210"
echo "${error}" | grep -Fc "${ipv6_host_with_brackets}"
exception_msg="Code: 210. DB::NetException: Connection refused (${ipv6_host_with_brackets}:${not_alive_port}). (NETWORK_ERROR)
"
error="$(${CLICKHOUSE_CLIENT} --host "${ipv6_host_with_brackets}" --port "${not_alive_port}" --query "SELECT 1" 2>&1 > /dev/null)"
[ "${error}" == "${exception_msg}" ]; echo "$?"
echo "${error}" | grep -Fc "Code: 210"
echo "${error}" | grep -Fc "${ipv6_host_with_brackets}:${not_alive_port}"
echo '=== Values from config'
${CLICKHOUSE_CLIENT} --query "SELECT 1";
${CLICKHOUSE_CLIENT} --port "${CLICKHOUSE_PORT_TCP}" --query "SELECT 1";
${CLICKHOUSE_CLIENT} --host "${CLICKHOUSE_HOST}" --query "SELECT 1";
${CLICKHOUSE_CLIENT} --port "${CLICKHOUSE_PORT_TCP}" --host "${CLICKHOUSE_HOST}" --query "SELECT 1";
${CLICKHOUSE_CLIENT} --port "${CLICKHOUSE_PORT_TCP}" --host "${CLICKHOUSE_HOST}" --host "${not_alive_host}" --port "${CLICKHOUSE_PORT_TCP}" --query "SELECT 1";
${CLICKHOUSE_CLIENT} --port "${CLICKHOUSE_PORT_TCP}" --host "${not_alive_host}" --host "${CLICKHOUSE_HOST}" --query "SELECT 1" 2> /dev/null;
${CLICKHOUSE_CLIENT} --port "${CLICKHOUSE_PORT_TCP}" --port "${CLICKHOUSE_PORT_TCP}" --port "${CLICKHOUSE_PORT_TCP}" --host "${not_alive_host}" --host "${CLICKHOUSE_HOST}" --query "SELECT 1";
CUSTOM_CONFIG="$CURDIR/02100_config.xml"
rm -f ${CUSTOM_CONFIG}
cat << EOF > ${CUSTOM_CONFIG}
<config>
<host>${not_alive_host}</host>
<port>${not_alive_port}</port>
</config>
EOF
error="$(${CLICKHOUSE_CLIENT} --config ${CUSTOM_CONFIG} --query "SELECT 1" 2>&1 > /dev/null)"
echo "${error}" | grep -Fc "DB::NetException"
echo "${error}" | grep -Fc "${not_alive_host}:${not_alive_port}"
rm -f ${CUSTOM_CONFIG}
echo '==='
${CLICKHOUSE_CLIENT} --query "SELECT 1";
${CLICKHOUSE_CLIENT} --port "${CLICKHOUSE_PORT_TCP}" --query "SELECT 1";
${CLICKHOUSE_CLIENT} --host "${CLICKHOUSE_HOST}" --query "SELECT 1";
${CLICKHOUSE_CLIENT} --port "${CLICKHOUSE_PORT_TCP}" --host "${CLICKHOUSE_HOST}" --query "SELECT 1";
${CLICKHOUSE_CLIENT} --port "${CLICKHOUSE_PORT_TCP}" --host "${CLICKHOUSE_HOST}" --host "${not_alive_host}" --port "${CLICKHOUSE_PORT_TCP}" --query "SELECT 1";
${CLICKHOUSE_CLIENT} --port "${CLICKHOUSE_PORT_TCP}" --host "${not_alive_host}" --host "${CLICKHOUSE_HOST}" --query "SELECT 1" 2> /dev/null;
${CLICKHOUSE_CLIENT} --port "${CLICKHOUSE_PORT_TCP}" --port "${CLICKHOUSE_PORT_TCP}" --port "${CLICKHOUSE_PORT_TCP}" --host "${not_alive_host}" --host "${CLICKHOUSE_HOST}" --query "SELECT 1";

View File

@ -0,0 +1 @@
2001 2

View File

@ -0,0 +1,13 @@
DROP TABLE IF EXISTS calendar;
DROP TABLE IF EXISTS events32;
CREATE TABLE calendar ( `year` Int64, `month` Int64 ) ENGINE = TinyLog;
INSERT INTO calendar VALUES (2000, 1), (2001, 2), (2000, 3);
CREATE TABLE events32 ( `year` Int32, `month` Int32 ) ENGINE = TinyLog;
INSERT INTO events32 VALUES (2001, 2), (2001, 3);
SELECT * FROM calendar WHERE (year, month) IN ( SELECT (year, month) FROM events32 );
DROP TABLE IF EXISTS calendar;
DROP TABLE IF EXISTS events32;

View File

@ -1,4 +1,6 @@
v22.2.3.5-stable 2022-02-25
v22.2.2.1-stable 2022-02-17
v22.1.4.30-stable 2022-02-25
v22.1.3.7-stable 2022-01-23
v22.1.2.2-stable 2022-01-19
v21.12.4.1-stable 2022-01-23


View File

@ -0,0 +1,90 @@
---
title: 'ClickHouse 22.2 Released'
image: 'https://blog-images.clickhouse.com/en/2022/clickhouse-v22-2/featured.jpg'
date: '2022-02-23'
author: 'Alexey Milovidov'
tags: ['company', 'community']
---
We have prepared the new ClickHouse release 22.2, so it would have been nice if you had tried it on 2022-02-22. If not, you can try it today. This latest release includes 2,140 new commits from 118 contributors, including 41 new contributors:
> Aaron Katz, Andre Marianiello, Andrew, Andrii Buriachevskyi, Brian Hunter, CoolT2, Federico Rodriguez, Filippov Denis, Gaurav Kumar, Geoff Genz, HarryLeeIBM, Heena Bansal, ILya Limarenko, Igor Nikonov, IlyaTsoi, Jake Liu, JaySon-Huang, Lemore, Leonid Krylov, Michail Safronov, Mikhail Fursov, Nikita, RogerYK, Roy Bellingan, Saad Ur Rahman, W, Yakov Olkhovskiy, alexeypavlenko, cnmade, grantovsky, hanqf-git, liuneng1994, mlkui, s-kat, tesw yew isal, vahid-sohrabloo, yakov-olkhovskiy, zhifeng, zkun, zxealous, 박동철.
Let me tell you what is most interesting in 22.2...
## Projections are production ready
Projections allow you to have multiple data representations in the same table. For example, you can have data aggregations along with the raw data. There are no restrictions on which aggregate functions can be used - you can have count distinct, quantiles, or whatever you want. You can have data in multiple different sorting orders. ClickHouse will automatically select the most suitable projection for your query, so the query will be automatically optimized.
Projections are somewhat similar to Materialized Views, which also allow you to have incremental aggregation and multiple sorting orders. But unlike Materialized Views, projections are updated atomically and consistently with the main table. The data for projections is stored in the same "data parts" of the table and is merged in the same way as the main data.
The feature was developed by **Amos Bird**, a prominent ClickHouse contributor. The [prototype](https://github.com/ClickHouse/ClickHouse/pull/20202) has been available since Feb 2021; it was merged into the main codebase by **Nikolai Kochetov** in May 2021 under an experimental flag, and after 21 follow-up pull requests we ensured that it passes the full set of test suites and enabled it by default.
Read an example of how to optimize queries with projections [in our docs](https://clickhouse.com/docs/en/getting-started/example-datasets/uk-price-paid/#speedup-with-projections).
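As a rough illustration of the syntax (the table, column, and projection names below are made up, not taken from a real dataset):
```
-- keep a pre-aggregated representation of the data alongside the raw rows
ALTER TABLE hits ADD PROJECTION daily_totals
(
    SELECT toDate(event_time) AS day, count()
    GROUP BY day
);

-- build the projection for the data parts that already exist
ALTER TABLE hits MATERIALIZE PROJECTION daily_totals;

-- queries that aggregate by day can now be answered from the projection automatically
SELECT toDate(event_time) AS day, count() FROM hits GROUP BY day;
```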
## Control of file creation and rewriting on data export
When you export your data with an `INSERT INTO TABLE FUNCTION` statement into `file`, `s3` or `hdfs` and the target file already exists, you can now control how to deal with it: you can append new data into the file if it is possible, rewrite it with new data, or create another file with a similar name like 'data.1.parquet.gz'.
Some storage systems like `s3` and some formats like `Parquet` don't support data appending. In previous ClickHouse versions, if you insert multiple times into a file with Parquet data format, you will end up with a file that is not recognized by other systems. Now you can choose between throwing exceptions on subsequent inserts or creating more files.
So, new settings were introduced: `s3_truncate_on_insert`, `s3_create_new_file_on_insert`, `hdfs_truncate_on_insert`, `hdfs_create_new_file_on_insert`, `engine_file_allow_create_multiple_files`.
This feature [was developed](https://github.com/ClickHouse/ClickHouse/pull/33302) by **Pavel Kruglov**.
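As a hedged sketch of how one of these settings is used (the file name, schema, and data are placeholders):
```
-- running this INSERT twice with the setting enabled produces data.parquet
-- and then data.1.parquet instead of failing or corrupting the first file
INSERT INTO TABLE FUNCTION file('data.parquet', 'Parquet', 'key UInt64, value String')
SELECT number AS key, toString(number) AS value FROM numbers(10)
SETTINGS engine_file_allow_create_multiple_files = 1;
```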
## Custom deduplication token
`ReplicatedMergeTree` and `MergeTree` types of tables implement block-level deduplication. When a block of data is inserted, its cryptographic hash is calculated and if the same block was already inserted before, then the duplicate is skipped and the insert query succeeds. This makes it possible to implement exactly-once semantics for inserts.
In ClickHouse version 22.2 you can provide your own deduplication token instead of an automatically calculated hash. This makes sense if you already have batch identifiers from some other system and you want to reuse them. It also makes sense when blocks can be identical but they should actually be inserted multiple times. Or the opposite - when blocks contain some random data and you want to deduplicate only by significant columns.
This is implemented by adding the setting `insert_deduplication_token`. The feature was contributed by **Igor Nikonov**.
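A minimal sketch, assuming `events` is a (Replicated)MergeTree table with insert deduplication enabled; the token value is just a placeholder:
```
-- both inserts carry the same token, so the second one is treated as a duplicate
-- and skipped even though its payload differs from the first
INSERT INTO events SETTINGS insert_deduplication_token = 'batch-2022-02-20-001' VALUES (1, 'first attempt');
INSERT INTO events SETTINGS insert_deduplication_token = 'batch-2022-02-20-001' VALUES (2, 'retry of the same batch');
```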
## DEFAULT keyword for INSERT
A small addition for SQL compatibility - now we allow using the `DEFAULT` keyword instead of a value in an `INSERT INTO ... VALUES` statement. It looks like this:
`INSERT INTO test VALUES (1, 'Hello', DEFAULT)`
Thanks to **Andrii Buriachevskyi** for this feature.
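To make the example self-contained, here is a small sketch (the table definition is illustrative; any table with a `DEFAULT` column behaves the same way):
```
CREATE TABLE test (id UInt64, greeting String, created DateTime DEFAULT now())
ENGINE = MergeTree ORDER BY id;

-- DEFAULT in the VALUES list tells the server to fill the column from its DEFAULT expression
INSERT INTO test VALUES (1, 'Hello', DEFAULT);
```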
## EPHEMERAL columns
A column in a table can have a `DEFAULT` expression like `c INT DEFAULT a + b`. In ClickHouse you can also use `MATERIALIZED` instead of `DEFAULT` if you want the column to be always calculated with the provided expression instead of allowing a user to insert data. And you can use `ALIAS` if you don't want the column to be stored at all but instead to be calculated on the fly if referenced.
Since version 22.2, a new column type is available: the `EPHEMERAL` column. The user can insert data into this column, but the column is not stored in the table - it is ephemeral. The purpose of this column is to provide data for calculating other columns that reference it in their `DEFAULT` or `MATERIALIZED` expressions.
This feature was made by **Yakov Olkhovskiy**.
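A minimal sketch of how an ephemeral column might be used (the names are made up, and the exact `EPHEMERAL` syntax may vary slightly between versions):
```
CREATE TABLE visits
(
    raw_url String EPHEMERAL '',
    domain String DEFAULT domain(raw_url),
    path String DEFAULT path(raw_url)
)
ENGINE = MergeTree ORDER BY domain;

-- raw_url is accepted on INSERT but never stored; only domain and path end up on disk
INSERT INTO visits (raw_url) VALUES ('https://clickhouse.com/docs/en/');
```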
## Improvements for multi-disk configuration
You can configure multiple disks to store ClickHouse data instead of managing RAID, and ClickHouse will automatically manage the data placement.
Since version 22.2 ClickHouse can automatically repair broken disks without server restart by downloading the missing parts from replicas and placing them on the healthy disks.
This feature was implemented by **Amos Bird** and is already being used for more than 1.5 years in production at Kuaishou.
Another improvement is the option to specify TTL MOVE TO DISK/VOLUME **IF EXISTS**. It allows replicas with non-uniform disk configurations: one replica can move old data to cold storage while another replica keeps all the data on hot storage. Data will be moved only on replicas that have the specified disk or volume, hence *if exists*. This was developed by **Anton Popov**.
## Flexible memory limits
We split per-query and per-user memory limits into a pair of hard and soft limits. The settings `max_memory_usage` and `max_memory_usage_for_user` act as hard limits: when memory consumption approaches the hard limit, an exception is thrown. Two other settings, `max_guaranteed_memory_usage` and `max_guaranteed_memory_usage_for_user`, act as soft limits.
A query is allowed to use more memory than its soft limit if memory is available. But if there is a memory shortage (relative to the per-user hard limit or the total per-server memory consumption), we calculate the "overcommit ratio" - how much more memory each query is consuming relative to its soft limit - and kill the most overcommitted query to let the other queries run.
In short, your query will not be limited to a few gigabytes of RAM if you have hundreds of gigabytes available.
This experimental feature was implemented by **Dmitry Novik** and is continuing to be developed.
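For illustration, this is how the two kinds of limits can be combined in a session (the values are arbitrary placeholders):
```
-- hard limit: exceeding this amount throws an exception
SET max_memory_usage = 20000000000;
-- soft limit: above this amount the query becomes a candidate for the
-- "overcommit killer" when the server runs short of memory
SET max_guaranteed_memory_usage = 5000000000;
```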
## Shell-style comments in SQL
Now we allow comments starting with `# ` or `#!`, similar to MySQL. The variant with `#!` allows writing shell scripts with a "shebang" line interpreted by `clickhouse-local`.
This feature was contributed by **Aaron Katz**. Very nice.
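A tiny illustration of the new comment syntax:
```
# this whole line is a comment, MySQL style
SELECT 1;
SELECT 2; -- the usual SQL comment syntax still works as before
```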
## And many more...
Maxim Kita, Danila Kutenin, Anton Popov, zhanglistar, Federico Rodriguez, Raúl Marín, Amos Bird and Alexey Milovidov have contributed a ton of performance optimizations for this release. We are obsessed with high performance, as usual. :)
Read the [full changelog](https://github.com/ClickHouse/ClickHouse/blob/master/CHANGELOG.md) for the 22.2 release and follow [the roadmap](https://github.com/ClickHouse/ClickHouse/issues/32513).

View File

@ -0,0 +1,75 @@
---
title: 'Opensee: Analyzing Terabytes of Financial Data a Day With ClickHouse'
image: 'https://blog-images.clickhouse.com/en/2022/opensee/featured.png'
date: '2022-02-22'
author: 'Christophe Rivoire, Elena Bessis'
tags: ['company', 'community']
---
We’d like to welcome Christophe Rivoire (UK Country Manager) and Elena Bessis (Product Marketing Assistant) from Opensee as guests to our blog. Today, they’re telling us how their product, powered by ClickHouse, allows financial institutions’ business users to directly harness 100% of their vast quantities of data instantly and on demand, with no size limitations.
Opensee is a financial technology company providing real time self-service analytics solutions to financial institutions, which help them turn their big data challenges into a competitive advantage — unlocking vital opportunities led by business users. Opensee, formerly ICA, was started by a team of financial industry and technology experts frustrated that no simple big data analytics solution enabled them to dive deeper into all their data easily and efficiently, or perform what-if analysis on the hundreds of terabytes of data they were handling.
So they built their own.
## ClickHouse For Trillions Of Financial Data Points
Financial institutions have always been storing a lot of data (customer data, risk data, transaction data...) for their own decision processes and for regulatory reasons. Since the financial crisis, regulators all around the world have been significantly increasing the reporting requirements, insisting on longer historical ranges and deeper granularity. This combination has generated an exponential amount of data, which has forced financial institutions to review and upgrade their infrastructure. Opensee offers a solution to navigate all these very large data cubes, based on millions, billions or even trillions of data points. In order to build it, a data storage system capable of scaling horizontally with data and with fast OLAP query response time was required. In 2016, after thorough evaluation, Opensee concluded ClickHouse was the obvious solution.
There are many use cases that involve storing and leveraging massive amounts of data on a daily basis, but Opensee built on the strength of its own expertise in one of them: evaluating risk linked to activities in the financial markets. There are various types of risk (market risk, credit risk, liquidity risk…) and all of them need to aggregate a lot of data in order to calculate linear or non-linear indicators, both business and regulatory, and analyze all those numbers on the fly.
!["Dashboard in Opensee for a Market Risk use case"](https://blog-images.clickhouse.com/en/2022/opensee/dashboard.png)
_Dashboard in Opensee for a Market Risk use case_
## ClickHouse for Scalability, Granularity, Speed and Cost Control
Financial institutions have sometimes believed that their ability to craft efficient storage solutions like data lakes for their vast amounts of data, typically built on a Hadoop stack, would make real-time analytics available. Unfortunately, many of these systems are too slow for at-scale analytics.
Running a query on a Hadoop data lake is just not an option for users with real-time needs! Banks experimented with different types of analytical layers between the data lakes and the users, in order to allow access to their stored data and to run analytics, but ran into new challenges: in-memory computing solutions have a lack of scalability and high hardware costs. Others tried query accelerators but were forced to analyze only prepared data (pre-aggregated or specifically indexed data), losing the granularity which is always required to understand things like daily changes. More recently, financial institutions have been contemplating cloud database management systems, but for very large datasets and calculations the speed of these services is far from what ClickHouse can achieve for their specific use cases.
Ultimately, none of these technologies could simultaneously combine scalability, granularity, speed and cost control, forcing financial institutions into a series of compromises. With Opensee, there is no need to compromise: the platform leverages ClickHouse's capacity to handle the huge volume that data lakes require and the fast response that in-memory databases can give, without the need to pre-aggregate the data.
!["Dashboard in Opensee for a Market Risk use case"](https://blog-images.clickhouse.com/en/2022/opensee/pivot-table.png)
_Pivot table from the Opensee UI on a liquidity use case_
## Opensee Architecture
Opensee provides a series of APIs which allow users to fully abstract away all the complexity, in particular the physical data model. These APIs are typically used for data ingestion, data query, model management, etc. Thanks to Opensee’s low-code API, users don’t need to access data through complex quasi-SQL queries, but rather through simple business queries that are optimized by Opensee to deliver performance. Opensee’s back end, which provides indirect access to ClickHouse, is written in Scala, while PostgreSQL contains all the configuration and context data that must be managed transactionally. Opensee also provides various options for front ends (dedicated Opensee web or rich user interface, Excel, others…) to interact with the data, navigate through the cube and leverage functionality like data versioning — built for the financial institutions’ use.
!["Dashboard in Opensee for a Market Risk use case"](https://blog-images.clickhouse.com/en/2022/opensee/architecture-chart.png)
_Opensee architecture chart_
## Advantages of ClickHouse
For Opensee, the most valuable feature is horizontal scalability, the capability to shard the data. Next comes the very fast dictionary lookup, rapid calculations with vectorization and the capability to manage array values. In the financial industry, where time series or historical data is everywhere, this capacity to calculate vectors and manage array values is critical.
On top of being a solution that is extremely fast and efficient, other advantages include:
- distributed and replicated, with high availability and a performant map/reduce system
- wide range of features fit for analytics
- really good and extensive format support (CSV, JSON, Parquet, ORC, Protobuf, ...)
- very rapid evolution driven by the contributions of a wide community to a very popular open source technology
On top of these native ClickHouse strengths and functionalities, Opensee has developed a lot of other functionalities dedicated to financial institutions. To name only a few, a data versioning mechanism has been created, allowing business users to either correct inaccurate data on the fly or simulate new values. This What If simulation feature can be used to add, amend or delete transactions, with full auditability and traceability, without deleting any data.
Another key feature is a Python processor which is available to define more complex calculations. Furthermore, the abstraction model layer has been built to remove the complexity of the physical data model for the users and optimize the queries. And, last but not least, in terms of visualization, a UI dedicated to financial institutions has been developed with and for its users.
## Dividing Hardware Costs By 10+
Cost efficiency is a key improvement for large financial institutions that typically use in-memory computing technology. Dividing the hardware cost by ten (and sometimes more) is no small achievement! Being able to work with very large datasets on standard servers, on premises or in the cloud, is a major step forward. With Opensee powered by ClickHouse, financial institutions are able to alleviate critical limitations of their existing solutions, avoiding legacy compromises and a lack of flexibility. Finally, these organizations are able to give their users a turn-key solution for analyzing all their data sets, which used to be siloed, in one single place, with one single data model and one single infrastructure - and all of that in real time, combining very granular data and very long historical ranges.
## About Opensee
Opensee empowers financial data divers to analyze deeper and faster. Headquartered in Paris, with offices in London and New York, Opensee is working with a trusted client base across global Tier 1 banks, asset managers, hedge funds and trading platforms.
For more information please visit [www.opensee.io](http://www.opensee.io) or follow them on [LinkedIn](https://www.linkedin.com/company/opensee-company) and [Twitter](https://twitter.com/opensee_io).

View File

@ -1,6 +1,6 @@
<div class="banner bg-light">
<div class="container">
<p class="text-center text-dark mb-0 mx-auto pt-1 pb-1">{{ _('ClickHouse v22.2 is coming soon! Add the Feb 17 Release Webinar to your calendar') }} <a
href="https://www.google.com/calendar/render?action=TEMPLATE&text=ClickHouse+v22.2+Release+Webinar&details=Join+from+a+PC%2C+Mac%2C+iPad%2C+iPhone+or+Android+device%3A%0A%C2%A0+%C2%A0+Please+click+this+URL+to+join.%C2%A0%0Ahttps%3A%2F%2Fzoom.us%2Fj%2F92785669470%3Fpwd%3DMkpCMU9KSmpNTGp6WmZmK2JqV0NwQT09%0A%0A%C2%A0+%C2%A0+Passcode%3A+139285%0A%0A%C2%A0Description%3A+Connect+with+ClickHouse+experts+and+test+out+the+newest+features+and+performance+gains+in+the+v22.2+release.%0A%0AOr+One+tap+mobile%3A%0A%C2%A0+%C2%A0+%2B12532158782%2C%2C92785669470%23%2C%2C%2C%2C%2A139285%23+US+%28Tacoma%29%0A%C2%A0+%C2%A0+%2B13462487799%2C%2C92785669470%23%2C%2C%2C%2C%2A139285%23+US+%28Houston%29%0A%0AOr+join+by+phone%3A%0A%C2%A0+%C2%A0+Dial%28for+higher+quality%2C+dial+a+number+based+on+your+current+location%29%3A%0A%C2%A0+%C2%A0+%C2%A0+%C2%A0+US%3A+%2B1+253+215+8782+or+%2B1+346+248+7799+or+%2B1+669+900+9128+or+%2B1+301+715+8592+or+%2B1+312+626+6799+or+%2B1+646+558+8656%C2%A0%0A%C2%A0+%C2%A0%C2%A0%0A%C2%A0+%C2%A0+Webinar+ID%3A+927+8566+9470%0A%C2%A0+%C2%A0+Passcode%3A+139285%0A%C2%A0+%C2%A0+International+numbers+available%3A+https%3A%2F%2Fzoom.us%2Fu%2FalqvP0je9&location=https%3A%2F%2Fzoom.us%2Fj%2F92785669470%3Fpwd%3DMkpCMU9KSmpNTGp6WmZmK2JqV0NwQT09&dates=20220217T170000Z%2F20220217T180000Z" target="_blank">here</a></p>
<p class="text-center text-dark mb-0 mx-auto pt-1 pb-1">{{ _('ClickHouse v22.3 is coming soon! Add the Mar 17 Release Webinar to your calendar') }} <a
href="http://www.google.com/calendar/event?action=TEMPLATE&dates=20220317T160000Z/20220317T170000Z&text=ClickHouse+v22.3+Release+Webinar&location=https%3A%2F%2Fzoom.us%2Fj%2F91955953263%3Fpwd%3DSXBKWW5ETkNMc1dmVWUxTUJKNm5hUT09&details=Please+click+the+link+below+to+join+the+webinar%3A%0D%0Ahttps%3A%2F%2Fzoom.us%2Fj%2F91955953263%3Fpwd%3DSXBKWW5ETkNMc1dmVWUxTUJKNm5hUT09%0D%0A%0D%0APasscode%3A+139285%0D%0A%0D%0AOr+One+tap+mobile+%3A+%0D%0A++++US%3A+%2B12532158782%2C%2C91955953263%23%2C%2C%2C%2C%2A139285%23++or+%2B13462487799%2C%2C91955953263%23%2C%2C%2C%2C%2A139285%23+%0D%0A%0D%0AOr+Telephone%3A%0D%0A++++Dial%28for+higher+quality%2C+dial+a+number+based+on+your+current+location%29%3A%0D%0A++++++++US%3A+%2B1+253+215+8782++or+%2B1+346+248+7799++or+%2B1+669+900+9128++or+%2B1+301+715+8592++or+%2B1+312+626+6799++or+%2B1+646+558+8656+%0D%0A%0D%0AWebinar+ID%3A+919+5595+3263%0D%0APasscode%3A+139285%0D%0A++++International+numbers+available%3A+https%3A%2F%2Fzoom.us%2Fu%2FasrDyM28Q" target="_blank">here</a></p>
</div>
</div>

View File

@ -3,7 +3,7 @@
<div class="container pt-5 pt-lg-7 pt-xl-15 pb-5 pb-lg-7">
<h1 class="display-1 mb-2 mb-xl-3 mx-auto text-center">
ClickHouse <span class="text-orange">v22.1 Released</span>
ClickHouse <span class="text-orange">v22.2 Released</span>
</h1>
<p class="lead mb-3 mb-lg-5 mb-xl-7 mx-auto text-muted text-center" style="max-width:780px;">
@ -11,7 +11,7 @@
</p>
<p class="d-flex justify-content-center mb-0">
<a href="https://www.youtube.com/watch?v=gP7I2SUBXig&ab_channel=ClickHouse" target="_blank" class="btn btn-primary trailing-link">Watch the Release Webinar on YouTube</a>
<a href="https://www.youtube.com/watch?v=6EG1gwhSTPg" target="_blank" class="btn btn-primary trailing-link">Watch the Release Webinar on YouTube</a>
</p>
</div>