From 7cd8831eb84b25fc65bf188b744f66f57699efb4 Mon Sep 17 00:00:00 2001
From: Alexey Milovidov
Date: Mon, 11 Nov 2019 21:55:53 +0300
Subject: [PATCH 1/8] Added a test of javaHashUTF16LE for non-BMP code points

---
 .../tests/queries/0_stateless/00800_function_java_hash.reference | 1 +
 dbms/tests/queries/0_stateless/00800_function_java_hash.sql | 1 +
 2 files changed, 2 insertions(+)

diff --git a/dbms/tests/queries/0_stateless/00800_function_java_hash.reference b/dbms/tests/queries/0_stateless/00800_function_java_hash.reference
index 6efefd41459..5e1fde8441f 100644
--- a/dbms/tests/queries/0_stateless/00800_function_java_hash.reference
+++ b/dbms/tests/queries/0_stateless/00800_function_java_hash.reference
@@ -3,5 +3,6 @@
 138768
 -2143570108
 2145564783
+1258255525
 96354
 1470786104
diff --git a/dbms/tests/queries/0_stateless/00800_function_java_hash.sql b/dbms/tests/queries/0_stateless/00800_function_java_hash.sql
index 2010b8d8311..42435ca42e8 100644
--- a/dbms/tests/queries/0_stateless/00800_function_java_hash.sql
+++ b/dbms/tests/queries/0_stateless/00800_function_java_hash.sql
@@ -3,5 +3,6 @@ select javaHash('874293087');
 select javaHashUTF16LE(convertCharset('a1가', 'utf-8', 'utf-16le'));
 select javaHashUTF16LE(convertCharset('가나다라마바사아자차카타파하', 'utf-8', 'utf-16le'));
 select javaHashUTF16LE(convertCharset('FJKLDSJFIOLD_389159837589429', 'utf-8', 'utf-16le'));
+select javaHashUTF16LE(convertCharset('𐐀𐐁𐐂𐐃𐐄', 'utf-8', 'utf-16le'));
 select hiveHash('abc');
 select hiveHash('874293087');

From 841e268c5204f6f3ac4224b7e3389189615d5c68 Mon Sep 17 00:00:00 2001
From: zhukai
Date: Tue, 19 Nov 2019 22:51:20 +0800
Subject: [PATCH 2/8] Translate the TTL part of MergeTree in the document into
 Chinese

---
 docs/zh/operations/table_engines/mergetree.md | 96 ++++++++++++++++++-
 1 file changed, 94 insertions(+), 2 deletions(-)

diff --git a/docs/zh/operations/table_engines/mergetree.md b/docs/zh/operations/table_engines/mergetree.md
index 6d8baea8cf2..9ebf240a1c9 100644
--- a/docs/zh/operations/table_engines/mergetree.md
+++ b/docs/zh/operations/table_engines/mergetree.md
@@ -296,8 +296,100 @@ INDEX sample_index3 (lower(str), str) TYPE ngrambf_v1(3, 256, 2, 0) GRANULARITY
 对表的读操作是自动并行的。
 
+
+## 列和表的TTL {#table_engine-mergetree-ttl}
+
+TTL可以设置值的生命周期,它既可以为整张表设置,也可以为每个列字段单独设置。如果`TTL`同时作用于表和字段,ClickHouse会使用先到期的那个。
+
+被设置TTL的表,必须拥有[Date](../../data_types/date.md) 或 [DateTime](../../data_types/datetime.md) 类型的字段。要定义数据的生命周期,需要在这个日期字段上使用操作符,例如:
+
+```sql
+TTL time_column
+TTL time_column + interval
+```
+
+要定义`interval`, 需要使用 [time interval](../../query_language/operators.md#operators-datetime) 操作符。
+
+```sql
+TTL date_time + INTERVAL 1 MONTH
+TTL date_time + INTERVAL 15 HOUR
+```
+
+**列字段 TTL**
+
+当列字段中的值过期时, ClickHouse会将它们替换成数据类型的默认值。如果分区内,某一列的所有值均已过期,则ClickHouse会从文件系统中删除这个分区目录下的列文件。
+
+`TTL`子句不能被用于主键字段。
+
+示例说明:
+
+创建一张包含 `TTL` 的表
+
+```sql
+CREATE TABLE example_table
+(
+    d DateTime,
+    a Int TTL d + INTERVAL 1 MONTH,
+    b Int TTL d + INTERVAL 1 MONTH,
+    c String
+)
+ENGINE = MergeTree
+PARTITION BY toYYYYMM(d)
+ORDER BY d;
+```
+
+为表中已存在的列字段添加 `TTL`
+
+```sql
+ALTER TABLE example_table
+    MODIFY COLUMN
+    c String TTL d + INTERVAL 1 DAY;
+```
+
+修改列字段的 `TTL`
+
+```sql
+ALTER TABLE example_table
+    MODIFY COLUMN
+    c String TTL d + INTERVAL 1 MONTH;
+```
+
+**表 TTL**
+
+当表内的数据过期时, ClickHouse会删除所有对应的行。
+
+举例说明:
+
+创建一张包含 `TTL` 的表
+
+```sql
+CREATE TABLE example_table
+(
+    d DateTime,
+    a Int
+)
+ENGINE = MergeTree
+PARTITION BY toYYYYMM(d)
+ORDER BY d
+TTL d + INTERVAL 1 MONTH;
+```
+
+修改表的 `TTL`
+
+```sql
+ALTER TABLE example_table
+    MODIFY TTL d + INTERVAL 1 DAY;
+```
+
+**删除数据**
+
+当ClickHouse合并数据分区时, 会删除TTL过期的数据。
+
+当ClickHouse发现数据过期时, 它将会执行一个计划外的合并。要控制这类合并的频率, 你可以设置 [merge_with_ttl_timeout](#mergetree_setting-merge_with_ttl_timeout)。如果该值被设置得太低, 它将导致执行许多的计划外合并,这可能会消耗大量资源。
+
+如果在合并的时候执行`SELECT` 查询, 则可能会得到过期的数据。为了避免这种情况,可以在`SELECT`之前使用 [OPTIMIZE](../../query_language/misc.md#misc_operations-optimize) 查询。
+
+
 ## Using Multiple Block Devices for Data Storage {#table_engine-mergetree-multiple-volumes}
 
-### Configuration {#table_engine-mergetree-multiple-volumes_configure}
-
 [来源文章](https://clickhouse.yandex/docs/en/operations/table_engines/mergetree/)

From 5a007a36d23756f82d46b5bff5300213f2d14ca5 Mon Sep 17 00:00:00 2001
From: zhukai
Date: Wed, 20 Nov 2019 10:20:32 +0800
Subject: [PATCH 3/8] Add the translation of the merge_with_ttl_timeout
 setting. Solve the error "Link to nowhere:
 #mergetree_setting-merge_with_ttl_timeout"

Translate the TTL part of MergeTree in the document into Chinese

# Conflicts:
#	docs/zh/operations/table_engines/mergetree.md
---
 docs/zh/operations/table_engines/mergetree.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/zh/operations/table_engines/mergetree.md b/docs/zh/operations/table_engines/mergetree.md
index 9ebf240a1c9..6ef600fe10e 100644
--- a/docs/zh/operations/table_engines/mergetree.md
+++ b/docs/zh/operations/table_engines/mergetree.md
@@ -72,6 +72,8 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
 - `index_granularity` — 索引粒度。即索引中相邻『标记』间的数据行数。默认值,8192 。该列表中所有可用的参数可以从这里查看 [MergeTreeSettings.h](https://github.com/ClickHouse/ClickHouse/blob/master/dbms/src/Storages/MergeTree/MergeTreeSettings.h) 。
 - `use_minimalistic_part_header_in_zookeeper` — 数据片段头在 ZooKeeper 中的存储方式。如果设置了 `use_minimalistic_part_header_in_zookeeper=1` ,ZooKeeper 会存储更少的数据。更多信息参考『服务配置参数』这章中的 [设置描述](../server_settings/settings.md#server-settings-use_minimalistic_part_header_in_zookeeper) 。
 - `min_merge_bytes_to_use_direct_io` — 使用直接 I/O 来操作磁盘的合并操作时要求的最小数据量。合并数据片段时,ClickHouse 会计算要被合并的所有数据的总存储空间。如果大小超过了 `min_merge_bytes_to_use_direct_io` 设置的字节数,则 ClickHouse 将使用直接 I/O 接口(`O_DIRECT` 选项)对磁盘读写。如果设置 `min_merge_bytes_to_use_direct_io = 0` ,则会禁用直接 I/O。默认值:`10 * 1024 * 1024 * 1024` 字节。
+
+ - `merge_with_ttl_timeout` — TTL合并频率的最小间隔时间。默认值: 86400 (1 天)。
 
 **示例配置**

From 7e14a102fc370a8cf822ca8a95dccebd71c98e2c Mon Sep 17 00:00:00 2001
From: zhukai
Date: Wed, 20 Nov 2019 11:59:49 +0800
Subject: [PATCH 4/8] Add the translation of the merge_with_ttl_timeout
 setting. Solve the "Link to nowhere" error

# Conflicts:
#	docs/zh/operations/table_engines/mergetree.md
---
 docs/zh/operations/table_engines/mergetree.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/docs/zh/operations/table_engines/mergetree.md b/docs/zh/operations/table_engines/mergetree.md
index 6ef600fe10e..e825a17ecd1 100644
--- a/docs/zh/operations/table_engines/mergetree.md
+++ b/docs/zh/operations/table_engines/mergetree.md
@@ -70,10 +70,14 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
 - `SETTINGS` — 影响 `MergeTree` 性能的额外参数:
 
 - `index_granularity` — 索引粒度。即索引中相邻『标记』间的数据行数。默认值,8192 。该列表中所有可用的参数可以从这里查看 [MergeTreeSettings.h](https://github.com/ClickHouse/ClickHouse/blob/master/dbms/src/Storages/MergeTree/MergeTreeSettings.h) 。
+ - `index_granularity_bytes` — 索引粒度,以字节为单位,默认值: 10Mb。如果仅按数据行数限制索引粒度, 请设置为0(不建议)。
+ - `enable_mixed_granularity_parts` — 启用或禁用通过 `index_granularity_bytes` 控制索引粒度的大小。在19.11版本之前, 只有 `index_granularity` 配置能够用于限制索引粒度的大小。当从大表(数十或数百兆)中查询数据时候,`index_granularity_bytes` 配置能够提升ClickHouse的性能。如果你的表内数据量很大,可以开启这项配置用以提升`SELECT` 查询的性能。
 - `use_minimalistic_part_header_in_zookeeper` — 数据片段头在 ZooKeeper 中的存储方式。如果设置了 `use_minimalistic_part_header_in_zookeeper=1` ,ZooKeeper 会存储更少的数据。更多信息参考『服务配置参数』这章中的 [设置描述](../server_settings/settings.md#server-settings-use_minimalistic_part_header_in_zookeeper) 。
 - `min_merge_bytes_to_use_direct_io` — 使用直接 I/O 来操作磁盘的合并操作时要求的最小数据量。合并数据片段时,ClickHouse 会计算要被合并的所有数据的总存储空间。如果大小超过了 `min_merge_bytes_to_use_direct_io` 设置的字节数,则 ClickHouse 将使用直接 I/O 接口(`O_DIRECT` 选项)对磁盘读写。如果设置 `min_merge_bytes_to_use_direct_io = 0` ,则会禁用直接 I/O。默认值:`10 * 1024 * 1024 * 1024` 字节。
 
 - `merge_with_ttl_timeout` — TTL合并频率的最小间隔时间。默认值: 86400 (1 天)。
+ - `write_final_mark` — 启用或禁用在数据片段尾部写入最终索引标记。默认值: 1(不建议更改)。
+
 
 **示例配置**
@@ -392,6 +396,4 @@ ALTER TABLE example_table
 
 如果在合并的时候执行`SELECT` 查询, 则可能会得到过期的数据。为了避免这种情况,可以在`SELECT`之前使用 [OPTIMIZE](../../query_language/misc.md#misc_operations-optimize) 查询。
 
-## Using Multiple Block Devices for Data Storage {#table_engine-mergetree-multiple-volumes}
-
 [来源文章](https://clickhouse.yandex/docs/en/operations/table_engines/mergetree/)

From 0bed00249f08f8412fdcc22fb3cf493ae3862d01 Mon Sep 17 00:00:00 2001
From: zhukai
Date: Wed, 20 Nov 2019 12:38:40 +0800
Subject: [PATCH 5/8] Add the missing tags

---
 docs/zh/operations/table_engines/mergetree.md | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/docs/zh/operations/table_engines/mergetree.md b/docs/zh/operations/table_engines/mergetree.md
index e825a17ecd1..fc7b4967571 100644
--- a/docs/zh/operations/table_engines/mergetree.md
+++ b/docs/zh/operations/table_engines/mergetree.md
@@ -77,7 +77,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
 - `merge_with_ttl_timeout` — TTL合并频率的最小间隔时间。默认值: 86400 (1 天)。
 - `write_final_mark` — 启用或禁用在数据片段尾部写入最终索引标记。默认值: 1(不建议更改)。
-
+ - `storage_policy` — 存储策略。 参见 [使用多个区块装置进行数据存储](#table_engine-mergetree-multiple-volumes).
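The column-TTL behavior documented above (`a Int TTL d + INTERVAL 1 MONTH`: once the interval has passed, expired values are rewritten to the column type's default during merges) can be sketched outside ClickHouse. This is an illustrative model only, not ClickHouse code; the helper `apply_column_ttl` and the row layout are our own assumptions:

```python
from datetime import datetime, timedelta

def apply_column_ttl(rows, now, ttl=timedelta(days=30), default=0):
    # Models column TTL: once r["d"] + ttl has passed, the value of
    # column "a" is replaced by the Int default value (0).
    return [dict(r, a=default if r["d"] + ttl <= now else r["a"]) for r in rows]

now = datetime(2019, 11, 20)
rows = [
    {"d": datetime(2019, 10, 1), "a": 7},    # expired: replaced by default
    {"d": datetime(2019, 11, 15), "a": 42},  # still within TTL: kept
]
print([r["a"] for r in apply_column_ttl(rows, now)])  # [0, 42]
```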
 
 **示例配置**
@@ -121,7 +121,7 @@ MergeTree(EventDate, intHash32(UserID), (CounterID, EventDate, intHash32(UserID)
 
 对于主要的配置方法,这里 `MergeTree` 引擎跟前面的例子一样,可以以同样的方式配置。
 
-## 数据存储
+## 数据存储 {#mergetree-data-storage}
 
 表由按主键排序的数据 *片段* 组成。
@@ -396,4 +396,8 @@ ALTER TABLE example_table
 
 如果在合并的时候执行`SELECT` 查询, 则可能会得到过期的数据。为了避免这种情况,可以在`SELECT`之前使用 [OPTIMIZE](../../query_language/misc.md#misc_operations-optimize) 查询。
 
+## Using Multiple Block Devices for Data Storage {#table_engine-mergetree-multiple-volumes}
+
+### Configuration {#table_engine-mergetree-multiple-volumes_configure}
+
 [来源文章](https://clickhouse.yandex/docs/en/operations/table_engines/mergetree/)

From f8d3a0d5227f25094d88eadbcf664e5b1c9a12da Mon Sep 17 00:00:00 2001
From: Reilee <1324044+reilee@users.noreply.github.com>
Date: Fri, 22 Nov 2019 12:28:23 +0900
Subject: [PATCH 6/8] Update index.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

本地人 -> 本地
---
 docs/zh/interfaces/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/zh/interfaces/index.md b/docs/zh/interfaces/index.md
index 5f0e536916c..3336aa4d105 100644
--- a/docs/zh/interfaces/index.md
+++ b/docs/zh/interfaces/index.md
@@ -3,7 +3,7 @@
 ClickHouse提供了两个网络接口(两者都可以选择包装在TLS中以提高安全性):
 
 * [HTTP](http.md),记录在案,易于使用.
-* [本地人TCP](tcp.md),这有较少的开销.
+* [本地TCP](tcp.md),这有较少的开销.
 
 在大多数情况下,建议使用适当的工具或库,而不是直接与这些工具或库进行交互。 Yandex的官方支持如下:
 
 * [命令行客户端](cli.md)

From 2d9d116267e0eb05aeac2ce910d73b9dd86359ec Mon Sep 17 00:00:00 2001
From: Alexey Milovidov
Date: Sat, 23 Nov 2019 03:18:56 +0300
Subject: [PATCH 7/8] Slightly better exception messages

---
 dbms/src/Common/Exception.cpp | 2 +-
 dbms/src/Interpreters/executeQuery.cpp | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dbms/src/Common/Exception.cpp b/dbms/src/Common/Exception.cpp
index e49600a789e..0ee65293872 100644
--- a/dbms/src/Common/Exception.cpp
+++ b/dbms/src/Common/Exception.cpp
@@ -261,7 +261,7 @@ std::string getExceptionMessage(const Exception & e, bool with_stacktrace, bool
         stream << "Code: " << e.code() << ", e.displayText() = " << text;
 
         if (with_stacktrace && !has_embedded_stack_trace)
-            stream << ", Stack trace:\n\n" << e.getStackTrace().toString();
+            stream << ", Stack trace (when copying this message, always include the lines below):\n\n" << e.getStackTrace().toString();
     }
     catch (...) {}
diff --git a/dbms/src/Interpreters/executeQuery.cpp b/dbms/src/Interpreters/executeQuery.cpp
index c84aabd439d..41c8e288ffe 100644
--- a/dbms/src/Interpreters/executeQuery.cpp
+++ b/dbms/src/Interpreters/executeQuery.cpp
@@ -143,7 +143,7 @@ static void logException(Context & context, QueryLogElement & elem)
     LOG_ERROR(&Logger::get("executeQuery"), elem.exception
         << " (from " << context.getClientInfo().current_address.toString() << ")"
         << " (in query: " << joinLines(elem.query) << ")"
-        << (!elem.stack_trace.empty() ? ", Stack trace:\n\n" + elem.stack_trace : ""));
+        << (!elem.stack_trace.empty() ? ", Stack trace (when copying this message, always include the lines below):\n\n" + elem.stack_trace : ""));
 }

From a357239f723ce9a8239d07671b5f4b0f5d3420ac Mon Sep 17 00:00:00 2001
From: tavplubix
Date: Tue, 26 Nov 2019 16:08:54 +0300
Subject: [PATCH 8/8] Update CHANGELOG.md

---
 CHANGELOG.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 1eadd966a06..cab51478199 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,13 +5,11 @@
 ### New Feature
 * Add the ability to create dictionaries with DDL queries. [#7360](https://github.com/ClickHouse/ClickHouse/pull/7360) ([alesapin](https://github.com/alesapin))
-* Authentication in S3 table function and storage. Now we have complete support for S3 import/export. [#7623](https://github.com/ClickHouse/ClickHouse/pull/7623) ([Vladimir Chebotarev](https://github.com/excitoon))
 * Make `bloom_filter` type of index supporting `LowCardinality` and `Nullable` [#7363](https://github.com/ClickHouse/ClickHouse/issues/7363) [#7561](https://github.com/ClickHouse/ClickHouse/pull/7561) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
 * Add function `isValidJSON` to check that passed string is a valid json. [#5910](https://github.com/ClickHouse/ClickHouse/issues/5910) [#7293](https://github.com/ClickHouse/ClickHouse/pull/7293) ([Vdimir](https://github.com/Vdimir))
 * Implement `arrayCompact` function [#7328](https://github.com/ClickHouse/ClickHouse/pull/7328) ([Memo](https://github.com/Joeywzr))
 * Created function `hex` for Decimal numbers. It works like `hex(reinterpretAsString())`, but doesn't delete last zero bytes. [#7355](https://github.com/ClickHouse/ClickHouse/pull/7355) ([Mikhail Korotov](https://github.com/millb))
 * Add `arrayFill` and `arrayReverseFill` functions, which replace elements by other elements in front/back of them in the array. [#7380](https://github.com/ClickHouse/ClickHouse/pull/7380) ([hcz](https://github.com/hczhcz))
-* Up precision of `avg` aggregate function result to max of `Decimal` type [#7446](https://github.com/ClickHouse/ClickHouse/pull/7446) ([Andrey Konyaev](https://github.com/akonyaev90))
 * Add `CRC32IEEE()`/`CRC64()` support [#7480](https://github.com/ClickHouse/ClickHouse/pull/7480) ([Azat Khuzhin](https://github.com/azat))
 * Implement `char` function similar to one in [mysql](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_char) [#7486](https://github.com/ClickHouse/ClickHouse/pull/7486) ([sundyli](https://github.com/sundy-li))
 * Add `bitmapTransform` function. It transforms an array of values in a bitmap to another array of values, the result is a new bitmap [#7598](https://github.com/ClickHouse/ClickHouse/pull/7598) ([Zhichang Yu](https://github.com/yuzhichang))
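The reference value added by patch 1 can be checked outside ClickHouse: `javaHashUTF16LE` applies Java's `String.hashCode()` polynomial (`h = 31*h + unit`) to the UTF-16LE code units of its argument, so a non-BMP character such as 𐐀 (U+10400) contributes both code units of its surrogate pair. A minimal Python sketch (the helper name `java_hash_utf16le` is ours, not a ClickHouse identifier):

```python
def java_hash_utf16le(data: bytes) -> int:
    # Java String.hashCode(): h = 31*h + unit over UTF-16 code units,
    # reduced to a signed 32-bit integer.
    h = 0
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "little")
        h = (31 * h + unit) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h

# Each of 𐐀..𐐄 (U+10400..U+10404) encodes as two surrogate units in UTF-16LE.
print(java_hash_utf16le("𐐀𐐁𐐂𐐃𐐄".encode("utf-16-le")))  # 1258255525
print(java_hash_utf16le("a1가".encode("utf-16-le")))  # 138768
```

Both printed values match the new and existing lines of `00800_function_java_hash.reference`.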