Merge branch 'master' of https://github.com/ClickHouse/ClickHouse into datasketches-uniq
Commit 69ac23e870

contrib/NuRaft (vendored, 2 lines changed)
@@ -1 +1 @@
-Subproject commit 3d3683e77753cfe015a05fae95ddf418e19f59e1
+Subproject commit 70468326ad5d72e9497944838484c591dae054ea

@@ -144,7 +144,7 @@ This query changes the `name` column properties:

 - TTL

-For examples of modifying column TTL, see [Column TTL](../../engines/table_engines/mergetree_family/mergetree.md#mergetree-column-ttl).
+For examples of modifying column TTL, see [Column TTL](../../../engines/table-engines/mergetree-family/mergetree.md#mergetree-column-ttl).

 If the `IF EXISTS` clause is specified, the query won't return an error if the column doesn't exist.

docs/zh/faq/terms_translation_zh.md (normal file, 38 lines)
@@ -0,0 +1,38 @@
# Terminology translation conventions

This document maintains the glossary of terms used when translating from English into Chinese.

## Keep in English, do not translate

Parquet

## English <-> Chinese

Integer 整数
floating-point 浮点数
Fitting 拟合
Decimal 定点数
Tuple 元组
function 函数
array 数组/阵列
hash 哈希/散列
Parameters 参数
Arguments 参数

##

1. For "array", keep the original translations 数组/阵列 unchanged.

2. Inverted sentences are not translated literally; the word order is adjusted.
For example, in the groupArrayInsertAt translation:

``` text
- `x` — [Expression] resulting in one of the [supported data types].
```

``` text
`x` — 生成所[支持的数据类型](数据)的[表达式]。
```

3. See also 参见

@@ -238,6 +238,6 @@ FROM
 ```

 !!! note "Note"
-    See the function descriptions for [avg()](../sql-reference/aggregate-functions/reference.md#agg_function-avg) and [log()](../sql-reference/functions/math-functions.md).
+    See the function descriptions for [avg()](../sql-reference/aggregate-functions/reference/avg.md#agg_function-avg) and [log()](../sql-reference/functions/math-functions.md).

 [Original article](https://clickhouse.tech/docs/en/guides/apply_catboost_model/) <!--hide-->

@@ -988,15 +988,15 @@ ClickHouse throws an exception

 ## count_distinct_implementation {#settings-count_distinct_implementation}

-Specifies which of the `uniq*` functions should be used to perform the [COUNT(DISTINCT …)](../../sql-reference/aggregate-functions/reference.md#agg_function-count) construction.
+Specifies which of the `uniq*` functions should be used to perform the [COUNT(DISTINCT …)](../../sql-reference/aggregate-functions/reference/count.md#agg_function-count) construction.

 Possible values:

-- [uniq](../../sql-reference/aggregate-functions/reference.md#agg_function-uniq)
-- [uniqCombined](../../sql-reference/aggregate-functions/reference.md#agg_function-uniqcombined)
-- [uniqCombined64](../../sql-reference/aggregate-functions/reference.md#agg_function-uniqcombined64)
-- [uniqHLL12](../../sql-reference/aggregate-functions/reference.md#agg_function-uniqhll12)
-- [uniqExact](../../sql-reference/aggregate-functions/reference.md#agg_function-uniqexact)
+- [uniq](../../sql-reference/aggregate-functions/reference/uniq.md#agg_function-uniq)
+- [uniqCombined](../../sql-reference/aggregate-functions/reference/uniqcombined.md#agg_function-uniqcombined)
+- [uniqCombined64](../../sql-reference/aggregate-functions/reference/uniqcombined64.md#agg_function-uniqcombined64)
+- [uniqHLL12](../../sql-reference/aggregate-functions/reference/uniqhll12.md#agg_function-uniqhll12)
+- [uniqExact](../../sql-reference/aggregate-functions/reference/uniqexact.md#agg_function-uniqexact)

 Default value: `uniqExact`.

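@@ -27,7 +27,7 @@ toc_title: Aggregate Function Combinators

 ## -State {#agg-functions-combinator-state}

-If you apply this combinator, the aggregate function does not return the resulting value (such as the number of unique values for the [uniq](reference.md#agg_function-uniq) function), but an intermediate state of the aggregation (for `uniq`, this is the hash table for calculating the number of unique values). This is an `AggregateFunction(...)` that can be used for further processing or stored in a table to finish aggregating later.
+If you apply this combinator, the aggregate function does not return the resulting value (such as the number of unique values for the [uniq](./reference/uniq.md#agg_function-uniq) function), but an intermediate state of the aggregation (for `uniq`, this is the hash table for calculating the number of unique values). This is an `AggregateFunction(...)` that can be used for further processing or stored in a table to finish aggregating later.

 To work with these states, use:
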
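@@ -209,7 +209,7 @@ FROM

 Let's get the names of the people whose age lies in the intervals `[30,60)` and `[60,75)`. Since we use an integer representation for age, we get ages in the `[30, 59]` and `[60,74]` intervals.

-To aggregate names in an array, we use the [groupArray](reference.md#agg_function-grouparray) aggregate function. It takes one argument, which in our case is the `name` column. The `groupArrayResample` function should use the `age` column to aggregate names by age. To define the required intervals, we pass the `30, 75, 30` arguments to the `groupArrayResample` function.
+To aggregate names in an array, we use the [groupArray](./reference/grouparray.md#agg_function-grouparray) aggregate function. It takes one argument, which in our case is the `name` column. The `groupArrayResample` function should use the `age` column to aggregate names by age. To define the required intervals, we pass the `30, 75, 30` arguments to the `groupArrayResample` function.

 ``` sql
 SELECT groupArrayResample(30, 75, 30)(name, age) FROM people
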
@@ -493,6 +493,6 @@ FROM

 ## sumMapFiltered(keys_to_keep)(keys, values) {#summapfilteredkeys-to-keepkeys-values}

-Same behavior as [sumMap](reference.md#agg_functions-summap), except that an array of keys is passed as a parameter. This can be especially useful when working with a high cardinality of keys.
+Same behavior as [sumMap](./reference/summap.md#agg_functions-summap), except that an array of keys is passed as a parameter. This can be especially useful when working with a high cardinality of keys.

 [Original article](https://clickhouse.tech/docs/en/query_language/agg_functions/parametric_functions/) <!--hide-->

File diff suppressed because it is too large
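docs/zh/sql-reference/aggregate-functions/reference/any.md (normal file, 13 lines)
@@ -0,0 +1,13 @@
---
toc_priority: 6
---

# any {#agg_function-any}

Selects the first encountered value.
The query can be executed in any order, and even in a different order each time, so the result of this function is indeterminate.
To get a determinate result, you can use the `min` or `max` function instead of `any`.

In some cases, you can rely on the order of execution. This applies to cases when `SELECT` comes from a subquery that uses `ORDER BY`.

When a `SELECT` query has a `GROUP BY` clause or at least one aggregate function, ClickHouse (in contrast to MySQL) requires that all expressions in the `SELECT`, `HAVING`, and `ORDER BY` clauses be calculated from keys or from aggregate functions. In other words, each column selected from the table must be used either in keys or inside aggregate functions. To get behavior like in MySQL, you can put the other columns in the `any` aggregate function.
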
@@ -0,0 +1,34 @@
---
toc_priority: 103
---

# anyHeavy {#anyheavyx}

Selects a frequently occurring value using the [heavy hitters](http://www.cs.umd.edu/~samir/498/karp.pdf) algorithm. If there is a value that occurs in more than half of the cases in each of the query's execution threads, this value is returned. Normally, the result is nondeterministic.

``` sql
anyHeavy(column)
```

**Arguments**

- `column` – The column name.

**Example**

Take the [OnTime](../../../getting-started/example-datasets/ontime.md) data set and select any frequently occurring value in the `AirlineID` column.

Query:

``` sql
SELECT anyHeavy(AirlineID) AS res
FROM ontime;
```

Result:

``` text
┌───res─┐
│ 19690 │
└───────┘
```

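For a single execution thread, the heavy-hitters idea can be illustrated with the Boyer-Moore majority-vote algorithm. The code below is a minimal plain-Python sketch that assumes one thread sees the values in order. `any_heavy` is a hypothetical name, and this is not ClickHouse's C++ code. The result is only guaranteed when some value really occurs in more than half of the rows, which mirrors the caveat above.

```python
def any_heavy(values):
    """Boyer-Moore majority vote over one sequence of values.

    Hypothetical single-thread model: if no value occurs in more than
    half of the rows, the returned candidate is arbitrary.
    """
    candidate = None
    counter = 0
    for value in values:
        if counter == 0:
            # No surviving candidate: adopt the current value.
            candidate = value
            counter = 1
        elif value == candidate:
            counter += 1
        else:
            # A different value cancels one occurrence of the candidate.
            counter -= 1
    return candidate


print(any_heavy([1, 2, 1, 1, 3]))  # 1
```

A single counter keeps memory constant per thread. The trade-off is that the answer is meaningful only when a true majority exists.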
@@ -0,0 +1,9 @@
---
toc_priority: 104
---

## anyLast {#anylastx}

Selects the last value encountered.
The result is just as indeterminate as for the [any](../../../sql-reference/aggregate-functions/reference/any.md) function.

@@ -0,0 +1,64 @@
---
toc_priority: 106
---

# argMax {#agg-function-argmax}

Calculates the `arg` value for the maximum `val` value. If there are several different values of `arg` for the maximum value of `val`, returns the first of these values encountered.

The tuple version of this function returns the tuple with the maximum `val` value. This function is suitable for use with `SimpleAggregateFunction`.

**Syntax**

``` sql
argMax(arg, val)
```

or

``` sql
argMax(tuple(arg, val))
```

**Arguments**

- `arg` — Argument.
- `val` — Value.

**Returned value**

- The `arg` value that corresponds to the maximum `val` value.

Type: matches the `arg` type.

For a tuple in the input:

- Tuple `(arg, val)`, where `val` is the maximum value and `arg` is its corresponding value.

Type: [Tuple](../../../sql-reference/data-types/tuple.md).

**Example**

Input table:

``` text
┌─user─────┬─salary─┐
│ director │   5000 │
│ manager  │   3000 │
│ worker   │   1000 │
└──────────┴────────┘
```

Query:

``` sql
SELECT argMax(user, salary), argMax(tuple(user, salary), salary), argMax(tuple(user, salary)) FROM salary;
```

Result:

``` text
┌─argMax(user, salary)─┬─argMax(tuple(user, salary), salary)─┬─argMax(tuple(user, salary))─┐
│ director │ ('director',5000) │ ('director',5000) │
└──────────────────────┴─────────────────────────────────────┴─────────────────────────────┘
```

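The tie-breaking rule above can be modeled in plain Python: when several rows share the maximum `val`, the first `arg` encountered wins. This sketch assumes the rows arrive in a fixed order, which a multi-threaded query does not guarantee. `arg_max` is a hypothetical helper, not a ClickHouse API.

```python
def arg_max(rows):
    """Return the arg paired with the largest val; rows is an iterable of (arg, val)."""
    best_arg = None
    best_val = None
    for arg, val in rows:
        # Strict '>' keeps the first arg seen when vals tie.
        if best_val is None or val > best_val:
            best_arg, best_val = arg, val
    return best_arg


salary = [("director", 5000), ("manager", 3000), ("worker", 1000)]
print(arg_max(salary))  # director
```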
@@ -0,0 +1,37 @@
---
toc_priority: 105
---

# argMin {#agg-function-argmin}

Syntax: `argMin(arg, val)` or `argMin(tuple(arg, val))`

Calculates the `arg` value for the minimum `val` value. If there are several different values of `arg` for the minimum value of `val`, returns the first of these values encountered.

The tuple version of this function returns the tuple with the minimum `val` value. This function is suitable for use with `SimpleAggregateFunction`.

**Example:**

Input table:

``` text
┌─user─────┬─salary─┐
│ director │   5000 │
│ manager  │   3000 │
│ worker   │   1000 │
└──────────┴────────┘
```

Query:

``` sql
SELECT argMin(user, salary), argMin(tuple(user, salary)) FROM salary;
```

Result:

``` text
┌─argMin(user, salary)─┬─argMin(tuple(user, salary))─┐
│ worker │ ('worker',1000) │
└──────────────────────┴─────────────────────────────┘
```

docs/zh/sql-reference/aggregate-functions/reference/avg.md (normal file, 64 lines)
@@ -0,0 +1,64 @@
---
toc_priority: 5
---

# avg {#agg_function-avg}

Calculates the arithmetic mean.

**Syntax**

``` sql
avg(x)
```

**Arguments**

- `x` — Input values, must be [Integer](../../../sql-reference/data-types/int-uint.md), [Float](../../../sql-reference/data-types/float.md), or [Decimal](../../../sql-reference/data-types/decimal.md).

**Returned value**

- The arithmetic mean, always of type [Float64](../../../sql-reference/data-types/float.md).
- `NaN` if the input parameter `x` is empty.

**Example**

Query:

``` sql
SELECT avg(x) FROM values('x Int8', 0, 1, 2, 3, 4, 5);
```

Result:

``` text
┌─avg(x)─┐
│    2.5 │
└────────┘
```

**Example**

Create a temporary table:

Query:

``` sql
CREATE table test (t UInt8) ENGINE = Memory;
```

Get the arithmetic mean:

Query:

``` sql
SELECT avg(t) FROM test;
```

Result:

``` text
┌─avg(t)─┐
│    nan │
└────────┘
```

@@ -0,0 +1,84 @@
---
toc_priority: 107
---

# avgWeighted {#avgweighted}

Calculates the [weighted arithmetic mean](https://en.wikipedia.org/wiki/Weighted_arithmetic_mean).

**Syntax**

``` sql
avgWeighted(x, weight)
```

**Arguments**

- `x` — Values.
- `weight` — Weights of the values.

`x` and `weight` must both be [Integer](../../../sql-reference/data-types/int-uint.md), [floating-point](../../../sql-reference/data-types/float.md), or [Decimal](../../../sql-reference/data-types/decimal.md), but they may have different types.

**Returned value**

- `NaN` if all the weights are equal to 0 or the supplied weights parameter is empty.
- The weighted mean otherwise.

Type: always [Float64](../../../sql-reference/data-types/float.md).

**Example**

Query:

``` sql
SELECT avgWeighted(x, w)
FROM values('x Int8, w Int8', (4, 1), (1, 0), (10, 2))
```

Result:

``` text
┌─avgWeighted(x, w)─┐
│                 8 │
└───────────────────┘
```

**Example**

Query:

``` sql
SELECT avgWeighted(x, w)
FROM values('x Int8, w Int8', (0, 0), (1, 0), (10, 0))
```

Result:

``` text
┌─avgWeighted(x, w)─┐
│               nan │
└───────────────────┘
```

**Example**

Query:

``` sql
CREATE table test (t UInt8) ENGINE = Memory;
SELECT avgWeighted(t, t) FROM test
```

Result:

``` text
┌─avgWeighted(t, t)─┐
│               nan │
└───────────────────┘
```

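The weighted mean is `Σ(x · weight) / Σ(weight)`. The minimal Python sketch below applies that formula and reproduces the first two examples above. `avg_weighted` is a hypothetical name. Returning `NaN` whenever the weight sum is zero is an assumption; it covers both documented `NaN` cases (all weights 0, or no rows).

```python
import math


def avg_weighted(pairs):
    """Weighted arithmetic mean of (x, weight) pairs: sum(x * w) / sum(w)."""
    numerator = 0.0
    denominator = 0.0
    for x, weight in pairs:
        numerator += x * weight
        denominator += weight
    if denominator == 0:
        # Assumed to match the documented NaN result (all weights 0 or no rows).
        return math.nan
    return numerator / denominator


print(avg_weighted([(4, 1), (1, 0), (10, 2)]))  # 8.0, i.e. (4*1 + 1*0 + 10*2) / 3
print(avg_weighted([(0, 0), (1, 0), (10, 0)]))  # nan
```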
@@ -0,0 +1,13 @@
---
toc_priority: 250
---

# categoricalInformationValue {#categoricalinformationvalue}

Calculates the value of `(P(tag = 1) - P(tag = 0))(log(P(tag = 1)) - log(P(tag = 0)))` for each category.

``` sql
categoricalInformationValue(category1, category2, ..., tag)
```

The result indicates how a discrete (categorical) feature `[category1, category2, ...]` contributes to a learning model that predicts the value of `tag`.

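The per-category formula can be evaluated in plain Python once the probabilities are defined. This sketch assumes the conventional information-value reading of that definition. `P(tag = 1)` is taken as the share of all `tag = 1` rows in which the category flag is set, and `P(tag = 0)` likewise for `tag = 0` rows. The source does not confirm that this is exactly how ClickHouse counts. `categorical_information_value` is a hypothetical name. The sketch also assumes that every category contains both tag values, because a zero probability would make `log` fail.

```python
import math


def categorical_information_value(rows):
    """rows: list of (flags, tag); flags is a list of 0/1 category indicators, tag is 0 or 1."""
    n_categories = len(rows[0][0])
    positives = [0] * n_categories  # rows with tag = 1 where the category flag is set
    negatives = [0] * n_categories  # rows with tag = 0 where the category flag is set
    total_positive = sum(1 for _, tag in rows if tag == 1)
    total_negative = len(rows) - total_positive
    for flags, tag in rows:
        for i, flag in enumerate(flags):
            if flag:
                if tag == 1:
                    positives[i] += 1
                else:
                    negatives[i] += 1
    result = []
    for i in range(n_categories):
        p1 = positives[i] / total_positive  # assumed meaning of P(tag = 1)
        p0 = negatives[i] / total_negative  # assumed meaning of P(tag = 0)
        result.append((p1 - p0) * (math.log(p1) - math.log(p0)))
    return result


rows = [([1, 0], 1), ([1, 0], 1), ([1, 0], 0), ([0, 1], 1), ([0, 1], 0), ([0, 1], 0)]
print(categorical_information_value(rows))  # both values equal ln(2) / 3
```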
docs/zh/sql-reference/aggregate-functions/reference/corr.md (normal file, 15 lines)
@@ -0,0 +1,15 @@
---
toc_priority: 107
---

# corr {#corrx-y}

**Syntax**
``` sql
corr(x, y)
```

Calculates the Pearson correlation coefficient: `Σ((x - x̅)(y - y̅)) / sqrt(Σ((x - x̅)^2) * Σ((y - y̅)^2))`.

!!! note "Note"
    This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the `corrStable` function. It works slower but provides a lower computational error.

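The formula above can be evaluated directly in two passes: first the means, then the centered sums. This Python sketch follows the documented formula literally and is only an illustration, not ClickHouse's implementation. `pearson_corr` is a hypothetical name. It assumes at least two rows and that neither `x` nor `y` is constant; otherwise the denominator is zero.

```python
import math


def pearson_corr(xs, ys):
    """Pearson correlation: Σ((x - x̅)(y - y̅)) / sqrt(Σ(x - x̅)^2 * Σ(y - y̅)^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Second pass: sums over values centered on their means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    squares_x = sum((x - mean_x) ** 2 for x in xs)
    squares_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(squares_x * squares_y)


print(pearson_corr([1, 2, 3], [2, 4, 6]))  # 1.0
print(pearson_corr([1, 2, 3], [3, 2, 1]))  # -1.0
```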
docs/zh/sql-reference/aggregate-functions/reference/count.md (normal file, 70 lines)
@@ -0,0 +1,70 @@
---
toc_priority: 1
---

# count {#agg_function-count}

Counts the number of rows or not-NULL values.

ClickHouse supports the following syntaxes for `count`:
- `count(expr)` or `COUNT(DISTINCT expr)`.
- `count()` or `COUNT(*)`. The `count()` syntax is ClickHouse-specific.

**Arguments**

The function can take:

- Zero parameters.
- One [expression](../../../sql-reference/syntax.md#syntax-expressions).

**Returned value**

- If the function is called without parameters, it counts the number of rows.
- If an [expression](../../../sql-reference/syntax.md#syntax-expressions) is passed, the function counts how many times this expression returned a non-NULL value. If the expression returns a [Nullable](../../../sql-reference/data-types/nullable.md)-type value, the result of `count` is still not `Nullable`. If the expression returns `NULL` for all rows, the function returns 0.

In both cases, the type of the returned value is [UInt64](../../../sql-reference/data-types/int-uint.md).

**Details**

ClickHouse supports the `COUNT(DISTINCT ...)` syntax. The behavior of this construction depends on the [count_distinct_implementation](../../../operations/settings/settings.md#settings-count_distinct_implementation) setting, which defines which of the [uniq\*](../../../sql-reference/aggregate-functions/reference/uniq.md#agg_function-uniq) functions is used to perform the operation. The default is the [uniqExact](../../../sql-reference/aggregate-functions/reference/uniqexact.md#agg_function-uniqexact) function.

The `SELECT count() FROM table` query is not optimized, because the number of entries in the table is not stored separately. It chooses a small column from the table and counts the number of values in it.

**Examples**

Example 1:

``` sql
SELECT count() FROM t
```

``` text
┌─count()─┐
│       5 │
└─────────┘
```

Example 2:

``` sql
SELECT name, value FROM system.settings WHERE name = 'count_distinct_implementation'
```

``` text
┌─name──────────────────────────┬─value─────┐
│ count_distinct_implementation │ uniqExact │
└───────────────────────────────┴───────────┘
```

``` sql
SELECT count(DISTINCT num) FROM t
```

``` text
┌─uniqExact(num)─┐
│              3 │
└────────────────┘
```

This example shows that `count(DISTINCT num)` is performed by the `uniqExact` function, according to the value of the `count_distinct_implementation` setting.

@@ -0,0 +1,15 @@
---
toc_priority: 36
---

# covarPop {#covarpop}

**Syntax**
``` sql
covarPop(x, y)
```

Calculates the value of `Σ((x - x̅)(y - y̅)) / n`.

!!! note "Note"
    This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the `covarPopStable` function. It works slower but provides a lower computational error.

@@ -0,0 +1,17 @@
---
toc_priority: 37
---

# covarSamp {#covarsamp}

**Syntax**
``` sql
covarSamp(x, y)
```

Calculates the value of `Σ((x - x̅)(y - y̅)) / (n - 1)`.

Returns Float64. When `n <= 1`, returns +∞.

!!! note "Note"
    This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the `covarSampStable` function. It works slower but provides a lower computational error.

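The two covariance formulas differ only in the divisor: `n` for `covarPop` and `n - 1` for `covarSamp`. The Python sketch below evaluates them with the two-pass form. `covariance` and its `sample` flag are hypothetical. For `n <= 1` with `sample=True`, the sketch returns `+inf` to follow the statement above. For the population form it assumes at least one row.

```python
import math


def covariance(xs, ys, sample=False):
    """Σ((x - x̅)(y - y̅)) divided by n (covarPop) or by n - 1 (covarSamp)."""
    n = len(xs)
    if sample and n <= 1:
        # Follows the documented covarSamp result for n <= 1.
        return math.inf
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)


print(covariance([1, 2, 3], [2, 4, 6]))               # population: 4 / 3
print(covariance([1, 2, 3], [2, 4, 6], sample=True))  # sample: 4 / 2 = 2.0
```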
@@ -0,0 +1,69 @@
---
toc_priority: 141
---

# deltaSum {#agg_functions-deltasum}

Sums the arithmetic differences between consecutive rows. If a difference is negative, it is ignored.

**Syntax**

``` sql
deltaSum(value)
```

**Arguments**

- `value` — Input values, must be of [Integer](../../data-types/int-uint.md) or [Float](../../data-types/float.md) type.

**Returned value**

- The accumulated arithmetic difference, of `Integer` or `Float` type.

**Examples**

Query:

``` sql
SELECT deltaSum(arrayJoin([1, 2, 3]));
```

Result:

``` text
┌─deltaSum(arrayJoin([1, 2, 3]))─┐
│ 2 │
└────────────────────────────────┘
```

Query:

``` sql
SELECT deltaSum(arrayJoin([1, 2, 3, 0, 3, 4, 2, 3]));
```

Result:

``` text
┌─deltaSum(arrayJoin([1, 2, 3, 0, 3, 4, 2, 3]))─┐
│ 7 │
└───────────────────────────────────────────────┘
```

Query:

``` sql
SELECT deltaSum(arrayJoin([2.25, 3, 4.5]));
```

Result:

``` text
┌─deltaSum(arrayJoin([2.25, 3, 4.5]))─┐
│ 2.25 │
└─────────────────────────────────────┘
```

**See Also**

- [runningDifference](../../functions/other-functions.md#other_functions-runningdifference)

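The rule "sum the differences between consecutive rows, skipping negative ones" can be checked against the three examples above with a short Python sketch. It assumes the values arrive as a list in row order, since the result depends on that order. `delta_sum` is a hypothetical name.

```python
def delta_sum(values):
    """Sum of positive differences between consecutive values; negative steps are ignored."""
    total = 0
    for previous, current in zip(values, values[1:]):
        if current > previous:
            total += current - previous
    return total


print(delta_sum([1, 2, 3]))                 # 2
print(delta_sum([1, 2, 3, 0, 3, 4, 2, 3]))  # 7: 1 + 1 + 3 + 1 + 1
print(delta_sum([2.25, 3, 4.5]))            # 2.25: 0.75 + 1.5
```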
@@ -0,0 +1,20 @@
---
toc_priority: 110
---

# groupArray {#agg_function-grouparray}

**Syntax**
``` sql
groupArray(x)
or
groupArray(max_size)(x)
```

Creates an array of argument values.
Values can be added to the array in any (indeterminate) order.

The second version (with the `max_size` parameter) limits the size of the resulting array to `max_size` elements.
For example, `groupArray(1)(x)` is equivalent to `[any(x)]`.

In some cases, you can still rely on the order of execution. This applies to cases when `SELECT` comes from a subquery that uses `ORDER BY`.

@@ -0,0 +1,91 @@
---
toc_priority: 112
---

# groupArrayInsertAt {#grouparrayinsertat}

Inserts a value into the array at the specified position.

**Syntax**

``` sql
groupArrayInsertAt(default_x, size)(x, pos);
```

If several values are inserted into the same position in one query, the function behaves as follows:

- If the query is executed in a single thread, the first inserted value is used.
- If the query is executed in multiple threads, the resulting value is an undetermined one of the inserted values.

**Arguments**

- `x` — Value to be inserted. [Expression](../../../sql-reference/syntax.md#syntax-expressions) resulting in one of the [supported data types](../../../sql-reference/data-types/index.md).
- `pos` — Position at which the element `x` is to be inserted. Index numbering in the array starts from zero. [UInt32](../../../sql-reference/data-types/int-uint.md#uint-ranges).
- `default_x` — Default value for substituting in empty positions. Optional parameter. [Expression](../../../sql-reference/syntax.md#syntax-expressions) resulting in the data type of `x`. If `default_x` is not defined, the [default values](../../../sql-reference/statements/create.md#create-default-values) are used.
- `size` — Length of the resulting array. Optional parameter. When using this parameter, the default value `default_x` must be specified. [UInt32](../../../sql-reference/data-types/int-uint.md#uint-ranges).

**Returned value**

- Array with inserted values.

Type: [Array](../../../sql-reference/data-types/array.md#data-type-array).

**Example**

Query:

``` sql
SELECT groupArrayInsertAt(toString(number), number * 2) FROM numbers(5);
```

Result:

``` text
┌─groupArrayInsertAt(toString(number), multiply(number, 2))─┐
│ ['0','','1','','2','','3','','4'] │
└───────────────────────────────────────────────────────────┘
```

Query:

``` sql
SELECT groupArrayInsertAt('-')(toString(number), number * 2) FROM numbers(5);
```

Result:

``` text
┌─groupArrayInsertAt('-')(toString(number), multiply(number, 2))─┐
│ ['0','-','1','-','2','-','3','-','4'] │
└────────────────────────────────────────────────────────────────┘
```

Query:

``` sql
SELECT groupArrayInsertAt('-', 5)(toString(number), number * 2) FROM numbers(5);
```

Result:

``` text
┌─groupArrayInsertAt('-', 5)(toString(number), multiply(number, 2))─┐
│ ['0','-','1','-','2'] │
└───────────────────────────────────────────────────────────────────┘
```

Multi-threaded insertion of elements into one position.

Query:

``` sql
SELECT groupArrayInsertAt(number, 0) FROM numbers_mt(10) SETTINGS max_block_size = 1;
```

As a result of this query, you get a random integer in the `[0,9]` range. For example:

``` text
┌─groupArrayInsertAt(number, 0)─┐
│ [7] │
└───────────────────────────────┘
```

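The single-threaded behavior described above can be modeled in Python: the first value per position wins, empty positions get a default, and an optional fixed size truncates the array. The sketch reproduces the first three examples. `group_array_insert_at` is hypothetical. The caller always passes `default` explicitly, whereas ClickHouse falls back to the type's default value (an empty string for `String`) when `default_x` is omitted.

```python
def group_array_insert_at(pairs, default, size=None):
    """Hypothetical model of single-threaded groupArrayInsertAt.

    pairs: iterable of (value, pos); default: filler for empty positions;
    size: optional fixed result length (positions >= size are dropped).
    """
    slots = {}
    for value, pos in pairs:
        # Single thread: the first value inserted at a position is kept.
        slots.setdefault(pos, value)
    if size is None:
        size = max(slots) + 1 if slots else 0
    return [slots.get(i, default) for i in range(size)]


pairs = [(str(n), n * 2) for n in range(5)]
print(group_array_insert_at(pairs, default=""))           # ['0', '', '1', '', '2', '', '3', '', '4']
print(group_array_insert_at(pairs, default="-", size=5))  # ['0', '-', '1', '-', '2']
```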
@@ -0,0 +1,85 @@
---
toc_priority: 114
---

# groupArrayMovingAvg {#agg_function-grouparraymovingavg}

Calculates the moving average of input values.

**Syntax**

``` sql
groupArrayMovingAvg(numbers_for_summing)
groupArrayMovingAvg(window_size)(numbers_for_summing)
```

The function can take the window size as a parameter. If it is not specified, the function uses a window size equal to the number of rows in the column.

**Arguments**

- `numbers_for_summing` — [Expression](../../../sql-reference/syntax.md#syntax-expressions) resulting in a numeric data type value.
- `window_size` — Size of the calculation window.

**Returned values**

- Array of the same size as the input data.

For input data of [Integer](../../../sql-reference/data-types/int-uint.md) and [floating-point](../../../sql-reference/data-types/float.md) types, the returned value type is `Float64`.
For input data of [Decimal](../../../sql-reference/data-types/decimal.md) type, the returned value type is `Decimal128`.

For `Decimal128`, the function uses [rounding towards zero](https://en.wikipedia.org/wiki/Rounding#Rounding_towards_zero): it truncates the decimal places that are insignificant for the resulting data type.

**Example**

Sample table `t`:

``` sql
CREATE TABLE t
(
    `int` UInt8,
    `float` Float32,
    `dec` Decimal32(2)
)
ENGINE = TinyLog
```

``` text
┌─int─┬─float─┬──dec─┐
│   1 │   1.1 │ 1.10 │
│   2 │   2.2 │ 2.20 │
│   4 │   4.4 │ 4.40 │
│   7 │  7.77 │ 7.77 │
└─────┴───────┴──────┘
```

Queries:

``` sql
SELECT
    groupArrayMovingAvg(int) AS I,
    groupArrayMovingAvg(float) AS F,
    groupArrayMovingAvg(dec) AS D
FROM t
```

``` text
┌─I────────────────────┬─F─────────────────────────────────────────────────────────────────────────────┬─D─────────────────────┐
│ [0.25,0.75,1.75,3.5] │ [0.2750000059604645,0.8250000178813934,1.9250000417232513,3.8499999940395355] │ [0.27,0.82,1.92,3.86] │
└──────────────────────┴───────────────────────────────────────────────────────────────────────────────┴───────────────────────┘
```

``` sql
SELECT
    groupArrayMovingAvg(2)(int) AS I,
    groupArrayMovingAvg(2)(float) AS F,
    groupArrayMovingAvg(2)(dec) AS D
FROM t
```

``` text
┌─I───────────────┬─F───────────────────────────────────────────────────────────────────────────┬─D─────────────────────┐
│ [0.5,1.5,3,5.5] │ [0.550000011920929,1.6500000357627869,3.3000000715255737,6.049999952316284] │ [0.55,1.65,3.30,6.08] │
└─────────────────┴─────────────────────────────────────────────────────────────────────────────┴───────────────────────┘
```

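The example output shows that each element is the moving sum over the window divided by the window size, not by the number of values seen so far. For instance, the first `I` value is `1 / 4 = 0.25`. The Python sketch below reproduces the `I` and `D` columns. Truncating `Decimal` results to two places (rounding toward zero) is an assumption taken from the `Decimal32(2)` sample column. The `Float32` column is not modeled. `moving_avg` is a hypothetical name.

```python
from decimal import ROUND_DOWN, Decimal


def moving_avg(values, window=None):
    """Moving sum over the last `window` values, divided by `window` at every position."""
    if window is None:
        window = len(values)
    result = []
    acc = 0
    for i, value in enumerate(values):
        acc += value
        if i >= window:
            acc -= values[i - window]  # drop the value that left the window
        avg = acc / window
        if isinstance(avg, Decimal):
            # Assumed scale of 2 (as in Decimal32(2)); ROUND_DOWN truncates toward zero.
            avg = avg.quantize(Decimal("0.01"), rounding=ROUND_DOWN)
        result.append(avg)
    return result


dec_values = [Decimal("1.10"), Decimal("2.20"), Decimal("4.40"), Decimal("7.77")]
print(moving_avg([1, 2, 4, 7]))   # [0.25, 0.75, 1.75, 3.5]
print(moving_avg(dec_values, 2))  # 0.55, 1.65, 3.30, 6.08 as Decimal values
```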
@@ -0,0 +1,81 @@
---
toc_priority: 113
---

# groupArrayMovingSum {#agg_function-grouparraymovingsum}

Calculates the moving sum of input values.

**Syntax**

``` sql
groupArrayMovingSum(numbers_for_summing)
groupArrayMovingSum(window_size)(numbers_for_summing)
```

The function can take the window size as a parameter. If it is not specified, the function uses a window size equal to the number of rows in the column.

**Arguments**

- `numbers_for_summing` — [Expression](../../../sql-reference/syntax.md#syntax-expressions) resulting in a numeric data type value.
- `window_size` — Size of the calculation window.

**Returned values**

- Array of the same size as the input data.
For input data of [Decimal](../../../sql-reference/data-types/decimal.md) type, the array element type is `Decimal128`.
For other numeric types, the element type is the corresponding `NearestFieldType`.

**Example**

Sample table:

``` sql
CREATE TABLE t
(
    `int` UInt8,
    `float` Float32,
    `dec` Decimal32(2)
)
ENGINE = TinyLog
```

``` text
┌─int─┬─float─┬──dec─┐
│   1 │   1.1 │ 1.10 │
│   2 │   2.2 │ 2.20 │
│   4 │   4.4 │ 4.40 │
│   7 │  7.77 │ 7.77 │
└─────┴───────┴──────┘
```

Queries:

``` sql
SELECT
    groupArrayMovingSum(int) AS I,
    groupArrayMovingSum(float) AS F,
    groupArrayMovingSum(dec) AS D
FROM t
```

``` text
┌─I──────────┬─F───────────────────────────────┬─D──────────────────────┐
│ [1,3,7,14] │ [1.1,3.3000002,7.7000003,15.47] │ [1.10,3.30,7.70,15.47] │
└────────────┴─────────────────────────────────┴────────────────────────┘
```

``` sql
SELECT
    groupArrayMovingSum(2)(int) AS I,
    groupArrayMovingSum(2)(float) AS F,
    groupArrayMovingSum(2)(dec) AS D
FROM t
```

``` text
┌─I──────────┬─F───────────────────────────────┬─D──────────────────────┐
│ [1,3,6,11] │ [1.1,3.3000002,6.6000004,12.17] │ [1.10,3.30,6.60,12.17] │
└────────────┴─────────────────────────────────┴────────────────────────┘
```

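A moving sum keeps a running total: it adds each new value and subtracts the value that falls out of the window. The Python sketch below reproduces the `I` and `D` columns above. It uses exact `Decimal` arithmetic, so it does not model the `Float32` rounding seen in the `F` column. `moving_sum` is a hypothetical name.

```python
from decimal import Decimal


def moving_sum(values, window=None):
    """Sum of the last `window` values at each position (all values so far if window is None)."""
    if window is None:
        window = len(values)
    result = []
    acc = 0
    for i, value in enumerate(values):
        acc += value
        if i >= window:
            acc -= values[i - window]  # drop the value that left the window
        result.append(acc)
    return result


dec_values = [Decimal("1.10"), Decimal("2.20"), Decimal("4.40"), Decimal("7.77")]
print(moving_sum([1, 2, 4, 7]))     # [1, 3, 7, 14]
print(moving_sum([1, 2, 4, 7], 2))  # [1, 3, 6, 11]
```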
@@ -0,0 +1,82 @@
---
toc_priority: 114
---

# groupArraySample {#grouparraysample}

Creates an array of sample argument values.
The size of the resulting array is limited to `max_size` elements. Argument values are selected and added to the array randomly.

**Syntax**

``` sql
groupArraySample(max_size[, seed])(x)
```

**Arguments**

- `max_size` — Maximum size of the resulting array. [UInt64](../../data-types/int-uint.md).
- `seed` — Seed for the random number generator. Optional. [UInt64](../../data-types/int-uint.md). Default value: `123456`.
- `x` — Argument (column name or expression).

**Returned values**

- Array of randomly selected `x` arguments.

Type: [Array](../../../sql-reference/data-types/array.md).

**Examples**

Consider table `colors`:

``` text
┌─id─┬─color──┐
│  1 │ red    │
│  2 │ blue   │
│  3 │ green  │
│  4 │ white  │
│  5 │ orange │
└────┴────────┘
```

Query with a column name as the argument:

``` sql
SELECT groupArraySample(3)(color) as newcolors FROM colors;
```

Result:

```text
┌─newcolors──────────────────┐
│ ['white','blue','green'] │
└────────────────────────────┘
```

Query with a column name and a different seed:

``` sql
SELECT groupArraySample(3, 987654321)(color) as newcolors FROM colors;
```

Result:

```text
┌─newcolors──────────────────┐
│ ['red','orange','green'] │
└────────────────────────────┘
```

Query with an expression as the argument:

``` sql
SELECT groupArraySample(3)(concat('light-', color)) as newcolors FROM colors;
```

Result:

```text
┌─newcolors───────────────────────────────────┐
│ ['light-blue','light-orange','light-green'] │
└─────────────────────────────────────────────┘
```

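A common way to keep a bounded, uniformly random sample of a stream is reservoir sampling (Algorithm R). The Python sketch below uses that technique as an illustration. Whether ClickHouse uses exactly this algorithm is an assumption. Because the sketch uses Python's own random number generator, it will not reproduce ClickHouse's output even with the same seed. `group_array_sample` is a hypothetical name.

```python
import random


def group_array_sample(values, max_size, seed=123456):
    """Reservoir sampling (Algorithm R): each value ends up in the sample with equal probability."""
    rng = random.Random(seed)  # seeded, so repeated calls give the same sample
    reservoir = []
    for i, value in enumerate(values):
        if i < max_size:
            reservoir.append(value)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive bounds: 0 <= j <= i
            if j < max_size:
                reservoir[j] = value  # replace a random slot
    return reservoir


colors = ["red", "blue", "green", "white", "orange"]
print(group_array_sample(colors, 3))
```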
@@ -0,0 +1,48 @@
---
toc_priority: 125
---

# groupBitAnd {#groupbitand}

Applies bitwise `AND` to a series of numbers.

**Syntax**

``` sql
groupBitAnd(expr)
```

**Arguments**

`expr` – An expression that results in `UInt*` type.

**Return value**

Value of the `UInt*` type.

**Example**

Test data:

``` text
binary     decimal
00101100 = 44
00011100 = 28
00001101 = 13
01010101 = 85
```

Query:

``` sql
SELECT groupBitAnd(num) FROM t
```

Where `num` is the column with the test data.

Result:

``` text
binary     decimal
00000100 = 4
```

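The aggregation above is a left fold of bitwise `AND` over the column. The Python sketch below reproduces the result from the same test data. The column is modeled as a plain list named `nums`, which is a hypothetical name.

```python
import operator
from functools import reduce

nums = [44, 28, 13, 85]  # 00101100, 00011100, 00001101, 01010101
result = reduce(operator.and_, nums)  # ((44 & 28) & 13) & 85
print(result, format(result, "08b"))  # 4 00000100
```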
@@ -0,0 +1,46 @@
---
toc_priority: 128
---

# groupBitmap {#groupbitmap}

Performs bitmap or aggregate calculations on an unsigned integer column and returns the cardinality as `UInt64`. If the `-State` suffix is added, it returns a [bitmap object](../../../sql-reference/functions/bitmap-functions.md).

**Syntax**

``` sql
groupBitmap(expr)
```

**Arguments**

`expr` – An expression that results in `UInt*` type.

**Return value**

Value of the `UInt64` type.

**Example**

Test data:

``` text
UserID
1
1
2
3
```

Query:

``` sql
SELECT groupBitmap(UserID) as num FROM t
```

Result:

``` text
num
3
```

@@ -0,0 +1,48 @@
---
toc_priority: 129
---

# groupBitmapAnd {#groupbitmapand}

Calculates the `AND` of a bitmap column and returns the cardinality as `UInt64`. If the `-State` suffix is added, it returns a [bitmap object](../../../sql-reference/functions/bitmap-functions.md).

**Syntax**

``` sql
groupBitmapAnd(expr)
```

**Arguments**

`expr` – An expression that results in `AggregateFunction(groupBitmap, UInt*)` type.

**Return value**

Value of the `UInt64` type.

**Example**

``` sql
DROP TABLE IF EXISTS bitmap_column_expr_test2;
CREATE TABLE bitmap_column_expr_test2
(
    tag_id String,
    z AggregateFunction(groupBitmap, UInt32)
)
ENGINE = MergeTree
ORDER BY tag_id;

INSERT INTO bitmap_column_expr_test2 VALUES ('tag1', bitmapBuild(cast([1,2,3,4,5,6,7,8,9,10] as Array(UInt32))));
INSERT INTO bitmap_column_expr_test2 VALUES ('tag2', bitmapBuild(cast([6,7,8,9,10,11,12,13,14,15] as Array(UInt32))));
INSERT INTO bitmap_column_expr_test2 VALUES ('tag3', bitmapBuild(cast([2,4,6,8,10,12] as Array(UInt32))));

SELECT groupBitmapAnd(z) FROM bitmap_column_expr_test2 WHERE like(tag_id, 'tag%');
┌─groupBitmapAnd(z)─┐
│ 3 │
└───────────────────┘

SELECT arraySort(bitmapToArray(groupBitmapAndState(z))) FROM bitmap_column_expr_test2 WHERE like(tag_id, 'tag%');
┌─arraySort(bitmapToArray(groupBitmapAndState(z)))─┐
│ [6,8,10] │
└──────────────────────────────────────────────────┘
```

@@ -0,0 +1,48 @@
---
toc_priority: 130
---

# groupBitmapOr {#groupbitmapor}

Calculates the `OR` of a bitmap column and returns the cardinality as `UInt64`. If the `-State` suffix is added, it returns a [bitmap object](../../../sql-reference/functions/bitmap-functions.md).

**Syntax**

``` sql
groupBitmapOr(expr)
```

**Arguments**

`expr` – An expression that results in `AggregateFunction(groupBitmap, UInt*)` type.

**Return value**

Value of the `UInt64` type.

**Example**

``` sql
DROP TABLE IF EXISTS bitmap_column_expr_test2;
CREATE TABLE bitmap_column_expr_test2
(
    tag_id String,
    z AggregateFunction(groupBitmap, UInt32)
)
ENGINE = MergeTree
ORDER BY tag_id;

INSERT INTO bitmap_column_expr_test2 VALUES ('tag1', bitmapBuild(cast([1,2,3,4,5,6,7,8,9,10] as Array(UInt32))));
INSERT INTO bitmap_column_expr_test2 VALUES ('tag2', bitmapBuild(cast([6,7,8,9,10,11,12,13,14,15] as Array(UInt32))));
INSERT INTO bitmap_column_expr_test2 VALUES ('tag3', bitmapBuild(cast([2,4,6,8,10,12] as Array(UInt32))));

SELECT groupBitmapOr(z) FROM bitmap_column_expr_test2 WHERE like(tag_id, 'tag%');
┌─groupBitmapOr(z)─┐
│ 15 │
└──────────────────┘

SELECT arraySort(bitmapToArray(groupBitmapOrState(z))) FROM bitmap_column_expr_test2 WHERE like(tag_id, 'tag%');
┌─arraySort(bitmapToArray(groupBitmapOrState(z)))─┐
│ [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15] │
└─────────────────────────────────────────────────┘
```

@ -0,0 +1,48 @@
---
toc_priority: 131
---

# groupBitmapXor {#groupbitmapxor}

Calculates the `XOR` of a bitmap column and returns the cardinality as a `UInt64` value. If the `State` suffix is added, it returns a [bitmap object](../../../sql-reference/functions/bitmap-functions.md) instead.

**Syntax**

``` sql
groupBitmapXor(expr)
```

**Arguments**

`expr` – An expression that results in the `AggregateFunction(groupBitmap, UInt*)` type.

**Returned value**

Value of the `UInt64` type.

**Example**

``` sql
DROP TABLE IF EXISTS bitmap_column_expr_test2;
CREATE TABLE bitmap_column_expr_test2
(
    tag_id String,
    z AggregateFunction(groupBitmap, UInt32)
)
ENGINE = MergeTree
ORDER BY tag_id;

INSERT INTO bitmap_column_expr_test2 VALUES ('tag1', bitmapBuild(cast([1,2,3,4,5,6,7,8,9,10] as Array(UInt32))));
INSERT INTO bitmap_column_expr_test2 VALUES ('tag2', bitmapBuild(cast([6,7,8,9,10,11,12,13,14,15] as Array(UInt32))));
INSERT INTO bitmap_column_expr_test2 VALUES ('tag3', bitmapBuild(cast([2,4,6,8,10,12] as Array(UInt32))));

SELECT groupBitmapXor(z) FROM bitmap_column_expr_test2 WHERE like(tag_id, 'tag%');
┌─groupBitmapXor(z)─┐
│ 10 │
└───────────────────┘

SELECT arraySort(bitmapToArray(groupBitmapXorState(z))) FROM bitmap_column_expr_test2 WHERE like(tag_id, 'tag%');
┌─arraySort(bitmapToArray(groupBitmapXorState(z)))─┐
│ [1,3,5,6,8,10,11,13,14,15] │
└──────────────────────────────────────────────────┘
```

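The example's result follows from folding the symmetric difference over all bitmaps. With sets instead of bitmaps, an element survives only if it occurs in an odd number of the rows. For that reason 6, 8 and 10, which appear in all three bitmaps, are in the result, while 2, 4 and 12, which appear in two, are not. The Python sketch below models only these semantics. `rows` is a hypothetical stand-in for the table.

```python
from functools import reduce

# Hypothetical in-memory stand-in for bitmap_column_expr_test2: tag_id -> bitmap (as a set).
rows = {
    "tag1": set(range(1, 11)),    # [1..10]
    "tag2": set(range(6, 16)),    # [6..15]
    "tag3": {2, 4, 6, 8, 10, 12},
}

def group_bitmap_xor_state(bitmaps):
    """Symmetric difference folded over all bitmaps (models groupBitmapXorState)."""
    return reduce(lambda acc, b: acc ^ b, bitmaps, set())

def group_bitmap_xor(bitmaps):
    """Cardinality of the folded XOR (models groupBitmapXor)."""
    return len(group_bitmap_xor_state(bitmaps))

selected = [bitmap for tag, bitmap in rows.items() if tag.startswith("tag")]
print(group_bitmap_xor(selected))                # 10
print(sorted(group_bitmap_xor_state(selected)))  # [1, 3, 5, 6, 8, 10, 11, 13, 14, 15]
```
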
@ -0,0 +1,48 @@
---
toc_priority: 126
---

# groupBitOr {#groupbitor}

Applies bitwise `OR` to a series of numbers.

**Syntax**

``` sql
groupBitOr(expr)
```

**Arguments**

`expr` – An expression that results in a `UInt*` type.

**Returned value**

Value of the `UInt*` type.

**Example**

Test data:

``` text
binary     decimal
00101100 = 44
00011100 = 28
00001101 = 13
01010101 = 85
```

Query:

``` sql
SELECT groupBitOr(num) FROM t
```

Where `num` is the column with the test data.

Result:

``` text
binary     decimal
01111101 = 125
```

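The example's result can be reproduced by folding Python's `|` operator over the test values. This is a model of the arithmetic only, not ClickHouse code. The starting value 0 is the identity for `OR`, so it does not change a non-empty result.

```python
from functools import reduce
from operator import or_

test_data = [44, 28, 13, 85]  # the `num` column from the example

def group_bit_or(values):
    """Bitwise OR folded over all values (models groupBitOr on unsigned integers)."""
    return reduce(or_, values, 0)

result = group_bit_or(test_data)
print(f"{result:08b} = {result}")  # 01111101 = 125
```
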
@ -0,0 +1,48 @@
---
toc_priority: 127
---

# groupBitXor {#groupbitxor}

Applies bitwise `XOR` to a series of numbers.

**Syntax**

``` sql
groupBitXor(expr)
```

**Arguments**

`expr` – An expression that results in a `UInt*` type.

**Returned value**

Value of the `UInt*` type.

**Example**

Test data:

``` text
binary     decimal
00101100 = 44
00011100 = 28
00001101 = 13
01010101 = 85
```

Query:

``` sql
SELECT groupBitXor(num) FROM t
```

Where `num` is the column with the test data.

Result:

``` text
binary     decimal
01101000 = 104
```

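The same fold with `^` reproduces the `XOR` example: 44 ^ 28 = 48, 48 ^ 13 = 61, and 61 ^ 85 = 104. The Python sketch below models only the arithmetic. Because `XOR` is associative and commutative, the row order does not affect the result.

```python
from functools import reduce
from operator import xor

test_data = [44, 28, 13, 85]  # the `num` column from the example

def group_bit_xor(values):
    """Bitwise XOR folded over all values (models groupBitXor on unsigned integers)."""
    return reduce(xor, values, 0)

result = group_bit_xor(test_data)
print(f"{result:08b} = {result}")  # 01101000 = 104
```
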
@ -0,0 +1,18 @@
---
toc_priority: 111
---

# groupUniqArray {#groupuniqarray}

**Syntax**

``` sql
groupUniqArray(x)
or
groupUniqArray(max_size)(x)
```

Creates an array from distinct argument values. Memory consumption is the same as for the [uniqExact](../../../sql-reference/aggregate-functions/reference/uniqexact.md) function.

The second version (with the `max_size` parameter) limits the size of the resulting array to `max_size` elements.
For example, `groupUniqArray(1)(x)` is equivalent to `[any(x)]`.

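A minimal Python model of the two forms collects distinct values and stops accepting new ones once `max_size` is reached. It assumes that the first distinct values encountered are the ones kept, which is consistent with `groupUniqArray(1)(x)` behaving like `[any(x)]`. ClickHouse does not guarantee any particular element order, so the sketch should not be read as specifying one.

```python
def group_uniq_array(values, max_size=None):
    """Distinct values; with max_size, keeps at most max_size of them.

    Assumption (hypothetical model): the first distinct values seen are kept.
    """
    seen = {}  # dict preserves insertion order and gives O(1) membership checks
    for v in values:
        if max_size is not None and len(seen) >= max_size:
            break
        seen.setdefault(v, None)
    return list(seen)

print(sorted(group_uniq_array([3, 1, 3, 2, 1])))  # [1, 2, 3]
print(group_uniq_array([5, 7, 5], max_size=1))     # [5]
```
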
72
docs/zh/sql-reference/aggregate-functions/reference/index.md
Normal file
@ -0,0 +1,72 @@
---
toc_folder_title: Reference
toc_priority: 36
toc_hidden: true
---

# List of Aggregate Functions {#aggregate-functions-reference}

Standard aggregate functions:

- [count](../../../sql-reference/aggregate-functions/reference/count.md)
- [min](../../../sql-reference/aggregate-functions/reference/min.md)
- [max](../../../sql-reference/aggregate-functions/reference/max.md)
- [sum](../../../sql-reference/aggregate-functions/reference/sum.md)
- [avg](../../../sql-reference/aggregate-functions/reference/avg.md)
- [any](../../../sql-reference/aggregate-functions/reference/any.md)
- [stddevPop](../../../sql-reference/aggregate-functions/reference/stddevpop.md)
- [stddevSamp](../../../sql-reference/aggregate-functions/reference/stddevsamp.md)
- [varPop](../../../sql-reference/aggregate-functions/reference/varpop.md)
- [varSamp](../../../sql-reference/aggregate-functions/reference/varsamp.md)
- [covarPop](../../../sql-reference/aggregate-functions/reference/covarpop.md)
- [covarSamp](../../../sql-reference/aggregate-functions/reference/covarsamp.md)

ClickHouse-specific aggregate functions:

- [anyHeavy](../../../sql-reference/aggregate-functions/reference/anyheavy.md)
- [anyLast](../../../sql-reference/aggregate-functions/reference/anylast.md)
- [argMin](../../../sql-reference/aggregate-functions/reference/argmin.md)
- [argMax](../../../sql-reference/aggregate-functions/reference/argmax.md)
- [avgWeighted](../../../sql-reference/aggregate-functions/reference/avgweighted.md)
- [topK](../../../sql-reference/aggregate-functions/reference/topk.md)
- [topKWeighted](../../../sql-reference/aggregate-functions/reference/topkweighted.md)
- [groupArray](../../../sql-reference/aggregate-functions/reference/grouparray.md)
- [groupUniqArray](../../../sql-reference/aggregate-functions/reference/groupuniqarray.md)
- [groupArrayInsertAt](../../../sql-reference/aggregate-functions/reference/grouparrayinsertat.md)
- [groupArrayMovingAvg](../../../sql-reference/aggregate-functions/reference/grouparraymovingavg.md)
- [groupArrayMovingSum](../../../sql-reference/aggregate-functions/reference/grouparraymovingsum.md)
- [groupBitAnd](../../../sql-reference/aggregate-functions/reference/groupbitand.md)
- [groupBitOr](../../../sql-reference/aggregate-functions/reference/groupbitor.md)
- [groupBitXor](../../../sql-reference/aggregate-functions/reference/groupbitxor.md)
- [groupBitmap](../../../sql-reference/aggregate-functions/reference/groupbitmap.md)
- [groupBitmapAnd](../../../sql-reference/aggregate-functions/reference/groupbitmapand.md)
- [groupBitmapOr](../../../sql-reference/aggregate-functions/reference/groupbitmapor.md)
- [groupBitmapXor](../../../sql-reference/aggregate-functions/reference/groupbitmapxor.md)
- [sumWithOverflow](../../../sql-reference/aggregate-functions/reference/sumwithoverflow.md)
- [sumMap](../../../sql-reference/aggregate-functions/reference/summap.md)
- [minMap](../../../sql-reference/aggregate-functions/reference/minmap.md)
- [maxMap](../../../sql-reference/aggregate-functions/reference/maxmap.md)
- [skewSamp](../../../sql-reference/aggregate-functions/reference/skewsamp.md)
- [skewPop](../../../sql-reference/aggregate-functions/reference/skewpop.md)
- [kurtSamp](../../../sql-reference/aggregate-functions/reference/kurtsamp.md)
- [kurtPop](../../../sql-reference/aggregate-functions/reference/kurtpop.md)
- [uniq](../../../sql-reference/aggregate-functions/reference/uniq.md)
- [uniqExact](../../../sql-reference/aggregate-functions/reference/uniqexact.md)
- [uniqCombined](../../../sql-reference/aggregate-functions/reference/uniqcombined.md)
- [uniqCombined64](../../../sql-reference/aggregate-functions/reference/uniqcombined64.md)
- [uniqHLL12](../../../sql-reference/aggregate-functions/reference/uniqhll12.md)
- [quantile](../../../sql-reference/aggregate-functions/reference/quantile.md)
- [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md)
- [quantileExact](../../../sql-reference/aggregate-functions/reference/quantileexact.md)
- [quantileExactLow](../../../sql-reference/aggregate-functions/reference/quantileexact.md#quantileexactlow)
- [quantileExactHigh](../../../sql-reference/aggregate-functions/reference/quantileexact.md#quantileexacthigh)
- [quantileExactWeighted](../../../sql-reference/aggregate-functions/reference/quantileexactweighted.md)
- [quantileTiming](../../../sql-reference/aggregate-functions/reference/quantiletiming.md)
- [quantileTimingWeighted](../../../sql-reference/aggregate-functions/reference/quantiletimingweighted.md)
- [quantileDeterministic](../../../sql-reference/aggregate-functions/reference/quantiledeterministic.md)
- [quantileTDigest](../../../sql-reference/aggregate-functions/reference/quantiletdigest.md)
- [quantileTDigestWeighted](../../../sql-reference/aggregate-functions/reference/quantiletdigestweighted.md)
- [simpleLinearRegression](../../../sql-reference/aggregate-functions/reference/simplelinearregression.md)
- [stochasticLinearRegression](../../../sql-reference/aggregate-functions/reference/stochasticlinearregression.md)
- [stochasticLogisticRegression](../../../sql-reference/aggregate-functions/reference/stochasticlogisticregression.md)
- [categoricalInformationValue](../../../sql-reference/aggregate-functions/reference/categoricalinformationvalue.md)

@ -0,0 +1,37 @@
---
toc_priority: 150
---

## initializeAggregation {#initializeaggregation}

Initializes aggregation for your input rows. It is intended for functions with the `State` suffix.
Use it for testing or for processing columns of the `AggregateFunction` and `AggregatingMergeTree` types.

**Syntax**

``` sql
initializeAggregation (aggregate_function, column_1, column_2)
```

**Arguments**

- `aggregate_function` — Name of the aggregate function whose state is being created. [String](../../../sql-reference/data-types/string.md#string).
- `column_n` — A column that is passed to that function as its argument. [String](../../../sql-reference/data-types/string.md#string).

**Returned value**

Returns the result of aggregating the input rows. The return type is the same as the return type of the function that `initializeAggregation` takes as its first argument.
For example, for functions with the `State` suffix, the return type is `AggregateFunction`.

**Example**

Query:

```sql
SELECT uniqMerge(state) FROM (SELECT initializeAggregation('uniqState', number % 3) AS state FROM system.numbers LIMIT 10000);
```

Result:

┌─uniqMerge(state)─┐
│ 3 │
└──────────────────┘

@ -0,0 +1,26 @@
---
toc_priority: 153
---

# kurtPop {#kurtpop}

Computes the [kurtosis](https://en.wikipedia.org/wiki/Kurtosis) of a sequence.

**Syntax**

``` sql
kurtPop(expr)
```

**Arguments**

`expr` — An [expression](../../../sql-reference/syntax.md#syntax-expressions) returning a number.

**Returned value**

The kurtosis of the given distribution. Type — [Float64](../../../sql-reference/data-types/float.md).

**Example**

``` sql
SELECT kurtPop(value) FROM series_with_value_column;
```

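A minimal Python sketch of population kurtosis divides the fourth central moment by the square of the second: `m4 / m2**2`. This sketch assumes that `kurtPop` uses this non-excess definition, in which 3 is not subtracted. Check that assumption against the engine before relying on exact values. For `[1, 2, 3, 4]` the moments are `m2 = 1.25` and `m4 = 2.5625`, which gives 1.64.

```python
def kurt_pop(values):
    """Population kurtosis m4 / m2**2 (non-excess: no 3 is subtracted).

    Assumption: this is the definition kurtPop follows. Expects a non-empty,
    non-constant sequence (otherwise m2 is 0 or n is 0).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2

print(kurt_pop([1, 2, 3, 4]))  # 1.64
```
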
@ -0,0 +1,28 @@
---
toc_priority: 154
---

# kurtSamp {#kurtsamp}

Computes the [sample kurtosis](https://en.wikipedia.org/wiki/Kurtosis) of a sequence.
If the passed values form a sample, the result is an unbiased estimate of the kurtosis of a random variable.

**Syntax**

``` sql
kurtSamp(expr)
```

**Arguments**

`expr` — An [expression](../../../sql-reference/syntax.md#syntax-expressions) returning a number.

**Returned value**

The kurtosis of the given distribution. Type — [Float64](../../../sql-reference/data-types/float.md). If `n <= 1` (`n` is the size of the sample), the function returns `nan`.

**Example**

``` sql
SELECT kurtSamp(value) FROM series_with_value_column;
```

@ -0,0 +1,72 @@
---
toc_priority: 310
toc_title: mannWhitneyUTest
---

# mannWhitneyUTest {#mannwhitneyutest}

Applies the Mann-Whitney rank test to samples from two populations.

**Syntax**

``` sql
mannWhitneyUTest[(alternative[, continuity_correction])](sample_data, sample_index)
```

The values of both samples are in the `sample_data` column. If `sample_index` equals 0, the value in that row belongs to the sample from the first population. Otherwise, it belongs to the sample from the second population.
The null hypothesis is that the two populations are stochastically equal. One-sided hypotheses can also be tested. The test does not assume that the data are normally distributed.

**Arguments**

- `sample_data` — Sample data. [Integer](../../../sql-reference/data-types/int-uint.md), [Float](../../../sql-reference/data-types/float.md) or [Decimal](../../../sql-reference/data-types/decimal.md).
- `sample_index` — Sample index. [Integer](../../../sql-reference/data-types/int-uint.md).

**Parameters**

- `alternative` — Alternative hypothesis. (Optional, default: `'two-sided'`.) [String](../../../sql-reference/data-types/string.md).
    - `'two-sided'`;
    - `'greater'`;
    - `'less'`.
- `continuity_correction` — If not 0, a continuity correction is applied in the normal approximation of the p-value. (Optional, default: 1.) [UInt64](../../../sql-reference/data-types/int-uint.md).

**Returned value**

A [Tuple](../../../sql-reference/data-types/tuple.md) with two elements:

- The calculated U-statistic. [Float64](../../../sql-reference/data-types/float.md).
- The calculated p-value. [Float64](../../../sql-reference/data-types/float.md).

**Example**

Input table:

``` text
┌─sample_data─┬─sample_index─┐
│ 10 │ 0 │
│ 11 │ 0 │
│ 12 │ 0 │
│ 1 │ 1 │
│ 2 │ 1 │
│ 3 │ 1 │
└─────────────┴──────────────┘
```

Query:

``` sql
SELECT mannWhitneyUTest('greater')(sample_data, sample_index) FROM mww_ttest;
```

Result:

``` text
┌─mannWhitneyUTest('greater')(sample_data, sample_index)─┐
│ (9,0.04042779918503192) │
└────────────────────────────────────────────────────────┘
```

**See Also**

- [Mann–Whitney U test](https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test)
- [Stochastic ordering](https://en.wikipedia.org/wiki/Stochastic_ordering)

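The normal approximation behind the p-value can be sketched in Python. The sketch computes U for the first sample by pairwise comparison, counting ties as one half. It then converts U to a z-score using the mean `n1*n2/2` and the standard deviation `sqrt(n1*n2*(n1+n2+1)/12)`, with an optional continuity correction of 0.5. The sketch omits the tie correction of the variance, so it is an illustration rather than ClickHouse's implementation. For the example table it yields U = 9 and p ≈ 0.0404, which agrees with the output above.

```python
import math

def mann_whitney_u_test(sample_data, sample_index, alternative="two-sided", continuity_correction=1):
    """Normal-approximation sketch of the Mann-Whitney U test.

    Rows with sample_index == 0 form the first sample, all others the second.
    Assumption: no tie correction is applied to the variance.
    """
    first = [x for x, i in zip(sample_data, sample_index) if i == 0]
    second = [x for x, i in zip(sample_data, sample_index) if i != 0]
    n1, n2 = len(first), len(second)
    # U of the first sample: pairs where it is larger, ties count as one half.
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in first for y in second)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    cc = 0.5 if continuity_correction else 0.0

    def upper_tail(z):
        # P(Z >= z) for a standard normal variable Z
        return 0.5 * math.erfc(z / math.sqrt(2))

    if alternative == "greater":
        p = upper_tail((u - mean - cc) / sd)
    elif alternative == "less":
        p = upper_tail((mean - u - cc) / sd)
    elif alternative == "two-sided":
        p = min(1.0, 2 * upper_tail((abs(u - mean) - cc) / sd))
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return u, p

u, p = mann_whitney_u_test([10, 11, 12, 1, 2, 3], [0, 0, 0, 1, 1, 1], "greater")
print(u, round(p, 4))  # 9.0 0.0404
```
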
@ -0,0 +1,7 @@
---
toc_priority: 3
---

# max {#agg_function-max}

Calculates the maximum.

@ -0,0 +1,33 @@
---
toc_priority: 143
---

# maxMap {#agg_functions-maxmap}

**Syntax**

```sql
maxMap(key, value)
or
maxMap(Tuple(key, value))
```

Calculates the maximum from the `value` array according to the keys specified in the `key` array.

Passing a tuple of `key` and `value` arrays is identical to passing two separate arrays of keys and values.
The number of elements in `key` and `value` must be the same for each row that is aggregated.
Returns a tuple of two arrays: the keys in sorted order, and the values calculated (the maximum) for the corresponding keys.

**Example**

``` sql
SELECT maxMap(a, b)
FROM values('a Array(Int32), b Array(Int64)', ([1, 2], [2, 2]), ([2, 3], [1, 1]))
```

``` text
┌─maxMap(a, b)──────┐
│ ([1,2,3],[2,2,1]) │
└───────────────────┘
```

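The per-key semantics can be modelled with a Python dictionary. Each row contributes `(keys, values)` pairs, the maximum is kept per key, and the keys are returned sorted together with their values. The sketch is a model of the behaviour described above, not ClickHouse code, and it reproduces the example's `([1,2,3],[2,2,1])`.

```python
def max_map(rows):
    """rows: iterable of (keys, values) pairs of equal length.

    Returns (sorted keys, per-key maximum) as two lists.
    """
    result = {}
    for keys, values in rows:
        if len(keys) != len(values):
            raise ValueError("key and value arrays must have the same length")
        for k, v in zip(keys, values):
            result[k] = v if k not in result else max(result[k], v)
    ordered = sorted(result)
    return ordered, [result[k] for k in ordered]

print(max_map([([1, 2], [2, 2]), ([2, 3], [1, 1])]))  # ([1, 2, 3], [2, 2, 1])
```
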
@ -0,0 +1,41 @@
# median {#median}

The `median*` functions are aliases for the corresponding `quantile*` functions. They calculate the median of a numeric data sample.

Functions:

- `median` — Alias for [quantile](../../../sql-reference/aggregate-functions/reference/quantile.md#quantile).
- `medianDeterministic` — Alias for [quantileDeterministic](../../../sql-reference/aggregate-functions/reference/quantiledeterministic.md#quantiledeterministic).
- `medianExact` — Alias for [quantileExact](../../../sql-reference/aggregate-functions/reference/quantileexact.md#quantileexact).
- `medianExactWeighted` — Alias for [quantileExactWeighted](../../../sql-reference/aggregate-functions/reference/quantileexactweighted.md#quantileexactweighted).
- `medianTiming` — Alias for [quantileTiming](../../../sql-reference/aggregate-functions/reference/quantiletiming.md#quantiletiming).
- `medianTimingWeighted` — Alias for [quantileTimingWeighted](../../../sql-reference/aggregate-functions/reference/quantiletimingweighted.md#quantiletimingweighted).
- `medianTDigest` — Alias for [quantileTDigest](../../../sql-reference/aggregate-functions/reference/quantiletdigest.md#quantiletdigest).
- `medianTDigestWeighted` — Alias for [quantileTDigestWeighted](../../../sql-reference/aggregate-functions/reference/quantiletdigestweighted.md#quantiletdigestweighted).

**Example**

Input table:

``` text
┌─val─┐
│ 1 │
│ 1 │
│ 2 │
│ 3 │
└─────┘
```

Query:

``` sql
SELECT medianDeterministic(val, 1) FROM t
```

Result:

``` text
┌─medianDeterministic(val, 1)─┐
│ 1.5 │
└─────────────────────────────┘
```

@ -0,0 +1,7 @@
---
toc_priority: 2
---

## min {#agg_function-min}

Calculates the minimum.

@ -0,0 +1,32 @@
---
toc_priority: 142
---

# minMap {#agg_functions-minmap}

**Syntax**

```sql
minMap(key, value)
or
minMap(Tuple(key, value))
```

Calculates the minimum from the `value` array according to the keys specified in the `key` array.

Passing a tuple of `key` and `value` arrays is identical to passing two separate arrays of keys and values.
The number of elements in `key` and `value` must be the same for each row that is aggregated.
Returns a tuple of two arrays: the keys in sorted order, and the values calculated (the minimum) for the corresponding keys.

**Example**

``` sql
SELECT minMap(a, b)
FROM values('a Array(Int32), b Array(Int64)', ([1, 2], [2, 2]), ([2, 3], [1, 1]))
```

``` text
┌─minMap(a, b)──────┐
│ ([1,2,3],[2,1,1]) │
└───────────────────┘
```

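As a plain-Python model of the description above, each row's keys and values are zipped, the smallest value is kept per key, and the keys are returned sorted. For key 2 the two rows supply 2 and 1, so the minimum is 1, which matches the example output `([1,2,3],[2,1,1])`. This is an illustration of the semantics only.

```python
def min_map(rows):
    """rows: iterable of (keys, values) pairs of equal length.

    Returns (sorted keys, per-key minimum) as two lists.
    """
    result = {}
    for keys, values in rows:
        if len(keys) != len(values):
            raise ValueError("key and value arrays must have the same length")
        for k, v in zip(keys, values):
            result[k] = v if k not in result else min(result[k], v)
    ordered = sorted(result)
    return ordered, [result[k] for k in ordered]

print(min_map([([1, 2], [2, 2]), ([2, 3], [1, 1])]))  # ([1, 2, 3], [2, 1, 1])
```
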
@ -0,0 +1,65 @@
---
toc_priority: 200
---

# quantile {#quantile}

Computes an approximate [quantile](https://en.wikipedia.org/wiki/Quantile) of a numeric data sequence.

This function applies [reservoir sampling](https://en.wikipedia.org/wiki/Reservoir_sampling) with a reservoir size of up to 8192 and a random number generator for sampling.
The result is non-deterministic. To get an exact quantile, use the [quantileExact](../../../sql-reference/aggregate-functions/reference/quantileexact.md#quantileexact) function.

When using multiple `quantile*` functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles) function.

**Syntax**

``` sql
quantile(level)(expr)
```

Alias: `median`.

**Arguments**

- `level` — Level of quantile. Optional parameter. A constant floating-point number from 0 to 1. We recommend using a `level` value in the range of `[0.01, 0.99]`. Default value: 0.5. At `level=0.5` the function calculates the [median](https://en.wikipedia.org/wiki/Median).
- `expr` — Expression over the column values resulting in numeric [data types](../../../sql-reference/data-types/index.md#data_types), [Date](../../../sql-reference/data-types/date.md) or [DateTime](../../../sql-reference/data-types/datetime.md).

**Returned value**

- Approximate quantile of the specified level.

Type:

- [Float64](../../../sql-reference/data-types/float.md) for numeric data type input.
- [Date](../../../sql-reference/data-types/date.md) if input values have the `Date` type.
- [DateTime](../../../sql-reference/data-types/datetime.md) if input values have the `DateTime` type.

**Example**

Input table:

``` text
┌─val─┐
│ 1 │
│ 1 │
│ 2 │
│ 3 │
└─────┘
```

Query:

``` sql
SELECT quantile(val) FROM t
```

Result:

``` text
┌─quantile(val)─┐
│ 1.5 │
└───────────────┘
```

**See Also**

- [median](../../../sql-reference/aggregate-functions/reference/median.md#median)
- [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles)

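The two ingredients described above can be sketched in Python: reservoir sampling (the classic Algorithm R) with a capacity of 8192, followed by a quantile taken from the sample. The sketch assumes linear interpolation between the two nearest order statistics at position `level * (n - 1)`. That assumption reproduces the `1.5` in the example above, but ClickHouse's exact random number generator and interpolation details are not specified here. While the input has at most 8192 values, the reservoir holds all of them and the result does not depend on randomness.

```python
import random

def reservoir_sample(values, capacity=8192, rng=None):
    """Algorithm R: a uniform random sample of at most `capacity` items."""
    rng = rng or random.Random()
    sample = []
    for i, v in enumerate(values):
        if i < capacity:
            sample.append(v)
        else:
            j = rng.randint(0, i)  # inclusive bounds
            if j < capacity:
                sample[j] = v
    return sample

def quantile_interpolated(sample, level=0.5):
    """Assumed rule: linear interpolation at position level * (n - 1) of the sorted sample."""
    s = sorted(sample)
    pos = level * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

print(quantile_interpolated(reservoir_sample([1, 1, 2, 3])))  # 1.5
```
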
@ -0,0 +1,66 @@
---
toc_priority: 206
---

# quantileDeterministic {#quantiledeterministic}

Computes an approximate [quantile](https://en.wikipedia.org/wiki/Quantile) of a numeric data sequence.

This function applies [reservoir sampling](https://en.wikipedia.org/wiki/Reservoir_sampling) with a reservoir size of up to 8192 and a deterministic sampling algorithm. The result is deterministic. To get an exact quantile, use the [quantileExact](../../../sql-reference/aggregate-functions/reference/quantileexact.md#quantileexact) function.

When using multiple `quantile*` functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles) function.

**Syntax**

``` sql
quantileDeterministic(level)(expr, determinator)
```

Alias: `medianDeterministic`.

**Arguments**

- `level` — Level of quantile. Optional parameter. A constant floating-point number from 0 to 1. We recommend using a `level` value in the range of `[0.01, 0.99]`. Default value: 0.5. At `level=0.5` the function calculates the [median](https://en.wikipedia.org/wiki/Median).
- `expr` — Expression over the column values resulting in numeric [data types](../../../sql-reference/data-types/index.md#data_types), [Date](../../../sql-reference/data-types/date.md) or [DateTime](../../../sql-reference/data-types/datetime.md).
- `determinator` — A number whose hash is used instead of a random number generator in the reservoir sampling algorithm, which makes the sampling result deterministic. Any deterministic positive number can be used as a determinator, for example a user ID or an event ID. If the same determinator value occurs too often, the function works incorrectly.

**Returned value**

- Approximate quantile of the specified level.

Type:

- [Float64](../../../sql-reference/data-types/float.md) for numeric data type input.
- [Date](../../../sql-reference/data-types/date.md) if input values have the `Date` type.
- [DateTime](../../../sql-reference/data-types/datetime.md) if input values have the `DateTime` type.

**Example**

Input table:

``` text
┌─val─┐
│ 1 │
│ 1 │
│ 2 │
│ 3 │
└─────┘
```

Query:

``` sql
SELECT quantileDeterministic(val, 1) FROM t
```

Result:

``` text
┌─quantileDeterministic(val, 1)─┐
│ 1.5 │
└───────────────────────────────┘
```

**See Also**

- [median](../../../sql-reference/aggregate-functions/reference/median.md#median)
- [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles)

@ -0,0 +1,170 @@
---
toc_priority: 202
---

# quantileExact {#quantileexact}

Exactly computes the [quantile](https://en.wikipedia.org/wiki/Quantile) of a numeric data sequence.

To get the exact value, all the passed values are combined into an array, which is then partially sorted. Therefore, the function consumes `O(n)` memory, where `n` is the number of values passed. However, for a small number of values, the function is very effective.

When using multiple `quantile*` functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles) function.

**Syntax**

``` sql
quantileExact(level)(expr)
```

Alias: `medianExact`.

**Arguments**

- `level` — Level of quantile. Optional parameter. A constant floating-point number from 0 to 1. We recommend using a `level` value in the range of `[0.01, 0.99]`. Default value: 0.5. At `level=0.5` the function calculates the [median](https://en.wikipedia.org/wiki/Median).
- `expr` — Expression over the column values resulting in numeric [data types](../../../sql-reference/data-types/index.md#data_types), [Date](../../../sql-reference/data-types/date.md) or [DateTime](../../../sql-reference/data-types/datetime.md).

**Returned value**

- Quantile of the specified level.

Type:

- [Float64](../../../sql-reference/data-types/float.md) for numeric data type input.
- [Date](../../../sql-reference/data-types/date.md) if input values have the `Date` type.
- [DateTime](../../../sql-reference/data-types/datetime.md) if input values have the `DateTime` type.

**Example**

Query:

``` sql
SELECT quantileExact(number) FROM numbers(10)
```

Result:

``` text
┌─quantileExact(number)─┐
│ 5 │
└───────────────────────┘
```

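The partial-sort idea can be sketched in Python with `heapq.nsmallest`, which finds the k + 1 smallest elements without fully sorting the input. The sketch assumes that the selected index is `int(level * n)`, or `n - 1` for `level >= 1`. That assumption is consistent with `quantileExact(number)` over `numbers(10)` returning 5 rather than 4.5. The sketch is an illustration, not the engine's code.

```python
import heapq

def quantile_exact(values, level=0.5):
    """Element at index int(level * n) of the sorted data (n - 1 when level >= 1).

    heapq.nsmallest avoids fully sorting the input, a rough analogue of partial sorting.
    Assumption: this index rule matches quantileExact.
    """
    data = list(values)
    n = len(data)
    k = int(level * n) if level < 1 else n - 1
    return heapq.nsmallest(k + 1, data)[-1]

print(quantile_exact(range(10)))  # 5
```
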
# quantileExactLow {#quantileexactlow}

Similar to `quantileExact`, this computes the exact [quantile](https://en.wikipedia.org/wiki/Quantile) of a numeric data sequence.

To get the exact value, all the passed values are combined into an array, which is then fully sorted. The complexity of the [sorting algorithm](https://en.cppreference.com/w/cpp/algorithm/sort) is `O(N·log(N))` comparisons, where `N = std::distance(first, last)`.

The return value depends on the quantile level and the number of elements in the selection. If the level is 0.5, the function returns the lower median value for an even number of elements and the middle value for an odd number of elements. The median is calculated similarly to the [median_low](https://docs.python.org/3/library/statistics.html#statistics.median_low) implementation used in Python.

For all other levels, the function returns the element at the index given by `level * size_of_array`.

For example:

``` sql
SELECT quantileExactLow(0.1)(number) FROM numbers(10)

┌─quantileExactLow(0.1)(number)─┐
│ 1 │
└───────────────────────────────┘
```

When using multiple `quantile*` functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles) function.

**Syntax**

``` sql
quantileExactLow(level)(expr)
```

Alias: `medianExactLow`.

**Arguments**

- `level` — Level of quantile. Optional parameter. A constant floating-point number from 0 to 1. We recommend using a `level` value in the range of `[0.01, 0.99]`. Default value: 0.5. At `level=0.5` the function calculates the [median](https://en.wikipedia.org/wiki/Median).
- `expr` — Expression over the column values resulting in numeric [data types](../../../sql-reference/data-types/index.md#data_types), [Date](../../../sql-reference/data-types/date.md) or [DateTime](../../../sql-reference/data-types/datetime.md).

**Returned value**

- Quantile of the specified level.

Type:

- [Float64](../../../sql-reference/data-types/float.md) for numeric data type input.
- [Date](../../../sql-reference/data-types/date.md) if input values have the `Date` type.
- [DateTime](../../../sql-reference/data-types/datetime.md) if input values have the `DateTime` type.

**Example**

Query:

``` sql
SELECT quantileExactLow(number) FROM numbers(10)
```

Result:

``` text
┌─quantileExactLow(number)─┐
│ 4 │
└──────────────────────────┘
```

# quantileExactHigh {#quantileexacthigh}

Similar to `quantileExact`, this computes the exact [quantile](https://en.wikipedia.org/wiki/Quantile) of a numeric data sequence.

To get the exact value, all the passed values are combined into an array, which is then fully sorted. The complexity of the [sorting algorithm](https://en.cppreference.com/w/cpp/algorithm/sort) is `O(N·log(N))` comparisons, where `N = std::distance(first, last)`.

The return value depends on the quantile level and the number of elements in the selection. If the level is 0.5, the function returns the higher median value for an even number of elements and the middle value for an odd number of elements. The median is calculated similarly to the [median_high](https://docs.python.org/3/library/statistics.html#statistics.median_high) implementation used in Python.

For all other levels, the function returns the element at the index given by `level * size_of_array`.

This implementation behaves exactly like the current `quantileExact` implementation.

When using multiple `quantile*` functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles) function.

**Syntax**

``` sql
quantileExactHigh(level)(expr)
```

Alias: `medianExactHigh`.

**Arguments**

- `level` — Level of quantile. Optional parameter. A constant floating-point number from 0 to 1. We recommend using a `level` value in the range of `[0.01, 0.99]`. Default value: 0.5. At `level=0.5` the function calculates the [median](https://en.wikipedia.org/wiki/Median).
- `expr` — Expression over the column values resulting in numeric [data types](../../../sql-reference/data-types/index.md#data_types), [Date](../../../sql-reference/data-types/date.md) or [DateTime](../../../sql-reference/data-types/datetime.md).

**Returned value**

- Quantile of the specified level.

Type:

- [Float64](../../../sql-reference/data-types/float.md) for numeric data type input.
- [Date](../../../sql-reference/data-types/date.md) if input values have the `Date` type.
- [DateTime](../../../sql-reference/data-types/datetime.md) if input values have the `DateTime` type.

**Example**

Query:

``` sql
SELECT quantileExactHigh(number) FROM numbers(10)
```

Result:

``` text
┌─quantileExactHigh(number)─┐
│ 5 │
└───────────────────────────┘
```

**See Also**

- [median](../../../sql-reference/aggregate-functions/reference/median.md#median)
- [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles)

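The index rules described for `quantileExactLow` and `quantileExactHigh` can be sketched in Python and compared with `statistics.median_low` and `statistics.median_high`. At level 0.5 the low variant picks index `(n - 1) // 2` and the high variant picks `n // 2`. Other levels use `int(level * n)`, clamped to the last index. This is a model of the documented behaviour, and the clamping at `level = 1` is an assumption of the sketch.

```python
import statistics

def quantile_exact_low(values, level=0.5):
    data = sorted(values)
    n = len(data)
    if level == 0.5:
        return data[(n - 1) // 2]  # lower median for even n, middle element for odd n
    return data[min(int(level * n), n - 1)]

def quantile_exact_high(values, level=0.5):
    data = sorted(values)
    n = len(data)
    # At level 0.5 this is index n // 2: the higher median for even n, the middle element for odd n.
    return data[min(int(level * n), n - 1)]

nums = range(10)
print(quantile_exact_low(nums), quantile_exact_high(nums))  # 4 5
print(quantile_exact_low(nums, 0.1))                        # 1
assert quantile_exact_low(nums) == statistics.median_low(nums)
assert quantile_exact_high(nums) == statistics.median_high(nums)
```
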
@ -0,0 +1,66 @@
---
toc_priority: 203
---

# quantileExactWeighted {#quantileexactweighted}

Exactly computes the [quantile](https://en.wikipedia.org/wiki/Quantile) of a numeric data sequence, taking into account the weight of each element.

To get the exact value, all the passed values are combined into an array, which is then partially sorted. Each value is counted with its weight, as if it were present `weight` times. A hash table is used in the algorithm. Because of this, if the passed values are frequently repeated, the function consumes less RAM than [quantileExact](../../../sql-reference/aggregate-functions/reference/quantileexact.md#quantileexact). You can use this function instead of `quantileExact` and specify a `weight` of 1.

When using multiple `quantile*` functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles) function.

**Syntax**

``` sql
quantileExactWeighted(level)(expr, weight)
```

Alias: `medianExactWeighted`.

**Arguments**

- `level` — Level of quantile. Optional parameter. A constant floating-point number from 0 to 1. We recommend using a `level` value in the range of `[0.01, 0.99]`. Default value: 0.5. At `level=0.5` the function calculates the [median](https://en.wikipedia.org/wiki/Median).
- `expr` — Expression over the column values resulting in numeric [data types](../../../sql-reference/data-types/index.md#data_types), [Date](../../../sql-reference/data-types/date.md) or [DateTime](../../../sql-reference/data-types/datetime.md).
- `weight` — Column with the weights of the sequence members. A weight is the number of occurrences of a value.

**Returned value**

- Quantile of the specified level.

Type:

- [Float64](../../../sql-reference/data-types/float.md) for numeric data type input.
- [Date](../../../sql-reference/data-types/date.md) if input values have the `Date` type.
- [DateTime](../../../sql-reference/data-types/datetime.md) if input values have the `DateTime` type.

**Example**

Input table:

``` text
┌─n─┬─val─┐
│ 0 │ 3 │
│ 1 │ 2 │
│ 2 │ 1 │
│ 5 │ 4 │
└───┴─────┘
```

Query:

``` sql
SELECT quantileExactWeighted(n, val) FROM t
```

Result:

``` text
┌─quantileExactWeighted(n, val)─┐
│ 1 │
└───────────────────────────────┘
```

**See Also**

- [median](../../../sql-reference/aggregate-functions/reference/median.md#median)
- [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles)

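The weighted approach can be sketched in Python with `collections.Counter` as the hash table. Repeated values share one entry whose weight accumulates. The distinct values are then walked in sorted order until the cumulative weight reaches a threshold. The sketch assumes the threshold is `ceil(level * total_weight)`. For the example table the total weight is 10, the threshold is 5, and the cumulative weights are 3 for 0 and 5 for 1, so the result is 1, which matches the output above.

```python
import math
from collections import Counter

def quantile_exact_weighted(pairs, level=0.5):
    """pairs: iterable of (value, weight); each value counts as if repeated `weight` times.

    Assumption: returns the first value (in sorted order) whose cumulative
    weight reaches ceil(level * total_weight).
    """
    weights = Counter()
    for value, weight in pairs:
        weights[value] += weight  # hash table: repeated values share one entry
    total = sum(weights.values())
    threshold = math.ceil(level * total)
    accumulated = 0
    for value in sorted(weights):
        accumulated += weights[value]
        if accumulated >= threshold:
            return value
    raise ValueError("empty input")

print(quantile_exact_weighted([(0, 3), (1, 2), (2, 1), (5, 4)]))  # 1
```
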
@ -0,0 +1,12 @@
---
toc_priority: 201
---

# quantiles {#quantiles}

**Syntax**

``` sql
quantiles(level1, level2, …)(x)
```

All the quantile functions also have corresponding quantiles functions: `quantiles`, `quantilesDeterministic`, `quantilesTiming`, `quantilesTimingWeighted`, `quantilesExact`, `quantilesExactWeighted`, `quantilesTDigest`. These functions calculate all the quantiles of the listed levels in one pass and return an array of the resulting values.

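The benefit of computing several levels at once can be illustrated in Python. The sketch sorts the data once and answers every requested level from that same sorted copy. Separate `quantile*` calls would each keep their own state. The index rule `int(level * n)` is borrowed from the exact-quantile sketch as an assumption, so this models the idea behind `quantilesExact` rather than any specific implementation.

```python
def quantiles_exact(values, *levels):
    """One sort shared by all levels; index int(level * n), or n - 1 when level >= 1 (assumed rule)."""
    data = sorted(values)
    n = len(data)
    return [data[int(level * n) if level < 1 else n - 1] for level in levels]

print(quantiles_exact(range(10), 0.1, 0.5, 0.9))  # [1, 5, 9]
```
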
@ -0,0 +1,57 @@
---
toc_priority: 207
---

# quantileTDigest {#quantiletdigest}

Computes an approximate [quantile](https://en.wikipedia.org/wiki/Quantile) of a numeric data sequence using the [t-digest](https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf) algorithm.

The maximum error is 1%. Memory consumption is `log(n)`, where `n` is the number of values. The result depends on the order in which the query is run, and is nondeterministic.

The performance of this function is lower than that of [quantile](../../../sql-reference/aggregate-functions/reference/quantile.md#quantile) or [quantileTiming](../../../sql-reference/aggregate-functions/reference/quantiletiming.md#quantiletiming). In terms of the ratio of state size to precision, this function is better than `quantile`.

When using multiple `quantile*` functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles) function.

**Syntax**

``` sql
quantileTDigest(level)(expr)
```

Alias: `medianTDigest`.

**Arguments**

- `level` — Level of quantile. Optional parameter. A constant floating-point number from 0 to 1. We recommend using a `level` value in the range of `[0.01, 0.99]`. Default value: 0.5. At `level=0.5` the function calculates the [median](https://en.wikipedia.org/wiki/Median).
- `expr` — Expression over the column values resulting in numeric [data types](../../../sql-reference/data-types/index.md#data_types), [Date](../../../sql-reference/data-types/date.md) or [DateTime](../../../sql-reference/data-types/datetime.md).

**Returned value**

- Approximate quantile of the specified level.

Type:

- [Float64](../../../sql-reference/data-types/float.md) for numeric data type input.
- [Date](../../../sql-reference/data-types/date.md) if input values have the `Date` type.
- [DateTime](../../../sql-reference/data-types/datetime.md) if input values have the `DateTime` type.

**Example**

Query:

``` sql
SELECT quantileTDigest(number) FROM numbers(10)
```

Result:

``` text
┌─quantileTDigest(number)─┐
│ 4.5 │
└─────────────────────────┘
```

**See Also**

- [median](../../../sql-reference/aggregate-functions/reference/median.md#median)
- [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles)

@ -0,0 +1,58 @@
|
||||
---
|
||||
toc_priority: 208
|
||||
---
|
||||
|
||||
# quantileTDigestWeighted {#quantiletdigestweighted}
|
||||
|
||||
使用[t-digest](https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf) 算法计算数字序列近似[分位数](https://en.wikipedia.org/wiki/Quantile)。该函数考虑了每个序列成员的权重。最大误差为1%。 内存消耗为 `log(n)`,这里 `n` 是值的个数。
|
||||
|
||||
该函数的性能低于 [quantile](../../../sql-reference/aggregate-functions/reference/quantile.md#quantile) 或 [quantileTiming](../../../sql-reference/aggregate-functions/reference/quantiletiming.md#quantiletiming) 的性能。 从状态大小和精度的比值来看,这个函数比 `quantile` 更优秀。
|
||||
|
||||
结果取决于运行查询的顺序,并且是不确定的。
|
||||
|
||||
当在一个查询中使用多个不同层次的 `quantile*` 时,内部状态不会被组合(即查询的工作效率低于组合情况)。在这种情况下,使用 [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles) 函数。
|
||||
|
||||
**语法**
|
||||
|
||||
``` sql
|
||||
quantileTDigestWeighted(level)(expr, weight)
|
||||
```
|
||||
|
||||
别名: `medianTDigestWeighted`。
|
||||
|
||||
**参数**
|
||||
|
||||
- `level` — 分位数层次。可选参数。从0到1的一个float类型的常量。我们推荐 `level` 值的范围为 `[0.01, 0.99]` 。默认值:0.5。 当 `level=0.5` 时,该函数计算 [中位数](https://en.wikipedia.org/wiki/Median)。
|
||||
- `expr` — 求值表达式,类型为数值类型[data types](../../../sql-reference/data-types/index.md#data_types), [Date](../../../sql-reference/data-types/date.md) 或 [DateTime](../../../sql-reference/data-types/datetime.md)。
|
||||
- `weight` — 权重序列。 权重是一个数据出现的数值。
|
||||
|
||||
**返回值**
|
||||
|
||||
- 指定层次的分位数。
|
||||
|
||||
类型:
|
||||
|
||||
- [Float64](../../../sql-reference/data-types/float.md) 用于数字数据类型输入。
|
||||
- [Date](../../../sql-reference/data-types/date.md) 如果输入值是 `Date` 类型。
|
||||
- [DateTime](../../../sql-reference/data-types/datetime.md) 如果输入值是 `DateTime` 类型。
|
||||
|
||||
**示例**
|
||||
|
||||
查询:
|
||||
|
||||
``` sql
|
||||
SELECT quantileTDigestWeighted(number, 1) FROM numbers(10)
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
``` text
|
||||
┌─quantileTDigestWeighted(number, 1)─┐
|
||||
│ 4.5 │
|
||||
└────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**参见**
|
||||
|
||||
- [中位数](../../../sql-reference/aggregate-functions/reference/median.md#median)
|
||||
- [分位数](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles)
|
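
For comparison, the exact quantile that `quantileTDigestWeighted` approximates can be sketched in plain Python by treating each integer weight as a repeat count, as the `weight` description says. The function name `exact_weighted_quantile` is hypothetical, and linear interpolation between neighbouring sorted values is an assumption chosen because it reproduces the `4.5` from the example; this is not ClickHouse's t-digest code.

```python
import math


def exact_weighted_quantile(values, weights, level=0.5):
    """Exact quantile where each integer weight is a repeat count.

    Linear interpolation between neighbouring sorted values is an
    assumption of this sketch, not a documented property of t-digest.
    """
    expanded = sorted(v for v, w in zip(values, weights) for _ in range(w))
    if not expanded:
        return math.nan
    pos = level * (len(expanded) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(expanded) - 1)
    return expanded[lo] + (expanded[hi] - expanded[lo]) * (pos - lo)


print(exact_weighted_quantile(range(10), [1] * 10))  # 4.5
```

Expanding the weights costs memory proportional to their sum, which is exactly the kind of cost a t-digest avoids by keeping only a compact summary.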
@ -0,0 +1,86 @@
|
||||
---
|
||||
toc_priority: 204
|
||||
---
|
||||
|
||||
# quantileTiming {#quantiletiming}
|
||||
|
||||
使用确定的精度计算数字数据序列的[分位数](https://en.wikipedia.org/wiki/Quantile)。
|
||||
|
||||
结果是确定性的(它不依赖于查询处理顺序)。该函数针对描述加载网页时间或后端响应时间等分布的序列进行了优化。
|
||||
|
||||
当在一个查询中使用多个不同层次的 `quantile*` 时,内部状态不会被组合(即查询的工作效率低于组合情况)。在这种情况下,使用[quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles)函数。
|
||||
|
||||
**语法**
|
||||
|
||||
``` sql
|
||||
quantileTiming(level)(expr)
|
||||
```
|
||||
|
||||
别名: `medianTiming`。
|
||||
|
||||
**参数**
|
||||
|
||||
- `level` — 分位数层次。可选参数。从0到1的一个float类型的常量。我们推荐 `level` 值的范围为 `[0.01, 0.99]` 。默认值:0.5。当 `level=0.5` 时,该函数计算 [中位数](https://en.wikipedia.org/wiki/Median)。
|
||||
- `expr` — 求值[表达式](../../../sql-reference/syntax.md#syntax-expressions) 返回 [Float\*](../../../sql-reference/data-types/float.md) 类型数值。
|
||||
|
||||
- 如果输入负值,那结果是不可预期的。
|
||||
- 如果输入值大于30000(页面加载时间大于30s),那我们假设为30000。
|
||||
|
||||
**精度**
|
||||
|
||||
计算是准确的,如果:
|
||||
|
||||
|
||||
- 值的总数不超过5670。
|
||||
- 总数值超过5670,但页面加载时间小于1024ms。
|
||||
|
||||
否则,计算结果将四舍五入到16毫秒的最接近倍数。
|
||||
|
||||
!!! note "注"
|
||||
对于计算页面加载时间分位数, 此函数比[quantile](../../../sql-reference/aggregate-functions/reference/quantile.md#quantile)更有效和准确。
|
||||
|
||||
**返回值**
|
||||
|
||||
- 指定层次的分位数。
|
||||
|
||||
类型: `Float32`。
|
||||
|
||||
!!! note "注"
|
||||
如果没有值传递给函数(当使用 `quantileTimingIf`), [NaN](../../../sql-reference/data-types/float.md#data_type-float-nan-inf)被返回。 这样做的目的是将这些案例与导致零的案例区分开来。 参见 [ORDER BY clause](../../../sql-reference/statements/select/order-by.md#select-order-by) 对于 `NaN` 值排序注意事项。
|
||||
|
||||
**示例**
|
||||
|
||||
输入表:
|
||||
|
||||
``` text
|
||||
┌─response_time─┐
|
||||
│ 72 │
|
||||
│ 112 │
|
||||
│ 126 │
|
||||
│ 145 │
|
||||
│ 104 │
|
||||
│ 242 │
|
||||
│ 313 │
|
||||
│ 168 │
|
||||
│ 108 │
|
||||
└───────────────┘
|
||||
```
|
||||
|
||||
查询:
|
||||
|
||||
``` sql
|
||||
SELECT quantileTiming(response_time) FROM t
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
``` text
|
||||
┌─quantileTiming(response_time)─┐
|
||||
│ 126 │
|
||||
└───────────────────────────────┘
|
||||
```
|
||||
|
||||
**参见**
|
||||
|
||||
- [中位数](../../../sql-reference/aggregate-functions/reference/median.md#median)
|
||||
- [分位数](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles)
|
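
The documented precision rules can be sketched in plain Python. This is a hypothetical illustration (the names `timing_bucket` and `quantile_timing` are made up), not ClickHouse's implementation: it clamps values to 30,000 ms, keeps them exact in the cases listed under **Accuracy**, otherwise rounds to the nearest multiple of 16 ms, and then picks the element at index `floor(level * n)` of the sorted data. That index rule is an assumption that happens to reproduce the `126` from the example.

```python
def timing_bucket(ms, exact):
    """Apply the documented clamping and 16 ms rounding to one value."""
    ms = min(ms, 30000)              # values above 30,000 ms count as 30,000
    if exact or ms < 1024:
        return ms                    # kept exactly
    return (ms + 8) // 16 * 16       # nearest multiple of 16 ms


def quantile_timing(values, level=0.5):
    n = len(values)
    if n == 0:
        return float("nan")
    exact = n <= 5670                # small inputs are always exact
    data = sorted(timing_bucket(v, exact) for v in values)
    # Assumed rank rule: the element at index floor(level * n).
    return float(data[min(int(level * n), n - 1)])


times = [72, 112, 126, 145, 104, 242, 313, 168, 108]
print(quantile_timing(times))        # 126.0
```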
@ -0,0 +1,118 @@
|
||||
---
|
||||
toc_priority: 205
|
||||
---
|
||||
|
||||
# quantileTimingWeighted {#quantiletimingweighted}
|
||||
|
||||
根据每个序列成员的权重,使用确定的精度计算数字序列的[分位数](https://en.wikipedia.org/wiki/Quantile)。
|
||||
|
||||
结果是确定性的(它不依赖于查询处理顺序)。该函数针对描述加载网页时间或后端响应时间等分布的序列进行了优化。
|
||||
|
||||
当在一个查询中使用多个不同层次的 `quantile*` 时,内部状态不会被组合(即查询的工作效率低于组合情况)。在这种情况下,使用[quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles)功能。
|
||||
|
||||
**语法**
|
||||
|
||||
``` sql
|
||||
quantileTimingWeighted(level)(expr, weight)
|
||||
```
|
||||
|
||||
别名: `medianTimingWeighted`。
|
||||
|
||||
**参数**
|
||||
|
||||
- `level` — 分位数层次。可选参数。从0到1的一个float类型的常量。我们推荐 `level` 值的范围为 `[0.01, 0.99]` 。默认值:0.5。当 `level=0.5` 时,该函数计算 [中位数](https://en.wikipedia.org/wiki/Median)。
|
||||
- `expr` — 求值[表达式](../../../sql-reference/syntax.md#syntax-expressions) 返回 [Float\*](../../../sql-reference/data-types/float.md) 类型数值。
|
||||
|
||||
- 如果输入负值,那结果是不可预期的。
|
||||
- 如果输入值大于30000(页面加载时间大于30s),那我们假设为30000。
|
||||
|
||||
- `weight` — 权重序列。 权重是一个数据出现的数值。
|
||||
|
||||
**精度**
|
||||
|
||||
计算是准确的,如果:
|
||||
|
||||
|
||||
- 值的总数不超过5670。
|
||||
- 总数值超过5670,但页面加载时间小于1024ms。
|
||||
|
||||
否则,计算结果将四舍五入到16毫秒的最接近倍数。
|
||||
|
||||
!!! note "注"
|
||||
对于计算页面加载时间分位数, 此函数比[quantile](../../../sql-reference/aggregate-functions/reference/quantile.md#quantile)更有效和准确。
|
||||
|
||||
**返回值**
|
||||
|
||||
- 指定层次的分位数。
|
||||
|
||||
类型: `Float32`。
|
||||
|
||||
!!! note "注"
|
||||
如果没有值传递给函数(当使用 `quantileTimingIf`), [NaN](../../../sql-reference/data-types/float.md#data_type-float-nan-inf)被返回。 这样做的目的是将这些案例与导致零的案例区分开来。 参见 [ORDER BY clause](../../../sql-reference/statements/select/order-by.md#select-order-by) 对于 `NaN` 值排序注意事项。
|
||||
|
||||
**示例**
|
||||
|
||||
输入表:
|
||||
|
||||
``` text
|
||||
┌─response_time─┬─weight─┐
|
||||
│ 68 │ 1 │
|
||||
│ 104 │ 2 │
|
||||
│ 112 │ 3 │
|
||||
│ 126 │ 2 │
|
||||
│ 138 │ 1 │
|
||||
│ 162 │ 1 │
|
||||
└───────────────┴────────┘
|
||||
```
|
||||
|
||||
查询:
|
||||
|
||||
``` sql
|
||||
SELECT quantileTimingWeighted(response_time, weight) FROM t
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
``` text
|
||||
┌─quantileTimingWeighted(response_time, weight)─┐
|
||||
│ 112 │
|
||||
└───────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
# quantilesTimingWeighted {#quantilestimingweighted}
|
||||
|
||||
类似于 `quantileTimingWeighted` , 但接受多个分位数层次参数,并返回一个由这些分位数值组成的数组。
|
||||
|
||||
**示例**
|
||||
|
||||
输入表:
|
||||
|
||||
``` text
|
||||
┌─response_time─┬─weight─┐
|
||||
│ 68 │ 1 │
|
||||
│ 104 │ 2 │
|
||||
│ 112 │ 3 │
|
||||
│ 126 │ 2 │
|
||||
│ 138 │ 1 │
|
||||
│ 162 │ 1 │
|
||||
└───────────────┴────────┘
|
||||
```
|
||||
|
||||
查询:
|
||||
|
||||
``` sql
|
||||
SELECT quantilesTimingWeighted(0,5, 0.99)(response_time, weight) FROM t
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
``` text
|
||||
┌─quantilesTimingWeighted(0.5, 0.99)(response_time, weight)─┐
|
||||
│ [112,162] │
|
||||
└───────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**参见**
|
||||
|
||||
- [中位数](../../../sql-reference/aggregate-functions/reference/median.md#median)
|
||||
- [分位数](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles)
|
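
A minimal sketch of the weighted, multi-level behaviour, under the assumption that each level picks the first value (in sorted order) whose cumulative weight exceeds `level * total_weight`. The function `quantiles_weighted` is hypothetical and ignores the 16 ms rounding; it reproduces the `112` and `[112,162]` results from the examples, but it is not ClickHouse's code.

```python
def quantiles_weighted(values, weights, levels):
    """Return one value per level from (value, weight) pairs.

    Assumed rule: take the first value, in sorted order, whose cumulative
    weight exceeds level * total_weight.
    """
    pairs = sorted(zip(values, weights))
    if not pairs:
        return [float("nan")] * len(levels)
    total = sum(w for _, w in pairs)
    result = []
    for level in levels:
        threshold = level * total
        cumulative = 0
        picked = pairs[-1][0]        # fallback for level = 1
        for value, weight in pairs:
            cumulative += weight
            if cumulative > threshold:
                picked = value
                break
        result.append(picked)
    return result


response_time = [68, 104, 112, 126, 138, 162]
weight = [1, 2, 3, 2, 1, 1]
print(quantiles_weighted(response_time, weight, [0.5]))        # [112]
print(quantiles_weighted(response_time, weight, [0.5, 0.99]))  # [112, 162]
```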
@ -0,0 +1,53 @@
|
||||
## rankCorr {#agg_function-rankcorr}
|
||||
|
||||
计算等级相关系数。
|
||||
|
||||
**语法**
|
||||
|
||||
``` sql
|
||||
rankCorr(x, y)
|
||||
```
|
||||
|
||||
**参数**
|
||||
|
||||
- `x` — 任意值。[Float32](../../../sql-reference/data-types/float.md#float32-float64) 或 [Float64](../../../sql-reference/data-types/float.md#float32-float64)。
|
||||
- `y` — 任意值。[Float32](../../../sql-reference/data-types/float.md#float32-float64) 或 [Float64](../../../sql-reference/data-types/float.md#float32-float64)。
|
||||
|
||||
**返回值**
|
||||
|
||||
- Returns a rank correlation coefficient of the ranks of x and y. The value of the correlation coefficient ranges from -1 to +1. If less than two arguments are passed, the function will return an exception. The value close to +1 denotes a high linear relationship, and with an increase of one random variable, the second random variable also increases. The value close to -1 denotes a high linear relationship, and with an increase of one random variable, the second random variable decreases. The value close or equal to 0 denotes no relationship between the two random variables.
|
||||
|
||||
类型: [Float64](../../../sql-reference/data-types/float.md#float32-float64)。
|
||||
|
||||
**示例**
|
||||
|
||||
查询:
|
||||
|
||||
``` sql
|
||||
SELECT rankCorr(number, number) FROM numbers(100);
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
``` text
|
||||
┌─rankCorr(number, number)─┐
|
||||
│ 1 │
|
||||
└──────────────────────────┘
|
||||
```
|
||||
|
||||
查询:
|
||||
|
||||
``` sql
|
||||
SELECT roundBankers(rankCorr(exp(number), sin(number)), 3) FROM numbers(100);
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
``` text
|
||||
┌─roundBankers(rankCorr(exp(number), sin(number)), 3)─┐
|
||||
│ -0.037 │
|
||||
└─────────────────────────────────────────────────────┘
|
||||
```
|
||||
**参见**
|
||||
|
||||
- 斯皮尔曼等级相关系数[Spearman's rank correlation coefficient](https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient)
|
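
Spearman's coefficient is the Pearson correlation of the ranks. The sketch below illustrates that definition in plain Python; the helper names are hypothetical, and giving tied values their average rank is an assumption about tie handling that this page does not specify.

```python
import math


def average_ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_corr(x, y):
    """Spearman's coefficient: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long columns with at least two rows")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


print(round(rank_corr(list(range(100)), list(range(100))), 12))  # 1.0
```

Because only ranks matter, any strictly increasing transformation of `y` (for example `y ** 3` for positive values) leaves the coefficient unchanged.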
@ -0,0 +1,44 @@
|
||||
---
|
||||
toc_priority: 220
|
||||
---
|
||||
|
||||
# simpleLinearRegression {#simplelinearregression}
|
||||
|
||||
执行简单(一维)线性回归。
|
||||
|
||||
**语法**
|
||||
|
||||
``` sql
|
||||
simpleLinearRegression(x, y)
|
||||
```
|
||||
|
||||
**参数**
|
||||
|
||||
- `x` — x轴。
|
||||
- `y` — y轴。
|
||||
|
||||
**返回值**
|
||||
|
||||
符合`y = a*x + b`的常量 `(a, b)` 。
|
||||
|
||||
**示例**
|
||||
|
||||
``` sql
|
||||
SELECT arrayReduce('simpleLinearRegression', [0, 1, 2, 3], [0, 1, 2, 3])
|
||||
```
|
||||
|
||||
``` text
|
||||
┌─arrayReduce('simpleLinearRegression', [0, 1, 2, 3], [0, 1, 2, 3])─┐
|
||||
│ (1,0) │
|
||||
└───────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
``` sql
|
||||
SELECT arrayReduce('simpleLinearRegression', [0, 1, 2, 3], [3, 4, 5, 6])
|
||||
```
|
||||
|
||||
``` text
|
||||
┌─arrayReduce('simpleLinearRegression', [0, 1, 2, 3], [3, 4, 5, 6])─┐
|
||||
│ (1,3) │
|
||||
└───────────────────────────────────────────────────────────────────┘
|
||||
```
|
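
The constants `(a, b)` correspond to an ordinary least-squares fit. A plain-Python sketch of that formula reproduces both examples above; the name `simple_linear_regression` is hypothetical, and whether ClickHouse evaluates it in exactly this order is not stated here.

```python
def simple_linear_regression(x, y):
    """Least squares: a = cov(x, y) / var(x), b = mean(y) - a * mean(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    a = sxy / sxx
    b = my - a * mx
    return a, b


print(simple_linear_regression([0, 1, 2, 3], [0, 1, 2, 3]))  # (1.0, 0.0)
print(simple_linear_regression([0, 1, 2, 3], [3, 4, 5, 6]))  # (1.0, 3.0)
```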
@ -0,0 +1,27 @@
|
||||
---
|
||||
toc_priority: 150
|
||||
---
|
||||
|
||||
# skewPop {#skewpop}
|
||||
|
||||
计算给定序列的 [偏度] (https://en.wikipedia.org/wiki/Skewness)。
|
||||
|
||||
**语法**
|
||||
|
||||
``` sql
|
||||
skewPop(expr)
|
||||
```
|
||||
|
||||
**参数**
|
||||
|
||||
`expr` — [表达式](../../../sql-reference/syntax.md#syntax-expressions) 返回一个数字。
|
||||
|
||||
**返回值**
|
||||
|
||||
给定分布的偏度。类型 — [Float64](../../../sql-reference/data-types/float.md)
|
||||
|
||||
**示例**
|
||||
|
||||
``` sql
|
||||
SELECT skewPop(value) FROM series_with_value_column;
|
||||
```
|
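
As a rough illustration, population skewness can be written as the third central moment divided by the second central moment raised to the power 1.5. The sketch assumes that `skewPop` follows this textbook definition; the name `skew_pop` is hypothetical, and returning `nan` for constant input is a choice made for this sketch only.

```python
import math


def skew_pop(xs):
    """Population skewness m3 / m2**1.5 (textbook definition, assumed)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    if m2 == 0:
        return math.nan          # constant input: skewness undefined here
    return m3 / m2 ** 1.5


print(skew_pop([1, 2, 3]))       # 0.0 (symmetric data)
```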
@ -0,0 +1,29 @@
|
||||
---
|
||||
toc_priority: 151
|
||||
---
|
||||
|
||||
# skewSamp {#skewsamp}
|
||||
|
||||
计算给定序列的 [样本偏度] (https://en.wikipedia.org/wiki/Skewness)。
|
||||
|
||||
如果传递的值形成其样本,它代表了一个随机变量的偏度的无偏估计。
|
||||
|
||||
**语法**
|
||||
|
||||
``` sql
|
||||
skewSamp(expr)
|
||||
```
|
||||
|
||||
**参数**
|
||||
|
||||
`expr` — [表达式](../../../sql-reference/syntax.md#syntax-expressions) 返回一个数字。
|
||||
|
||||
**返回值**
|
||||
|
||||
给定分布的偏度。 类型 — [Float64](../../../sql-reference/data-types/float.md)。 如果 `n <= 1` (`n` 样本的大小), 函数返回 `nan`。
|
||||
|
||||
**示例**
|
||||
|
||||
``` sql
|
||||
SELECT skewSamp(value) FROM series_with_value_column;
|
||||
```
|
@ -0,0 +1,10 @@
|
||||
---
|
||||
toc_priority: 30
|
||||
---
|
||||
|
||||
# stddevPop {#stddevpop}
|
||||
|
||||
结果等于 [varPop] (../../../sql-reference/aggregate-functions/reference/varpop.md)的平方根。
|
||||
|
||||
!!! note "注"
|
||||
该函数使用数值不稳定的算法。 如果你需要 [数值稳定性](https://en.wikipedia.org/wiki/Numerical_stability) 在计算中,使用 `stddevPopStable` 函数。 它的工作速度较慢,但提供较低的计算错误。
|
@ -0,0 +1,10 @@
|
||||
---
|
||||
toc_priority: 31
|
||||
---
|
||||
|
||||
# stddevSamp {#stddevsamp}
|
||||
|
||||
结果等于 [varSamp] (../../../sql-reference/aggregate-functions/reference/varsamp.md)的平方根。
|
||||
|
||||
!!! note "注"
|
||||
该函数使用数值不稳定的算法。 如果你需要 [数值稳定性](https://en.wikipedia.org/wiki/Numerical_stability) 在计算中,使用 `stddevSampStable` 函数。 它的工作速度较慢,但提供较低的计算错误。
|
@ -0,0 +1,77 @@
|
||||
---
|
||||
toc_priority: 221
|
||||
---
|
||||
|
||||
# stochasticLinearRegression {#agg_functions-stochasticlinearregression}
|
||||
|
||||
该函数实现随机线性回归。 它支持自定义参数的学习率、L2正则化系数、微批,并且具有少量更新权重的方法([Adam](https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Adam) (默认), [simple SGD](https://en.wikipedia.org/wiki/Stochastic_gradient_descent), [Momentum](https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Momentum), [Nesterov](https://mipt.ru/upload/medialibrary/d7e/41-91.pdf))。
|
||||
|
||||
### 参数 {#agg_functions-stochasticlinearregression-parameters}
|
||||
|
||||
有4个可自定义的参数。它们按顺序传递给函数,但不需要传递所有四个参数——将使用默认值,然而好的模型需要一些参数调整。
|
||||
|
||||
**语法**
|
||||
|
||||
``` sql
|
||||
stochasticLinearRegression(1.0, 1.0, 10, 'SGD')
|
||||
```
|
||||
|
||||
1. `learning rate` 当执行梯度下降步骤时,步长的系数。 过大的学习率可能会导致模型的权重无限大。 默认值为 `0.00001`。
|
||||
2. `l2 regularization coefficient` 这可能有助于防止过度拟合。 默认值为 `0.1`。
|
||||
3. `mini-batch size` 设置元素的数量,这些元素将被计算和求和以执行梯度下降的一个步骤。纯随机下降使用一个元素,但是具有小批量(约10个元素)使梯度步骤更稳定。 默认值为 `15`。
|
||||
4. `method for updating weights` 他们是: `Adam` (默认情况下), `SGD`, `Momentum`, `Nesterov`。`Momentum` 和 `Nesterov` 需要更多的计算和内存,但是它们恰好在收敛速度和随机梯度方法的稳定性方面是有用的。
|
||||
|
||||
### 使用 {#agg_functions-stochasticlinearregression-usage}
|
||||
|
||||
`stochasticLinearRegression` 用于两个步骤:拟合模型和预测新数据。 为了拟合模型并保存其状态以供以后使用,我们使用 `-State` 组合器,它基本上保存了状态(模型权重等)。
|
||||
为了预测我们使用函数 [evalMLMethod](../../../sql-reference/functions/machine-learning-functions.md#machine_learning_methods-evalmlmethod), 这需要一个状态作为参数以及特征来预测。
|
||||
|
||||
<a name="stochasticlinearregression-usage-fitting"></a>
|
||||
|
||||
**1.** 拟合
|
||||
|
||||
可以使用这种查询。
|
||||
|
||||
``` sql
|
||||
CREATE TABLE IF NOT EXISTS train_data
|
||||
(
|
||||
param1 Float64,
|
||||
param2 Float64,
|
||||
target Float64
|
||||
) ENGINE = Memory;
|
||||
|
||||
CREATE TABLE your_model ENGINE = Memory AS SELECT
|
||||
stochasticLinearRegressionState(0.1, 0.0, 5, 'SGD')(target, param1, param2)
|
||||
AS state FROM train_data;
|
||||
```
|
||||
|
||||
在这里,我们还需要将数据插入到 `train_data` 表。参数的数量不是固定的,它只取决于传入 `linearRegressionState` 的参数数量。它们都必须是数值。
|
||||
注意,目标值(我们想学习预测的)列作为第一个参数插入。
|
||||
|
||||
**2.** 预测
|
||||
|
||||
在将状态保存到表中之后,我们可以多次使用它进行预测,甚至与其他状态合并,创建新的更好的模型。
|
||||
|
||||
``` sql
|
||||
WITH (SELECT state FROM your_model) AS model SELECT
|
||||
evalMLMethod(model, param1, param2) FROM test_data
|
||||
```
|
||||
|
||||
查询将返回一列预测值。注意,`evalMLMethod` 的第一个参数是 `AggregateFunctionState` 对象, 接下来是特征列。
|
||||
|
||||
`test_data` 是一个类似 `train_data` 的表 但可能不包含目标值。
|
||||
|
||||
### 注 {#agg_functions-stochasticlinearregression-notes}
|
||||
|
||||
1. 要合并两个模型,用户可以创建这样的查询:
|
||||
`sql SELECT state1 + state2 FROM your_models`
|
||||
其中 `your_models` 表包含这两个模型。此查询将返回新的 `AggregateFunctionState` 对象。
|
||||
|
||||
2. 如果没有使用 `-State` 组合器,用户可以为自己的目的获取所创建模型的权重,而不保存模型 。
|
||||
`sql SELECT stochasticLinearRegression(0.01)(target, param1, param2) FROM train_data`
|
||||
这样的查询将拟合模型,并返回其权重——首先是权重,对应模型的参数,最后一个是偏差。 所以在上面的例子中,查询将返回一个具有3个值的列。
|
||||
|
||||
**参见**
|
||||
|
||||
- [随机指标逻辑回归](../../../sql-reference/aggregate-functions/reference/stochasticlogisticregression.md#agg_functions-stochasticlogisticregression)
|
||||
- [线性回归和逻辑回归之间的差异](https://stackoverflow.com/questions/12146914/what-is-the-difference-between-linear-regression-and-logistic-regression)
|
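
To make the parameters concrete, here is a sketch of plain mini-batch gradient descent for linear regression with an L2 penalty, roughly what the `'SGD'` method describes. It is a hypothetical illustration: the function name, averaging the gradient over the batch, and how the L2 term is scaled are assumptions, not ClickHouse's implementation. It returns the weights followed by the bias, matching the order described in the notes.

```python
def sgd_linear_regression(rows, targets, learning_rate, l2, batch_size, epochs):
    """Mini-batch SGD for target ~ w . row + bias with an L2 penalty on w.

    Returns [w_1, ..., w_k, bias].
    """
    k = len(rows[0])
    w = [0.0] * k
    bias = 0.0
    for _ in range(epochs):
        for start in range(0, len(rows), batch_size):
            batch = range(start, min(start + batch_size, len(rows)))
            grad_w = [0.0] * k
            grad_b = 0.0
            for i in batch:
                # residual of the current prediction for row i
                err = sum(wj * xj for wj, xj in zip(w, rows[i])) + bias - targets[i]
                for j in range(k):
                    grad_w[j] += err * rows[i][j]
                grad_b += err
            m = len(batch)
            for j in range(k):
                w[j] -= learning_rate * (grad_w[j] / m + l2 * w[j])
            bias -= learning_rate * grad_b / m
    return w + [bias]


xs = [[i / 10] for i in range(11)]
ys = [2 * x[0] + 1 for x in xs]            # exact line y = 2x + 1
model = sgd_linear_regression(xs, ys, 0.1, 0.0, 5, 5000)
print([round(v, 3) for v in model])         # [2.0, 1.0]
```

A learning rate that is too large makes the updates overshoot and the weights diverge, which is the failure mode the `learning rate` parameter description warns about.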
@ -0,0 +1,56 @@
|
||||
---
|
||||
toc_priority: 222
|
||||
---
|
||||
|
||||
# stochasticLogisticRegression {#agg_functions-stochasticlogisticregression}
|
||||
|
||||
该函数实现随机逻辑回归。 它可以用于二进制分类问题,支持与stochasticLinearRegression相同的自定义参数,并以相同的方式工作。
|
||||
|
||||
### 参数 {#agg_functions-stochasticlogisticregression-parameters}
|
||||
|
||||
参数与stochasticLinearRegression中的参数完全相同:
|
||||
`learning rate`, `l2 regularization coefficient`, `mini-batch size`, `method for updating weights`.
|
||||
欲了解更多信息,参见 [参数] (#agg_functions-stochasticlinearregression-parameters).
|
||||
|
||||
**语法**
|
||||
|
||||
``` sql
|
||||
stochasticLogisticRegression(1.0, 1.0, 10, 'SGD')
|
||||
```
|
||||
|
||||
**1.** 拟合
|
||||
|
||||
<!-- -->
|
||||
|
||||
参考[stochasticLinearRegression](#stochasticlinearregression-usage-fitting) `拟合` 章节文档。
|
||||
|
||||
预测标签的取值范围为\[-1, 1\]
|
||||
|
||||
**2.** 预测
|
||||
|
||||
<!-- -->
|
||||
|
||||
使用已经保存的state我们可以预测标签为 `1` 的对象的概率。
|
||||
``` sql
|
||||
WITH (SELECT state FROM your_model) AS model SELECT
|
||||
evalMLMethod(model, param1, param2) FROM test_data
|
||||
```
|
||||
|
||||
查询结果返回一个列的概率。注意 `evalMLMethod` 的第一个参数是 `AggregateFunctionState` 对象,接下来的参数是列的特性。
|
||||
|
||||
我们也可以设置概率的范围, 这样需要给元素指定不同的标签。
|
||||
|
||||
``` sql
|
||||
SELECT ans < 1.1 AND ans > 0.5 FROM
|
||||
(WITH (SELECT state FROM your_model) AS model SELECT
|
||||
evalMLMethod(model, param1, param2) AS ans FROM test_data)
|
||||
```
|
||||
|
||||
结果是标签。
|
||||
|
||||
`test_data` 是一个像 `train_data` 一样的表,但是不包含目标值。
|
||||
|
||||
**参见**
|
||||
|
||||
- [随机指标线性回归](../../../sql-reference/aggregate-functions/reference/stochasticlinearregression.md#agg_functions-stochasticlinearregression)
|
||||
- [线性回归和逻辑回归之间的差异](https://stackoverflow.com/questions/12146914/what-is-the-difference-between-linear-regression-and-logistic-regression)
|
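
The logistic variant can be sketched the same way: mini-batch gradient descent on the logistic loss, with labels assumed to be exactly `-1` or `1`, and `sigmoid(w . x + b)` read as the probability of label `1`. All names are hypothetical and this is not ClickHouse's code; the last line mirrors the `0.5 < p < 1.1` bound from the query above.

```python
import math


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd_logistic_regression(rows, labels, learning_rate, l2, batch_size, epochs):
    """Mini-batch SGD on log(1 + exp(-y * z)) with y in {-1, 1}; returns [w..., bias]."""
    k = len(rows[0])
    w = [0.0] * k
    bias = 0.0
    for _ in range(epochs):
        for start in range(0, len(rows), batch_size):
            batch = range(start, min(start + batch_size, len(rows)))
            grad_w = [0.0] * k
            grad_b = 0.0
            for i in batch:
                z = sum(wj * xj for wj, xj in zip(w, rows[i])) + bias
                g = -labels[i] * sigmoid(-labels[i] * z)   # d loss / d z
                for j in range(k):
                    grad_w[j] += g * rows[i][j]
                grad_b += g
            m = len(batch)
            for j in range(k):
                w[j] -= learning_rate * (grad_w[j] / m + l2 * w[j])
            bias -= learning_rate * grad_b / m
    return w + [bias]


def predict_proba(model, row):
    *w, bias = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + bias)


model = sgd_logistic_regression([[-2], [-1], [1], [2]], [-1, -1, 1, 1], 0.5, 0.0, 2, 500)
p = predict_proba(model, [2])
label = 0.5 < p < 1.1                # same bound as in the SQL example
```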
@ -0,0 +1,64 @@
|
||||
---
|
||||
toc_priority: 300
|
||||
toc_title: studentTTest
|
||||
---
|
||||
|
||||
# studentTTest {#studentttest}
|
||||
|
||||
对两个总体的样本应用t检验。
|
||||
|
||||
**语法**
|
||||
|
||||
``` sql
|
||||
studentTTest(sample_data, sample_index)
|
||||
```
|
||||
|
||||
两个样本的值都在 `sample_data` 列中。如果 `sample_index` 等于 0,则该行的值属于第一个总体的样本。 反之属于第二个总体的样本。
|
||||
零假设是总体的均值相等。假设为方差相等的正态分布。
|
||||
|
||||
**参数**
|
||||
|
||||
- `sample_data` — 样本数据。[Integer](../../../sql-reference/data-types/int-uint.md), [Float](../../../sql-reference/data-types/float.md) 或 [Decimal](../../../sql-reference/data-types/decimal.md)。
|
||||
- `sample_index` — 样本索引。[Integer](../../../sql-reference/data-types/int-uint.md)。
|
||||
|
||||
**返回值**
|
||||
|
||||
[元组](../../../sql-reference/data-types/tuple.md),有两个元素:
|
||||
|
||||
- 计算出的t统计量。 [Float64](../../../sql-reference/data-types/float.md)。
|
||||
- 计算出的p值。[Float64](../../../sql-reference/data-types/float.md)。
|
||||
|
||||
|
||||
**示例**
|
||||
|
||||
输入表:
|
||||
|
||||
``` text
|
||||
┌─sample_data─┬─sample_index─┐
|
||||
│ 20.3 │ 0 │
|
||||
│ 21.1 │ 0 │
|
||||
│ 21.9 │ 1 │
|
||||
│ 21.7 │ 0 │
|
||||
│ 19.9 │ 1 │
|
||||
│ 21.8 │ 1 │
|
||||
└─────────────┴──────────────┘
|
||||
```
|
||||
|
||||
查询:
|
||||
|
||||
``` sql
|
||||
SELECT studentTTest(sample_data, sample_index) FROM student_ttest;
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
``` text
|
||||
┌─studentTTest(sample_data, sample_index)───┐
|
||||
│ (-0.21739130434783777,0.8385421208415731) │
|
||||
└───────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**参见**
|
||||
|
||||
- [Student's t-test](https://en.wikipedia.org/wiki/Student%27s_t-test)
|
||||
- [welchTTest function](../../../sql-reference/aggregate-functions/reference/welchttest.md#welchttest)
|
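
The equal-variance t-statistic uses the pooled variance of the two samples. The sketch below (hypothetical name `student_t_statistic`) computes it together with the degrees of freedom `n1 + n2 - 2`; on the example table it gives about -0.2174, matching the first tuple element. Turning the statistic into a p-value needs the cumulative distribution function of Student's t distribution, which the Python standard library does not provide, so that step is omitted.

```python
import math


def student_t_statistic(sample_data, sample_index):
    """Pooled-variance t-statistic; index 0 selects the first sample."""
    a = [v for v, i in zip(sample_data, sample_index) if i == 0]
    b = [v for v, i in zip(sample_data, sample_index) if i != 0]

    def mean_var(xs):
        mean = sum(xs) / len(xs)
        return mean, sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

    m1, v1 = mean_var(a)
    m2, v2 = mean_var(b)
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


t, df = student_t_statistic([20.3, 21.1, 21.9, 21.7, 19.9, 21.8], [0, 0, 1, 0, 1, 1])
print(round(t, 4), df)   # -0.2174 4
```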
@ -0,0 +1,8 @@
|
||||
---
|
||||
toc_priority: 4
|
||||
---
|
||||
|
||||
# sum {#agg_function-sum}
|
||||
|
||||
计算总和。
|
||||
只适用于数字。
|
@ -0,0 +1,52 @@
|
||||
---
|
||||
toc_priority: 141
|
||||
---
|
||||
|
||||
# sumMap {#agg_functions-summap}
|
||||
|
||||
**语法**
|
||||
|
||||
``` sql
|
||||
sumMap(key, value)
|
||||
或
|
||||
sumMap(Tuple(key, value))
|
||||
```
|
||||
|
||||
根据 `key` 数组中指定的键对 `value` 数组进行求和。
|
||||
|
||||
传递 `key` 和 `value` 数组的元组与传递 `key` 和 `value` 的两个数组是同义的。
|
||||
要总计的每一行的 `key` 和 `value` (数组)元素的数量必须相同。
|
||||
返回两个数组组成的一个元组: 排好序的 `key` 和对应 `key` 的 `value` 之和。
|
||||
|
||||
示例:
|
||||
|
||||
``` sql
|
||||
CREATE TABLE sum_map(
|
||||
date Date,
|
||||
timeslot DateTime,
|
||||
statusMap Nested(
|
||||
status UInt16,
|
||||
requests UInt64
|
||||
),
|
||||
statusMapTuple Tuple(Array(Int32), Array(Int32))
|
||||
) ENGINE = Log;
|
||||
INSERT INTO sum_map VALUES
|
||||
('2000-01-01', '2000-01-01 00:00:00', [1, 2, 3], [10, 10, 10], ([1, 2, 3], [10, 10, 10])),
|
||||
('2000-01-01', '2000-01-01 00:00:00', [3, 4, 5], [10, 10, 10], ([3, 4, 5], [10, 10, 10])),
|
||||
('2000-01-01', '2000-01-01 00:01:00', [4, 5, 6], [10, 10, 10], ([4, 5, 6], [10, 10, 10])),
|
||||
('2000-01-01', '2000-01-01 00:01:00', [6, 7, 8], [10, 10, 10], ([6, 7, 8], [10, 10, 10]));
|
||||
|
||||
SELECT
|
||||
timeslot,
|
||||
sumMap(statusMap.status, statusMap.requests),
|
||||
sumMap(statusMapTuple)
|
||||
FROM sum_map
|
||||
GROUP BY timeslot
|
||||
```
|
||||
|
||||
``` text
|
||||
┌────────────timeslot─┬─sumMap(statusMap.status, statusMap.requests)─┬─sumMap(statusMapTuple)─────────┐
|
||||
│ 2000-01-01 00:00:00 │ ([1,2,3,4,5],[10,10,20,10,10]) │ ([1,2,3,4,5],[10,10,20,10,10]) │
|
||||
│ 2000-01-01 00:01:00 │ ([4,5,6,7,8],[10,10,20,10,10]) │ ([4,5,6,7,8],[10,10,20,10,10]) │
|
||||
└─────────────────────┴──────────────────────────────────────────────┴────────────────────────────────┘
|
||||
```
|
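
The summing rule can be sketched in plain Python for the rows of one group. `sum_map` is a hypothetical name, and the sketch only mirrors the documented behaviour (equal-length arrays per row, keys sorted, values added per key).

```python
from collections import defaultdict


def sum_map(rows):
    """rows: iterable of (keys, values) array pairs from one GROUP BY group."""
    totals = defaultdict(int)
    for keys, values in rows:
        if len(keys) != len(values):
            raise ValueError("key and value arrays must have the same length")
        for k, v in zip(keys, values):
            totals[k] += v
    ordered = sorted(totals)
    return ordered, [totals[k] for k in ordered]


# The two rows of the 00:00:00 timeslot from the example above.
print(sum_map([([1, 2, 3], [10, 10, 10]), ([3, 4, 5], [10, 10, 10])]))
# ([1, 2, 3, 4, 5], [10, 10, 20, 10, 10])
```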
@ -0,0 +1,9 @@
|
||||
---
|
||||
toc_priority: 140
|
||||
---
|
||||
|
||||
# sumWithOverflow {#sumwithoverflowx}
|
||||
|
||||
使用与输入参数相同的数据类型计算结果的数字总和。如果总和超过此数据类型的最大值,则使用溢出进行计算。
|
||||
|
||||
只适用于数字。
|
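
Overflow here means the usual wrap-around of fixed-width integers. The sketch below simulates it for an 8-bit type; the function name and parameters are hypothetical, and the two's-complement reading for signed types is an assumption consistent with how fixed-width integers normally behave.

```python
def sum_with_overflow(values, bits=8, signed=False):
    """Sum while keeping only the low `bits` bits, like a fixed-width integer."""
    mask = (1 << bits) - 1
    total = 0
    for v in values:
        total = (total + v) & mask       # wrap around modulo 2**bits
    if signed and total >= 1 << (bits - 1):
        total -= 1 << bits               # reinterpret as two's complement
    return total


print(sum_with_overflow([200, 100]))               # 44  (300 mod 256)
print(sum_with_overflow([100, 100], signed=True))  # -56 (200 as Int8)
```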
43
docs/zh/sql-reference/aggregate-functions/reference/topk.md
Normal file
@ -0,0 +1,43 @@
|
||||
---
|
||||
toc_priority: 108
|
||||
---
|
||||
|
||||
# topK {#topk}
|
||||
|
||||
返回指定列中近似最常见值的数组。 生成的数组按值的近似频率降序排序(而不是值本身)。
|
||||
|
||||
实现了[过滤节省空间](http://www.l2f.inesc-id.pt/~fmmb/wiki/uploads/Work/misnis.ref0a.pdf)算法, 使用基于reduce-and-combine的算法,借鉴[并行节省空间](https://arxiv.org/pdf/1401.0702.pdf)。
|
||||
|
||||
**语法**
|
||||
|
||||
``` sql
|
||||
topK(N)(x)
|
||||
```
|
||||
此函数不提供保证的结果。 在某些情况下,可能会发生错误,并且可能会返回不是最高频的值。
|
||||
|
||||
我们建议使用 `N < 10` 值,`N` 值越大,性能越低。最大值 `N = 65536`。
|
||||
|
||||
**参数**
|
||||
|
||||
- `N` — 要返回的元素数。
|
||||
|
||||
如果省略该参数,则使用默认值10。
|
||||
|
||||
**参数**
|
||||
|
||||
- `x` – (要计算频次的)值。
|
||||
|
||||
**示例**
|
||||
|
||||
就拿 [OnTime](../../../getting-started/example-datasets/ontime.md) 数据集来说,选择`AirlineID` 列中出现最频繁的三个。
|
||||
|
||||
``` sql
|
||||
SELECT topK(3)(AirlineID) AS res
|
||||
FROM ontime
|
||||
```
|
||||
|
||||
``` text
|
||||
┌─res─────────────────┐
|
||||
│ [19393,19790,19805] │
|
||||
└─────────────────────┘
|
||||
```
|
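
To show why the result is only approximate, here is the classic Space-Saving algorithm: it keeps a fixed number of counters, and a new value takes over the smallest counter and inherits its count, which can overestimate frequencies. This is a simplified sketch with hypothetical names; it is neither the Filtered Space-Saving variant nor the parallel reduce-and-combine scheme referenced above.

```python
def space_saving_top_k(stream, k, capacity):
    """Classic Space-Saving with `capacity` counters; returns up to k items."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < capacity:
            counters[item] = 1
        else:
            victim = min(counters, key=counters.get)    # smallest counter
            counters[item] = counters.pop(victim) + 1   # inherit its count
    ranked = sorted(counters, key=counters.get, reverse=True)
    return ranked[:k]


stream = [1] * 50 + [2] * 30 + [3] * 20 + list(range(100, 110))
print(space_saving_top_k(stream, k=3, capacity=5))  # [1, 2, 3]
```

With few counters and many distinct rare values, the inherited counts grow and a rare value can briefly look frequent, which is the kind of error the warning above describes.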
@ -0,0 +1,42 @@
|
||||
---
|
||||
toc_priority: 109
|
||||
---
|
||||
|
||||
# topKWeighted {#topkweighted}
|
||||
|
||||
类似于 `topK` 但需要一个整数类型的附加参数 - `weight`。 每个输入都被记入 `weight` 次频率计算。
|
||||
|
||||
**语法**
|
||||
|
||||
``` sql
|
||||
topKWeighted(N)(x, weight)
|
||||
```
|
||||
|
||||
**参数**
|
||||
|
||||
- `N` — 要返回的元素数。
|
||||
|
||||
**参数**
|
||||
|
||||
- `x` – (要计算频次的)值。
|
||||
- `weight` — 权重。 [UInt8](../../../sql-reference/data-types/int-uint.md)类型。
|
||||
|
||||
**返回值**
|
||||
|
||||
返回具有最大近似权重总和的值数组。
|
||||
|
||||
**示例**
|
||||
|
||||
查询:
|
||||
|
||||
``` sql
|
||||
SELECT topKWeighted(10)(number, number) FROM numbers(1000)
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
``` text
|
||||
┌─topKWeighted(10)(number, number)──────────┐
|
||||
│ [999,998,997,996,995,994,993,992,991,990] │
|
||||
└───────────────────────────────────────────┘
|
||||
```
|
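
A weighted counterpart of Space-Saving can be sketched by adding `weight` instead of 1 on every update. The names are hypothetical and this is not ClickHouse's implementation; with 10 counters it reproduces the example above, because each later number carries a larger weight.

```python
def weighted_space_saving_top_k(pairs, k, capacity):
    """Space-Saving over (item, weight) pairs with `capacity` counters."""
    counters = {}
    for item, weight in pairs:
        if item in counters:
            counters[item] += weight
        elif len(counters) < capacity:
            counters[item] = weight
        else:
            victim = min(counters, key=counters.get)         # smallest counter
            counters[item] = counters.pop(victim) + weight   # inherit its count
    return sorted(counters, key=counters.get, reverse=True)[:k]


pairs = [(n, n) for n in range(1000)]   # like topKWeighted(10)(number, number)
print(weighted_space_saving_top_k(pairs, k=10, capacity=10))
# [999, 998, 997, 996, 995, 994, 993, 992, 991, 990]
```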
42
docs/zh/sql-reference/aggregate-functions/reference/uniq.md
Normal file
@ -0,0 +1,42 @@
|
||||
---
|
||||
toc_priority: 190
|
||||
---
|
||||
|
||||
# uniq {#agg_function-uniq}
|
||||
|
||||
计算参数的不同值的近似数量。
|
||||
|
||||
**语法**
|
||||
|
||||
``` sql
|
||||
uniq(x[, ...])
|
||||
```
|
||||
|
||||
**参数**
|
||||
|
||||
该函数采用可变数量的参数。 参数可以是 `Tuple`, `Array`, `Date`, `DateTime`, `String`, 或数字类型。
|
||||
|
||||
**返回值**
|
||||
|
||||
- [UInt64](../../../sql-reference/data-types/int-uint.md) 类型数值。
|
||||
|
||||
**实现细节**
|
||||
|
||||
功能:
|
||||
|
||||
- 计算聚合中所有参数的哈希值,然后在计算中使用它。
|
||||
|
||||
- 使用自适应采样算法。 对于计算状态,该函数使用最多65536个元素哈希值的样本。
|
||||
|
||||
这个算法是非常精确的,并且对于CPU来说非常高效。如果查询包含一些这样的函数,那和其他聚合函数相比 `uniq` 将是几乎一样快。
|
||||
|
||||
- 确定性地提供结果(它不依赖于查询处理顺序)。
|
||||
|
||||
我们建议在几乎所有情况下使用此功能。
|
||||
|
||||
**参见**
|
||||
|
||||
- [uniqCombined](../../../sql-reference/aggregate-functions/reference/uniqcombined.md#agg_function-uniqcombined)
|
||||
- [uniqCombined64](../../../sql-reference/aggregate-functions/reference/uniqcombined64.md#agg_function-uniqcombined64)
|
||||
- [uniqHLL12](../../../sql-reference/aggregate-functions/reference/uniqhll12.md#agg_function-uniqhll12)
|
||||
- [uniqExact](../../../sql-reference/aggregate-functions/reference/uniqexact.md#agg_function-uniqexact)
|
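
The idea of adaptive sampling can be sketched like this: keep only hashes whose lowest `skip` bits are zero, raise `skip` whenever the sample grows past its limit, and multiply the sample size by `2**skip` at the end. The hash function, the choice of low bits, and all names are assumptions for illustration; ClickHouse's actual data structure may differ in its details. Because the kept set depends only on which hashes were seen, the result does not depend on input order.

```python
import hashlib


def _hash64(value):
    """Hypothetical 64-bit hash used only for this illustration."""
    digest = hashlib.blake2b(repr(value).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def adaptive_sample_count(values, max_size=65536):
    skip = 0            # keep only hashes whose lowest `skip` bits are zero
    kept = set()
    for v in values:
        h = _hash64(v)
        if h & ((1 << skip) - 1) == 0:
            kept.add(h)
            while len(kept) > max_size:
                skip += 1   # halve the sampling rate and drop non-matching hashes
                kept = {x for x in kept if x & ((1 << skip) - 1) == 0}
    return len(kept) << skip    # scale the sample back up


print(adaptive_sample_count(range(1000)))   # 1000 (no sampling needed at this size)
```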
@ -0,0 +1,52 @@
|
||||
---
|
||||
toc_priority: 192
|
||||
---
|
||||
|
||||
# uniqCombined {#agg_function-uniqcombined}
|
||||
|
||||
计算不同参数值的近似数量。
|
||||
|
||||
**语法**
|
||||
``` sql
|
||||
uniqCombined(HLL_precision)(x[, ...])
|
||||
```
|
||||
该 `uniqCombined` 函数是计算不同值数量的不错选择。
|
||||
|
||||
**参数**
|
||||
|
||||
该函数采用可变数量的参数。 参数可以是 `Tuple`, `Array`, `Date`, `DateTime`, `String`,或数字类型。
|
||||
|
||||
`HLL_precision` 是以2为底的单元格数的对数 [HyperLogLog](https://en.wikipedia.org/wiki/HyperLogLog)。可选,您可以将该函数用作 `uniqCombined(x[, ...])`。 `HLL_precision` 的默认值是17,这是有效的96KiB的空间(2^17个单元,每个6比特)。
|
||||
|
||||
**返回值**
|
||||
|
||||
- 一个[UInt64](../../../sql-reference/data-types/int-uint.md)类型的数字。
|
||||
|
||||
**实现细节**
|
||||
|
||||
功能:
|
||||
|
||||
- 为聚合中的所有参数计算哈希(`String`类型用64位哈希,其他32位),然后在计算中使用它。
|
||||
|
||||
- 使用三种算法的组合:数组、哈希表和包含错误修正表的HyperLogLog。
|
||||
|
||||
|
||||
少量的不同的值,使用数组。 值再多一些,使用哈希表。对于大量的数据来说,使用HyperLogLog,HyperLogLog占用一个固定的内存空间。
|
||||
|
||||
- 确定性地提供结果(它不依赖于查询处理顺序)。
|
||||
|
||||
!!! note "注"
|
||||
由于它对非 `String` 类型使用32位哈希,对于基数显著大于`UINT_MAX` ,结果将有非常高的误差(误差将在几百亿不同值之后迅速提高), 因此这种情况,你应该使用 [uniqCombined64](../../../sql-reference/aggregate-functions/reference/uniqcombined64.md#agg_function-uniqcombined64)
|
||||
|
||||
相比于 [uniq](../../../sql-reference/aggregate-functions/reference/uniq.md#agg_function-uniq) 函数, 该 `uniqCombined`:
|
||||
|
||||
- 消耗内存要少几倍。
|
||||
- 计算精度高出几倍。
|
||||
- 通常具有略低的性能。 在某些情况下, `uniqCombined` 可以表现得比 `uniq` 好,例如,使用通过网络传输大量聚合状态的分布式查询。
|
||||
|
||||
**参见**
|
||||
|
||||
- [uniq](../../../sql-reference/aggregate-functions/reference/uniq.md#agg_function-uniq)
|
||||
- [uniqCombined64](../../../sql-reference/aggregate-functions/reference/uniqcombined64.md#agg_function-uniqcombined64)
|
||||
- [uniqHLL12](../../../sql-reference/aggregate-functions/reference/uniqhll12.md#agg_function-uniqhll12)
|
||||
- [uniqExact](../../../sql-reference/aggregate-functions/reference/uniqexact.md#agg_function-uniqexact)
|
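
The note about 32-bit hashes follows from counting collisions: with `m = 2**32` possible hash values, `n` distinct inputs are expected to produce only `m * (1 - (1 - 1/m)**n)` distinct hashes, and any estimator that works on hashes cannot see more than that. The helper below is a hypothetical illustration of this formula, not part of ClickHouse.

```python
import math


def expected_distinct_hashes(n, hash_bits=32):
    """Expected number of distinct values among n uniform hash_bits-bit hashes."""
    m = 2.0 ** hash_bits
    # m * (1 - (1 - 1/m)**n), written with log1p/expm1 to avoid cancellation
    return -m * math.expm1(n * math.log1p(-1.0 / m))


print(expected_distinct_hashes(1e10) / 2 ** 32)   # roughly 0.90 of all 32-bit values
```

With 64-bit hashes, as in `uniqCombined64`, the same formula stays essentially equal to `n` at this scale.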
@ -0,0 +1,7 @@
|
||||
---
|
||||
toc_priority: 193
|
||||
---
|
||||
|
||||
# uniqCombined64 {#agg_function-uniqcombined64}
|
||||
|
||||
和 [uniqCombined](../../../sql-reference/aggregate-functions/reference/uniqcombined.md#agg_function-uniqcombined)一样, 但对于所有数据类型使用64位哈希。
|
@ -0,0 +1,26 @@
|
||||
---
|
||||
toc_priority: 191
|
||||
---
|
||||
|
||||
# uniqExact {#agg_function-uniqexact}
|
||||
|
||||
计算不同参数值的准确数目。
|
||||
|
||||
**语法**
|
||||
|
||||
``` sql
|
||||
uniqExact(x[, ...])
|
||||
```
|
||||
如果你绝对需要一个确切的结果,使用 `uniqExact` 函数。 否则使用 [uniq](../../../sql-reference/aggregate-functions/reference/uniq.md#agg_function-uniq) 函数。
|
||||
|
||||
`uniqExact` 函数比 `uniq` 使用更多的内存,因为状态的大小随着不同值的数量的增加而无界增长。
|
||||
|
||||
**参数**
|
||||
|
||||
该函数采用可变数量的参数。 参数可以是 `Tuple`, `Array`, `Date`, `DateTime`, `String`,或数字类型。
|
||||
|
||||
**参见**
|
||||
|
||||
- [uniq](../../../sql-reference/aggregate-functions/reference/uniq.md#agg_function-uniq)
|
||||
- [uniqCombined](../../../sql-reference/aggregate-functions/reference/uniq.md#agg_function-uniqcombined)
|
||||
- [uniqHLL12](../../../sql-reference/aggregate-functions/reference/uniq.md#agg_function-uniqhll12)
|
@ -0,0 +1,43 @@
|
||||
---
|
||||
toc_priority: 194
|
||||
---
|
||||
|
||||
# uniqHLL12 {#agg_function-uniqhll12}
|
||||
|
||||
计算不同参数值的近似数量,使用 [HyperLogLog](https://en.wikipedia.org/wiki/HyperLogLog) 算法。
|
||||
|
||||
**语法**
|
||||
|
||||
``` sql
|
||||
uniqHLL12(x[, ...])
|
||||
```
|
||||
|
||||
**参数**
|
||||
|
||||
该函数采用可变数量的参数。 参数可以是 `Tuple`, `Array`, `Date`, `DateTime`, `String`,或数字类型。
|
||||
|
||||
**返回值**
|
||||
|
||||
**返回值**
|
||||
|
||||
- 一个[UInt64](../../../sql-reference/data-types/int-uint.md)类型的数字。
|
||||
|
||||
**实现细节**
|
||||
|
||||
功能:
|
||||
|
||||
- 计算聚合中所有参数的哈希值,然后在计算中使用它。
|
||||
|
||||
- 使用 HyperLogLog 算法来近似不同参数值的数量。
|
||||
|
||||
使用2^12个5比特单元。 状态的大小略大于2.5KB。 对于小数据集(<10K元素),结果不是很准确(误差高达10%)。 但是, 对于高基数数据集(10K-100M),结果相当准确,最大误差约为1.6%。Starting from 100M, the estimation error increases, and the function will return very inaccurate results for data sets with extremely high cardinality (1B+ elements).
|
||||
|
||||
- 提供确定结果(它不依赖于查询处理顺序)。
|
||||
|
||||
我们不建议使用此函数。 在大多数情况下, 使用 [uniq](../../../sql-reference/aggregate-functions/reference/uniq.md#agg_function-uniq) 或 [uniqCombined](../../../sql-reference/aggregate-functions/reference/uniqcombined.md#agg_function-uniqcombined) 函数。
|
||||
|
||||
**参见**
|
||||
|
||||
- [uniq](../../../sql-reference/aggregate-functions/reference/uniq.md#agg_function-uniq)
|
||||
- [uniqCombined](../../../sql-reference/aggregate-functions/reference/uniqcombined.md#agg_function-uniqcombined)
|
||||
- [uniqExact](../../../sql-reference/aggregate-functions/reference/uniqexact.md#agg_function-uniqexact)
|
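
A compact HyperLogLog sketch with `2**12` registers capped at 5-bit values is shown below. The hash function, the bias constant `0.7213 / (1 + 1.079 / m)` from the original HyperLogLog paper, and the linear-counting fallback for small inputs are assumptions for illustration; ClickHouse's error-correction details are not reproduced. The textbook relative standard error `1.04 / sqrt(4096)` is about 1.6%, consistent with the figure quoted above.

```python
import hashlib
import math

P = 12              # 2**12 registers, as documented for uniqHLL12
M = 1 << P
MAX_RANK = 31       # a 5-bit cell can hold 0..31


def _hash64(value):
    """Hypothetical 64-bit hash used only for this illustration."""
    digest = hashlib.blake2b(repr(value).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def hll_count(values):
    registers = [0] * M
    for v in values:
        h = _hash64(v)
        index = h & (M - 1)              # low P bits choose a register
        rest = h >> P                    # remaining 52 bits
        # 1-based position of the lowest set bit in `rest`
        rank = (rest & -rest).bit_length() if rest else MAX_RANK
        registers[index] = max(registers[index], min(rank, MAX_RANK))
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)   # linear counting for small cardinalities
    return raw


estimate = hll_count(range(10000))
```

Because each register keeps a maximum, feeding the same values again or in another order leaves the state unchanged, which matches the determinism noted above.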
@ -0,0 +1,12 @@
|
||||
---
|
||||
toc_priority: 32
|
||||
---
|
||||
|
||||
# varPop(x) {#varpopx}
|
||||
|
||||
计算 `Σ((x - x̅)^2) / n`,这里 `n` 是样本大小, `x̅` 是 `x` 的平均值。
|
||||
|
||||
换句话说,计算一组数据的离差。 返回 `Float64`。
|
||||
|
||||
!!! note "注"
|
||||
该函数使用数值不稳定的算法。 如果你需要 [数值稳定性](https://en.wikipedia.org/wiki/Numerical_stability) 在计算中,使用 `varPopStable` 函数。 它的工作速度较慢,但提供较低的计算错误。
|
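
To illustrate what the note means, compare a one-pass formula that subtracts two large, nearly equal numbers with Welford's online algorithm, one well-known stable alternative. Whether `varPop` or `varPopStable` use exactly these methods is not stated on this page; the function names are hypothetical.

```python
def var_pop_naive(xs):
    """E[x^2] - E[x]^2: fast, but prone to cancellation for large values."""
    n = len(xs)
    mean = sum(xs) / n
    return sum(x * x for x in xs) / n - mean * mean


def var_pop_welford(xs):
    """Welford's online algorithm: updates mean and squared deviations per value."""
    n = 0
    mean = 0.0
    m2 = 0.0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / n


data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(var_pop_welford(data))   # 22.5
print(var_pop_naive(data))     # may be far from 22.5: x*x is about 1e18
```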
@ -0,0 +1,15 @@
|
||||
---
|
||||
toc_priority: 33
|
||||
---
|
||||
|
||||
# varSamp {#varsamp}
|
||||
|
||||
计算 `Σ((x - x̅)^2) / (n - 1)`,这里 `n` 是样本大小, `x̅`是`x`的平均值。
|
||||
|
||||
它表示随机变量的方差的无偏估计,如果传递的值形成其样本。
|
||||
|
||||
返回 `Float64`。 当 `n <= 1`,返回 `+∞`。
|
||||
|
||||
!!! note "注"
|
||||
该函数使用数值不稳定的算法。 如果你需要 [数值稳定性](https://en.wikipedia.org/wiki/Numerical_stability) 在计算中,使用 `varSampStable` 函数。 它的工作速度较慢,但提供较低的计算错误。
|
||||
|
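
The formula and the documented `n <= 1` case can be sketched directly in plain Python; `var_samp` is a hypothetical name, and this straightforward two-pass version is for illustration rather than a copy of ClickHouse's algorithm.

```python
import math


def var_samp(xs):
    """Sample variance with the n - 1 denominator."""
    n = len(xs)
    if n <= 1:
        return math.inf          # documented behaviour: +inf when n <= 1
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


print(var_samp([1, 2, 3, 4]))    # 1.6666666666666667
print(var_samp([7]))             # inf
```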
@ -0,0 +1,62 @@
|
||||
---
|
||||
toc_priority: 301
|
||||
toc_title: welchTTest
|
||||
---
|
||||
|
||||
# welchTTest {#welchttest}
|
||||
|
||||
对两个总体的样本应用 Welch t检验。
|
||||
|
||||
**语法**
|
||||
|
||||
``` sql
|
||||
welchTTest(sample_data, sample_index)
|
||||
```
|
||||
两个样本的值都在 `sample_data` 列中。如果 `sample_index` 等于 0,则该行的值属于第一个总体的样本。 反之属于第二个总体的样本。
|
||||
零假设是群体的均值相等。假设为正态分布。总体可能具有不相等的方差。
|
||||
|
||||
**参数**
|
||||
|
||||
- `sample_data` — 样本数据。[Integer](../../../sql-reference/data-types/int-uint.md), [Float](../../../sql-reference/data-types/float.md) 或 [Decimal](../../../sql-reference/data-types/decimal.md).
|
||||
- `sample_index` — 样本索引。[Integer](../../../sql-reference/data-types/int-uint.md).
|
||||
|
||||
**返回值**
|
||||
|
||||
[元组](../../../sql-reference/data-types/tuple.md),有两个元素:
|
||||
|
||||
- 计算出的t统计量。 [Float64](../../../sql-reference/data-types/float.md)。
|
||||
- 计算出的p值。[Float64](../../../sql-reference/data-types/float.md)。
|
||||
|
||||
**示例**
|
||||
|
||||
输入表:
|
||||
|
||||
``` text
|
||||
┌─sample_data─┬─sample_index─┐
|
||||
│ 20.3 │ 0 │
|
||||
│ 22.1 │ 0 │
|
||||
│ 21.9 │ 0 │
|
||||
│ 18.9 │ 1 │
|
||||
│ 20.3 │ 1 │
|
||||
│ 19 │ 1 │
|
||||
└─────────────┴──────────────┘
|
||||
```
|
||||
|
||||
查询:
|
||||
|
||||
``` sql
|
||||
SELECT welchTTest(sample_data, sample_index) FROM welch_ttest;
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
``` text
|
||||
┌─welchTTest(sample_data, sample_index)─────┐
|
||||
│ (2.7988719532211235,0.051807360348581945) │
|
||||
└───────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**参见**
|
||||
|
||||
- [Welch's t-test](https://en.wikipedia.org/wiki/Welch%27s_t-test)
|
||||
- [studentTTest function](../../../sql-reference/aggregate-functions/reference/studentttest.md#studentttest)
|
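
For unequal variances, the statistic divides the difference of the means by `sqrt(v1/n1 + v2/n2)`, and the degrees of freedom are commonly approximated with the Welch-Satterthwaite equation. The sketch below (hypothetical name `welch_t_statistic`) reproduces the t-statistic from the example; the p-value, which needs Student's t distribution, is left out because the Python standard library does not include it.

```python
import math


def welch_t_statistic(sample_data, sample_index):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom."""
    a = [v for v, i in zip(sample_data, sample_index) if i == 0]
    b = [v for v, i in zip(sample_data, sample_index) if i != 0]

    def mean_var(xs):
        mean = sum(xs) / len(xs)
        return mean, sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

    m1, v1 = mean_var(a)
    m2, v2 = mean_var(b)
    s1, s2 = v1 / len(a), v2 / len(b)
    t = (m1 - m2) / math.sqrt(s1 + s2)
    df = (s1 + s2) ** 2 / (s1 ** 2 / (len(a) - 1) + s2 ** 2 / (len(b) - 1))
    return t, df


t, df = welch_t_statistic([20.3, 22.1, 21.9, 18.9, 20.3, 19], [0, 0, 0, 1, 1, 1])
print(round(t, 4))   # 2.7989
```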
@ -1,26 +1,31 @@
|
||||
---
|
||||
machine_translated: true
|
||||
machine_translated_rev: 71d72c1f237f4a553fe91ba6c6c633e81a49e35b
|
||||
---
|
||||
|
||||
# SimpleAggregateFunction {#data-type-simpleaggregatefunction}
|
||||
|
||||
`SimpleAggregateFunction(name, types_of_arguments…)` 数据类型存储聚合函数的当前值,而不将其完整状态存储为 [`AggregateFunction`](../../sql-reference/data-types/aggregatefunction.md) 有 此优化可应用于具有以下属性的函数:应用函数的结果 `f` 到行集 `S1 UNION ALL S2` 可以通过应用来获得 `f` 行的部分单独设置,然后再次应用 `f` 到结果: `f(S1 UNION ALL S2) = f(f(S1) UNION ALL f(S2))`. 此属性保证部分聚合结果足以计算组合结果,因此我们不必存储和处理任何额外的数据。
|
||||
`SimpleAggregateFunction(name, types_of_arguments…)` 数据类型存储聚合函数的当前值,而不像 [`AggregateFunction`](../../sql-reference/data-types/aggregatefunction.md) 那样存储其全部状态。这种优化可以应用于具有以下属性的函数:将函数 `f` 应用于行集合 `S1 UNION ALL S2` 的结果,可以通过先将 `f` 分别应用于行集合的各个部分,再将 `f` 应用于这些结果来获得:`f(S1 UNION ALL S2) = f(f(S1) UNION ALL f(S2))`。这个属性保证了部分聚合结果足以计算出合并的结果,所以我们不必存储和处理任何额外的数据。
|
||||
|
||||
支持以下聚合函数:
|
||||
|
||||
- [`any`](../../sql-reference/aggregate-functions/reference.md#agg_function-any)
|
||||
- [`anyLast`](../../sql-reference/aggregate-functions/reference.md#anylastx)
|
||||
- [`min`](../../sql-reference/aggregate-functions/reference.md#agg_function-min)
|
||||
- [`max`](../../sql-reference/aggregate-functions/reference.md#agg_function-max)
|
||||
- [`sum`](../../sql-reference/aggregate-functions/reference.md#agg_function-sum)
|
||||
- [`groupBitAnd`](../../sql-reference/aggregate-functions/reference.md#groupbitand)
|
||||
- [`groupBitOr`](../../sql-reference/aggregate-functions/reference.md#groupbitor)
|
||||
- [`groupBitXor`](../../sql-reference/aggregate-functions/reference.md#groupbitxor)
|
||||
- [`groupArrayArray`](../../sql-reference/aggregate-functions/reference.md#agg_function-grouparray)
|
||||
- [`groupUniqArrayArray`](../../sql-reference/aggregate-functions/reference.md#groupuniqarrayx-groupuniqarraymax-sizex)
|
||||
- [`any`](../../sql-reference/aggregate-functions/reference/any.md#agg_function-any)
|
||||
- [`anyLast`](../../sql-reference/aggregate-functions/reference/anylast.md#anylastx)
|
||||
- [`min`](../../sql-reference/aggregate-functions/reference/min.md#agg_function-min)
|
||||
- [`max`](../../sql-reference/aggregate-functions/reference/max.md#agg_function-max)
|
||||
- [`sum`](../../sql-reference/aggregate-functions/reference/sum.md#agg_function-sum)
|
||||
- [`sumWithOverflow`](../../sql-reference/aggregate-functions/reference/sumwithoverflow.md#sumwithoverflowx)
|
||||
- [`groupBitAnd`](../../sql-reference/aggregate-functions/reference/groupbitand.md#groupbitand)
|
||||
- [`groupBitOr`](../../sql-reference/aggregate-functions/reference/groupbitor.md#groupbitor)
|
||||
- [`groupBitXor`](../../sql-reference/aggregate-functions/reference/groupbitxor.md#groupbitxor)
|
||||
- [`groupArrayArray`](../../sql-reference/aggregate-functions/reference/grouparray.md#agg_function-grouparray)
|
||||
- [`groupUniqArrayArray`](../../sql-reference/aggregate-functions/reference/groupuniqarray.md)
|
||||
- [`sumMap`](../../sql-reference/aggregate-functions/reference/summap.md#agg_functions-summap)
|
||||
- [`minMap`](../../sql-reference/aggregate-functions/reference/minmap.md#agg_functions-minmap)
|
||||
- [`maxMap`](../../sql-reference/aggregate-functions/reference/maxmap.md#agg_functions-maxmap)
|
||||
- [`argMin`](../../sql-reference/aggregate-functions/reference/argmin.md)
|
||||
- [`argMax`](../../sql-reference/aggregate-functions/reference/argmax.md)
|
||||
|
||||
的值 `SimpleAggregateFunction(func, Type)` 看起来和存储方式相同 `Type`,所以你不需要应用函数 `-Merge`/`-State` 后缀。 `SimpleAggregateFunction` 具有比更好的性能 `AggregateFunction` 具有相同的聚合功能。
|
||||
|
||||
!!! note "注"
|
||||
`SimpleAggregateFunction(func, Type)` 的值的外观和存储方式与 `Type` 相同,所以你不需要应用带有 `-Merge`/`-State` 后缀的函数。
|
||||
|
||||
`SimpleAggregateFunction` 的性能优于具有相同聚合函数的 `AggregateFunction` 。
|
||||
|
||||
**参数**
|
||||
|
||||
@ -30,11 +35,7 @@ machine_translated_rev: 71d72c1f237f4a553fe91ba6c6c633e81a49e35b
|
||||
**示例**
|
||||
|
||||
``` sql
|
||||
CREATE TABLE t
|
||||
(
|
||||
column1 SimpleAggregateFunction(sum, UInt64),
|
||||
column2 SimpleAggregateFunction(any, String)
|
||||
) ENGINE = ...
|
||||
CREATE TABLE simple (id UInt64, val SimpleAggregateFunction(sum, Double)) ENGINE=AggregatingMergeTree ORDER BY id;
|
||||
```
|
||||
|
||||
[原始文章](https://clickhouse.tech/docs/en/data_types/simpleaggregatefunction/) <!--hide-->
|
||||
|
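The property that `SimpleAggregateFunction` relies on, `f(S1 UNION ALL S2) = f(f(S1) UNION ALL f(S2))`, can be checked directly for a concrete split of rows. The C++ sketch below is an illustration only (`mergesLikeSimpleAggregate` and the `*_f` lambdas are hypothetical, not ClickHouse code): `sum`, `min` and `max` satisfy the property, while a plain average does not, which is why an average needs a fuller state (for example sum and count) rather than just its current value.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

using Rows = std::vector<double>;
using Agg = std::function<double(const Rows &)>;

// Checks f(S1 UNION ALL S2) == f(f(S1) UNION ALL f(S2)) for one concrete split.
// Exact comparison is fine here only because the test values are small integers.
bool mergesLikeSimpleAggregate(const Agg & f, const Rows & s1, const Rows & s2)
{
    Rows all = s1;
    all.insert(all.end(), s2.begin(), s2.end());  // S1 UNION ALL S2
    Rows partials = {f(s1), f(s2)};               // f(S1) UNION ALL f(S2)
    return f(all) == f(partials);
}

// Example aggregates; all of them expect non-empty input.
const Agg sum_f = [](const Rows & r) { return std::accumulate(r.begin(), r.end(), 0.0); };
const Agg min_f = [](const Rows & r) { return *std::min_element(r.begin(), r.end()); };
const Agg max_f = [](const Rows & r) { return *std::max_element(r.begin(), r.end()); };
const Agg avg_f = [](const Rows & r)
{
    return std::accumulate(r.begin(), r.end(), 0.0) / static_cast<double>(r.size());
};
```

For `S1 = {1, 2, 3}` and `S2 = {10}`, the average of all rows is 4, but the average of the partial averages (2 and 10) is 6, so `avg_f` fails the check while the other three pass.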
@ -1,21 +1,19 @@
|
||||
---
|
||||
machine_translated: true
|
||||
machine_translated_rev: 72537a2d527c63c07aa5d2361a8829f3895cf2bd
|
||||
toc_priority: 46
|
||||
toc_title: UUID
|
||||
---
|
||||
|
||||
# UUID {#uuid-data-type}
|
||||
|
||||
通用唯一标识符(UUID)是用于标识记录的16字节数。 有关UUID的详细信息,请参阅 [维基百科](https://en.wikipedia.org/wiki/Universally_unique_identifier).
|
||||
通用唯一标识符(UUID)是一个16字节的数字,用于标识记录。有关UUID的详细信息,参见[维基百科](https://en.wikipedia.org/wiki/Universally_unique_identifier)。
|
||||
|
||||
UUID类型值的示例如下所示:
|
||||
UUID类型值的示例如下:
|
||||
|
||||
``` text
|
||||
61f0c404-5cb3-11e7-907b-a6006ad3dba0
|
||||
```
|
||||
|
||||
如果在插入新记录时未指定UUID列值,则UUID值将用零填充:
|
||||
如果在插入新记录时未指定UUID列的值,则UUID值将用零填充:
|
||||
|
||||
``` text
|
||||
00000000-0000-0000-0000-000000000000
|
||||
@ -23,13 +21,13 @@ UUID类型值的示例如下所示:
|
||||
|
||||
## 如何生成 {#how-to-generate}
|
||||
|
||||
要生成UUID值,ClickHouse提供了 [generateuidv4](../../sql-reference/functions/uuid-functions.md) 功能。
|
||||
要生成UUID值,ClickHouse提供了 [generateUUIDv4](../../sql-reference/functions/uuid-functions.md) 函数。
|
||||
|
||||
## 用法示例 {#usage-example}
|
||||
|
||||
**示例1**
|
||||
|
||||
此示例演示如何创建具有UUID类型列的表并将值插入到表中。
|
||||
这个例子演示了创建一个具有UUID类型列的表,并在表中插入一个值。
|
||||
|
||||
``` sql
|
||||
CREATE TABLE t_uuid (x UUID, y String) ENGINE=TinyLog
|
||||
@ -51,7 +49,7 @@ SELECT * FROM t_uuid
|
||||
|
||||
**示例2**
|
||||
|
||||
在此示例中,插入新记录时未指定UUID列值。
|
||||
在这个示例中,插入新记录时未指定UUID列的值。
|
||||
|
||||
``` sql
|
||||
INSERT INTO t_uuid (y) VALUES ('Example 2')
|
||||
@ -70,8 +68,7 @@ SELECT * FROM t_uuid
|
||||
|
||||
## 限制 {#restrictions}
|
||||
|
||||
UUID数据类型仅支持以下功能 [字符串](string.md) 数据类型也支持(例如, [min](../../sql-reference/aggregate-functions/reference.md#agg_function-min), [max](../../sql-reference/aggregate-functions/reference.md#agg_function-max),和 [计数](../../sql-reference/aggregate-functions/reference.md#agg_function-count)).
|
||||
UUID数据类型只支持 [字符串](../../sql-reference/data-types/string.md) 数据类型也支持的函数(比如, [min](../../sql-reference/aggregate-functions/reference/min.md#agg_function-min), [max](../../sql-reference/aggregate-functions/reference/max.md#agg_function-max), 和 [count](../../sql-reference/aggregate-functions/reference/count.md#agg_function-count))。
|
||||
|
||||
算术运算不支持UUID数据类型(例如, [abs](../../sql-reference/functions/arithmetic-functions.md#arithm_func-abs))或聚合函数,例如 [sum](../../sql-reference/aggregate-functions/reference.md#agg_function-sum) 和 [avg](../../sql-reference/aggregate-functions/reference.md#agg_function-avg).
|
||||
UUID数据类型不支持算术运算(例如 [abs](../../sql-reference/functions/arithmetic-functions.md#arithm_func-abs)),也不支持 [sum](../../sql-reference/aggregate-functions/reference/sum.md#agg_function-sum) 和 [avg](../../sql-reference/aggregate-functions/reference/avg.md#agg_function-avg) 等聚合函数。
|
||||
|
||||
[原始文章](https://clickhouse.tech/docs/en/data_types/uuid/) <!--hide-->
|
||||
|
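The textual UUID form used in these docs is 16 bytes printed as 32 hex digits in 8-4-4-4-12 groups, and a version 4 UUID (the kind `generateUUIDv4` is named after) is random except for the version and variant bits defined by RFC 4122. The C++ sketch below only illustrates that layout: `formatUUID` and `randomUUIDv4` are hypothetical helpers, the bytes are printed in array order (this does not reproduce ClickHouse's internal byte order for the UUID type), and `std::mt19937_64` is not a cryptographically secure source.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <random>
#include <string>

using UUIDBytes = std::array<uint8_t, 16>;

// Prints 16 bytes as xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx (lowercase hex).
std::string formatUUID(const UUIDBytes & b)
{
    std::string out;
    char hex[3];
    for (size_t i = 0; i < b.size(); ++i)
    {
        if (i == 4 || i == 6 || i == 8 || i == 10)
            out += '-';
        std::snprintf(hex, sizeof(hex), "%02x", static_cast<unsigned>(b[i]));
        out += hex;
    }
    return out;
}

// Version 4 UUID: random bits plus fixed version and variant bits (RFC 4122).
UUIDBytes randomUUIDv4(std::mt19937_64 & rng)
{
    UUIDBytes b{};
    for (size_t i = 0; i < b.size(); i += 8)
    {
        uint64_t r = rng();
        for (size_t j = 0; j < 8; ++j)
            b[i + j] = static_cast<uint8_t>(r >> (8 * j));
    }
    b[6] = static_cast<uint8_t>((b[6] & 0x0F) | 0x40);  // version 4 in the high nibble of byte 6
    b[8] = static_cast<uint8_t>((b[8] & 0x3F) | 0x80);  // variant bits 10xx in byte 8
    return b;
}
```

An all-zero array formats as `00000000-0000-0000-0000-000000000000`, the value a UUID column gets when it is not specified on insert; in a generated v4 value the first digit of the third group is always `4`.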
@ -260,6 +260,11 @@ try
|
||||
if (mark_cache_size)
|
||||
global_context->setMarkCache(mark_cache_size);
|
||||
|
||||
/// A cache for mmapped files.
|
||||
size_t mmap_cache_size = config().getUInt64("mmap_cache_size", 1000); /// The choice of default is arbitrary.
|
||||
if (mmap_cache_size)
|
||||
global_context->setMMappedFileCache(mmap_cache_size);
|
||||
|
||||
/// Load global settings from default_profile and system_profile.
|
||||
global_context->setDefaultProfiles(config());
|
||||
|
||||
|
@ -829,6 +829,11 @@ int Server::main(const std::vector<std::string> & /*args*/)
|
||||
}
|
||||
global_context->setMarkCache(mark_cache_size);
|
||||
|
||||
/// A cache for mmapped files.
|
||||
size_t mmap_cache_size = config().getUInt64("mmap_cache_size", 1000); /// The choice of default is arbitrary.
|
||||
if (mmap_cache_size)
|
||||
global_context->setMMappedFileCache(mmap_cache_size);
|
||||
|
||||
#if USE_EMBEDDED_COMPILER
|
||||
size_t compiled_expression_cache_size = config().getUInt64("compiled_expression_cache_size", 500);
|
||||
CompiledExpressionCacheFactory::instance().init(compiled_expression_cache_size);
|
||||
|
@ -298,6 +298,25 @@
|
||||
<mark_cache_size>5368709120</mark_cache_size>
|
||||
|
||||
|
||||
<!-- If you enable the `min_bytes_to_use_mmap_io` setting,
|
||||
the data in MergeTree tables can be read with mmap to avoid copying from kernel to userspace.
|
||||
It makes sense only for large files and helps only if data reside in page cache.
|
||||
To avoid frequent open/mmap/munmap/close calls (which are very expensive due to consequent page faults)
|
||||
and to reuse mappings from several threads and queries,
|
||||
the cache of mapped files is maintained. Its size is the number of mapped regions (usually equal to the number of mapped files).
|
||||
The amount of data in mapped files can be monitored
|
||||
in system.metrics, system.metric_log by the MMappedFiles, MMappedFileBytes metrics
|
||||
and in system.asynchronous_metrics, system.asynchronous_metrics_log by the MMapCacheCells metric,
|
||||
and also in system.events, system.processes, system.query_log, system.query_thread_log by the
|
||||
CreatedReadBufferMMap, CreatedReadBufferMMapFailed, MMappedFileCacheHits, MMappedFileCacheMisses events.
|
||||
Note that the amount of data in mapped files does not consume memory directly and is not accounted
|
||||
in query or server memory usage - because this memory can be discarded similar to OS page cache.
|
||||
The cache is dropped (the files are closed) automatically on removal of old parts in MergeTree;
|
||||
it can also be dropped manually with the SYSTEM DROP MMAP CACHE query.
|
||||
-->
|
||||
<mmap_cache_size>1000</mmap_cache_size>
|
||||
|
||||
|
||||
<!-- Path to data directory, with trailing slash. -->
|
||||
<path>/var/lib/clickhouse/</path>
|
||||
|
||||
|
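The `mmap_cache_size` comment describes keeping mappings alive so that repeated reads avoid open/mmap/munmap/close and can share one mapping across threads and queries, with an explicit way to drop the cache. A minimal C++ sketch of that idea, assuming POSIX `mmap`; `MappedFile` and `MappedFileCache` are hypothetical names and this is not ClickHouse's `MMappedFileCache`: it maps whole files, keys entries by path only, and has no size limit or eviction.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <unordered_map>

// One read-only mapping of a whole file; unmapped when the last reference is released.
struct MappedFile
{
    void * data = nullptr;
    size_t size = 0;

    explicit MappedFile(const std::string & path)
    {
        int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0)
            throw std::runtime_error("cannot open " + path);
        struct stat st{};
        if (::fstat(fd, &st) != 0 || st.st_size == 0)
        {
            ::close(fd);
            throw std::runtime_error("cannot map empty or unreadable file " + path);
        }
        size = static_cast<size_t>(st.st_size);
        data = ::mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
        ::close(fd);  // the mapping stays valid after the descriptor is closed
        if (data == MAP_FAILED)
            throw std::runtime_error("mmap failed for " + path);
    }

    ~MappedFile() { ::munmap(data, size); }
    MappedFile(const MappedFile &) = delete;
    MappedFile & operator=(const MappedFile &) = delete;
};

// Reuses mappings across readers; entries live until drop() is called.
class MappedFileCache
{
public:
    std::shared_ptr<MappedFile> get(const std::string & path)
    {
        std::lock_guard lock(mutex);
        if (auto it = cells.find(path); it != cells.end())
        {
            ++hit_count;
            return it->second;
        }
        ++miss_count;
        auto file = std::make_shared<MappedFile>(path);
        cells.emplace(path, file);
        return file;
    }

    // Similar in spirit to SYSTEM DROP MMAP CACHE: forget all mappings. Readers that
    // still hold a shared_ptr keep their mapping until they release it.
    void drop()
    {
        std::lock_guard lock(mutex);
        cells.clear();
    }

    size_t size() const { std::lock_guard lock(mutex); return cells.size(); }
    size_t hits() const { std::lock_guard lock(mutex); return hit_count; }
    size_t misses() const { std::lock_guard lock(mutex); return miss_count; }

private:
    mutable std::mutex mutex;
    std::unordered_map<std::string, std::shared_ptr<MappedFile>> cells;
    size_t hit_count = 0;
    size_t miss_count = 0;
};
```

The hit and miss counters play the role of the `MMappedFileCacheHits` / `MMappedFileCacheMisses` events mentioned in the comment; the number of cached entries corresponds to what the size limit would bound.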
@ -124,6 +124,7 @@ enum class AccessType
|
||||
M(SYSTEM_DROP_DNS_CACHE, "SYSTEM DROP DNS, DROP DNS CACHE, DROP DNS", GLOBAL, SYSTEM_DROP_CACHE) \
|
||||
M(SYSTEM_DROP_MARK_CACHE, "SYSTEM DROP MARK, DROP MARK CACHE, DROP MARKS", GLOBAL, SYSTEM_DROP_CACHE) \
|
||||
M(SYSTEM_DROP_UNCOMPRESSED_CACHE, "SYSTEM DROP UNCOMPRESSED, DROP UNCOMPRESSED CACHE, DROP UNCOMPRESSED", GLOBAL, SYSTEM_DROP_CACHE) \
|
||||
M(SYSTEM_DROP_MMAP_CACHE, "SYSTEM DROP MMAP, DROP MMAP CACHE, DROP MMAP", GLOBAL, SYSTEM_DROP_CACHE) \
|
||||
M(SYSTEM_DROP_COMPILED_EXPRESSION_CACHE, "SYSTEM DROP COMPILED EXPRESSION, DROP COMPILED EXPRESSION CACHE, DROP COMPILED EXPRESSIONS", GLOBAL, SYSTEM_DROP_CACHE) \
|
||||
M(SYSTEM_DROP_CACHE, "DROP CACHE", GROUP, SYSTEM) \
|
||||
M(SYSTEM_RELOAD_CONFIG, "RELOAD CONFIG", GLOBAL, SYSTEM_RELOAD) \
|
||||
|
@ -66,6 +66,8 @@
|
||||
M(PartsWide, "Wide parts.") \
|
||||
M(PartsCompact, "Compact parts.") \
|
||||
M(PartsInMemory, "In-memory parts.") \
|
||||
M(MMappedFiles, "Total number of mmapped files.") \
|
||||
M(MMappedFileBytes, "Sum size of mmapped file regions.") \
|
||||
|
||||
namespace CurrentMetrics
|
||||
{
|
||||
|
@ -32,6 +32,8 @@
|
||||
M(UncompressedCacheHits, "") \
|
||||
M(UncompressedCacheMisses, "") \
|
||||
M(UncompressedCacheWeightLost, "") \
|
||||
M(MMappedFileCacheHits, "") \
|
||||
M(MMappedFileCacheMisses, "") \
|
||||
M(IOBufferAllocs, "") \
|
||||
M(IOBufferAllocBytes, "") \
|
||||
M(ArenaAllocChunks, "") \
|
||||
|
@ -33,33 +33,27 @@ bool CachedCompressedReadBuffer::nextImpl()
|
||||
|
||||
/// Let's check for the presence of a decompressed block in the cache, grab the ownership of this block, if it exists.
|
||||
UInt128 key = cache->hash(path, file_pos);
|
||||
owned_cell = cache->get(key);
|
||||
|
||||
if (!owned_cell)
|
||||
owned_cell = cache->getOrSet(key, [&]()
|
||||
{
|
||||
/// If not, read it from the file.
|
||||
initInput();
|
||||
file_in->seek(file_pos, SEEK_SET);
|
||||
|
||||
owned_cell = std::make_shared<UncompressedCacheCell>();
|
||||
auto cell = std::make_shared<UncompressedCacheCell>();
|
||||
|
||||
size_t size_decompressed;
|
||||
size_t size_compressed_without_checksum;
|
||||
owned_cell->compressed_size = readCompressedData(size_decompressed, size_compressed_without_checksum, false);
|
||||
cell->compressed_size = readCompressedData(size_decompressed, size_compressed_without_checksum, false);
|
||||
|
||||
if (owned_cell->compressed_size)
|
||||
if (cell->compressed_size)
|
||||
{
|
||||
owned_cell->additional_bytes = codec->getAdditionalSizeAtTheEndOfBuffer();
|
||||
owned_cell->data.resize(size_decompressed + owned_cell->additional_bytes);
|
||||
decompress(owned_cell->data.data(), size_decompressed, size_compressed_without_checksum);
|
||||
|
||||
cell->additional_bytes = codec->getAdditionalSizeAtTheEndOfBuffer();
|
||||
cell->data.resize(size_decompressed + cell->additional_bytes);
|
||||
decompressTo(cell->data.data(), size_decompressed, size_compressed_without_checksum);
|
||||
}
|
||||
|
||||
/// Put data into cache.
|
||||
/// NOTE: Even if we don't read anything (compressed_size == 0)
|
||||
/// because we can reuse this information and don't reopen file in future
|
||||
cache->set(key, owned_cell);
|
||||
}
|
||||
return cell;
|
||||
});
|
||||
|
||||
if (owned_cell->data.size() == 0)
|
||||
return false;
|
||||
|
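The change in `CachedCompressedReadBuffer::nextImpl` replaces a `get` followed by a later `set` with a single `getOrSet(key, load)` call, where the lambda runs only on a miss and returns the cell to store. A minimal C++ sketch of that contract; `TinyCache` is hypothetical and deliberately simple: no eviction, and the loader runs while the lock is held. ClickHouse's real cache (an LRU with size limits) is not reproduced here.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical minimal cache illustrating the getOrSet(key, load) contract.
template <typename Key, typename Value>
class TinyCache
{
public:
    using ValuePtr = std::shared_ptr<Value>;

    // Returns the cached value for `key`. On a miss, calls `load()` once and caches
    // whatever it returns, even an "empty" value, so the next lookup is a hit.
    template <typename Load>
    ValuePtr getOrSet(const Key & key, Load && load)
    {
        std::lock_guard lock(mutex);  // loader runs under the lock: simple, but serializes misses
        if (auto it = cells.find(key); it != cells.end())
            return it->second;
        ValuePtr value = load();
        cells.emplace(key, value);
        return value;
    }

private:
    std::mutex mutex;
    std::map<Key, ValuePtr> cells;
};
```

Caching the loader's result unconditionally matches the NOTE in the diff: even a block with `compressed_size == 0` is stored, so the file is not reopened for it later.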
@ -21,7 +21,7 @@ bool CompressedReadBuffer::nextImpl()
|
||||
memory.resize(size_decompressed + additional_size_at_the_end_of_buffer);
|
||||
working_buffer = Buffer(memory.data(), &memory[size_decompressed]);
|
||||
|
||||
decompress(working_buffer.begin(), size_decompressed, size_compressed_without_checksum);
|
||||
decompress(working_buffer, size_decompressed, size_compressed_without_checksum);
|
||||
|
||||
return true;
|
||||
}
|
||||
@ -48,7 +48,7 @@ size_t CompressedReadBuffer::readBig(char * to, size_t n)
|
||||
/// If the decompressed block fits entirely where it needs to be copied.
|
||||
if (size_decompressed + additional_size_at_the_end_of_buffer <= n - bytes_read)
|
||||
{
|
||||
decompress(to + bytes_read, size_decompressed, size_compressed_without_checksum);
|
||||
decompressTo(to + bytes_read, size_decompressed, size_compressed_without_checksum);
|
||||
bytes_read += size_decompressed;
|
||||
bytes += size_decompressed;
|
||||
}
|
||||
@ -61,9 +61,9 @@ size_t CompressedReadBuffer::readBig(char * to, size_t n)
|
||||
|
||||
memory.resize(size_decompressed + additional_size_at_the_end_of_buffer);
|
||||
working_buffer = Buffer(memory.data(), &memory[size_decompressed]);
|
||||
pos = working_buffer.begin();
|
||||
|
||||
decompress(working_buffer.begin(), size_decompressed, size_compressed_without_checksum);
|
||||
decompress(working_buffer, size_decompressed, size_compressed_without_checksum);
|
||||
pos = working_buffer.begin();
|
||||
|
||||
bytes_read += read(to + bytes_read, n - bytes_read);
|
||||
break;
|
||||
|
@ -184,7 +184,7 @@ size_t CompressedReadBufferBase::readCompressedData(size_t & size_decompressed,
|
||||
}
|
||||
|
||||
|
||||
void CompressedReadBufferBase::decompress(char * to, size_t size_decompressed, size_t size_compressed_without_checksum)
|
||||
static void readHeaderAndGetCodec(const char * compressed_buffer, size_t size_decompressed, CompressionCodecPtr & codec, bool allow_different_codecs)
|
||||
{
|
||||
ProfileEvents::increment(ProfileEvents::CompressedReadBufferBlocks);
|
||||
ProfileEvents::increment(ProfileEvents::CompressedReadBufferBytes, size_decompressed);
|
||||
@ -210,11 +210,38 @@ void CompressedReadBufferBase::decompress(char * to, size_t size_decompressed, s
|
||||
ErrorCodes::CANNOT_DECOMPRESS);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
void CompressedReadBufferBase::decompressTo(char * to, size_t size_decompressed, size_t size_compressed_without_checksum)
|
||||
{
|
||||
readHeaderAndGetCodec(compressed_buffer, size_decompressed, codec, allow_different_codecs);
|
||||
codec->decompress(compressed_buffer, size_compressed_without_checksum, to);
|
||||
}
|
||||
|
||||
|
||||
void CompressedReadBufferBase::decompress(BufferBase::Buffer & to, size_t size_decompressed, size_t size_compressed_without_checksum)
|
||||
{
|
||||
readHeaderAndGetCodec(compressed_buffer, size_decompressed, codec, allow_different_codecs);
|
||||
|
||||
if (codec->isNone())
|
||||
{
|
||||
/// Shortcut for NONE codec to avoid extra memcpy.
|
||||
/// We do it by changing the buffer `to` to point to existing uncompressed data.
|
||||
|
||||
UInt8 header_size = ICompressionCodec::getHeaderSize();
|
||||
if (size_compressed_without_checksum < header_size)
|
||||
throw Exception(ErrorCodes::CORRUPTED_DATA,
|
||||
"Can't decompress data: the compressed data size ({}, this should include header size) is less than the header size ({})",
|
||||
size_compressed_without_checksum, static_cast<size_t>(header_size));
|
||||
|
||||
to = BufferBase::Buffer(compressed_buffer + header_size, compressed_buffer + size_compressed_without_checksum);
|
||||
}
|
||||
else
|
||||
codec->decompress(compressed_buffer, size_compressed_without_checksum, to.begin());
|
||||
}
|
||||
|
||||
|
||||
/// 'compressed_in' could be initialized lazily, but before first call of 'readCompressedData'.
|
||||
CompressedReadBufferBase::CompressedReadBufferBase(ReadBuffer * in, bool allow_different_codecs_)
|
||||
: compressed_in(in), own_compressed_buffer(0), allow_different_codecs(allow_different_codecs_)
|
||||
|
@ -3,6 +3,7 @@
|
||||
#include <Common/PODArray.h>
|
||||
#include <Compression/LZ4_decompress_faster.h>
|
||||
#include <Compression/ICompressionCodec.h>
|
||||
#include <IO/BufferBase.h>
|
||||
|
||||
|
||||
namespace DB
|
||||
@ -37,7 +38,12 @@ protected:
|
||||
/// Returns number of compressed bytes read.
|
||||
size_t readCompressedData(size_t & size_decompressed, size_t & size_compressed_without_checksum, bool always_copy);
|
||||
|
||||
void decompress(char * to, size_t size_decompressed, size_t size_compressed_without_checksum);
|
||||
/// Decompress into memory pointed by `to`
|
||||
void decompressTo(char * to, size_t size_decompressed, size_t size_compressed_without_checksum);
|
||||
|
||||
/// This method can change location of `to` to avoid unnecessary copy if data is uncompressed.
|
||||
/// It is more efficient for compression codec NONE but not suitable if you want to decompress into specific location.
|
||||
void decompress(BufferBase::Buffer & to, size_t size_decompressed, size_t size_compressed_without_checksum);
|
||||
|
||||
public:
|
||||
/// 'compressed_in' could be initialized lazily, but before first call of 'readCompressedData'.
|
||||
|
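The two declarations above differ in where the output ends up: `decompressTo` always writes into caller-provided memory, while `decompress(BufferBase::Buffer &)` may re-point the buffer at the already-uncompressed bytes when the codec is NONE, saving a `memcpy`. A simplified C++ sketch of that distinction for the NONE case only; `Span`, `kHeaderSize`, `decompressNoneTo` and `decompressNone` are hypothetical stand-ins, and the 9-byte header size is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Hypothetical stand-in for a [begin, end) buffer view.
struct Span
{
    const char * begin = nullptr;
    const char * end = nullptr;
};

constexpr size_t kHeaderSize = 9;  // assumed header size for this sketch

// Same idea as the CORRUPTED_DATA check: a block can't be smaller than its header.
void checkHeader(size_t size_compressed)
{
    if (size_compressed < kHeaderSize)
        throw std::runtime_error("compressed data size is less than the header size");
}

// Like decompressTo with codec NONE: copy the payload into memory the caller owns.
void decompressNoneTo(const char * block, size_t size_compressed, char * to)
{
    checkHeader(size_compressed);
    std::memcpy(to, block + kHeaderSize, size_compressed - kHeaderSize);
}

// Like decompress(Buffer &) with codec NONE: no copy, re-point the view at the payload.
void decompressNone(const char * block, size_t size_compressed, Span & to)
{
    checkHeader(size_compressed);
    to = Span{block + kHeaderSize, block + size_compressed};
}
```

The second form only works when the caller accepts data wherever it already lies, which is why the `readBig` paths that must fill `to + bytes_read` switch to `decompressTo`.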
@ -31,7 +31,7 @@ bool CompressedReadBufferFromFile::nextImpl()
|
||||
memory.resize(size_decompressed + additional_size_at_the_end_of_buffer);
|
||||
working_buffer = Buffer(memory.data(), &memory[size_decompressed]);
|
||||
|
||||
decompress(working_buffer.begin(), size_decompressed, size_compressed_without_checksum);
|
||||
decompress(working_buffer, size_decompressed, size_compressed_without_checksum);
|
||||
|
||||
return true;
|
||||
}
|
||||
@ -45,9 +45,15 @@ CompressedReadBufferFromFile::CompressedReadBufferFromFile(std::unique_ptr<ReadB
|
||||
|
||||
|
||||
CompressedReadBufferFromFile::CompressedReadBufferFromFile(
|
||||
const std::string & path, size_t estimated_size, size_t aio_threshold, size_t mmap_threshold, size_t buf_size, bool allow_different_codecs_)
|
||||
const std::string & path,
|
||||
size_t estimated_size,
|
||||
size_t aio_threshold,
|
||||
size_t mmap_threshold,
|
||||
MMappedFileCache * mmap_cache,
|
||||
size_t buf_size,
|
||||
bool allow_different_codecs_)
|
||||
: BufferWithOwnMemory<ReadBuffer>(0)
|
||||
, p_file_in(createReadBufferFromFileBase(path, estimated_size, aio_threshold, mmap_threshold, buf_size))
|
||||
, p_file_in(createReadBufferFromFileBase(path, estimated_size, aio_threshold, mmap_threshold, mmap_cache, buf_size))
|
||||
, file_in(*p_file_in)
|
||||
{
|
||||
compressed_in = &file_in;
|
||||
@ -108,7 +114,7 @@ size_t CompressedReadBufferFromFile::readBig(char * to, size_t n)
|
||||
/// If the decompressed block fits entirely where it needs to be copied.
|
||||
if (size_decompressed + additional_size_at_the_end_of_buffer <= n - bytes_read)
|
||||
{
|
||||
decompress(to + bytes_read, size_decompressed, size_compressed_without_checksum);
|
||||
decompressTo(to + bytes_read, size_decompressed, size_compressed_without_checksum);
|
||||
bytes_read += size_decompressed;
|
||||
bytes += size_decompressed;
|
||||
}
|
||||
@ -122,9 +128,9 @@ size_t CompressedReadBufferFromFile::readBig(char * to, size_t n)
|
||||
|
||||
memory.resize(size_decompressed + additional_size_at_the_end_of_buffer);
|
||||
working_buffer = Buffer(memory.data(), &memory[size_decompressed]);
|
||||
pos = working_buffer.begin();
|
||||
|
||||
decompress(working_buffer.begin(), size_decompressed, size_compressed_without_checksum);
|
||||
decompress(working_buffer, size_decompressed, size_compressed_without_checksum);
|
||||
pos = working_buffer.begin();
|
||||
|
||||
bytes_read += read(to + bytes_read, n - bytes_read);
|
||||
break;
|
||||
|
@ -9,6 +9,8 @@
|
||||
namespace DB
|
||||
{
|
||||
|
||||
class MMappedFileCache;
|
||||
|
||||
|
||||
/// Unlike CompressedReadBuffer, it can do seek.
|
||||
class CompressedReadBufferFromFile : public CompressedReadBufferBase, public BufferWithOwnMemory<ReadBuffer>
|
||||
@ -31,7 +33,7 @@ public:
|
||||
CompressedReadBufferFromFile(std::unique_ptr<ReadBufferFromFileBase> buf, bool allow_different_codecs_ = false);
|
||||
|
||||
CompressedReadBufferFromFile(
|
||||
const std::string & path, size_t estimated_size, size_t aio_threshold, size_t mmap_threshold,
|
||||
const std::string & path, size_t estimated_size, size_t aio_threshold, size_t mmap_threshold, MMappedFileCache * mmap_cache,
|
||||
size_t buf_size = DBMS_DEFAULT_BUFFER_SIZE, bool allow_different_codecs_ = false);
|
||||
|
||||
void seek(size_t offset_in_compressed_file, size_t offset_in_decompressed_block);
|
||||
|
@ -98,7 +98,7 @@ UInt32 ICompressionCodec::decompress(const char * source, UInt32 source_size, ch
|
||||
|
||||
UInt8 header_size = getHeaderSize();
|
||||
if (source_size < header_size)
|
||||
throw Exception(ErrorCodes::CORRUPTED_DATA, "Can't decompress data: the compressed data size ({}), this should include header size) is less than the header size ({})", source_size, size_t(header_size));
|
||||
throw Exception(ErrorCodes::CORRUPTED_DATA, "Can't decompress data: the compressed data size ({}, this should include header size) is less than the header size ({})", source_size, static_cast<size_t>(header_size));
|
||||
|
||||
uint8_t our_method = getMethodByte();
|
||||
uint8_t method = source[0];
|
||||
|
@ -37,7 +37,7 @@ int main(int argc, char ** argv)
|
||||
path,
|
||||
[&]()
|
||||
{
|
||||
return createReadBufferFromFileBase(path, 0, 0, 0);
|
||||
return createReadBufferFromFileBase(path, 0, 0, 0, nullptr);
|
||||
},
|
||||
&cache
|
||||
);
|
||||
@ -56,7 +56,7 @@ int main(int argc, char ** argv)
|
||||
path,
|
||||
[&]()
|
||||
{
|
||||
return createReadBufferFromFileBase(path, 0, 0, 0);
|
||||
return createReadBufferFromFileBase(path, 0, 0, 0, nullptr);
|
||||
},
|
||||
&cache
|
||||
);
|
||||
|
@ -31,6 +31,8 @@ struct Settings;
|
||||
M(UInt64, rotate_log_storage_interval, 10000, "How many records will be stored in one log storage file", 0) \
|
||||
M(UInt64, snapshots_to_keep, 3, "How many compressed snapshots to keep on disk", 0) \
|
||||
M(UInt64, stale_log_gap, 10000, "When node became stale and should receive snapshots from leader", 0) \
|
||||
M(UInt64, fresh_log_gap, 200, "When node became fresh", 0) \
|
||||
M(Bool, quorum_reads, false, "Execute read requests as writes through whole RAFT consensus with similar speed", 0) \
|
||||
M(Bool, force_sync, true, "Call fsync on each change in RAFT changelog", 0)
|
||||
|
||||
DECLARE_SETTINGS_TRAITS(CoordinationSettingsTraits, LIST_OF_COORDINATION_SETTINGS)
|
||||
|
@ -30,6 +30,8 @@ NuKeeperServer::NuKeeperServer(
|
||||
, state_manager(nuraft::cs_new<NuKeeperStateManager>(server_id, "test_keeper_server", config, coordination_settings))
|
||||
, responses_queue(responses_queue_)
|
||||
{
|
||||
if (coordination_settings->quorum_reads)
|
||||
LOG_WARNING(&Poco::Logger::get("NuKeeperServer"), "Quorum reads enabled, NuKeeper will work slower.");
|
||||
}
|
||||
|
||||
void NuKeeperServer::startup()
|
||||
@ -59,6 +61,7 @@ void NuKeeperServer::startup()
|
||||
params.reserved_log_items_ = coordination_settings->reserved_log_items;
|
||||
params.snapshot_distance_ = coordination_settings->snapshot_distance;
|
||||
params.stale_log_gap_ = coordination_settings->stale_log_gap;
|
||||
params.fresh_log_gap_ = coordination_settings->fresh_log_gap;
|
||||
params.client_req_timeout_ = coordination_settings->operation_timeout_ms.totalMilliseconds();
|
||||
params.auto_forwarding_ = coordination_settings->auto_forwarding;
|
||||
params.auto_forwarding_req_timeout_ = coordination_settings->operation_timeout_ms.totalMilliseconds() * 2;
|
||||
@ -106,7 +109,7 @@ nuraft::ptr<nuraft::buffer> getZooKeeperLogEntry(int64_t session_id, const Coord
|
||||
void NuKeeperServer::putRequest(const NuKeeperStorage::RequestForSession & request_for_session)
|
||||
{
|
||||
auto [session_id, request] = request_for_session;
|
||||
if (isLeaderAlive() && request->isReadRequest())
|
||||
if (!coordination_settings->quorum_reads && isLeaderAlive() && request->isReadRequest())
|
||||
{
|
||||
state_machine->processReadRequest(request_for_session);
|
||||
}
|
||||
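With the new `quorum_reads` setting, `putRequest` serves a read locally through the state machine only when quorum reads are off, a leader is alive, and the request is a read. The condition reduces to the predicate below, a C++ sketch with hypothetical names (`Route`, `routeRequest`). What happens on the other path is outside the shown hunk; per the setting's description, with quorum reads enabled reads are executed as writes through RAFT.

```cpp
#include <cassert>

enum class Route
{
    LocalRead,  // state_machine->processReadRequest(...)
    Other,      // the code after the `if`, not shown in this hunk
};

// Mirrors `!quorum_reads && isLeaderAlive() && request->isReadRequest()`.
Route routeRequest(bool quorum_reads, bool leader_alive, bool is_read)
{
    if (!quorum_reads && leader_alive && is_read)
        return Route::LocalRead;
    return Route::Other;
}
```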
@ -185,6 +188,9 @@ nuraft::cb_func::ReturnCode NuKeeperServer::callbackFunc(nuraft::cb_func::Type t
|
||||
if (next_index < last_commited || next_index - last_commited <= 1)
|
||||
commited_store = true;
|
||||
|
||||
if (initialized_flag)
|
||||
return nuraft::cb_func::ReturnCode::Ok;
|
||||
|
||||
auto set_initialized = [this] ()
|
||||
{
|
||||
std::unique_lock lock(initialized_mutex);
|
||||
@ -196,10 +202,27 @@ nuraft::cb_func::ReturnCode NuKeeperServer::callbackFunc(nuraft::cb_func::Type t
|
||||
{
|
||||
case nuraft::cb_func::BecomeLeader:
|
||||
{
|
||||
if (commited_store) /// We become leader and store is empty, ready to serve requests
|
||||
/// We become leader and store is empty or we already committed it
|
||||
if (commited_store || initial_batch_committed)
|
||||
set_initialized();
|
||||
return nuraft::cb_func::ReturnCode::Ok;
|
||||
}
|
||||
case nuraft::cb_func::BecomeFollower:
|
||||
case nuraft::cb_func::GotAppendEntryReqFromLeader:
|
||||
{
|
||||
if (isLeaderAlive())
|
||||
{
|
||||
auto leader_index = raft_instance->get_leader_committed_log_idx();
|
||||
auto our_index = raft_instance->get_committed_log_idx();
|
||||
/// This may happen when we start RAFT cluster from scratch.
|
||||
/// Node first became leader, and after that some other node became leader.
|
||||
/// BecomeFresh for this node will not be called because it was already fresh
|
||||
/// when it was leader.
|
||||
if (leader_index < our_index + coordination_settings->fresh_log_gap)
|
||||
set_initialized();
|
||||
}
|
||||
return nuraft::cb_func::ReturnCode::Ok;
|
||||
}
|
||||
case nuraft::cb_func::BecomeFresh:
|
||||
{
|
||||
set_initialized(); /// We are fresh follower, ready to serve requests.
|
||||
@ -209,6 +232,7 @@ nuraft::cb_func::ReturnCode NuKeeperServer::callbackFunc(nuraft::cb_func::Type t
|
||||
{
|
||||
if (isLeader()) /// We have committed our log store and we are leader, ready to serve requests.
|
||||
set_initialized();
|
||||
initial_batch_committed = true;
|
||||
return nuraft::cb_func::ReturnCode::Ok;
|
||||
}
|
||||
default: /// ignore other events
|
||||
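Based only on the cases visible in this diff, the rules in `callbackFunc` for when a node counts as initialized can be written as a pure function. The C++ sketch below uses hypothetical names (`RaftEvent`, `NodeView`, `shouldMarkInitialized`); `LogStoreCommitted` stands for the case whose label falls outside the shown hunk. It ignores side effects: in the diff that case also sets `initial_batch_committed = true` regardless of leadership.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical labels for the callback types handled in callbackFunc.
enum class RaftEvent
{
    BecomeLeader,
    BecomeFollower,
    GotAppendEntryReqFromLeader,
    BecomeFresh,
    LogStoreCommitted,  // hypothetical name: the real case label is not shown in the hunk
    Other,
};

struct NodeView
{
    bool committed_store = false;  // `commited_store` in the diff
    bool initial_batch_committed = false;
    bool leader_alive = false;
    bool is_leader = false;
    uint64_t leader_committed_idx = 0;
    uint64_t our_committed_idx = 0;
    uint64_t fresh_log_gap = 200;  // default of `fresh_log_gap` in CoordinationSettings
};

// Returns true when the node should call set_initialized() for this event.
bool shouldMarkInitialized(RaftEvent event, const NodeView & node)
{
    switch (event)
    {
        case RaftEvent::BecomeLeader:
            return node.committed_store || node.initial_batch_committed;
        case RaftEvent::BecomeFollower:
        case RaftEvent::GotAppendEntryReqFromLeader:
            // A follower within fresh_log_gap of the leader's committed index counts as fresh.
            return node.leader_alive
                && node.leader_committed_idx < node.our_committed_idx + node.fresh_log_gap;
        case RaftEvent::BecomeFresh:
            return true;
        case RaftEvent::LogStoreCommitted:
            return node.is_leader;
        default:
            return false;
    }
}
```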
@ -220,7 +244,7 @@ void NuKeeperServer::waitInit()
|
||||
{
|
||||
std::unique_lock lock(initialized_mutex);
|
||||
int64_t timeout = coordination_settings->startup_timeout.totalMilliseconds();
|
||||
if (!initialized_cv.wait_for(lock, std::chrono::milliseconds(timeout), [&] { return initialized_flag; }))
|
||||
if (!initialized_cv.wait_for(lock, std::chrono::milliseconds(timeout), [&] { return initialized_flag.load(); }))
|
||||
throw Exception(ErrorCodes::RAFT_ERROR, "Failed to wait RAFT initialization");
|
||||
}
|
||||
|
||||
|
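`waitInit` blocks on a condition variable until `initialized_flag` is set or `startup_timeout` expires. The diff turns the flag into `std::atomic<bool>` and reads it with `load()` in the predicate, presumably because the early `if (initialized_flag)` check in `callbackFunc` reads it without holding the mutex. The same pattern as a self-contained C++ sketch; `InitGate` is a hypothetical name and the exception type is simplified to `std::runtime_error`.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <stdexcept>

class InitGate
{
public:
    // Called (e.g. from a RAFT callback) once the node is ready to serve requests.
    void setInitialized()
    {
        std::unique_lock lock(mutex);
        initialized = true;
        cv.notify_all();
    }

    // Cheap check that may be done without taking the mutex.
    bool isInitialized() const { return initialized.load(); }

    // Blocks until setInitialized() has been called or the timeout expires.
    void waitInit(std::chrono::milliseconds timeout)
    {
        std::unique_lock lock(mutex);
        if (!cv.wait_for(lock, timeout, [&] { return initialized.load(); }))
            throw std::runtime_error("Failed to wait RAFT initialization");
    }

private:
    std::mutex mutex;
    std::condition_variable cv;
    std::atomic<bool> initialized{false};
};
```

Setting the flag under the same mutex the waiter uses avoids a lost wakeup between the predicate check and the wait.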
@ -31,8 +31,9 @@ private:
|
||||
ResponsesQueue & responses_queue;
|
||||
|
||||
std::mutex initialized_mutex;
|
||||
bool initialized_flag = false;
|
||||
std::atomic<bool> initialized_flag = false;
|
||||
std::condition_variable initialized_cv;
|
||||
std::atomic<bool> initial_batch_committed = false;
|
||||
|
||||
nuraft::cb_func::ReturnCode callbackFunc(nuraft::cb_func::Type type, nuraft::cb_func::Param * param);
|
||||
|
||||
|
@ -241,9 +241,10 @@ NuKeeperStorageSnapshot::~NuKeeperStorageSnapshot()
|
||||
storage->disableSnapshotMode();
|
||||
}
|
||||
|
||||
NuKeeperSnapshotManager::NuKeeperSnapshotManager(const std::string & snapshots_path_, size_t snapshots_to_keep_)
|
||||
NuKeeperSnapshotManager::NuKeeperSnapshotManager(const std::string & snapshots_path_, size_t snapshots_to_keep_, size_t storage_tick_time_)
|
||||
: snapshots_path(snapshots_path_)
|
||||
, snapshots_to_keep(snapshots_to_keep_)
|
||||
, storage_tick_time(storage_tick_time_)
|
||||
{
|
||||
namespace fs = std::filesystem;
|
||||
|
||||
@ -325,22 +326,24 @@ nuraft::ptr<nuraft::buffer> NuKeeperSnapshotManager::serializeSnapshotToBuffer(c
|
||||
return writer.getBuffer();
|
||||
}
|
||||
|
||||
SnapshotMetadataPtr NuKeeperSnapshotManager::deserializeSnapshotFromBuffer(NuKeeperStorage * storage, nuraft::ptr<nuraft::buffer> buffer)
|
||||
SnapshotMetaAndStorage NuKeeperSnapshotManager::deserializeSnapshotFromBuffer(nuraft::ptr<nuraft::buffer> buffer) const
|
||||
{
|
||||
ReadBufferFromNuraftBuffer reader(buffer);
|
||||
CompressedReadBuffer compressed_reader(reader);
|
||||
return NuKeeperStorageSnapshot::deserialize(*storage, compressed_reader);
|
||||
auto storage = std::make_unique<NuKeeperStorage>(storage_tick_time);
|
||||
auto snapshot_metadata = NuKeeperStorageSnapshot::deserialize(*storage, compressed_reader);
|
||||
return std::make_pair(snapshot_metadata, std::move(storage));
|
||||
}
|
||||
|
||||
SnapshotMetadataPtr NuKeeperSnapshotManager::restoreFromLatestSnapshot(NuKeeperStorage * storage)
|
||||
SnapshotMetaAndStorage NuKeeperSnapshotManager::restoreFromLatestSnapshot()
|
||||
{
|
||||
if (existing_snapshots.empty())
|
||||
return nullptr;
|
||||
return {};
|
||||
|
||||
auto buffer = deserializeLatestSnapshotBufferFromDisk();
|
||||
if (!buffer)
|
||||
return nullptr;
|
||||
return deserializeSnapshotFromBuffer(storage, buffer);
|
||||
return {};
|
||||
return deserializeSnapshotFromBuffer(buffer);
|
||||
}
|
||||
|
||||
void NuKeeperSnapshotManager::removeOutdatedSnapshotsIfNeeded()
|
||||
|
@ -40,17 +40,20 @@ public:
|
||||
using NuKeeperStorageSnapshotPtr = std::shared_ptr<NuKeeperStorageSnapshot>;
|
||||
using CreateSnapshotCallback = std::function<void(NuKeeperStorageSnapshotPtr &&)>;
|
||||
|
||||
|
||||
using SnapshotMetaAndStorage = std::pair<SnapshotMetadataPtr, NuKeeperStoragePtr>;
|
||||
|
||||
class NuKeeperSnapshotManager
|
||||
{
|
||||
public:
|
||||
NuKeeperSnapshotManager(const std::string & snapshots_path_, size_t snapshots_to_keep_);
|
||||
NuKeeperSnapshotManager(const std::string & snapshots_path_, size_t snapshots_to_keep_, size_t storage_tick_time_ = 500);
|
||||
|
||||
SnapshotMetadataPtr restoreFromLatestSnapshot(NuKeeperStorage * storage);
|
||||
SnapshotMetaAndStorage restoreFromLatestSnapshot();
|
||||
|
||||
static nuraft::ptr<nuraft::buffer> serializeSnapshotToBuffer(const NuKeeperStorageSnapshot & snapshot);
|
||||
std::string serializeSnapshotBufferToDisk(nuraft::buffer & buffer, size_t up_to_log_idx);
|
||||
|
||||
static SnapshotMetadataPtr deserializeSnapshotFromBuffer(NuKeeperStorage * storage, nuraft::ptr<nuraft::buffer> buffer);
|
||||
SnapshotMetaAndStorage deserializeSnapshotFromBuffer(nuraft::ptr<nuraft::buffer> buffer) const;
|
||||
|
||||
nuraft::ptr<nuraft::buffer> deserializeSnapshotBufferFromDisk(size_t up_to_log_idx) const;
|
||||
nuraft::ptr<nuraft::buffer> deserializeLatestSnapshotBufferFromDisk();
|
||||
@ -74,6 +77,7 @@ private:
|
||||
const std::string snapshots_path;
|
||||
const size_t snapshots_to_keep;
|
||||
std::map<size_t, std::string> existing_snapshots;
|
||||
size_t storage_tick_time;
|
||||
};
|
||||
|
||||
struct CreateSnapshotTask
|
||||
|
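After this change `deserializeSnapshotFromBuffer` no longer fills a caller-supplied storage: it builds a fresh storage and returns it together with the metadata as `SnapshotMetaAndStorage`, and `{}` means there was no snapshot. `NuKeeperStateMachine::init` unpacks the pair with `std::tie` and falls back to a new empty storage. A stripped-down C++ sketch of that ownership pattern; `Meta`, `Storage`, `loadLatest` and `initFrom` are hypothetical stand-ins, not the NuKeeper types.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <tuple>
#include <utility>

struct Meta { uint64_t last_log_idx = 0; };
struct Storage { int64_t tick_time_ms = 0; };  // stands in for NuKeeperStorage(tick_time)

using MetaPtr = std::shared_ptr<Meta>;
using StoragePtr = std::unique_ptr<Storage>;
using MetaAndStorage = std::pair<MetaPtr, StoragePtr>;

// Returns {} when there is no snapshot, otherwise metadata plus a freshly built storage.
MetaAndStorage loadLatest(std::optional<uint64_t> snapshot_idx, int64_t tick_time_ms)
{
    if (!snapshot_idx)
        return {};
    auto storage = std::make_unique<Storage>(Storage{tick_time_ms});
    auto meta = std::make_shared<Meta>(Meta{*snapshot_idx});
    return std::make_pair(meta, std::move(storage));
}

// Mirrors the init() flow: unpack with std::tie, then fall back to an empty storage.
uint64_t initFrom(std::optional<uint64_t> snapshot_idx, int64_t tick_time_ms, StoragePtr & storage)
{
    MetaPtr meta;
    std::tie(meta, storage) = loadLatest(snapshot_idx, tick_time_ms);
    if (!storage)
        storage = std::make_unique<Storage>(Storage{tick_time_ms});
    return meta ? meta->last_log_idx : 0;  // last committed index, 0 without a snapshot
}
```

Returning ownership this way means the caller always ends up with exactly one storage object, either restored from the snapshot or newly created.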
@@ -4,6 +4,7 @@
#include <IO/ReadHelpers.h>
#include <Common/ZooKeeper/ZooKeeperIO.h>
#include <Coordination/NuKeeperSnapshotManager.h>
#include <future>

namespace DB
{
@@ -37,8 +38,7 @@ NuKeeperStorage::RequestForSession parseRequest(nuraft::buffer & data)

NuKeeperStateMachine::NuKeeperStateMachine(ResponsesQueue & responses_queue_, SnapshotsQueue & snapshots_queue_, const std::string & snapshots_path_, const CoordinationSettingsPtr & coordination_settings_)
: coordination_settings(coordination_settings_)
, storage(coordination_settings->dead_session_check_period_ms.totalMilliseconds())
, snapshot_manager(snapshots_path_, coordination_settings->snapshots_to_keep)
, snapshot_manager(snapshots_path_, coordination_settings->snapshots_to_keep, coordination_settings->dead_session_check_period_ms.totalMicroseconds())
, responses_queue(responses_queue_)
, snapshots_queue(snapshots_queue_)
, last_committed_idx(0)
@@ -60,7 +60,7 @@ void NuKeeperStateMachine::init()
try
{
latest_snapshot_buf = snapshot_manager.deserializeSnapshotBufferFromDisk(latest_log_index);
latest_snapshot_meta = snapshot_manager.deserializeSnapshotFromBuffer(&storage, latest_snapshot_buf);
std::tie(latest_snapshot_meta, storage) = snapshot_manager.deserializeSnapshotFromBuffer(latest_snapshot_buf);
last_committed_idx = latest_snapshot_meta->get_last_log_idx();
loaded = true;
break;
@@ -83,6 +83,9 @@ void NuKeeperStateMachine::init()
{
LOG_DEBUG(log, "No existing snapshots, last committed log index {}", last_committed_idx);
}

if (!storage)
storage = std::make_unique<NuKeeperStorage>(coordination_settings->dead_session_check_period_ms.totalMilliseconds());
}

nuraft::ptr<nuraft::buffer> NuKeeperStateMachine::commit(const size_t log_idx, nuraft::buffer & data)
@@ -96,7 +99,7 @@ nuraft::ptr<nuraft::buffer> NuKeeperStateMachine::commit(const size_t log_idx, n
nuraft::buffer_serializer bs(response);
{
std::lock_guard lock(storage_lock);
session_id = storage.getSessionID(session_timeout_ms);
session_id = storage->getSessionID(session_timeout_ms);
bs.put_i64(session_id);
}
LOG_DEBUG(log, "Session ID response {} with timeout {}", session_id, session_timeout_ms);
@@ -109,7 +112,7 @@ nuraft::ptr<nuraft::buffer> NuKeeperStateMachine::commit(const size_t log_idx, n
NuKeeperStorage::ResponsesForSessions responses_for_sessions;
{
std::lock_guard lock(storage_lock);
responses_for_sessions = storage.processRequest(request_for_session.request, request_for_session.session_id, log_idx);
responses_for_sessions = storage->processRequest(request_for_session.request, request_for_session.session_id, log_idx);
for (auto & response_for_session : responses_for_sessions)
responses_queue.push(response_for_session);
}
@@ -133,7 +136,7 @@ bool NuKeeperStateMachine::apply_snapshot(nuraft::snapshot & s)

{
std::lock_guard lock(storage_lock);
snapshot_manager.deserializeSnapshotFromBuffer(&storage, latest_snapshot_ptr);
std::tie(latest_snapshot_meta, storage) = snapshot_manager.deserializeSnapshotFromBuffer(latest_snapshot_ptr);
}
last_committed_idx = s.get_last_log_idx();
return true;
@@ -157,7 +160,7 @@ void NuKeeperStateMachine::create_snapshot(
CreateSnapshotTask snapshot_task;
{
std::lock_guard lock(storage_lock);
snapshot_task.snapshot = std::make_shared<NuKeeperStorageSnapshot>(&storage, snapshot_meta_copy);
snapshot_task.snapshot = std::make_shared<NuKeeperStorageSnapshot>(storage.get(), snapshot_meta_copy);
}

snapshot_task.create_snapshot = [this, when_done] (NuKeeperStorageSnapshotPtr && snapshot)
@@ -179,7 +182,7 @@ void NuKeeperStateMachine::create_snapshot(
{
/// Must do it with lock (clearing elements from list)
std::lock_guard lock(storage_lock);
storage.clearGarbageAfterSnapshot();
storage->clearGarbageAfterSnapshot();
/// Destroy snapshot with lock
snapshot.reset();
LOG_TRACE(log, "Cleared garbage after snapshot");
@@ -214,7 +217,7 @@ void NuKeeperStateMachine::save_logical_snp_obj(
if (obj_id == 0)
{
std::lock_guard lock(storage_lock);
NuKeeperStorageSnapshot snapshot(&storage, s.get_last_log_idx());
NuKeeperStorageSnapshot snapshot(storage.get(), s.get_last_log_idx());
cloned_buffer = snapshot_manager.serializeSnapshotToBuffer(snapshot);
}
else
@@ -225,7 +228,28 @@ void NuKeeperStateMachine::save_logical_snp_obj(
nuraft::ptr<nuraft::buffer> snp_buf = s.serialize();
cloned_meta = nuraft::snapshot::deserialize(*snp_buf);

auto result_path = snapshot_manager.serializeSnapshotBufferToDisk(*cloned_buffer, s.get_last_log_idx());
/// Sometimes NuRaft can call save and create snapshots from different threads
/// at once. To avoid race conditions we serialize snapshots through snapshots_queue
/// TODO: make something better
CreateSnapshotTask snapshot_task;
std::shared_ptr<std::promise<void>> waiter = std::make_shared<std::promise<void>>();
auto future = waiter->get_future();
snapshot_task.snapshot = nullptr;
snapshot_task.create_snapshot = [this, waiter, cloned_buffer, log_idx = s.get_last_log_idx()] (NuKeeperStorageSnapshotPtr &&)
{
try
{
auto result_path = snapshot_manager.serializeSnapshotBufferToDisk(*cloned_buffer, log_idx);
LOG_DEBUG(log, "Saved snapshot {} to path {}", log_idx, result_path);
}
catch (...)
{
tryLogCurrentException(log);
}
waiter->set_value();
};
snapshots_queue.push(std::move(snapshot_task));
future.wait();

{
std::lock_guard lock(snapshots_lock);
@@ -233,7 +257,6 @@ void NuKeeperStateMachine::save_logical_snp_obj(
latest_snapshot_meta = cloned_meta;
}

LOG_DEBUG(log, "Created snapshot {} with path {}", s.get_last_log_idx(), result_path);

obj_id++;
}
@@ -271,7 +294,7 @@ void NuKeeperStateMachine::processReadRequest(const NuKeeperStorage::RequestForS
NuKeeperStorage::ResponsesForSessions responses;
{
std::lock_guard lock(storage_lock);
responses = storage.processRequest(request_for_session.request, request_for_session.session_id, std::nullopt);
responses = storage->processRequest(request_for_session.request, request_for_session.session_id, std::nullopt);
}
for (const auto & response : responses)
responses_queue.push(response);
@@ -280,13 +303,13 @@ void NuKeeperStateMachine::processReadRequest(const NuKeeperStorage::RequestForS
std::unordered_set<int64_t> NuKeeperStateMachine::getDeadSessions()
{
std::lock_guard lock(storage_lock);
return storage.getDeadSessions();
return storage->getDeadSessions();
}

void NuKeeperStateMachine::shutdownStorage()
{
std::lock_guard lock(storage_lock);
storage.finalize();
storage->finalize();
}

}

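The comment added to `save_logical_snp_obj` explains the change: disk writes are now pushed through `snapshots_queue`, and the calling thread blocks on a future until the write has run. The following is a self-contained sketch of that hand-off, not ClickHouse's `SnapshotsQueue`. It assumes a simple mutex and condition-variable queue with a single consumer thread. `TaskQueue` and `saveThroughQueue` are hypothetical names.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical blocking queue standing in for SnapshotsQueue. A single consumer
// thread pops tasks one at a time, so queued tasks never run concurrently.
class TaskQueue
{
public:
    void push(std::function<void()> task)
    {
        {
            std::lock_guard<std::mutex> lock(queue_mutex);
            tasks.push(std::move(task));
        }
        cv.notify_one();
    }

    // Blocks until a task is available; returns false once closed and drained.
    bool pop(std::function<void()> & task)
    {
        std::unique_lock<std::mutex> lock(queue_mutex);
        cv.wait(lock, [this] { return closed || !tasks.empty(); });
        if (tasks.empty())
            return false;
        task = std::move(tasks.front());
        tasks.pop();
        return true;
    }

    void close()
    {
        {
            std::lock_guard<std::mutex> lock(queue_mutex);
            closed = true;
        }
        cv.notify_all();
    }

private:
    std::mutex queue_mutex;
    std::condition_variable cv;
    std::queue<std::function<void()>> tasks;
    bool closed = false;
};

// Hands `write_to_disk` to the consumer thread and blocks until it has run.
void saveThroughQueue(TaskQueue & queue, std::function<void()> write_to_disk)
{
    auto waiter = std::make_shared<std::promise<void>>();
    auto future = waiter->get_future();
    queue.push([waiter, write_to_disk]
    {
        try
        {
            write_to_disk();
        }
        catch (...)
        {
            // The diff logs here with tryLogCurrentException(log); this sketch just swallows.
        }
        waiter->set_value();  // always release the caller, even after an exception
    });
    future.wait();
}
```

The promise is held through a `std::shared_ptr` because `std::function` requires a copyable callable, while `std::promise` is move-only. Calling `set_value()` after the `try`/`catch` guarantees that the waiting caller is released even if the write throws.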
@@ -52,7 +52,7 @@ public:

NuKeeperStorage & getStorage()
{
return storage;
return *storage;
}

void processReadRequest(const NuKeeperStorage::RequestForSession & request_for_session);
@@ -68,7 +68,7 @@ private:

CoordinationSettingsPtr coordination_settings;

NuKeeperStorage storage;
NuKeeperStoragePtr storage;

NuKeeperSnapshotManager snapshot_manager;


@@ -233,7 +233,7 @@ struct NuKeeperStorageGetRequest final : public NuKeeperStorageRequest
struct NuKeeperStorageRemoveRequest final : public NuKeeperStorageRequest
{
using NuKeeperStorageRequest::NuKeeperStorageRequest;
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(NuKeeperStorage::Container & container, NuKeeperStorage::Ephemerals & ephemerals, int64_t /*zxid*/, int64_t session_id) const override
std::pair<Coordination::ZooKeeperResponsePtr, Undo> process(NuKeeperStorage::Container & container, NuKeeperStorage::Ephemerals & ephemerals, int64_t /*zxid*/, int64_t /*session_id*/) const override
{
Coordination::ZooKeeperResponsePtr response_ptr = zk_request->makeResponse();
Coordination::ZooKeeperRemoveResponse & response = dynamic_cast<Coordination::ZooKeeperRemoveResponse &>(*response_ptr);
@@ -257,7 +257,12 @@ struct NuKeeperStorageRemoveRequest final : public NuKeeperStorageRequest
{
auto prev_node = it->value;
if (prev_node.stat.ephemeralOwner != 0)
ephemerals[session_id].erase(request.path);
{
auto ephemerals_it = ephemerals.find(prev_node.stat.ephemeralOwner);
ephemerals_it->second.erase(request.path);
if (ephemerals_it->second.empty())
ephemerals.erase(ephemerals_it);
}

auto child_basename = getBaseName(it->key);
container.updateValue(parentPath(request.path), [&child_basename] (NuKeeperStorage::Node & parent)
@@ -271,10 +276,10 @@ struct NuKeeperStorageRemoveRequest final : public NuKeeperStorageRequest

container.erase(request.path);

undo = [prev_node, &container, &ephemerals, session_id, path = request.path, child_basename]
undo = [prev_node, &container, &ephemerals, path = request.path, child_basename]
{
if (prev_node.stat.ephemeralOwner != 0)
ephemerals[session_id].emplace(path);
ephemerals[prev_node.stat.ephemeralOwner].emplace(path);

container.insert(path, prev_node);
container.updateValue(parentPath(path), [&child_basename] (NuKeeperStorage::Node & parent)
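These `NuKeeperStorageRemoveRequest` hunks stop using the requesting `session_id` for ephemeral bookkeeping. The path is instead removed from, and on undo restored to, the set belonging to the node's `ephemeralOwner`, and an emptied set is dropped. The following is a reduced sketch of that rule. `Node`, `Container`, `Ephemerals` and `removeNode` are hypothetical simplifications built on plain standard containers, not ClickHouse's own `Container` type.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical simplified types: session id -> ephemeral paths owned by that session.
using Ephemerals = std::unordered_map<int64_t, std::unordered_set<std::string>>;

struct Node
{
    int64_t ephemeral_owner = 0;  // 0 means the node is not ephemeral
};

using Container = std::unordered_map<std::string, Node>;

// Removes `path` and returns a callback that undoes the removal. Ephemeral bookkeeping
// is keyed by the node's recorded owner; the requesting session is deliberately unused.
std::function<void()> removeNode(Container & container, Ephemerals & ephemerals,
                                 const std::string & path, int64_t /*session_id*/)
{
    const Node prev_node = container.at(path);
    if (prev_node.ephemeral_owner != 0)
    {
        auto owner_it = ephemerals.find(prev_node.ephemeral_owner);
        // Defensive check added in this sketch; the diff dereferences the iterator directly.
        if (owner_it != ephemerals.end())
        {
            owner_it->second.erase(path);
            if (owner_it->second.empty())
                ephemerals.erase(owner_it);  // drop the now-empty per-session set
        }
    }
    container.erase(path);

    return [prev_node, &container, &ephemerals, path]
    {
        if (prev_node.ephemeral_owner != 0)
            ephemerals[prev_node.ephemeral_owner].emplace(path);
        container.emplace(path, prev_node);
    };
}
```

Keying on the owner matters when a client removes an ephemeral node that a different session created. The old code erased the path from the wrong session's set, which left a stale entry under the real owner.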
@@ -377,7 +382,6 @@ struct NuKeeperStorageSetRequest final : public NuKeeperStorageRequest
{
return processWatchesImpl(zk_request->getPath(), watches, list_watches, Coordination::Event::CHANGED);
}

};

struct NuKeeperStorageListRequest final : public NuKeeperStorageRequest
@@ -641,6 +645,13 @@ NuKeeperStorage::ResponsesForSessions NuKeeperStorage::processRequest(const Coor
for (const auto & ephemeral_path : it->second)
{
container.erase(ephemeral_path);
container.updateValue(parentPath(ephemeral_path), [&ephemeral_path] (NuKeeperStorage::Node & parent)
{
--parent.stat.numChildren;
++parent.stat.cversion;
parent.children.erase(getBaseName(ephemeral_path));
});

auto responses = processWatchesImpl(ephemeral_path, watches, list_watches, Coordination::Event::DELETED);
results.insert(results.end(), responses.begin(), responses.end());
}

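When a session closes, the storage now also updates the parent of each deleted ephemeral node (`numChildren`, `cversion`, `children`). The following is a minimal sketch of that clean-up. It assumes a flat map keyed by absolute path. `TreeNode`, `parentOf`, `baseNameOf` and `closeSession` are hypothetical and only approximate `parentPath` and `getBaseName`.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical simplified node; children are stored by base name.
struct TreeNode
{
    int32_t num_children = 0;
    int32_t cversion = 0;
    std::unordered_set<std::string> children;
};

using Tree = std::unordered_map<std::string, TreeNode>;  // absolute path -> node
using SessionEphemerals = std::unordered_map<int64_t, std::unordered_set<std::string>>;

// Illustrative helpers for absolute paths such as "/a/b"; the parent of "/a" is "/".
std::string parentOf(const std::string & path)
{
    auto pos = path.rfind('/');
    return pos == 0 ? std::string("/") : path.substr(0, pos);
}

std::string baseNameOf(const std::string & path)
{
    return path.substr(path.rfind('/') + 1);
}

// Deletes every ephemeral node of `session_id` and keeps each parent's
// num_children, cversion and children set consistent with the deletion.
void closeSession(Tree & tree, SessionEphemerals & ephemerals, int64_t session_id)
{
    auto session_it = ephemerals.find(session_id);
    if (session_it == ephemerals.end())
        return;

    for (const auto & path : session_it->second)
    {
        tree.erase(path);
        auto parent_it = tree.find(parentOf(path));
        if (parent_it != tree.end())
        {
            --parent_it->second.num_children;
            ++parent_it->second.cversion;
            parent_it->second.children.erase(baseNameOf(path));
        }
    }
    ephemerals.erase(session_it);
}
```

Without the parent update, the parent's `children` set and `numChildren` would keep counting a node that is no longer in the container.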
@@ -131,4 +131,6 @@ public:
}
};

using NuKeeperStoragePtr = std::unique_ptr<NuKeeperStorage>;

}

Some files were not shown because too many files have changed in this diff.