mirror of
https://github.com/ClickHouse/ClickHouse.git
synced 2024-11-25 09:02:00 +00:00
WIP update-aggregate-funcions-in-zh
parent
977ebe8b44
commit
8354cdd0e1
11
docs/zh/faq/terms_translation_zh.md
Normal file
@ -0,0 +1,11 @@
|
||||
# Terminology translation conventions

This document maintains the glossary of terms used when translating from English into Chinese.

## Keep in English (do not translate)

Parquet

## English <-> Chinese
|
||||
Tuple 元组
|
||||
|
||||
|
||||
|
@ -69,49 +69,6 @@ SELECT count(DISTINCT num) FROM t
|
||||
└────────────────┘
|
||||
```
|
||||
|
||||
This example shows that `count(DISTINCT num)` is performed by the `uniqExact` function according to the value of the `count_distinct_implementation` setting.
|
||||
|
||||
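## any(x) {#agg_function-any}

Selects the first encountered value.
The query can be executed in any order, and even in a different order each time, so the result of this function is indeterminate.
To get a determinate result, you can use the `min` or `max` function instead of `any`.

In some cases, you can rely on the order of execution. This applies when `SELECT` comes from a subquery that uses `ORDER BY`.

When a `SELECT` query has a `GROUP BY` clause or at least one aggregate function, ClickHouse (in contrast to MySQL) requires that all expressions in the `SELECT`, `HAVING`, and `ORDER BY` clauses be calculated from keys or from aggregate functions. In other words, each column selected from the table must be used either in keys or inside aggregate functions. To get behavior like in MySQL, you can put the other columns in the `any` aggregate function.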
|
||||
|
||||
## anyHeavy(x) {#anyheavyx}
|
||||
|
||||
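Selects a frequently occurring value using the [heavy hitters](http://www.cs.umd.edu/~samir/498/karp.pdf) algorithm. If there is a value that occurs in more than half of the cases in each of the query's execution threads, this value is returned. Normally, the result is nondeterministic.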
|
||||
|
||||
``` sql
|
||||
anyHeavy(column)
|
||||
```
|
||||
|
||||
**Parameters**

- `column` – The column name.

**Example**

Take the [OnTime](../../getting-started/example-datasets/ontime.md) dataset and select any frequently occurring value in the `AirlineID` column.
|
||||
|
||||
``` sql
|
||||
SELECT anyHeavy(AirlineID) AS res
|
||||
FROM ontime
|
||||
```
|
||||
|
||||
``` text
|
||||
┌───res─┐
|
||||
│ 19690 │
|
||||
└───────┘
|
||||
```
|
||||
|
||||
## anyLast(x) {#anylastx}
|
||||
|
||||
Selects the last value encountered.
The result is just as indeterminate as for the `any` function.
|
||||
|
||||
## groupBitAnd {#groupbitand}
|
||||
|
||||
@ -283,46 +240,6 @@ num
|
||||
3
|
||||
```
|
||||
|
||||
## min(x) {#agg_function-min}
|
||||
|
||||
Calculates the minimum.

## max(x) {#agg_function-max}

Calculates the maximum.

## argMin(arg,val) {#agg-function-argmin}

Calculates the `arg` value for the minimal `val` value. If there are several different values of `arg` for the minimal value of `val`, the first of these values encountered is output.

**Example:**
|
||||
|
||||
``` text
|
||||
┌─user─────┬─salary─┐
|
||||
│ director │ 5000 │
|
||||
│ manager │ 3000 │
|
||||
│ worker │ 1000 │
|
||||
└──────────┴────────┘
|
||||
```
|
||||
|
||||
``` sql
|
||||
SELECT argMin(user, salary) FROM salary
|
||||
```
|
||||
|
||||
``` text
|
||||
┌─argMin(user, salary)─┐
|
||||
│ worker │
|
||||
└──────────────────────┘
|
||||
```
|
||||
|
||||
## argMax(arg,val) {#agg-function-argmax}
|
||||
|
||||
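Calculates the `arg` value for the maximal `val` value. If there are several different values of `arg` for the maximal value of `val`, the first of these values encountered is output.

## sum(x) {#agg_function-sum}

Calculates the sum.
Only works for numbers.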
|
||||
|
||||
## sumWithOverflow(x) {#sumwithoverflowx}
|
||||
|
||||
@ -462,12 +379,6 @@ kurtSamp(expr)
|
||||
SELECT kurtSamp(value) FROM series_with_value_column
|
||||
```
|
||||
|
||||
## avg(x) {#agg_function-avg}
|
||||
|
||||
Calculates the average.
Only works for numbers.
The result is always Float64.
|
||||
|
||||
## avgWeighted {#avgweighted}
|
||||
|
||||
Calculates the [weighted arithmetic mean](https://en.wikipedia.org/wiki/Weighted_arithmetic_mean).
|
||||
|
13
docs/zh/sql-reference/aggregate-functions/reference/any.md
Normal file
@ -0,0 +1,13 @@
|
||||
---
|
||||
toc_priority: 6
|
||||
---
|
||||
|
||||
# any(x) {#agg_function-any}
|
||||
|
||||
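Selects the first encountered value.
The query can be executed in any order, and even in a different order each time, so the result of this function is indeterminate.
To get a determinate result, you can use the `min` or `max` function instead of `any`.

In some cases, you can rely on the order of execution. This applies when `SELECT` comes from a subquery that uses `ORDER BY`.

When a `SELECT` query has a `GROUP BY` clause or at least one aggregate function, ClickHouse (in contrast to MySQL) requires that all expressions in the `SELECT`, `HAVING`, and `ORDER BY` clauses be calculated from keys or from aggregate functions. In other words, each column selected from the table must be used either in keys or inside aggregate functions. To get behavior like in MySQL, you can put the other columns in the `any` aggregate function.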
|
@ -0,0 +1,30 @@
|
||||
---
|
||||
toc_priority: 103
|
||||
---
|
||||
|
||||
# anyHeavy {#anyheavyx}
|
||||
|
||||
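Selects a frequently occurring value using the [heavy hitters](http://www.cs.umd.edu/~samir/498/karp.pdf) algorithm. If there is a value that occurs in more than half of the cases in each of the query's execution threads, this value is returned. Normally, the result is nondeterministic.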
|
||||
|
||||
``` sql
|
||||
anyHeavy(column)
|
||||
```
|
||||
|
||||
**Parameters**

- `column` – The column name.

**Example**

Take the [OnTime](../../getting-started/example-datasets/ontime.md) dataset and select any frequently occurring value in the `AirlineID` column.
|
||||
|
||||
``` sql
|
||||
SELECT anyHeavy(AirlineID) AS res
|
||||
FROM ontime
|
||||
```
|
||||
|
||||
``` text
|
||||
┌───res─┐
|
||||
│ 19690 │
|
||||
└───────┘
|
||||
```
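As a rough illustration of the "more than half" guarantee described above, the following Python sketch implements the classic Boyer-Moore majority vote, one well-known member of the heavy hitters family. It is a model under stated assumptions, not ClickHouse's actual implementation; the function name `majority_candidate` and the sample `airline_ids` list are hypothetical.

```python
def majority_candidate(values):
    """Boyer-Moore majority vote.

    If some value makes up more than half of `values`, it is returned.
    Otherwise the returned candidate is arbitrary, which mirrors the
    "result is nondeterministic" caveat in the text.
    """
    candidate, counter = None, 0
    for v in values:
        if counter == 0:
            candidate, counter = v, 1   # adopt a new candidate
        elif v == candidate:
            counter += 1                # one more vote for the candidate
        else:
            counter -= 1                # a differing value cancels one vote
    return candidate


# Hypothetical sample of AirlineID values; 19690 occupies more than half.
airline_ids = [19690, 19805, 19690, 19690, 20366, 19690, 19690]
print(majority_candidate(airline_ids))  # 19690
```

The sketch keeps only one candidate and one counter, which is why a value must dominate the input to be guaranteed as the answer.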
|
@ -0,0 +1,9 @@
|
||||
---
|
||||
toc_priority: 104
|
||||
---
|
||||
|
||||
# anyLast {#anylastx}

Selects the last value encountered.
The result is just as indeterminate as for the [any](../../../sql-reference/aggregate-functions/reference/any.md) function.
|
||||
|
@ -0,0 +1,32 @@
|
||||
---
|
||||
toc_priority: 106
|
||||
---
|
||||
|
||||
# argMax {#agg-function-argmax}
|
||||
|
||||
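Syntax: `argMax(arg, val)` or `argMax(tuple(arg, val))`

Calculates the `arg` value that corresponds to the maximum `val` value. If several different `arg` values correspond to the maximum `val`, the first of them encountered is returned.

The tuple version of this function returns the tuple that has the maximum `val` value. It is convenient for use with `SimpleAggregateFunction`.

**Example:**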
|
||||
|
||||
``` text
|
||||
┌─user─────┬─salary─┐
|
||||
│ director │ 5000 │
|
||||
│ manager │ 3000 │
|
||||
│ worker │ 1000 │
|
||||
└──────────┴────────┘
|
||||
```
|
||||
|
||||
``` sql
|
||||
SELECT argMax(user, salary), argMax(tuple(user, salary)) FROM salary
|
||||
```
|
||||
|
||||
``` text
|
||||
┌─argMax(user, salary)─┬─argMax(tuple(user, salary))─┐
|
||||
│ director │ ('director',5000) │
|
||||
└──────────────────────┴─────────────────────────────┘
|
||||
```
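The selection rule above can be modelled in a few lines of Python. This is only a sketch of the semantics (first row wins on ties), assuming rows arrive in the order listed; the helper name `arg_max` is hypothetical and the data is the `salary` table from the example.

```python
rows = [("director", 5000), ("manager", 3000), ("worker", 1000)]


def arg_max(rows):
    """Return the (arg, val) pair with the largest val; first one wins on ties."""
    best = None
    for arg, val in rows:
        if best is None or val > best[1]:  # strict '>' keeps the earlier row on ties
            best = (arg, val)
    return best


best = arg_max(rows)
print(best[0])  # director              (like argMax(user, salary))
print(best)     # ('director', 5000)    (like argMax(tuple(user, salary)))
```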
|
@ -0,0 +1,31 @@
|
||||
---
|
||||
toc_priority: 105
|
||||
---
|
||||
|
||||
# argMin {#agg-function-argmin}
|
||||
|
||||
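Syntax: `argMin(arg, val)` or `argMin(tuple(arg, val))`

Calculates the `arg` value that corresponds to the minimum `val` value. If several different `arg` values correspond to the minimum `val`, the first of them encountered is returned.

The tuple version of this function returns the tuple that has the minimum `val` value. It is convenient for use with `SimpleAggregateFunction`.

**Example:**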
|
||||
|
||||
``` text
|
||||
┌─user─────┬─salary─┐
|
||||
│ director │ 5000 │
|
||||
│ manager │ 3000 │
|
||||
│ worker │ 1000 │
|
||||
└──────────┴────────┘
|
||||
```
|
||||
|
||||
``` sql
|
||||
SELECT argMin(user, salary), argMin(tuple(user, salary)) FROM salary
|
||||
```
|
||||
|
||||
``` text
|
||||
┌─argMin(user, salary)─┬─argMin(tuple(user, salary))─┐
|
||||
│ worker │ ('worker',1000) │
|
||||
└──────────────────────┴─────────────────────────────┘
|
||||
```
|
62
docs/zh/sql-reference/aggregate-functions/reference/avg.md
Normal file
@ -0,0 +1,62 @@
|
||||
---
|
||||
toc_priority: 5
|
||||
---
|
||||
|
||||
# avg {#agg_function-avg}
|
||||
|
||||
Calculates the arithmetic mean.

**Syntax**
|
||||
|
||||
``` sql
|
||||
avg(x)
|
||||
```
|
||||
|
||||
**Parameters**

- `x` — Column name.

`x` must be
[Integer](../../../sql-reference/data-types/int-uint.md),
[floating-point](../../../sql-reference/data-types/float.md), or
[Decimal](../../../sql-reference/data-types/decimal.md).

**Returned value**

- `NaN` if the supplied column is empty.
- Arithmetic mean otherwise.

**Return type** is always [Float64](../../../sql-reference/data-types/float.md).

**Example**

Query:
|
||||
|
||||
``` sql
|
||||
SELECT avg(x) FROM values('x Int8', 0, 1, 2, 3, 4, 5)
|
||||
```
|
||||
|
||||
Result:

``` text
┌─avg(x)─┐
│    2.5 │
└────────┘
|
||||
```
|
||||
|
||||
**Example**

Query:

``` sql
CREATE TABLE test (t UInt8) ENGINE = Memory;
SELECT avg(t) FROM test
```

Result:

``` text
┌─avg(t)─┐
│    nan │
└────────┘
|
||||
```
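The two return cases (arithmetic mean, or `NaN` for an empty column) can be sketched in Python as follows. This is a model of the documented behaviour only; the function name `avg` here is a plain Python helper, not a ClickHouse API.

```python
import math


def avg(values):
    """Arithmetic mean as a float; NaN when there are no input values."""
    values = list(values)
    if not values:
        return math.nan
    return float(sum(values)) / len(values)


print(avg([0, 1, 2, 3, 4, 5]))  # 2.5
print(avg([]))                  # nan
```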
|
@ -0,0 +1,99 @@
|
||||
---
|
||||
toc_priority: 107
|
||||
---
|
||||
|
||||
# avgWeighted {#avgweighted}
|
||||
|
||||
Calculates the [weighted arithmetic mean](https://en.wikipedia.org/wiki/Weighted_arithmetic_mean).
|
||||
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
avgWeighted(x, weight)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `x` — Values.
|
||||
- `weight` — Weights of the values.
|
||||
|
||||
`x` and `weight` must both be
|
||||
[Integer](../../../sql-reference/data-types/int-uint.md),
|
||||
[floating-point](../../../sql-reference/data-types/float.md), or
|
||||
[Decimal](../../../sql-reference/data-types/decimal.md),
|
||||
but may have different types.
|
||||
|
||||
**Returned value**
|
||||
|
||||
- `NaN` if all the weights are equal to 0 or the supplied weights parameter is empty.
|
||||
- Weighted mean otherwise.
|
||||
|
||||
**Return type** is always [Float64](../../../sql-reference/data-types/float.md).
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT avgWeighted(x, w)
|
||||
FROM values('x Int8, w Int8', (4, 1), (1, 0), (10, 2))
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌─avgWeighted(x, w)─┐
│                 8 │
└───────────────────┘
|
||||
```
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT avgWeighted(x, w)
|
||||
FROM values('x Int8, w Float64', (4, 1), (1, 0), (10, 2))
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌─avgWeighted(x, w)─┐
│                 8 │
└───────────────────┘
|
||||
```
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT avgWeighted(x, w)
|
||||
FROM values('x Int8, w Int8', (0, 0), (1, 0), (10, 0))
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌─avgWeighted(x, w)─┐
│               nan │
└───────────────────┘
|
||||
```
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
CREATE TABLE test (t UInt8) ENGINE = Memory;
SELECT avgWeighted(t, t) FROM test
```

Result:

``` text
┌─avgWeighted(t, t)─┐
│               nan │
└───────────────────┘
|
||||
```
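The examples above all follow from the formula sum(x * w) / sum(w). A minimal Python sketch of that formula, assuming the input is a list of `(x, weight)` pairs, is shown below; `avg_weighted` is a hypothetical helper name, and the `NaN` branch models the documented behaviour for zero or missing weights.

```python
import math


def avg_weighted(pairs):
    """Weighted mean of (x, weight) pairs; NaN if the weights sum to zero."""
    pairs = list(pairs)
    numerator = sum(x * w for x, w in pairs)
    denominator = sum(w for _, w in pairs)
    if denominator == 0:          # covers all-zero weights and empty input
        return math.nan
    return numerator / denominator


print(avg_weighted([(4, 1), (1, 0), (10, 2)]))  # 8.0
print(avg_weighted([(0, 0), (1, 0), (10, 0)]))  # nan
print(avg_weighted([]))                         # nan
```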
|
@ -0,0 +1,13 @@
|
||||
---
|
||||
toc_priority: 250
|
||||
---
|
||||
|
||||
# categoricalInformationValue {#categoricalinformationvalue}
|
||||
|
||||
Calculates the value of `(P(tag = 1) - P(tag = 0))(log(P(tag = 1)) - log(P(tag = 0)))` for each category.
|
||||
|
||||
``` sql
|
||||
categoricalInformationValue(category1, category2, ..., tag)
|
||||
```
|
||||
|
||||
The result indicates how a discrete (categorical) feature `[category1, category2, ...]` contributes to a learning model that predicts the value of `tag`.
|
12
docs/zh/sql-reference/aggregate-functions/reference/corr.md
Normal file
@ -0,0 +1,12 @@
|
||||
---
|
||||
toc_priority: 107
|
||||
---
|
||||
|
||||
# corr {#corrx-y}
|
||||
|
||||
Syntax: `corr(x, y)`
|
||||
|
||||
Calculates the Pearson correlation coefficient: `Σ((x - x̅)(y - y̅)) / sqrt(Σ((x - x̅)^2) * Σ((y - y̅)^2))`.
|
||||
|
||||
!!! note "Note"
|
||||
This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the `corrStable` function. It works slower but provides a lower computational error.
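For reference, the formula above can be evaluated directly in Python. This sketch is a straightforward two-pass evaluation (means first, then sums of deviations); it is not the single-pass algorithm ClickHouse uses, and the helper name `corr` is just a local Python function.

```python
import math


def corr(xs, ys):
    """Pearson correlation: Σ((x - x̅)(y - y̅)) / sqrt(Σ((x - x̅)^2) * Σ((y - y̅)^2))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


print(corr([1, 2, 3], [2, 4, 6]))  # 1.0  (perfect positive correlation)
print(corr([1, 2, 3], [3, 2, 1]))  # -1.0 (perfect negative correlation)
```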
|
69
docs/zh/sql-reference/aggregate-functions/reference/count.md
Normal file
@ -0,0 +1,69 @@
|
||||
---
|
||||
toc_priority: 1
|
||||
---
|
||||
|
||||
# count {#agg_function-count}
|
||||
|
||||
Counts the number of rows or non-`NULL` values.
|
||||
|
||||
ClickHouse supports the following syntaxes for `count`:
|
||||
- `count(expr)` or `COUNT(DISTINCT expr)`.
|
||||
- `count()` or `COUNT(*)`. The `count()` syntax is ClickHouse-specific.
|
||||
|
||||
**Parameters**
|
||||
|
||||
The function can take:
|
||||
|
||||
- Zero parameters.
|
||||
- One [expression](../../../sql-reference/syntax.md#syntax-expressions).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- If the function is called without parameters it counts the number of rows.
|
||||
- If an [expression](../../../sql-reference/syntax.md#syntax-expressions) is passed, the function counts how many times this expression returned a non-`NULL` value. If the expression returns a [Nullable](../../../sql-reference/data-types/nullable.md)-type value, the result of `count` is still not `Nullable`. The function returns 0 if the expression returned `NULL` for all rows.
|
||||
|
||||
In both cases the type of the returned value is [UInt64](../../../sql-reference/data-types/int-uint.md).
|
||||
|
||||
**Details**
|
||||
|
||||
ClickHouse supports the `COUNT(DISTINCT ...)` syntax. The behavior of this construction depends on the [count_distinct_implementation](../../../operations/settings/settings.md#settings-count_distinct_implementation) setting. It defines which of the [uniq\*](../../../sql-reference/aggregate-functions/reference/uniq.md#agg_function-uniq) functions is used to perform the operation. The default is the [uniqExact](../../../sql-reference/aggregate-functions/reference/uniqexact.md#agg_function-uniqexact) function.
|
||||
|
||||
The `SELECT count() FROM table` query is not optimized, because the number of entries in the table is not stored separately. It chooses a small column from the table and counts the number of values in it.
|
||||
|
||||
**Examples**
|
||||
|
||||
Example 1:
|
||||
|
||||
``` sql
|
||||
SELECT count() FROM t
|
||||
```
|
||||
|
||||
``` text
|
||||
┌─count()─┐
|
||||
│ 5 │
|
||||
└─────────┘
|
||||
```
|
||||
|
||||
Example 2:
|
||||
|
||||
``` sql
|
||||
SELECT name, value FROM system.settings WHERE name = 'count_distinct_implementation'
|
||||
```
|
||||
|
||||
``` text
|
||||
┌─name──────────────────────────┬─value─────┐
|
||||
│ count_distinct_implementation │ uniqExact │
|
||||
└───────────────────────────────┴───────────┘
|
||||
```
|
||||
|
||||
``` sql
|
||||
SELECT count(DISTINCT num) FROM t
|
||||
```
|
||||
|
||||
``` text
|
||||
┌─uniqExact(num)─┐
|
||||
│ 3 │
|
||||
└────────────────┘
|
||||
```
|
||||
|
||||
This example shows that `count(DISTINCT num)` is performed by the `uniqExact` function according to the `count_distinct_implementation` setting value.
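The difference between the three call forms can be modelled in Python. The column contents below are hypothetical (chosen so that there are 5 rows and 3 distinct values, as in the outputs above), and `None` stands in for `NULL`.

```python
# Hypothetical contents of column `num` in table `t`; None models NULL.
num = [1, 2, 2, None, 3]

count_rows = len(num)                                     # count() / COUNT(*)
count_expr = sum(v is not None for v in num)              # count(num): non-NULL values only
count_distinct = len({v for v in num if v is not None})   # count(DISTINCT num), exact

print(count_rows, count_expr, count_distinct)  # 5 4 3
```

The set-based distinct count corresponds to an exact method such as `uniqExact`; approximate `uniq*` variants would not be modelled by a plain set.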
|
@ -0,0 +1,12 @@
|
||||
---
|
||||
toc_priority: 36
|
||||
---
|
||||
|
||||
# covarPop {#covarpop}
|
||||
|
||||
Syntax: `covarPop(x, y)`
|
||||
|
||||
Calculates the value of `Σ((x - x̅)(y - y̅)) / n`.
|
||||
|
||||
!!! note "Note"
|
||||
This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the `covarPopStable` function. It works slower but provides a lower computational error.
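A direct Python evaluation of the formula above, for comparison. This is a plain two-pass sketch, not ClickHouse's algorithm; `covar_pop` is a hypothetical helper name.

```python
def covar_pop(xs, ys):
    """Population covariance: Σ((x - x̅)(y - y̅)) / n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Deviations are (-1, 0, 1) and (-2, 0, 2), so the sum of products is 4 and n is 3.
print(covar_pop([1, 2, 3], [2, 4, 6]))
```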
|
@ -0,0 +1,12 @@
|
||||
---
|
||||
toc_priority: 37
|
||||
---
|
||||
|
||||
# covarSamp {#covarsamp}
|
||||
|
||||
Calculates the value of `Σ((x - x̅)(y - y̅)) / (n - 1)`.
|
||||
|
||||
Returns Float64. When `n <= 1`, returns +∞.
|
||||
|
||||
!!! note "Note"
|
||||
This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the `covarSampStable` function. It works slower but provides a lower computational error.
|
@ -0,0 +1,14 @@
|
||||
---
|
||||
toc_priority: 110
|
||||
---
|
||||
|
||||
# groupArray {#agg_function-grouparray}
|
||||
|
||||
Syntax: `groupArray(x)` or `groupArray(max_size)(x)`
|
||||
|
||||
Creates an array of argument values.
|
||||
Values can be added to the array in any (indeterminate) order.
|
||||
|
||||
The second version (with the `max_size` parameter) limits the size of the resulting array to `max_size` elements. For example, `groupArray(1)(x)` is equivalent to `[any(x)]`.
|
||||
|
||||
In some cases, you can still rely on the order of execution. This applies to cases when `SELECT` comes from a subquery that uses `ORDER BY`.
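A small Python model of the two forms, assuming values are appended in whatever order they arrive and collection stops once `max_size` is reached. The helper name `group_array` is hypothetical, and the arrival order in ClickHouse is indeterminate unless an ordered subquery is used.

```python
def group_array(values, max_size=None):
    """Collect values in arrival order, keeping at most max_size of them."""
    result = []
    for v in values:
        if max_size is not None and len(result) >= max_size:
            break                 # array is full; later values are ignored
        result.append(v)
    return result


print(group_array([3, 1, 2]))     # [3, 1, 2]
print(group_array([3, 1, 2], 1))  # [3]  (like [any(x)]: the first value seen)
```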
|
@ -0,0 +1,91 @@
|
||||
---
|
||||
toc_priority: 112
|
||||
---
|
||||
|
||||
# groupArrayInsertAt {#grouparrayinsertat}
|
||||
|
||||
Inserts a value into the array at the specified position.
|
||||
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
groupArrayInsertAt(default_x, size)(x, pos);
|
||||
```
|
||||
|
||||
If in one query several values are inserted into the same position, the function behaves in the following ways:
|
||||
|
||||
- If a query is executed in a single thread, the first one of the inserted values is used.
|
||||
- If a query is executed in multiple threads, the resulting value is an undetermined one of the inserted values.
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `x` — Value to be inserted. [Expression](../../../sql-reference/syntax.md#syntax-expressions) resulting in one of the [supported data types](../../../sql-reference/data-types/index.md).
|
||||
- `pos` — Position at which the specified element `x` is to be inserted. Index numbering in the array starts from zero. [UInt32](../../../sql-reference/data-types/int-uint.md#uint-ranges).
|
||||
- `default_x` — Default value for substituting in empty positions. Optional parameter. [Expression](../../../sql-reference/syntax.md#syntax-expressions) resulting in the data type configured for the `x` parameter. If `default_x` is not defined, the [default values](../../../sql-reference/statements/create/table.md#create-default-values) are used.
- `size` — Length of the resulting array. Optional parameter. When using this parameter, the default value `default_x` must be specified. [UInt32](../../../sql-reference/data-types/int-uint.md#uint-ranges).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Array with inserted values.
|
||||
|
||||
Type: [Array](../../../sql-reference/data-types/array.md#data-type-array).
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT groupArrayInsertAt(toString(number), number * 2) FROM numbers(5);
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌─groupArrayInsertAt(toString(number), multiply(number, 2))─┐
|
||||
│ ['0','','1','','2','','3','','4'] │
|
||||
└───────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT groupArrayInsertAt('-')(toString(number), number * 2) FROM numbers(5);
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌─groupArrayInsertAt('-')(toString(number), multiply(number, 2))─┐
|
||||
│ ['0','-','1','-','2','-','3','-','4'] │
|
||||
└────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT groupArrayInsertAt('-', 5)(toString(number), number * 2) FROM numbers(5);
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌─groupArrayInsertAt('-', 5)(toString(number), multiply(number, 2))─┐
|
||||
│ ['0','-','1','-','2'] │
|
||||
└───────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
Multi-threaded insertion of elements into one position.
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT groupArrayInsertAt(number, 0) FROM numbers_mt(10) SETTINGS max_block_size = 1;
|
||||
```
|
||||
|
||||
As a result of this query, you get a random integer in the `[0,9]` range. For example:
|
||||
|
||||
``` text
|
||||
┌─groupArrayInsertAt(number, 0)─┐
|
||||
│ [7] │
|
||||
└───────────────────────────────┘
|
||||
```
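The single-threaded behaviour (first inserted value wins, gaps filled with the default, optional fixed length) can be sketched in Python. This is a model only: `group_array_insert_at` is a hypothetical helper, and the empty string is assumed as the default for string values.

```python
def group_array_insert_at(pairs, default="", size=None):
    """Place each x at index pos; the first value inserted at a position is kept."""
    slots = {}
    for x, pos in pairs:
        slots.setdefault(pos, x)          # single-threaded rule: first insert wins
    if size is not None:
        length = size                     # fixed length; positions beyond it are dropped
    else:
        length = max(slots) + 1 if slots else 0
    return [slots.get(i, default) for i in range(length)]


# Same input as the queries above: toString(number) at position number * 2.
pairs = [(str(n), n * 2) for n in range(5)]

print(group_array_insert_at(pairs))               # ['0', '', '1', '', '2', '', '3', '', '4']
print(group_array_insert_at(pairs, "-"))          # ['0', '-', '1', '-', '2', '-', '3', '-', '4']
print(group_array_insert_at(pairs, "-", size=5))  # ['0', '-', '1', '-', '2']
```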
|
@ -0,0 +1,78 @@
|
||||
---
|
||||
toc_priority: 114
|
||||
---
|
||||
|
||||
# groupArrayMovingAvg {#agg_function-grouparraymovingavg}
|
||||
|
||||
Calculates the moving average of input values.
|
||||
|
||||
``` sql
|
||||
groupArrayMovingAvg(numbers_for_summing)
|
||||
groupArrayMovingAvg(window_size)(numbers_for_summing)
|
||||
```
|
||||
|
||||
The function can take the window size as a parameter. If left unspecified, the function takes the window size equal to the number of rows in the column.
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `numbers_for_summing` — [Expression](../../../sql-reference/syntax.md#syntax-expressions) resulting in a numeric data type value.
|
||||
- `window_size` — Size of the calculation window.
|
||||
|
||||
**Returned values**
|
||||
|
||||
- Array of the same size and type as the input data.
|
||||
|
||||
The function uses [rounding towards zero](https://en.wikipedia.org/wiki/Rounding#Rounding_towards_zero). It truncates the decimal places insignificant for the resulting data type.
|
||||
|
||||
**Example**
|
||||
|
||||
The sample table `t`:
|
||||
|
||||
``` sql
|
||||
CREATE TABLE t
|
||||
(
|
||||
`int` UInt8,
|
||||
`float` Float32,
|
||||
`dec` Decimal32(2)
|
||||
)
|
||||
ENGINE = TinyLog
|
||||
```
|
||||
|
||||
``` text
|
||||
┌─int─┬─float─┬──dec─┐
|
||||
│ 1 │ 1.1 │ 1.10 │
|
||||
│ 2 │ 2.2 │ 2.20 │
|
||||
│ 4 │ 4.4 │ 4.40 │
|
||||
│ 7 │ 7.77 │ 7.77 │
|
||||
└─────┴───────┴──────┘
|
||||
```
|
||||
|
||||
The queries:
|
||||
|
||||
``` sql
|
||||
SELECT
|
||||
groupArrayMovingAvg(int) AS I,
|
||||
groupArrayMovingAvg(float) AS F,
|
||||
groupArrayMovingAvg(dec) AS D
|
||||
FROM t
|
||||
```
|
||||
|
||||
``` text
|
||||
┌─I─────────┬─F───────────────────────────────────┬─D─────────────────────┐
|
||||
│ [0,0,1,3] │ [0.275,0.82500005,1.9250001,3.8675] │ [0.27,0.82,1.92,3.86] │
|
||||
└───────────┴─────────────────────────────────────┴───────────────────────┘
|
||||
```
|
||||
|
||||
``` sql
|
||||
SELECT
|
||||
groupArrayMovingAvg(2)(int) AS I,
|
||||
groupArrayMovingAvg(2)(float) AS F,
|
||||
groupArrayMovingAvg(2)(dec) AS D
|
||||
FROM t
|
||||
```
|
||||
|
||||
``` text
|
||||
┌─I─────────┬─F────────────────────────────────┬─D─────────────────────┐
|
||||
│ [0,1,3,5] │ [0.55,1.6500001,3.3000002,6.085] │ [0.55,1.65,3.30,6.08] │
|
||||
└───────────┴──────────────────────────────────┴───────────────────────┘
|
||||
```
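The integer columns in both results can be reproduced with a short Python model: each output element is the sum of the current window divided by the window size, truncated toward zero for integer input. The helper name `group_array_moving_avg` is hypothetical, and the sketch assumes rows are processed in the order shown in the sample table.

```python
def group_array_moving_avg(values, window=None):
    """Moving average; the divisor is always the window size, not the rows seen so far."""
    values = list(values)
    if window is None:
        window = len(values)              # default: window spans the whole column
    result, running = [], 0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        mean = running / window
        # Integer input keeps an integer result type: truncate toward zero.
        result.append(int(mean) if isinstance(v, int) else mean)
    return result


ints = [1, 2, 4, 7]
print(group_array_moving_avg(ints))     # [0, 0, 1, 3]
print(group_array_moving_avg(ints, 2))  # [0, 1, 3, 5]
```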
|
@ -0,0 +1,76 @@
|
||||
---
|
||||
toc_priority: 113
|
||||
---
|
||||
|
||||
# groupArrayMovingSum {#agg_function-grouparraymovingsum}
|
||||
|
||||
Calculates the moving sum of input values.
|
||||
|
||||
``` sql
|
||||
groupArrayMovingSum(numbers_for_summing)
|
||||
groupArrayMovingSum(window_size)(numbers_for_summing)
|
||||
```
|
||||
|
||||
The function can take the window size as a parameter. If left unspecified, the function takes the window size equal to the number of rows in the column.
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `numbers_for_summing` — [Expression](../../../sql-reference/syntax.md#syntax-expressions) resulting in a numeric data type value.
|
||||
- `window_size` — Size of the calculation window.
|
||||
|
||||
**Returned values**
|
||||
|
||||
- Array of the same size and type as the input data.
|
||||
|
||||
**Example**
|
||||
|
||||
The sample table:
|
||||
|
||||
``` sql
|
||||
CREATE TABLE t
|
||||
(
|
||||
`int` UInt8,
|
||||
`float` Float32,
|
||||
`dec` Decimal32(2)
|
||||
)
|
||||
ENGINE = TinyLog
|
||||
```
|
||||
|
||||
``` text
|
||||
┌─int─┬─float─┬──dec─┐
|
||||
│ 1 │ 1.1 │ 1.10 │
|
||||
│ 2 │ 2.2 │ 2.20 │
|
||||
│ 4 │ 4.4 │ 4.40 │
|
||||
│ 7 │ 7.77 │ 7.77 │
|
||||
└─────┴───────┴──────┘
|
||||
```
|
||||
|
||||
The queries:
|
||||
|
||||
``` sql
|
||||
SELECT
|
||||
groupArrayMovingSum(int) AS I,
|
||||
groupArrayMovingSum(float) AS F,
|
||||
groupArrayMovingSum(dec) AS D
|
||||
FROM t
|
||||
```
|
||||
|
||||
``` text
|
||||
┌─I──────────┬─F───────────────────────────────┬─D──────────────────────┐
|
||||
│ [1,3,7,14] │ [1.1,3.3000002,7.7000003,15.47] │ [1.10,3.30,7.70,15.47] │
|
||||
└────────────┴─────────────────────────────────┴────────────────────────┘
|
||||
```
|
||||
|
||||
``` sql
|
||||
SELECT
|
||||
groupArrayMovingSum(2)(int) AS I,
|
||||
groupArrayMovingSum(2)(float) AS F,
|
||||
groupArrayMovingSum(2)(dec) AS D
|
||||
FROM t
|
||||
```
|
||||
|
||||
``` text
|
||||
┌─I──────────┬─F───────────────────────────────┬─D──────────────────────┐
|
||||
│ [1,3,6,11] │ [1.1,3.3000002,6.6000004,12.17] │ [1.10,3.30,6.60,12.17] │
|
||||
└────────────┴─────────────────────────────────┴────────────────────────┘
|
||||
```
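A Python model of the windowed sum, reproducing the integer columns of both results. It assumes rows are processed in table order; `group_array_moving_sum` is a hypothetical helper name.

```python
def group_array_moving_sum(values, window=None):
    """Running sum over the last `window` values (whole column if window is None)."""
    values = list(values)
    if window is None:
        window = len(values)
    result, running = [], 0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        result.append(running)
    return result


ints = [1, 2, 4, 7]
print(group_array_moving_sum(ints))     # [1, 3, 7, 14]
print(group_array_moving_sum(ints, 2))  # [1, 3, 6, 11]
```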
|
@ -0,0 +1,81 @@
|
||||
---
|
||||
toc_priority: 114
|
||||
---
|
||||
|
||||
# groupArraySample {#grouparraysample}
|
||||
|
||||
Creates an array of sample argument values. The size of the resulting array is limited to `max_size` elements. Argument values are selected and added to the array randomly.
|
||||
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
groupArraySample(max_size[, seed])(x)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `max_size` — Maximum size of the resulting array. [UInt64](../../data-types/int-uint.md).
|
||||
- `seed` — Seed for the random number generator. Optional. [UInt64](../../data-types/int-uint.md). Default value: `123456`.
|
||||
- `x` — Argument (column name or expression).
|
||||
|
||||
**Returned values**
|
||||
|
||||
- Array of randomly selected `x` arguments.
|
||||
|
||||
Type: [Array](../../data-types/array.md).
|
||||
|
||||
**Examples**
|
||||
|
||||
Consider table `colors`:
|
||||
|
||||
``` text
|
||||
┌─id─┬─color──┐
|
||||
│ 1 │ red │
|
||||
│ 2 │ blue │
|
||||
│ 3 │ green │
|
||||
│ 4 │ white │
|
||||
│ 5 │ orange │
|
||||
└────┴────────┘
|
||||
```
|
||||
|
||||
Query with column name as argument:
|
||||
|
||||
``` sql
|
||||
SELECT groupArraySample(3)(color) as newcolors FROM colors;
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```text
|
||||
┌─newcolors──────────────────┐
|
||||
│ ['white','blue','green'] │
|
||||
└────────────────────────────┘
|
||||
```
|
||||
|
||||
Query with column name and different seed:
|
||||
|
||||
``` sql
|
||||
SELECT groupArraySample(3, 987654321)(color) as newcolors FROM colors;
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```text
|
||||
┌─newcolors──────────────────┐
|
||||
│ ['red','orange','green'] │
|
||||
└────────────────────────────┘
|
||||
```
|
||||
|
||||
Query with expression as argument:
|
||||
|
||||
``` sql
|
||||
SELECT groupArraySample(3)(concat('light-', color)) as newcolors FROM colors;
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```text
|
||||
┌─newcolors───────────────────────────────────┐
|
||||
│ ['light-blue','light-orange','light-green'] │
|
||||
└─────────────────────────────────────────────┘
|
||||
```
|
@ -0,0 +1,46 @@
|
||||
---
|
||||
toc_priority: 125
|
||||
---
|
||||
|
||||
# groupBitAnd {#groupbitand}
|
||||
|
||||
Applies bitwise `AND` to a series of numbers.
|
||||
|
||||
``` sql
|
||||
groupBitAnd(expr)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
`expr` – An expression that results in `UInt*` type.
|
||||
|
||||
**Return value**
|
||||
|
||||
Value of the `UInt*` type.
|
||||
|
||||
**Example**
|
||||
|
||||
Test data:
|
||||
|
||||
``` text
|
||||
binary decimal
|
||||
00101100 = 44
|
||||
00011100 = 28
|
||||
00001101 = 13
|
||||
01010101 = 85
|
||||
```
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT groupBitAnd(num) FROM t
|
||||
```
|
||||
|
||||
Where `num` is the column with the test data.
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
binary decimal
|
||||
00000100 = 4
|
||||
```
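The result can be checked by folding the test data with the `&` operator. The Python sketch below is only a verification aid; `nums` holds the four test values from the example.

```python
from functools import reduce
from operator import and_

nums = [44, 28, 13, 85]           # the test data above
result = reduce(and_, nums)       # 44 & 28 & 13 & 85

print(f"{result:08b} = {result}")  # 00000100 = 4
```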
|
@ -0,0 +1,44 @@
|
||||
---
|
||||
toc_priority: 128
|
||||
---
|
||||
|
||||
# groupBitmap {#groupbitmap}
|
||||
|
||||
Performs bitmap or aggregate calculations on an unsigned integer column and returns the cardinality as a UInt64 value. With the `-State` suffix, it returns a [bitmap object](../../../sql-reference/functions/bitmap-functions.md) instead.
|
||||
|
||||
``` sql
|
||||
groupBitmap(expr)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
`expr` – An expression that results in `UInt*` type.
|
||||
|
||||
**Return value**
|
||||
|
||||
Value of the `UInt64` type.
|
||||
|
||||
**Example**
|
||||
|
||||
Test data:
|
||||
|
||||
``` text
|
||||
UserID
|
||||
1
|
||||
1
|
||||
2
|
||||
3
|
||||
```
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT groupBitmap(UserID) as num FROM t
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
num
|
||||
3
|
||||
```
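Conceptually, the cardinality returned here is the number of distinct values in the column. A Python set gives the same answer for the test data; this models the result only and says nothing about how the bitmap is stored internally.

```python
user_ids = [1, 1, 2, 3]        # the UserID test data above

bitmap = set(user_ids)         # a set plays the role of the bitmap
print(len(bitmap))             # 3
```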
|
@ -0,0 +1,46 @@
|
||||
---
|
||||
toc_priority: 129
|
||||
---
|
||||
|
||||
# groupBitmapAnd {#groupbitmapand}
|
||||
|
||||
Calculates the AND of a bitmap column and returns the cardinality as a UInt64 value. With the `-State` suffix, it returns a [bitmap object](../../../sql-reference/functions/bitmap-functions.md) instead.
|
||||
|
||||
``` sql
|
||||
groupBitmapAnd(expr)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
`expr` – An expression that results in `AggregateFunction(groupBitmap, UInt*)` type.
|
||||
|
||||
**Return value**
|
||||
|
||||
Value of the `UInt64` type.
|
||||
|
||||
**Example**
|
||||
|
||||
``` sql
|
||||
DROP TABLE IF EXISTS bitmap_column_expr_test2;
|
||||
CREATE TABLE bitmap_column_expr_test2
|
||||
(
|
||||
tag_id String,
|
||||
z AggregateFunction(groupBitmap, UInt32)
|
||||
)
|
||||
ENGINE = MergeTree
|
||||
ORDER BY tag_id;
|
||||
|
||||
INSERT INTO bitmap_column_expr_test2 VALUES ('tag1', bitmapBuild(cast([1,2,3,4,5,6,7,8,9,10] as Array(UInt32))));
|
||||
INSERT INTO bitmap_column_expr_test2 VALUES ('tag2', bitmapBuild(cast([6,7,8,9,10,11,12,13,14,15] as Array(UInt32))));
|
||||
INSERT INTO bitmap_column_expr_test2 VALUES ('tag3', bitmapBuild(cast([2,4,6,8,10,12] as Array(UInt32))));
|
||||
|
||||
SELECT groupBitmapAnd(z) FROM bitmap_column_expr_test2 WHERE like(tag_id, 'tag%');
|
||||
┌─groupBitmapAnd(z)─┐
|
||||
│ 3 │
|
||||
└───────────────────┘
|
||||
|
||||
SELECT arraySort(bitmapToArray(groupBitmapAndState(z))) FROM bitmap_column_expr_test2 WHERE like(tag_id, 'tag%');
|
||||
┌─arraySort(bitmapToArray(groupBitmapAndState(z)))─┐
|
||||
│ [6,8,10] │
|
||||
└──────────────────────────────────────────────────┘
|
||||
```
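The outputs above can be cross-checked with Python sets, where the AND of bitmaps corresponds to set intersection. The dictionary `bitmaps` simply restates the three inserted rows; it is a model of the result, not of the storage format.

```python
from functools import reduce

bitmaps = {
    "tag1": set(range(1, 11)),          # 1..10
    "tag2": set(range(6, 16)),          # 6..15
    "tag3": {2, 4, 6, 8, 10, 12},
}

intersection = reduce(set.intersection, bitmaps.values())
print(len(intersection))     # 3          (like groupBitmapAnd(z))
print(sorted(intersection))  # [6, 8, 10] (like the -State variant converted to an array)
```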
|
@ -0,0 +1,46 @@
|
||||
---
|
||||
toc_priority: 130
|
||||
---
|
||||
|
||||
# groupBitmapOr {#groupbitmapor}
|
||||
|
||||
Calculates the OR of a bitmap column and returns the cardinality as a UInt64 value. With the `-State` suffix, it returns a [bitmap object](../../../sql-reference/functions/bitmap-functions.md) instead. This is equivalent to `groupBitmapMerge`.
|
||||
|
||||
``` sql
|
||||
groupBitmapOr(expr)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
`expr` – An expression that results in `AggregateFunction(groupBitmap, UInt*)` type.
|
||||
|
||||
**Return value**
|
||||
|
||||
Value of the `UInt64` type.
|
||||
|
||||
**Example**
|
||||
|
||||
``` sql
|
||||
DROP TABLE IF EXISTS bitmap_column_expr_test2;
|
||||
CREATE TABLE bitmap_column_expr_test2
|
||||
(
|
||||
tag_id String,
|
||||
z AggregateFunction(groupBitmap, UInt32)
|
||||
)
|
||||
ENGINE = MergeTree
|
||||
ORDER BY tag_id;
|
||||
|
||||
INSERT INTO bitmap_column_expr_test2 VALUES ('tag1', bitmapBuild(cast([1,2,3,4,5,6,7,8,9,10] as Array(UInt32))));
|
||||
INSERT INTO bitmap_column_expr_test2 VALUES ('tag2', bitmapBuild(cast([6,7,8,9,10,11,12,13,14,15] as Array(UInt32))));
|
||||
INSERT INTO bitmap_column_expr_test2 VALUES ('tag3', bitmapBuild(cast([2,4,6,8,10,12] as Array(UInt32))));
|
||||
|
||||
SELECT groupBitmapOr(z) FROM bitmap_column_expr_test2 WHERE like(tag_id, 'tag%');
|
||||
┌─groupBitmapOr(z)─┐
|
||||
│ 15 │
|
||||
└──────────────────┘
|
||||
|
||||
SELECT arraySort(bitmapToArray(groupBitmapOrState(z))) FROM bitmap_column_expr_test2 WHERE like(tag_id, 'tag%');
|
||||
┌─arraySort(bitmapToArray(groupBitmapOrState(z)))─┐
|
||||
│ [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15] │
|
||||
└─────────────────────────────────────────────────┘
|
||||
```
|
@ -0,0 +1,46 @@
|
||||
---
|
||||
toc_priority: 131
|
||||
---
|
||||
|
||||
# groupBitmapXor {#groupbitmapxor}
|
||||
|
||||
Calculates the XOR of a bitmap column and returns the cardinality as a UInt64 value. With the `-State` suffix, it returns a [bitmap object](../../../sql-reference/functions/bitmap-functions.md) instead.
|
||||
|
||||
``` sql
|
||||
groupBitmapXor(expr)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
`expr` – An expression that results in `AggregateFunction(groupBitmap, UInt*)` type.
|
||||
|
||||
**Return value**
|
||||
|
||||
Value of the `UInt64` type.
|
||||
|
||||
**Example**
|
||||
|
||||
``` sql
|
||||
DROP TABLE IF EXISTS bitmap_column_expr_test2;
|
||||
CREATE TABLE bitmap_column_expr_test2
|
||||
(
|
||||
tag_id String,
|
||||
z AggregateFunction(groupBitmap, UInt32)
|
||||
)
|
||||
ENGINE = MergeTree
|
||||
ORDER BY tag_id;
|
||||
|
||||
INSERT INTO bitmap_column_expr_test2 VALUES ('tag1', bitmapBuild(cast([1,2,3,4,5,6,7,8,9,10] as Array(UInt32))));
|
||||
INSERT INTO bitmap_column_expr_test2 VALUES ('tag2', bitmapBuild(cast([6,7,8,9,10,11,12,13,14,15] as Array(UInt32))));
|
||||
INSERT INTO bitmap_column_expr_test2 VALUES ('tag3', bitmapBuild(cast([2,4,6,8,10,12] as Array(UInt32))));
|
||||
|
||||
SELECT groupBitmapXor(z) FROM bitmap_column_expr_test2 WHERE like(tag_id, 'tag%');
|
||||
┌─groupBitmapXor(z)─┐
|
||||
│ 10 │
|
||||
└───────────────────┘
|
||||
|
||||
SELECT arraySort(bitmapToArray(groupBitmapXorState(z))) FROM bitmap_column_expr_test2 WHERE like(tag_id, 'tag%');
|
||||
┌─arraySort(bitmapToArray(groupBitmapXorState(z)))─┐
|
||||
│ [1,3,5,6,8,10,11,13,14,15] │
|
||||
└──────────────────────────────────────────────────┘
|
||||
```
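
As a rough mental model of the example above, each bitmap can be treated as a set of integers and XOR as symmetric difference. The sketch below is plain Python, not ClickHouse code, and the helper name `group_bitmap_xor` is made up for illustration.

```python
from functools import reduce

def group_bitmap_xor(bitmaps):
    """Fold sets with symmetric difference.

    A value survives if it occurs in an odd number of the input sets.
    """
    return reduce(lambda acc, bitmap: acc ^ bitmap, bitmaps, set())

tags = [
    set(range(1, 11)),      # tag1: 1..10
    set(range(6, 16)),      # tag2: 6..15
    {2, 4, 6, 8, 10, 12},   # tag3
]

result = group_bitmap_xor(tags)
print(len(result))     # 10, the cardinality returned by groupBitmapXor
print(sorted(result))  # [1, 3, 5, 6, 8, 10, 11, 13, 14, 15]
```

The values 6, 8 and 10 appear in all three bitmaps (an odd count), so they remain in the result, while 7 and 9 appear in exactly two and drop out.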
|
@ -0,0 +1,46 @@
|
||||
---
|
||||
toc_priority: 126
|
||||
---
|
||||
|
||||
# groupBitOr {#groupbitor}
|
||||
|
||||
Applies bitwise `OR` to a series of numbers.
|
||||
|
||||
``` sql
|
||||
groupBitOr(expr)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
`expr` – An expression that results in `UInt*` type.
|
||||
|
||||
**Return value**
|
||||
|
||||
Value of the `UInt*` type.
|
||||
|
||||
**Example**
|
||||
|
||||
Test data:
|
||||
|
||||
``` text
|
||||
binary decimal
|
||||
00101100 = 44
|
||||
00011100 = 28
|
||||
00001101 = 13
|
||||
01010101 = 85
|
||||
```
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT groupBitOr(num) FROM t
|
||||
```
|
||||
|
||||
Where `num` is the column with the test data.
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
binary decimal
|
||||
01111101 = 125
|
||||
```
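
The same result can be reproduced outside ClickHouse by folding the test data with a bitwise OR. This is only an illustrative Python sketch of the semantics; the variable names are arbitrary.

```python
from functools import reduce
from operator import or_

nums = [44, 28, 13, 85]  # the test data from the example above

# Fold the series with bitwise OR, starting from 0 (the identity for OR).
result = reduce(or_, nums, 0)

print(f"{result:08b} = {result}")  # 01111101 = 125
```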
|
@ -0,0 +1,46 @@
|
||||
---
|
||||
toc_priority: 127
|
||||
---
|
||||
|
||||
# groupBitXor {#groupbitxor}
|
||||
|
||||
Applies bitwise `XOR` to a series of numbers.
|
||||
|
||||
``` sql
|
||||
groupBitXor(expr)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
`expr` – An expression that results in `UInt*` type.
|
||||
|
||||
**Return value**
|
||||
|
||||
Value of the `UInt*` type.
|
||||
|
||||
**Example**
|
||||
|
||||
Test data:
|
||||
|
||||
``` text
|
||||
binary decimal
|
||||
00101100 = 44
|
||||
00011100 = 28
|
||||
00001101 = 13
|
||||
01010101 = 85
|
||||
```
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT groupBitXor(num) FROM t
|
||||
```
|
||||
|
||||
Where `num` is the column with the test data.
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
binary decimal
|
||||
01101000 = 104
|
||||
```
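
One way to read this result: a bit of the output is set exactly when that bit is set in an odd number of inputs. The Python sketch below spells out that parity rule; the function name `group_bit_xor` and the 8-bit `width` are assumptions made for this illustration, not part of ClickHouse.

```python
def group_bit_xor(values, width=8):
    """XOR a series of unsigned integers by checking the parity of each bit."""
    result = 0
    for bit in range(width):
        # Count how many inputs have this bit set.
        ones = sum((v >> bit) & 1 for v in values)
        if ones % 2 == 1:
            result |= 1 << bit
    return result

nums = [44, 28, 13, 85]  # the test data from the example above
result = group_bit_xor(nums)
print(f"{result:08b} = {result}")  # 01101000 = 104
```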
|
@ -0,0 +1,12 @@
|
||||
---
|
||||
toc_priority: 111
|
||||
---
|
||||
|
||||
# groupUniqArray {#groupuniqarray}
|
||||
|
||||
Syntax: `groupUniqArray(x)` or `groupUniqArray(max_size)(x)`
|
||||
|
||||
Creates an array from different argument values. Memory consumption is the same as for the [uniqExact](../../../sql-reference/aggregate-functions/reference/uniqexact.md) function.
|
||||
|
||||
The second version (with the `max_size` parameter) limits the size of the resulting array to `max_size` elements.
|
||||
For example, `groupUniqArray(1)(x)` is equivalent to `[any(x)]`.
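
The effect of `max_size` can be pictured with a small Python model. It is a sketch of the described behaviour only: the name `group_uniq_array` is hypothetical, and the order of elements in the real result is not specified on this page.

```python
def group_uniq_array(values, max_size=None):
    """Collect distinct values, stopping once max_size distinct values are held."""
    seen = set()
    for v in values:
        if max_size is not None and len(seen) >= max_size:
            break  # the array is full; later values are ignored
        seen.add(v)
    return list(seen)

print(sorted(group_uniq_array([3, 1, 3, 2, 1])))  # [1, 2, 3]
print(group_uniq_array([7, 8, 9], max_size=1))    # [7], the first value encountered
```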
|
74
docs/zh/sql-reference/aggregate-functions/reference/index.md
Normal file
@ -0,0 +1,74 @@
|
||||
---
|
||||
toc_folder_title: Reference
|
||||
toc_priority: 36
|
||||
toc_hidden: true
|
||||
---
|
||||
|
||||
# List of Aggregate Functions {#aggregate-functions-reference}
|
||||
|
||||
Standard aggregate functions:
|
||||
|
||||
- [count](../../../sql-reference/aggregate-functions/reference/count.md)
|
||||
- [min](../../../sql-reference/aggregate-functions/reference/min.md)
|
||||
- [max](../../../sql-reference/aggregate-functions/reference/max.md)
|
||||
- [sum](../../../sql-reference/aggregate-functions/reference/sum.md)
|
||||
- [avg](../../../sql-reference/aggregate-functions/reference/avg.md)
|
||||
- [any](../../../sql-reference/aggregate-functions/reference/any.md)
|
||||
- [stddevPop](../../../sql-reference/aggregate-functions/reference/stddevpop.md)
|
||||
- [stddevSamp](../../../sql-reference/aggregate-functions/reference/stddevsamp.md)
|
||||
- [varPop](../../../sql-reference/aggregate-functions/reference/varpop.md)
|
||||
- [varSamp](../../../sql-reference/aggregate-functions/reference/varsamp.md)
|
||||
- [covarPop](../../../sql-reference/aggregate-functions/reference/covarpop.md)
|
||||
- [covarSamp](../../../sql-reference/aggregate-functions/reference/covarsamp.md)
|
||||
|
||||
ClickHouse-specific aggregate functions:
|
||||
|
||||
- [anyHeavy](../../../sql-reference/aggregate-functions/reference/anyheavy.md)
|
||||
- [anyLast](../../../sql-reference/aggregate-functions/reference/anylast.md)
|
||||
- [argMin](../../../sql-reference/aggregate-functions/reference/argmin.md)
|
||||
- [argMax](../../../sql-reference/aggregate-functions/reference/argmax.md)
|
||||
- [avgWeighted](../../../sql-reference/aggregate-functions/reference/avgweighted.md)
|
||||
- [topK](../../../sql-reference/aggregate-functions/reference/topk.md)
|
||||
- [topKWeighted](../../../sql-reference/aggregate-functions/reference/topkweighted.md)
|
||||
- [groupArray](../../../sql-reference/aggregate-functions/reference/grouparray.md)
|
||||
- [groupUniqArray](../../../sql-reference/aggregate-functions/reference/groupuniqarray.md)
|
||||
- [groupArrayInsertAt](../../../sql-reference/aggregate-functions/reference/grouparrayinsertat.md)
|
||||
- [groupArrayMovingAvg](../../../sql-reference/aggregate-functions/reference/grouparraymovingavg.md)
|
||||
- [groupArrayMovingSum](../../../sql-reference/aggregate-functions/reference/grouparraymovingsum.md)
|
||||
- [groupBitAnd](../../../sql-reference/aggregate-functions/reference/groupbitand.md)
|
||||
- [groupBitOr](../../../sql-reference/aggregate-functions/reference/groupbitor.md)
|
||||
- [groupBitXor](../../../sql-reference/aggregate-functions/reference/groupbitxor.md)
|
||||
- [groupBitmap](../../../sql-reference/aggregate-functions/reference/groupbitmap.md)
|
||||
- [groupBitmapAnd](../../../sql-reference/aggregate-functions/reference/groupbitmapand.md)
|
||||
- [groupBitmapOr](../../../sql-reference/aggregate-functions/reference/groupbitmapor.md)
|
||||
- [groupBitmapXor](../../../sql-reference/aggregate-functions/reference/groupbitmapxor.md)
|
||||
- [sumWithOverflow](../../../sql-reference/aggregate-functions/reference/sumwithoverflow.md)
|
||||
- [sumMap](../../../sql-reference/aggregate-functions/reference/summap.md)
|
||||
- [minMap](../../../sql-reference/aggregate-functions/reference/minmap.md)
|
||||
- [maxMap](../../../sql-reference/aggregate-functions/reference/maxmap.md)
|
||||
- [skewSamp](../../../sql-reference/aggregate-functions/reference/skewsamp.md)
|
||||
- [skewPop](../../../sql-reference/aggregate-functions/reference/skewpop.md)
|
||||
- [kurtSamp](../../../sql-reference/aggregate-functions/reference/kurtsamp.md)
|
||||
- [kurtPop](../../../sql-reference/aggregate-functions/reference/kurtpop.md)
|
||||
- [uniq](../../../sql-reference/aggregate-functions/reference/uniq.md)
|
||||
- [uniqExact](../../../sql-reference/aggregate-functions/reference/uniqexact.md)
|
||||
- [uniqCombined](../../../sql-reference/aggregate-functions/reference/uniqcombined.md)
|
||||
- [uniqCombined64](../../../sql-reference/aggregate-functions/reference/uniqcombined64.md)
|
||||
- [uniqHLL12](../../../sql-reference/aggregate-functions/reference/uniqhll12.md)
|
||||
- [quantile](../../../sql-reference/aggregate-functions/reference/quantile.md)
|
||||
- [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md)
|
||||
- [quantileExact](../../../sql-reference/aggregate-functions/reference/quantileexact.md)
|
||||
- [quantileExactLow](../../../sql-reference/aggregate-functions/reference/quantileexact.md#quantileexactlow)
|
||||
- [quantileExactHigh](../../../sql-reference/aggregate-functions/reference/quantileexact.md#quantileexacthigh)
|
||||
- [quantileExactWeighted](../../../sql-reference/aggregate-functions/reference/quantileexactweighted.md)
|
||||
- [quantileTiming](../../../sql-reference/aggregate-functions/reference/quantiletiming.md)
|
||||
- [quantileTimingWeighted](../../../sql-reference/aggregate-functions/reference/quantiletimingweighted.md)
|
||||
- [quantileDeterministic](../../../sql-reference/aggregate-functions/reference/quantiledeterministic.md)
|
||||
- [quantileTDigest](../../../sql-reference/aggregate-functions/reference/quantiletdigest.md)
|
||||
- [quantileTDigestWeighted](../../../sql-reference/aggregate-functions/reference/quantiletdigestweighted.md)
|
||||
- [simpleLinearRegression](../../../sql-reference/aggregate-functions/reference/simplelinearregression.md)
|
||||
- [stochasticLinearRegression](../../../sql-reference/aggregate-functions/reference/stochasticlinearregression.md)
|
||||
- [stochasticLogisticRegression](../../../sql-reference/aggregate-functions/reference/stochasticlogisticregression.md)
|
||||
- [categoricalInformationValue](../../../sql-reference/aggregate-functions/reference/categoricalinformationvalue.md)
|
||||
|
||||
[Original article](https://clickhouse.tech/docs/en/sql-reference/aggregate-functions/reference/) <!--hide-->
|
@ -0,0 +1,37 @@
|
||||
---
|
||||
toc_priority: 150
|
||||
---
|
||||
|
||||
## initializeAggregation {#initializeaggregation}
|
||||
|
||||
Initializes aggregation for your input rows. It is intended for the functions with the suffix `State`.
|
||||
Use it for tests or to process columns of types `AggregateFunction` and `AggregatingMergeTree`.
|
||||
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
initializeAggregation (aggregate_function, column_1, column_2);
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `aggregate_function` — Name of the aggregate function whose state is to be created. [String](../../../sql-reference/data-types/string.md#string).
|
||||
- `column_n` — The column passed to the function as its argument. [String](../../../sql-reference/data-types/string.md#string).
|
||||
|
||||
**Returned value(s)**
|
||||
|
||||
Returns the result of the aggregation for your input rows. The return type is the same as the return type of the function that `initializeAggregation` takes as its first argument.
|
||||
For example, for functions with the suffix `State`, the return type is `AggregateFunction`.
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
```sql
|
||||
SELECT uniqMerge(state) FROM (SELECT initializeAggregation('uniqState', number % 3) AS state FROM system.numbers LIMIT 10000);
|
||||
```
|
||||
Result:
|
||||
|
||||

``` text
┌─uniqMerge(state)─┐
│ 3 │
└──────────────────┘
```
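
To make the state-then-merge flow in this query easier to follow, here is a toy Python model. It assumes an exact set stands in for the `uniq` state (the real state is an approximate sketch structure), and the helper names `uniq_state` and `uniq_merge` are invented for this illustration.

```python
def uniq_state(value):
    """Stand-in for initializeAggregation('uniqState', value): a one-row state."""
    return {value}

def uniq_merge(states):
    """Stand-in for uniqMerge: combine all states, then finalize to a count."""
    merged = set()
    for state in states:
        merged |= state
    return len(merged)

# One state per input row, as in the subquery over the first 10000 numbers.
states = [uniq_state(n % 3) for n in range(10000)]
print(uniq_merge(states))  # 3
```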
|
@ -0,0 +1,25 @@
|
||||
---
|
||||
toc_priority: 153
|
||||
---
|
||||
|
||||
# kurtPop {#kurtpop}
|
||||
|
||||
Computes the [kurtosis](https://en.wikipedia.org/wiki/Kurtosis) of a sequence.
|
||||
|
||||
``` sql
|
||||
kurtPop(expr)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
`expr` — [Expression](../../../sql-reference/syntax.md#syntax-expressions) returning a number.
|
||||
|
||||
**Returned value**
|
||||
|
||||
The kurtosis of the given distribution. Type — [Float64](../../../sql-reference/data-types/float.md).
|
||||
|
||||
**Example**
|
||||
|
||||
``` sql
|
||||
SELECT kurtPop(value) FROM series_with_value_column
|
||||
```
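
For intuition, population kurtosis can be computed by hand as the fourth central moment divided by the squared population variance. The Python sketch below assumes that definition, without subtracting 3 for excess kurtosis; this page does not state which variant ClickHouse returns, so treat the formula as an assumption. The function name `kurt_pop` only mirrors the SQL name for readability.

```python
def kurt_pop(values):
    """Population kurtosis: m4 / m2**2, using moments divided by n."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # population variance
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / (m2 * m2)

print(kurt_pop([1, 2, 3, 4, 5]))  # 1.7
```

For `[1, 2, 3, 4, 5]` the variance is 2 and the fourth moment is 6.8, which gives 6.8 / 4 = 1.7.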
|
@ -0,0 +1,27 @@
|
||||
---
|
||||
toc_priority: 154
|
||||
---
|
||||
|
||||
# kurtSamp {#kurtsamp}
|
||||
|
||||
Computes the [sample kurtosis](https://en.wikipedia.org/wiki/Kurtosis) of a sequence.
|
||||
|
||||
It represents an unbiased estimate of the kurtosis of a random variable if passed values form its sample.
|
||||
|
||||
``` sql
|
||||
kurtSamp(expr)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
`expr` — [Expression](../../../sql-reference/syntax.md#syntax-expressions) returning a number.
|
||||
|
||||
**Returned value**
|
||||
|
||||
The kurtosis of the given distribution. Type — [Float64](../../../sql-reference/data-types/float.md). If `n <= 1` (`n` is the size of the sample), then the function returns `nan`.
|
||||
|
||||
**Example**
|
||||
|
||||
``` sql
|
||||
SELECT kurtSamp(value) FROM series_with_value_column
|
||||
```
|
@ -0,0 +1,7 @@
|
||||
---
|
||||
toc_priority: 3
|
||||
---
|
||||
|
||||
# max {#agg_function-max}
|
||||
|
||||
Calculates the maximum.
|
@ -0,0 +1,28 @@
|
||||
---
|
||||
toc_priority: 143
|
||||
---
|
||||
|
||||
# maxMap {#agg_functions-maxmap}
|
||||
|
||||
Syntax: `maxMap(key, value)` or `maxMap(Tuple(key, value))`
|
||||
|
||||
Calculates the maximum from `value` array according to the keys specified in the `key` array.
|
||||
|
||||
Passing a tuple of keys and value arrays is identical to passing two arrays of keys and values.
|
||||
|
||||
The number of elements in `key` and `value` must be the same for each row that is aggregated.
|
||||
|
||||
Returns a tuple of two arrays: keys and values calculated for the corresponding keys.
|
||||
|
||||
Example:
|
||||
|
||||
``` sql
|
||||
SELECT maxMap(a, b)
|
||||
FROM values('a Array(Int32), b Array(Int64)', ([1, 2], [2, 2]), ([2, 3], [1, 1]))
|
||||
```
|
||||
|
||||
``` text
|
||||
┌─maxMap(a, b)──────┐
|
||||
│ ([1,2,3],[2,2,1]) │
|
||||
└───────────────────┘
|
||||
```
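
The result above can be reproduced with a small Python model that keeps the largest value seen for each key. This is a sketch of the described semantics, not ClickHouse code: the name `max_map` is illustrative, and returning the keys in sorted order is an assumption that happens to match the example output.

```python
def max_map(rows):
    """rows: iterable of (keys, values) array pairs, one pair per input row."""
    best = {}
    for keys, values in rows:
        if len(keys) != len(values):
            raise ValueError("key and value arrays must have the same length")
        for k, v in zip(keys, values):
            # Keep the maximum value seen so far for this key.
            best[k] = v if k not in best else max(best[k], v)
    ordered = sorted(best)
    return ordered, [best[k] for k in ordered]

print(max_map([([1, 2], [2, 2]), ([2, 3], [1, 1])]))  # ([1, 2, 3], [2, 2, 1])
```

Key `2` occurs in both rows with values 2 and 1, so the larger value 2 is kept.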
|
@ -0,0 +1,41 @@
|
||||
# median {#median}
|
||||
|
||||
The `median*` functions are aliases for the corresponding `quantile*` functions. They calculate the median of a numeric data sample.
|
||||
|
||||
Functions:
|
||||
|
||||
- `median` — Alias for [quantile](#quantile).
|
||||
- `medianDeterministic` — Alias for [quantileDeterministic](#quantiledeterministic).
|
||||
- `medianExact` — Alias for [quantileExact](#quantileexact).
|
||||
- `medianExactWeighted` — Alias for [quantileExactWeighted](#quantileexactweighted).
|
||||
- `medianTiming` — Alias for [quantileTiming](#quantiletiming).
|
||||
- `medianTimingWeighted` — Alias for [quantileTimingWeighted](#quantiletimingweighted).
|
||||
- `medianTDigest` — Alias for [quantileTDigest](#quantiletdigest).
|
||||
- `medianTDigestWeighted` — Alias for [quantileTDigestWeighted](#quantiletdigestweighted).
|
||||
|
||||
**Example**
|
||||
|
||||
Input table:
|
||||
|
||||
``` text
|
||||
┌─val─┐
|
||||
│ 1 │
|
||||
│ 1 │
|
||||
│ 2 │
|
||||
│ 3 │
|
||||
└─────┘
|
||||
```
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT medianDeterministic(val, 1) FROM t
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌─medianDeterministic(val, 1)─┐
|
||||
│ 1.5 │
|
||||
└─────────────────────────────┘
|
||||
```
|
@ -0,0 +1,7 @@
|
||||
---
|
||||
toc_priority: 2
|
||||
---
|
||||
|
||||
# min {#agg_function-min}
|
||||
|
||||
Calculates the minimum.
|
@ -0,0 +1,28 @@
|
||||
---
|
||||
toc_priority: 142
|
||||
---
|
||||
|
||||
# minMap {#agg_functions-minmap}
|
||||
|
||||
Syntax: `minMap(key, value)` or `minMap(Tuple(key, value))`
|
||||
|
||||
Calculates the minimum from `value` array according to the keys specified in the `key` array.
|
||||
|
||||
Passing a tuple of keys and value arrays is identical to passing two arrays of keys and values.
|
||||
|
||||
The number of elements in `key` and `value` must be the same for each row that is aggregated.
|
||||
|
||||
Returns a tuple of two arrays: keys in sorted order, and values calculated for the corresponding keys.
|
||||
|
||||
Example:
|
||||
|
||||
``` sql
|
||||
SELECT minMap(a, b)
|
||||
FROM values('a Array(Int32), b Array(Int64)', ([1, 2], [2, 2]), ([2, 3], [1, 1]))
|
||||
```
|
||||
|
||||
``` text
|
||||
┌─minMap(a, b)──────┐
|
||||
│ ([1,2,3],[2,1,1]) │
|
||||
└───────────────────┘
|
||||
```
|
@ -0,0 +1,66 @@
|
||||
---
|
||||
toc_priority: 200
|
||||
---
|
||||
|
||||
# quantile {#quantile}
|
||||
|
||||
Computes an approximate [quantile](https://en.wikipedia.org/wiki/Quantile) of a numeric data sequence.
|
||||
|
||||
This function applies [reservoir sampling](https://en.wikipedia.org/wiki/Reservoir_sampling) with a reservoir size up to 8192 and a random number generator for sampling. The result is non-deterministic. To get an exact quantile, use the [quantileExact](../../../sql-reference/aggregate-functions/reference/quantileexact.md#quantileexact) function.
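
The following Python sketch shows the general idea of reservoir sampling followed by a quantile estimate. It uses the textbook "Algorithm R" and linear interpolation between neighbouring sample values; both are assumptions chosen for illustration, and ClickHouse's internal sampler and interpolation may differ. The function names are hypothetical.

```python
import random

def reservoir_sample(stream, capacity=8192, rng=None):
    """Keep a uniform random sample of at most `capacity` items from a stream."""
    if rng is None:
        rng = random.Random()
    sample = []
    for i, value in enumerate(stream):
        if i < capacity:
            sample.append(value)      # fill the reservoir first
        else:
            j = rng.randint(0, i)     # inclusive on both ends
            if j < capacity:
                sample[j] = value     # replace a random slot
    return sample

def quantile_from_sample(sample, level=0.5):
    """Estimate a quantile from a non-empty sample by linear interpolation."""
    ordered = sorted(sample)
    pos = level * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

print(quantile_from_sample(reservoir_sample([1, 1, 2, 3])))  # 1.5
```

While the input fits in the reservoir, every value is kept and the result is stable. Once the input exceeds the reservoir size, the random replacements make the result vary between runs.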
|
||||
|
||||
When using multiple `quantile*` functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles) function.
|
||||
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
quantile(level)(expr)
|
||||
```
|
||||
|
||||
Alias: `median`.
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `level` — Level of quantile. Optional parameter. Constant floating-point number from 0 to 1. We recommend using a `level` value in the range of `[0.01, 0.99]`. Default value: 0.5. At `level=0.5` the function calculates [median](https://en.wikipedia.org/wiki/Median).
|
||||
- `expr` — Expression over the column values resulting in numeric [data types](../../../sql-reference/data-types/index.md#data_types), [Date](../../../sql-reference/data-types/date.md) or [DateTime](../../../sql-reference/data-types/datetime.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Approximate quantile of the specified level.
|
||||
|
||||
Type:
|
||||
|
||||
- [Float64](../../../sql-reference/data-types/float.md) for numeric data type input.
|
||||
- [Date](../../../sql-reference/data-types/date.md) if input values have the `Date` type.
|
||||
- [DateTime](../../../sql-reference/data-types/datetime.md) if input values have the `DateTime` type.
|
||||
|
||||
**Example**
|
||||
|
||||
Input table:
|
||||
|
||||
``` text
|
||||
┌─val─┐
|
||||
│ 1 │
|
||||
│ 1 │
|
||||
│ 2 │
|
||||
│ 3 │
|
||||
└─────┘
|
||||
```
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT quantile(val) FROM t
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌─quantile(val)─┐
|
||||
│ 1.5 │
|
||||
└───────────────┘
|
||||
```
|
||||
|
||||
**See Also**
|
||||
|
||||
- [median](../../../sql-reference/aggregate-functions/reference/median.md#median)
|
||||
- [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles)
|
@ -0,0 +1,67 @@
|
||||
---
|
||||
toc_priority: 206
|
||||
---
|
||||
|
||||
# quantileDeterministic {#quantiledeterministic}
|
||||
|
||||
Computes an approximate [quantile](https://en.wikipedia.org/wiki/Quantile) of a numeric data sequence.
|
||||
|
||||
This function applies [reservoir sampling](https://en.wikipedia.org/wiki/Reservoir_sampling) with a reservoir size up to 8192 and a deterministic sampling algorithm. The result is deterministic. To get an exact quantile, use the [quantileExact](../../../sql-reference/aggregate-functions/reference/quantileexact.md#quantileexact) function.
|
||||
|
||||
When using multiple `quantile*` functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles) function.
|
||||
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
quantileDeterministic(level)(expr, determinator)
|
||||
```
|
||||
|
||||
Alias: `medianDeterministic`.
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `level` — Level of quantile. Optional parameter. Constant floating-point number from 0 to 1. We recommend using a `level` value in the range of `[0.01, 0.99]`. Default value: 0.5. At `level=0.5` the function calculates [median](https://en.wikipedia.org/wiki/Median).
|
||||
- `expr` — Expression over the column values resulting in numeric [data types](../../../sql-reference/data-types/index.md#data_types), [Date](../../../sql-reference/data-types/date.md) or [DateTime](../../../sql-reference/data-types/datetime.md).
|
||||
- `determinator` — Number whose hash is used instead of a random number generator in the reservoir sampling algorithm to make the result of sampling deterministic. As a determinator you can use any deterministic positive number, for example, a user id or an event id. If the same determinator value occurs too often, the function works incorrectly.
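
To illustrate how a hash of the determinator can replace a random number generator, here is a hedged Python sketch. It keeps a row only if the low bits of its determinator hash are zero, and widens that bit mask whenever the sample grows past the capacity. The scheme, the choice of `blake2b` as the hash, and all names are assumptions made for this illustration; they are not taken from this page.

```python
import hashlib

def stable_hash(determinator):
    """Deterministic 64-bit hash of the determinator (illustrative choice)."""
    digest = hashlib.blake2b(str(determinator).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")

def deterministic_sample(rows, capacity=8192):
    """rows: iterable of (value, determinator) pairs. Returns the kept values."""
    skip_bits = 0
    kept = []  # list of (hash, value) pairs
    for value, determinator in rows:
        h = stable_hash(determinator)
        if (h & ((1 << skip_bits) - 1)) != 0:
            continue  # hash does not pass the current mask
        kept.append((h, value))
        # Too many rows kept: require one more zero bit and thin out.
        while len(kept) > capacity and skip_bits < 64:
            skip_bits += 1
            mask = (1 << skip_bits) - 1
            kept = [(hh, v) for hh, v in kept if (hh & mask) == 0]
    return [v for _, v in kept]

rows = [(i, i) for i in range(1000)]
forward = sorted(deterministic_sample(rows, capacity=64))
backward = sorted(deterministic_sample(reversed(rows), capacity=64))
print(forward == backward)  # True: the sample does not depend on row order
```

In this model, rows that share a determinator share a hash, so they are all kept or all dropped together. That gives one way to see why a determinator value that repeats too often distorts the sample.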
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Approximate quantile of the specified level.
|
||||
|
||||
Type:
|
||||
|
||||
- [Float64](../../../sql-reference/data-types/float.md) for numeric data type input.
|
||||
- [Date](../../../sql-reference/data-types/date.md) if input values have the `Date` type.
|
||||
- [DateTime](../../../sql-reference/data-types/datetime.md) if input values have the `DateTime` type.
|
||||
|
||||
**Example**
|
||||
|
||||
Input table:
|
||||
|
||||
``` text
|
||||
┌─val─┐
|
||||
│ 1 │
|
||||
│ 1 │
|
||||
│ 2 │
|
||||
│ 3 │
|
||||
└─────┘
|
||||
```
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT quantileDeterministic(val, 1) FROM t
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌─quantileDeterministic(val, 1)─┐
|
||||
│ 1.5 │
|
||||
└───────────────────────────────┘
|
||||
```
|
||||
|
||||
**See Also**
|
||||
|
||||
- [median](../../../sql-reference/aggregate-functions/reference/median.md#median)
|
||||
- [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles)
|
@ -0,0 +1,164 @@
|
||||
---
|
||||
toc_priority: 202
|
||||
---
|
||||
|
||||
# quantileExact {#quantileexact}
|
||||
|
||||
Exactly computes the [quantile](https://en.wikipedia.org/wiki/Quantile) of a numeric data sequence.
|
||||
|
||||
To get the exact value, all the passed values are combined into an array, which is then partially sorted. Therefore, the function consumes `O(n)` memory, where `n` is the number of values that were passed. However, for a small number of values, the function is very efficient.
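
A minimal Python sketch of this approach follows. It sorts the whole array for simplicity, where a partial sort (selection) would be enough, and it assumes the element at index `level * n` is returned, capped at the last element. That index rule is an assumption, chosen because it reproduces the example on this page.

```python
def quantile_exact(values, level=0.5):
    """Collect all values (O(n) memory), order them, and pick one element."""
    ordered = sorted(values)  # a selection algorithm would suffice here
    index = min(int(level * len(ordered)), len(ordered) - 1)
    return ordered[index]

print(quantile_exact(range(10)))  # 5
```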
|
||||
|
||||
When using multiple `quantile*` functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles) function.
|
||||
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
quantileExact(level)(expr)
|
||||
```
|
||||
|
||||
Alias: `medianExact`.
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `level` — Level of quantile. Optional parameter. Constant floating-point number from 0 to 1. We recommend using a `level` value in the range of `[0.01, 0.99]`. Default value: 0.5. At `level=0.5` the function calculates [median](https://en.wikipedia.org/wiki/Median).
|
||||
- `expr` — Expression over the column values resulting in numeric [data types](../../../sql-reference/data-types/index.md#data_types), [Date](../../../sql-reference/data-types/date.md) or [DateTime](../../../sql-reference/data-types/datetime.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Quantile of the specified level.
|
||||
|
||||
Type:
|
||||
|
||||
- [Float64](../../../sql-reference/data-types/float.md) for numeric data type input.
|
||||
- [Date](../../../sql-reference/data-types/date.md) if input values have the `Date` type.
|
||||
- [DateTime](../../../sql-reference/data-types/datetime.md) if input values have the `DateTime` type.
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT quantileExact(number) FROM numbers(10)
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌─quantileExact(number)─┐
|
||||
│ 5 │
|
||||
└───────────────────────┘
|
||||
```
|
||||
|
||||
# quantileExactLow {#quantileexactlow}
|
||||
|
||||
Similar to `quantileExact`, this computes the exact [quantile](https://en.wikipedia.org/wiki/Quantile) of a numeric data sequence.
|
||||
|
||||
To get the exact value, all the passed values are combined into an array, which is then fully sorted. The sorting [algorithm](https://en.cppreference.com/w/cpp/algorithm/sort) performs `O(N·log(N))` comparisons, where `N = std::distance(first, last)`.
|
||||
|
||||
The return value depends on the quantile level and the number of elements in the selection. If the level is 0.5, the function returns the lower median value for an even number of elements and the middle median value for an odd number of elements. The median is calculated similarly to the [median_low](https://docs.python.org/3/library/statistics.html#statistics.median_low) implementation used in Python.
|
||||
|
||||
For all other levels, the element at the index corresponding to the value of `level * size_of_array` is returned. For example:
|
||||
|
||||
``` sql
|
||||
SELECT quantileExactLow(0.1)(number) FROM numbers(10)
|
||||
|
||||
┌─quantileExactLow(0.1)(number)─┐
|
||||
│ 1 │
|
||||
└───────────────────────────────┘
|
||||
```
|
||||
|
||||
When using multiple `quantile*` functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles) function.
|
||||
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
quantileExactLow(level)(expr)
|
||||
```
|
||||
|
||||
Alias: `medianExactLow`.
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `level` — Level of quantile. Optional parameter. Constant floating-point number from 0 to 1. We recommend using a `level` value in the range of `[0.01, 0.99]`. Default value: 0.5. At `level=0.5` the function calculates [median](https://en.wikipedia.org/wiki/Median).
|
||||
- `expr` — Expression over the column values resulting in numeric [data types](../../../sql-reference/data-types/index.md#data_types), [Date](../../../sql-reference/data-types/date.md) or [DateTime](../../../sql-reference/data-types/datetime.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Quantile of the specified level.
|
||||
|
||||
Type:
|
||||
|
||||
- [Float64](../../../sql-reference/data-types/float.md) for numeric data type input.
|
||||
- [Date](../../../sql-reference/data-types/date.md) if input values have the `Date` type.
|
||||
- [DateTime](../../../sql-reference/data-types/datetime.md) if input values have the `DateTime` type.
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT quantileExactLow(number) FROM numbers(10)
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌─quantileExactLow(number)─┐
|
||||
│ 4 │
|
||||
└──────────────────────────┘
|
||||
```
|
||||
# quantileExactHigh {#quantileexacthigh}
|
||||
|
||||
Similar to `quantileExact`, this computes the exact [quantile](https://en.wikipedia.org/wiki/Quantile) of a numeric data sequence.
|
||||
|
||||
To get the exact value, all the passed values are combined into an array, which is then fully sorted. The sorting [algorithm](https://en.cppreference.com/w/cpp/algorithm/sort) performs `O(N·log(N))` comparisons, where `N = std::distance(first, last)`.
|
||||
|
||||
The return value depends on the quantile level and the number of elements in the selection. If the level is 0.5, the function returns the higher median value for an even number of elements and the middle median value for an odd number of elements. The median is calculated similarly to the [median_high](https://docs.python.org/3/library/statistics.html#statistics.median_high) implementation used in Python. For all other levels, the element at the index corresponding to the value of `level * size_of_array` is returned.
|
||||
|
||||
This implementation behaves exactly like the current `quantileExact` implementation.
|
||||
|
||||
When using multiple `quantile*` functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles) function.
|
||||
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
quantileExactHigh(level)(expr)
|
||||
```
|
||||
|
||||
Alias: `medianExactHigh`.
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `level` — Level of quantile. Optional parameter. Constant floating-point number from 0 to 1. We recommend using a `level` value in the range of `[0.01, 0.99]`. Default value: 0.5. At `level=0.5` the function calculates [median](https://en.wikipedia.org/wiki/Median).
|
||||
- `expr` — Expression over the column values resulting in numeric [data types](../../../sql-reference/data-types/index.md#data_types), [Date](../../../sql-reference/data-types/date.md) or [DateTime](../../../sql-reference/data-types/datetime.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Quantile of the specified level.
|
||||
|
||||
Type:
|
||||
|
||||
- [Float64](../../../sql-reference/data-types/float.md) for numeric data type input.
|
||||
- [Date](../../../sql-reference/data-types/date.md) if input values have the `Date` type.
|
||||
- [DateTime](../../../sql-reference/data-types/datetime.md) if input values have the `DateTime` type.
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT quantileExactHigh(number) FROM numbers(10)
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌─quantileExactHigh(number)─┐
|
||||
│ 5 │
|
||||
└───────────────────────────┘
|
||||
```
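
Because the text compares these functions with Python's `median_low` and `median_high`, the difference at level 0.5 can be checked directly with the standard `statistics` module. The helper `quantile_exact_index` for other levels is a hypothetical sketch of the `level * size_of_array` rule described above.

```python
from statistics import median_high, median_low

data = list(range(10))  # same values as numbers(10)

print(median_low(data))   # 4, as returned by quantileExactLow(number)
print(median_high(data))  # 5, as returned by quantileExactHigh(number)

def quantile_exact_index(values, level):
    """Pick the element at index level * size in the fully sorted array."""
    ordered = sorted(values)
    return ordered[min(int(level * len(ordered)), len(ordered) - 1)]

print(quantile_exact_index(data, 0.1))  # 1
```

For an odd number of elements the two medians coincide, so the low and high variants differ only when the element count is even.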
|
||||
**See Also**
|
||||
|
||||
- [median](../../../sql-reference/aggregate-functions/reference/median.md#median)
|
||||
- [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles)
|
@ -0,0 +1,67 @@
|
||||
---
|
||||
toc_priority: 203
|
||||
---
|
||||
|
||||
# quantileExactWeighted {#quantileexactweighted}
|
||||
|
||||
Exactly computes the [quantile](https://en.wikipedia.org/wiki/Quantile) of a numeric data sequence, taking into account the weight of each element.
|
||||
|
||||
To get the exact value, all the passed values are combined into an array, which is then partially sorted. Each value is counted with its weight, as if it were present `weight` times. A hash table is used in the algorithm. Because of this, if the passed values are frequently repeated, the function consumes less RAM than [quantileExact](../../../sql-reference/aggregate-functions/reference/quantileexact.md#quantileexact). You can use this function instead of `quantileExact` and specify the weight 1.
|
||||
|
||||
When using multiple `quantile*` functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles) function.
|
||||
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
quantileExactWeighted(level)(expr, weight)
|
||||
```
|
||||
|
||||
Alias: `medianExactWeighted`.
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `level` — Level of quantile. Optional parameter. Constant floating-point number from 0 to 1. We recommend using a `level` value in the range of `[0.01, 0.99]`. Default value: 0.5. At `level=0.5` the function calculates [median](https://en.wikipedia.org/wiki/Median).
|
||||
- `expr` — Expression over the column values resulting in numeric [data types](../../../sql-reference/data-types/index.md#data_types), [Date](../../../sql-reference/data-types/date.md) or [DateTime](../../../sql-reference/data-types/datetime.md).
|
||||
- `weight` — Column with weights of sequence members. The weight is the number of occurrences of the value.
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Quantile of the specified level.
|
||||
|
||||
Type:
|
||||
|
||||
- [Float64](../../../sql-reference/data-types/float.md) for numeric data type input.
|
||||
- [Date](../../../sql-reference/data-types/date.md) if input values have the `Date` type.
|
||||
- [DateTime](../../../sql-reference/data-types/datetime.md) if input values have the `DateTime` type.
|
||||
|
||||
**Example**
|
||||
|
||||
Input table:
|
||||
|
||||
``` text
|
||||
┌─n─┬─val─┐
|
||||
│ 0 │ 3 │
|
||||
│ 1 │ 2 │
|
||||
│ 2 │ 1 │
|
||||
│ 5 │ 4 │
|
||||
└───┴─────┘
|
||||
```
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT quantileExactWeighted(n, val) FROM t
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌─quantileExactWeighted(n, val)─┐
|
||||
│ 1 │
|
||||
└───────────────────────────────┘
|
||||
```
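
The result can be traced with a small Python model that stores one counter per distinct value (the hash table mentioned above) and walks the values in order until the accumulated weight reaches a threshold. The threshold rule `ceil(total_weight * level)` is an assumption made for this sketch; it reproduces the example but is not specified on this page.

```python
import math
from collections import Counter

def quantile_exact_weighted(pairs, level=0.5):
    """pairs: iterable of (value, weight). Each value counts `weight` times."""
    weights = Counter()
    for value, weight in pairs:
        weights[value] += weight  # one entry per distinct value
    threshold = math.ceil(sum(weights.values()) * level)
    accumulated = 0
    for value in sorted(weights):
        accumulated += weights[value]
        if accumulated >= threshold:
            return value
    raise ValueError("no values were passed")

rows = [(0, 3), (1, 2), (2, 1), (5, 4)]  # (n, val) from the input table
print(quantile_exact_weighted(rows))  # 1
```

The total weight is 10, so the threshold is 5. Value 0 contributes 3 and value 1 brings the running total to 5, which is why 1 is returned.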
|
||||
|
||||
**See Also**
|
||||
|
||||
- [median](../../../sql-reference/aggregate-functions/reference/median.md#median)
|
||||
- [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles)
|
@ -0,0 +1,9 @@
|
||||
---
|
||||
toc_priority: 201
|
||||
---
|
||||
|
||||
# quantiles {#quantiles}
|
||||
|
||||
Syntax: `quantiles(level1, level2, …)(x)`
|
||||
|
||||
All the quantile functions also have corresponding quantiles functions: `quantiles`, `quantilesDeterministic`, `quantilesTiming`, `quantilesTimingWeighted`, `quantilesExact`, `quantilesExactWeighted`, `quantilesTDigest`. These functions calculate all the quantiles of the listed levels in one pass, and return an array of the resulting values.
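The benefit of the one-pass form can be illustrated with a small Python sketch: the data is sorted once and every requested level is read from the same sorted sequence. The indexing rule used here (`int(level * n)`, capped at the last element) is an assumption chosen for illustration, and `quantiles_exact` is a hypothetical helper, not a ClickHouse API.

```python
def quantiles_exact(values, levels):
    """Illustrative sketch: compute several exact quantiles from one sort."""
    ordered = sorted(values)  # sort once, reuse for every level
    if not ordered:
        raise ValueError("at least one value is required")
    last = len(ordered) - 1
    # Assumed indexing rule: position int(level * n), capped at the last element.
    return [ordered[min(int(level * len(ordered)), last)] for level in levels]

print(quantiles_exact(range(10), [0.25, 0.5, 0.75]))  # [2, 5, 7]
```

Calling a separate single-level function for each level would repeat the sort (or the state construction) once per level, which is the inefficiency the `quantiles*` functions avoid.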
|
@ -0,0 +1,57 @@
|
||||
---
|
||||
toc_priority: 207
|
||||
---
|
||||
|
||||
# quantileTDigest {#quantiletdigest}
|
||||
|
||||
Computes an approximate [quantile](https://en.wikipedia.org/wiki/Quantile) of a numeric data sequence using the [t-digest](https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf) algorithm.
|
||||
|
||||
The maximum error is 1%. Memory consumption is `log(n)`, where `n` is the number of values. The result depends on the order in which the query is run, and is nondeterministic.
|
||||
|
||||
The performance of the function is lower than performance of [quantile](../../../sql-reference/aggregate-functions/reference/quantile.md#quantile) or [quantileTiming](../../../sql-reference/aggregate-functions/reference/quantiletiming.md#quantiletiming). In terms of the ratio of State size to precision, this function is much better than `quantile`.
|
||||
|
||||
When using multiple `quantile*` functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles) function.
|
||||
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
quantileTDigest(level)(expr)
|
||||
```
|
||||
|
||||
Alias: `medianTDigest`.
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `level` — Level of quantile. Optional parameter. Constant floating-point number from 0 to 1. We recommend using a `level` value in the range of `[0.01, 0.99]`. Default value: 0.5. At `level=0.5` the function calculates [median](https://en.wikipedia.org/wiki/Median).
|
||||
- `expr` — Expression over the column values resulting in numeric [data types](../../../sql-reference/data-types/index.md#data_types), [Date](../../../sql-reference/data-types/date.md) or [DateTime](../../../sql-reference/data-types/datetime.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Approximate quantile of the specified level.
|
||||
|
||||
Type:
|
||||
|
||||
- [Float64](../../../sql-reference/data-types/float.md) for numeric data type input.
|
||||
- [Date](../../../sql-reference/data-types/date.md) if input values have the `Date` type.
|
||||
- [DateTime](../../../sql-reference/data-types/datetime.md) if input values have the `DateTime` type.
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT quantileTDigest(number) FROM numbers(10)
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌─quantileTDigest(number)─┐
|
||||
│ 4.5 │
|
||||
└─────────────────────────┘
|
||||
```
|
||||
|
||||
**See Also**
|
||||
|
||||
- [median](../../../sql-reference/aggregate-functions/reference/median.md#median)
|
||||
- [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles)
|
@ -0,0 +1,58 @@
|
||||
---
|
||||
toc_priority: 208
|
||||
---
|
||||
|
||||
# quantileTDigestWeighted {#quantiletdigestweighted}
|
||||
|
||||
Computes an approximate [quantile](https://en.wikipedia.org/wiki/Quantile) of a numeric data sequence using the [t-digest](https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf) algorithm. The function takes into account the weight of each sequence member. The maximum error is 1%. Memory consumption is `log(n)`, where `n` is a number of values.
|
||||
|
||||
The performance of the function is lower than performance of [quantile](../../../sql-reference/aggregate-functions/reference/quantile.md#quantile) or [quantileTiming](../../../sql-reference/aggregate-functions/reference/quantiletiming.md#quantiletiming). In terms of the ratio of State size to precision, this function is much better than `quantile`.
|
||||
|
||||
The result depends on the order of running the query, and is nondeterministic.
|
||||
|
||||
When using multiple `quantile*` functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles) function.
|
||||
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
quantileTDigestWeighted(level)(expr, weight)
|
||||
```
|
||||
|
||||
Alias: `medianTDigestWeighted`.
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `level` — Level of quantile. Optional parameter. Constant floating-point number from 0 to 1. We recommend using a `level` value in the range of `[0.01, 0.99]`. Default value: 0.5. At `level=0.5` the function calculates [median](https://en.wikipedia.org/wiki/Median).
|
||||
- `expr` — Expression over the column values resulting in numeric [data types](../../../sql-reference/data-types/index.md#data_types), [Date](../../../sql-reference/data-types/date.md) or [DateTime](../../../sql-reference/data-types/datetime.md).
|
||||
- `weight` — Column with weights of sequence elements. Weight is a number of value occurrences.
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Approximate quantile of the specified level.
|
||||
|
||||
Type:
|
||||
|
||||
- [Float64](../../../sql-reference/data-types/float.md) for numeric data type input.
|
||||
- [Date](../../../sql-reference/data-types/date.md) if input values have the `Date` type.
|
||||
- [DateTime](../../../sql-reference/data-types/datetime.md) if input values have the `DateTime` type.
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT quantileTDigestWeighted(number, 1) FROM numbers(10)
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌─quantileTDigestWeighted(number, 1)─┐
|
||||
│ 4.5 │
|
||||
└────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**See Also**
|
||||
|
||||
- [median](../../../sql-reference/aggregate-functions/reference/median.md#median)
|
||||
- [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles)
|
@ -0,0 +1,86 @@
|
||||
---
|
||||
toc_priority: 204
|
||||
---
|
||||
|
||||
# quantileTiming {#quantiletiming}
|
||||
|
||||
Computes the [quantile](https://en.wikipedia.org/wiki/Quantile) of a numeric data sequence with a fixed precision.
|
||||
|
||||
The result is deterministic (it doesn’t depend on the query processing order). The function is optimized for sequences that describe distributions such as web page loading times or backend response times.
|
||||
|
||||
When using multiple `quantile*` functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles) function.
|
||||
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
quantileTiming(level)(expr)
|
||||
```
|
||||
|
||||
Alias: `medianTiming`.
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `level` — Level of quantile. Optional parameter. Constant floating-point number from 0 to 1. We recommend using a `level` value in the range of `[0.01, 0.99]`. Default value: 0.5. At `level=0.5` the function calculates [median](https://en.wikipedia.org/wiki/Median).
|
||||
|
||||
- `expr` — [Expression](../../../sql-reference/syntax.md#syntax-expressions) over the column values returning a [Float\*](../../../sql-reference/data-types/float.md)-type number.
|
||||
|
||||
- If negative values are passed to the function, the behavior is undefined.
|
||||
- If the value is greater than 30,000 (a page loading time of more than 30 seconds), it is assumed to be 30,000.
|
||||
|
||||
**Accuracy**
|
||||
|
||||
The calculation is accurate if:
|
||||
|
||||
- Total number of values doesn’t exceed 5670.
|
||||
- Total number of values exceeds 5670, but the page loading time is less than 1024ms.
|
||||
|
||||
Otherwise, the result of the calculation is rounded to the nearest multiple of 16 ms.
|
||||
|
||||
!!! note "Note"
|
||||
For calculating page loading time quantiles, this function is more effective and accurate than [quantile](../../../sql-reference/aggregate-functions/reference/quantile.md#quantile).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Quantile of the specified level.
|
||||
|
||||
Type: `Float32`.
|
||||
|
||||
!!! note "Note"
|
||||
If no values are passed to the function (when using `quantileTimingIf`), [NaN](../../../sql-reference/data-types/float.md#data_type-float-nan-inf) is returned. The purpose of this is to differentiate these cases from cases that result in zero. See [ORDER BY clause](../../../sql-reference/statements/select/order-by.md#select-order-by) for notes on sorting `NaN` values.
|
||||
|
||||
**Example**
|
||||
|
||||
Input table:
|
||||
|
||||
``` text
|
||||
┌─response_time─┐
|
||||
│ 72 │
|
||||
│ 112 │
|
||||
│ 126 │
|
||||
│ 145 │
|
||||
│ 104 │
|
||||
│ 242 │
|
||||
│ 313 │
|
||||
│ 168 │
|
||||
│ 108 │
|
||||
└───────────────┘
|
||||
```
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT quantileTiming(response_time) FROM t
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌─quantileTiming(response_time)─┐
|
||||
│ 126 │
|
||||
└───────────────────────────────┘
|
||||
```
|
||||
|
||||
**See Also**
|
||||
|
||||
- [median](../../../sql-reference/aggregate-functions/reference/median.md#median)
|
||||
- [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles)
|
@ -0,0 +1,85 @@
|
||||
---
|
||||
toc_priority: 205
|
||||
---
|
||||
|
||||
# quantileTimingWeighted {#quantiletimingweighted}
|
||||
|
||||
Computes the [quantile](https://en.wikipedia.org/wiki/Quantile) of a numeric data sequence with a fixed precision, taking into account the weight of each sequence member.
|
||||
|
||||
The result is deterministic (it doesn’t depend on the query processing order). The function is optimized for sequences that describe distributions such as web page loading times or backend response times.
|
||||
|
||||
When using multiple `quantile*` functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles) function.
|
||||
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
quantileTimingWeighted(level)(expr, weight)
|
||||
```
|
||||
|
||||
Alias: `medianTimingWeighted`.
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `level` — Level of quantile. Optional parameter. Constant floating-point number from 0 to 1. We recommend using a `level` value in the range of `[0.01, 0.99]`. Default value: 0.5. At `level=0.5` the function calculates [median](https://en.wikipedia.org/wiki/Median).
|
||||
|
||||
- `expr` — [Expression](../../../sql-reference/syntax.md#syntax-expressions) over the column values returning a [Float\*](../../../sql-reference/data-types/float.md)-type number.
|
||||
|
||||
- If negative values are passed to the function, the behavior is undefined.
|
||||
- If the value is greater than 30,000 (a page loading time of more than 30 seconds), it is assumed to be 30,000.
|
||||
|
||||
- `weight` — Column with weights of sequence elements. Weight is a number of value occurrences.
|
||||
|
||||
**Accuracy**
|
||||
|
||||
The calculation is accurate if:
|
||||
|
||||
- Total number of values doesn’t exceed 5670.
|
||||
- Total number of values exceeds 5670, but the page loading time is less than 1024ms.
|
||||
|
||||
Otherwise, the result of the calculation is rounded to the nearest multiple of 16 ms.
|
||||
|
||||
!!! note "Note"
|
||||
For calculating page loading time quantiles, this function is more effective and accurate than [quantile](../../../sql-reference/aggregate-functions/reference/quantile.md#quantile).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Quantile of the specified level.
|
||||
|
||||
Type: `Float32`.
|
||||
|
||||
!!! note "Note"
|
||||
If no values are passed to the function (when using `quantileTimingIf`), [NaN](../../../sql-reference/data-types/float.md#data_type-float-nan-inf) is returned. The purpose of this is to differentiate these cases from cases that result in zero. See [ORDER BY clause](../../../sql-reference/statements/select/order-by.md#select-order-by) for notes on sorting `NaN` values.
|
||||
|
||||
**Example**
|
||||
|
||||
Input table:
|
||||
|
||||
``` text
|
||||
┌─response_time─┬─weight─┐
|
||||
│ 68 │ 1 │
|
||||
│ 104 │ 2 │
|
||||
│ 112 │ 3 │
|
||||
│ 126 │ 2 │
|
||||
│ 138 │ 1 │
|
||||
│ 162 │ 1 │
|
||||
└───────────────┴────────┘
|
||||
```
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT quantileTimingWeighted(response_time, weight) FROM t
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌─quantileTimingWeighted(response_time, weight)─┐
|
||||
│ 112 │
|
||||
└───────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**See Also**
|
||||
|
||||
- [median](../../../sql-reference/aggregate-functions/reference/median.md#median)
|
||||
- [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles)
|
@ -0,0 +1,53 @@
|
||||
## rankCorr {#agg_function-rankcorr}
|
||||
|
||||
Computes a rank correlation coefficient.
|
||||
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
rankCorr(x, y)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `x` — Arbitrary value. [Float32](../../../sql-reference/data-types/float.md#float32-float64) or [Float64](../../../sql-reference/data-types/float.md#float32-float64).
|
||||
- `y` — Arbitrary value. [Float32](../../../sql-reference/data-types/float.md#float32-float64) or [Float64](../../../sql-reference/data-types/float.md#float32-float64).
|
||||
|
||||
**Returned value(s)**
|
||||
|
||||
- Returns the rank correlation coefficient of the ranks of `x` and `y`. The coefficient ranges from -1 to +1. If fewer than two arguments are passed, the function throws an exception. A value close to +1 indicates a strong positive relationship: as one random variable increases, the other also increases. A value close to -1 indicates a strong negative relationship: as one random variable increases, the other decreases. A value close or equal to 0 indicates no relationship between the two random variables.
|
||||
|
||||
Type: [Float64](../../../sql-reference/data-types/float.md#float32-float64).
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT rankCorr(number, number) FROM numbers(100);
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌─rankCorr(number, number)─┐
|
||||
│ 1 │
|
||||
└──────────────────────────┘
|
||||
```
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT roundBankers(rankCorr(exp(number), sin(number)), 3) FROM numbers(100);
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌─roundBankers(rankCorr(exp(number), sin(number)), 3)─┐
|
||||
│ -0.037 │
|
||||
└─────────────────────────────────────────────────────┘
|
||||
```
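As a rough illustration of what a rank correlation measures, the sketch below follows the textbook Spearman definition: replace each value by its rank (ties share the average rank) and compute the Pearson correlation of the ranks. The helper names `average_ranks` and `rank_corr` are hypothetical, and this is not claimed to be how ClickHouse implements `rankCorr`.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def rank_corr(xs, ys):
    """Illustrative Spearman coefficient: Pearson correlation of the ranks."""
    xs, ys = list(xs), list(ys)
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("two sequences of equal length (>= 2) are required")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a constant input has no defined correlation
    return cov / math.sqrt(var_x * var_y)

# A monotonic but non-linear relationship still gives a coefficient of about 1.
print(rank_corr([1, 2, 3, 4], [1, 8, 27, 64]))
```

Because only the ranks enter the formula, any strictly increasing transformation of either input leaves the result unchanged.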
|
||||
**See Also**
|
||||
|
||||
- [Spearman's rank correlation coefficient](https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient)
|
@ -0,0 +1,42 @@
|
||||
---
|
||||
toc_priority: 220
|
||||
---
|
||||
|
||||
# simpleLinearRegression {#simplelinearregression}
|
||||
|
||||
Performs simple (unidimensional) linear regression.
|
||||
|
||||
``` sql
|
||||
simpleLinearRegression(x, y)
|
||||
```
|
||||
|
||||
Parameters:
|
||||
|
||||
- `x` — Column with explanatory variable values.
- `y` — Column with dependent variable values.
|
||||
|
||||
Returned values:
|
||||
|
||||
Constants `(a, b)` of the resulting line `y = a*x + b`.
|
||||
|
||||
**Examples**
|
||||
|
||||
``` sql
|
||||
SELECT arrayReduce('simpleLinearRegression', [0, 1, 2, 3], [0, 1, 2, 3])
|
||||
```
|
||||
|
||||
``` text
|
||||
┌─arrayReduce('simpleLinearRegression', [0, 1, 2, 3], [0, 1, 2, 3])─┐
|
||||
│ (1,0) │
|
||||
└───────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
``` sql
|
||||
SELECT arrayReduce('simpleLinearRegression', [0, 1, 2, 3], [3, 4, 5, 6])
|
||||
```
|
||||
|
||||
``` text
|
||||
┌─arrayReduce('simpleLinearRegression', [0, 1, 2, 3], [3, 4, 5, 6])─┐
|
||||
│ (1,3) │
|
||||
└───────────────────────────────────────────────────────────────────┘
|
||||
```
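The constants `(a, b)` can be reproduced with the ordinary least-squares formulas for a single explanatory variable. The Python sketch below is only an illustration of that formula (the function name is hypothetical and it is not ClickHouse's implementation); it returns the same pairs as the two examples above.

```python
def simple_linear_regression(xs, ys):
    """Illustrative least-squares fit of y = a*x + b."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("two sequences of equal length (>= 2) are required")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # intercept
    return a, b

print(simple_linear_regression([0, 1, 2, 3], [0, 1, 2, 3]))  # (1.0, 0.0)
print(simple_linear_regression([0, 1, 2, 3], [3, 4, 5, 6]))  # (1.0, 3.0)
```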
|
@ -0,0 +1,25 @@
|
||||
---
|
||||
toc_priority: 150
|
||||
---
|
||||
|
||||
# skewPop {#skewpop}
|
||||
|
||||
Computes the [skewness](https://en.wikipedia.org/wiki/Skewness) of a sequence.
|
||||
|
||||
``` sql
|
||||
skewPop(expr)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
`expr` — [Expression](../../../sql-reference/syntax.md#syntax-expressions) returning a number.
|
||||
|
||||
**Returned value**
|
||||
|
||||
The skewness of the given distribution. Type — [Float64](../../../sql-reference/data-types/float.md).
|
||||
|
||||
**Example**
|
||||
|
||||
``` sql
|
||||
SELECT skewPop(value) FROM series_with_value_column
|
||||
```
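For reference, the population skewness from the linked definition is the third central moment divided by the 1.5th power of the second central moment. The Python sketch below illustrates that formula only; the name `skew_pop` mirrors the SQL function for readability and is not a ClickHouse API.

```python
def skew_pop(values):
    """Illustrative population skewness: m3 / m2 ** 1.5."""
    values = list(values)
    n = len(values)
    if n == 0:
        raise ValueError("at least one value is required")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # population variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return float("nan")  # constant data has no defined skewness
    return m3 / m2 ** 1.5

print(skew_pop([1, 2, 3, 4, 5]))  # 0.0 for symmetric data
print(skew_pop([1, 2, 3, 10]))    # positive: long tail to the right
```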
|
@ -0,0 +1,27 @@
|
||||
---
|
||||
toc_priority: 151
|
||||
---
|
||||
|
||||
# skewSamp {#skewsamp}
|
||||
|
||||
Computes the [sample skewness](https://en.wikipedia.org/wiki/Skewness) of a sequence.
|
||||
|
||||
It represents an unbiased estimate of the skewness of a random variable if passed values form its sample.
|
||||
|
||||
``` sql
|
||||
skewSamp(expr)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
`expr` — [Expression](../../../sql-reference/syntax.md#syntax-expressions) returning a number.
|
||||
|
||||
**Returned value**
|
||||
|
||||
The skewness of the given distribution. Type — [Float64](../../../sql-reference/data-types/float.md). If `n <= 1` (`n` is the size of the sample), then the function returns `nan`.
|
||||
|
||||
**Example**
|
||||
|
||||
``` sql
|
||||
SELECT skewSamp(value) FROM series_with_value_column
|
||||
```
|
@ -0,0 +1,10 @@
|
||||
---
|
||||
toc_priority: 30
|
||||
---
|
||||
|
||||
# stddevPop {#stddevpop}
|
||||
|
||||
The result is equal to the square root of [varPop](../../../sql-reference/aggregate-functions/reference/varpop.md).
|
||||
|
||||
!!! note "Note"
|
||||
This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the `stddevPopStable` function. It works slower but provides a lower computational error.
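To make the stability trade-off concrete, the Python sketch below contrasts two generic ways of computing a population standard deviation. The first uses running sums, a textbook example of a formula that can lose precision when values are large and close together; the second is Welford's update, a common stable alternative. Both are illustrations under that assumption; neither is claimed to be the algorithm ClickHouse uses for `stddevPop` or `stddevPopStable`.

```python
import math

def stddev_pop_naive(values):
    """Sum-based formula: subtracts two large, nearly equal numbers."""
    values = list(values)
    n = len(values)
    total = sum(values)
    total_sq = sum(v * v for v in values)
    variance = total_sq / n - (total / n) ** 2
    return math.sqrt(max(variance, 0.0))  # clamp tiny negative rounding errors

def stddev_pop_welford(values):
    """Welford's single-pass update: tracks the mean and squared deviations."""
    count, mean, m2 = 0, 0.0, 0.0
    for v in values:
        count += 1
        delta = v - mean
        mean += delta / count
        m2 += delta * (v - mean)
    if count == 0:
        raise ValueError("at least one value is required")
    return math.sqrt(m2 / count)

small = [2, 4, 4, 4, 5, 5, 7, 9]
print(stddev_pop_naive(small), stddev_pop_welford(small))

# With a large common offset the two results may differ noticeably.
shifted = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(stddev_pop_naive(shifted), stddev_pop_welford(shifted))
```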
|
@ -0,0 +1,10 @@
|
||||
---
|
||||
toc_priority: 31
|
||||
---
|
||||
|
||||
# stddevSamp {#stddevsamp}
|
||||
|
||||
The result is equal to the square root of [varSamp](../../../sql-reference/aggregate-functions/reference/varsamp.md).
|
||||
|
||||
!!! note "Note"
|
||||
This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the `stddevSampStable` function. It works slower but provides a lower computational error.
|
@ -0,0 +1,75 @@
|
||||
---
|
||||
toc_priority: 221
|
||||
---
|
||||
|
||||
# stochasticLinearRegression {#agg_functions-stochasticlinearregression}
|
||||
|
||||
This function implements stochastic linear regression. It supports custom parameters for learning rate, L2 regularization coefficient, mini-batch size and has few methods for updating weights ([Adam](https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Adam) (used by default), [simple SGD](https://en.wikipedia.org/wiki/Stochastic_gradient_descent), [Momentum](https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Momentum), [Nesterov](https://mipt.ru/upload/medialibrary/d7e/41-91.pdf)).
|
||||
|
||||
### Parameters {#agg_functions-stochasticlinearregression-parameters}
|
||||
|
||||
There are 4 customizable parameters. They are passed to the function sequentially, but there is no need to pass all four: default values are used for any that are omitted. However, a good model usually requires some parameter tuning.
|
||||
|
||||
``` text
|
||||
stochasticLinearRegression(1.0, 1.0, 10, 'SGD')
|
||||
```
|
||||
|
||||
1. `learning rate` is the coefficient on the step length when a gradient descent step is performed. A learning rate that is too large may cause the model weights to become infinite. Default is `0.00001`.
2. `l2 regularization coefficient`, which may help to prevent overfitting. Default is `0.1`.
3. `mini-batch size` sets the number of elements whose gradients are computed and summed to perform one step of gradient descent. Pure stochastic descent uses one element; however, small batches (about 10 elements) make gradient steps more stable. Default is `15`.
4. `method for updating weights`: `Adam` (the default), `SGD`, `Momentum`, or `Nesterov`. `Momentum` and `Nesterov` require slightly more computation and memory, but they can improve the speed of convergence and the stability of stochastic gradient methods.
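The roles of the first three parameters can be illustrated with a generic mini-batch gradient descent loop in Python. This is a minimal sketch of the plain `SGD` idea only, with assumptions of its own: the gradient is averaged over each batch, the L2 penalty is applied to the weights but not to the bias, and the function name and the `epochs` argument are hypothetical. It does not reproduce ClickHouse's implementation or its other update methods.

```python
def sgd_linear_regression(rows, targets, learning_rate=0.1, l2=0.0,
                          batch_size=5, epochs=2000):
    """Illustrative mini-batch SGD for y = w . x + b (squared-error loss)."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for start in range(0, len(rows), batch_size):
            batch = list(zip(rows[start:start + batch_size],
                             targets[start:start + batch_size]))
            grad_w = [0.0] * n_features
            grad_b = 0.0
            for features, target in batch:
                prediction = bias + sum(w * x for w, x in zip(weights, features))
                error = prediction - target
                for j, x in enumerate(features):
                    grad_w[j] += 2 * error * x
                grad_b += 2 * error
            size = len(batch)
            # One descent step per mini-batch, with an L2 penalty on the weights.
            for j in range(n_features):
                weights[j] -= learning_rate * (grad_w[j] / size + 2 * l2 * weights[j])
            bias -= learning_rate * grad_b / size
    return weights, bias

# Noise-free data generated from y = 2*x + 1.
rows = [(x / 10,) for x in range(10)]
targets = [2 * r[0] + 1 for r in rows]
weights, bias = sgd_linear_regression(rows, targets)
print(weights, bias)  # close to [2.0] and 1.0
```

A larger `learning_rate` shortens training but can make the weights diverge, which is the failure mode the first parameter description warns about.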
|
||||
|
||||
### Usage {#agg_functions-stochasticlinearregression-usage}
|
||||
|
||||
`stochasticLinearRegression` is used in two steps: fitting the model and predicting on new data. To fit the model and save its state for later use, we use the `-State` combinator, which saves the state (model weights, etc.).
To predict, we use the [evalMLMethod](../../../sql-reference/functions/machine-learning-functions.md#machine_learning_methods-evalmlmethod) function, which takes a state as an argument as well as the features to predict on.
|
||||
|
||||
<a name="stochasticlinearregression-usage-fitting"></a>
|
||||
|
||||
**1.** Fitting
|
||||
|
||||
A query like the following may be used.
|
||||
|
||||
``` sql
|
||||
CREATE TABLE IF NOT EXISTS train_data
|
||||
(
|
||||
param1 Float64,
|
||||
param2 Float64,
|
||||
target Float64
|
||||
) ENGINE = Memory;
|
||||
|
||||
CREATE TABLE your_model ENGINE = Memory AS SELECT
|
||||
stochasticLinearRegressionState(0.1, 0.0, 5, 'SGD')(target, param1, param2)
|
||||
AS state FROM train_data;
|
||||
```
|
||||
|
||||
Here we also need to insert data into the `train_data` table. The number of parameters is not fixed; it depends only on the number of arguments passed to `stochasticLinearRegressionState`. They must all be numeric values.
Note that the column with the target value (which we would like to learn to predict) is passed as the first argument.
|
||||
|
||||
**2.** Predicting
|
||||
|
||||
After saving a state into the table, we may use it multiple times for prediction, or even merge with other states and create new even better models.
|
||||
|
||||
``` sql
|
||||
WITH (SELECT state FROM your_model) AS model SELECT
|
||||
evalMLMethod(model, param1, param2) FROM test_data
|
||||
```
|
||||
|
||||
The query returns a column of predicted values. Note that the first argument of `evalMLMethod` is an `AggregateFunctionState` object; the remaining arguments are columns of features.
|
||||
|
||||
`test_data` is a table like `train_data`, but it does not need to contain the target value.
|
||||
|
||||
### Notes {#agg_functions-stochasticlinearregression-notes}
|
||||
|
||||
1. To merge two models, the user may create a query such as `SELECT state1 + state2 FROM your_models`, where the `your_models` table contains both models. This query returns a new `AggregateFunctionState` object.
|
||||
|
||||
2. The user may fetch the weights of the created model for their own purposes without saving the model by omitting the `-State` combinator, for example `SELECT stochasticLinearRegression(0.01)(target, param1, param2) FROM train_data`. Such a query fits the model and returns its weights: first the weights that correspond to the parameters of the model, and last the bias. So in the example above the query returns a column with 3 values.
|
||||
|
||||
**See Also**
|
||||
|
||||
- [stochasticLogisticRegression](../../../sql-reference/aggregate-functions/reference/stochasticlogisticregression.md#agg_functions-stochasticlogisticregression)
|
||||
- [Difference between linear and logistic regressions](https://stackoverflow.com/questions/12146914/what-is-the-difference-between-linear-regression-and-logistic-regression)
|
@ -0,0 +1,55 @@
|
||||
---
|
||||
toc_priority: 222
|
||||
---
|
||||
|
||||
# stochasticLogisticRegression {#agg_functions-stochasticlogisticregression}
|
||||
|
||||
This function implements stochastic logistic regression. It can be used for binary classification problems, supports the same custom parameters as `stochasticLinearRegression`, and works the same way.
|
||||
|
||||
### Parameters {#agg_functions-stochasticlogisticregression-parameters}
|
||||
|
||||
Parameters are exactly the same as in stochasticLinearRegression:
|
||||
`learning rate`, `l2 regularization coefficient`, `mini-batch size`, `method for updating weights`.
|
||||
For more information see [parameters](#agg_functions-stochasticlinearregression-parameters).
|
||||
|
||||
``` text
|
||||
stochasticLogisticRegression(1.0, 1.0, 10, 'SGD')
|
||||
```
|
||||
|
||||
**1.** Fitting
|
||||
|
||||
<!-- -->
|
||||
|
||||
See the `Fitting` section in the [stochasticLinearRegression](#stochasticlinearregression-usage-fitting) description.
|
||||
|
||||
The labels to be predicted must be in the range \[-1, 1\].
|
||||
|
||||
**2.** Predicting
|
||||
|
||||
<!-- -->
|
||||
|
||||
Using the saved state, we can predict the probability that an object has label `1`.
|
||||
|
||||
``` sql
|
||||
WITH (SELECT state FROM your_model) AS model SELECT
|
||||
evalMLMethod(model, param1, param2) FROM test_data
|
||||
```
|
||||
|
||||
The query returns a column of probabilities. Note that the first argument of `evalMLMethod` is an `AggregateFunctionState` object; the remaining arguments are columns of features.
|
||||
|
||||
We can also set a probability threshold that assigns elements to different labels.
|
||||
|
||||
``` sql
|
||||
SELECT ans < 1.1 AND ans > 0.5 FROM
|
||||
(WITH (SELECT state FROM your_model) AS model SELECT
|
||||
evalMLMethod(model, param1, param2) AS ans FROM test_data)
|
||||
```
|
||||
|
||||
Then the result will be labels.
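The thresholding step can be sketched in Python as follows. The sketch assumes the usual logistic model, in which the probability is the sigmoid of a weighted sum of the features; the function names, the weights, and the `0`/`1` output (mirroring the boolean result of the query above) are hypothetical and only illustrate the idea.

```python
import math

def predict_probability(weights, bias, features):
    """Illustrative logistic model: sigmoid of the linear score."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))

def assign_label(probability, threshold=0.5):
    """Return 1 when the probability exceeds the threshold, otherwise 0."""
    return 1 if probability > threshold else 0

# Hypothetical weights for two features.
p = predict_probability([1.5, -0.5], 0.1, [2.0, 1.0])
print(p, assign_label(p))
```

Moving the threshold away from `0.5` trades false positives against false negatives without refitting the model.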
|
||||
|
||||
`test_data` is a table like `train_data`, but it does not need to contain the target value.
|
||||
|
||||
**See Also**
|
||||
|
||||
- [stochasticLinearRegression](../../../sql-reference/aggregate-functions/reference/stochasticlinearregression.md#agg_functions-stochasticlinearregression)
|
||||
- [Difference between linear and logistic regressions.](https://stackoverflow.com/questions/12146914/what-is-the-difference-between-linear-regression-and-logistic-regression)
|
@ -0,0 +1,8 @@
|
||||
---
|
||||
toc_priority: 4
|
||||
---
|
||||
|
||||
# sum {#agg_function-sum}
|
||||
|
||||
Calculates the sum.

Only works for numbers.
|
@ -0,0 +1,48 @@
|
||||
---
|
||||
toc_priority: 141
|
||||
---
|
||||
|
||||
# sumMap {#agg_functions-summap}
|
||||
|
||||
Syntax: `sumMap(key, value)` or `sumMap(Tuple(key, value))`
|
||||
|
||||
Totals the `value` array according to the keys specified in the `key` array.
|
||||
|
||||
Passing tuple of keys and values arrays is a synonym to passing two arrays of keys and values.
|
||||
|
||||
The number of elements in `key` and `value` must be the same for each row that is totaled.
|
||||
|
||||
Returns a tuple of two arrays: keys in sorted order, and values summed for the corresponding keys.
|
||||
|
||||
Example:
|
||||
|
||||
``` sql
|
||||
CREATE TABLE sum_map(
|
||||
date Date,
|
||||
timeslot DateTime,
|
||||
statusMap Nested(
|
||||
status UInt16,
|
||||
requests UInt64
|
||||
),
|
||||
statusMapTuple Tuple(Array(Int32), Array(Int32))
|
||||
) ENGINE = Log;
|
||||
INSERT INTO sum_map VALUES
|
||||
('2000-01-01', '2000-01-01 00:00:00', [1, 2, 3], [10, 10, 10], ([1, 2, 3], [10, 10, 10])),
|
||||
('2000-01-01', '2000-01-01 00:00:00', [3, 4, 5], [10, 10, 10], ([3, 4, 5], [10, 10, 10])),
|
||||
('2000-01-01', '2000-01-01 00:01:00', [4, 5, 6], [10, 10, 10], ([4, 5, 6], [10, 10, 10])),
|
||||
('2000-01-01', '2000-01-01 00:01:00', [6, 7, 8], [10, 10, 10], ([6, 7, 8], [10, 10, 10]));
|
||||
|
||||
SELECT
|
||||
timeslot,
|
||||
sumMap(statusMap.status, statusMap.requests),
|
||||
sumMap(statusMapTuple)
|
||||
FROM sum_map
|
||||
GROUP BY timeslot
|
||||
```
|
||||
|
||||
``` text
|
||||
┌────────────timeslot─┬─sumMap(statusMap.status, statusMap.requests)─┬─sumMap(statusMapTuple)─────────┐
|
||||
│ 2000-01-01 00:00:00 │ ([1,2,3,4,5],[10,10,20,10,10]) │ ([1,2,3,4,5],[10,10,20,10,10]) │
|
||||
│ 2000-01-01 00:01:00 │ ([4,5,6,7,8],[10,10,20,10,10]) │ ([4,5,6,7,8],[10,10,20,10,10]) │
|
||||
└─────────────────────┴──────────────────────────────────────────────┴────────────────────────────────┘
|
||||
```
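The aggregation performed for each group can be mirrored in a few lines of Python. The sketch is illustrative only (the `sum_map` helper is hypothetical); it reproduces the first row of the result above from the two input rows with timeslot `2000-01-01 00:00:00`.

```python
from collections import defaultdict

def sum_map(rows):
    """Illustrative sumMap: rows is a list of (keys, values) array pairs."""
    totals = defaultdict(int)
    for keys, values in rows:
        if len(keys) != len(values):
            raise ValueError("keys and values must have the same length in each row")
        for key, value in zip(keys, values):
            totals[key] += value
    sorted_keys = sorted(totals)
    return sorted_keys, [totals[key] for key in sorted_keys]

group = [([1, 2, 3], [10, 10, 10]), ([3, 4, 5], [10, 10, 10])]
print(sum_map(group))  # ([1, 2, 3, 4, 5], [10, 10, 20, 10, 10])
```

Key `3` appears in both rows, so its values are added, while every other key keeps its single value.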
|
@ -0,0 +1,9 @@
|
||||
---
|
||||
toc_priority: 140
|
||||
---
|
||||
|
||||
# sumWithOverflow {#sumwithoverflowx}
|
||||
|
||||
Computes the sum of the numbers, using the same data type for the result as for the input parameters. If the sum exceeds the maximum value for this data type, it is calculated with overflow.
|
||||
|
||||
Only works for numbers.
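What "calculated with overflow" means can be shown with a small Python sketch that emulates wraparound for an unsigned integer column. The helper name and the choice of an 8-bit unsigned type are assumptions made for illustration; signed and wider types wrap analogously at their own limits.

```python
def sum_with_overflow_unsigned(values, bits=8):
    """Illustrative wraparound sum for an unsigned integer of the given width."""
    modulus = 1 << bits  # 256 for an 8-bit unsigned type
    total = 0
    for value in values:
        total = (total + value) % modulus  # keep the result inside the type's range
    return total

print(sum_with_overflow_unsigned([200, 100]))  # 44, because 300 - 256 = 44
```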
|
42
docs/zh/sql-reference/aggregate-functions/reference/topk.md
Normal file
@ -0,0 +1,42 @@
|
||||
---
|
||||
toc_priority: 108
|
||||
---
|
||||
|
||||
# topK {#topk}
|
||||
|
||||
Returns an array of the approximately most frequent values in the specified column. The resulting array is sorted in descending order of approximate frequency of values (not by the values themselves).
|
||||
|
||||
Implements the [Filtered Space-Saving](http://www.l2f.inesc-id.pt/~fmmb/wiki/uploads/Work/misnis.ref0a.pdf) algorithm for analyzing TopK, based on the reduce-and-combine algorithm from [Parallel Space Saving](https://arxiv.org/pdf/1401.0702.pdf).
|
||||
|
||||
``` sql
|
||||
topK(N)(column)
|
||||
```
|
||||
|
||||
This function doesn’t provide a guaranteed result. In certain situations, errors might occur and it might return frequent values that aren’t the most frequent values.
|
||||
|
||||
We recommend using `N < 10`; performance degrades with large `N` values. The maximum value is `N = 65536`.
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `N` – The number of elements to return.

If the parameter is omitted, the default value 10 is used.
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `x` – The value whose frequency is calculated.
|
||||
|
||||
**Example**
|
||||
|
||||
Take the [OnTime](../../../getting-started/example-datasets/ontime.md) data set and select the three most frequently occurring values in the `AirlineID` column.
|
||||
|
||||
``` sql
|
||||
SELECT topK(3)(AirlineID) AS res
|
||||
FROM ontime
|
||||
```
|
||||
|
||||
``` text
|
||||
┌─res─────────────────┐
|
||||
│ [19393,19790,19805] │
|
||||
└─────────────────────┘
|
||||
```
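The counter-replacement idea behind this family of algorithms can be sketched in Python. What follows is the basic Space-Saving scheme, a deliberate simplification: it omits the filtering and the parallel merge step of the algorithms cited above, and the function name and the `capacity` argument are hypothetical. It is meant to show why results are approximate, not to mirror ClickHouse's implementation.

```python
def space_saving_top_k(stream, k, capacity):
    """Illustrative Space-Saving: track at most `capacity` counters."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < capacity:
            counters[item] = 1
        else:
            # Evict the least frequent tracked item; the newcomer inherits
            # its count plus one, which may overestimate the true frequency.
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    ranked = sorted(counters, key=counters.get, reverse=True)
    return ranked[:k]

stream = [1] * 5 + [2] * 3 + [3] * 2
print(space_saving_top_k(stream, k=2, capacity=3))  # [1, 2]
```

When the stream contains more distinct values than counters, an infrequent late arrival can inherit a large count from the item it evicts, which is how values that are frequent but not the most frequent can end up in the result.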
|
@ -0,0 +1,42 @@
|
||||
---
|
||||
toc_priority: 109
|
||||
---
|
||||
|
||||
# topKWeighted {#topkweighted}
|
||||
|
||||
Similar to `topK`, but takes one additional argument of integer type, `weight`. Each value is counted `weight` times when frequencies are calculated.
|
||||
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
topKWeighted(N)(x, weight)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `N` — The number of elements to return.
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `x` – The value.
|
||||
- `weight` — The weight. [UInt8](../../../sql-reference/data-types/int-uint.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
Returns an array of the values with maximum approximate sum of weights.
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT topKWeighted(10)(number, number) FROM numbers(1000)
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌─topKWeighted(10)(number, number)──────────┐
|
||||
│ [999,998,997,996,995,994,993,992,991,990] │
|
||||
└───────────────────────────────────────────┘
|
||||
```
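The result above can be checked against an exact computation. The Python sketch below sums the weights per value and returns the values with the largest totals; it is an exact reference for what the approximate function aims to return, not the approximate algorithm itself, and the helper name is hypothetical.

```python
from collections import Counter

def top_k_weighted_exact(pairs, k):
    """Illustrative exact version: pairs is an iterable of (value, weight)."""
    totals = Counter()
    for value, weight in pairs:
        totals[value] += weight  # each value counts `weight` times
    ranked = sorted(totals, key=lambda value: totals[value], reverse=True)
    return ranked[:k]

# Same input as the query: every number is weighted by itself.
print(top_k_weighted_exact(((n, n) for n in range(1000)), 10))
# [999, 998, 997, 996, 995, 994, 993, 992, 991, 990]
```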
|
40
docs/zh/sql-reference/aggregate-functions/reference/uniq.md
Normal file
@ -0,0 +1,40 @@
---
toc_priority: 190
---

# uniq {#agg_function-uniq}

Calculates the approximate number of different values of the argument.

``` sql
uniq(x[, ...])
```

**Parameters**

The function takes a variable number of parameters. Parameters can be `Tuple`, `Array`, `Date`, `DateTime`, `String`, or numeric types.

**Returned value**

- A [UInt64](../../../sql-reference/data-types/int-uint.md)-type number.

**Implementation details**

Function:

- Calculates a hash for all parameters in the aggregate, then uses it in calculations.

- Uses an adaptive sampling algorithm. For the calculation state, the function uses a sample of element hash values up to 65536.

  This algorithm is very accurate and very efficient on the CPU. When the query contains several of these functions, using `uniq` is almost as fast as using other aggregate functions.

- Provides the result deterministically (it doesn’t depend on the query processing order).

We recommend using this function in almost all scenarios.
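
As a minimal sketch of the variable-argument form, the query below counts distinct combinations of two expressions rather than distinct single values. The `numbers` table function and the modulo expressions are illustrative choices, not part of the original text. Because 3 and 5 are coprime, `(number % 3, number % 5)` takes exactly 15 distinct combinations, and `uniq` would be expected to report a value at or very near that at such low cardinality.

``` sql
SELECT
    uniq(number % 3) AS distinct_a,                  -- one argument
    uniq(number % 3, number % 5) AS distinct_pairs   -- two arguments: distinct combinations
FROM numbers(1000)
```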

**See Also**

- [uniqCombined](../../../sql-reference/aggregate-functions/reference/uniqcombined.md#agg_function-uniqcombined)
- [uniqCombined64](../../../sql-reference/aggregate-functions/reference/uniqcombined64.md#agg_function-uniqcombined64)
- [uniqHLL12](../../../sql-reference/aggregate-functions/reference/uniqhll12.md#agg_function-uniqhll12)
- [uniqExact](../../../sql-reference/aggregate-functions/reference/uniqexact.md#agg_function-uniqexact)
@ -0,0 +1,51 @@
---
toc_priority: 192
---

# uniqCombined {#agg_function-uniqcombined}

Calculates the approximate number of different argument values.

``` sql
uniqCombined(HLL_precision)(x[, ...])
```

The `uniqCombined` function is a good choice for calculating the number of different values.

**Parameters**

The function takes a variable number of parameters. Parameters can be `Tuple`, `Array`, `Date`, `DateTime`, `String`, or numeric types.

`HLL_precision` is the base-2 logarithm of the number of cells in [HyperLogLog](https://en.wikipedia.org/wiki/HyperLogLog). It is optional: you can use the function as `uniqCombined(x[, ...])`. The default value for `HLL_precision` is 17, which is effectively 96 KiB of space (2^17 cells, 6 bits each).

**Returned value**

- A [UInt64](../../../sql-reference/data-types/int-uint.md)-type number.

**Implementation details**

Function:

- Calculates a hash (64-bit hash for `String` and 32-bit otherwise) for all parameters in the aggregate, then uses it in calculations.

- Uses a combination of three algorithms: array, hash table, and HyperLogLog with an error correction table.

  For a small number of distinct elements, an array is used. When the set size is larger, a hash table is used. For a larger number of elements, HyperLogLog is used, which will occupy a fixed amount of memory.

- Provides the result deterministically (it doesn’t depend on the query processing order).

!!! note "Note"
    Since it uses a 32-bit hash for non-`String` types, the result will have a very high error for cardinalities significantly larger than `UINT_MAX` (the error rises quickly after a few tens of billions of distinct values). In this case, use [uniqCombined64](../../../sql-reference/aggregate-functions/reference/uniqcombined64.md#agg_function-uniqcombined64).

Compared to the [uniq](../../../sql-reference/aggregate-functions/reference/uniq.md#agg_function-uniq) function, `uniqCombined`:

- Consumes several times less memory.
- Calculates with several times higher accuracy.
- Usually has slightly lower performance. In some scenarios, `uniqCombined` can perform better than `uniq`, for example, with distributed queries that transmit a large number of aggregation states over the network.
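
To illustrate the optional `HLL_precision` parameter, here is a minimal sketch that runs the function with the default precision, with a smaller explicit precision, and in its 64-bit variant over the same data. The `numbers` table function, the column aliases, and the choice of `12` as a precision are illustrative assumptions. The data contains one million distinct values, so each column would be expected to land close to 1000000, with the lower precision typically showing the larger deviation.

``` sql
SELECT
    uniqCombined(number) AS default_precision,   -- HLL_precision defaults to 17
    uniqCombined(12)(number) AS low_precision,   -- fewer cells, smaller state
    uniqCombined64(number) AS hash64             -- 64-bit hash for all data types
FROM numbers(1000000)
```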

**See Also**

- [uniq](../../../sql-reference/aggregate-functions/reference/uniq.md#agg_function-uniq)
- [uniqCombined64](../../../sql-reference/aggregate-functions/reference/uniqcombined64.md#agg_function-uniqcombined64)
- [uniqHLL12](../../../sql-reference/aggregate-functions/reference/uniqhll12.md#agg_function-uniqhll12)
- [uniqExact](../../../sql-reference/aggregate-functions/reference/uniqexact.md#agg_function-uniqexact)
@ -0,0 +1,7 @@
---
toc_priority: 193
---

# uniqCombined64 {#agg_function-uniqcombined64}

Same as [uniqCombined](../../../sql-reference/aggregate-functions/reference/uniqcombined.md#agg_function-uniqcombined), but uses a 64-bit hash for all data types.
@ -0,0 +1,25 @@
---
toc_priority: 191
---

# uniqExact {#agg_function-uniqexact}

Calculates the exact number of different argument values.

``` sql
uniqExact(x[, ...])
```

Use the `uniqExact` function if you absolutely need an exact result. Otherwise use the [uniq](../../../sql-reference/aggregate-functions/reference/uniq.md#agg_function-uniq) function.

The `uniqExact` function uses more memory than `uniq`, because the size of the state has unbounded growth as the number of different values increases.

**Parameters**

The function takes a variable number of parameters. Parameters can be `Tuple`, `Array`, `Date`, `DateTime`, `String`, or numeric types.
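
The trade-off between the exact and approximate variants can be seen by running both over the same data. This is a sketch with illustrative inputs (the `numbers` table function and the modulo are assumptions, not part of the original text). The expression `number % 100000` over one million rows has exactly 100000 distinct values, which is what `uniqExact` returns; `uniq` would be expected to return a nearby estimate.

``` sql
SELECT
    uniqExact(number % 100000) AS exact,    -- exactly 100000 distinct values
    uniq(number % 100000) AS approximate    -- an estimate close to the exact count
FROM numbers(1000000)
```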

**See Also**

- [uniq](../../../sql-reference/aggregate-functions/reference/uniq.md#agg_function-uniq)
- [uniqCombined](../../../sql-reference/aggregate-functions/reference/uniqcombined.md#agg_function-uniqcombined)
- [uniqHLL12](../../../sql-reference/aggregate-functions/reference/uniqhll12.md#agg_function-uniqhll12)
@ -0,0 +1,39 @@
---
toc_priority: 194
---

# uniqHLL12 {#agg_function-uniqhll12}

Calculates the approximate number of different argument values, using the [HyperLogLog](https://en.wikipedia.org/wiki/HyperLogLog) algorithm.

``` sql
uniqHLL12(x[, ...])
```

**Parameters**

The function takes a variable number of parameters. Parameters can be `Tuple`, `Array`, `Date`, `DateTime`, `String`, or numeric types.

**Returned value**

- A [UInt64](../../../sql-reference/data-types/int-uint.md)-type number.

**Implementation details**

Function:

- Calculates a hash for all parameters in the aggregate, then uses it in calculations.

- Uses the HyperLogLog algorithm to approximate the number of different argument values.

  2^12 5-bit cells are used. The size of the state is slightly more than 2.5 KB. The result is not very accurate (up to ~10% error) for small data sets (<10K elements). However, the result is fairly accurate for high-cardinality data sets (10K-100M), with a maximum error of ~1.6%. Starting from 100M, the estimation error increases, and the function will return very inaccurate results for data sets with extremely high cardinality (1B+ elements).

- Provides the result deterministically (it doesn’t depend on the query processing order).

We don’t recommend using this function. In most cases, use the [uniq](../../../sql-reference/aggregate-functions/reference/uniq.md#agg_function-uniq) or [uniqCombined](../../../sql-reference/aggregate-functions/reference/uniqcombined.md#agg_function-uniqcombined) function.

**See Also**

- [uniq](../../../sql-reference/aggregate-functions/reference/uniq.md#agg_function-uniq)
- [uniqCombined](../../../sql-reference/aggregate-functions/reference/uniqcombined.md#agg_function-uniqcombined)
- [uniqExact](../../../sql-reference/aggregate-functions/reference/uniqexact.md#agg_function-uniqexact)
@ -0,0 +1,12 @@
---
toc_priority: 32
---

# varPop(x) {#varpopx}

Calculates the amount `Σ((x - x̅)^2) / n`, where `n` is the sample size and `x̅` is the average value of `x`.

In other words, the variance of a set of values. Returns `Float64`.

!!! note "Note"
    This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the `varPopStable` function. It works slower but provides a lower computational error.
@ -0,0 +1,14 @@
---
toc_priority: 33
---

# varSamp {#varsamp}

Calculates the amount `Σ((x - x̅)^2) / (n - 1)`, where `n` is the sample size and `x̅` is the average value of `x`.

It represents an unbiased estimate of the variance of a random variable if the passed values form its sample.

Returns `Float64`. When `n <= 1`, returns `+∞`.

!!! note "Note"
    This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the `varSampStable` function. It works slower but provides a lower computational error.