WIP update-aggregate-funcions-in-zh

This commit is contained in:
benbiti 2021-02-03 23:22:18 +08:00
parent 8354cdd0e1
commit 7d0430c0ec
9 changed files with 88 additions and 120 deletions

View File

@ -4,8 +4,12 @@
## 保持英文,不译
Parquet
## 英文 <-> 中文
Tuple 元组
## 英文 <-> 中文
Integer 整数
floating-point 浮点数
Decimal 定点数
Tuple 元组
function 函数

View File

@ -379,47 +379,6 @@ kurtSamp(expr)
SELECT kurtSamp(value) FROM series_with_value_column
```
## avgWeighted {#avgweighted}
计算 [加权算术平均值](https://en.wikipedia.org/wiki/Weighted_arithmetic_mean).
**语法**
``` sql
avgWeighted(x, weight)
```
**参数**
- `x` — 值。 [整数](../data-types/int-uint.md) 或 [浮点](../data-types/float.md).
- `weight` — 值的加权。 [整数](../data-types/int-uint.md) 或 [浮点](../data-types/float.md).
`x` 和 `weight` 的类型必须相同。
**返回值**
- 加权平均值。
- `NaN`. 如果所有的权重都等于0。
类型: [Float64](../data-types/float.md).
**示例**
查询:
``` sql
SELECT avgWeighted(x, w)
FROM values('x Int8, w Int8', (4, 1), (1, 0), (10, 2))
```
结果:
``` text
┌─avgWeighted(x, weight)─┐
│ 8 │
└────────────────────────┘
```
## uniq {#agg_function-uniq}
计算参数的不同值的近似数量。

View File

@ -4,42 +4,43 @@ toc_priority: 107
# avgWeighted {#avgweighted}
Calculates the [weighted arithmetic mean](https://en.wikipedia.org/wiki/Weighted_arithmetic_mean).
**Syntax**
计算 [加权算术平均值](https://en.wikipedia.org/wiki/Weighted_arithmetic_mean).
**语法**
``` sql
avgWeighted(x, weight)
```
**Parameters**
**参数**
- `x`: Values.
- `weight`: Weights of the values.
- `x`:值。
- `weight`:值的权重。
`x` and `weight` must both be
[Integer](../../../sql-reference/data-types/int-uint.md),
[floating-point](../../../sql-reference/data-types/float.md), or
[Decimal](../../../sql-reference/data-types/decimal.md),
but may have different types.
`x` 和 `weight` 的类型必须是
[整数](../../../sql-reference/data-types/int-uint.md),
[浮点数](../../../sql-reference/data-types/float.md), 或
[定点数](../../../sql-reference/data-types/decimal.md),
但是可以不一样。
**Returned value**
**返回值**
- `NaN` if all the weights are equal to 0 or the supplied weights parameter is empty.
- Weighted mean otherwise.
- `NaN`:如果所有的权重都等于 0,或所提供的权重参数为空。
- 否则返回加权平均值。
**Return type** is always [Float64](../../../sql-reference/data-types/float.md).
**返回类型** 总是 [Float64](../../../sql-reference/data-types/float.md)。
**Example**
**示例**
Query:
查询:
``` sql
SELECT avgWeighted(x, w)
FROM values('x Int8, w Int8', (4, 1), (1, 0), (10, 2))
```
Result:
结果:
``` text
┌─avgWeighted(x, weight)─┐
@ -47,33 +48,17 @@ Result:
└────────────────────────┘
```
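The arithmetic behind this result can be checked by hand. The following is a minimal Python sketch of the weighted mean, not ClickHouse's implementation; `avg_weighted` is a hypothetical helper name, and treating a weight sum of zero as the `NaN` case is an assumption that covers both "all weights are 0" and "no rows".

```python
import math

def avg_weighted(pairs):
    """Weighted arithmetic mean of (x, weight) pairs; NaN when the weights sum to 0."""
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:  # assumption: also used for an empty input
        return math.nan
    return sum(x * w for x, w in pairs) / total_weight

# Same rows as the query above: (4*1 + 1*0 + 10*2) / (1 + 0 + 2) = 24 / 3
print(avg_weighted([(4, 1), (1, 0), (10, 2)]))  # 8.0
```

The row with weight 0 contributes nothing to either sum, which is why the value `1` does not affect the result.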
**Example**
Query:
**示例**
``` sql
SELECT avgWeighted(x, w)
FROM values('x Int8, w Float64', (4, 1), (1, 0), (10, 2))
```
Result:
``` text
┌─avgWeighted(x, weight)─┐
│ 8 │
└────────────────────────┘
```
**Example**
Query:
查询:
``` sql
SELECT avgWeighted(x, w)
FROM values('x Int8, w Int8', (0, 0), (1, 0), (10, 0))
```
Result:
结果:
``` text
┌─avgWeighted(x, weight)─┐
@ -81,16 +66,16 @@ Result:
└────────────────────────┘
```
**Example**
**示例**
Query:
查询:
``` sql
CREATE TABLE test (t UInt8) ENGINE = Memory;
SELECT avgWeighted(t, t) FROM test
```
Result:
结果:
``` text
┌─avgWeighted(x, weight)─┐

View File

@ -4,10 +4,10 @@ toc_priority: 250
# categoricalInformationValue {#categoricalinformationvalue}
Calculates the value of `(P(tag = 1) - P(tag = 0))(log(P(tag = 1)) - log(P(tag = 0)))` for each category.
对每个类别计算 `(P(tag = 1) - P(tag = 0))(log(P(tag = 1)) - log(P(tag = 0)))` 的值。
``` sql
categoricalInformationValue(category1, category2, ..., tag)
```
The result indicates how a discrete (categorical) feature `[category1, category2, ...]` contributes to a learning model that predicts the value of `tag`.
结果表明离散(分类)特征 `[category1, category2, ...]` 对预测 `tag` 值的学习模型有多大贡献。
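To make the formula concrete, here is a small Python sketch for a single 0/1 category column. It is an illustration under a stated assumption, not ClickHouse's code: the text does not define `P(tag = v)`, so the sketch assumes it means the share of all rows with `tag = v` that belong to the category. `information_value` is a hypothetical name.

```python
import math

def information_value(category, tag):
    """(P(tag=1) - P(tag=0)) * (log P(tag=1) - log P(tag=0)) for one 0/1 category column."""
    ones = sum(1 for t in tag if t == 1)
    zeros = sum(1 for t in tag if t == 0)
    # Assumption: P(tag = v) is the share of rows with tag = v that fall into the category.
    p1 = sum(1 for c, t in zip(category, tag) if c == 1 and t == 1) / ones
    p0 = sum(1 for c, t in zip(category, tag) if c == 1 and t == 0) / zeros
    return (p1 - p0) * (math.log(p1) - math.log(p0))

category = [1, 1, 1, 0, 1, 0, 0, 0]
tag      = [1, 1, 1, 1, 0, 0, 0, 0]
# Here p1 = 3/4 and p0 = 1/4, so the value is 0.5 * log(3).
print(information_value(category, tag))
```

The sketch does not handle a probability of zero, where the logarithm is undefined.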

View File

@ -4,9 +4,12 @@ toc_priority: 107
# corr {#corrx-y}
Syntax: `corr(x, y)`
**语法**
``` sql
corr(x, y)
```
Calculates the Pearson correlation coefficient: `Σ((x - x̅)(y - y̅)) / sqrt(Σ((x - x̅)^2) * Σ((y - y̅)^2))`.
计算Pearson相关系数: `Σ((x - x̅)(y - y̅)) / sqrt(Σ((x - x̅)^2) * Σ((y - y̅)^2))`
!!! note "Note"
This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the `corrStable` function. It works slower but provides a lower computational error.
!!! note "注"
该函数使用数值不稳定的算法。 如果在计算中需要 [数值稳定性](https://en.wikipedia.org/wiki/Numerical_stability),请使用 `corrStable` 函数。 它的速度较慢,但计算误差较小。
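As a cross-check of the formula, the Pearson coefficient can be written directly in Python. This is a plain two-pass sketch of the expression above, not the algorithm ClickHouse uses internally; `corr` here is just a local helper that mirrors the function name.

```python
import math

def corr(xs, ys):
    """Pearson correlation: sum((x - mx)(y - my)) / sqrt(sum((x - mx)^2) * sum((y - my)^2))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = math.sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )
    return numerator / denominator

# y is an exact linear function of x, so the coefficient is 1.
print(corr([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

The sketch raises `ZeroDivisionError` when either column is constant, since the denominator is then 0.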

View File

@ -4,35 +4,36 @@ toc_priority: 1
# count {#agg_function-count}
Counts the number of rows or not-NULL values.
ClickHouse supports the following syntaxes for `count`:
- `count(expr)` or `COUNT(DISTINCT expr)`.
- `count()` or `COUNT(*)`. The `count()` syntax is ClickHouse-specific.
计算行数或非 NULL 值的数量。
**Parameters**
ClickHouse支持以下 `count` 语法:
- `count(expr)` 或 `COUNT(DISTINCT expr)`。
- `count()` 或 `COUNT(*)`。 `count()` 语法是 ClickHouse 特有的。
The function can take:
**参数**
- Zero parameters.
- One [expression](../../../sql-reference/syntax.md#syntax-expressions).
该函数可以接受:
**Returned value**
- 零参数。
- 一个 [表达式](../../../sql-reference/syntax.md#syntax-expressions)。
- If the function is called without parameters it counts the number of rows.
- If the [expression](../../../sql-reference/syntax.md#syntax-expressions) is passed, then the function counts how many times this expression returned not null. If the expression returns a [Nullable](../../../sql-reference/data-types/nullable.md)-type value, then the result of `count` stays not `Nullable`. The function returns 0 if the expression returned `NULL` for all the rows.
**返回值**
In both cases the type of the returned value is [UInt64](../../../sql-reference/data-types/int-uint.md).
- 如果没有参数调用函数,它会计算行数。
- 如果传递了 [表达式](../../../sql-reference/syntax.md#syntax-expressions),则该函数计算此表达式返回非 NULL 值的次数。 如果表达式返回 [Nullable](../../../sql-reference/data-types/nullable.md) 类型的值,`count` 的结果仍然不是 `Nullable`。 如果表达式对于所有的行都返回 `NULL`,则该函数返回 0。
**Details**
在这两种情况下,返回值的类型为 [UInt64](../../../sql-reference/data-types/int-uint.md)。
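The two calling forms differ only in whether NULLs are skipped. A minimal Python sketch of that rule, using `None` as a stand-in for `NULL`; `count_rows` and `count_expr` are hypothetical names chosen for illustration:

```python
def count_rows(rows):
    """count() with no argument: every row is counted."""
    return len(rows)

def count_expr(values):
    """count(expr): only rows where the expression is not NULL are counted."""
    return sum(1 for v in values if v is not None)

column = [1, None, 3, None]
print(count_rows(column))          # 4
print(count_expr(column))          # 2
print(count_expr([None, None]))    # 0
```

Both helpers return a plain integer, matching the statement that the result is never `Nullable`.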
ClickHouse supports the `COUNT(DISTINCT ...)` syntax. The behavior of this construction depends on the [count_distinct_implementation](../../../operations/settings/settings.md#settings-count_distinct_implementation) setting. It defines which of the [uniq\*](../../../sql-reference/aggregate-functions/reference/uniq.md#agg_function-uniq) functions is used to perform the operation. The default is the [uniqExact](../../../sql-reference/aggregate-functions/reference/uniqexact.md#agg_function-uniqexact) function.
**详细信息**
The `SELECT count() FROM table` query is not optimized, because the number of entries in the table is not stored separately. It chooses a small column from the table and counts the number of values in it.
ClickHouse支持 `COUNT(DISTINCT ...)` 语法,这种结构的行为取决于 [count_distinct_implementation](../../../operations/settings/settings.md#settings-count_distinct_implementation) 设置。 它定义了用于执行该操作的 [uniq\*](../../../sql-reference/aggregate-functions/reference/uniq.md#agg_function-uniq)函数。 默认值是 [uniqExact](../../../sql-reference/aggregate-functions/reference/uniqexact.md#agg_function-uniqexact)函数。
**Examples**
`SELECT count() FROM table` 这个查询未被优化,因为表中的条目数没有单独存储。 它从表中选择一个小列并计算其值的个数。
Example 1:
**示例**
示例1:
``` sql
SELECT count() FROM t
@ -44,7 +45,7 @@ SELECT count() FROM t
└─────────┘
```
Example 2:
示例2:
``` sql
SELECT name, value FROM system.settings WHERE name = 'count_distinct_implementation'
@ -66,4 +67,4 @@ SELECT count(DISTINCT num) FROM t
└────────────────┘
```
This example shows that `count(DISTINCT num)` is performed by the `uniqExact` function according to the `count_distinct_implementation` setting value.
这个例子表明,根据 `count_distinct_implementation` 的设定值,`count(DISTINCT num)` 是由 `uniqExact` 函数执行的。

View File

@ -4,9 +4,12 @@ toc_priority: 36
# covarPop {#covarpop}
Syntax: `covarPop(x, y)`
**语法**
``` sql
covarPop(x, y)
```
Calculates the value of `Σ((x - x̅)(y - y̅)) / n`.
计算 `Σ((x - x̅)(y - y̅)) / n` 的值。
!!! note "Note"
This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the `covarPopStable` function. It works slower but provides a lower computational error.
!!! note "注"
该函数使用数值不稳定的算法。 如果在计算中需要 [数值稳定性](https://en.wikipedia.org/wiki/Numerical_stability),请使用 `covarPopStable` 函数。 它的速度较慢,但计算误差较小。
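The population covariance formula translates almost literally into Python. This is a two-pass sketch for checking values by hand, not the algorithm ClickHouse runs; `covar_pop` is a hypothetical helper name.

```python
def covar_pop(xs, ys):
    """Population covariance: sum((x - mean_x) * (y - mean_y)) / n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# Deviations multiply to 4.5 + 0.5 + 0.5 + 4.5 = 10, divided by n = 4.
print(covar_pop([1, 2, 3, 4], [2, 4, 6, 8]))  # 2.5
```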

View File

@ -4,9 +4,14 @@ toc_priority: 37
# covarSamp {#covarsamp}
Calculates the value of `Σ((x - x̅)(y - y̅)) / (n - 1)`.
**语法**
``` sql
covarSamp(x, y)
```
Returns Float64. When `n <= 1`, returns +∞.
计算 `Σ((x - x̅)(y - y̅)) / (n - 1)` 的值。
!!! note "Note"
This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the `covarSampStable` function. It works slower but provides a lower computational error.
返回Float64。 当 `n <= 1`, 返回 +∞。
!!! note "注"
该函数使用数值不稳定的算法。 如果在计算中需要 [数值稳定性](https://en.wikipedia.org/wiki/Numerical_stability),请使用 `covarSampStable` 函数。 它的速度较慢,但计算误差较小。
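The sample variant differs from the population one only in the divisor and in the `n <= 1` case described above. A minimal Python sketch, assuming nothing about ClickHouse's internals; `covar_samp` is a hypothetical helper name.

```python
import math

def covar_samp(xs, ys):
    """Sample covariance: sum((x - mean_x) * (y - mean_y)) / (n - 1); +inf when n <= 1."""
    n = len(xs)
    if n <= 1:  # the documented edge case
        return math.inf
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

print(covar_samp([1, 2, 3, 4], [2, 4, 6, 8]))  # 10 / 3
print(covar_samp([1], [2]))                    # inf
```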

View File

@ -4,11 +4,19 @@ toc_priority: 110
# groupArray {#agg_function-grouparray}
Syntax: `groupArray(x)` or `groupArray(max_size)(x)`
**语法**
``` sql
groupArray(x)
Creates an array of argument values.
Values can be added to the array in any (indeterminate) order.
-- or
The second version (with the `max_size` parameter) limits the size of the resulting array to `max_size` elements. For example, `groupArray(1)(x)` is equivalent to `[any (x)]`.
groupArray(max_size)(x)
```
In some cases, you can still rely on the order of execution. This applies to cases when `SELECT` comes from a subquery that uses `ORDER BY`.
创建参数值的数组。
值可以按任何(不确定)顺序添加到数组中。
第二个版本(带有 `max_size` 参数)将结果数组的大小限制为 `max_size` 个元素。
例如, `groupArray(1)(x)` 相当于 `[any (x)]`。
在某些情况下,您仍然可以依赖执行顺序。 这适用于 `SELECT` 来自使用了 `ORDER BY` 的子查询的情况。
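The effect of the `max_size` parameter can be sketched in a few lines of Python. `group_array` is a hypothetical helper, and it keeps the input order only because a Python list is ordered; as stated above, ClickHouse may add values in any order unless the rows come from an ordered subquery.

```python
def group_array(values, max_size=None):
    """Collect values into a list, keeping at most max_size elements when it is given."""
    result = list(values)
    return result if max_size is None else result[:max_size]

print(group_array([3, 1, 2]))      # [3, 1, 2]
print(group_array([3, 1, 2], 2))   # [3, 1]
print(group_array([3, 1, 2], 1))   # [3]
```

The last call returns a single-element list holding one of the input values, which is the sense in which `groupArray(1)(x)` matches `[any (x)]`.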