Mirror of https://github.com/ClickHouse/ClickHouse.git, synced 2024-11-09 17:14:47 +00:00
Merge pull request #65292 from Blargian/doc_varpop_fix
[Docs] Fix nonsense `varPop`, `varPopStable` docs
commit a9ce6845ba
@@ -25,7 +25,7 @@ stddevPop(x)

 **Returned value**

-Square root of standard deviation of `x`. [Float64](../../data-types/float.md).
+- Square root of standard deviation of `x`. [Float64](../../data-types/float.md).


 **Example**
@@ -4,30 +4,25 @@ slug: "/en/sql-reference/aggregate-functions/reference/varpop"
 sidebar_position: 32
 ---

-This page covers the `varPop` and `varPopStable` functions available in ClickHouse.
-
 ## varPop

-Calculates the population covariance between two data columns. The population covariance measures the degree to which two variables vary together. Calculates the amount `Σ((x - x̅)^2) / n`, where `n` is the sample size and `x̅`is the average value of `x`.
+Calculates the population variance.

 **Syntax**

 ```sql
-covarPop(x, y)
+varPop(x)
 ```

+Alias: `VAR_POP`.
+
 **Parameters**

-- `x`: The first data column. [Numeric](../../../native-protocol/columns.md)
-- `y`: The second data column. [Numeric](../../../native-protocol/columns.md)
+- `x`: Population of values to find the population variance of. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal*](../../data-types/decimal.md).

 **Returned value**

-Returns an integer of type `Float64`.
-
-**Implementation details**
-
-This function uses a numerically unstable algorithm. If you need numerical stability in calculations, use the slower but more stable [`varPopStable`](#varpopstable) function.
+- Returns the population variance of `x`. [`Float64`](../../data-types/float.md).

 **Example**

@@ -37,69 +32,21 @@ Query:
 DROP TABLE IF EXISTS test_data;
 CREATE TABLE test_data
 (
-    x Int32,
-    y Int32
+    x UInt8,
 )
 ENGINE = Memory;

-INSERT INTO test_data VALUES (1, 2), (2, 3), (3, 5), (4, 6), (5, 8);
+INSERT INTO test_data VALUES (3), (3), (3), (4), (4), (5), (5), (7), (11), (15);

 SELECT
-    covarPop(x, y) AS covar_pop
+    varPop(x) AS var_pop
 FROM test_data;
 ```

 Result:

 ```response
-3
-```
-
-## varPopStable
-
-Calculates population covariance between two data columns using a stable, numerically accurate method to calculate the variance. This function is designed to provide reliable results even with large datasets or values that might cause numerical instability in other implementations.
-
-**Syntax**
-
-```sql
-covarPopStable(x, y)
-```
-
-**Parameters**
-
-- `x`: The first data column. [String literal](../../syntax#syntax-string-literal)
-- `y`: The second data column. [Expression](../../syntax#syntax-expressions)
-
-**Returned value**
-
-Returns an integer of type `Float64`.
-
-**Implementation details**
-
-Unlike [`varPop`](#varpop), this function uses a stable, numerically accurate algorithm to calculate the population variance to avoid issues like catastrophic cancellation or loss of precision. This function also handles `NaN` and `Inf` values correctly, excluding them from calculations.
-
-**Example**
-
-Query:
-
-```sql
-DROP TABLE IF EXISTS test_data;
-CREATE TABLE test_data
-(
-    x Int32,
-    y Int32
-)
-ENGINE = Memory;
-
-INSERT INTO test_data VALUES (1, 2), (2, 9), (9, 5), (4, 6), (5, 8);
-
-SELECT
-    covarPopStable(x, y) AS covar_pop_stable
-FROM test_data;
-```
-
-Result:
-
-```response
-0.5999999999999999
+┌─var_pop─┐
+│    14.4 │
+└─────────┘
 ```
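Review note: the `14.4` in the updated `varPop` example can be checked by hand. The following is a minimal Python sketch, assuming the population-variance formula `Σ((x - x̅)^2) / n` quoted in the removed description; the helper name `population_variance` is hypothetical and is not part of ClickHouse.

```python
def population_variance(values):
    """Population variance: mean of the squared deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


# Same ten values as the INSERT statement in the updated example.
data = [3, 3, 3, 4, 4, 5, 5, 7, 11, 15]
result = population_variance(data)
print(result)  # 14.4, matching the documented response
```

The mean is 6 and the squared deviations sum to 144, so dividing by the ten values gives 14.4.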
@@ -0,0 +1,52 @@
+---
+title: "varPopStable"
+slug: "/en/sql-reference/aggregate-functions/reference/varpopstable"
+sidebar_position: 32
+---
+
+## varPopStable
+
+Returns the population variance. Unlike [`varPop`](../reference/varpop.md), this function uses a [numerically stable](https://en.wikipedia.org/wiki/Numerical_stability) algorithm. It works slower but provides a lower computational error.
+
+**Syntax**
+
+```sql
+varPopStable(x)
+```
+
+Alias: `VAR_POP_STABLE`.
+
+**Parameters**
+
+- `x`: Population of values to find the population variance of. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal*](../../data-types/decimal.md).
+
+**Returned value**
+
+- Returns the population variance of `x`. [Float64](../../data-types/float.md).
+
+**Example**
+
+Query:
+
+```sql
+DROP TABLE IF EXISTS test_data;
+CREATE TABLE test_data
+(
+    x UInt8,
+)
+ENGINE = Memory;
+
+INSERT INTO test_data VALUES (3),(3),(3),(4),(4),(5),(5),(7),(11),(15);
+
+SELECT
+    varPopStable(x) AS var_pop_stable
+FROM test_data;
+```
+
+Result:
+
+```response
+┌─var_pop_stable─┐
+│           14.4 │
+└────────────────┘
+```
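Review note on "numerically stable": this diff does not show which algorithm ClickHouse uses, so the following Python sketch is only an illustration of the general idea. It contrasts a naive one-pass formula (mean of squares minus square of the mean) with Welford's online algorithm, one well-known stable alternative. Both function names are hypothetical, and neither is claimed to match the ClickHouse implementation.

```python
def var_pop_naive(values):
    """Naive one-pass formula E[x^2] - E[x]^2; prone to catastrophic cancellation."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in values:
        n += 1
        total += x
        total_sq += x * x
    mean = total / n
    return total_sq / n - mean * mean


def var_pop_welford(values):
    """Welford's online algorithm: incrementally updates the mean and the
    running sum of squared deviations (m2)."""
    n = 0
    mean = 0.0
    m2 = 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / n


# Large offset, small spread: the true population variance is 22.5.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(var_pop_naive(data))
print(var_pop_welford(data))
```

With these inputs the squares are around 1e18, beyond the range where a `Float64` represents every integer exactly, so the naive version loses the low-order digits that carry the variance. The Welford version works with small differences and recovers 22.5.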
@@ -4,8 +4,6 @@ slug: /en/sql-reference/aggregate-functions/reference/varsamp
 sidebar_position: 33
 ---

-This page contains information on the `varSamp` and `varSampStable` ClickHouse functions.
-
 ## varSamp

 Calculate the sample variance of a data set.
@@ -13,24 +11,27 @@ Calculate the sample variance of a data set.
 **Syntax**

 ```sql
-varSamp(expr)
+varSamp(x)
 ```

+Alias: `VAR_SAMP`.
+
 **Parameters**

-- `expr`: An expression representing the data set for which you want to calculate the sample variance. [Expression](../../syntax#syntax-expressions)
+- `x`: The population for which you want to calculate the sample variance. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal*](../../data-types/decimal.md).

 **Returned value**

-Returns a Float64 value representing the sample variance of the input data set.

+- Returns the sample variance of the input data set `x`. [Float64](../../data-types/float.md).
+
 **Implementation details**

-The `varSamp()` function calculates the sample variance using the following formula:
+The `varSamp` function calculates the sample variance using the following formula:

-```plaintext
-∑(x - mean(x))^2 / (n - 1)
-```
+$$
+\sum\frac{(x - \text{mean}(x))^2}{(n - 1)}
+$$

 Where:

@@ -38,91 +39,29 @@ Where:
 - `mean(x)` is the arithmetic mean of the data set.
 - `n` is the number of data points in the data set.

-The function assumes that the input data set represents a sample from a larger population. If you want to calculate the variance of the entire population (when you have the complete data set), you should use the [`varPop()` function](./varpop#varpop) instead.
-
-This function uses a numerically unstable algorithm. If you need numerical stability in calculations, use the slower but more stable [`varSampStable`](#varsampstable) function.
+The function assumes that the input data set represents a sample from a larger population. If you want to calculate the variance of the entire population (when you have the complete data set), you should use [`varPop`](../reference/varpop.md) instead.

 **Example**

 Query:

 ```sql
-CREATE TABLE example_table
+DROP TABLE IF EXISTS test_data;
+CREATE TABLE test_data
 (
-    id UInt64,
-    value Float64
+    x Float64
 )
-ENGINE = MergeTree
-ORDER BY id;
+ENGINE = Memory;

-INSERT INTO example_table VALUES (1, 10.5), (2, 12.3), (3, 9.8), (4, 11.2), (5, 10.7);
+INSERT INTO test_data VALUES (10.5), (12.3), (9.8), (11.2), (10.7);

-SELECT varSamp(value) FROM example_table;
+SELECT round(varSamp(x),3) AS var_samp FROM test_data;
 ```

 Response:

 ```response
-0.8650000000000091
+┌─var_samp─┐
+│    0.865 │
+└──────────┘
 ```
-
-## varSampStable
-
-Calculate the sample variance of a data set using a numerically stable algorithm.
-
-**Syntax**
-
-```sql
-varSampStable(expr)
-```
-
-**Parameters**
-
-- `expr`: An expression representing the data set for which you want to calculate the sample variance. [Expression](../../syntax#syntax-expressions)
-
-**Returned value**
-
-The `varSampStable` function returns a Float64 value representing the sample variance of the input data set.
-
-**Implementation details**
-
-The `varSampStable` function calculates the sample variance using the same formula as the [`varSamp`](#varsamp) function:
-
-```plaintext
-∑(x - mean(x))^2 / (n - 1)
-```
-
-Where:
-- `x` is each individual data point in the data set.
-- `mean(x)` is the arithmetic mean of the data set.
-- `n` is the number of data points in the data set.
-
-The difference between `varSampStable` and `varSamp` is that `varSampStable` is designed to provide a more deterministic and stable result when dealing with floating-point arithmetic. It uses an algorithm that minimizes the accumulation of rounding errors, which can be particularly important when dealing with large data sets or data with a wide range of values.
-
-Like `varSamp`, the `varSampStable` function assumes that the input data set represents a sample from a larger population. If you want to calculate the variance of the entire population (when you have the complete data set), you should use the [`varPopStable`](./varpop#varpopstable) function instead.
-
-**Example**
-
-Query:
-
-```sql
-CREATE TABLE example_table
-(
-    id UInt64,
-    value Float64
-)
-ENGINE = MergeTree
-ORDER BY id;
-
-INSERT INTO example_table VALUES (1, 10.5), (2, 12.3), (3, 9.8), (4, 11.2), (5, 10.7);
-
-SELECT varSampStable(value) FROM example_table;
-```
-
-Response:
-
-```response
-0.865
-```
-
-This query calculates the sample variance of the `value` column in the `example_table` using the `varSampStable()` function. The result shows that the sample variance of the values `[10.5, 12.3, 9.8, 11.2, 10.7]` is approximately 0.865, which may differ slightly from the result of `varSamp` due to the more precise handling of floating-point arithmetic.
@@ -0,0 +1,63 @@
+---
+title: "varSampStable"
+slug: /en/sql-reference/aggregate-functions/reference/varsampstable
+sidebar_position: 33
+---
+
+## varSampStable
+
+Calculate the sample variance of a data set. Unlike [`varSamp`](../reference/varsamp.md), this function uses a numerically stable algorithm. It works slower but provides a lower computational error.
+
+**Syntax**
+
+```sql
+varSampStable(x)
+```
+
+Alias: `VAR_SAMP_STABLE`
+
+**Parameters**
+
+- `x`: The population for which you want to calculate the sample variance. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal*](../../data-types/decimal.md).
+
+**Returned value**
+
+- Returns the sample variance of the input data set. [Float64](../../data-types/float.md).
+
+**Implementation details**
+
+The `varSampStable` function calculates the sample variance using the same formula as the [`varSamp`](../reference/varsamp.md):
+
+$$
+\sum\frac{(x - \text{mean}(x))^2}{(n - 1)}
+$$
+
+Where:
+- `x` is each individual data point in the data set.
+- `mean(x)` is the arithmetic mean of the data set.
+- `n` is the number of data points in the data set.
+
+**Example**
+
+Query:
+
+```sql
+DROP TABLE IF EXISTS test_data;
+CREATE TABLE test_data
+(
+    x Float64
+)
+ENGINE = Memory;
+
+INSERT INTO test_data VALUES (10.5), (12.3), (9.8), (11.2), (10.7);
+
+SELECT round(varSampStable(x),3) AS var_samp_stable FROM test_data;
+```
+
+Response:
+
+```response
+┌─var_samp_stable─┐
+│           0.865 │
+└─────────────────┘
+```
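Review note: the `0.865` in the `varSamp` and `varSampStable` examples can be reproduced from the documented formula, read as the sum of squared deviations from the mean divided by `n - 1`. The following is a minimal Python sketch under that reading. The helper name `sample_variance` is hypothetical, and raising an error for fewer than two points is a choice of this sketch, not documented ClickHouse behaviour.

```python
def sample_variance(values):
    """Sample variance: squared deviations from the mean, summed, divided by n - 1."""
    n = len(values)
    if n < 2:
        # Assumption of this sketch only: reject inputs where n - 1 would be zero.
        raise ValueError("sample variance needs at least two data points")
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


# Same five values as the INSERT statement in the examples.
data = [10.5, 12.3, 9.8, 11.2, 10.7]
print(round(sample_variance(data), 3))  # 0.865
```

The mean is 10.9 and the squared deviations sum to 3.46, so dividing by `n - 1 = 4` gives 0.865. Dividing by `n = 5` instead would give the population variance that `varPop` returns.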
@@ -2852,7 +2852,9 @@ variantElement
 variantType
 varint
 varpop
+varpopstable
 varsamp
+varsampstable
 vectorized
 vectorscan
 vendoring