Merge pull request #65292 from Blargian/doc_varpop_fix

[Docs] Fix nonsense `varPop`, `varPopStable` docs
2024-11-24 08:32:02 +00:00 · 2024-06-22 11:15:22 +00:00 · 2024-06-22 11:15:22 +00:00 · a9ce6845ba
commit a9ce6845ba
parent d9dd481d5e c9b4f4ecaa
6 changed files with 150 additions and 147 deletions
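
Reviewer note (not part of the commit): the example outputs introduced in this diff can be sanity-checked outside ClickHouse. The sketch below is a minimal, hedged illustration in plain Python. The helper names `var_pop`, `var_samp`, and `var_pop_welford` are hypothetical and chosen only for this note. The Welford-style update is one well-known numerically stable approach; the diff does not say which algorithm `varPopStable` actually uses, so treat that part as an assumption.

```python
def var_pop(values):
    """Two-pass population variance: sum((x - mean)^2) / n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def var_samp(values):
    """Two-pass sample variance: sum((x - mean)^2) / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def var_pop_welford(values):
    """Single-pass population variance using Welford's running update.

    Assumption: shown only as an example of a numerically stable method,
    not as a description of ClickHouse internals.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return m2 / count


# Data copied from the examples in the diff.
pop_data = [3, 3, 3, 4, 4, 5, 5, 7, 11, 15]
samp_data = [10.5, 12.3, 9.8, 11.2, 10.7]

print(round(var_pop(pop_data), 3))          # compare with the var_pop example
print(round(var_pop_welford(pop_data), 3))  # compare with the var_pop_stable example
print(round(var_samp(samp_data), 3))        # compare with the var_samp example
```

On this small data set the three printed values should agree with the `14.4` and `0.865` results shown in the new docs. Differences between the plain and stable variants would only be expected on inputs that stress floating-point precision.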
--- a/docs/en/sql-reference/aggregate-functions/reference/stddevpop.md
+++ b/docs/en/sql-reference/aggregate-functions/reference/stddevpop.md
@@ -25,7 +25,7 @@ stddevPop(x)

 **Returned value**

-Square root of standard deviation of `x`. [Float64](../../data-types/float.md).
+- Square root of standard deviation of `x`. [Float64](../../data-types/float.md).


 **Example**
--- a/docs/en/sql-reference/aggregate-functions/reference/varpop.md
+++ b/docs/en/sql-reference/aggregate-functions/reference/varpop.md
@@ -4,30 +4,25 @@ slug: "/en/sql-reference/aggregate-functions/reference/varpop"
 sidebar_position: 32
 ---

-This page covers the `varPop` and `varPopStable` functions available in ClickHouse.
-
 ## varPop

-Calculates the population covariance between two data columns. The population covariance measures the degree to which two variables vary together. Calculates the amount `Σ((x - x̅)^2) / n`, where `n` is the sample size and `x̅`is the average value of `x`.
+Calculates the population variance.

 **Syntax**

 ```sql
-covarPop(x, y)
+varPop(x)
 ```

+Alias: `VAR_POP`.
+
 **Parameters**

-- `x`: The first data column. [Numeric](../../../native-protocol/columns.md)
-- `y`: The second data column. [Numeric](../../../native-protocol/columns.md)
+- `x`: Population of values to find the population variance of. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal*](../../data-types/decimal.md).

 **Returned value**

-Returns an integer of type `Float64`.
-
-**Implementation details**
-
-This function uses a numerically unstable algorithm. If you need numerical stability in calculations, use the slower but more stable [`varPopStable`](#varpopstable) function.
+- Returns the population variance of `x`. [`Float64`](../../data-types/float.md).

 **Example**

@@ -37,69 +32,21 @@ Query:
 DROP TABLE IF EXISTS test_data;
 CREATE TABLE test_data
 (
-    x Int32,
-    y Int32
+    x UInt8,
 )
 ENGINE = Memory;

-INSERT INTO test_data VALUES (1, 2), (2, 3), (3, 5), (4, 6), (5, 8);
+INSERT INTO test_data VALUES (3), (3), (3), (4), (4), (5), (5), (7), (11), (15);

 SELECT
-    covarPop(x, y) AS covar_pop
+    varPop(x) AS var_pop
 FROM test_data;
 ```

 Result:

 ```response
-3
-```
-
-## varPopStable
-
-Calculates population covariance between two data columns using a stable, numerically accurate method to calculate the variance. This function is designed to provide reliable results even with large datasets or values that might cause numerical instability in other implementations.
-
-**Syntax**
-
-```sql
-covarPopStable(x, y)
-```
-
-**Parameters**
-
-- `x`: The first data column. [String literal](../../syntax#syntax-string-literal)
-- `y`: The second data column. [Expression](../../syntax#syntax-expressions)
-
-**Returned value**
-
-Returns an integer of type `Float64`.
-
-**Implementation details**
-
-Unlike [`varPop`](#varpop), this function uses a stable, numerically accurate algorithm to calculate the population variance to avoid issues like catastrophic cancellation or loss of precision. This function also handles `NaN` and `Inf` values correctly, excluding them from calculations.
-
-**Example**
-
-Query:
-
-```sql
-DROP TABLE IF EXISTS test_data;
-CREATE TABLE test_data
-(
-    x Int32,
-    y Int32
-)
-ENGINE = Memory;
-
-INSERT INTO test_data VALUES (1, 2), (2, 9), (9, 5), (4, 6), (5, 8);
-
-SELECT
-    covarPopStable(x, y) AS covar_pop_stable
-FROM test_data;
-```
-
-Result:
-
-```response
-0.5999999999999999
+┌─var_pop─┐
+│    14.4 │
+└─────────┘
 ```
--- a/docs/en/sql-reference/aggregate-functions/reference/varpopstable.md
+++ b/docs/en/sql-reference/aggregate-functions/reference/varpopstable.md
@@ -0,0 +1,52 @@
+---
+title: "varPopStable"
+slug: "/en/sql-reference/aggregate-functions/reference/varpopstable"
+sidebar_position: 32
+---
+
+## varPopStable
+
+Returns the population variance. Unlike [`varPop`](../reference/varpop.md), this function uses a [numerically stable](https://en.wikipedia.org/wiki/Numerical_stability) algorithm. It works slower but provides a lower computational error.
+
+**Syntax**
+
+```sql
+varPopStable(x)
+```
+
+Alias: `VAR_POP_STABLE`.
+
+**Parameters**
+
+- `x`: Population of values to find the population variance of. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal*](../../data-types/decimal.md).
+
+**Returned value**
+
+- Returns the population variance of `x`. [Float64](../../data-types/float.md).
+
+**Example**
+
+Query:
+
+```sql
+DROP TABLE IF EXISTS test_data;
+CREATE TABLE test_data
+(
+    x UInt8,
+)
+ENGINE = Memory;
+
+INSERT INTO test_data VALUES (3),(3),(3),(4),(4),(5),(5),(7),(11),(15);
+
+SELECT
+    varPopStable(x) AS var_pop_stable
+FROM test_data;
+```
+
+Result:
+
+```response
+┌─var_pop_stable─┐
+│           14.4 │
+└────────────────┘
+```
--- a/docs/en/sql-reference/aggregate-functions/reference/varsamp.md
+++ b/docs/en/sql-reference/aggregate-functions/reference/varsamp.md
@@ -4,8 +4,6 @@ slug: /en/sql-reference/aggregate-functions/reference/varsamp
 sidebar_position: 33
 ---

-This page contains information on the `varSamp` and `varSampStable` ClickHouse functions.
-
 ## varSamp

 Calculate the sample variance of a data set.
@@ -13,24 +11,27 @@ Calculate the sample variance of a data set.
 **Syntax**

 ```sql
-varSamp(expr)
+varSamp(x)
 ```

+Alias: `VAR_SAMP`.
+
 **Parameters**

-- `expr`: An expression representing the data set for which you want to calculate the sample variance. [Expression](../../syntax#syntax-expressions)
+- `x`: The population for which you want to calculate the sample variance. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal*](../../data-types/decimal.md).

 **Returned value**

-Returns a Float64 value representing the sample variance of the input data set.
+
+- Returns the sample variance of the input data set `x`. [Float64](../../data-types/float.md).

 **Implementation details**

-The `varSamp()` function calculates the sample variance using the following formula:
+The `varSamp` function calculates the sample variance using the following formula:

-```plaintext
-∑(x - mean(x))^2 / (n - 1)
-```
+$$
+\sum\frac{(x - \text{mean}(x))^2}{(n - 1)}
+$$

 Where:

@@ -38,91 +39,29 @@ Where:
 - `mean(x)` is the arithmetic mean of the data set.
 - `n` is the number of data points in the data set.

-The function assumes that the input data set represents a sample from a larger population. If you want to calculate the variance of the entire population (when you have the complete data set), you should use the [`varPop()` function](./varpop#varpop) instead.
-
-This function uses a numerically unstable algorithm. If you need numerical stability in calculations, use the slower but more stable [`varSampStable`](#varsampstable) function.
+The function assumes that the input data set represents a sample from a larger population. If you want to calculate the variance of the entire population (when you have the complete data set), you should use [`varPop`](../reference/varpop.md) instead.

 **Example**

 Query:

 ```sql
-CREATE TABLE example_table
+DROP TABLE IF EXISTS test_data;
+CREATE TABLE test_data
 (
-    id UInt64,
-    value Float64
+    x Float64
 )
-ENGINE = MergeTree
-ORDER BY id;
+ENGINE = Memory;

-INSERT INTO example_table VALUES (1, 10.5), (2, 12.3), (3, 9.8), (4, 11.2), (5, 10.7);
+INSERT INTO test_data VALUES (10.5), (12.3), (9.8), (11.2), (10.7);

-SELECT varSamp(value) FROM example_table;
+SELECT round(varSamp(x),3) AS var_samp FROM test_data;
 ```

 Response:

 ```response
-0.8650000000000091
+┌─var_samp─┐
+│    0.865 │
+└──────────┘
 ```
-
-## varSampStable
-
-Calculate the sample variance of a data set using a numerically stable algorithm.
-
-**Syntax**
-
-```sql
-varSampStable(expr)
-```
-
-**Parameters**
-
-- `expr`: An expression representing the data set for which you want to calculate the sample variance. [Expression](../../syntax#syntax-expressions)
-
-**Returned value**
-
-The `varSampStable` function returns a Float64 value representing the sample variance of the input data set.
-
-**Implementation details**
-
-The `varSampStable` function calculates the sample variance using the same formula as the [`varSamp`](#varsamp) function:
-
-```plaintext
-∑(x - mean(x))^2 / (n - 1)
-```
-
-Where:
-- `x` is each individual data point in the data set.
-- `mean(x)` is the arithmetic mean of the data set.
-- `n` is the number of data points in the data set.
-
-The difference between `varSampStable` and `varSamp` is that `varSampStable` is designed to provide a more deterministic and stable result when dealing with floating-point arithmetic. It uses an algorithm that minimizes the accumulation of rounding errors, which can be particularly important when dealing with large data sets or data with a wide range of values.
-
-Like `varSamp`, the `varSampStable` function assumes that the input data set represents a sample from a larger population. If you want to calculate the variance of the entire population (when you have the complete data set), you should use the [`varPopStable`](./varpop#varpopstable) function instead.
-
-**Example**
-
-Query:
-
-```sql
-CREATE TABLE example_table
-(
-    id UInt64,
-    value Float64
-)
-ENGINE = MergeTree
-ORDER BY id;
-
-INSERT INTO example_table VALUES (1, 10.5), (2, 12.3), (3, 9.8), (4, 11.2), (5, 10.7);
-
-SELECT varSampStable(value) FROM example_table;
-```
-
-Response:
-
-```response
-0.865
-```
-
-This query calculates the sample variance of the `value` column in the `example_table` using the `varSampStable()` function. The result shows that the sample variance of the values `[10.5, 12.3, 9.8, 11.2, 10.7]` is approximately 0.865, which may differ slightly from the result of `varSamp` due to the more precise handling of floating-point arithmetic.
--- a/docs/en/sql-reference/aggregate-functions/reference/varsampstable.md
+++ b/docs/en/sql-reference/aggregate-functions/reference/varsampstable.md
@@ -0,0 +1,63 @@
+---
+title: "varSampStable"
+slug: /en/sql-reference/aggregate-functions/reference/varsampstable
+sidebar_position: 33
+---
+
+## varSampStable
+
+Calculate the sample variance of a data set. Unlike [`varSamp`](../reference/varsamp.md), this function uses a numerically stable algorithm. It works slower but provides a lower computational error.
+
+**Syntax**
+
+```sql
+varSampStable(x)
+```
+
+Alias: `VAR_SAMP_STABLE`
+
+**Parameters**
+
+- `x`: The population for which you want to calculate the sample variance. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal*](../../data-types/decimal.md).
+
+**Returned value**
+
+- Returns the sample variance of the input data set. [Float64](../../data-types/float.md).
+
+**Implementation details**
+
+The `varSampStable` function calculates the sample variance using the same formula as the [`varSamp`](../reference/varsamp.md):
+
+$$
+\sum\frac{(x - \text{mean}(x))^2}{(n - 1)}
+$$
+
+Where:
+- `x` is each individual data point in the data set.
+- `mean(x)` is the arithmetic mean of the data set.
+- `n` is the number of data points in the data set.
+
+**Example**
+
+Query:
+
+```sql
+DROP TABLE IF EXISTS test_data;
+CREATE TABLE test_data
+(
+    x Float64
+)
+ENGINE = Memory;
+
+INSERT INTO test_data VALUES (10.5), (12.3), (9.8), (11.2), (10.7);
+
+SELECT round(varSampStable(x),3) AS var_samp_stable FROM test_data;
+```
+
+Response:
+
+```response
+┌─var_samp_stable─┐
+│           0.865 │
+└─────────────────┘
+```
--- a/utils/check-style/aspell-ignore/en/aspell-dict.txt
+++ b/utils/check-style/aspell-ignore/en/aspell-dict.txt
@@ -2852,7 +2852,9 @@ variantElement
 variantType
 varint
 varpop
+varpopstable
 varsamp
+varsampstable
 vectorized
 vectorscan
 vendoring