From a2a8cbb787f182831d70873374b93956c500deae Mon Sep 17 00:00:00 2001
From: taiyang-li <654010905@qq.com>
Date: Tue, 7 Mar 2023 21:41:45 +0800
Subject: [PATCH] add some more docs

---
 .../reference/quantileApprox.md | 35 +++++++++++++----
 .../reference/quantiles.md      | 38 +++++++++++++++----
 2 files changed, 59 insertions(+), 14 deletions(-)

diff --git a/docs/en/sql-reference/aggregate-functions/reference/quantileApprox.md b/docs/en/sql-reference/aggregate-functions/reference/quantileApprox.md
index 788f4754324..21b9a3500c4 100644
--- a/docs/en/sql-reference/aggregate-functions/reference/quantileApprox.md
+++ b/docs/en/sql-reference/aggregate-functions/reference/quantileApprox.md
@@ -5,8 +5,9 @@ sidebar_position: 204
 
 # quantileApprox
 
-Computes the [quantile](https://en.wikipedia.org/wiki/Quantile) of a numeric data sequence using the [Greenwald-Khanna](http://infolab.stanford.edu/~datar/courses/cs361a/papers/quantiles.pdf) algorithm.
+Computes the [quantile](https://en.wikipedia.org/wiki/Quantile) of a numeric data sequence using the [Greenwald-Khanna](http://infolab.stanford.edu/~datar/courses/cs361a/papers/quantiles.pdf) algorithm. Michael Greenwald and Sanjeev Khanna introduced the algorithm in 2001. It computes approximate quantiles over a stream of data in a single pass and keeps only a compact summary of the values seen so far. It is widely used in databases and big data systems that need quantiles over large data streams in real time. For a relative error ε, the summary stores O((1/ε) log(εn)) entries, where n is the number of input values. Every returned quantile is guaranteed to have a rank within εn of the requested rank.
 
+Unlike the other quantile functions in ClickHouse, `quantileApprox` lets the user control the accuracy of the approximate result.
 
 **Syntax**
 
@@ -18,7 +19,7 @@ Alias: `medianApprox`.
 
 **Arguments**
 
-- `accuracy` — Accuracy of quantile. Constant positive integer. The larger the better.
+- `accuracy` — Accuracy of quantile. Constant positive integer. A larger value gives a smaller error: the rank of the returned value differs from the requested rank by at most `1 / accuracy` of the number of input rows. For example, with `accuracy` set to 100, the rank error is no greater than 1%. There is a trade-off between accuracy and cost. A larger `accuracy` requires more memory and computation. A smaller one is faster and more memory-efficient but less accurate.
 
 - `level` — Level of quantile. Optional parameter. Constant floating-point number from 0 to 1. Default value: 0.5. At `level=0.5` the function calculates [median](https://en.wikipedia.org/wiki/Median).
 
@@ -39,13 +40,33 @@ Type:
 **Example**
 
 ``` sql
-WITH arrayJoin([0, 6, 7, 9, 10]) AS x
-SELECT quantileApprox(100, 0.5)(x)
+SELECT quantileApprox(1, 0.25)(number + 1)
+FROM numbers(1000)
 
+┌─quantileApprox(1, 0.25)(plus(number, 1))─┐
+│                                        1 │
+└──────────────────────────────────────────┘
 
-┌─quantileApprox(100, 0.5)(x)─┐
-│                           7 │
-└──────────────----───────────┘
+SELECT quantileApprox(10, 0.25)(number + 1)
+FROM numbers(1000)
+
+┌─quantileApprox(10, 0.25)(plus(number, 1))─┐
+│                                       156 │
+└───────────────────────────────────────────┘
+
+SELECT quantileApprox(100, 0.25)(number + 1)
+FROM numbers(1000)
+
+┌─quantileApprox(100, 0.25)(plus(number, 1))─┐
+│                                        251 │
+└────────────────────────────────────────────┘
+
+SELECT quantileApprox(1000, 0.25)(number + 1)
+FROM numbers(1000)
+
+┌─quantileApprox(1000, 0.25)(plus(number, 1))─┐
+│                                         249 │
+└─────────────────────────────────────────────┘
 ```
 
 
diff --git a/docs/en/sql-reference/aggregate-functions/reference/quantiles.md b/docs/en/sql-reference/aggregate-functions/reference/quantiles.md
index ec5855397ce..663e8771723 100644
--- a/docs/en/sql-reference/aggregate-functions/reference/quantiles.md
+++ b/docs/en/sql-reference/aggregate-functions/reference/quantiles.md
@@ -117,7 +117,9 @@ Result:
 
 ## quantilesApprox
 
-Exactly computes the [quantiles](https://en.wikipedia.org/wiki/Quantile) of a numeric data sequence using the [Greenwald-Khanna](https://dl.acm.org/doi/10.1145/375663.375670) algorithm.
+Computes the [quantiles](https://en.wikipedia.org/wiki/Quantile) of a numeric data sequence using the [Greenwald-Khanna](http://infolab.stanford.edu/~datar/courses/cs361a/papers/quantiles.pdf) algorithm. Michael Greenwald and Sanjeev Khanna introduced the algorithm in 2001. It computes approximate quantiles over a stream of data in a single pass and keeps only a compact summary of the values seen so far. It is widely used in databases and big data systems that need quantiles over large data streams in real time. For a relative error ε, the summary stores O((1/ε) log(εn)) entries, where n is the number of input values. Every returned quantile is guaranteed to have a rank within εn of the requested rank.
+
+Unlike the other quantiles functions in ClickHouse, `quantilesApprox` lets the user control the accuracy of the approximate quantiles.
 
 **Syntax**
 
@@ -131,7 +133,7 @@ quantilesApprox(accuracy, level1, level2, ...)(expr)
 
 **Parameters**
 
-- `accuracy` — Accuracy of quantile. Constant positive integer. The larger the better.
+- `accuracy` — Accuracy of quantile. Constant positive integer. A larger value gives a smaller error: the rank of each returned value differs from its requested rank by at most `1 / accuracy` of the number of input rows. For example, with `accuracy` set to 100, the rank error is no greater than 1%. There is a trade-off between accuracy and cost. A larger `accuracy` requires more memory and computation. A smaller one is faster and more memory-efficient but less accurate.
 
 - `levelN` — Level of quantile. Constant floating-point number from 0 to 1.
 
@@ -151,10 +153,32 @@ Query:
 
 
 ``` sql
-WITH arrayJoin([0, 6, 7, 9, 10]) AS x
-SELECT quantilesApprox(100, 0.1, 0.2, 0.3)(x)
+SELECT quantilesApprox(1, 0.25, 0.5, 0.75)(number + 1)
+FROM numbers(1000)
 
-┌─quantilesApprox(100, 0.1, 0.2, 0.3)(x)─┐
-│ [0,0,6]                                │
-└────────────────────────────────────────┘
+┌─quantilesApprox(1, 0.25, 0.5, 0.75)(plus(number, 1))─┐
+│ [1,1,1]                                              │
+└──────────────────────────────────────────────────────┘
+
+SELECT quantilesApprox(10, 0.25, 0.5, 0.75)(number + 1)
+FROM numbers(1000)
+
+┌─quantilesApprox(10, 0.25, 0.5, 0.75)(plus(number, 1))─┐
+│ [156,413,659]                                         │
+└───────────────────────────────────────────────────────┘
+
+
+SELECT quantilesApprox(100, 0.25, 0.5, 0.75)(number + 1)
+FROM numbers(1000)
+
+┌─quantilesApprox(100, 0.25, 0.5, 0.75)(plus(number, 1))─┐
+│ [251,498,741]                                          │
+└────────────────────────────────────────────────────────┘
+
+SELECT quantilesApprox(1000, 0.25, 0.5, 0.75)(number + 1)
+FROM numbers(1000)
+
+┌─quantilesApprox(1000, 0.25, 0.5, 0.75)(plus(number, 1))─┐
+│ [249,499,749]                                           │
+└─────────────────────────────────────────────────────────┘
 ```
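The `accuracy` parameter documented in this patch can be illustrated with a simplified, textbook Greenwald-Khanna summary. The Python code is a sketch, not ClickHouse's C++ implementation. It assumes that `accuracy` maps to a relative rank error ε = 1 / accuracy, which is consistent with the "accuracy 100 means 1%" example. The class and method names (`GKSummary`, `insert`, `compress`, `query`) are hypothetical. Each entry stores a value together with bounds on its rank. Entries may only be merged while those bounds stay within 2εn, and this constraint is what lets a query return a value whose rank is within εn of the requested one.

```python
import math
import random


class GKSummary:
    """Simplified Greenwald-Khanna quantile summary (hypothetical helper).

    Each entry is [value, g, delta]:
      * g     = r_min(this entry) - r_min(previous entry)
      * delta = r_max(this entry) - r_min(this entry)
    so the true rank of `value` always lies in [r_min, r_max].
    """

    def __init__(self, eps):
        self.eps = eps      # relative rank error; assumed to be 1 / accuracy
        self.n = 0          # number of values inserted so far
        self.entries = []   # kept sorted by value
        # Compress every 1 / (2 * eps) insertions, a common choice for GK.
        self.period = max(1, int(1 / (2 * eps)))

    def _threshold(self):
        # Merging keeps g + delta <= floor(2 * eps * n) for every entry.
        return math.floor(2 * self.eps * self.n)

    def insert(self, value):
        # Linear scan keeps the sketch short; equal values go after existing ones.
        idx = 0
        while idx < len(self.entries) and self.entries[idx][0] <= value:
            idx += 1
        if idx == 0 or idx == len(self.entries):
            delta = 0  # new minimum or maximum: its rank is known exactly
        else:
            delta = max(0, self._threshold() - 1)
        self.entries.insert(idx, [value, 1, delta])
        self.n += 1
        if self.n % self.period == 0:
            self.compress()

    def compress(self):
        threshold = self._threshold()
        # Walk right to left, folding entry i into entry i + 1 when the merged
        # entry still satisfies the invariant. The first and last entries
        # (the exact minimum and maximum) are never removed.
        i = len(self.entries) - 2
        while i >= 1:
            g = self.entries[i][1]
            _, g_next, delta_next = self.entries[i + 1]
            if g + g_next + delta_next <= threshold:
                self.entries[i + 1][1] += g
                del self.entries[i]
            i -= 1

    def query(self, level):
        if not self.entries:
            raise ValueError("empty summary")
        # Target rank in [1, n]; any entry whose whole rank range lies
        # within eps * n of it is an acceptable answer.
        rank = min(self.n, max(1, math.ceil(level * self.n)))
        bound = self.eps * self.n
        r_min = 0
        for value, g, delta in self.entries:
            r_min += g
            r_max = r_min + delta
            if rank - r_min <= bound and r_max - rank <= bound:
                return value
        return self.entries[-1][0]


# Demo on the same data as the examples: the values 1..1000 in random order.
data = list(range(1, 1001))
random.Random(42).shuffle(data)
summary = GKSummary(eps=1 / 100)  # assumption: eps = 1 / accuracy
for x in data:
    summary.insert(x)
print([summary.query(level) for level in (0.25, 0.5, 0.75)], len(summary.entries))
```

The demo uses ε = 1/100 and the values 1 to 1000, which is the same data as `number + 1` over `numbers(1000)`. Each query must therefore return a value within 10 of 250, 500 and 750. The documented ClickHouse results fall inside the matching bounds: [251,498,741] for `accuracy = 100` is within 10, and [156,413,659] for `accuracy = 10` is within 100. The compression step here is simplified and omits the paper's band structure. As a result, the sketch keeps the rank-error guarantee but makes no claim about the paper's worst-case space bound.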