From 1feb20d9e01b17876e768c8db2672f9d28e4eac9 Mon Sep 17 00:00:00 2001
From: BayoNet
Date: Tue, 23 Jul 2019 11:01:08 +0300
Subject: [PATCH 1/4] DOCAPI-7460: The histogram function docs.

---
 .../agg_functions/parametric_functions.md | 36 +++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/docs/en/query_language/agg_functions/parametric_functions.md b/docs/en/query_language/agg_functions/parametric_functions.md
index c6a9694ed0c..cefc9e6777f 100644
--- a/docs/en/query_language/agg_functions/parametric_functions.md
+++ b/docs/en/query_language/agg_functions/parametric_functions.md
@@ -2,6 +2,42 @@
 
 Some aggregate functions can accept not only argument columns (used for compression), but a set of parameters – constants for initialization. The syntax is two pairs of brackets instead of one. The first is for parameters, and the second is for arguments.
 
+## histogram
+
+Calculates a histogram.
+
+```
+histogram(number_of_bins)(values)
+```
+
+**Parameters**
+
+`number_of_bins` — Number of bins for the histogram.
+`values` — [Expression](../syntax.md#expressions) resulting in a data sample.
+
+**Returned values**
+
+- [Array](../../data_types/array.md) of [Tuples](../../data_types/tuple.md) of the following format:
+
+    ```
+    [(lower_1, upper_1, height_1), ... (lower_N, upper_N, height_N)]
+    ```
+
+    - `lower` — Lower bound of the bin.
+    - `upper` — Upper bound of the bin.
+    - `height` — Calculated height of the bin.
+
+**Example**
+
+```sql
+SELECT histogram(5)(number + 1) FROM (SELECT * FROM system.numbers LIMIT 20)
+```
+```text
+┌─histogram(5)(plus(number, 1))───────────────────────────────────────────┐
+│ [(1,4.5,4),(4.5,8.5,4),(8.5,12.75,4.125),(12.75,17,4.625),(17,20,3.25)] │
+└─────────────────────────────────────────────────────────────────────────┘
+```
+
 ## sequenceMatch(pattern)(time, cond1, cond2, ...)
 
 Pattern matching for event chains.
From 221ab6a04f32b3be40cf80c123b8b67d5609fd01 Mon Sep 17 00:00:00 2001
From: BayoNet
Date: Tue, 23 Jul 2019 11:18:09 +0300
Subject: [PATCH 2/4] DOCAPI-7460: Link fix.

---
 docs/en/query_language/agg_functions/parametric_functions.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/en/query_language/agg_functions/parametric_functions.md b/docs/en/query_language/agg_functions/parametric_functions.md
index cefc9e6777f..da6052545dc 100644
--- a/docs/en/query_language/agg_functions/parametric_functions.md
+++ b/docs/en/query_language/agg_functions/parametric_functions.md
@@ -13,7 +13,7 @@ histogram(number_of_bins)(values)
 **Parameters**
 
 `number_of_bins` — Number of bins for the histogram.
-`values` — [Expression](../syntax.md#expressions) resulting in a data sample.
+`values` — [Expression](../syntax.md#syntax-expressions) resulting in a data sample.
 
 **Returned values**
 

From bd493727b655d95bcadcce9bff7a3a4aad8cf304 Mon Sep 17 00:00:00 2001
From: BayoNet
Date: Wed, 31 Jul 2019 08:55:10 +0300
Subject: [PATCH 3/4] DOCAPI-7460: Added link to algorithm.

---
 docs/en/query_language/agg_functions/parametric_functions.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/en/query_language/agg_functions/parametric_functions.md b/docs/en/query_language/agg_functions/parametric_functions.md
index da6052545dc..d27cb5d9431 100644
--- a/docs/en/query_language/agg_functions/parametric_functions.md
+++ b/docs/en/query_language/agg_functions/parametric_functions.md
@@ -10,6 +10,8 @@ Calculates a histogram.
 histogram(number_of_bins)(values)
 ```
 
+The functions uses [A Streaming Parallel Decision Tree Algorithm](http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf). It calculates the borders of histogram bins automatically, and in common case the widths of bins are not equal.
+
 **Parameters**
 
 `number_of_bins` — Number of bins for the histogram.
From c43f9030da6fa0464b08a57972048f1a3f31d2f7 Mon Sep 17 00:00:00 2001
From: BayoNet
Date: Tue, 20 Aug 2019 18:36:08 +0300
Subject: [PATCH 4/4] DOCAPI-7460: Clarifications.

---
 .../agg_functions/parametric_functions.md | 43 ++++++++++++++++---
 .../functions/other_functions.md | 2 +-
 2 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/docs/en/query_language/agg_functions/parametric_functions.md b/docs/en/query_language/agg_functions/parametric_functions.md
index d27cb5d9431..84898a61133 100644
--- a/docs/en/query_language/agg_functions/parametric_functions.md
+++ b/docs/en/query_language/agg_functions/parametric_functions.md
@@ -4,18 +4,18 @@ Some aggregate functions can accept not only argument columns (used for compress
 
 ## histogram
 
-Calculates a histogram.
+Calculates an adaptive histogram. It doesn't guarantee precise results.
 
 ```
 histogram(number_of_bins)(values)
 ```
-
-The functions uses [A Streaming Parallel Decision Tree Algorithm](http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf). It calculates the borders of histogram bins automatically, and in common case the widths of bins are not equal.
+
+The function uses [A Streaming Parallel Decision Tree Algorithm](http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf). The borders of histogram bins are adjusted as new data enters the function, and in the common case the bin widths are not equal.
 
 **Parameters**
 
-`number_of_bins` — Number of bins for the histogram.
-`values` — [Expression](../syntax.md#syntax-expressions) resulting in a data sample.
+`number_of_bins` — Upper limit for the number of bins in the histogram. The function calculates the number of bins automatically. It tries to reach the specified number of bins, but if it fails, it uses fewer bins.
+`values` — [Expression](../syntax.md#syntax-expressions) resulting in input values.
 
 **Returned values**
 
@@ -32,7 +32,12 @@ The functions uses [A Streaming Parallel Decision Tree Algorithm](http://jmlr.or
 **Example**
 
 ```sql
-SELECT histogram(5)(number + 1) FROM (SELECT * FROM system.numbers LIMIT 20)
+SELECT histogram(5)(number + 1)
+FROM (
+    SELECT *
+    FROM system.numbers
+    LIMIT 20
+)
 ```
 ```text
 ┌─histogram(5)(plus(number, 1))───────────────────────────────────────────┐
@@ -40,6 +45,32 @@ SELECT histogram(5)(number + 1) FROM (SELECT * FROM system.numbers LIMIT 20)
 └─────────────────────────────────────────────────────────────────────────┘
 ```
+
+You can visualize a histogram with the [bar](../functions/other_functions.md#function-bar) function, for example:
+
+```sql
+WITH histogram(5)(rand() % 100) AS hist
+SELECT
+    arrayJoin(hist).3 AS height,
+    bar(height, 0, 6, 5) AS bar
+FROM
+(
+    SELECT *
+    FROM system.numbers
+    LIMIT 20
+)
+```
+```text
+┌─height─┬─bar───┐
+│  2.125 │ █▋    │
+│   3.25 │ ██▌   │
+│  5.625 │ ████▏ │
+│  5.625 │ ████▏ │
+│  3.375 │ ██▌   │
+└────────┴───────┘
+```
+
+In this case, remember that you don't know the borders of the histogram bins.
+
 ## sequenceMatch(pattern)(time, cond1, cond2, ...)
 
 Pattern matching for event chains.
diff --git a/docs/en/query_language/functions/other_functions.md b/docs/en/query_language/functions/other_functions.md
index 007f1352775..268c245d24f 100644
--- a/docs/en/query_language/functions/other_functions.md
+++ b/docs/en/query_language/functions/other_functions.md
@@ -120,7 +120,7 @@ Accepts constant strings: database name, table name, and column name. Returns a
 The function throws an exception if the table does not exist.
 For elements in a nested data structure, the function checks for the existence of a column. For the nested data structure itself, the function returns 0.
 
-## bar
+## bar {#function-bar}
 
 Allows building a unicode-art diagram.