mirror of https://github.com/ClickHouse/ClickHouse.git (synced 2024-09-20 08:40:50 +00:00)

return docs back, added test

parent 6910cdbac4
commit 090f05cf6c
@ -0,0 +1,55 @@
---
slug: /en/sql-reference/aggregate-functions/reference/approxtopk
sidebar_position: 212
---

# approx_top_k

Returns an array of the approximately most frequent values and their counts in the specified column. The resulting array is sorted in descending order of approximate frequency of values (not by the values themselves).

**Syntax**

``` sql
approx_top_k(N)(column)
approx_top_k(N, reserved)(column)
```

This function does not provide a guaranteed result. In certain situations, errors might occur, and it might return frequent values that aren’t the most frequent ones.

We recommend using `N < 10`; performance degrades with large values of `N`. The maximum value of `N` is 65536.

**Parameters**

- `N` — The number of elements to return. Optional. Default value: 10.
- `reserved` — Defines how many cells are reserved for values. If uniq(column) > reserved, the result of the function is approximate. Optional. Default value: N * 3.

**Arguments**

- `column` — The value whose frequency is calculated.
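
The `reserved` parameter appears to play the same role as `N * load_factor` in `topK`. The functional test added in this commit checks that `approx_top_k(3, 6)` and `topK(3, 2, 'counts')` return the same array. The query below is a minimal sketch taken from that test. The derived key `k` is only illustrative test data, and the column aliases are arbitrary.

``` sql
-- Test data: k = '<number of digits in number^2>_<number % 2>' for number in [0, 1000).
-- N = 3 and reserved = 6 cells; topK reserves N * load_factor = 3 * 2 = 6 cells.
SELECT
    approx_top_k(3, 6)(k) AS approx_top_k,
    topK(3, 2, 'counts')(k) AS topK_counts
FROM
(
    SELECT concat(countDigits(number * number), '_', number % 2) AS k
    FROM numbers(1000)
);
```

According to the test reference file, both columns contain `[('6_1',342,11),('6_0',341,11),('5_0',109,3)]`. The exact counts of these keys are 342, 341 and 109.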

**Example**

Query:

``` sql
SELECT approx_top_k(2)(k)
FROM VALUES('k Char, w UInt64', ('y', 1), ('y', 1), ('x', 5), ('y', 1), ('z', 10));
```

Result:

``` text
┌─approx_top_k(2)(k)────┐
│ [('y',3,0),('x',1,0)] │
└───────────────────────┘
```

# approx_top_count

Alias for the `approx_top_k` function.
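
Because `approx_top_count` is an alias, it takes the same parameters as `approx_top_k` and should return the same array. The query below is a quick check that reuses the sample data from the `approx_top_k` example. The column aliases `via_alias` and `via_name` are arbitrary.

``` sql
-- Both columns should contain the same array.
SELECT
    approx_top_count(2)(k) AS via_alias,
    approx_top_k(2)(k) AS via_name
FROM VALUES('k Char, w UInt64', ('y', 1), ('y', 1), ('x', 5), ('y', 1), ('z', 10));
```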

**See Also**

- [topK](../../../sql-reference/aggregate-functions/reference/topk.md)
- [topKWeighted](../../../sql-reference/aggregate-functions/reference/topkweighted.md)
- [approx_top_sum](../../../sql-reference/aggregate-functions/reference/approxtopsum.md)

@ -0,0 +1,51 @@
---
slug: /en/sql-reference/aggregate-functions/reference/approxtopsum
sidebar_position: 212
---

# approx_top_sum

Returns an array of the approximately most frequent values and their counts in the specified column, where each occurrence of a value contributes its `weight` to the count. The resulting array is sorted in descending order of approximate frequency of values (not by the values themselves).

**Syntax**

``` sql
approx_top_sum(N)(column, weight)
approx_top_sum(N, reserved)(column, weight)
```

This function does not provide a guaranteed result. In certain situations, errors might occur, and it might return frequent values that aren’t the most frequent ones.

We recommend using `N < 10`; performance degrades with large values of `N`. The maximum value of `N` is 65536.

**Parameters**

- `N` — The number of elements to return. Optional. Default value: 10.
- `reserved` — Defines how many cells are reserved for values. If uniq(column) > reserved, the result of the function is approximate. Optional. Default value: N * 3.

**Arguments**

- `column` — The value whose frequency is calculated.
- `weight` — The weight. Each value is counted `weight` times in the frequency calculation. [UInt64](../../../sql-reference/data-types/int-uint.md).
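
The functional test added in this commit weights each key by `w = number`. With `N = 3` and `reserved = 6`, it expects `approx_top_sum` to match `topKWeighted(3, 2, 'counts')`. The query below is a sketch taken from that test. The key and weight columns are only illustrative test data.

``` sql
-- k = '<number of digits in number^2>_<number % 2>', weighted by w = number.
SELECT
    approx_top_sum(3, 6)(k, w) AS approx_top_sum,
    topKWeighted(3, 2, 'counts')(k, w) AS topKWeighted_counts
FROM
(
    SELECT
        concat(countDigits(number * number), '_', number % 2) AS k,
        number AS w
    FROM numbers(1000)
);
```

According to the test reference file, both columns contain `[('6_1',225036,0),('6_0',224378,0),('5_0',22672,0)]`. These values match the exact weighted sums of those keys.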

**Example**

Query:

``` sql
SELECT approx_top_sum(2)(k, w)
FROM VALUES('k Char, w UInt64', ('y', 1), ('y', 1), ('x', 5), ('y', 1), ('z', 10))
```

Result:

``` text
┌─approx_top_sum(2)(k, w)─┐
│ [('z',10,0),('x',5,0)]  │
└─────────────────────────┘
```

**See Also**

- [topK](../../../sql-reference/aggregate-functions/reference/topk.md)
- [topKWeighted](../../../sql-reference/aggregate-functions/reference/topkweighted.md)
- [approx_top_k](../../../sql-reference/aggregate-functions/reference/approxtopk.md)
@ -11,21 +11,23 @@ Implements the [Filtered Space-Saving](https://doi.org/10.1016/j.ins.2010.08.024

``` sql
topK(N)(column)
topK(N, load_factor)(column)
topK(N, load_factor, 'counts')(column)
```

This function does not provide a guaranteed result. In certain situations, errors might occur, and it might return frequent values that aren’t the most frequent ones.

We recommend using `N < 10`; performance degrades with large values of `N`. The maximum value of `N` is 65536.

**Parameters**

- `N` — The number of elements to return. Optional. Default value: 10.
- `load_factor` — Defines how many cells are reserved for values. If uniq(column) > N * load_factor, the result of the topK function is approximate. Optional. Default value: 3.
- `counts` — Defines whether the result should contain the approximate count and error value.

**Arguments**

- `column` — The value whose frequency is calculated.
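
With the `'counts'` parameter, each element of the result is a tuple containing the value, its approximate count, and the error value. The query below is a sketch that uses the data from the functional test added in this commit. The key `k` is only illustrative test data.

``` sql
-- k = '<number of digits in number^2>_<number % 2>' for number in [0, 1000).
SELECT topK(3, 2, 'counts')(k) AS topK_counts
FROM
(
    SELECT concat(countDigits(number * number), '_', number % 2) AS k
    FROM numbers(1000)
);
```

The test reference file records `[('6_1',342,11),('6_0',341,11),('5_0',109,3)]` for this query. The exact counts of these keys are 342, 341 and 109. In this case the approximate counts equal the exact ones, even though the error values are non-zero.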

**Example**

@ -41,3 +43,9 @@ FROM ontime
│ [19393,19790,19805] │
└─────────────────────┘
```
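
The functional test added in this commit also exercises `topK` with the `-State` and `-Merge` combinators. In that test, `topKMerge` must be called with the same parameters as the `topKState` that produced the state; otherwise the query fails with `ILLEGAL_TYPE_OF_ARGUMENT`. The query below is a sketch of the valid form, taken from that test.

``` sql
-- The parameters of topKMerge (4, 2, 'counts') match those used by topKState.
SELECT topKMerge(4, 2, 'counts')(state)
FROM
(
    SELECT topKState(4, 2, 'counts')(countDigits(number * number)) AS state
    FROM numbers(1000)
);
```

The test reference file records `[(6,683,0),(5,217,0),(4,68,0),(3,22,0)]` for this query. The test expects `ILLEGAL_TYPE_OF_ARGUMENT` in three cases: merging with `(4, 3, 'counts')`, dropping `'counts'`, or omitting the parameters entirely.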

**See Also**

- [topKWeighted](../../../sql-reference/aggregate-functions/reference/topkweighted.md)
- [approx_top_k](../../../sql-reference/aggregate-functions/reference/approxtopk.md)
- [approx_top_sum](../../../sql-reference/aggregate-functions/reference/approxtopsum.md)
@ -10,13 +10,20 @@ Returns an array of the approximately most frequent values in the specified colu
**Syntax**

``` sql
topKWeighted(N)(column, weight)
topKWeighted(N, load_factor)(column, weight)
topKWeighted(N, load_factor, 'counts')(column, weight)
```

**Parameters**

- `N` — The number of elements to return. Optional. Default value: 10.
- `load_factor` — Defines how many cells are reserved for values. If uniq(column) > N * load_factor, the result of the function is approximate. Optional. Default value: 3.
- `counts` — Defines whether the result should contain the approximate count and error value.

**Arguments**

- `column` — The value whose frequency is calculated.
- `weight` — The weight. Each value is counted `weight` times in the frequency calculation. [UInt64](../../../sql-reference/data-types/int-uint.md).
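
The sketch below compares `topKWeighted` with an exact weighted aggregation. It uses the data from the functional test added in this commit, where each key is weighted by `w = number`. The column aliases `topKWeighted_counts` and `total` are arbitrary.

``` sql
-- Approximate weighted top 3: tuples of (value, approximate weighted count, error).
SELECT topKWeighted(3, 2, 'counts')(k, w) AS topKWeighted_counts
FROM
(
    SELECT
        concat(countDigits(number * number), '_', number % 2) AS k,
        number AS w
    FROM numbers(1000)
);

-- Exact weighted top 3 for comparison.
SELECT k, sum(w) AS total
FROM
(
    SELECT
        concat(countDigits(number * number), '_', number % 2) AS k,
        number AS w
    FROM numbers(1000)
)
GROUP BY k
ORDER BY total DESC
LIMIT 3;
```

According to the test reference file, the first query returns `[('6_1',225036,0),('6_0',224378,0),('5_0',22672,0)]`. The exact weighted sums recorded in the same file are 225036, 224378 and 22672.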

**Returned value**
@ -40,6 +47,23 @@ Result:
└────────────────────────┘
```

Query:

``` sql
SELECT topKWeighted(2, 10, 'counts')(k, w)
FROM VALUES('k Char, w UInt64', ('y', 1), ('y', 1), ('x', 5), ('y', 1), ('z', 10))
```

Result:

``` text
┌─topKWeighted(2, 10, 'counts')(k, w)─┐
│ [('z',10,0),('x',5,0)]              │
└─────────────────────────────────────┘
```

**See Also**

- [topK](../../../sql-reference/aggregate-functions/reference/topk.md)
- [approx_top_k](../../../sql-reference/aggregate-functions/reference/approxtopk.md)
- [approx_top_sum](../../../sql-reference/aggregate-functions/reference/approxtopsum.md)
@ -0,0 +1,2 @@
[('6_1',342),('6_0',341),('5_0',109),('5_1',108),('4_1',34)] [('6_1',225036),('6_0',224378),('5_0',22672),('5_1',22464),('4_1',2244)] [('6_1',342,11),('6_0',341,11),('5_0',109,3)] [('6_1',225036,0),('6_0',224378,0),('5_0',22672,0)] [('6_1',342,11),('6_0',341,11),('5_0',109,3)] [('6_1',342,11),('6_0',341,11),('5_0',109,3)] [('6_1',225036,0),('6_0',224378,0),('5_0',22672,0)]
[(6,683,0),(5,217,0),(4,68,0),(3,22,0)]
@ -0,0 +1,26 @@
WITH
    arraySlice(arrayReverseSort(x -> (x.2), arrayZip(untuple(sumMap(([k], [1]))))), 1, 5) AS topKExact,
    arraySlice(arrayReverseSort(x -> (x.2), arrayZip(untuple(sumMap(([k], [w]))))), 1, 5) AS topKWeightedExact
SELECT
    topKExact,
    topKWeightedExact,
    topK(3, 2, 'counts')(k) AS topK_counts,
    topKWeighted(3, 2, 'counts')(k, w) AS topKWeighted_counts,
    approx_top_count(3, 6)(k) AS approx_top_count,
    approx_top_k(3, 6)(k) AS approx_top_k,
    approx_top_sum(3, 6)(k, w) AS approx_top_sum
FROM
(
    SELECT
        concat(countDigits(number * number), '_', number % 2) AS k,
        number AS w
    FROM numbers(1000)
);

SELECT topKMerge(4, 2, 'counts')(state) FROM ( SELECT topKState(4, 2, 'counts')(countDigits(number * number)) AS state FROM numbers(1000));

SELECT topKMerge(4, 3, 'counts')(state) FROM ( SELECT topKState(4, 2, 'counts')(countDigits(number * number)) AS state FROM numbers(1000)); -- { serverError ILLEGAL_TYPE_OF_ARGUMENT }

SELECT topKMerge(4, 2)(state) FROM ( SELECT topKState(4, 2, 'counts')(countDigits(number * number)) AS state FROM numbers(1000)); -- { serverError ILLEGAL_TYPE_OF_ARGUMENT }

SELECT topKMerge(state) FROM ( SELECT topKState(4, 2, 'counts')(countDigits(number * number)) AS state FROM numbers(1000)); -- { serverError ILLEGAL_TYPE_OF_ARGUMENT }