diff --git a/docs/en/query_language/agg_functions/reference.md b/docs/en/query_language/agg_functions/reference.md
index 4a26063cfa4..868bd4e630e 100644
--- a/docs/en/query_language/agg_functions/reference.md
+++ b/docs/en/query_language/agg_functions/reference.md
@@ -962,9 +962,9 @@ The result is equal to the square root of `varSamp(x)`.
 
 The result is equal to the square root of `varPop(x)`.
 
-## topK(N)(column)
+## topK(N)(x)
 
-Returns an array of the most frequent values in the specified column. The resulting array is sorted in descending order of frequency of values (not by the values themselves).
+Returns an array of the approximately most frequent values in the specified column. The resulting array is sorted in descending order of approximate frequency of values (not by the values themselves).
 
 Implements the [ Filtered Space-Saving](http://www.l2f.inesc-id.pt/~fmmb/wiki/uploads/Work/misnis.ref0a.pdf) algorithm for analyzing TopK, based on the reduce-and-combine algorithm from [Parallel Space Saving](https://arxiv.org/pdf/1401.0702.pdf).
 
@@ -976,10 +976,15 @@ This function doesn't provide a guaranteed result. In certain situations, errors
 
 We recommend using the `N < 10 ` value; performance is reduced with large `N` values. Maximum value of ` N = 65536`.
 
+**Parameters**
+
+- `N` – The number of elements to return.
+
+If the parameter is omitted, the default value 10 is used.
+
 **Arguments**
 
-- 'N' is the number of values.
-- ' x ' – The column.
+- `x` – The value whose frequency is calculated.
 
 **Example**
 
@@ -998,9 +1003,7 @@ FROM ontime
 
 ## topKWeighted {#topkweighted}
 
-Returns an array of the most frequent values in the specified column. The resulting array is sorted by weight of value. If the optional parameter `N` is passed, сreates a slice of array with n elements.
-
-We recommend using the `N < 10 ` value; performance is reduced with large `N` values. Maximum value of ` N = 65536`.
+Similar to `topK` but takes one additional argument of integer type, `weight`. Each value is counted `weight` times in the frequency calculation.
 
 **Syntax**
 
@@ -1010,13 +1013,16 @@ topKWeighted(N)(x, weight)
 
 **Parameters**
 
-- `N` — The number of elements to return. Optional parameter. [UInt8](../../data_types/int_uint.md).
-- `x` – The column.
+- `N` — The number of elements to return.
+
+**Arguments**
+
+- `x` – The value.
 - `weight` — The weight. [UInt8](../../data_types/int_uint.md).
 
 **Returned value**
 
-Returns an array of the most frequent values.
+Returns an array of the values with the maximum approximate sum of weights.
 
 **Example**
 
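The word "approximately" in the new `topK` and `topKWeighted` descriptions can be illustrated with a small sketch. The Python code below is a simplified, weighted Space-Saving counter. It is not ClickHouse's implementation: it omits the filtering step of Filtered Space-Saving and the parallel reduce-and-combine merge. The names `space_saving_top_k`, `top_k`, and `capacity` are hypothetical. The bounded table of `capacity` counters is what makes results approximate in Space-Saving generally; how ClickHouse sizes its table relative to `N` is not stated in this diff and is not assumed here.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def space_saving_top_k(
    stream: Iterable[Tuple[Hashable, int]], capacity: int, n: int = 10
) -> List[Tuple[Hashable, int]]:
    """Simplified weighted Space-Saving (hypothetical helper, not ClickHouse code).

    Returns up to n (value, estimated_weight) pairs, highest estimate first.
    """
    counters: Dict[Hashable, int] = {}  # at most `capacity` monitored values
    for value, weight in stream:
        if value in counters:
            # Already monitored: the value is counted `weight` times.
            counters[value] += weight
        elif len(counters) < capacity:
            # Free slot: start monitoring the value exactly.
            counters[value] = weight
        else:
            # Table full: evict the value with the smallest counter and let the
            # newcomer inherit that count. This is the source of overestimation.
            victim = min(counters, key=counters.__getitem__)
            inherited = counters.pop(victim)
            counters[value] = inherited + weight
    # Sort by estimated weight (not by the values themselves) and keep n entries.
    ranked = sorted(counters.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


def top_k(values: Iterable[Hashable], capacity: int, n: int = 10) -> List[Tuple[Hashable, int]]:
    """Unweighted variant: every value is counted once (weight == 1)."""
    return space_saving_top_k(((v, 1) for v in values), capacity, n)


if __name__ == "__main__":
    # "y" really occurs twice, but inherits the evicted counter of "z".
    print(top_k(["x", "y", "x", "z", "x", "y"], capacity=2, n=2))  # [('x', 3), ('y', 3)]
    # Weighted stream: "a" accumulates 5 + 2.
    print(space_saving_top_k([("a", 5), ("b", 1), ("c", 3), ("a", 2), ("d", 1)], capacity=3, n=2))
```

With `capacity=2`, `y` is reported with an estimate of 3 even though it occurs only twice, because it inherits the counter of the evicted `z`. When the capacity is at least the number of distinct values, the counts are exact. Passing a weight of 1 for every value, as `top_k` does, reduces the weighted version to the unweighted one, which mirrors how `topKWeighted` relates to `topK`.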