DOCAPI-7415: EN review, RU translation. Docs for the -Resample aggregate function combinator. (#7017)

* Update combinators.md (#39)
* DOCAPI-7415: RU translation
* DOCAPI-7415: fix.
2024-11-22 15:42:02 +00:00 · 2019-09-24 03:04:52 +03:00
commit f26fdc63a0
parent 6db4cb8117
3 changed files with 85 additions and 14 deletions
--- a/docs/en/query_language/agg_functions/combinators.md
+++ b/docs/en/query_language/agg_functions/combinators.md
@@ -16,7 +16,7 @@ The -Array suffix can be appended to any aggregate function. In this case, the a

 Example 1: `sumArray(arr)` - Totals all the elements of all 'arr' arrays. In this example, it could have been written more simply: `sum(arraySum(arr))`.

-Example 2: `uniqArray(arr)` – Count the number of unique elements in all 'arr' arrays. This could be done an easier way: `uniq(arrayJoin(arr))`, but it's not always possible to add 'arrayJoin' to a query.
+Example 2: `uniqArray(arr)` – Counts the number of unique elements in all 'arr' arrays. This could be done an easier way: `uniq(arrayJoin(arr))`, but it's not always possible to add 'arrayJoin' to a query.

 -If and -Array can be combined. However, 'Array' must come first, then 'If'. Examples: `uniqArrayIf(arr, cond)`, `quantilesTimingArrayIf(level1, level2)(arr, cond)`. Due to this order, the 'cond' argument can't be an array.

@@ -44,9 +44,9 @@ Merges the intermediate aggregation states in the same way as the -Merge combina

 Converts an aggregate function for tables into an aggregate function for arrays that aggregates the corresponding array items and returns an array of results. For example, `sumForEach` for the arrays `[1, 2]`, `[3, 4, 5]`and`[6, 7]`returns the result `[10, 13, 5]` after adding together the corresponding array items.

-## -Resample
+## -Resample {#agg_functions-combinator-resample}

-Allows to divide data by groups, and then separately aggregates the data in those groups. Groups are created by splitting the values of one of the columns into intervals.
+Lets you divide data into groups, and then separately aggregates the data in those groups. Groups are created by splitting the values from one column into intervals.

 ```sql
 <aggFunction>Resample(start, end, step)(<aggFunction_params>, resampling_key)
@@ -54,16 +54,16 @@ Allows to divide data by groups, and then separately aggregates the data in thos

 **Parameters**

- `start` — Starting value of the whole required interval for the values of `resampling_key`. 
- `stop` — Ending value of the whole required interval for the values of `resampling_key`. The whole interval doesn't include the `stop` value `[start, stop)`.
- `step` — Step for separating the whole interval by subintervals. The `aggFunction` is executed over each of those subintervals independently.
- `resampling_key` — Column, which values are used for separating data by intervals.
- `aggFunction_params` — Parameters of `aggFunction`.
+- `start` — Starting value of the whole required interval for `resampling_key` values. 
+- `stop` — Ending value of the whole required interval for `resampling_key` values. The whole interval doesn't include the `stop` value `[start, stop)`.
+- `step` — Step for separating the whole interval into subintervals. The `aggFunction` is executed over each of those subintervals independently.
+- `resampling_key` — Column whose values are used for separating data into intervals.
+- `aggFunction_params` — `aggFunction` parameters.


 **Returned values**

- Array of `aggFunction` results for each of subintervals.
+- Array of `aggFunction` results for each subinterval.

 **Example**

@@ -80,9 +80,9 @@ Consider the `people` table with the following data:
 └────────┴─────┴──────┘
 ```

-Let's get the names of the persons which age lies in the intervals of `[30,60)` and `[60,75)`. As we use integer representation of age, then there are ages of `[30, 59]` and `[60,74]`.
+Let's get the names of the people whose age lies in the intervals of `[30,60)` and `[60,75)`. Since we use integer representation for age, we get ages in the `[30, 59]` and `[60,74]` intervals.

-For aggregating names into the array, we use the aggregate function [groupArray](reference.md#agg_function-grouparray). It takes a single argument. For our case, it is the `name` column. The `groupArrayResample` function should use the `age` column to aggregate names by age. To define required intervals, we pass the `(30, 75, 30)` arguments into the `groupArrayResample` function.
+To aggregate names in an array, we use the [groupArray](reference.md#agg_function-grouparray) aggregate function. It takes one argument. In our case, it's the `name` column. The `groupArrayResample` function should use the `age` column to aggregate names by age. To define the required intervals, we pass the `30, 75, 30` arguments into the `groupArrayResample` function.

 ```sql
 SELECT groupArrayResample(30, 75, 30)(name, age) from people
@@ -95,9 +95,9 @@ SELECT groupArrayResample(30, 75, 30)(name, age) from people

 Consider the results.

-`Jonh` is out of the sample because he is too young. Other people are distributed according to the specified age intervals.
+`Jonh` is out of the sample because he's too young. Other people are distributed according to the specified age intervals.

-Now, let's count the total number of people and their average wage in the specified age intervals.
+Now let's count the total number of people and their average wage in the specified age intervals.

 ```sql
 SELECT
--- a/docs/ru/query_language/agg_functions/combinators.md
+++ b/docs/ru/query_language/agg_functions/combinators.md
@@ -46,4 +46,75 @@

 Преобразует агрегатную функцию для таблиц в агрегатную функцию для массивов, которая применяет агрегирование для соответствующих элементов массивов и возвращает массив результатов. Например, `sumForEach` для массивов `[1, 2]`, `[3, 4, 5]` и `[6, 7]` даст результат `[10, 13, 5]`, сложив соответственные элементы массивов.

+
+## -Resample {#agg_functions-combinator-resample}
+
+
+Позволяет поделить данные на группы, а затем по-отдельности агрегирует данные для этих групп. Группы образуются разбиением значений одного из столбцов на интервалы.
+
+```sql
+<aggFunction>Resample(start, end, step)(<aggFunction_params>, resampling_key)
+```
+
+**Параметры**
+
+- `start` — начальное значение для интервала значений `resampling_key`. 
+- `stop` — конечное значение для интервала значений `resampling_key`. Интервал не включает значение `stop` (`[start, stop)`).
+- `step` — шаг деления полного интервала на подинтервалы. Функция `aggFunction` выполняется для каждого из подинтервалов независимо.
+- `resampling_key` — столбец, значения которого используются для разделения данных на интервалы.
+- `aggFunction_params` — параметры `aggFunction`.
+
+
+**Возвращаемые значения**
+
+- Массив результатов `aggFunction` для каждого подинтервала.
+
+**Пример**
+
+
+Рассмотрим таблицу `people` со следующими данными:
+
+```text
+┌─name───┬─age─┬─wage─┐
+│ John   │  16 │   10 │
+│ Alice  │  30 │   15 │
+│ Mary   │  35 │    8 │
+│ Evelyn │  48 │ 11.5 │
+│ David  │  62 │  9.9 │
+│ Brian  │  60 │   16 │
+└────────┴─────┴──────┘
+```
+
+Получим имена людей, чей возраст находится в интервалах `[30,60)` и `[60,75)`. Поскольку мы используем целочисленное представление возраста, то интервалы будут выглядеть как `[30, 59]` и `[60,74]`.
+
+Чтобы собрать имена в массив, возьмём агрегатную функцию [groupArray](reference.md#agg_function-grouparray). Она принимает один аргумент. В нашем случае, это столбец `name`. Функция `groupArrayResample` должна использовать столбец `age` для агрегирования имён по возрасту. Чтобы определить необходимые интервалы, передадим в функцию `groupArrayResample` аргументы `30, 75, 30`.
+
+```sql
+SELECT groupArrayResample(30, 75, 30)(name, age) from people
+```
+```text
+┌─groupArrayResample(30, 75, 30)(name, age)─────┐
+│ [['Alice','Mary','Evelyn'],['David','Brian']] │
+└───────────────────────────────────────────────┘
+```
+
+Посмотрим на результаты.
+
+`Jonh` не попал в выдачу, поскольку слишком молод. Остальные распределены согласно заданным возрастным интервалам.
+
+Теперь посчитаем общее количество людей и их среднюю заработную плату в заданных возрастных интервалах.
+
+
+```sql
+SELECT
+    countResample(30, 75, 30)(name, age) AS amount,
+    avgResample(30, 75, 30)(wage, age) AS avg_wage
+FROM people
+```
+```text
+┌─amount─┬─avg_wage──────────────────┐
+│ [3,2]  │ [11.5,12.949999809265137] │
+└────────┴───────────────────────────┘
+```
+
 [Оригинальная статья](https://clickhouse.yandex/docs/ru/query_language/agg_functions/combinators/) <!--hide-->
--- a/docs/ru/query_language/agg_functions/reference.md
+++ b/docs/ru/query_language/agg_functions/reference.md
@@ -661,7 +661,7 @@ uniqExact(x[, ...])
 - [uniqCombined](#agg_function-uniqcombined)
 - [uniqHLL12](#agg_function-uniqhll12)

-## groupArray(x), groupArray(max_size)(x)
+## groupArray(x), groupArray(max_size)(x) {#agg_function-grouparray}

 Составляет массив из значений аргумента.
 Значения в массив могут быть добавлены в любом (недетерминированном) порядке.
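
Review note: the bucketing that the `-Resample` section describes can be sanity-checked outside ClickHouse. The sketch below is a plain-Python emulation, not ClickHouse code. The `resample` and `mean` helpers are hypothetical names introduced only for this note, and the sketch assumes integer `resampling_key` values and a half-open `[start, stop)` range, as stated in the parameter list. It reproduces the `groupArrayResample`, `countResample` and `avgResample` results from the documented `people` example.

```python
from math import ceil

# Rows of the `people` table from the documented example: (name, age, wage).
PEOPLE = [
    ("John", 16, 10.0),
    ("Alice", 30, 15.0),
    ("Mary", 35, 8.0),
    ("Evelyn", 48, 11.5),
    ("David", 62, 9.9),
    ("Brian", 60, 16.0),
]


def resample(rows, start, stop, step, key, value, agg):
    """Hypothetical helper emulating <agg>Resample(start, stop, step)(value, key).

    Assumes integer keys. Rows whose key falls outside [start, stop) are skipped.
    """
    n_buckets = ceil((stop - start) / step)
    buckets = [[] for _ in range(n_buckets)]
    for row in rows:
        k = key(row)
        if start <= k < stop:
            # Bucket i covers [start + i*step, start + (i+1)*step), cut off at stop.
            buckets[(k - start) // step].append(value(row))
    return [agg(bucket) for bucket in buckets]


def mean(values):
    # Assumption: an empty bucket yields NaN (not exercised by this data set).
    return sum(values) / len(values) if values else float("nan")


def age(row):
    return row[1]


names = resample(PEOPLE, 30, 75, 30, key=age, value=lambda r: r[0], agg=list)
amount = resample(PEOPLE, 30, 75, 30, key=age, value=lambda r: r[0], agg=len)
avg_wage = resample(PEOPLE, 30, 75, 30, key=age, value=lambda r: r[2], agg=mean)

print(names)     # [['Alice', 'Mary', 'Evelyn'], ['David', 'Brian']]
print(amount)    # [3, 2]
print(avg_wage)  # first bucket is 11.5, second is approximately 12.95
```

The second average comes out as roughly 12.95 here, while the documented output shows `12.949999809265137`. That difference would be consistent with `wage` being stored as a 32-bit float in ClickHouse, but the column type is not shown in this diff, so treat that as an inference.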