Merge pull request #21569 from lehasm/alexey-sm-DOCSUP-7099-translate-runningConcurrency

DOCSUP-7099: edit and translate (runningConcurrency, max_parallel_replicas)
2024-11-21 15:12:02 +00:00 · 2021-03-19 22:14:05 +03:00 · 2021-03-19 22:14:05 +03:00 · 328c2b4ca8
commit 328c2b4ca8
parent c611a6d47a e452596bc8
4 changed files with 117 additions and 41 deletions
--- a/docs/en/operations/settings/settings.md
+++ b/docs/en/operations/settings/settings.md
@@ -1097,14 +1097,25 @@ See the section “WITH TOTALS modifier”.

 ## max_parallel_replicas {#settings-max_parallel_replicas}

-The maximum number of replicas for each shard when executing a query. In limited circumstances, this can make a query faster by executing it on more servers. This setting is only useful for replicated tables with a sampling key. There are cases where performance will not improve or even worsen:
+The maximum number of replicas for each shard when executing a query.

- the position of the sampling key in the partitioning key's order doesn't allow efficient range scans
- adding a sampling key to the table makes filtering by other columns less efficient
- the sampling key is an expression that is expensive to calculate
- the cluster's latency distribution has a long tail, so that querying more servers increases the query's overall latency
+Possible values:

-In addition, this setting will produce incorrect results when joins or subqueries are involved, and all tables don't meet certain conditions. See [Distributed Subqueries and max_parallel_replicas](../../sql-reference/operators/in.md#max_parallel_replica-subqueries) for more details.
+-   Positive integer.
+
+Default value: `1`.
+
+**Additional Info**
+
+This setting is useful for replicated tables with a sampling key. A query may be processed faster if it is executed on several servers in parallel. However, query performance may degrade in the following cases:
+
+- The position of the sampling key in the partitioning key doesn't allow efficient range scans.
+- Adding a sampling key to the table makes filtering by other columns less efficient.
+- The sampling key is an expression that is expensive to calculate.
+- The cluster's latency distribution has a long tail, so querying more servers increases the query's overall latency.
+
+!!! warning "Warning"
+    This setting will produce incorrect results when joins or subqueries are involved and not all tables meet certain requirements. See [Distributed Subqueries and max_parallel_replicas](../../sql-reference/operators/in.md#max_parallel_replica-subqueries) for more details.

 ## compile {#compile}
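
Note on the long-tail bullet in the hunk above: the effect can be illustrated with a small calculation. This is a sketch under assumptions that are not in the source: the query is assumed to wait for its slowest replica, replicas are assumed to be slow independently of each other, and `p_slow` is a hypothetical per-replica probability of a slow response.

```python
def prob_query_hits_slow_replica(p_slow: float, replicas: int) -> float:
    """Probability that at least one of `replicas` servers responds slowly.

    Assumes each replica is slow independently with probability `p_slow`
    (an illustrative assumption, not a measured ClickHouse property).
    """
    return 1.0 - (1.0 - p_slow) ** replicas


if __name__ == "__main__":
    # With a 1% chance of a slow response per replica, the chance that the
    # whole query is delayed grows with the number of replicas queried.
    for n in (1, 2, 4, 8):
        print(n, round(prob_query_hits_slow_replica(0.01, n), 4))
```

Under these assumptions the probability of a delayed query rises with every extra replica, which is why raising `max_parallel_replicas` can increase overall latency on a cluster with a long-tailed latency distribution.
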

--- a/docs/en/sql-reference/functions/other-functions.md
+++ b/docs/en/sql-reference/functions/other-functions.md
@@ -907,66 +907,64 @@ WHERE diff != 1

 ## runningDifferenceStartingWithFirstValue {#runningdifferencestartingwithfirstvalue}

-Same as for [runningDifference](../../sql-reference/functions/other-functions.md#other_functions-runningdifference), the difference is the value of the first row, returned the value of the first row, and each subsequent row returns the difference from the previous row.
+Same as [runningDifference](./other-functions.md#other_functions-runningdifference), except that the first row returns its own value instead of zero. Each subsequent row returns the difference from the previous row.

 ## runningConcurrency {#runningconcurrency}

-Given a series of beginning time and ending time of events, this function calculates concurrency of the events at each of the data point, that is, the beginning time.
+Calculates the number of concurrent events.
+Each event has a start time and an end time. The start time is included in the event, while the end time is excluded. The start time and end time columns must be of the same data type.
+The function calculates the total number of active (concurrent) events for each event start time.
+

 !!! warning "Warning"
-    Events spanning multiple data blocks will not be processed correctly. The function resets its state for each new data block.
-
-The result of the function depends on the order of data in the block. It assumes the beginning time is sorted in ascending order.
+    Events must be ordered by the start time in ascending order. If this requirement is violated, the function raises an exception.
+    Every data block is processed separately. If events from different data blocks overlap, they cannot be processed correctly.

 **Syntax**

 ``` sql
-runningConcurrency(begin, end)
+runningConcurrency(start, end)
 ```

 **Arguments**

-   `begin` — A column for the beginning time of events (inclusive). [Date](../../sql-reference/data-types/date.md), [DateTime](../../sql-reference/data-types/datetime.md), or [DateTime64](../../sql-reference/data-types/datetime64.md).
-   `end` — A column for the ending time of events (exclusive).  [Date](../../sql-reference/data-types/date.md), [DateTime](../../sql-reference/data-types/datetime.md), or [DateTime64](../../sql-reference/data-types/datetime64.md).
-
-Note that two columns `begin` and `end` must have the same type.
+-   `start` — A column with the start time of events. [Date](../../sql-reference/data-types/date.md), [DateTime](../../sql-reference/data-types/datetime.md), or [DateTime64](../../sql-reference/data-types/datetime64.md).
+-   `end` — A column with the end time of events. [Date](../../sql-reference/data-types/date.md), [DateTime](../../sql-reference/data-types/datetime.md), or [DateTime64](../../sql-reference/data-types/datetime64.md).

 **Returned values**

-   The concurrency of events at the data point.
+-   The number of concurrent events at each event start time.

 Type: [UInt32](../../sql-reference/data-types/int-uint.md)

 **Example**

-Input table:
+Consider the table:

 ``` text
-┌───────────────begin─┬─────────────────end─┐
-│ 2020-12-01 00:00:00 │ 2020-12-01 00:59:59 │
-│ 2020-12-01 00:30:00 │ 2020-12-01 00:59:59 │
-│ 2020-12-01 00:40:00 │ 2020-12-01 01:30:30 │
-│ 2020-12-01 01:10:00 │ 2020-12-01 01:30:30 │
-│ 2020-12-01 01:50:00 │ 2020-12-01 01:59:59 │
-└─────────────────────┴─────────────────────┘
+┌──────start─┬────────end─┐
+│ 2021-03-03 │ 2021-03-11 │
+│ 2021-03-06 │ 2021-03-12 │
+│ 2021-03-07 │ 2021-03-08 │
+│ 2021-03-11 │ 2021-03-12 │
+└────────────┴────────────┘
 ```

 Query:

 ``` sql
-SELECT runningConcurrency(begin, end) FROM example
+SELECT start, runningConcurrency(start, end) FROM example_table;
 ```

 Result:

 ``` text
-┌─runningConcurrency(begin, end)─┐
-│                              1 │
-│                              2 │
-│                              3 │
-│                              2 │
-│                              1 │
-└────────────────────────────────┘
+┌──────start─┬─runningConcurrency(start, end)─┐
+│ 2021-03-03 │                              1 │
+│ 2021-03-06 │                              2 │
+│ 2021-03-07 │                              3 │
+│ 2021-03-11 │                              2 │
+└────────────┴────────────────────────────────┘
 ```

 ## MACNumToString(num) {#macnumtostringnum}
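
Note on the `runningConcurrency` example in the hunk above: the documented semantics (start inclusive, end exclusive, input sorted by start time, one count per start time) can be modelled outside ClickHouse. The Python below is an illustrative sketch only, not the ClickHouse implementation. The function name `running_concurrency` and the heap-based approach are assumptions made for this note. It models a single data block and reproduces the result column of the example table.

```python
import heapq
from datetime import date


def running_concurrency(events):
    """Count the events active at each event's start time.

    `events` is a list of (start, end) pairs sorted by start in ascending
    order. Start is inclusive and end is exclusive, as the docs describe.
    """
    active_ends = []  # min-heap of end times of events still running
    counts = []
    previous_start = None
    for start, end in events:
        if previous_start is not None and start < previous_start:
            # Models the documented exception on unsorted input.
            raise ValueError("events must be ordered by start time")
        previous_start = start
        # An event with end <= start has already finished (end is exclusive).
        while active_ends and active_ends[0] <= start:
            heapq.heappop(active_ends)
        heapq.heappush(active_ends, end)
        counts.append(len(active_ends))
    return counts


example_table = [
    (date(2021, 3, 3), date(2021, 3, 11)),
    (date(2021, 3, 6), date(2021, 3, 12)),
    (date(2021, 3, 7), date(2021, 3, 8)),
    (date(2021, 3, 11), date(2021, 3, 12)),
]

if __name__ == "__main__":
    print(running_concurrency(example_table))  # [1, 2, 3, 2]
```

The last row shows the exclusive end time at work: the first event ends on 2021-03-11, so it is no longer counted for the event that starts on that date, and the result is 2 rather than 3.
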
--- a/docs/ru/operations/settings/settings.md
+++ b/docs/ru/operations/settings/settings.md
@@ -1086,14 +1086,24 @@ load_balancing = round_robin

 ## max_parallel_replicas {#settings-max_parallel_replicas}

-The maximum number of replicas for each shard while executing a query from a Distributed table. In some cases, this can make the query run faster by executing it on more servers. This setting is only useful for replicated tables created with a SAMPLING KEY expression. There are cases where performance will not improve or will even degrade:
+The maximum number of replicas of each shard used when executing a query.

- The position of the sampling key in the partitioning key does not allow efficient range scans
- Adding a sampling key to the table makes filtering by other columns less efficient
- The expression used to calculate the sampling key is computationally expensive
- The distribution of network latencies within the cluster has a long tail, so querying more servers may increase the overall query latency
+Possible values:
+
+-   Positive integer.
+
+**Additional Info**
+
+This setting is useful for replicated tables with a sampling key. A query may be processed faster if it is executed on several servers in parallel. However, query performance may instead degrade in the following cases:
+
+- The position of the sampling key in the partitioning key does not allow efficient scans.
+- Adding a sampling key to the table makes filtering by other columns less efficient.
+- The sampling key is an expression that is expensive to calculate.
+- The distribution of network latencies in the cluster has a long "tail", so querying several servers in parallel increases the average latency.
+
+!!! warning "Warning"
+    Parallel query execution may produce an incorrect result if the query contains joins or subqueries and the tables do not meet certain requirements. For details, see [Distributed Subqueries and max_parallel_replicas](../../sql-reference/operators/in.md#max_parallel_replica-subqueries).

-In addition, this setting may produce incorrect results when joins or subqueries are used and not all tables meet certain conditions. See [Distributed Subqueries and max_parallel_replicas](../../sql-reference/operators/in.md#max_parallel_replica-subqueries) for more details.

 ## compile {#compile}

--- a/docs/ru/sql-reference/functions/other-functions.md
+++ b/docs/ru/sql-reference/functions/other-functions.md
@@ -772,7 +772,7 @@ FROM numbers(16)
 └────────────┴───────┴───────────┴────────────────┘
 ```

-## runningDifference(x) {#runningdifferencex}
+## runningDifference(x) {#other_functions-runningdifference}

 Calculates the difference between successive row values in the data block.
 Returns 0 for the first row and the difference from the previous row for each subsequent row.
@@ -849,7 +849,64 @@ WHERE diff != 1

 ## runningDifferenceStartingWithFirstValue {#runningdifferencestartingwithfirstvalue}

-Same as \[runningDifference\] (./other_functions.md # other_functions-runningdifference), but the first row returns the value of the first row rather than zero.
+Same as [runningDifference](./other-functions.md#other_functions-runningdifference), but the first row returns the value of the first row rather than zero.
+
+## runningConcurrency {#runningconcurrency}
+
+Calculates the number of concurrent events.
+Each event has a start time and an end time. The start time is considered part of the event, while the end time is excluded from it. The columns with the start and end times of events must have the same data type.
+The function calculates the number of events occurring concurrently at the start time of each event in the selection.
+
+!!! warning "Warning"
+    Events must be sorted by start time in ascending order. If this requirement is violated, the function raises an exception.
+    Each data block is processed independently. If events from different data blocks overlap in time, they cannot be processed correctly.
+
+**Syntax**
+
+``` sql
+runningConcurrency(start, end)
+```
+
+**Arguments**
+
+-   `start` — A column with the start time of events. [Date](../../sql-reference/data-types/date.md), [DateTime](../../sql-reference/data-types/datetime.md), or [DateTime64](../../sql-reference/data-types/datetime64.md).
+-   `end` — A column with the end time of events. [Date](../../sql-reference/data-types/date.md), [DateTime](../../sql-reference/data-types/datetime.md), or [DateTime64](../../sql-reference/data-types/datetime64.md).
+
+**Returned value**
+
+-   The number of concurrent events at the start time of each event.
+
+Type: [UInt32](../../sql-reference/data-types/int-uint.md)
+
+**Example**
+
+Consider the table:
+
+``` text
+┌──────start─┬────────end─┐
+│ 2021-03-03 │ 2021-03-11 │
+│ 2021-03-06 │ 2021-03-12 │
+│ 2021-03-07 │ 2021-03-08 │
+│ 2021-03-11 │ 2021-03-12 │
+└────────────┴────────────┘
+```
+
+Query:
+
+``` sql
+SELECT start, runningConcurrency(start, end) FROM example_table;
+```
+
+Result:
+
+``` text
+┌──────start─┬─runningConcurrency(start, end)─┐
+│ 2021-03-03 │                              1 │
+│ 2021-03-06 │                              2 │
+│ 2021-03-07 │                              3 │
+│ 2021-03-11 │                              2 │
+└────────────┴────────────────────────────────┘
+```

 ## MACNumToString(num) {#macnumtostringnum}