---
toc_priority: 38
toc_title: Parametric
---

# Parametric Aggregate Functions {#aggregate_functions_parametric}

Some aggregate functions accept not only argument columns (the columns being aggregated) but also a set of parameters: constants used for initialization. The syntax uses two pairs of parentheses instead of one. The first pair holds the parameters, and the second holds the arguments. For example, in `histogram(5)(values)`, `5` is a parameter and `values` is an argument.

## histogram {#histogram}

Calculates an adaptive histogram. It does not guarantee precise results.

``` sql
histogram(number_of_bins)(values)
```

The function uses [A Streaming Parallel Decision Tree Algorithm](http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf). The borders of histogram bins are adjusted as new data enters the function. In general, the widths of the bins are not equal.
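
The sketch below illustrates the bin-merging idea from the cited paper in Python. It is a simplified model written for this page, not ClickHouse's implementation: `build_histogram` is a hypothetical name, each bin is a `[centroid, count]` pair, every new value starts as a one-point bin, and whenever there are more than `number_of_bins` bins the two neighbouring bins with the closest centroids are merged into their weighted average. It does not reproduce how ClickHouse derives the `lower`, `upper`, and `height` values it returns.

```python
def build_histogram(values, number_of_bins):
    """Simplified streaming histogram after Ben-Haim & Tom-Tov.

    Returns a list of [centroid, count] pairs sorted by centroid.
    """
    bins = []
    for x in values:
        x = float(x)
        # Find where the new point belongs among the existing centroids.
        pos = 0
        while pos < len(bins) and bins[pos][0] < x:
            pos += 1
        if pos < len(bins) and bins[pos][0] == x:
            bins[pos][1] += 1            # identical centroid: just count it
        else:
            bins.insert(pos, [x, 1])     # otherwise start a new one-point bin
        # Merge the closest pair of neighbouring bins until the limit holds.
        while len(bins) > number_of_bins:
            i = min(range(len(bins) - 1), key=lambda k: bins[k + 1][0] - bins[k][0])
            (c1, n1), (c2, n2) = bins[i], bins[i + 1]
            bins[i:i + 2] = [[(c1 * n1 + c2 * n2) / (n1 + n2), n1 + n2]]
    return bins


# Same input values as the example further down this section (1..20).
print(build_histogram(range(1, 21), 5))
```

Merging keeps the total count and the sum of `centroid * count` unchanged, which is why the bins remain a compact summary of the data even though the individual values are lost.
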

**Arguments**

- `values` — [Expression](../../sql-reference/syntax.md#syntax-expressions) resulting in input values.

**Parameters**

- `number_of_bins` — Upper limit for the number of bins in the histogram. The function automatically calculates the number of bins. It tries to reach the specified number of bins, but if it fails, it uses fewer bins.

**Returned values**

- [Array](../../sql-reference/data-types/array.md) of [Tuples](../../sql-reference/data-types/tuple.md) of the following format:

```
[(lower_1, upper_1, height_1), ... (lower_N, upper_N, height_N)]
```

- `lower` — Lower bound of the bin.
- `upper` — Upper bound of the bin.
- `height` — Calculated height of the bin.

**Example**

``` sql
SELECT histogram(5)(number + 1)
FROM (
    SELECT *
    FROM system.numbers
    LIMIT 20
)
```

``` text
┌─histogram(5)(plus(number, 1))───────────────────────────────────────────┐
│ [(1,4.5,4),(4.5,8.5,4),(8.5,12.75,4.125),(12.75,17,4.625),(17,20,3.25)] │
└─────────────────────────────────────────────────────────────────────────┘
```

You can visualize a histogram with the [bar](../../sql-reference/functions/other-functions.md#function-bar) function, for example:

``` sql
WITH histogram(5)(rand() % 100) AS hist
SELECT
    arrayJoin(hist).3 AS height,
    bar(height, 0, 6, 5) AS bar
FROM
(
    SELECT *
    FROM system.numbers
    LIMIT 20
)
```

``` text
┌─height─┬─bar───┐
│  2.125 │ █▋    │
│   3.25 │ ██▌   │
│  5.625 │ ████▏ │
│  5.625 │ ████▏ │
│  3.375 │ ██▌   │
└────────┴───────┘
```

In this case, keep in mind that the bin borders are not shown, so you do not know where each bar starts and ends.

## sequenceMatch(pattern)(timestamp, cond1, cond2, …) {#function-sequencematch}

Checks whether the sequence contains an event chain that matches the pattern.

``` sql
sequenceMatch(pattern)(timestamp, cond1, cond2, ...)
```

!!! warning "Warning"
    Events that occur at the same second may lie in the sequence in an undefined order, which affects the result.

**Arguments**

- `timestamp` — Column considered to contain time data. Typical data types are `Date` and `DateTime`. You can also use any of the supported [UInt](../../sql-reference/data-types/int-uint.md) data types.

- `cond1`, `cond2` — Conditions that describe the chain of events. Data type: `UInt8`. You can pass up to 32 condition arguments. The function takes only the events described in these conditions into account. If the sequence contains data that isn’t described in any condition, the function skips it.

**Parameters**

- `pattern` — Pattern string. See [Pattern syntax](#sequence-function-pattern-syntax).

**Returned values**

- 1, if the pattern is matched.
- 0, if the pattern isn’t matched.

Type: `UInt8`.

<a name="sequence-function-pattern-syntax"></a>
**Pattern syntax**

- `(?N)` — Matches the condition argument at position `N`. Conditions are numbered in the `[1, 32]` range. For example, `(?1)` matches the argument passed to the `cond1` parameter.

- `.*` — Matches any number of events. You do not need condition arguments to match this element of the pattern.

- `(?t operator value)` — Sets the time in seconds that should separate two events. For example, pattern `(?1)(?t>1800)(?2)` matches events that occur more than 1800 seconds from each other. An arbitrary number of any events can lie between these events. You can use the `>=`, `>`, `<`, `<=`, `==` operators.
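
The matching rules above can be modelled in a few lines of Python. This is a simplified, hypothetical re-implementation written for this page (`sequence_match`, `tokenize`, and `number_is` are not ClickHouse APIs): rows are dictionaries, conditions are Python predicates, events that satisfy no condition are dropped, `(?N)` must match the next remaining event, and `.*` or `(?t op value)` may skip any number of events, with the latter also checking the time gap between the events it separates. Behavior for events with equal timestamps is not modelled.

```python
import operator
import re

# Comparison operators allowed inside "(?t op value)".
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
       ">": operator.gt, "<": operator.lt}
TOKEN = re.compile(r"\(\?(\d+)\)|\.\*|\(\?t\s*(>=|<=|==|>|<)\s*(\d+)\)")


def tokenize(pattern):
    """Split a pattern into ("cond", index), ("any",) and ("time", op, value) tokens."""
    tokens, pos = [], 0
    while pos < len(pattern):
        m = TOKEN.match(pattern, pos)
        if m is None:
            raise ValueError(f"unsupported pattern syntax at {pattern[pos:]!r}")
        if m.group(1) is not None:
            tokens.append(("cond", int(m.group(1)) - 1))
        elif m.group(2) is not None:
            tokens.append(("time", m.group(2), int(m.group(3))))
        else:
            tokens.append(("any",))
        pos = m.end()
    return tokens


def _match(tokens, events, conds, ti, ei, last_ts, gap):
    """True if tokens[ti:] match the events starting at index ei."""
    if ti == len(tokens):
        return True
    kind = tokens[ti][0]
    if kind == "cond":
        if ei >= len(events):
            return False
        ts, row = events[ei]
        if not conds[tokens[ti][1]](row):
            return False
        if gap is not None and last_ts is not None and not OPS[gap[0]](ts - last_ts, gap[1]):
            return False
        return _match(tokens, events, conds, ti + 1, ei + 1, ts, None)
    # ".*" and "(?t ...)" may skip any number of events; "(?t ...)" also
    # remembers its time condition for the next matched event.
    new_gap = gap if kind == "any" else (tokens[ti][1], tokens[ti][2])
    return any(_match(tokens, events, conds, ti + 1, k, last_ts, new_gap)
               for k in range(ei, len(events) + 1))


def sequence_match(pattern, rows, time_key, conds):
    """Return 1 if some event chain in rows matches pattern, else 0."""
    tokens = tokenize(pattern)
    # Keep only events described by at least one condition, ordered by time.
    events = sorted(((r[time_key], r) for r in rows if any(c(r) for c in conds)),
                    key=lambda e: e[0])
    return int(any(_match(tokens, events, conds, 0, start, None, None)
                   for start in range(len(events) + 1)))


def number_is(n):
    return lambda row: row["number"] == n


# The `t` table from the examples below.
t = [{"time": 1, "number": 1}, {"time": 2, "number": 3}, {"time": 3, "number": 2}]
print(sequence_match("(?1)(?2)", t, "time", [number_is(1), number_is(2)]))                # 1
print(sequence_match("(?1)(?2)", t, "time", [number_is(1), number_is(2), number_is(3)]))  # 0
```

Dropping undescribed events before matching is what makes `(?1)(?2)` succeed in the first call: the event for number 3 is invisible until a condition describes it.
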

**Examples**

Consider data in the `t` table:

``` text
┌─time─┬─number─┐
│    1 │      1 │
│    2 │      3 │
│    3 │      2 │
└──────┴────────┘
```

Perform the query:

``` sql
SELECT sequenceMatch('(?1)(?2)')(time, number = 1, number = 2) FROM t
```

``` text
┌─sequenceMatch('(?1)(?2)')(time, equals(number, 1), equals(number, 2))─┐
│                                                                     1 │
└───────────────────────────────────────────────────────────────────────┘
```

The function found the event chain where number 2 follows number 1. It skipped number 3 between them, because that number is not described as an event. If we want to take this number into account when searching for the event chain given in the example, we should make a condition for it.

``` sql
SELECT sequenceMatch('(?1)(?2)')(time, number = 1, number = 2, number = 3) FROM t
```

``` text
┌─sequenceMatch('(?1)(?2)')(time, equals(number, 1), equals(number, 2), equals(number, 3))─┐
│                                                                                        0 │
└──────────────────────────────────────────────────────────────────────────────────────────┘
```

In this case, the function couldn’t find the event chain matching the pattern, because the event for number 3 occurred between 1 and 2. If in the same case we checked the condition for number 4, the sequence would match the pattern.

``` sql
SELECT sequenceMatch('(?1)(?2)')(time, number = 1, number = 2, number = 4) FROM t
```

``` text
┌─sequenceMatch('(?1)(?2)')(time, equals(number, 1), equals(number, 2), equals(number, 4))─┐
│                                                                                        1 │
└──────────────────────────────────────────────────────────────────────────────────────────┘
```

**See Also**

- [sequenceCount](#function-sequencecount)

## sequenceCount(pattern)(timestamp, cond1, cond2, …) {#function-sequencecount}

Counts the number of event chains that matched the pattern. The function searches for event chains that do not overlap: it starts to search for the next chain after the current chain is matched.

!!! warning "Warning"
    Events that occur at the same second may lie in the sequence in an undefined order, which affects the result.

``` sql
sequenceCount(pattern)(timestamp, cond1, cond2, ...)
```

**Arguments**

- `timestamp` — Column considered to contain time data. Typical data types are `Date` and `DateTime`. You can also use any of the supported [UInt](../../sql-reference/data-types/int-uint.md) data types.

- `cond1`, `cond2` — Conditions that describe the chain of events. Data type: `UInt8`. You can pass up to 32 condition arguments. The function takes only the events described in these conditions into account. If the sequence contains data that isn’t described in any condition, the function skips it.

**Parameters**

- `pattern` — Pattern string. See [Pattern syntax](#sequence-function-pattern-syntax).

**Returned values**

- Number of non-overlapping event chains that are matched.

Type: `UInt64`.

**Example**

Consider data in the `t` table:

``` text
┌─time─┬─number─┐
│    1 │      1 │
│    2 │      3 │
│    3 │      2 │
│    4 │      1 │
│    5 │      3 │
│    6 │      2 │
└──────┴────────┘
```

Count how many times the number 2 occurs after the number 1 with any amount of other numbers between them:

``` sql
SELECT sequenceCount('(?1).*(?2)')(time, number = 1, number = 2) FROM t
```

``` text
┌─sequenceCount('(?1).*(?2)')(time, equals(number, 1), equals(number, 2))─┐
│                                                                       2 │
└─────────────────────────────────────────────────────────────────────────┘
```
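
The non-overlapping search can be sketched in Python as follows. The hypothetical `sequence_count` is a simplified model written for this page, not ClickHouse code: it supports only the `(?N)` and `.*` pattern elements, drops events that satisfy no condition, and after each match resumes the search at the first event after that match. For `.*` it tries the fewest skipped events first, so that on the table `t` from the example above it also counts two chains rather than one long chain.

```python
import re


def _match_end(tokens, events, conds, ti, ei):
    """Index just past a match of tokens[ti:] starting at events[ei], or None."""
    if ti == len(tokens):
        return ei
    if tokens[ti] == ".*":
        # Try skipping as few events as possible first.
        for k in range(ei, len(events) + 1):
            end = _match_end(tokens, events, conds, ti + 1, k)
            if end is not None:
                return end
        return None
    cond = conds[int(tokens[ti][2:-1]) - 1]   # "(?N)" -> conds[N - 1]
    if ei < len(events) and cond(events[ei]):
        return _match_end(tokens, events, conds, ti + 1, ei + 1)
    return None


def sequence_count(pattern, rows, time_key, conds):
    """Count non-overlapping event chains that match pattern."""
    tokens = re.findall(r"\(\?\d+\)|\.\*", pattern)
    if "".join(tokens) != pattern:
        raise ValueError("this sketch supports only (?N) and .* elements")
    # Keep only events described by at least one condition, ordered by time.
    events = [r for r in sorted(rows, key=lambda r: r[time_key])
              if any(c(r) for c in conds)]
    count, start = 0, 0
    while start < len(events):
        end = _match_end(tokens, events, conds, 0, start)
        if end is None or end == start:
            start += 1          # no chain starts here; try the next event
        else:
            count += 1
            start = end         # continue after the matched chain
    return count


rows = [{"time": t, "number": n} for t, n in zip(range(1, 7), [1, 3, 2, 1, 3, 2])]
one = lambda r: r["number"] == 1
two = lambda r: r["number"] == 2
print(sequence_count("(?1).*(?2)", rows, "time", [one, two]))  # 2
```
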

**See Also**

- [sequenceMatch](#function-sequencematch)

## windowFunnel {#windowfunnel}

Searches for event chains in a sliding time window and calculates the maximum number of events from the chain that occurred within the window.

The function works according to the following algorithm:

- The function searches for data that triggers the first condition in the chain and sets the event counter to 1. This is the moment when the sliding window starts.

- If events from the chain occur sequentially within the window, the counter is incremented. If the sequence of events is disrupted, the counter isn’t incremented.

- If the data has multiple event chains at varying points of completion, the function will only output the size of the longest chain.
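
The steps above can be modelled in Python for the default mode only (no `mode` flags). `window_funnel`, `event_is`, and `epoch` are hypothetical helpers written for this page, not ClickHouse code. For each condition the sketch remembers the timestamp of the first event of the chain that reached it; an event can extend the chain from one level to the next only while it is within `window` of that first event. As an assumption of this sketch, a row that satisfies several conditions is checked against the later conditions first, so one row cannot advance the chain by more than one step.

```python
from datetime import datetime, timezone


def window_funnel(window, rows, time_key, conds):
    """Return the longest prefix of conds matched in order within window."""
    n = len(conds)
    # start[i]: timestamp of the first event of the chain that reached level i + 1
    start = [None] * n
    for row in sorted(rows, key=lambda r: r[time_key]):
        ts = row[time_key]
        for i in reversed(range(n)):
            if not conds[i](row):
                continue
            if i == 0:
                start[0] = ts                    # a new chain starts here
            elif start[i - 1] is not None and ts <= start[i - 1] + window:
                start[i] = start[i - 1]          # extend the chain, keep its start
                if i == n - 1:
                    return n                     # the whole chain was matched
    for level in range(n, 0, -1):
        if start[level - 1] is not None:
            return level
    return 0


def epoch(s):
    """Seconds since the Unix epoch for a 'YYYY-MM-DD hh:mm:ss' string read as UTC."""
    dt = datetime.strptime(s, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def event_is(event_id):
    return lambda row: row["eventID"] == event_id


# Events of user 1 from the example below (timestamps taken as UTC).
trend = [
    {"timestamp": epoch("2019-01-29 10:00:00"), "eventID": 1003},
    {"timestamp": epoch("2019-01-31 09:00:00"), "eventID": 1007},
    {"timestamp": epoch("2019-01-30 08:00:00"), "eventID": 1009},
    {"timestamp": epoch("2019-02-01 08:00:00"), "eventID": 1010},
]
chain = [event_is(e) for e in (1003, 1009, 1007, 1010)]
print(window_funnel(6048000000000000, trend, "timestamp", chain))  # 4
print(window_funnel(86400, trend, "timestamp", chain))             # 2
```

With a one-day window the chain stops at level 2: the `1009` event is 22 hours after the `1003` event, but the `1007` event is 47 hours after it.
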

**Syntax**

``` sql
windowFunnel(window, [mode, [mode, ... ]])(timestamp, cond1, cond2, ..., condN)
```

**Arguments**

- `timestamp` — Name of the column containing the timestamp. Data types supported: [Date](../../sql-reference/data-types/date.md), [DateTime](../../sql-reference/data-types/datetime.md#data_type-datetime) and other unsigned integer types (note that even though timestamp supports the `UInt64` type, its value can’t exceed the Int64 maximum, which is 2^63 - 1).
- `cond` — Conditions or data describing the chain of events. [UInt8](../../sql-reference/data-types/int-uint.md).

**Parameters**

- `window` — Length of the sliding window: the time interval between the first and the last condition. The unit of `window` is the unit of the `timestamp` values (for example, seconds for `DateTime`). The events must satisfy `timestamp of cond1 <= timestamp of cond2 <= ... <= timestamp of condN <= timestamp of cond1 + window`.
- `mode` — Optional. One or more modes can be set.
    - `'strict_deduplication'` — If the same condition holds for the sequence of events, such a repeated event interrupts further processing.
    - `'strict_order'` — Don't allow other events to intervene. For example, in the case of `A->B->D->C`, the function stops searching for `A->B->C` at `D`, and the maximum event level is 2.
    - `'strict_increase'` — Apply conditions only to events with strictly increasing timestamps.

**Returned value**

The maximum number of consecutive triggered conditions from the chain within the sliding time window. All the chains in the selection are analyzed.

Type: `Integer`.

**Example**

Determine if a set period of time is enough for the user to select a phone and purchase it twice in the online store.

Set the following chain of events:

1. The user logged in to their account on the store (`eventID = 1003`).
2. The user searched for a phone (`eventID = 1007, product = 'phone'`).
3. The user placed an order (`eventID = 1009`).
4. The user made the order again (`eventID = 1010`).

Input table:

``` text
┌─event_date─┬─user_id─┬───────────timestamp─┬─eventID─┬─product─┐
│ 2019-01-28 │       1 │ 2019-01-29 10:00:00 │    1003 │ phone   │
└────────────┴─────────┴─────────────────────┴─────────┴─────────┘
┌─event_date─┬─user_id─┬───────────timestamp─┬─eventID─┬─product─┐
│ 2019-01-31 │       1 │ 2019-01-31 09:00:00 │    1007 │ phone   │
└────────────┴─────────┴─────────────────────┴─────────┴─────────┘
┌─event_date─┬─user_id─┬───────────timestamp─┬─eventID─┬─product─┐
│ 2019-01-30 │       1 │ 2019-01-30 08:00:00 │    1009 │ phone   │
└────────────┴─────────┴─────────────────────┴─────────┴─────────┘
┌─event_date─┬─user_id─┬───────────timestamp─┬─eventID─┬─product─┐
│ 2019-02-01 │       1 │ 2019-02-01 08:00:00 │    1010 │ phone   │
└────────────┴─────────┴─────────────────────┴─────────┴─────────┘
```

Find out how far the user `user_id` could get through the chain in a period in January-February of 2019.

Query:

``` sql
SELECT
    level,
    count() AS c
FROM
(
    SELECT
        user_id,
        windowFunnel(6048000000000000)(timestamp, eventID = 1003, eventID = 1009, eventID = 1007, eventID = 1010) AS level
    FROM trend
    WHERE (event_date >= '2019-01-01') AND (event_date <= '2019-02-02')
    GROUP BY user_id
)
GROUP BY level
ORDER BY level ASC;
```

Result:

``` text
┌─level─┬─c─┐
│     4 │ 1 │
└───────┴───┘
```

## retention {#retention}

The function takes as arguments a set of 1 to 32 conditions of type `UInt8` that indicate whether a certain condition was met for the event.
Any condition can be specified as an argument (as in [WHERE](../../sql-reference/statements/select/where.md#select-where)).

The conditions, except the first, apply in pairs: the result of the second will be true if the first and second are true, of the third if the first and third are true, etc.
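
In other words, for each group the result is `[cond1, cond1 AND cond2, cond1 AND cond3, ...]`, where a condition counts as met if at least one row of the group satisfies it. A minimal Python sketch of that rule follows; `retention` and `on_day` here are hypothetical helpers over plain rows, not ClickHouse code, and the data mirrors the `retention_test` table from the example below.

```python
from collections import defaultdict


def retention(rows, conds):
    """Return [c1, c1 and c2, c1 and c3, ...] for one group of rows, as 0/1."""
    met = [any(cond(row) for row in rows) for cond in conds]
    return [int(met[0])] + [int(met[0] and m) for m in met[1:]]


def on_day(day):
    return lambda date: date == day


# Rows of retention_test: (date, uid).
data = ([("2020-01-01", u) for u in range(5)]
        + [("2020-01-02", u) for u in range(10)]
        + [("2020-01-03", u) for u in range(15)])

by_uid = defaultdict(list)              # GROUP BY uid
for date, uid in data:
    by_uid[uid].append(date)

conds = [on_day(d) for d in ("2020-01-01", "2020-01-02", "2020-01-03")]
r = {uid: retention(dates, conds) for uid, dates in sorted(by_uid.items())}
print(r[0], r[7])                                    # [1, 1, 1] [0, 0, 0]
print([sum(v[i] for v in r.values()) for i in range(3)])  # [5, 5, 5]
```

Users 5 to 14 get `[0, 0, 0]` even though they visited on later days, because every element after the first also requires the first condition.
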

**Syntax**

``` sql
retention(cond1, cond2, ..., cond32)
```

**Arguments**

- `cond` — An expression that returns a `UInt8` result (1 or 0).

**Returned value**

An array of 1 or 0 values, one per condition:

- 1 — Condition was met for the event.
- 0 — Condition wasn’t met for the event.

Type: `Array(UInt8)`.

**Example**

Let’s consider an example of calculating the `retention` function to determine site traffic.

**1.** Create a table to illustrate an example.

``` sql
CREATE TABLE retention_test(date Date, uid Int32) ENGINE = Memory;

INSERT INTO retention_test SELECT '2020-01-01', number FROM numbers(5);
INSERT INTO retention_test SELECT '2020-01-02', number FROM numbers(10);
INSERT INTO retention_test SELECT '2020-01-03', number FROM numbers(15);
```

Input table:

Query:

``` sql
SELECT * FROM retention_test
```

Result:

``` text
┌───────date─┬─uid─┐
│ 2020-01-01 │   0 │
│ 2020-01-01 │   1 │
│ 2020-01-01 │   2 │
│ 2020-01-01 │   3 │
│ 2020-01-01 │   4 │
└────────────┴─────┘
┌───────date─┬─uid─┐
│ 2020-01-02 │   0 │
│ 2020-01-02 │   1 │
│ 2020-01-02 │   2 │
│ 2020-01-02 │   3 │
│ 2020-01-02 │   4 │
│ 2020-01-02 │   5 │
│ 2020-01-02 │   6 │
│ 2020-01-02 │   7 │
│ 2020-01-02 │   8 │
│ 2020-01-02 │   9 │
└────────────┴─────┘
┌───────date─┬─uid─┐
│ 2020-01-03 │   0 │
│ 2020-01-03 │   1 │
│ 2020-01-03 │   2 │
│ 2020-01-03 │   3 │
│ 2020-01-03 │   4 │
│ 2020-01-03 │   5 │
│ 2020-01-03 │   6 │
│ 2020-01-03 │   7 │
│ 2020-01-03 │   8 │
│ 2020-01-03 │   9 │
│ 2020-01-03 │  10 │
│ 2020-01-03 │  11 │
│ 2020-01-03 │  12 │
│ 2020-01-03 │  13 │
│ 2020-01-03 │  14 │
└────────────┴─────┘
```

**2.** Group users by unique ID `uid` using the `retention` function.

Query:

``` sql
SELECT
    uid,
    retention(date = '2020-01-01', date = '2020-01-02', date = '2020-01-03') AS r
FROM retention_test
WHERE date IN ('2020-01-01', '2020-01-02', '2020-01-03')
GROUP BY uid
ORDER BY uid ASC
```

Result:

``` text
┌─uid─┬─r───────┐
│   0 │ [1,1,1] │
│   1 │ [1,1,1] │
│   2 │ [1,1,1] │
│   3 │ [1,1,1] │
│   4 │ [1,1,1] │
│   5 │ [0,0,0] │
│   6 │ [0,0,0] │
│   7 │ [0,0,0] │
│   8 │ [0,0,0] │
│   9 │ [0,0,0] │
│  10 │ [0,0,0] │
│  11 │ [0,0,0] │
│  12 │ [0,0,0] │
│  13 │ [0,0,0] │
│  14 │ [0,0,0] │
└─────┴─────────┘
```

**3.** Calculate the total number of unique site visitors for each condition.

Query:

``` sql
SELECT
    sum(r[1]) AS r1,
    sum(r[2]) AS r2,
    sum(r[3]) AS r3
FROM
(
    SELECT
        uid,
        retention(date = '2020-01-01', date = '2020-01-02', date = '2020-01-03') AS r
    FROM retention_test
    WHERE date IN ('2020-01-01', '2020-01-02', '2020-01-03')
    GROUP BY uid
)
```

Result:

``` text
┌─r1─┬─r2─┬─r3─┐
│  5 │  5 │  5 │
└────┴────┴────┘
```

Where:

- `r1` - the number of unique visitors who visited the site on 2020-01-01 (the `cond1` condition).
- `r2` - the number of unique visitors who visited the site on both 2020-01-01 and 2020-01-02 (`cond1` and `cond2` conditions).
- `r3` - the number of unique visitors who visited the site on both 2020-01-01 and 2020-01-03 (`cond1` and `cond3` conditions).

## uniqUpTo(N)(x) {#uniquptonx}

Calculates the number of different argument values if it is less than or equal to N. If the number of different argument values is greater than N, it returns N + 1.

Recommended for use with small Ns, up to 10. The maximum value of N is 100.

For the state of an aggregate function, it uses an amount of memory equal to 1 + N \* the size of one value, in bytes.
For strings, it stores a non-cryptographic hash of 8 bytes. That is, the calculation is approximated for strings.

The function also works for several arguments.

It works as fast as possible, except for cases when a large N value is used and the number of unique values is slightly less than N.

Usage example:

``` text
Problem: Generate a report that shows only keywords that produced at least 5 unique users.
Solution: Write in the GROUP BY query SearchPhrase HAVING uniqUpTo(4)(UserID) >= 5
```
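
The counting rule can be sketched in Python as follows. `uniq_up_to` is a hypothetical model of the result only: it keeps at most N + 1 distinct values and stops as soon as that limit is exceeded. It does not model ClickHouse's memory layout or the 8-byte hashing of strings, and the `hits` rows are made-up sample data for the report described above.

```python
from collections import defaultdict


def uniq_up_to(n, values):
    """Number of distinct values if it is at most n, otherwise n + 1."""
    seen = set()
    for value in values:
        seen.add(value)
        if len(seen) > n:
            return n + 1          # the exact count no longer matters
    return len(seen)


# Made-up (SearchPhrase, UserID) rows.
hits = [("phone", 1), ("phone", 2), ("phone", 3), ("phone", 4), ("phone", 5),
        ("tablet", 1), ("tablet", 1), ("tablet", 2)]
users = defaultdict(list)
for phrase, user_id in hits:
    users[phrase].append(user_id)

# Equivalent of GROUP BY SearchPhrase HAVING uniqUpTo(4)(UserID) >= 5
print([phrase for phrase, ids in users.items() if uniq_up_to(4, ids) >= 5])  # ['phone']
```

Because the answer is capped at N + 1, the `>= 5` filter only needs to know whether a fifth distinct user exists, not how many users there are in total.
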

## sumMapFiltered(keys_to_keep)(keys, values) {#summapfilteredkeys-to-keepkeys-values}

Same behavior as [sumMap](../../sql-reference/aggregate-functions/reference/summap.md#agg_functions-summap) except that an array of keys to keep is passed as a parameter. Keys that are not in this array are ignored. This can be especially useful when working with a high cardinality of keys.
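
A minimal Python sketch of the described behavior, assuming `sumMap` semantics of summing the values for each key and returning the keys in sorted order together with their totals. `sum_map_filtered` is a hypothetical helper, not ClickHouse code; each input row is a pair of equal-length `keys` and `values` lists.

```python
def sum_map_filtered(keys_to_keep, rows):
    """Sum values per key over all rows, ignoring keys not in keys_to_keep."""
    keep = set(keys_to_keep)
    totals = {}
    for keys, values in rows:
        for key, value in zip(keys, values):
            if key in keep:
                totals[key] = totals.get(key, 0) + value
    ordered = sorted(totals)
    return ordered, [totals[k] for k in ordered]


rows = [([1, 2, 3], [10, 10, 10]), ([3, 1, 4], [5, 5, 5])]
print(sum_map_filtered([1, 4], rows))  # ([1, 4], [15, 5])
```

Filtering before summing means the per-key totals for discarded keys are never materialized, which is the point of passing the key list as a parameter when there are many distinct keys.
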

## sequenceNextNode {#sequenceNextNode}

Returns the value of the next event after a matched event chain.

_Experimental function, `SET allow_experimental_funnel_functions = 1` to enable it._

**Syntax**

``` sql
sequenceNextNode(direction, base)(timestamp, event_column, base_condition, event1, event2, event3, ...)
```

**Parameters**

- `direction` — Direction in which to search from the base point.
    - forward — Moving forward.
    - backward — Moving backward.

- `base` — Used to set the base point.
    - head — Set the base point to the first event.
    - tail — Set the base point to the last event.
    - first_match — Set the base point to the first matched `event1`.
    - last_match — Set the base point to the last matched `event1`.

**Arguments**

- `timestamp` — Name of the column containing the timestamp. Data types supported: [Date](../../sql-reference/data-types/date.md), [DateTime](../../sql-reference/data-types/datetime.md#data_type-datetime) and other unsigned integer types.
- `event_column` — Name of the column containing the value of the next event to be returned. Data types supported: [String](../../sql-reference/data-types/string.md) and [Nullable(String)](../../sql-reference/data-types/nullable.md).
- `base_condition` — Condition that the base point must fulfill.
- `event1`, `event2`, ... — Conditions describing the chain of events. [UInt8](../../sql-reference/data-types/int-uint.md).

**Returned values**

- `event_column[next_index]` — If the pattern is matched and the next value exists.
- `NULL` — If the pattern isn’t matched or the next value doesn’t exist.

Type: [Nullable(String)](../../sql-reference/data-types/nullable.md).

**Example**

It can be used when events are A->B->C->D->E and you want to know the event following B->C, which is D.

The following query searches for the event that follows A->B:

``` sql
CREATE TABLE test_flow (
    dt DateTime,
    id int,
    page String)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(dt)
ORDER BY id;

INSERT INTO test_flow VALUES (1, 1, 'A') (2, 1, 'B') (3, 1, 'C') (4, 1, 'D') (5, 1, 'E');

SELECT id, sequenceNextNode('forward', 'head')(dt, page, page = 'A', page = 'A', page = 'B') as next_flow FROM test_flow GROUP BY id;
```

Result:

``` text
┌─id─┬─next_flow─┐
│  1 │ C         │
└────┴───────────┘
```
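
The base-point rules can be summarised in a short Python model. `sequence_next_node` and `page_is` are hypothetical helpers written for this page, not ClickHouse code, and the sketch only handles the four `direction`/`base` combinations used in the examples on this page. It orders the rows by time (reversed for `backward`), picks the base point, requires `base_condition` to hold there, requires `event1`, `event2`, ... to match consecutive rows starting at the base point, and returns the value of the row right after them, or `None` (SQL `NULL`). For `first_match` and `last_match`, it assumes that only the first (or last) row matching both `event1` and `base_condition` is tried.

```python
SUPPORTED = {("forward", "head"), ("backward", "tail"),
             ("forward", "first_match"), ("backward", "last_match")}


def sequence_next_node(direction, base, rows, time_key, value_key, base_cond, events):
    """Value of the event that follows the matched chain, or None."""
    if (direction, base) not in SUPPORTED:
        raise ValueError(f"combination not modelled: {direction}, {base}")
    ordered = sorted(rows, key=lambda r: r[time_key])
    if direction == "backward":
        ordered.reverse()                # walk from the latest event to the earliest
    if base in ("head", "tail"):
        start = 0 if ordered and base_cond(ordered[0]) else None
    else:                                # first_match / last_match
        start = next((i for i, r in enumerate(ordered)
                      if events[0](r) and base_cond(r)), None)
    if start is None:
        return None
    # event1, event2, ... must match consecutive rows starting at the base point.
    for offset, cond in enumerate(events):
        i = start + offset
        if i >= len(ordered) or not cond(ordered[i]):
            return None
    nxt = start + len(events)
    return ordered[nxt][value_key] if nxt < len(ordered) else None


def page_is(name):
    return lambda row: row["page"] == name


# The A->B->C->D->E flow from the example above.
flow = [{"dt": dt, "page": p} for dt, p in enumerate("ABCDE", start=1)]
print(sequence_next_node("forward", "head", flow, "dt", "page",
                         page_is("A"), [page_is("A"), page_is("B")]))  # C
```

Walking the reversed list for `backward` lets the same forward-matching loop serve both directions, which is why `last_match` reduces to "first match in the reversed order".
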

**Behavior for `forward` and `head`**

``` sql
ALTER TABLE test_flow DELETE WHERE 1 = 1 SETTINGS mutations_sync = 1;

INSERT INTO test_flow VALUES (1, 1, 'Home') (2, 1, 'Gift') (3, 1, 'Exit');
INSERT INTO test_flow VALUES (1, 2, 'Home') (2, 2, 'Home') (3, 2, 'Gift') (4, 2, 'Basket');
INSERT INTO test_flow VALUES (1, 3, 'Gift') (2, 3, 'Home') (3, 3, 'Gift') (4, 3, 'Basket');
```

``` sql
SELECT id, sequenceNextNode('forward', 'head')(dt, page, page = 'Home', page = 'Home', page = 'Gift') FROM test_flow GROUP BY id;
```

``` text
dt                   id  page
1970-01-01 09:00:01  1   Home    // Base point, matches Home
1970-01-01 09:00:02  1   Gift    // Matches Gift
1970-01-01 09:00:03  1   Exit    // The result

1970-01-01 09:00:01  2   Home    // Base point, matches Home
1970-01-01 09:00:02  2   Home    // Does not match Gift
1970-01-01 09:00:03  2   Gift
1970-01-01 09:00:04  2   Basket

1970-01-01 09:00:01  3   Gift    // Base point, does not match Home
1970-01-01 09:00:02  3   Home
1970-01-01 09:00:03  3   Gift
1970-01-01 09:00:04  3   Basket
```

**Behavior for `backward` and `tail`**

``` sql
SELECT id, sequenceNextNode('backward', 'tail')(dt, page, page = 'Basket', page = 'Basket', page = 'Gift') FROM test_flow GROUP BY id;
```

``` text
dt                   id  page
1970-01-01 09:00:01  1   Home
1970-01-01 09:00:02  1   Gift
1970-01-01 09:00:03  1   Exit    // Base point, does not match Basket

1970-01-01 09:00:01  2   Home
1970-01-01 09:00:02  2   Home    // The result
1970-01-01 09:00:03  2   Gift    // Matches Gift
1970-01-01 09:00:04  2   Basket  // Base point, matches Basket

1970-01-01 09:00:01  3   Gift
1970-01-01 09:00:02  3   Home    // The result
1970-01-01 09:00:03  3   Gift    // Matches Gift
1970-01-01 09:00:04  3   Basket  // Base point, matches Basket
```

**Behavior for `forward` and `first_match`**

``` sql
SELECT id, sequenceNextNode('forward', 'first_match')(dt, page, page = 'Gift', page = 'Gift') FROM test_flow GROUP BY id;
```

``` text
dt                   id  page
1970-01-01 09:00:01  1   Home
1970-01-01 09:00:02  1   Gift    // Base point
1970-01-01 09:00:03  1   Exit    // The result

1970-01-01 09:00:01  2   Home
1970-01-01 09:00:02  2   Home
1970-01-01 09:00:03  2   Gift    // Base point
1970-01-01 09:00:04  2   Basket  // The result

1970-01-01 09:00:01  3   Gift    // Base point
1970-01-01 09:00:02  3   Home    // The result
1970-01-01 09:00:03  3   Gift
1970-01-01 09:00:04  3   Basket
```

``` sql
SELECT id, sequenceNextNode('forward', 'first_match')(dt, page, page = 'Gift', page = 'Gift', page = 'Home') FROM test_flow GROUP BY id;
```

``` text
dt                   id  page
1970-01-01 09:00:01  1   Home
1970-01-01 09:00:02  1   Gift    // Base point
1970-01-01 09:00:03  1   Exit    // Does not match Home

1970-01-01 09:00:01  2   Home
1970-01-01 09:00:02  2   Home
1970-01-01 09:00:03  2   Gift    // Base point
1970-01-01 09:00:04  2   Basket  // Does not match Home

1970-01-01 09:00:01  3   Gift    // Base point
1970-01-01 09:00:02  3   Home    // Matches Home
1970-01-01 09:00:03  3   Gift    // The result
1970-01-01 09:00:04  3   Basket
```

**Behavior for `backward` and `last_match`**

``` sql
SELECT id, sequenceNextNode('backward', 'last_match')(dt, page, page = 'Gift', page = 'Gift') FROM test_flow GROUP BY id;
```

``` text
dt                   id  page
1970-01-01 09:00:01  1   Home    // The result
1970-01-01 09:00:02  1   Gift    // Base point
1970-01-01 09:00:03  1   Exit

1970-01-01 09:00:01  2   Home
1970-01-01 09:00:02  2   Home    // The result
1970-01-01 09:00:03  2   Gift    // Base point
1970-01-01 09:00:04  2   Basket

1970-01-01 09:00:01  3   Gift
1970-01-01 09:00:02  3   Home    // The result
1970-01-01 09:00:03  3   Gift    // Base point
1970-01-01 09:00:04  3   Basket
```

``` sql
SELECT id, sequenceNextNode('backward', 'last_match')(dt, page, page = 'Gift', page = 'Gift', page = 'Home') FROM test_flow GROUP BY id;
```

``` text
dt                   id  page
1970-01-01 09:00:01  1   Home    // Matches Home, the result is NULL
1970-01-01 09:00:02  1   Gift    // Base point
1970-01-01 09:00:03  1   Exit

1970-01-01 09:00:01  2   Home    // The result
1970-01-01 09:00:02  2   Home    // Matches Home
1970-01-01 09:00:03  2   Gift    // Base point
1970-01-01 09:00:04  2   Basket

1970-01-01 09:00:01  3   Gift    // The result
1970-01-01 09:00:02  3   Home    // Matches Home
1970-01-01 09:00:03  3   Gift    // Base point
1970-01-01 09:00:04  3   Basket
```

**Behavior for `base_condition`**

``` sql
CREATE TABLE test_flow_basecond
(
    `dt` DateTime,
    `id` int,
    `page` String,
    `ref` String
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(dt)
ORDER BY id;

INSERT INTO test_flow_basecond VALUES (1, 1, 'A', 'ref4') (2, 1, 'A', 'ref3') (3, 1, 'B', 'ref2') (4, 1, 'B', 'ref1');
```

``` sql
SELECT id, sequenceNextNode('forward', 'head')(dt, page, ref = 'ref1', page = 'A') FROM test_flow_basecond GROUP BY id;
```

``` text
dt                   id  page  ref
1970-01-01 09:00:01  1   A     ref4  // The head cannot be the base point because its ref column does not match 'ref1'.
1970-01-01 09:00:02  1   A     ref3
1970-01-01 09:00:03  1   B     ref2
1970-01-01 09:00:04  1   B     ref1
```

``` sql
SELECT id, sequenceNextNode('backward', 'tail')(dt, page, ref = 'ref4', page = 'B') FROM test_flow_basecond GROUP BY id;
```

``` text
dt                   id  page  ref
1970-01-01 09:00:01  1   A     ref4
1970-01-01 09:00:02  1   A     ref3
1970-01-01 09:00:03  1   B     ref2
1970-01-01 09:00:04  1   B     ref1  // The tail cannot be the base point because its ref column does not match 'ref4'.
```

``` sql
SELECT id, sequenceNextNode('forward', 'first_match')(dt, page, ref = 'ref3', page = 'A') FROM test_flow_basecond GROUP BY id;
```

``` text
dt                   id  page  ref
1970-01-01 09:00:01  1   A     ref4  // This row cannot be the base point because its ref column does not match 'ref3'.
1970-01-01 09:00:02  1   A     ref3  // Base point
1970-01-01 09:00:03  1   B     ref2  // The result
1970-01-01 09:00:04  1   B     ref1
```

``` sql
SELECT id, sequenceNextNode('backward', 'last_match')(dt, page, ref = 'ref2', page = 'B') FROM test_flow_basecond GROUP BY id;
```

``` text
dt                   id  page  ref
1970-01-01 09:00:01  1   A     ref4
1970-01-01 09:00:02  1   A     ref3  // The result
1970-01-01 09:00:03  1   B     ref2  // Base point
1970-01-01 09:00:04  1   B     ref1  // This row cannot be the base point because its ref column does not match 'ref2'.
```