mirror of
https://github.com/ClickHouse/ClickHouse.git
synced 2024-11-17 13:13:36 +00:00
73 lines
3.0 KiB
Markdown

<a name="aggregate_functions_parametric"></a>

# Parametric aggregate functions

Some aggregate functions accept not only argument columns (the columns being aggregated) but also a set of parameters: constants used for initialization. The syntax uses two pairs of parentheses instead of one. The first pair is for the parameters, and the second is for the arguments.

## sequenceMatch(pattern)(time, cond1, cond2, ...)

Pattern matching for event chains.

`pattern` is a string containing the pattern to match. The pattern syntax is similar to a regular expression.

`time` is the time of the event, of type DateTime.

`cond1`, `cond2`, ... are from 1 to 32 arguments of type UInt8, each indicating whether a certain condition was met for the event.

The function collects the sequence of events in RAM and then checks whether this sequence matches the pattern.
It returns UInt8: 1 if the pattern is matched, 0 otherwise.

Example: `sequenceMatch('(?1).*(?2)')(EventTime, URL LIKE '%company%', URL LIKE '%cart%')`

This checks whether there was a chain of events in which a pageview with 'company' in the address occurred earlier than a pageview with 'cart' in the address.

This is a degenerate example. You could write the same check using other aggregate functions:

```text
minIf(EventTime, URL LIKE '%company%') < maxIf(EventTime, URL LIKE '%cart%')
```

However, no such workaround exists for more complex situations.
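To make the `minIf`/`maxIf` formulation concrete, here is a small Python model of it. This is a sketch for illustration, not ClickHouse code: the `(time, url)` tuple layout and the name `company_before_cart` are assumptions, and the model simply reports no match when either condition has no rows, which is not necessarily what `minIf` or `maxIf` return over zero rows.

```python
# Illustrative model only: each event is a hypothetical (time_in_seconds, url) pair.
def company_before_cart(events):
    """Model of minIf(EventTime, URL LIKE '%company%') < maxIf(EventTime, URL LIKE '%cart%')."""
    company_times = [t for t, url in events if "company" in url]
    cart_times = [t for t, url in events if "cart" in url]
    if not company_times or not cart_times:
        # Assumption of this model: with one side empty, no chain can exist.
        return False
    # The earliest 'company' pageview must precede the latest 'cart' pageview.
    return min(company_times) < max(cart_times)


print(company_before_cart([(10, "/company/about"), (25, "/pricing"), (40, "/cart")]))  # True
print(company_before_cart([(10, "/cart"), (40, "/company/about")]))                    # False
```

The comparison works only because the pattern has exactly two steps. A chain with three or more ordered steps cannot be reduced to a single minimum and maximum.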

Pattern syntax:

`(?1)` refers to the first condition, `cond1`. Any other condition number can be used in place of 1.

`.*` is any number of any events.

`(?t>=1800)` is a time condition.

Any number of events of any type is allowed during the specified time.

Instead of `>=`, the following operators can be used: `<`, `>`, `<=`.

Any number may be specified in place of 1800.

Events that occur during the same second may be placed in the chain in any order. This may affect the result of the function.
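The pattern elements can be illustrated with a toy matcher in Python. This is a sketch under stated assumptions, not the ClickHouse implementation: events are hypothetical `(time, flags)` pairs where `flags[N - 1]` stands for `condN`, the time condition is read as a constraint on the gap in seconds between the previously matched event and the next matched one, and a chain may start at any event. The names `parse`, `_match`, and `sequence_match` are invented for the example.

```python
import operator
import re

# Operators allowed in a time condition such as (?t>=1800).
OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}
TOKEN = re.compile(r"\(\?(\d+)\)|(\.\*)|\(\?t(>=|<=|>|<)(\d+)\)")


def parse(pattern):
    """Turn a pattern string into ('cond', n), ('any',) and ('time', op, secs) tokens."""
    tokens, pos = [], 0
    while pos < len(pattern):
        m = TOKEN.match(pattern, pos)
        if m is None:
            raise ValueError(f"unsupported pattern text at offset {pos}")
        if m.group(1):
            tokens.append(("cond", int(m.group(1))))
        elif m.group(2):
            tokens.append(("any",))
        else:
            tokens.append(("time", OPS[m.group(3)], int(m.group(4))))
        pos = m.end()
    return tokens


def _match(tokens, events, ti, ei, prev_time, gap):
    """Backtracking match of tokens[ti:] against events[ei:]."""
    if ti == len(tokens):
        return True
    token = tokens[ti]
    if token[0] == "any":
        # .* skips zero or more events.
        return any(_match(tokens, events, ti + 1, k, prev_time, gap)
                   for k in range(ei, len(events) + 1))
    if token[0] == "time":
        # Skip any events; remember the gap the next matched event must satisfy.
        return any(_match(tokens, events, ti + 1, k, prev_time, token[1:])
                   for k in range(ei, len(events) + 1))
    if ei == len(events):
        return False
    time, flags = events[ei]
    if not flags[token[1] - 1]:
        return False
    if gap is not None and prev_time is not None and not gap[0](time - prev_time, gap[1]):
        return False
    return _match(tokens, events, ti + 1, ei + 1, time, None)


def sequence_match(pattern, events):
    """Return 1 if some chain in the time-sorted events matches the pattern, else 0."""
    tokens = parse(pattern)
    # Python's sort is stable, so same-second events keep their input order here.
    ordered = sorted(events, key=lambda e: e[0])
    return int(any(_match(tokens, ordered, 0, start, None, None)
                   for start in range(len(ordered) + 1)))


events = [(0, (1, 0)), (2000, (0, 1))]
print(sequence_match("(?1).*(?2)", events))          # 1
print(sequence_match("(?1)(?t>=1800)(?2)", events))  # 1
print(sequence_match("(?1)(?t<1800)(?2)", events))   # 0
```

Because the model keeps same-second events in input order, it does not reproduce the ordering ambiguity described above. It only shows how the three pattern elements combine.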

## sequenceCount(pattern)(time, cond1, cond2, ...)

Works the same way as the `sequenceMatch` function, but instead of returning whether an event chain exists, it returns a UInt64 with the number of event chains found.
Chains are searched for without overlapping. In other words, the next chain can start only after the end of the previous one.
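The non-overlapping rule can be sketched in Python for the single pattern `(?1).*(?2)`. This is an illustrative model, not the ClickHouse algorithm: the input is a hypothetical time-ordered list of `(cond1, cond2)` flag pairs, the name `count_chains` is invented, and the model assumes each chain closes at the earliest event that satisfies `cond2`.

```python
def count_chains(flags):
    """Count non-overlapping chains 'cond1 ... cond2' in a time-ordered list.

    Each item is a hypothetical (cond1, cond2) pair of 0/1 flags for one event.
    """
    count = 0
    waiting_for_cond2 = False
    for cond1, cond2 in flags:
        if not waiting_for_cond2:
            # Look for the start of the next chain.
            if cond1:
                waiting_for_cond2 = True
        elif cond2:
            # The chain ends here; the next chain may start only after this event.
            count += 1
            waiting_for_cond2 = False
    return count


# Two 'cond1' events followed by one 'cond2' event form one chain, not two.
print(count_chains([(1, 0), (1, 0), (0, 1)]))                           # 1
print(count_chains([(1, 0), (0, 1), (1, 0), (1, 0), (0, 1), (0, 1)]))   # 2
```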

## uniqUpTo(N)(x)

Calculates the number of different argument values if that number is less than or equal to N. If the number of different argument values is greater than N, it returns N + 1.
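The saturating behavior can be modeled in a few lines of Python. This is a sketch of the semantics only, not of how ClickHouse stores its state; the name `uniq_up_to` is illustrative, and the nested function mirrors the parameters-then-arguments call syntax.

```python
def uniq_up_to(n):
    """Return a function that counts distinct values, saturating at n + 1."""
    def aggregate(values):
        seen = set()
        for value in values:
            seen.add(value)
            if len(seen) > n:
                # More than n distinct values: stop tracking and report n + 1.
                return n + 1
        return len(seen)
    return aggregate


print(uniq_up_to(4)([7, 7, 8, 9]))   # 3
print(uniq_up_to(4)(range(1000)))    # 5
```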

Recommended for use with small values of N, up to 10. The maximum value of N is 100.

The state of the aggregate function uses 1 + N \* (the size of one value) bytes of memory.
For strings, it stores an 8-byte non-cryptographic hash, so the result is approximate for strings.
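As a quick check of the memory formula, the following Python snippet evaluates it for two cases. The helper name `state_size_bytes` is illustrative, and the value sizes are supplied by the caller: 8 bytes covers both a UInt64 value and the 8-byte string hash.

```python
def state_size_bytes(n, value_size):
    """State size from the formula above: 1 + N * (size of one value)."""
    return 1 + n * value_size


print(state_size_bytes(10, 8))    # 81: N = 10 with 8-byte values or 8-byte string hashes
print(state_size_bytes(100, 4))   # 401: the maximum N with 4-byte values
```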

The function also works with several arguments.

It works as fast as possible, except when a large N is used and the number of unique values is slightly less than N.

Usage example:

```text
Problem: Generate a report that shows only keywords that produced at least 5 unique users.
Solution: In the query, write GROUP BY SearchPhrase HAVING uniqUpTo(4)(UserID) >= 5
```
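The reason N = 4 is enough for a threshold of 5 is that `uniqUpTo(4)` returns 5 for any keyword with more than 4 distinct users, which is all the `HAVING` clause needs to know. A Python model of the report, with hypothetical `(SearchPhrase, UserID)` rows and an invented function name, might look like this:

```python
from collections import defaultdict


def keywords_with_at_least(rows, threshold):
    """Model of GROUP BY SearchPhrase HAVING uniqUpTo(threshold - 1)(UserID) >= threshold."""
    n = threshold - 1
    users = defaultdict(set)          # SearchPhrase -> at most n + 1 distinct UserIDs
    for phrase, user_id in rows:
        if len(users[phrase]) <= n:   # the state never needs more than n + 1 values
            users[phrase].add(user_id)
    return sorted(p for p, ids in users.items() if len(ids) >= threshold)


rows = [("clickhouse", u) for u in (1, 2, 3, 4, 5, 5, 6)] + \
       [("olap", u) for u in (1, 2, 2, 3)]
print(keywords_with_at_least(rows, 5))  # ['clickhouse']
```

Capping each set at N + 1 values is what keeps the per-keyword state small, at the cost of not knowing the exact count above the threshold.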