mirror of
https://github.com/ClickHouse/ClickHouse.git
synced 2024-12-15 02:41:59 +00:00
782 lines
33 KiB
Markdown
782 lines
33 KiB
Markdown
---
|
|
slug: /en/sql-reference/window-functions/
|
|
sidebar_label: Window Functions
|
|
sidebar_position: 1
|
|
---
|
|
|
|
# Window Functions
|
|
|
|
Windows functions let you perform calculations across a set of rows that are related to the current row.
|
|
Some of the calculations that you can do are similar to those that can be done with an aggregate function, but a window function doesn't cause rows to be grouped into a single output - the individual rows are still returned.
|
|
|
|
## Standard Window Functions
|
|
|
|
ClickHouse supports the standard grammar for defining windows and window functions. The table below indicates whether a feature is currently supported.
|
|
|
|
| Feature | Supported? |
|
|
|--------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
|
| ad hoc window specification (`count(*) over (partition by id order by time desc)`) | ✅ |
|
|
| expressions involving window functions, e.g. `(count(*) over ()) / 2)` | ✅ |
|
|
| `WINDOW` clause (`select ... from table window w as (partition by id)`) | ✅ |
|
|
| `ROWS` frame | ✅ |
|
|
| `RANGE` frame | ✅ (the default) |
|
|
| `INTERVAL` syntax for `DateTime` `RANGE OFFSET` frame | ❌ (specify the number of seconds instead (`RANGE` works with any numeric type).) |
|
|
| `GROUPS` frame | ❌ |
|
|
| Calculating aggregate functions over a frame (`sum(value) over (order by time)`) | ✅ (All aggregate functions are supported) |
|
|
| `rank()`, `dense_rank()`, `row_number()` | ✅ <br/>Alias: `denseRank()` |
|
|
| `percent_rank()` | ✅ Efficiently computes the relative standing of a value within a partition in a dataset. This function effectively replaces the more verbose and computationally intensive manual SQL calculation expressed as `ifNull((rank() OVER(PARTITION BY x ORDER BY y) - 1) / nullif(count(1) OVER(PARTITION BY x) - 1, 0), 0)` <br/>Alias: `percentRank()`|
|
|
| `lag/lead(value, offset)` | ❌ <br/> You can use one of the following workarounds:<br/> 1) `any(value) over (.... rows between <offset> preceding and <offset> preceding)`, or `following` for `lead` <br/> 2) `lagInFrame/leadInFrame`, which are analogous, but respect the window frame. To get behavior identical to `lag/lead`, use `rows between unbounded preceding and unbounded following` |
|
|
| ntile(buckets) | ✅ <br/> Specify window like, (partition by x order by y rows between unbounded preceding and unrounded following). |
|
|
|
|
## ClickHouse-specific Window Functions
|
|
|
|
There is also the following ClickHouse specific window function:
|
|
|
|
### nonNegativeDerivative(metric_column, timestamp_column[, INTERVAL X UNITS])
|
|
|
|
Finds non-negative derivative for given `metric_column` by `timestamp_column`.
|
|
`INTERVAL` can be omitted, default is `INTERVAL 1 SECOND`.
|
|
The computed value is the following for each row:
|
|
- `0` for 1st row,
|
|
- ${\text{metric}_i - \text{metric}_{i-1} \over \text{timestamp}_i - \text{timestamp}_{i-1}} * \text{interval}$ for $i_{th}$ row.
|
|
|
|
## Syntax
|
|
|
|
```text
|
|
aggregate_function (column_name)
|
|
OVER ([[PARTITION BY grouping_column] [ORDER BY sorting_column]
|
|
[ROWS or RANGE expression_to_bound_rows_withing_the_group]] | [window_name])
|
|
FROM table_name
|
|
WINDOW window_name as ([[PARTITION BY grouping_column] [ORDER BY sorting_column]])
|
|
```
|
|
|
|
- `PARTITION BY` - defines how to break a resultset into groups.
|
|
- `ORDER BY` - defines how to order rows inside the group during calculation aggregate_function.
|
|
- `ROWS or RANGE` - defines bounds of a frame, aggregate_function is calculated within a frame.
|
|
- `WINDOW` - allows multiple expressions to use the same window definition.
|
|
|
|
```text
|
|
PARTITION
|
|
┌─────────────────┐ <-- UNBOUNDED PRECEDING (BEGINNING of the PARTITION)
|
|
│ │
|
|
│ │
|
|
│=================│ <-- N PRECEDING <─┐
|
|
│ N ROWS │ │ F
|
|
│ Before CURRENT │ │ R
|
|
│~~~~~~~~~~~~~~~~~│ <-- CURRENT ROW │ A
|
|
│ M ROWS │ │ M
|
|
│ After CURRENT │ │ E
|
|
│=================│ <-- M FOLLOWING <─┘
|
|
│ │
|
|
│ │
|
|
└─────────────────┘ <--- UNBOUNDED FOLLOWING (END of the PARTITION)
|
|
```
|
|
|
|
### Functions
|
|
|
|
These functions can be used only as a window function.
|
|
|
|
- [`row_number()`](./row_number.md) - Number the current row within its partition starting from 1.
|
|
- [`first_value(x)`](./first_value.md) - Return the first value evaluated within its ordered frame.
|
|
- [`last_value(x)`](./last_value.md) - Return the last value evaluated within its ordered frame.
|
|
- [`nth_value(x, offset)`](./nth_value.md) - Return the first non-NULL value evaluated against the nth row (offset) in its ordered frame.
|
|
- [`rank()`](./rank.md) - Rank the current row within its partition with gaps.
|
|
- [`dense_rank()`](./dense_rank.md) - Rank the current row within its partition without gaps.
|
|
- [`lagInFrame(x)`](./lagInFrame.md) - Return a value evaluated at the row that is at a specified physical offset row before the current row within the ordered frame.
|
|
- [`leadInFrame(x)`](./leadInFrame.md) - Return a value evaluated at the row that is offset rows after the current row within the ordered frame.
|
|
|
|
## Examples
|
|
|
|
Let's have a look at some examples of how window functions can be used.
|
|
|
|
### Numbering rows
|
|
|
|
```sql
|
|
CREATE TABLE salaries
|
|
(
|
|
`team` String,
|
|
`player` String,
|
|
`salary` UInt32,
|
|
`position` String
|
|
)
|
|
Engine = Memory;
|
|
|
|
INSERT INTO salaries FORMAT Values
|
|
('Port Elizabeth Barbarians', 'Gary Chen', 195000, 'F'),
|
|
('New Coreystad Archdukes', 'Charles Juarez', 190000, 'F'),
|
|
('Port Elizabeth Barbarians', 'Michael Stanley', 150000, 'D'),
|
|
('New Coreystad Archdukes', 'Scott Harrison', 150000, 'D'),
|
|
('Port Elizabeth Barbarians', 'Robert George', 195000, 'M');
|
|
```
|
|
|
|
```sql
|
|
SELECT
|
|
player,
|
|
salary,
|
|
row_number() OVER (ORDER BY salary ASC) AS row
|
|
FROM salaries;
|
|
```
|
|
|
|
```text
|
|
┌─player──────────┬─salary─┬─row─┐
|
|
│ Michael Stanley │ 150000 │ 1 │
|
|
│ Scott Harrison │ 150000 │ 2 │
|
|
│ Charles Juarez │ 190000 │ 3 │
|
|
│ Gary Chen │ 195000 │ 4 │
|
|
│ Robert George │ 195000 │ 5 │
|
|
└─────────────────┴────────┴─────┘
|
|
```
|
|
|
|
```sql
|
|
SELECT
|
|
player,
|
|
salary,
|
|
row_number() OVER (ORDER BY salary ASC) AS row,
|
|
rank() OVER (ORDER BY salary ASC) AS rank,
|
|
dense_rank() OVER (ORDER BY salary ASC) AS denseRank
|
|
FROM salaries;
|
|
```
|
|
|
|
```text
|
|
┌─player──────────┬─salary─┬─row─┬─rank─┬─denseRank─┐
|
|
│ Michael Stanley │ 150000 │ 1 │ 1 │ 1 │
|
|
│ Scott Harrison │ 150000 │ 2 │ 1 │ 1 │
|
|
│ Charles Juarez │ 190000 │ 3 │ 3 │ 2 │
|
|
│ Gary Chen │ 195000 │ 4 │ 4 │ 3 │
|
|
│ Robert George │ 195000 │ 5 │ 4 │ 3 │
|
|
└─────────────────┴────────┴─────┴──────┴───────────┘
|
|
```
|
|
|
|
### Aggregation functions
|
|
|
|
Compare each player's salary to the average for their team.
|
|
|
|
```sql
|
|
SELECT
|
|
player,
|
|
salary,
|
|
team,
|
|
avg(salary) OVER (PARTITION BY team) AS teamAvg,
|
|
salary - teamAvg AS diff
|
|
FROM salaries;
|
|
```
|
|
|
|
```text
|
|
┌─player──────────┬─salary─┬─team──────────────────────┬─teamAvg─┬───diff─┐
|
|
│ Charles Juarez │ 190000 │ New Coreystad Archdukes │ 170000 │ 20000 │
|
|
│ Scott Harrison │ 150000 │ New Coreystad Archdukes │ 170000 │ -20000 │
|
|
│ Gary Chen │ 195000 │ Port Elizabeth Barbarians │ 180000 │ 15000 │
|
|
│ Michael Stanley │ 150000 │ Port Elizabeth Barbarians │ 180000 │ -30000 │
|
|
│ Robert George │ 195000 │ Port Elizabeth Barbarians │ 180000 │ 15000 │
|
|
└─────────────────┴────────┴───────────────────────────┴─────────┴────────┘
|
|
```
|
|
|
|
Compare each player's salary to the maximum for their team.
|
|
|
|
```sql
|
|
SELECT
|
|
player,
|
|
salary,
|
|
team,
|
|
max(salary) OVER (PARTITION BY team) AS teamMax,
|
|
salary - teamMax AS diff
|
|
FROM salaries;
|
|
```
|
|
|
|
```text
|
|
┌─player──────────┬─salary─┬─team──────────────────────┬─teamMax─┬───diff─┐
|
|
│ Charles Juarez │ 190000 │ New Coreystad Archdukes │ 190000 │ 0 │
|
|
│ Scott Harrison │ 150000 │ New Coreystad Archdukes │ 190000 │ -40000 │
|
|
│ Gary Chen │ 195000 │ Port Elizabeth Barbarians │ 195000 │ 0 │
|
|
│ Michael Stanley │ 150000 │ Port Elizabeth Barbarians │ 195000 │ -45000 │
|
|
│ Robert George │ 195000 │ Port Elizabeth Barbarians │ 195000 │ 0 │
|
|
└─────────────────┴────────┴───────────────────────────┴─────────┴────────┘
|
|
```
|
|
|
|
### Partitioning by column
|
|
|
|
```sql
|
|
CREATE TABLE wf_partition
|
|
(
|
|
`part_key` UInt64,
|
|
`value` UInt64,
|
|
`order` UInt64
|
|
)
|
|
ENGINE = Memory;
|
|
|
|
INSERT INTO wf_partition FORMAT Values
|
|
(1,1,1), (1,2,2), (1,3,3), (2,0,0), (3,0,0);
|
|
|
|
SELECT
|
|
part_key,
|
|
value,
|
|
order,
|
|
groupArray(value) OVER (PARTITION BY part_key) AS frame_values
|
|
FROM wf_partition
|
|
ORDER BY
|
|
part_key ASC,
|
|
value ASC;
|
|
|
|
┌─part_key─┬─value─┬─order─┬─frame_values─┐
|
|
│ 1 │ 1 │ 1 │ [1,2,3] │ <┐
|
|
│ 1 │ 2 │ 2 │ [1,2,3] │ │ 1-st group
|
|
│ 1 │ 3 │ 3 │ [1,2,3] │ <┘
|
|
│ 2 │ 0 │ 0 │ [0] │ <- 2-nd group
|
|
│ 3 │ 0 │ 0 │ [0] │ <- 3-d group
|
|
└──────────┴───────┴───────┴──────────────┘
|
|
```
|
|
|
|
### Frame bounding
|
|
|
|
```sql
|
|
CREATE TABLE wf_frame
|
|
(
|
|
`part_key` UInt64,
|
|
`value` UInt64,
|
|
`order` UInt64
|
|
)
|
|
ENGINE = Memory;
|
|
|
|
INSERT INTO wf_frame FORMAT Values
|
|
(1,1,1), (1,2,2), (1,3,3), (1,4,4), (1,5,5);
|
|
```
|
|
|
|
```sql
|
|
-- Frame is bounded by bounds of a partition (BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
|
|
SELECT
|
|
part_key,
|
|
value,
|
|
order,
|
|
groupArray(value) OVER (
|
|
PARTITION BY part_key
|
|
ORDER BY order ASC
|
|
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
|
|
) AS frame_values
|
|
FROM wf_frame
|
|
ORDER BY
|
|
part_key ASC,
|
|
value ASC;
|
|
|
|
┌─part_key─┬─value─┬─order─┬─frame_values─┐
|
|
│ 1 │ 1 │ 1 │ [1,2,3,4,5] │
|
|
│ 1 │ 2 │ 2 │ [1,2,3,4,5] │
|
|
│ 1 │ 3 │ 3 │ [1,2,3,4,5] │
|
|
│ 1 │ 4 │ 4 │ [1,2,3,4,5] │
|
|
│ 1 │ 5 │ 5 │ [1,2,3,4,5] │
|
|
└──────────┴───────┴───────┴──────────────┘
|
|
```
|
|
|
|
```sql
|
|
-- short form - no bound expression, no order by,
|
|
-- an equalent of `ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`
|
|
SELECT
|
|
part_key,
|
|
value,
|
|
order,
|
|
groupArray(value) OVER (PARTITION BY part_key) AS frame_values_short,
|
|
groupArray(value) OVER (PARTITION BY part_key
|
|
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
|
|
) AS frame_values
|
|
FROM wf_frame
|
|
ORDER BY
|
|
part_key ASC,
|
|
value ASC;
|
|
┌─part_key─┬─value─┬─order─┬─frame_values_short─┬─frame_values─┐
|
|
│ 1 │ 1 │ 1 │ [1,2,3,4,5] │ [1,2,3,4,5] │
|
|
│ 1 │ 2 │ 2 │ [1,2,3,4,5] │ [1,2,3,4,5] │
|
|
│ 1 │ 3 │ 3 │ [1,2,3,4,5] │ [1,2,3,4,5] │
|
|
│ 1 │ 4 │ 4 │ [1,2,3,4,5] │ [1,2,3,4,5] │
|
|
│ 1 │ 5 │ 5 │ [1,2,3,4,5] │ [1,2,3,4,5] │
|
|
└──────────┴───────┴───────┴────────────────────┴──────────────┘
|
|
```
|
|
|
|
```sql
|
|
-- frame is bounded by the beginning of a partition and the current row
|
|
SELECT
|
|
part_key,
|
|
value,
|
|
order,
|
|
groupArray(value) OVER (
|
|
PARTITION BY part_key
|
|
ORDER BY order ASC
|
|
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
|
|
) AS frame_values
|
|
FROM wf_frame
|
|
ORDER BY
|
|
part_key ASC,
|
|
value ASC;
|
|
|
|
┌─part_key─┬─value─┬─order─┬─frame_values─┐
|
|
│ 1 │ 1 │ 1 │ [1] │
|
|
│ 1 │ 2 │ 2 │ [1,2] │
|
|
│ 1 │ 3 │ 3 │ [1,2,3] │
|
|
│ 1 │ 4 │ 4 │ [1,2,3,4] │
|
|
│ 1 │ 5 │ 5 │ [1,2,3,4,5] │
|
|
└──────────┴───────┴───────┴──────────────┘
|
|
```
|
|
|
|
```sql
|
|
-- short form (frame is bounded by the beginning of a partition and the current row)
|
|
-- an equalent of `ORDER BY order ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`
|
|
SELECT
|
|
part_key,
|
|
value,
|
|
order,
|
|
groupArray(value) OVER (PARTITION BY part_key ORDER BY order ASC) AS frame_values_short,
|
|
groupArray(value) OVER (PARTITION BY part_key ORDER BY order ASC
|
|
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
|
|
) AS frame_values
|
|
FROM wf_frame
|
|
ORDER BY
|
|
part_key ASC,
|
|
value ASC;
|
|
|
|
┌─part_key─┬─value─┬─order─┬─frame_values_short─┬─frame_values─┐
|
|
│ 1 │ 1 │ 1 │ [1] │ [1] │
|
|
│ 1 │ 2 │ 2 │ [1,2] │ [1,2] │
|
|
│ 1 │ 3 │ 3 │ [1,2,3] │ [1,2,3] │
|
|
│ 1 │ 4 │ 4 │ [1,2,3,4] │ [1,2,3,4] │
|
|
│ 1 │ 5 │ 5 │ [1,2,3,4,5] │ [1,2,3,4,5] │
|
|
└──────────┴───────┴───────┴────────────────────┴──────────────┘
|
|
```
|
|
|
|
```sql
|
|
-- frame is bounded by the beginning of a partition and the current row, but order is backward
|
|
SELECT
|
|
part_key,
|
|
value,
|
|
order,
|
|
groupArray(value) OVER (PARTITION BY part_key ORDER BY order DESC) AS frame_values
|
|
FROM wf_frame
|
|
ORDER BY
|
|
part_key ASC,
|
|
value ASC;
|
|
|
|
┌─part_key─┬─value─┬─order─┬─frame_values─┐
|
|
│ 1 │ 1 │ 1 │ [5,4,3,2,1] │
|
|
│ 1 │ 2 │ 2 │ [5,4,3,2] │
|
|
│ 1 │ 3 │ 3 │ [5,4,3] │
|
|
│ 1 │ 4 │ 4 │ [5,4] │
|
|
│ 1 │ 5 │ 5 │ [5] │
|
|
└──────────┴───────┴───────┴──────────────┘
|
|
```
|
|
|
|
```sql
|
|
-- sliding frame - 1 PRECEDING ROW AND CURRENT ROW
|
|
SELECT
|
|
part_key,
|
|
value,
|
|
order,
|
|
groupArray(value) OVER (
|
|
PARTITION BY part_key
|
|
ORDER BY order ASC
|
|
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
|
|
) AS frame_values
|
|
FROM wf_frame
|
|
ORDER BY
|
|
part_key ASC,
|
|
value ASC;
|
|
|
|
┌─part_key─┬─value─┬─order─┬─frame_values─┐
|
|
│ 1 │ 1 │ 1 │ [1] │
|
|
│ 1 │ 2 │ 2 │ [1,2] │
|
|
│ 1 │ 3 │ 3 │ [2,3] │
|
|
│ 1 │ 4 │ 4 │ [3,4] │
|
|
│ 1 │ 5 │ 5 │ [4,5] │
|
|
└──────────┴───────┴───────┴──────────────┘
|
|
```
|
|
|
|
```sql
|
|
-- sliding frame - ROWS BETWEEN 1 PRECEDING AND UNBOUNDED FOLLOWING
|
|
SELECT
|
|
part_key,
|
|
value,
|
|
order,
|
|
groupArray(value) OVER (
|
|
PARTITION BY part_key
|
|
ORDER BY order ASC
|
|
ROWS BETWEEN 1 PRECEDING AND UNBOUNDED FOLLOWING
|
|
) AS frame_values
|
|
FROM wf_frame
|
|
ORDER BY
|
|
part_key ASC,
|
|
value ASC;
|
|
|
|
┌─part_key─┬─value─┬─order─┬─frame_values─┐
|
|
│ 1 │ 1 │ 1 │ [1,2,3,4,5] │
|
|
│ 1 │ 2 │ 2 │ [1,2,3,4,5] │
|
|
│ 1 │ 3 │ 3 │ [2,3,4,5] │
|
|
│ 1 │ 4 │ 4 │ [3,4,5] │
|
|
│ 1 │ 5 │ 5 │ [4,5] │
|
|
└──────────┴───────┴───────┴──────────────┘
|
|
```
|
|
|
|
```sql
|
|
-- row_number does not respect the frame, so rn_1 = rn_2 = rn_3 != rn_4
|
|
SELECT
|
|
part_key,
|
|
value,
|
|
order,
|
|
groupArray(value) OVER w1 AS frame_values,
|
|
row_number() OVER w1 AS rn_1,
|
|
sum(1) OVER w1 AS rn_2,
|
|
row_number() OVER w2 AS rn_3,
|
|
sum(1) OVER w2 AS rn_4
|
|
FROM wf_frame
|
|
WINDOW
|
|
w1 AS (PARTITION BY part_key ORDER BY order DESC),
|
|
w2 AS (
|
|
PARTITION BY part_key
|
|
ORDER BY order DESC
|
|
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
|
|
)
|
|
ORDER BY
|
|
part_key ASC,
|
|
value ASC;
|
|
|
|
┌─part_key─┬─value─┬─order─┬─frame_values─┬─rn_1─┬─rn_2─┬─rn_3─┬─rn_4─┐
|
|
│ 1 │ 1 │ 1 │ [5,4,3,2,1] │ 5 │ 5 │ 5 │ 2 │
|
|
│ 1 │ 2 │ 2 │ [5,4,3,2] │ 4 │ 4 │ 4 │ 2 │
|
|
│ 1 │ 3 │ 3 │ [5,4,3] │ 3 │ 3 │ 3 │ 2 │
|
|
│ 1 │ 4 │ 4 │ [5,4] │ 2 │ 2 │ 2 │ 2 │
|
|
│ 1 │ 5 │ 5 │ [5] │ 1 │ 1 │ 1 │ 1 │
|
|
└──────────┴───────┴───────┴──────────────┴──────┴──────┴──────┴──────┘
|
|
```
|
|
|
|
```sql
|
|
-- first_value and last_value respect the frame
|
|
SELECT
|
|
groupArray(value) OVER w1 AS frame_values_1,
|
|
first_value(value) OVER w1 AS first_value_1,
|
|
last_value(value) OVER w1 AS last_value_1,
|
|
groupArray(value) OVER w2 AS frame_values_2,
|
|
first_value(value) OVER w2 AS first_value_2,
|
|
last_value(value) OVER w2 AS last_value_2
|
|
FROM wf_frame
|
|
WINDOW
|
|
w1 AS (PARTITION BY part_key ORDER BY order ASC),
|
|
w2 AS (PARTITION BY part_key ORDER BY order ASC ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
|
|
ORDER BY
|
|
part_key ASC,
|
|
value ASC;
|
|
|
|
┌─frame_values_1─┬─first_value_1─┬─last_value_1─┬─frame_values_2─┬─first_value_2─┬─last_value_2─┐
|
|
│ [1] │ 1 │ 1 │ [1] │ 1 │ 1 │
|
|
│ [1,2] │ 1 │ 2 │ [1,2] │ 1 │ 2 │
|
|
│ [1,2,3] │ 1 │ 3 │ [2,3] │ 2 │ 3 │
|
|
│ [1,2,3,4] │ 1 │ 4 │ [3,4] │ 3 │ 4 │
|
|
│ [1,2,3,4,5] │ 1 │ 5 │ [4,5] │ 4 │ 5 │
|
|
└────────────────┴───────────────┴──────────────┴────────────────┴───────────────┴──────────────┘
|
|
```
|
|
|
|
```sql
|
|
-- second value within the frame
|
|
SELECT
|
|
groupArray(value) OVER w1 AS frame_values_1,
|
|
nth_value(value, 2) OVER w1 AS second_value
|
|
FROM wf_frame
|
|
WINDOW w1 AS (PARTITION BY part_key ORDER BY order ASC ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
|
|
ORDER BY
|
|
part_key ASC,
|
|
value ASC;
|
|
|
|
┌─frame_values_1─┬─second_value─┐
|
|
│ [1] │ 0 │
|
|
│ [1,2] │ 2 │
|
|
│ [1,2,3] │ 2 │
|
|
│ [1,2,3,4] │ 2 │
|
|
│ [2,3,4,5] │ 3 │
|
|
└────────────────┴──────────────┘
|
|
```
|
|
|
|
```sql
|
|
-- second value within the frame + Null for missing values
|
|
SELECT
|
|
groupArray(value) OVER w1 AS frame_values_1,
|
|
nth_value(toNullable(value), 2) OVER w1 AS second_value
|
|
FROM wf_frame
|
|
WINDOW w1 AS (PARTITION BY part_key ORDER BY order ASC ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
|
|
ORDER BY
|
|
part_key ASC,
|
|
value ASC;
|
|
|
|
┌─frame_values_1─┬─second_value─┐
|
|
│ [1] │ ᴺᵁᴸᴸ │
|
|
│ [1,2] │ 2 │
|
|
│ [1,2,3] │ 2 │
|
|
│ [1,2,3,4] │ 2 │
|
|
│ [2,3,4,5] │ 3 │
|
|
└────────────────┴──────────────┘
|
|
```
|
|
|
|
## Real world examples
|
|
|
|
The following examples solve common real-world problems.
|
|
|
|
### Maximum/total salary per department
|
|
|
|
```sql
|
|
CREATE TABLE employees
|
|
(
|
|
`department` String,
|
|
`employee_name` String,
|
|
`salary` Float
|
|
)
|
|
ENGINE = Memory;
|
|
|
|
INSERT INTO employees FORMAT Values
|
|
('Finance', 'Jonh', 200),
|
|
('Finance', 'Joan', 210),
|
|
('Finance', 'Jean', 505),
|
|
('IT', 'Tim', 200),
|
|
('IT', 'Anna', 300),
|
|
('IT', 'Elen', 500);
|
|
```
|
|
|
|
```sql
|
|
SELECT
|
|
department,
|
|
employee_name AS emp,
|
|
salary,
|
|
max_salary_per_dep,
|
|
total_salary_per_dep,
|
|
round((salary / total_salary_per_dep) * 100, 2) AS `share_per_dep(%)`
|
|
FROM
|
|
(
|
|
SELECT
|
|
department,
|
|
employee_name,
|
|
salary,
|
|
max(salary) OVER wndw AS max_salary_per_dep,
|
|
sum(salary) OVER wndw AS total_salary_per_dep
|
|
FROM employees
|
|
WINDOW wndw AS (
|
|
PARTITION BY department
|
|
rows BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
|
|
)
|
|
ORDER BY
|
|
department ASC,
|
|
employee_name ASC
|
|
);
|
|
|
|
┌─department─┬─emp──┬─salary─┬─max_salary_per_dep─┬─total_salary_per_dep─┬─share_per_dep(%)─┐
|
|
│ Finance │ Jean │ 505 │ 505 │ 915 │ 55.19 │
|
|
│ Finance │ Joan │ 210 │ 505 │ 915 │ 22.95 │
|
|
│ Finance │ Jonh │ 200 │ 505 │ 915 │ 21.86 │
|
|
│ IT │ Anna │ 300 │ 500 │ 1000 │ 30 │
|
|
│ IT │ Elen │ 500 │ 500 │ 1000 │ 50 │
|
|
│ IT │ Tim │ 200 │ 500 │ 1000 │ 20 │
|
|
└────────────┴──────┴────────┴────────────────────┴──────────────────────┴──────────────────┘
|
|
```
|
|
|
|
### Cumulative sum
|
|
|
|
```sql
|
|
CREATE TABLE warehouse
|
|
(
|
|
`item` String,
|
|
`ts` DateTime,
|
|
`value` Float
|
|
)
|
|
ENGINE = Memory
|
|
|
|
INSERT INTO warehouse VALUES
|
|
('sku38', '2020-01-01', 9),
|
|
('sku38', '2020-02-01', 1),
|
|
('sku38', '2020-03-01', -4),
|
|
('sku1', '2020-01-01', 1),
|
|
('sku1', '2020-02-01', 1),
|
|
('sku1', '2020-03-01', 1);
|
|
```
|
|
|
|
```sql
|
|
SELECT
|
|
item,
|
|
ts,
|
|
value,
|
|
sum(value) OVER (PARTITION BY item ORDER BY ts ASC) AS stock_balance
|
|
FROM warehouse
|
|
ORDER BY
|
|
item ASC,
|
|
ts ASC;
|
|
|
|
┌─item──┬──────────────────ts─┬─value─┬─stock_balance─┐
|
|
│ sku1 │ 2020-01-01 00:00:00 │ 1 │ 1 │
|
|
│ sku1 │ 2020-02-01 00:00:00 │ 1 │ 2 │
|
|
│ sku1 │ 2020-03-01 00:00:00 │ 1 │ 3 │
|
|
│ sku38 │ 2020-01-01 00:00:00 │ 9 │ 9 │
|
|
│ sku38 │ 2020-02-01 00:00:00 │ 1 │ 10 │
|
|
│ sku38 │ 2020-03-01 00:00:00 │ -4 │ 6 │
|
|
└───────┴─────────────────────┴───────┴───────────────┘
|
|
```
|
|
|
|
### Moving / Sliding Average (per 3 rows)
|
|
|
|
```sql
|
|
CREATE TABLE sensors
|
|
(
|
|
`metric` String,
|
|
`ts` DateTime,
|
|
`value` Float
|
|
)
|
|
ENGINE = Memory;
|
|
|
|
insert into sensors values('cpu_temp', '2020-01-01 00:00:00', 87),
|
|
('cpu_temp', '2020-01-01 00:00:01', 77),
|
|
('cpu_temp', '2020-01-01 00:00:02', 93),
|
|
('cpu_temp', '2020-01-01 00:00:03', 87),
|
|
('cpu_temp', '2020-01-01 00:00:04', 87),
|
|
('cpu_temp', '2020-01-01 00:00:05', 87),
|
|
('cpu_temp', '2020-01-01 00:00:06', 87),
|
|
('cpu_temp', '2020-01-01 00:00:07', 87);
|
|
```
|
|
|
|
```sql
|
|
SELECT
|
|
metric,
|
|
ts,
|
|
value,
|
|
avg(value) OVER (
|
|
PARTITION BY metric
|
|
ORDER BY ts ASC
|
|
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
|
|
) AS moving_avg_temp
|
|
FROM sensors
|
|
ORDER BY
|
|
metric ASC,
|
|
ts ASC;
|
|
|
|
┌─metric───┬──────────────────ts─┬─value─┬───moving_avg_temp─┐
|
|
│ cpu_temp │ 2020-01-01 00:00:00 │ 87 │ 87 │
|
|
│ cpu_temp │ 2020-01-01 00:00:01 │ 77 │ 82 │
|
|
│ cpu_temp │ 2020-01-01 00:00:02 │ 93 │ 85.66666666666667 │
|
|
│ cpu_temp │ 2020-01-01 00:00:03 │ 87 │ 85.66666666666667 │
|
|
│ cpu_temp │ 2020-01-01 00:00:04 │ 87 │ 89 │
|
|
│ cpu_temp │ 2020-01-01 00:00:05 │ 87 │ 87 │
|
|
│ cpu_temp │ 2020-01-01 00:00:06 │ 87 │ 87 │
|
|
│ cpu_temp │ 2020-01-01 00:00:07 │ 87 │ 87 │
|
|
└──────────┴─────────────────────┴───────┴───────────────────┘
|
|
```
|
|
|
|
### Moving / Sliding Average (per 10 seconds)
|
|
|
|
```sql
|
|
SELECT
|
|
metric,
|
|
ts,
|
|
value,
|
|
avg(value) OVER (PARTITION BY metric ORDER BY ts
|
|
Range BETWEEN 10 PRECEDING AND CURRENT ROW) AS moving_avg_10_seconds_temp
|
|
FROM sensors
|
|
ORDER BY
|
|
metric ASC,
|
|
ts ASC;
|
|
|
|
┌─metric───┬──────────────────ts─┬─value─┬─moving_avg_10_seconds_temp─┐
|
|
│ cpu_temp │ 2020-01-01 00:00:00 │ 87 │ 87 │
|
|
│ cpu_temp │ 2020-01-01 00:01:10 │ 77 │ 77 │
|
|
│ cpu_temp │ 2020-01-01 00:02:20 │ 93 │ 93 │
|
|
│ cpu_temp │ 2020-01-01 00:03:30 │ 87 │ 87 │
|
|
│ cpu_temp │ 2020-01-01 00:04:40 │ 87 │ 87 │
|
|
│ cpu_temp │ 2020-01-01 00:05:50 │ 87 │ 87 │
|
|
│ cpu_temp │ 2020-01-01 00:06:00 │ 87 │ 87 │
|
|
│ cpu_temp │ 2020-01-01 00:07:10 │ 87 │ 87 │
|
|
└──────────┴─────────────────────┴───────┴────────────────────────────┘
|
|
```
|
|
|
|
### Moving / Sliding Average (per 10 days)
|
|
|
|
Temperature is stored with second precision, but using `Range` and `ORDER BY toDate(ts)` we form a frame with the size of 10 units, and because of `toDate(ts)` the unit is a day.
|
|
|
|
```sql
|
|
CREATE TABLE sensors
|
|
(
|
|
`metric` String,
|
|
`ts` DateTime,
|
|
`value` Float
|
|
)
|
|
ENGINE = Memory;
|
|
|
|
insert into sensors values('ambient_temp', '2020-01-01 00:00:00', 16),
|
|
('ambient_temp', '2020-01-01 12:00:00', 16),
|
|
('ambient_temp', '2020-01-02 11:00:00', 9),
|
|
('ambient_temp', '2020-01-02 12:00:00', 9),
|
|
('ambient_temp', '2020-02-01 10:00:00', 10),
|
|
('ambient_temp', '2020-02-01 12:00:00', 10),
|
|
('ambient_temp', '2020-02-10 12:00:00', 12),
|
|
('ambient_temp', '2020-02-10 13:00:00', 12),
|
|
('ambient_temp', '2020-02-20 12:00:01', 16),
|
|
('ambient_temp', '2020-03-01 12:00:00', 16),
|
|
('ambient_temp', '2020-03-01 12:00:00', 16),
|
|
('ambient_temp', '2020-03-01 12:00:00', 16);
|
|
```
|
|
|
|
```sql
|
|
SELECT
|
|
metric,
|
|
ts,
|
|
value,
|
|
round(avg(value) OVER (PARTITION BY metric ORDER BY toDate(ts)
|
|
Range BETWEEN 10 PRECEDING AND CURRENT ROW),2) AS moving_avg_10_days_temp
|
|
FROM sensors
|
|
ORDER BY
|
|
metric ASC,
|
|
ts ASC;
|
|
|
|
┌─metric───────┬──────────────────ts─┬─value─┬─moving_avg_10_days_temp─┐
|
|
│ ambient_temp │ 2020-01-01 00:00:00 │ 16 │ 16 │
|
|
│ ambient_temp │ 2020-01-01 12:00:00 │ 16 │ 16 │
|
|
│ ambient_temp │ 2020-01-02 11:00:00 │ 9 │ 12.5 │
|
|
│ ambient_temp │ 2020-01-02 12:00:00 │ 9 │ 12.5 │
|
|
│ ambient_temp │ 2020-02-01 10:00:00 │ 10 │ 10 │
|
|
│ ambient_temp │ 2020-02-01 12:00:00 │ 10 │ 10 │
|
|
│ ambient_temp │ 2020-02-10 12:00:00 │ 12 │ 11 │
|
|
│ ambient_temp │ 2020-02-10 13:00:00 │ 12 │ 11 │
|
|
│ ambient_temp │ 2020-02-20 12:00:01 │ 16 │ 13.33 │
|
|
│ ambient_temp │ 2020-03-01 12:00:00 │ 16 │ 16 │
|
|
│ ambient_temp │ 2020-03-01 12:00:00 │ 16 │ 16 │
|
|
│ ambient_temp │ 2020-03-01 12:00:00 │ 16 │ 16 │
|
|
└──────────────┴─────────────────────┴───────┴─────────────────────────┘
|
|
```
|
|
|
|
## References
|
|
|
|
### GitHub Issues
|
|
|
|
The roadmap for the initial support of window functions is [in this issue](https://github.com/ClickHouse/ClickHouse/issues/18097).
|
|
|
|
All GitHub issues related to window functions have the [comp-window-functions](https://github.com/ClickHouse/ClickHouse/labels/comp-window-functions) tag.
|
|
|
|
### Tests
|
|
|
|
These tests contain the examples of the currently supported grammar:
|
|
|
|
https://github.com/ClickHouse/ClickHouse/blob/master/tests/performance/window_functions.xml
|
|
|
|
https://github.com/ClickHouse/ClickHouse/blob/master/tests/queries/0_stateless/01591_window_functions.sql
|
|
|
|
### Postgres Docs
|
|
|
|
https://www.postgresql.org/docs/current/sql-select.html#SQL-WINDOW
|
|
|
|
https://www.postgresql.org/docs/devel/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS
|
|
|
|
https://www.postgresql.org/docs/devel/functions-window.html
|
|
|
|
https://www.postgresql.org/docs/devel/tutorial-window.html
|
|
|
|
### MySQL Docs
|
|
|
|
https://dev.mysql.com/doc/refman/8.0/en/window-function-descriptions.html
|
|
|
|
https://dev.mysql.com/doc/refman/8.0/en/window-functions-usage.html
|
|
|
|
https://dev.mysql.com/doc/refman/8.0/en/window-functions-frames.html
|
|
|
|
|
|
## Related Content
|
|
|
|
- Blog: [Working with time series data in ClickHouse](https://clickhouse.com/blog/working-with-time-series-data-and-functions-ClickHouse)
|
|
- Blog: [Window and array functions for Git commit sequences](https://clickhouse.com/blog/clickhouse-window-array-functions-git-commits)
|
|
- Blog: [Getting Data Into ClickHouse - Part 3 - Using S3](https://clickhouse.com/blog/getting-data-into-clickhouse-part-3-s3)
|