Mirror of https://github.com/ClickHouse/ClickHouse.git, synced 2024-11-27 01:51:59 +00:00
add window view doc(en)
parent 5fc20b3e6a
commit 03dc76da15
114
docs/en/sql-reference/functions/window-functions.md
Normal file
@@ -0,0 +1,114 @@
---
toc_priority: 68
toc_title: Window
---

# Window Functions {#window-functions}

Window functions indicate the lower and upper window bounds of records in WindowView. The functions for working with WindowView are listed below.

## tumble {#window-functions-tumble}

A tumbling time window assigns records to non-overlapping, contiguous windows with a fixed duration (`interval`).

``` sql
tumble(time_attr, interval [, timezone])
```

**Arguments**

- `time_attr` - Date and time. [DateTime](../../sql-reference/data-types/datetime.md) data type.
- `interval` - Window interval in [Interval](../../sql-reference/data-types/special-data-types/interval.md) data type.
- `timezone` - [Timezone name](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-timezone) (optional).

**Returned values**

- The lower and upper bounds of the tumble window.

Type: `Tuple(DateTime, DateTime)`

**Example**

Query:

``` sql
SELECT tumble(now(), toIntervalDay('1'))
```

Result:

``` text
┌─tumble(now(), toIntervalDay('1'))─────────────┐
│ ('2020-01-01 00:00:00','2020-01-02 00:00:00') │
└───────────────────────────────────────────────┘
```

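
The bounds of a tumbling window can be reproduced with simple arithmetic. The Python sketch below is only a model of the behaviour described above, not ClickHouse code. It assumes that windows are aligned to the Unix epoch in UTC and are half-open (`start <= ts < end`); the helper name `tumble_bounds` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def tumble_bounds(ts: datetime, interval: timedelta) -> tuple[datetime, datetime]:
    """Return (start, end) of the tumbling window that contains ts.

    Assumption of this sketch: windows are aligned to the Unix epoch (UTC)
    and are half-open, i.e. start <= ts < end.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    # Number of whole intervals that have elapsed since the epoch.
    n = (ts - EPOCH) // interval
    start = EPOCH + n * interval
    return start, start + interval


ts = datetime(2020, 1, 1, 12, 34, 56, tzinfo=timezone.utc)
start, end = tumble_bounds(ts, timedelta(days=1))
print(start, end)
```

In this model, the first element of the returned pair plays the role of `tumbleStart` and the second the role of `tumbleEnd`.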
## hop {#window-functions-hop}

A hopping time window has a fixed duration (`window_interval`) and hops by a specified hop interval (`hop_interval`). If `hop_interval` is smaller than `window_interval`, hopping windows overlap, so a record can be assigned to multiple windows.

``` sql
hop(time_attr, hop_interval, window_interval [, timezone])
```

**Arguments**

- `time_attr` - Date and time. [DateTime](../../sql-reference/data-types/datetime.md) data type.
- `hop_interval` - Hop interval in [Interval](../../sql-reference/data-types/special-data-types/interval.md) data type. Should be a positive number.
- `window_interval` - Window interval in [Interval](../../sql-reference/data-types/special-data-types/interval.md) data type. Should be a positive number.
- `timezone` - [Timezone name](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-timezone) (optional).

**Returned values**

- The lower and upper bounds of the hop window. Because hop windows overlap, the function returns only the bounds of the **first** window when `hop` is used **without** `WINDOW VIEW`.

Type: `Tuple(DateTime, DateTime)`

**Example**

Query:

``` sql
SELECT hop(now(), INTERVAL '1' SECOND, INTERVAL '2' SECOND)
```

Result:

``` text
┌─hop(now(), toIntervalSecond('1'), toIntervalSecond('2'))──┐
│ ('2020-01-14 16:58:22','2020-01-14 16:58:24')             │
└───────────────────────────────────────────────────────────┘
```

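
To see why one record can belong to several hopping windows, the following Python sketch lists every window that covers a timestamp. It is a model under stated assumptions, not the ClickHouse implementation: window starts are taken to be multiples of `hop_interval` since the Unix epoch (UTC), windows are half-open, and the "first" window is interpreted as the earliest one that contains the timestamp. The helper name `hop_windows` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def hop_windows(ts: datetime, hop: timedelta, window: timedelta):
    """List every (start, end) hopping window that contains ts, earliest first.

    Assumptions of this sketch: window starts are multiples of `hop` since
    the Unix epoch (UTC), and windows are half-open: start <= ts < end.
    """
    if hop <= timedelta(0) or window <= timedelta(0):
        raise ValueError("hop and window intervals must be positive")
    # Latest window start that is not after ts.
    start = EPOCH + ((ts - EPOCH) // hop) * hop
    windows = []
    # Walk backwards one hop at a time while the window still covers ts.
    while start + window > ts:
        windows.append((start, start + window))
        start -= hop
    return list(reversed(windows))


ts = datetime(2020, 1, 14, 16, 58, 23, tzinfo=timezone.utc)
for w_start, w_end in hop_windows(ts, timedelta(seconds=1), timedelta(seconds=2)):
    print(w_start.time(), w_end.time())
```

With a 1-second hop and a 2-second window, the timestamp falls into two overlapping windows. If `hop_interval` is larger than `window_interval`, the list can be empty because the windows leave gaps.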
## tumbleStart {#window-functions-tumblestart}

Returns the lower bound of the corresponding tumble window.

``` sql
tumbleStart(time_attr, interval [, timezone]);
```

## tumbleEnd {#window-functions-tumbleend}

Returns the upper bound of the corresponding tumble window.

``` sql
tumbleEnd(time_attr, interval [, timezone]);
```

## hopStart {#window-functions-hopstart}

Returns the lower bound of the corresponding hop window.

``` sql
hopStart(time_attr, hop_interval, window_interval [, timezone]);
```

## hopEnd {#window-functions-hopend}

Returns the upper bound of the corresponding hop window.

``` sql
hopEnd(time_attr, hop_interval, window_interval [, timezone]);
```
@@ -5,7 +5,7 @@ toc_title: VIEW

# CREATE VIEW {#create-view}

- Creates a new view. Views can be [normal](#normal), [materialized](#materialized) and [live](#live-view) (the latter is an experimental feature).
+ Creates a new view. Views can be [normal](#normal), [materialized](#materialized), [live](#live-view), and [window](#window-view) (live view and window view are experimental features).

## Normal View {#normal}

@@ -243,3 +243,119 @@ Most common uses of live view tables include:

**See Also**
- [ALTER LIVE VIEW](../alter/view.md#alter-live-view)

## Window View [Experimental] {#window-view}

!!! important "Important"
    This is an experimental feature that may change in backwards-incompatible ways in future releases.
    Enable window views and the `WATCH` query with the [allow_experimental_window_view](../../../operations/settings/settings.md#allow-experimental-window-view) setting by running `set allow_experimental_window_view = 1`.

``` sql
CREATE WINDOW VIEW [IF NOT EXISTS] [db.]table_name [TO [db.]table_name] [ENGINE = engine] [WATERMARK = strategy] [ALLOWED_LATENESS = interval_function] AS SELECT ... GROUP BY window_function
```

A window view aggregates data by time window and outputs the results when a window is ready to fire. It stores partial aggregation results in an inner (or specified) table and can either push the results to a specified table or push notifications through the `WATCH` query.

Creating a window view is similar to creating a `MATERIALIZED VIEW`. A window view needs an inner storage engine to store intermediate data. By default, the inner storage uses the `AggregatingMergeTree` engine.

### Window Functions {#window-view-windowfunctions}

[Window functions](../../functions/window-functions.md) indicate the lower and upper window bounds of records. A window view must be used with a window function.

### Time Attributes {#window-view-timeattributes}

Window view supports **processing time** and **event time** processing.

**Processing time** allows a window view to produce results based on the local machine's time and is used by default. It is the most straightforward notion of time but does not provide determinism. The processing time attribute can be defined by setting the `time_attr` of the window function to a table column or by using the function `now()`. The following query creates a window view with processing time.

``` sql
CREATE WINDOW VIEW wv AS SELECT count(number), tumbleStart(w_id) AS w_start FROM date GROUP BY tumble(now(), INTERVAL '5' SECOND) AS w_id
```

**Event time** is the time at which each individual event occurred on its producing device. This time is typically embedded within the records when they are generated. Event time processing allows for consistent results even in the case of out-of-order or late events. Window view supports event time processing through the `WATERMARK` syntax.

Window view provides three watermark strategies:

* `STRICTLY_ASCENDING`: Emits a watermark of the maximum observed timestamp so far. Rows that have a timestamp smaller than the maximum timestamp are not late.
* `ASCENDING`: Emits a watermark of the maximum observed timestamp so far minus 1. Rows that have a timestamp equal to or smaller than the maximum timestamp are not late.
* `BOUNDED` (`WATERMARK=INTERVAL`): Emits a watermark of the maximum observed timestamp minus the specified delay.

The following queries are examples of creating a window view with `WATERMARK`.

``` sql
CREATE WINDOW VIEW wv WATERMARK=STRICTLY_ASCENDING AS SELECT count(number) FROM date GROUP BY tumble(timestamp, INTERVAL '5' SECOND);
CREATE WINDOW VIEW wv WATERMARK=ASCENDING AS SELECT count(number) FROM date GROUP BY tumble(timestamp, INTERVAL '5' SECOND);
CREATE WINDOW VIEW wv WATERMARK=INTERVAL '3' SECOND AS SELECT count(number) FROM date GROUP BY tumble(timestamp, INTERVAL '5' SECOND);
```

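
The three watermark strategies differ only in the offset subtracted from the maximum observed timestamp. The Python sketch below models that arithmetic for a stream of rows. It is an illustration, not ClickHouse code: timestamps are assumed to be integer seconds, "minus 1" is read as one second, and the function name `watermarks` is hypothetical.

```python
def watermarks(timestamps, strategy, delay=0):
    """Yield the watermark after each observed timestamp.

    Assumptions of this sketch: timestamps are integer seconds, and `delay`
    is only used by the BOUNDED strategy.
    """
    offsets = {
        "STRICTLY_ASCENDING": 0,  # watermark = max observed timestamp
        "ASCENDING": 1,           # watermark = max observed timestamp - 1
        "BOUNDED": delay,         # watermark = max observed timestamp - delay
    }
    if strategy not in offsets:
        raise ValueError(f"unknown strategy: {strategy}")
    max_seen = None
    for ts in timestamps:
        # The maximum never decreases, so out-of-order rows cannot move
        # the watermark backwards.
        max_seen = ts if max_seen is None else max(max_seen, ts)
        yield max_seen - offsets[strategy]


observed = [100, 103, 101, 107]  # 101 arrives out of order
print(list(watermarks(observed, "STRICTLY_ASCENDING")))  # [100, 103, 103, 107]
print(list(watermarks(observed, "ASCENDING")))           # [99, 102, 102, 106]
print(list(watermarks(observed, "BOUNDED", delay=3)))    # [97, 100, 100, 104]
```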
By default, a window fires when the watermark arrives, and elements that arrive behind the watermark are dropped. Window view supports late event processing through the `ALLOWED_LATENESS=INTERVAL` setting. An example of lateness handling is:

``` sql
CREATE WINDOW VIEW test.wv TO test.dst WATERMARK=ASCENDING ALLOWED_LATENESS=INTERVAL '2' SECOND AS SELECT count(a) AS count, tumbleEnd(wid) AS w_end FROM test.mt GROUP BY tumble(timestamp, INTERVAL '5' SECOND) AS wid;
```

Note that elements emitted by a late firing should be treated as updated results of a previous computation. Instead of firing at the end of the window, the window view fires immediately when the late event arrives, which results in multiple outputs for the same window. Users need to take these duplicated results into account or deduplicate them.

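
One way a consumer might deduplicate late firings is to keep only the most recent result per window. The Python sketch below illustrates the idea on the client side. It assumes rows are read in emission order as `(w_end, count)` pairs and that a late firing carries the full updated count for its window; both the data and the helper name `latest_per_window` are hypothetical.

```python
def latest_per_window(rows):
    """Keep only the most recently emitted result for each window.

    `rows` is an iterable of (w_end, count) pairs in emission order; a later
    pair with the same w_end is treated as an update of the earlier one.
    """
    latest = {}
    for w_end, count in rows:
        latest[w_end] = count  # later firings overwrite earlier ones
    return sorted(latest.items())


emitted = [
    ("2020-01-14 16:56:45", 3),  # regular firing
    ("2020-01-14 16:56:50", 2),  # regular firing
    ("2020-01-14 16:56:45", 4),  # late firing updates the first window
]
print(latest_per_window(emitted))
```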
### Monitoring New Windows {#window-view-monitoring}

A window view can continuously print processing results to the console through the `WATCH` query, or write them to a table with the `TO` syntax.

``` sql
WATCH [db.]name [LIMIT n]
```

The `WATCH` query behaves similarly to `WATCH` in `LIVE VIEW`. A `LIMIT` can be specified to set the number of updates to receive before terminating the query.

### Settings {#window-view-settings}

- `window_view_clean_interval`: The interval, in seconds, at which the window view frees outdated data. The system retains the windows that have not been fully triggered according to the system time or the `WATERMARK` configuration and deletes the other data.
- `window_view_heartbeat_interval`: The heartbeat interval, in seconds, used to indicate that the `WATCH` query is alive.

### Example {#window-view-example}

Suppose we need to count the number of click logs per 10 seconds in a log table called `data`, which has the following structure:

``` sql
CREATE TABLE data ( `id` UInt64, `timestamp` DateTime) ENGINE = Memory;
```

First, we create a window view with a tumble window of 10 seconds:

``` sql
CREATE WINDOW VIEW wv AS SELECT count(id), tumbleStart(w_id) AS window_start FROM data GROUP BY tumble(timestamp, INTERVAL '10' SECOND) AS w_id
```

Then, we use the `WATCH` query to get the results.

``` sql
WATCH wv
```

When logs are inserted into the table `data`,

``` sql
INSERT INTO data VALUES(1,now())
```

the `WATCH` query should print results like the following:

``` text
┌─count(id)─┬────────window_start─┐
│         1 │ 2020-01-14 16:56:40 │
└───────────┴─────────────────────┘
```

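
The aggregation this window view performs can be modelled outside ClickHouse to check expectations. The Python sketch below counts hypothetical click timestamps per 10-second tumbling window. It assumes windows aligned to the Unix epoch in UTC, and the names `window_start` and `count_per_window` are illustrative only.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts: datetime, interval: timedelta) -> datetime:
    """Start of the tumbling window containing ts (epoch-aligned, UTC)."""
    return EPOCH + ((ts - EPOCH) // interval) * interval


def count_per_window(timestamps, interval):
    """Return sorted (window_start, count) pairs, like count(id) per window."""
    counts = Counter(window_start(ts, interval) for ts in timestamps)
    return sorted(counts.items())


# Hypothetical click timestamps: two in one window, one in the next.
clicks = [
    datetime(2020, 1, 14, 16, 56, 41, tzinfo=timezone.utc),
    datetime(2020, 1, 14, 16, 56, 47, tzinfo=timezone.utc),
    datetime(2020, 1, 14, 16, 56, 52, tzinfo=timezone.utc),
]
for start, count in count_per_window(clicks, timedelta(seconds=10)):
    print(count, start)
```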
Alternatively, we can attach the output to another table using `TO` syntax.

``` sql
CREATE WINDOW VIEW wv TO dst AS SELECT count(id), tumbleStart(w_id) AS window_start FROM data GROUP BY tumble(timestamp, INTERVAL '10' SECOND) AS w_id
```

Additional examples can be found among stateful tests of ClickHouse (they are named `*window_view*` there).

### Window View Usage {#window-view-usage}

The window view is useful in the following scenarios:

* **Monitoring**: Aggregate and calculate metrics from logs by time, and output the results to a target table. A dashboard can then use the target table as its source table.
* **Analyzing**: Automatically aggregate and preprocess data in the time window. This can be useful when analyzing a large number of logs. The preprocessing eliminates repeated calculations across multiple queries and reduces query latency.