mirror of
https://github.com/ClickHouse/ClickHouse.git
synced 2024-12-14 10:22:10 +00:00
608 lines
22 KiB
Markdown
608 lines
22 KiB
Markdown
---
|
||
slug: /en/sql-reference/statements/select/order-by
|
||
sidebar_label: ORDER BY
|
||
---
|
||
|
||
# ORDER BY Clause
|
||
|
||
The `ORDER BY` clause contains
|
||
|
||
- a list of expressions, e.g. `ORDER BY visits, search_phrase`,
|
||
- a list of numbers referring to columns in the `SELECT` clause, e.g. `ORDER BY 2, 1`, or
|
||
- `*` (without other expressions or numbers) which means all columns of the `SELECT` clause: `ORDER BY *`.
|
||
|
||
To disable sorting by column numbers, set setting [enable_positional_arguments](../../../operations/settings/settings.md#enable-positional-arguments) = 0.
|
||
|
||
The `ORDER BY` clause can be attributed by a `DESC` (descending) or `ASC` (ascending) modifier which determines the sorting direction.
|
||
Unless an explicit sort order is specified, `ASC` is used by default.
|
||
The sorting direction applies to a single expression, not to the entire list, e.g. `ORDER BY Visits DESC, SearchPhrase`.
|
||
Also, sorting is performed case-sensitively.
|
||
|
||
Rows with identical values for a sort expressions are returned in an arbitrary and non-deterministic order.
|
||
If the `ORDER BY` clause is omitted in a `SELECT` statement, the row order is also arbitrary and non-deterministic.
|
||
|
||
## Sorting of Special Values
|
||
|
||
There are two approaches to `NaN` and `NULL` sorting order:
|
||
|
||
- By default or with the `NULLS LAST` modifier: first the values, then `NaN`, then `NULL`.
|
||
- With the `NULLS FIRST` modifier: first `NULL`, then `NaN`, then other values.
|
||
|
||
### Example
|
||
|
||
For the table
|
||
|
||
``` text
|
||
┌─x─┬────y─┐
|
||
│ 1 │ ᴺᵁᴸᴸ │
|
||
│ 2 │ 2 │
|
||
│ 1 │ nan │
|
||
│ 2 │ 2 │
|
||
│ 3 │ 4 │
|
||
│ 5 │ 6 │
|
||
│ 6 │ nan │
|
||
│ 7 │ ᴺᵁᴸᴸ │
|
||
│ 6 │ 7 │
|
||
│ 8 │ 9 │
|
||
└───┴──────┘
|
||
```
|
||
|
||
Run the query `SELECT * FROM t_null_nan ORDER BY y NULLS FIRST` to get:
|
||
|
||
``` text
|
||
┌─x─┬────y─┐
|
||
│ 1 │ ᴺᵁᴸᴸ │
|
||
│ 7 │ ᴺᵁᴸᴸ │
|
||
│ 1 │ nan │
|
||
│ 6 │ nan │
|
||
│ 2 │ 2 │
|
||
│ 2 │ 2 │
|
||
│ 3 │ 4 │
|
||
│ 5 │ 6 │
|
||
│ 6 │ 7 │
|
||
│ 8 │ 9 │
|
||
└───┴──────┘
|
||
```
|
||
|
||
When floating point numbers are sorted, NaNs are separate from the other values. Regardless of the sorting order, NaNs come at the end. In other words, for ascending sorting they are placed as if they are larger than all the other numbers, while for descending sorting they are placed as if they are smaller than the rest.
|
||
|
||
## Collation Support
|
||
|
||
For sorting by [String](../../../sql-reference/data-types/string.md) values, you can specify collation (comparison). Example: `ORDER BY SearchPhrase COLLATE 'tr'` - for sorting by keyword in ascending order, using the Turkish alphabet, case insensitive, assuming that strings are UTF-8 encoded. `COLLATE` can be specified or not for each expression in ORDER BY independently. If `ASC` or `DESC` is specified, `COLLATE` is specified after it. When using `COLLATE`, sorting is always case-insensitive.
|
||
|
||
Collate is supported in [LowCardinality](../../../sql-reference/data-types/lowcardinality.md), [Nullable](../../../sql-reference/data-types/nullable.md), [Array](../../../sql-reference/data-types/array.md) and [Tuple](../../../sql-reference/data-types/tuple.md).
|
||
|
||
We only recommend using `COLLATE` for final sorting of a small number of rows, since sorting with `COLLATE` is less efficient than normal sorting by bytes.
|
||
|
||
## Collation Examples
|
||
|
||
Example only with [String](../../../sql-reference/data-types/string.md) values:
|
||
|
||
Input table:
|
||
|
||
``` text
|
||
┌─x─┬─s────┐
|
||
│ 1 │ bca │
|
||
│ 2 │ ABC │
|
||
│ 3 │ 123a │
|
||
│ 4 │ abc │
|
||
│ 5 │ BCA │
|
||
└───┴──────┘
|
||
```
|
||
|
||
Query:
|
||
|
||
```sql
|
||
SELECT * FROM collate_test ORDER BY s ASC COLLATE 'en';
|
||
```
|
||
|
||
Result:
|
||
|
||
``` text
|
||
┌─x─┬─s────┐
|
||
│ 3 │ 123a │
|
||
│ 4 │ abc │
|
||
│ 2 │ ABC │
|
||
│ 1 │ bca │
|
||
│ 5 │ BCA │
|
||
└───┴──────┘
|
||
```
|
||
|
||
Example with [Nullable](../../../sql-reference/data-types/nullable.md):
|
||
|
||
Input table:
|
||
|
||
``` text
|
||
┌─x─┬─s────┐
|
||
│ 1 │ bca │
|
||
│ 2 │ ᴺᵁᴸᴸ │
|
||
│ 3 │ ABC │
|
||
│ 4 │ 123a │
|
||
│ 5 │ abc │
|
||
│ 6 │ ᴺᵁᴸᴸ │
|
||
│ 7 │ BCA │
|
||
└───┴──────┘
|
||
```
|
||
|
||
Query:
|
||
|
||
```sql
|
||
SELECT * FROM collate_test ORDER BY s ASC COLLATE 'en';
|
||
```
|
||
|
||
Result:
|
||
|
||
``` text
|
||
┌─x─┬─s────┐
|
||
│ 4 │ 123a │
|
||
│ 5 │ abc │
|
||
│ 3 │ ABC │
|
||
│ 1 │ bca │
|
||
│ 7 │ BCA │
|
||
│ 6 │ ᴺᵁᴸᴸ │
|
||
│ 2 │ ᴺᵁᴸᴸ │
|
||
└───┴──────┘
|
||
```
|
||
|
||
Example with [Array](../../../sql-reference/data-types/array.md):
|
||
|
||
Input table:
|
||
|
||
``` text
|
||
┌─x─┬─s─────────────┐
|
||
│ 1 │ ['Z'] │
|
||
│ 2 │ ['z'] │
|
||
│ 3 │ ['a'] │
|
||
│ 4 │ ['A'] │
|
||
│ 5 │ ['z','a'] │
|
||
│ 6 │ ['z','a','a'] │
|
||
│ 7 │ [''] │
|
||
└───┴───────────────┘
|
||
```
|
||
|
||
Query:
|
||
|
||
```sql
|
||
SELECT * FROM collate_test ORDER BY s ASC COLLATE 'en';
|
||
```
|
||
|
||
Result:
|
||
|
||
``` text
|
||
┌─x─┬─s─────────────┐
|
||
│ 7 │ [''] │
|
||
│ 3 │ ['a'] │
|
||
│ 4 │ ['A'] │
|
||
│ 2 │ ['z'] │
|
||
│ 5 │ ['z','a'] │
|
||
│ 6 │ ['z','a','a'] │
|
||
│ 1 │ ['Z'] │
|
||
└───┴───────────────┘
|
||
```
|
||
|
||
Example with [LowCardinality](../../../sql-reference/data-types/lowcardinality.md) string:
|
||
|
||
Input table:
|
||
|
||
```text
|
||
┌─x─┬─s───┐
|
||
│ 1 │ Z │
|
||
│ 2 │ z │
|
||
│ 3 │ a │
|
||
│ 4 │ A │
|
||
│ 5 │ za │
|
||
│ 6 │ zaa │
|
||
│ 7 │ │
|
||
└───┴─────┘
|
||
```
|
||
|
||
Query:
|
||
|
||
```sql
|
||
SELECT * FROM collate_test ORDER BY s ASC COLLATE 'en';
|
||
```
|
||
|
||
Result:
|
||
|
||
```text
|
||
┌─x─┬─s───┐
|
||
│ 7 │ │
|
||
│ 3 │ a │
|
||
│ 4 │ A │
|
||
│ 2 │ z │
|
||
│ 1 │ Z │
|
||
│ 5 │ za │
|
||
│ 6 │ zaa │
|
||
└───┴─────┘
|
||
```
|
||
|
||
Example with [Tuple](../../../sql-reference/data-types/tuple.md):
|
||
|
||
```text
|
||
┌─x─┬─s───────┐
|
||
│ 1 │ (1,'Z') │
|
||
│ 2 │ (1,'z') │
|
||
│ 3 │ (1,'a') │
|
||
│ 4 │ (2,'z') │
|
||
│ 5 │ (1,'A') │
|
||
│ 6 │ (2,'Z') │
|
||
│ 7 │ (2,'A') │
|
||
└───┴─────────┘
|
||
```
|
||
|
||
Query:
|
||
|
||
```sql
|
||
SELECT * FROM collate_test ORDER BY s ASC COLLATE 'en';
|
||
```
|
||
|
||
Result:
|
||
|
||
```text
|
||
┌─x─┬─s───────┐
|
||
│ 3 │ (1,'a') │
|
||
│ 5 │ (1,'A') │
|
||
│ 2 │ (1,'z') │
|
||
│ 1 │ (1,'Z') │
|
||
│ 7 │ (2,'A') │
|
||
│ 4 │ (2,'z') │
|
||
│ 6 │ (2,'Z') │
|
||
└───┴─────────┘
|
||
```
|
||
|
||
## Implementation Details
|
||
|
||
Less RAM is used if a small enough [LIMIT](../../../sql-reference/statements/select/limit.md) is specified in addition to `ORDER BY`. Otherwise, the amount of memory spent is proportional to the volume of data for sorting. For distributed query processing, if [GROUP BY](../../../sql-reference/statements/select/group-by.md) is omitted, sorting is partially done on remote servers, and the results are merged on the requestor server. This means that for distributed sorting, the volume of data to sort can be greater than the amount of memory on a single server.
|
||
|
||
If there is not enough RAM, it is possible to perform sorting in external memory (creating temporary files on a disk). Use the setting `max_bytes_before_external_sort` for this purpose. If it is set to 0 (the default), external sorting is disabled. If it is enabled, when the volume of data to sort reaches the specified number of bytes, the collected data is sorted and dumped into a temporary file. After all data is read, all the sorted files are merged and the results are output. Files are written to the `/var/lib/clickhouse/tmp/` directory in the config (by default, but you can use the `tmp_path` parameter to change this setting).
|
||
|
||
Running a query may use more memory than `max_bytes_before_external_sort`. For this reason, this setting must have a value significantly smaller than `max_memory_usage`. As an example, if your server has 128 GB of RAM and you need to run a single query, set `max_memory_usage` to 100 GB, and `max_bytes_before_external_sort` to 80 GB.
|
||
|
||
External sorting works much less effectively than sorting in RAM.
|
||
|
||
## Optimization of Data Reading
|
||
|
||
If `ORDER BY` expression has a prefix that coincides with the table sorting key, you can optimize the query by using the [optimize_read_in_order](../../../operations/settings/settings.md#optimize_read_in_order) setting.
|
||
|
||
When the `optimize_read_in_order` setting is enabled, the ClickHouse server uses the table index and reads the data in order of the `ORDER BY` key. This allows to avoid reading all data in case of specified [LIMIT](../../../sql-reference/statements/select/limit.md). So queries on big data with small limit are processed faster.
|
||
|
||
Optimization works with both `ASC` and `DESC` and does not work together with [GROUP BY](../../../sql-reference/statements/select/group-by.md) clause and [FINAL](../../../sql-reference/statements/select/from.md#select-from-final) modifier.
|
||
|
||
When the `optimize_read_in_order` setting is disabled, the ClickHouse server does not use the table index while processing `SELECT` queries.
|
||
|
||
Consider disabling `optimize_read_in_order` manually, when running queries that have `ORDER BY` clause, large `LIMIT` and [WHERE](../../../sql-reference/statements/select/where.md) condition that requires to read huge amount of records before queried data is found.
|
||
|
||
Optimization is supported in the following table engines:
|
||
|
||
- [MergeTree](../../../engines/table-engines/mergetree-family/mergetree.md) (including [materialized views](../../../sql-reference/statements/create/view.md#materialized-view)),
|
||
- [Merge](../../../engines/table-engines/special/merge.md),
|
||
- [Buffer](../../../engines/table-engines/special/buffer.md)
|
||
|
||
In `MaterializedView`-engine tables the optimization works with views like `SELECT ... FROM merge_tree_table ORDER BY pk`. But it is not supported in the queries like `SELECT ... FROM view ORDER BY pk` if the view query does not have the `ORDER BY` clause.
|
||
|
||
## ORDER BY Expr WITH FILL Modifier
|
||
|
||
This modifier also can be combined with [LIMIT … WITH TIES modifier](../../../sql-reference/statements/select/limit.md#limit-with-ties).
|
||
|
||
`WITH FILL` modifier can be set after `ORDER BY expr` with optional `FROM expr`, `TO expr` and `STEP expr` parameters.
|
||
All missed values of `expr` column will be filled sequentially and other columns will be filled as defaults.
|
||
|
||
To fill multiple columns, add `WITH FILL` modifier with optional parameters after each field name in `ORDER BY` section.
|
||
|
||
``` sql
|
||
ORDER BY expr [WITH FILL] [FROM const_expr] [TO const_expr] [STEP const_numeric_expr], ... exprN [WITH FILL] [FROM expr] [TO expr] [STEP numeric_expr]
|
||
[INTERPOLATE [(col [AS expr], ... colN [AS exprN])]]
|
||
```
|
||
|
||
`WITH FILL` can be applied for fields with Numeric (all kinds of float, decimal, int) or Date/DateTime types. When applied for `String` fields, missed values are filled with empty strings.
|
||
When `FROM const_expr` not defined sequence of filling use minimal `expr` field value from `ORDER BY`.
|
||
When `TO const_expr` not defined sequence of filling use maximum `expr` field value from `ORDER BY`.
|
||
When `STEP const_numeric_expr` defined then `const_numeric_expr` interprets `as is` for numeric types, as `days` for Date type, as `seconds` for DateTime type. It also supports [INTERVAL](https://clickhouse.com/docs/en/sql-reference/data-types/special-data-types/interval/) data type representing time and date intervals.
|
||
When `STEP const_numeric_expr` omitted then sequence of filling use `1.0` for numeric type, `1 day` for Date type and `1 second` for DateTime type.
|
||
`INTERPOLATE` can be applied to columns not participating in `ORDER BY WITH FILL`. Such columns are filled based on previous fields values by applying `expr`. If `expr` is not present will repeat previous value. Omitted list will result in including all allowed columns.
|
||
|
||
Example of a query without `WITH FILL`:
|
||
|
||
``` sql
|
||
SELECT n, source FROM (
|
||
SELECT toFloat32(number % 10) AS n, 'original' AS source
|
||
FROM numbers(10) WHERE number % 3 = 1
|
||
) ORDER BY n;
|
||
```
|
||
|
||
Result:
|
||
|
||
``` text
|
||
┌─n─┬─source───┐
|
||
│ 1 │ original │
|
||
│ 4 │ original │
|
||
│ 7 │ original │
|
||
└───┴──────────┘
|
||
```
|
||
|
||
Same query after applying `WITH FILL` modifier:
|
||
|
||
``` sql
|
||
SELECT n, source FROM (
|
||
SELECT toFloat32(number % 10) AS n, 'original' AS source
|
||
FROM numbers(10) WHERE number % 3 = 1
|
||
) ORDER BY n WITH FILL FROM 0 TO 5.51 STEP 0.5;
|
||
```
|
||
|
||
Result:
|
||
|
||
``` text
|
||
┌───n─┬─source───┐
|
||
│ 0 │ │
|
||
│ 0.5 │ │
|
||
│ 1 │ original │
|
||
│ 1.5 │ │
|
||
│ 2 │ │
|
||
│ 2.5 │ │
|
||
│ 3 │ │
|
||
│ 3.5 │ │
|
||
│ 4 │ original │
|
||
│ 4.5 │ │
|
||
│ 5 │ │
|
||
│ 5.5 │ │
|
||
│ 7 │ original │
|
||
└─────┴──────────┘
|
||
```
|
||
|
||
For the case with multiple fields `ORDER BY field2 WITH FILL, field1 WITH FILL` order of filling will follow the order of fields in the `ORDER BY` clause.
|
||
|
||
Example:
|
||
|
||
``` sql
|
||
SELECT
|
||
toDate((number * 10) * 86400) AS d1,
|
||
toDate(number * 86400) AS d2,
|
||
'original' AS source
|
||
FROM numbers(10)
|
||
WHERE (number % 3) = 1
|
||
ORDER BY
|
||
d2 WITH FILL,
|
||
d1 WITH FILL STEP 5;
|
||
```
|
||
|
||
Result:
|
||
|
||
``` text
|
||
┌───d1───────┬───d2───────┬─source───┐
|
||
│ 1970-01-11 │ 1970-01-02 │ original │
|
||
│ 1970-01-01 │ 1970-01-03 │ │
|
||
│ 1970-01-01 │ 1970-01-04 │ │
|
||
│ 1970-02-10 │ 1970-01-05 │ original │
|
||
│ 1970-01-01 │ 1970-01-06 │ │
|
||
│ 1970-01-01 │ 1970-01-07 │ │
|
||
│ 1970-03-12 │ 1970-01-08 │ original │
|
||
└────────────┴────────────┴──────────┘
|
||
```
|
||
|
||
Field `d1` does not fill in and use the default value cause we do not have repeated values for `d2` value, and the sequence for `d1` can’t be properly calculated.
|
||
|
||
The following query with the changed field in `ORDER BY`:
|
||
|
||
``` sql
|
||
SELECT
|
||
toDate((number * 10) * 86400) AS d1,
|
||
toDate(number * 86400) AS d2,
|
||
'original' AS source
|
||
FROM numbers(10)
|
||
WHERE (number % 3) = 1
|
||
ORDER BY
|
||
d1 WITH FILL STEP 5,
|
||
d2 WITH FILL;
|
||
```
|
||
|
||
Result:
|
||
|
||
``` text
|
||
┌───d1───────┬───d2───────┬─source───┐
|
||
│ 1970-01-11 │ 1970-01-02 │ original │
|
||
│ 1970-01-16 │ 1970-01-01 │ │
|
||
│ 1970-01-21 │ 1970-01-01 │ │
|
||
│ 1970-01-26 │ 1970-01-01 │ │
|
||
│ 1970-01-31 │ 1970-01-01 │ │
|
||
│ 1970-02-05 │ 1970-01-01 │ │
|
||
│ 1970-02-10 │ 1970-01-05 │ original │
|
||
│ 1970-02-15 │ 1970-01-01 │ │
|
||
│ 1970-02-20 │ 1970-01-01 │ │
|
||
│ 1970-02-25 │ 1970-01-01 │ │
|
||
│ 1970-03-02 │ 1970-01-01 │ │
|
||
│ 1970-03-07 │ 1970-01-01 │ │
|
||
│ 1970-03-12 │ 1970-01-08 │ original │
|
||
└────────────┴────────────┴──────────┘
|
||
```
|
||
|
||
The following query uses the `INTERVAL` data type of 1 day for each data filled on column `d1`:
|
||
|
||
``` sql
|
||
SELECT
|
||
toDate((number * 10) * 86400) AS d1,
|
||
toDate(number * 86400) AS d2,
|
||
'original' AS source
|
||
FROM numbers(10)
|
||
WHERE (number % 3) = 1
|
||
ORDER BY
|
||
d1 WITH FILL STEP INTERVAL 1 DAY,
|
||
d2 WITH FILL;
|
||
```
|
||
|
||
Result:
|
||
```
|
||
┌─────────d1─┬─────────d2─┬─source───┐
|
||
│ 1970-01-11 │ 1970-01-02 │ original │
|
||
│ 1970-01-12 │ 1970-01-01 │ │
|
||
│ 1970-01-13 │ 1970-01-01 │ │
|
||
│ 1970-01-14 │ 1970-01-01 │ │
|
||
│ 1970-01-15 │ 1970-01-01 │ │
|
||
│ 1970-01-16 │ 1970-01-01 │ │
|
||
│ 1970-01-17 │ 1970-01-01 │ │
|
||
│ 1970-01-18 │ 1970-01-01 │ │
|
||
│ 1970-01-19 │ 1970-01-01 │ │
|
||
│ 1970-01-20 │ 1970-01-01 │ │
|
||
│ 1970-01-21 │ 1970-01-01 │ │
|
||
│ 1970-01-22 │ 1970-01-01 │ │
|
||
│ 1970-01-23 │ 1970-01-01 │ │
|
||
│ 1970-01-24 │ 1970-01-01 │ │
|
||
│ 1970-01-25 │ 1970-01-01 │ │
|
||
│ 1970-01-26 │ 1970-01-01 │ │
|
||
│ 1970-01-27 │ 1970-01-01 │ │
|
||
│ 1970-01-28 │ 1970-01-01 │ │
|
||
│ 1970-01-29 │ 1970-01-01 │ │
|
||
│ 1970-01-30 │ 1970-01-01 │ │
|
||
│ 1970-01-31 │ 1970-01-01 │ │
|
||
│ 1970-02-01 │ 1970-01-01 │ │
|
||
│ 1970-02-02 │ 1970-01-01 │ │
|
||
│ 1970-02-03 │ 1970-01-01 │ │
|
||
│ 1970-02-04 │ 1970-01-01 │ │
|
||
│ 1970-02-05 │ 1970-01-01 │ │
|
||
│ 1970-02-06 │ 1970-01-01 │ │
|
||
│ 1970-02-07 │ 1970-01-01 │ │
|
||
│ 1970-02-08 │ 1970-01-01 │ │
|
||
│ 1970-02-09 │ 1970-01-01 │ │
|
||
│ 1970-02-10 │ 1970-01-05 │ original │
|
||
│ 1970-02-11 │ 1970-01-01 │ │
|
||
│ 1970-02-12 │ 1970-01-01 │ │
|
||
│ 1970-02-13 │ 1970-01-01 │ │
|
||
│ 1970-02-14 │ 1970-01-01 │ │
|
||
│ 1970-02-15 │ 1970-01-01 │ │
|
||
│ 1970-02-16 │ 1970-01-01 │ │
|
||
│ 1970-02-17 │ 1970-01-01 │ │
|
||
│ 1970-02-18 │ 1970-01-01 │ │
|
||
│ 1970-02-19 │ 1970-01-01 │ │
|
||
│ 1970-02-20 │ 1970-01-01 │ │
|
||
│ 1970-02-21 │ 1970-01-01 │ │
|
||
│ 1970-02-22 │ 1970-01-01 │ │
|
||
│ 1970-02-23 │ 1970-01-01 │ │
|
||
│ 1970-02-24 │ 1970-01-01 │ │
|
||
│ 1970-02-25 │ 1970-01-01 │ │
|
||
│ 1970-02-26 │ 1970-01-01 │ │
|
||
│ 1970-02-27 │ 1970-01-01 │ │
|
||
│ 1970-02-28 │ 1970-01-01 │ │
|
||
│ 1970-03-01 │ 1970-01-01 │ │
|
||
│ 1970-03-02 │ 1970-01-01 │ │
|
||
│ 1970-03-03 │ 1970-01-01 │ │
|
||
│ 1970-03-04 │ 1970-01-01 │ │
|
||
│ 1970-03-05 │ 1970-01-01 │ │
|
||
│ 1970-03-06 │ 1970-01-01 │ │
|
||
│ 1970-03-07 │ 1970-01-01 │ │
|
||
│ 1970-03-08 │ 1970-01-01 │ │
|
||
│ 1970-03-09 │ 1970-01-01 │ │
|
||
│ 1970-03-10 │ 1970-01-01 │ │
|
||
│ 1970-03-11 │ 1970-01-01 │ │
|
||
│ 1970-03-12 │ 1970-01-08 │ original │
|
||
└────────────┴────────────┴──────────┘
|
||
```
|
||
|
||
Example of a query without `INTERPOLATE`:
|
||
|
||
``` sql
|
||
SELECT n, source, inter FROM (
|
||
SELECT toFloat32(number % 10) AS n, 'original' AS source, number as inter
|
||
FROM numbers(10) WHERE number % 3 = 1
|
||
) ORDER BY n WITH FILL FROM 0 TO 5.51 STEP 0.5;
|
||
```
|
||
|
||
Result:
|
||
|
||
``` text
|
||
┌───n─┬─source───┬─inter─┐
|
||
│ 0 │ │ 0 │
|
||
│ 0.5 │ │ 0 │
|
||
│ 1 │ original │ 1 │
|
||
│ 1.5 │ │ 0 │
|
||
│ 2 │ │ 0 │
|
||
│ 2.5 │ │ 0 │
|
||
│ 3 │ │ 0 │
|
||
│ 3.5 │ │ 0 │
|
||
│ 4 │ original │ 4 │
|
||
│ 4.5 │ │ 0 │
|
||
│ 5 │ │ 0 │
|
||
│ 5.5 │ │ 0 │
|
||
│ 7 │ original │ 7 │
|
||
└─────┴──────────┴───────┘
|
||
```
|
||
|
||
Same query after applying `INTERPOLATE`:
|
||
|
||
``` sql
|
||
SELECT n, source, inter FROM (
|
||
SELECT toFloat32(number % 10) AS n, 'original' AS source, number as inter
|
||
FROM numbers(10) WHERE number % 3 = 1
|
||
) ORDER BY n WITH FILL FROM 0 TO 5.51 STEP 0.5 INTERPOLATE (inter AS inter + 1);
|
||
```
|
||
|
||
Result:
|
||
|
||
``` text
|
||
┌───n─┬─source───┬─inter─┐
|
||
│ 0 │ │ 0 │
|
||
│ 0.5 │ │ 0 │
|
||
│ 1 │ original │ 1 │
|
||
│ 1.5 │ │ 2 │
|
||
│ 2 │ │ 3 │
|
||
│ 2.5 │ │ 4 │
|
||
│ 3 │ │ 5 │
|
||
│ 3.5 │ │ 6 │
|
||
│ 4 │ original │ 4 │
|
||
│ 4.5 │ │ 5 │
|
||
│ 5 │ │ 6 │
|
||
│ 5.5 │ │ 7 │
|
||
│ 7 │ original │ 7 │
|
||
└─────┴──────────┴───────┘
|
||
```
|
||
|
||
## Filling grouped by sorting prefix
|
||
|
||
It can be useful to fill rows which have the same values in particular columns independently, - a good example is filling missing values in time series.
|
||
Assume there is the following time series table:
|
||
``` sql
|
||
CREATE TABLE timeseries
|
||
(
|
||
`sensor_id` UInt64,
|
||
`timestamp` DateTime64(3, 'UTC'),
|
||
`value` Float64
|
||
)
|
||
ENGINE = Memory;
|
||
|
||
SELECT * FROM timeseries;
|
||
|
||
┌─sensor_id─┬───────────────timestamp─┬─value─┐
|
||
│ 234 │ 2021-12-01 00:00:03.000 │ 3 │
|
||
│ 432 │ 2021-12-01 00:00:01.000 │ 1 │
|
||
│ 234 │ 2021-12-01 00:00:07.000 │ 7 │
|
||
│ 432 │ 2021-12-01 00:00:05.000 │ 5 │
|
||
└───────────┴─────────────────────────┴───────┘
|
||
```
|
||
And we'd like to fill missing values for each sensor independently with 1 second interval.
|
||
The way to achieve it is to use `sensor_id` column as sorting prefix for filling column `timestamp`:
|
||
```
|
||
SELECT *
|
||
FROM timeseries
|
||
ORDER BY
|
||
sensor_id,
|
||
timestamp WITH FILL
|
||
INTERPOLATE ( value AS 9999 )
|
||
|
||
┌─sensor_id─┬───────────────timestamp─┬─value─┐
|
||
│ 234 │ 2021-12-01 00:00:03.000 │ 3 │
|
||
│ 234 │ 2021-12-01 00:00:04.000 │ 9999 │
|
||
│ 234 │ 2021-12-01 00:00:05.000 │ 9999 │
|
||
│ 234 │ 2021-12-01 00:00:06.000 │ 9999 │
|
||
│ 234 │ 2021-12-01 00:00:07.000 │ 7 │
|
||
│ 432 │ 2021-12-01 00:00:01.000 │ 1 │
|
||
│ 432 │ 2021-12-01 00:00:02.000 │ 9999 │
|
||
│ 432 │ 2021-12-01 00:00:03.000 │ 9999 │
|
||
│ 432 │ 2021-12-01 00:00:04.000 │ 9999 │
|
||
│ 432 │ 2021-12-01 00:00:05.000 │ 5 │
|
||
└───────────┴─────────────────────────┴───────┘
|
||
```
|
||
Here, the `value` column was interpolated with `9999` just to make filled rows more noticeable.
|
||
This behavior is controlled by setting `use_with_fill_by_sorting_prefix` (enabled by default)
|
||
|
||
## Related content
|
||
|
||
- Blog: [Working with time series data in ClickHouse](https://clickhouse.com/blog/working-with-time-series-data-and-functions-ClickHouse)
|