ClickHouse/docs/en/sql-reference/statements/select/distinct.md
2022-08-28 10:53:34 -04:00

111 lines
2.9 KiB
Markdown

---
slug: /en/sql-reference/statements/select/distinct
sidebar_label: DISTINCT
---
# DISTINCT Clause
If `SELECT DISTINCT` is specified, only unique rows will remain in a query result. Thus only a single row will remain out of all the sets of fully matching rows in the result.
You can specify the list of columns that must have unique values: `SELECT DISTINCT ON (column1, column2,...)`. If the columns are not specified, all of them are taken into consideration.
Consider the table:
```text
┌─a─┬─b─┬─c─┐
│ 1 │ 1 │ 1 │
│ 1 │ 1 │ 1 │
│ 2 │ 2 │ 2 │
│ 2 │ 2 │ 2 │
│ 1 │ 1 │ 2 │
│ 1 │ 2 │ 2 │
└───┴───┴───┘
```
Using `DISTINCT` without specifying columns:
```sql
SELECT DISTINCT * FROM t1;
```
```text
┌─a─┬─b─┬─c─┐
│ 1 │ 1 │ 1 │
│ 2 │ 2 │ 2 │
│ 1 │ 1 │ 2 │
│ 1 │ 2 │ 2 │
└───┴───┴───┘
```
Using `DISTINCT` with specified columns:
```sql
SELECT DISTINCT ON (a,b) * FROM t1;
```
```text
┌─a─┬─b─┬─c─┐
│ 1 │ 1 │ 1 │
│ 2 │ 2 │ 2 │
│ 1 │ 2 │ 2 │
└───┴───┴───┘
```
## DISTINCT and ORDER BY
ClickHouse supports using the `DISTINCT` and `ORDER BY` clauses for different columns in one query. The `DISTINCT` clause is executed before the `ORDER BY` clause.
Consider the table:
``` text
┌─a─┬─b─┐
│ 2 │ 1 │
│ 1 │ 2 │
│ 3 │ 3 │
│ 2 │ 4 │
└───┴───┘
```
Selecting data:
```sql
SELECT DISTINCT a FROM t1 ORDER BY b ASC;
```
``` text
┌─a─┐
│ 2 │
│ 1 │
│ 3 │
└───┘
```
Selecting data with the different sorting direction:
```sql
SELECT DISTINCT a FROM t1 ORDER BY b DESC;
```
``` text
┌─a─┐
│ 3 │
│ 1 │
│ 2 │
└───┘
```
Row `2, 4` was cut before sorting.
Take this implementation specificity into account when programming queries.
## Null Processing
`DISTINCT` works with [NULL](../../../sql-reference/syntax.md#null-literal) as if `NULL` were a specific value, and `NULL==NULL`. In other words, in the `DISTINCT` results, different combinations with `NULL` occur only once. It differs from `NULL` processing in most other contexts.
## Alternatives
It is possible to obtain the same result by applying [GROUP BY](../../../sql-reference/statements/select/group-by.md) across the same set of values as specified as `SELECT` clause, without using any aggregate functions. But there are few differences from `GROUP BY` approach:
- `DISTINCT` can be applied together with `GROUP BY`.
- When [ORDER BY](../../../sql-reference/statements/select/order-by.md) is omitted and [LIMIT](../../../sql-reference/statements/select/limit.md) is defined, the query stops running immediately after the required number of different rows has been read.
- Data blocks are output as they are processed, without waiting for the entire query to finish running.