mirror of
https://github.com/ClickHouse/ClickHouse.git
synced 2024-12-15 19:02:04 +00:00
313 lines
11 KiB
Markdown
313 lines
11 KiB
Markdown
---
|
||
slug: /en/sql-reference/statements/select/array-join
|
||
sidebar_label: ARRAY JOIN
|
||
---
|
||
|
||
# ARRAY JOIN Clause
|
||
|
||
It is a common operation for tables that contain an array column to produce a new table that has a column with each individual array element of that initial column, while values of other columns are duplicated. This is the basic case of what `ARRAY JOIN` clause does.
|
||
|
||
Its name comes from the fact that it can be looked at as executing `JOIN` with an array or nested data structure. The intent is similar to the [arrayJoin](../../../sql-reference/functions/array-join.md#functions_arrayjoin) function, but the clause functionality is broader.
|
||
|
||
Syntax:
|
||
|
||
```sql
|
||
SELECT <expr_list>
|
||
FROM <left_subquery>
|
||
[LEFT] ARRAY JOIN <array>
|
||
[WHERE|PREWHERE <expr>]
|
||
...
|
||
```
|
||
|
||
You can specify only one `ARRAY JOIN` clause in a `SELECT` query.
|
||
|
||
Supported types of `ARRAY JOIN` are listed below:
|
||
|
||
- `ARRAY JOIN` - In base case, empty arrays are not included in the result of `JOIN`.
|
||
- `LEFT ARRAY JOIN` - The result of `JOIN` contains rows with empty arrays. The value for an empty array is set to the default value for the array element type (usually 0, empty string or NULL).
|
||
|
||
## Basic ARRAY JOIN Examples
|
||
|
||
The examples below demonstrate the usage of the `ARRAY JOIN` and `LEFT ARRAY JOIN` clauses. Let’s create a table with an [Array](../../../sql-reference/data-types/array.md) type column and insert values into it:
|
||
|
||
```sql
|
||
CREATE TABLE arrays_test
|
||
(
|
||
s String,
|
||
arr Array(UInt8)
|
||
) ENGINE = Memory;
|
||
|
||
INSERT INTO arrays_test
|
||
VALUES ('Hello', [1,2]), ('World', [3,4,5]), ('Goodbye', []);
|
||
```
|
||
|
||
```response
|
||
┌─s───────────┬─arr─────┐
|
||
│ Hello │ [1,2] │
|
||
│ World │ [3,4,5] │
|
||
│ Goodbye │ [] │
|
||
└─────────────┴─────────┘
|
||
```
|
||
|
||
The example below uses the `ARRAY JOIN` clause:
|
||
|
||
```sql
|
||
SELECT s, arr
|
||
FROM arrays_test
|
||
ARRAY JOIN arr;
|
||
```
|
||
|
||
```response
|
||
┌─s─────┬─arr─┐
|
||
│ Hello │ 1 │
|
||
│ Hello │ 2 │
|
||
│ World │ 3 │
|
||
│ World │ 4 │
|
||
│ World │ 5 │
|
||
└───────┴─────┘
|
||
```
|
||
|
||
The next example uses the `LEFT ARRAY JOIN` clause:
|
||
|
||
```sql
|
||
SELECT s, arr
|
||
FROM arrays_test
|
||
LEFT ARRAY JOIN arr;
|
||
```
|
||
|
||
```response
|
||
┌─s───────────┬─arr─┐
|
||
│ Hello │ 1 │
|
||
│ Hello │ 2 │
|
||
│ World │ 3 │
|
||
│ World │ 4 │
|
||
│ World │ 5 │
|
||
│ Goodbye │ 0 │
|
||
└─────────────┴─────┘
|
||
```
|
||
|
||
## Using Aliases
|
||
|
||
An alias can be specified for an array in the `ARRAY JOIN` clause. In this case, an array item can be accessed by this alias, but the array itself is accessed by the original name. Example:
|
||
|
||
```sql
|
||
SELECT s, arr, a
|
||
FROM arrays_test
|
||
ARRAY JOIN arr AS a;
|
||
```
|
||
|
||
```response
|
||
┌─s─────┬─arr─────┬─a─┐
|
||
│ Hello │ [1,2] │ 1 │
|
||
│ Hello │ [1,2] │ 2 │
|
||
│ World │ [3,4,5] │ 3 │
|
||
│ World │ [3,4,5] │ 4 │
|
||
│ World │ [3,4,5] │ 5 │
|
||
└───────┴─────────┴───┘
|
||
```
|
||
|
||
Using aliases, you can perform `ARRAY JOIN` with an external array. For example:
|
||
|
||
```sql
|
||
SELECT s, arr_external
|
||
FROM arrays_test
|
||
ARRAY JOIN [1, 2, 3] AS arr_external;
|
||
```
|
||
|
||
```response
|
||
┌─s───────────┬─arr_external─┐
|
||
│ Hello │ 1 │
|
||
│ Hello │ 2 │
|
||
│ Hello │ 3 │
|
||
│ World │ 1 │
|
||
│ World │ 2 │
|
||
│ World │ 3 │
|
||
│ Goodbye │ 1 │
|
||
│ Goodbye │ 2 │
|
||
│ Goodbye │ 3 │
|
||
└─────────────┴──────────────┘
|
||
```
|
||
|
||
Multiple arrays can be comma-separated in the `ARRAY JOIN` clause. In this case, `JOIN` is performed with them simultaneously (the direct sum, not the cartesian product). Note that all the arrays must have the same size by default. Example:
|
||
|
||
```sql
|
||
SELECT s, arr, a, num, mapped
|
||
FROM arrays_test
|
||
ARRAY JOIN arr AS a, arrayEnumerate(arr) AS num, arrayMap(x -> x + 1, arr) AS mapped;
|
||
```
|
||
|
||
```response
|
||
┌─s─────┬─arr─────┬─a─┬─num─┬─mapped─┐
|
||
│ Hello │ [1,2] │ 1 │ 1 │ 2 │
|
||
│ Hello │ [1,2] │ 2 │ 2 │ 3 │
|
||
│ World │ [3,4,5] │ 3 │ 1 │ 4 │
|
||
│ World │ [3,4,5] │ 4 │ 2 │ 5 │
|
||
│ World │ [3,4,5] │ 5 │ 3 │ 6 │
|
||
└───────┴─────────┴───┴─────┴────────┘
|
||
```
|
||
|
||
The example below uses the [arrayEnumerate](../../../sql-reference/functions/array-functions.md#array_functions-arrayenumerate) function:
|
||
|
||
```sql
|
||
SELECT s, arr, a, num, arrayEnumerate(arr)
|
||
FROM arrays_test
|
||
ARRAY JOIN arr AS a, arrayEnumerate(arr) AS num;
|
||
```
|
||
|
||
```response
|
||
┌─s─────┬─arr─────┬─a─┬─num─┬─arrayEnumerate(arr)─┐
|
||
│ Hello │ [1,2] │ 1 │ 1 │ [1,2] │
|
||
│ Hello │ [1,2] │ 2 │ 2 │ [1,2] │
|
||
│ World │ [3,4,5] │ 3 │ 1 │ [1,2,3] │
|
||
│ World │ [3,4,5] │ 4 │ 2 │ [1,2,3] │
|
||
│ World │ [3,4,5] │ 5 │ 3 │ [1,2,3] │
|
||
└───────┴─────────┴───┴─────┴─────────────────────┘
|
||
```
|
||
|
||
Multiple arrays with different sizes can be joined by using: `SETTINGS enable_unaligned_array_join = 1`. Example:
|
||
|
||
```sql
|
||
SELECT s, arr, a, b
|
||
FROM arrays_test ARRAY JOIN arr as a, [['a','b'],['c']] as b
|
||
SETTINGS enable_unaligned_array_join = 1;
|
||
```
|
||
|
||
```response
|
||
┌─s───────┬─arr─────┬─a─┬─b─────────┐
|
||
│ Hello │ [1,2] │ 1 │ ['a','b'] │
|
||
│ Hello │ [1,2] │ 2 │ ['c'] │
|
||
│ World │ [3,4,5] │ 3 │ ['a','b'] │
|
||
│ World │ [3,4,5] │ 4 │ ['c'] │
|
||
│ World │ [3,4,5] │ 5 │ [] │
|
||
│ Goodbye │ [] │ 0 │ ['a','b'] │
|
||
│ Goodbye │ [] │ 0 │ ['c'] │
|
||
└─────────┴─────────┴───┴───────────┘
|
||
```
|
||
|
||
## ARRAY JOIN with Nested Data Structure
|
||
|
||
`ARRAY JOIN` also works with [nested data structures](../../../sql-reference/data-types/nested-data-structures/index.md):
|
||
|
||
```sql
|
||
CREATE TABLE nested_test
|
||
(
|
||
s String,
|
||
nest Nested(
|
||
x UInt8,
|
||
y UInt32)
|
||
) ENGINE = Memory;
|
||
|
||
INSERT INTO nested_test
|
||
VALUES ('Hello', [1,2], [10,20]), ('World', [3,4,5], [30,40,50]), ('Goodbye', [], []);
|
||
```
|
||
|
||
```response
|
||
┌─s───────┬─nest.x──┬─nest.y─────┐
|
||
│ Hello │ [1,2] │ [10,20] │
|
||
│ World │ [3,4,5] │ [30,40,50] │
|
||
│ Goodbye │ [] │ [] │
|
||
└─────────┴─────────┴────────────┘
|
||
```
|
||
|
||
```sql
|
||
SELECT s, `nest.x`, `nest.y`
|
||
FROM nested_test
|
||
ARRAY JOIN nest;
|
||
```
|
||
|
||
```response
|
||
┌─s─────┬─nest.x─┬─nest.y─┐
|
||
│ Hello │ 1 │ 10 │
|
||
│ Hello │ 2 │ 20 │
|
||
│ World │ 3 │ 30 │
|
||
│ World │ 4 │ 40 │
|
||
│ World │ 5 │ 50 │
|
||
└───────┴────────┴────────┘
|
||
```
|
||
|
||
When specifying names of nested data structures in `ARRAY JOIN`, the meaning is the same as `ARRAY JOIN` with all the array elements that it consists of. Examples are listed below:
|
||
|
||
```sql
|
||
SELECT s, `nest.x`, `nest.y`
|
||
FROM nested_test
|
||
ARRAY JOIN `nest.x`, `nest.y`;
|
||
```
|
||
|
||
```response
|
||
┌─s─────┬─nest.x─┬─nest.y─┐
|
||
│ Hello │ 1 │ 10 │
|
||
│ Hello │ 2 │ 20 │
|
||
│ World │ 3 │ 30 │
|
||
│ World │ 4 │ 40 │
|
||
│ World │ 5 │ 50 │
|
||
└───────┴────────┴────────┘
|
||
```
|
||
|
||
This variation also makes sense:
|
||
|
||
```sql
|
||
SELECT s, `nest.x`, `nest.y`
|
||
FROM nested_test
|
||
ARRAY JOIN `nest.x`;
|
||
```
|
||
|
||
```response
|
||
┌─s─────┬─nest.x─┬─nest.y─────┐
|
||
│ Hello │ 1 │ [10,20] │
|
||
│ Hello │ 2 │ [10,20] │
|
||
│ World │ 3 │ [30,40,50] │
|
||
│ World │ 4 │ [30,40,50] │
|
||
│ World │ 5 │ [30,40,50] │
|
||
└───────┴────────┴────────────┘
|
||
```
|
||
|
||
An alias may be used for a nested data structure, in order to select either the `JOIN` result or the source array. Example:
|
||
|
||
```sql
|
||
SELECT s, `n.x`, `n.y`, `nest.x`, `nest.y`
|
||
FROM nested_test
|
||
ARRAY JOIN nest AS n;
|
||
```
|
||
|
||
```response
|
||
┌─s─────┬─n.x─┬─n.y─┬─nest.x──┬─nest.y─────┐
|
||
│ Hello │ 1 │ 10 │ [1,2] │ [10,20] │
|
||
│ Hello │ 2 │ 20 │ [1,2] │ [10,20] │
|
||
│ World │ 3 │ 30 │ [3,4,5] │ [30,40,50] │
|
||
│ World │ 4 │ 40 │ [3,4,5] │ [30,40,50] │
|
||
│ World │ 5 │ 50 │ [3,4,5] │ [30,40,50] │
|
||
└───────┴─────┴─────┴─────────┴────────────┘
|
||
```
|
||
|
||
Example of using the [arrayEnumerate](../../../sql-reference/functions/array-functions.md#array_functions-arrayenumerate) function:
|
||
|
||
```sql
|
||
SELECT s, `n.x`, `n.y`, `nest.x`, `nest.y`, num
|
||
FROM nested_test
|
||
ARRAY JOIN nest AS n, arrayEnumerate(`nest.x`) AS num;
|
||
```
|
||
|
||
```response
|
||
┌─s─────┬─n.x─┬─n.y─┬─nest.x──┬─nest.y─────┬─num─┐
|
||
│ Hello │ 1 │ 10 │ [1,2] │ [10,20] │ 1 │
|
||
│ Hello │ 2 │ 20 │ [1,2] │ [10,20] │ 2 │
|
||
│ World │ 3 │ 30 │ [3,4,5] │ [30,40,50] │ 1 │
|
||
│ World │ 4 │ 40 │ [3,4,5] │ [30,40,50] │ 2 │
|
||
│ World │ 5 │ 50 │ [3,4,5] │ [30,40,50] │ 3 │
|
||
└───────┴─────┴─────┴─────────┴────────────┴─────┘
|
||
```
|
||
|
||
## Implementation Details
|
||
|
||
The query execution order is optimized when running `ARRAY JOIN`. Although `ARRAY JOIN` must always be specified before the [WHERE](../../../sql-reference/statements/select/where.md)/[PREWHERE](../../../sql-reference/statements/select/prewhere.md) clause in a query, technically they can be performed in any order, unless result of `ARRAY JOIN` is used for filtering. The processing order is controlled by the query optimizer.
|
||
|
||
### Incompatibility with short-circuit function evaluation
|
||
|
||
[Short-circuit function evaluation](../../../operations/settings/index.md#short-circuit-function-evaluation) is a feature that optimizes the execution of complex expressions in specific functions such as `if`, `multiIf`, `and`, and `or`. It prevents potential exceptions, such as division by zero, from occurring during the execution of these functions.
|
||
|
||
`arrayJoin` is always executed and not supported for short circuit function evaluation. That's because it's a unique function processed separately from all other functions during query analysis and execution and requires additional logic that doesn't work with short circuit function execution. The reason is that the number of rows in the result depends on the arrayJoin result, and it's too complex and expensive to implement lazy execution of `arrayJoin`.
|
||
|
||
## Related content
|
||
|
||
- Blog: [Working with time series data in ClickHouse](https://clickhouse.com/blog/working-with-time-series-data-and-functions-ClickHouse)
|