---
slug: /en/sql-reference/statements/select/from
sidebar_label: FROM
---
# FROM Clause
The `FROM` clause specifies the source to read data from:

- [Table](../../../engines/table-engines/index.md)
- [Subquery](../../../sql-reference/statements/select/index.md)
- [Table function](../../../sql-reference/table-functions/index.md#table-functions)

[JOIN](../../../sql-reference/statements/select/join.md) and [ARRAY JOIN](../../../sql-reference/statements/select/array-join.md) clauses may also be used to extend the functionality of the `FROM` clause.

A subquery is another `SELECT` query that may be specified in parentheses inside the `FROM` clause.
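As a minimal sketch of these source kinds, the queries below read from a table function and from a subquery. The built-in `numbers` table function is used only to keep the example self-contained, and the alias `n` is illustrative:

```sql
-- Table function as the source: numbers(5) produces the rows 0..4.
SELECT number FROM numbers(5);

-- Subquery as the source: the inner SELECT is wrapped in parentheses.
SELECT n
FROM
(
    SELECT number AS n
    FROM numbers(5)
)
WHERE n > 2;
```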
The `FROM` clause can contain multiple data sources, separated by commas, which is equivalent to performing a [CROSS JOIN](../../../sql-reference/statements/select/join.md) on them.
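The equivalence can be sketched with two small generated sources. The `numbers` table function and the aliases `a`, `b`, `x`, and `y` are chosen for illustration only:

```sql
-- Comma-separated sources: every row of a is paired with every row of b.
SELECT a.number AS x, b.number AS y
FROM numbers(2) AS a, numbers(3) AS b;

-- The same query written with an explicit CROSS JOIN.
SELECT a.number AS x, b.number AS y
FROM numbers(2) AS a
CROSS JOIN numbers(3) AS b;
```

Both queries produce the same six combinations of `x` and `y`.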
`FROM` can optionally appear before a `SELECT` clause. This is a ClickHouse-specific extension of standard SQL which makes `SELECT` statements easier to read. Example:
```sql
FROM table
SELECT *
```
## FINAL Modifier
When `FINAL` is specified, ClickHouse fully merges the data before returning the result. This also performs all data transformations that happen during merges for the given table engine.
It is applicable when selecting data from tables using the following table engines:

- `ReplacingMergeTree`
- `SummingMergeTree`
- `AggregatingMergeTree`
- `CollapsingMergeTree`
- `VersionedCollapsingMergeTree`

`SELECT` queries with `FINAL` are executed in parallel. The [max_final_threads](../../../operations/settings/settings.md#max-final-threads) setting limits the number of threads used.
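The effect of `FINAL` can be sketched with a small `ReplacingMergeTree` table. The table name `final_demo` and its columns are hypothetical, and the thread limit of 4 is an arbitrary value chosen for illustration:

```sql
-- Hypothetical table: rows with the same id are deduplicated on merge,
-- keeping the row with the highest version.
CREATE TABLE final_demo
(
    id UInt32,
    value String,
    version UInt32
)
ENGINE = ReplacingMergeTree(version)
ORDER BY id;

-- Two separate inserts create two data parts that share the same sorting key.
INSERT INTO final_demo VALUES (1, 'old', 1);
INSERT INTO final_demo VALUES (1, 'new', 2);

-- Without FINAL, both rows may be returned until a background merge happens.
SELECT * FROM final_demo;

-- With FINAL, the merge logic is applied at query time,
-- so only the row (1, 'new', 2) is returned.
SELECT * FROM final_demo FINAL;

-- The same query with the number of threads used for FINAL capped.
SELECT * FROM final_demo FINAL SETTINGS max_final_threads = 4;
```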
### Drawbacks
Queries that use `FINAL` execute slightly slower than similar queries that do not use `FINAL` because:

- Data is merged during query execution.
- Queries with `FINAL` may read primary key columns in addition to the columns specified in the query.

`FINAL` requires additional compute and memory resources because the processing that would normally occur at merge time must occur in memory at the time of the query. However, using `FINAL` is sometimes necessary to produce accurate results (as data may not yet be fully merged). It is less expensive than running `OPTIMIZE` to force a merge.
As an alternative to `FINAL`, it is sometimes possible to write a query that assumes the background merges of the `MergeTree` engine have not yet happened and compensates by applying an aggregation (for example, to discard duplicates). If you need `FINAL` in your queries to get the required results, it is okay to use it, but be aware of the additional processing required.
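One possible shape of such a query, assuming a hypothetical `ReplacingMergeTree` table named `dedup_demo` with a `version` column, is to pick the latest row per key with an aggregate function instead of relying on merges. This is a sketch of the idea, not the only way to write it:

```sql
-- Hypothetical table used only for this sketch.
CREATE TABLE dedup_demo
(
    id UInt32,
    value String,
    version UInt32
)
ENGINE = ReplacingMergeTree(version)
ORDER BY id;

INSERT INTO dedup_demo VALUES (1, 'old', 1);
INSERT INTO dedup_demo VALUES (1, 'new', 2);

-- Deduplicate at query time without FINAL: for each id, keep the value
-- that belongs to the highest version. The aliases differ from the column
-- names so that they are not substituted into the other aggregate.
SELECT
    id,
    argMax(value, version) AS latest_value,
    max(version) AS latest_version
FROM dedup_demo
GROUP BY id;
```

The aggregation returns one row per `id` whether or not the parts have already been merged.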
`FINAL` can also be applied automatically to all tables in a query by enabling the [FINAL](../../../operations/settings/settings.md#final) setting in a session or a user profile.
### Example Usage
Using the `FINAL` keyword
```sql
SELECT x, y FROM mytable FINAL WHERE x > 1;
```
Using `FINAL` as a query-level setting
```sql
SELECT x, y FROM mytable WHERE x > 1 SETTINGS final = 1;
```
Using `FINAL` as a session-level setting
```sql
SET final = 1;
SELECT x, y FROM mytable WHERE x > 1;
```
## Implementation Details
If the `FROM` clause is omitted, data will be read from the `system.one` table.
The `system.one` table contains exactly one row (this table fulfills the same purpose as the DUAL table found in other DBMSs).
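For illustration, the following queries show this fallback. The expression `1 + 1` is arbitrary:

```sql
-- No FROM clause: the query reads from system.one implicitly.
SELECT 1 + 1;

-- The same query with the source written out explicitly.
SELECT 1 + 1 FROM system.one;

-- system.one itself holds a single row with one column, dummy, equal to 0.
SELECT * FROM system.one;
```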
To execute a query, all the columns listed in the query are extracted from the appropriate table. Any columns not needed for the external query are thrown out of the subqueries.
If a query does not list any columns (for example, `SELECT count() FROM t`), some column is extracted from the table anyway (the smallest one is preferred), in order to calculate the number of rows.