---
slug: /en/engines/table-engines/special/file
sidebar_position: 40
sidebar_label: File
---
# File Table Engine
The File table engine keeps the data in a file in one of the supported [file formats](../../../interfaces/formats.md#formats) (`TabSeparated`, `Native`, etc.).

Usage scenarios:

- Exporting data from ClickHouse to a file.
- Converting data from one format to another.
- Updating data in ClickHouse by editing a file on disk.
## Usage in ClickHouse Server {#usage-in-clickhouse-server}
``` sql
File(Format)
```
The `Format` parameter specifies one of the available file formats. To perform `SELECT` queries, the format must be supported for input, and to perform `INSERT` queries, it must be supported for output. The available formats are listed in the [Formats](../../../interfaces/formats.md#formats) section.

ClickHouse does not allow specifying a filesystem path for `File`. It uses the folder defined by the [path](../../../operations/server-configuration-parameters/settings.md) setting in the server configuration.

When a table is created using `File(Format)`, ClickHouse creates an empty subdirectory in that folder. When data is written to the table, it is put into the `data.Format` file in that subdirectory.

You may manually create this subfolder and file in the server filesystem and then [ATTACH](../../../sql-reference/statements/attach.md) it to table information with a matching name, so you can query data from that file.

:::note
Be careful with this functionality, because ClickHouse does not keep track of external changes to such files. The result of simultaneous writes via ClickHouse and outside of ClickHouse is undefined.
:::
## Example {#example}
**1.** Set up the `file_engine_table` table:
``` sql
CREATE TABLE file_engine_table (name String, value UInt32) ENGINE=File(TabSeparated)
```
By default, ClickHouse creates the folder `/var/lib/clickhouse/data/default/file_engine_table`.

**2.** Manually create `/var/lib/clickhouse/data/default/file_engine_table/data.TabSeparated` containing:
``` bash
$ cat data.TabSeparated
one	1
two	2
```
**3.** Query the data:
``` sql
SELECT * FROM file_engine_table
```
``` text
┌─name─┬─value─┐
│ one  │     1 │
│ two  │     2 │
└──────┴───────┘
```
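The format-conversion scenario listed among the usage scenarios can be sketched by copying the rows into a second File table that uses a different format. The table name `file_engine_table_json` and the choice of `JSONEachRow` are assumptions made for illustration; any format that supports output should behave the same way.

``` sql
-- Hypothetical second table with the same columns, stored in another format.
CREATE TABLE file_engine_table_json (name String, value UInt32) ENGINE = File(JSONEachRow);

-- Read the rows from the TabSeparated file and write them out as JSONEachRow.
INSERT INTO file_engine_table_json SELECT name, value FROM file_engine_table;
```

Following the layout described earlier, the converted rows would be expected in a `data.JSONEachRow` file inside the new table's subdirectory.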
## Usage in ClickHouse-local {#usage-in-clickhouse-local}
In [clickhouse-local](../../../operations/utilities/clickhouse-local.md), the File engine accepts a file path in addition to `Format`. The default input/output streams can be specified using numeric or human-readable names, such as `0` or `stdin`, and `1` or `stdout`. Compressed files can be read and written; the compression is determined by an additional engine parameter or by the file extension (`gz`, `br` or `xz`).

**Example:**
``` bash
$ echo -e "1,2\n3,4" | clickhouse-local -q "CREATE TABLE table (a Int64, b Int64) ENGINE = File(CSV, stdin); SELECT a, b FROM table; DROP TABLE table"
```
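The file-path form can be sketched as the SQL that might be passed to `clickhouse-local` with `-q`. The table name `numbers_file` and the path `numbers.csv.gz` are hypothetical, and the sketch assumes that compression is inferred from the `gz` extension as described above.

``` sql
-- Hypothetical path; the gz extension is assumed to select gzip compression.
CREATE TABLE numbers_file (a Int64, b Int64) ENGINE = File(CSV, 'numbers.csv.gz');

-- Write two rows to the compressed file, then read them back.
INSERT INTO numbers_file VALUES (1, 2), (3, 4);
SELECT a, b FROM numbers_file;
```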
## Details of Implementation {#details-of-implementation}
- Multiple `SELECT` queries can be performed concurrently, but `INSERT` queries wait for each other.
- Creating a new file with an `INSERT` query is supported.
- If the file exists, `INSERT` appends new values to it.
- Not supported:
    - `ALTER`
    - `SELECT ... SAMPLE`
    - Indices
    - Replication
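
The append behaviour can be illustrated with the `file_engine_table` table from the example section. This is only a sketch, and the inserted row is made up for illustration.

``` sql
-- Appends one row to the table's existing data.TabSeparated file.
INSERT INTO file_engine_table VALUES ('three', 3);

-- The rows written to the file earlier should still be present alongside the new one.
SELECT * FROM file_engine_table;
```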
## PARTITION BY
`PARTITION BY` is optional. It is possible to create separate files by partitioning the data on a partition key. In most cases you don't need a partition key, and if one is needed, it generally does not need to be more granular than by month. Partitioning does not speed up queries (in contrast to the ORDER BY expression). Never use overly granular partitioning. Don't partition your data by client identifiers or names (instead, make the client identifier or name the first column in the ORDER BY expression).

For partitioning by month, use the `toYYYYMM(date_column)` expression, where `date_column` is a column with a date of the type [Date](/docs/en/sql-reference/data-types/date.md). The partition names here have the `"YYYYMM"` format.