Add ORC output format

Задокументировал вывод данных в ORC формате.
This commit is contained in:
Dmitriy 2021-03-24 00:08:07 +03:00
parent ed4a184bd4
commit 7154b36a2d

View File

@ -50,7 +50,7 @@ The supported formats are:
| [Parquet](#data-format-parquet) | ✔ | ✔ | | [Parquet](#data-format-parquet) | ✔ | ✔ |
| [Arrow](#data-format-arrow) | ✔ | ✔ | | [Arrow](#data-format-arrow) | ✔ | ✔ |
| [ArrowStream](#data-format-arrow-stream) | ✔ | ✔ | | [ArrowStream](#data-format-arrow-stream) | ✔ | ✔ |
| [ORC](#data-format-orc) | ✔ | | | [ORC](#data-format-orc) | ✔ | |
| [RowBinary](#rowbinary) | ✔ | ✔ | | [RowBinary](#rowbinary) | ✔ | ✔ |
| [RowBinaryWithNamesAndTypes](#rowbinarywithnamesandtypes) | ✔ | ✔ | | [RowBinaryWithNamesAndTypes](#rowbinarywithnamesandtypes) | ✔ | ✔ |
| [Native](#native) | ✔ | ✔ | | [Native](#native) | ✔ | ✔ |
@ -1284,36 +1284,37 @@ To exchange data with Hadoop, you can use [HDFS table engine](../engines/table-e
## ORC {#data-format-orc} ## ORC {#data-format-orc}
[Apache ORC](https://orc.apache.org/) is a columnar storage format widespread in the Hadoop ecosystem. You can only insert data in this format to ClickHouse. [Apache ORC](https://orc.apache.org/) is a columnar storage format widespread in the [Hadoop](https://hadoop.apache.org/) ecosystem.
### Data Types Matching {#data_types-matching-3} ### Data Types Matching {#data_types-matching-3}
The table below shows supported data types and how they match ClickHouse [data types](../sql-reference/data-types/index.md) in `INSERT` queries. The table below shows supported data types and how they match ClickHouse [data types](../sql-reference/data-types/index.md).
| ORC data type (`INSERT`) | ClickHouse data type | | ORC data type (`INSERT`) | ClickHouse data type | ORC data type (`SELECT`) |
|--------------------------|-----------------------------------------------------| |--------------------------|-----------------------------------------------------|--------------------------|
| `UINT8`, `BOOL` | [UInt8](../sql-reference/data-types/int-uint.md) | | `UINT8`, `BOOL` | [UInt8](../sql-reference/data-types/int-uint.md) | `UINT8` |
| `INT8` | [Int8](../sql-reference/data-types/int-uint.md) | | `INT8` | [Int8](../sql-reference/data-types/int-uint.md) | `INT8` |
| `UINT16` | [UInt16](../sql-reference/data-types/int-uint.md) | | `UINT16` | [UInt16](../sql-reference/data-types/int-uint.md) | `UINT16` |
| `INT16` | [Int16](../sql-reference/data-types/int-uint.md) | | `INT16` | [Int16](../sql-reference/data-types/int-uint.md) | `INT16` |
| `UINT32` | [UInt32](../sql-reference/data-types/int-uint.md) | | `UINT32` | [UInt32](../sql-reference/data-types/int-uint.md) | `UINT32` |
| `INT32` | [Int32](../sql-reference/data-types/int-uint.md) | | `INT32` | [Int32](../sql-reference/data-types/int-uint.md) | `INT32` |
| `UINT64` | [UInt64](../sql-reference/data-types/int-uint.md) | | `UINT64` | [UInt64](../sql-reference/data-types/int-uint.md) | `UINT64` |
| `INT64` | [Int64](../sql-reference/data-types/int-uint.md) | | `INT64` | [Int64](../sql-reference/data-types/int-uint.md) | `INT64` |
| `FLOAT`, `HALF_FLOAT` | [Float32](../sql-reference/data-types/float.md) | | `FLOAT`, `HALF_FLOAT` | [Float32](../sql-reference/data-types/float.md) | `FLOAT` |
| `DOUBLE` | [Float64](../sql-reference/data-types/float.md) | | `DOUBLE` | [Float64](../sql-reference/data-types/float.md) | `DOUBLE` |
| `DATE32` | [Date](../sql-reference/data-types/date.md) | | `DATE32` | [Date](../sql-reference/data-types/date.md) | `DATE32` |
| `DATE64`, `TIMESTAMP` | [DateTime](../sql-reference/data-types/datetime.md) | | `DATE64`, `TIMESTAMP` | [DateTime](../sql-reference/data-types/datetime.md) | `TIMESTAMP` |
| `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | | `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `BINARY` |
| `DECIMAL` | [Decimal](../sql-reference/data-types/decimal.md) | | `DECIMAL` | [Decimal](../sql-reference/data-types/decimal.md) | `DECIMAL` |
| `-` | [Array](../sql-reference/data-types/array.md) | `LIST` |
ClickHouse supports configurable precision of the `Decimal` type. The `INSERT` query treats the ORC `DECIMAL` type as the ClickHouse `Decimal128` type. ClickHouse supports configurable precision of the `Decimal` type. The `INSERT` or `SELECT` query treats the ORC `DECIMAL` type as the ClickHouse `Decimal128` type.
Unsupported ORC data types: `DATE32`, `TIME32`, `FIXED_SIZE_BINARY`, `JSON`, `UUID`, `ENUM`. Unsupported ORC data types: `TIME32`, `FIXED_SIZE_BINARY`, `JSON`, `UUID`, `ENUM`.
The data types of ClickHouse table columns dont have to match the corresponding ORC data fields. When inserting data, ClickHouse interprets data types according to the table above and then [casts](../sql-reference/functions/type-conversion-functions.md#type_conversion_function-cast) the data to the data type set for the ClickHouse table column. The data types of ClickHouse table columns dont have to match the corresponding ORC data fields. When inserting data, ClickHouse interprets data types according to the table above and then [casts](../sql-reference/functions/type-conversion-functions.md#type_conversion_function-cast) the data to the data type set for the ClickHouse table column.
### Inserting Data {#inserting-data-2} ### Inserting and Selecting Data {#inserting-and-selecting-data-1}
You can insert ORC data from a file into ClickHouse table by the following command: You can insert ORC data from a file into ClickHouse table by the following command:
@ -1321,6 +1322,12 @@ You can insert ORC data from a file into ClickHouse table by the following comma
$ cat filename.orc | clickhouse-client --query="INSERT INTO some_table FORMAT ORC" $ cat filename.orc | clickhouse-client --query="INSERT INTO some_table FORMAT ORC"
``` ```
You can select data from a ClickHouse table and save them into some file in the ORC format by the following command:
``` bash
$ clickhouse-client --query="SELECT * FROM {some_table} FORMAT ORC" > {filename.orc}
```
To exchange data with Hadoop, you can use [HDFS table engine](../engines/table-engines/integrations/hdfs.md). To exchange data with Hadoop, you can use [HDFS table engine](../engines/table-engines/integrations/hdfs.md).
## LineAsString {#lineasstring} ## LineAsString {#lineasstring}