From 8792f70a67521eb58d9b98954c74632758d32537 Mon Sep 17 00:00:00 2001
From: Sergei Shtykov
Date: Fri, 27 Dec 2019 12:12:01 +0300
Subject: [PATCH 1/3] CLICKHOUSEDOCS-395: ORC format support

---
 docs/en/interfaces/formats.md | 48 ++++++++++++++++++++++++++++++++---
 1 file changed, 45 insertions(+), 3 deletions(-)

diff --git a/docs/en/interfaces/formats.md b/docs/en/interfaces/formats.md
index eebdf10702d..cea3605202a 100644
--- a/docs/en/interfaces/formats.md
+++ b/docs/en/interfaces/formats.md
@@ -29,6 +29,7 @@ The supported formats are:
 | [PrettySpace](#prettyspace) | ✗ | ✔ |
 | [Protobuf](#protobuf) | ✔ | ✔ |
 | [Parquet](#data-format-parquet) | ✔ | ✔ |
+| [ORC](#data-format-orc) | ✔ | ✗ |
 | [RowBinary](#rowbinary) | ✔ | ✔ |
 | [RowBinaryWithNamesAndTypes](#rowbinarywithnamesandtypes) | ✔ | ✔ |
 | [Native](#native) | ✔ | ✔ |
@@ -954,13 +955,54 @@ Data types of a ClickHouse table columns can differ from the corresponding field
 You can insert Parquet data from a file into ClickHouse table by the following command:

 ```bash
-cat {filename} | clickhouse-client --query="INSERT INTO {some_table} FORMAT Parquet"
+$ cat {filename} | clickhouse-client --query="INSERT INTO {some_table} FORMAT Parquet"
 ```

 You can select data from a ClickHouse table and save them into some file in the Parquet format by the following command:

-```sql
-clickhouse-client --query="SELECT * FROM {some_table} FORMAT Parquet" > {some_file.pq}
+```bash
+$ clickhouse-client --query="SELECT * FROM {some_table} FORMAT Parquet" > {some_file.pq}
+```
+
+To exchange data with the Hadoop, you can use [HDFS table engine](../operations/table_engines/hdfs.md).
+
+## ORC {#data-format-orc}
+
+[Apache ORC](https://orc.apache.org/) is a columnar storage format widespread in the Hadoop ecosystem. ClickHouse supports only read operations for this format.
+
+### Data Types Matching
+
+The table below shows supported data types and how they match ClickHouse [data types](../data_types/index.md) in `INSERT` queries.
+
+| ORC data type (`INSERT`) | ClickHouse data type |
+| -------------------- | ------------------ |
+| `UINT8`, `BOOL` | [UInt8](../data_types/int_uint.md) |
+| `INT8` | [Int8](../data_types/int_uint.md) |
+| `UINT16` | [UInt16](../data_types/int_uint.md) |
+| `INT16` | [Int16](../data_types/int_uint.md) |
+| `UINT32` | [UInt32](../data_types/int_uint.md) |
+| `INT32` | [Int32](../data_types/int_uint.md) |
+| `UINT64` | [UInt64](../data_types/int_uint.md) |
+| `INT64` | [Int64](../data_types/int_uint.md) |
+| `FLOAT`, `HALF_FLOAT` | [Float32](../data_types/float.md) |
+| `DOUBLE` | [Float64](../data_types/float.md) |
+| `DATE32` | [Date](../data_types/date.md) |
+| `DATE64`, `TIMESTAMP` | [DateTime](../data_types/datetime.md) |
+| `STRING`, `BINARY` | [String](../data_types/string.md) |
+| `DECIMAL` | [Decimal](../data_types/decimal.md) |
+
+ClickHouse supports configurable precision of `Decimal` type. The `INSERT` query treats the ORC `DECIMAL` type as the ClickHouse `Decimal128` type.
+
+Unsupported ORC data types: `DATE32`, `TIME32`, `FIXED_SIZE_BINARY`, `JSON`, `UUID`, `ENUM`.
+
+Data types of a ClickHouse table columns can differ from the corresponding fields of the ORC data inserted. When inserting data, ClickHouse interprets data types according to the table above and then [cast](../query_language/functions/type_conversion_functions/#type_conversion_function-cast) the data to that data type which is set for the ClickHouse table column.
+
+### Inserting and Selecting Data
+
+You can insert Parquet data from a file into ClickHouse table by the following command:
+
+```bash
+$ cat {filename} | clickhouse-client --query="INSERT INTO {some_table} FORMAT ORC"
 ```

 To exchange data with the Hadoop, you can use [HDFS table engine](../operations/table_engines/hdfs.md).

From fa2d7ae0828ba1e61f82e7a5b768f330f24e3074 Mon Sep 17 00:00:00 2001
From: Sergei Shtykov
Date: Fri, 27 Dec 2019 12:16:02 +0300
Subject: [PATCH 2/3] CLICKHOUSEDOCS-395: Fix.
---
 docs/en/interfaces/formats.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/en/interfaces/formats.md b/docs/en/interfaces/formats.md
index cea3605202a..2adf8ff1550 100644
--- a/docs/en/interfaces/formats.md
+++ b/docs/en/interfaces/formats.md
@@ -997,7 +997,7 @@ Unsupported ORC data types: `DATE32`, `TIME32`, `FIXED_SIZE_BINARY`, `JSON`, `UU

 Data types of a ClickHouse table columns can differ from the corresponding fields of the ORC data inserted. When inserting data, ClickHouse interprets data types according to the table above and then [cast](../query_language/functions/type_conversion_functions/#type_conversion_function-cast) the data to that data type which is set for the ClickHouse table column.

-### Inserting and Selecting Data
+### Inserting Data

 You can insert Parquet data from a file into ClickHouse table by the following command:

From fc197d7747e36bccc531659bbb61d86336d277b1 Mon Sep 17 00:00:00 2001
From: alexey-milovidov
Date: Fri, 27 Dec 2019 19:49:02 +0300
Subject: [PATCH 3/3] Update formats.md

---
 docs/en/interfaces/formats.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/en/interfaces/formats.md b/docs/en/interfaces/formats.md
index 2adf8ff1550..b37c9cdddb2 100644
--- a/docs/en/interfaces/formats.md
+++ b/docs/en/interfaces/formats.md
@@ -964,7 +964,7 @@ You can select data from a ClickHouse table and save them into some file in the
 $ clickhouse-client --query="SELECT * FROM {some_table} FORMAT Parquet" > {some_file.pq}
 ```

-To exchange data with the Hadoop, you can use [HDFS table engine](../operations/table_engines/hdfs.md).
+To exchange data with Hadoop, you can use [HDFS table engine](../operations/table_engines/hdfs.md).

 ## ORC {#data-format-orc}

@@ -1005,7 +1005,7 @@ You can insert Parquet data from a file into ClickHouse table by the following c
 $ cat {filename} | clickhouse-client --query="INSERT INTO {some_table} FORMAT ORC"
 ```

-To exchange data with the Hadoop, you can use [HDFS table engine](../operations/table_engines/hdfs.md).
+To exchange data with Hadoop, you can use [HDFS table engine](../operations/table_engines/hdfs.md).

 ## Format Schema {#formatschema}
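
Reviewer note (outside the patches): the ORC section added in this series documents a single insert pipeline, `cat {filename} | clickhouse-client --query="INSERT INTO {some_table} FORMAT ORC"`. Below is a minimal sketch of how the placeholders might be filled in. The helper name `orc_insert_cmd`, the file `data.orc`, and the table `orc_demo` are assumptions for illustration and appear nowhere in the series. The helper only prints the command, so it can be inspected before being run against a real server.

```shell
#!/bin/sh
# Sketch only: helper name, file name, and table name are hypothetical.
# Prints the INSERT pipeline from the ORC section with the placeholders filled in;
# it does not contact a ClickHouse server.
orc_insert_cmd() {
    file="$1"    # path to an ORC file
    table="$2"   # existing ClickHouse table whose columns match the ORC fields
    printf 'cat %s | clickhouse-client --query="INSERT INTO %s FORMAT ORC"\n' "$file" "$table"
}

orc_insert_cmd data.orc orc_demo
```

Per the format table in patch 1, ORC is input-only, so the sketch has no `SELECT ... FORMAT ORC` counterpart.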