mirror of
https://github.com/ClickHouse/ClickHouse.git
synced 2024-11-10 09:32:06 +00:00
Merge pull request #26053 from sevirov/sevirov-DOCSUP-10557-support_complex_types_in_arrow_parquet_orc_formats
DOCSUP-10557: Support complex types in Arrow/Parquet/ORC formats
This commit is contained in:
commit
1d57931774
@ -1246,12 +1246,14 @@ The table below shows supported data types and how they match ClickHouse [data t
|
||||
| `DOUBLE` | [Float64](../sql-reference/data-types/float.md) | `DOUBLE` |
|
||||
| `DATE32` | [Date](../sql-reference/data-types/date.md) | `UINT16` |
|
||||
| `DATE64`, `TIMESTAMP` | [DateTime](../sql-reference/data-types/datetime.md) | `UINT32` |
|
||||
| `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `STRING` |
|
||||
| — | [FixedString](../sql-reference/data-types/fixedstring.md) | `STRING` |
|
||||
| `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `BINARY` |
|
||||
| — | [FixedString](../sql-reference/data-types/fixedstring.md) | `BINARY` |
|
||||
| `DECIMAL` | [Decimal](../sql-reference/data-types/decimal.md) | `DECIMAL` |
|
||||
| `LIST` | [Array](../sql-reference/data-types/array.md) | `LIST` |
|
||||
| `STRUCT` | [Tuple](../sql-reference/data-types/tuple.md) | `STRUCT` |
|
||||
| `MAP` | [Map](../sql-reference/data-types/map.md) | `MAP` |
|
||||
|
||||
Arrays can be nested and can have a value of the `Nullable` type as an argument.
|
||||
Arrays can be nested and can have a value of the `Nullable` type as an argument. `Tuple` and `Map` types also can be nested.
|
||||
|
||||
ClickHouse supports configurable precision of `Decimal` type. The `INSERT` query treats the Parquet `DECIMAL` type as the ClickHouse `Decimal128` type.
|
||||
|
||||
@ -1299,13 +1301,17 @@ The table below shows supported data types and how they match ClickHouse [data t
|
||||
| `DOUBLE` | [Float64](../sql-reference/data-types/float.md) | `FLOAT64` |
|
||||
| `DATE32` | [Date](../sql-reference/data-types/date.md) | `UINT16` |
|
||||
| `DATE64`, `TIMESTAMP` | [DateTime](../sql-reference/data-types/datetime.md) | `UINT32` |
|
||||
| `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `UTF8` |
|
||||
| `STRING`, `BINARY` | [FixedString](../sql-reference/data-types/fixedstring.md) | `UTF8` |
|
||||
| `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `BINARY` |
|
||||
| `STRING`, `BINARY` | [FixedString](../sql-reference/data-types/fixedstring.md) | `BINARY` |
|
||||
| `DECIMAL` | [Decimal](../sql-reference/data-types/decimal.md) | `DECIMAL` |
|
||||
| `DECIMAL256` | [Decimal256](../sql-reference/data-types/decimal.md)| `DECIMAL256` |
|
||||
| `LIST` | [Array](../sql-reference/data-types/array.md) | `LIST` |
|
||||
| `STRUCT` | [Tuple](../sql-reference/data-types/tuple.md) | `STRUCT` |
|
||||
| `MAP` | [Map](../sql-reference/data-types/map.md) | `MAP` |
|
||||
|
||||
Arrays can be nested and can have a value of the `Nullable` type as an argument.
|
||||
Arrays can be nested and can have a value of the `Nullable` type as an argument. `Tuple` and `Map` types also can be nested.
|
||||
|
||||
The `DICTIONARY` type is supported for `INSERT` queries, and for `SELECT` queries there is an [output_format_arrow_low_cardinality_as_dictionary](../operations/settings/settings.md#output-format-arrow-low-cardinality-as-dictionary) setting that allows to output [LowCardinality](../sql-reference/data-types/lowcardinality.md) type as a `DICTIONARY` type.
|
||||
|
||||
ClickHouse supports configurable precision of the `Decimal` type. The `INSERT` query treats the Arrow `DECIMAL` type as the ClickHouse `Decimal128` type.
|
||||
|
||||
@ -1358,8 +1364,10 @@ The table below shows supported data types and how they match ClickHouse [data t
|
||||
| `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `BINARY` |
|
||||
| `DECIMAL` | [Decimal](../sql-reference/data-types/decimal.md) | `DECIMAL` |
|
||||
| `LIST` | [Array](../sql-reference/data-types/array.md) | `LIST` |
|
||||
| `STRUCT` | [Tuple](../sql-reference/data-types/tuple.md) | `STRUCT` |
|
||||
| `MAP` | [Map](../sql-reference/data-types/map.md) | `MAP` |
|
||||
|
||||
Arrays can be nested and can have a value of the `Nullable` type as an argument.
|
||||
Arrays can be nested and can have a value of the `Nullable` type as an argument. `Tuple` and `Map` types also can be nested.
|
||||
|
||||
ClickHouse supports configurable precision of the `Decimal` type. The `INSERT` query treats the ORC `DECIMAL` type as the ClickHouse `Decimal128` type.
|
||||
|
||||
|
@ -3220,3 +3220,14 @@ Default value: `1`.
|
||||
**Usage**
|
||||
|
||||
If the setting is set to `0`, the table function does not make Nullable columns and inserts default values instead of NULL. This is also applicable for NULL values inside arrays.
|
||||
|
||||
## output_format_arrow_low_cardinality_as_dictionary {#output-format-arrow-low-cardinality-as-dictionary}
|
||||
|
||||
Allows to convert the [LowCardinality](../../sql-reference/data-types/lowcardinality.md) type to the `DICTIONARY` type of the [Arrow](../../interfaces/formats.md#data-format-arrow) format for `SELECT` queries.
|
||||
|
||||
Possible values:
|
||||
|
||||
- 0 — The `LowCardinality` type is not converted to the `DICTIONARY` type.
|
||||
- 1 — The `LowCardinality` type is converted to the `DICTIONARY` type.
|
||||
|
||||
Default value: `0`.
|
||||
|
@ -47,6 +47,7 @@ Settings:
|
||||
- [low_cardinality_use_single_dictionary_for_part](../../operations/settings/settings.md#low_cardinality_use_single_dictionary_for_part)
|
||||
- [low_cardinality_allow_in_native_format](../../operations/settings/settings.md#low_cardinality_allow_in_native_format)
|
||||
- [allow_suspicious_low_cardinality_types](../../operations/settings/settings.md#allow_suspicious_low_cardinality_types)
|
||||
- [output_format_arrow_low_cardinality_as_dictionary](../../operations/settings/settings.md#output-format-arrow-low-cardinality-as-dictionary)
|
||||
|
||||
Functions:
|
||||
|
||||
@ -57,5 +58,3 @@ Functions:
|
||||
- [A Magical Mystery Tour of the LowCardinality Data Type](https://www.altinity.com/blog/2019/3/27/low-cardinality).
|
||||
- [Reducing ClickHouse Storage Cost with the Low Cardinality Type – Lessons from an Instana Engineer](https://www.instana.com/blog/reducing-clickhouse-storage-cost-with-the-low-cardinality-type-lessons-from-an-instana-engineer/).
|
||||
- [String Optimization (video presentation in Russian)](https://youtu.be/rqf-ILRgBdY?list=PL0Z2YDlm0b3iwXCpEFiOOYmwXzVmjJfEt). [Slides in English](https://github.com/yandex/clickhouse-presentations/raw/master/meetup19/string_optimization.pdf).
|
||||
|
||||
[Original article](https://clickhouse.tech/docs/en/sql-reference/data-types/lowcardinality/) <!--hide-->
|
||||
|
@ -100,9 +100,9 @@ sudo ./clickhouse install
|
||||
|
||||
Для других операционных систем и архитектуры AArch64 сборки ClickHouse предоставляются в виде кросс-компилированного бинарного файла из последнего коммита ветки `master` (с задержкой в несколько часов).
|
||||
|
||||
- [macOS](https://builds.clickhouse.tech/master/macos/clickhouse) — `curl -O 'https://builds.clickhouse.tech/master/macos/clickhouse' && chmod a+x ./clickhouse`
|
||||
- [AArch64](https://builds.clickhouse.tech/master/aarch64/clickhouse) — `curl -O 'https://builds.clickhouse.tech/master/aarch64/clickhouse' && chmod a+x ./clickhouse`
|
||||
- [FreeBSD](https://builds.clickhouse.tech/master/freebsd/clickhouse) — `curl -O 'https://builds.clickhouse.tech/master/freebsd/clickhouse' && chmod a+x ./clickhouse`
|
||||
- [macOS](https://builds.clickhouse.tech/master/macos/clickhouse) — `curl -O 'https://builds.clickhouse.tech/master/macos/clickhouse' && chmod a+x ./clickhouse`
|
||||
- [FreeBSD](https://builds.clickhouse.tech/master/freebsd/clickhouse) — `curl -O 'https://builds.clickhouse.tech/master/freebsd/clickhouse' && chmod a+x ./clickhouse`
|
||||
- [AArch64](https://builds.clickhouse.tech/master/aarch64/clickhouse) — `curl -O 'https://builds.clickhouse.tech/master/aarch64/clickhouse' && chmod a+x ./clickhouse`
|
||||
|
||||
После скачивания можно воспользоваться `clickhouse client` для подключения к серверу или `clickhouse local` для обработки локальных данных.
|
||||
|
||||
|
@ -1165,12 +1165,14 @@ SELECT * FROM topic1_stream;
|
||||
| `DOUBLE` | [Float64](../sql-reference/data-types/float.md) | `DOUBLE` |
|
||||
| `DATE32` | [Date](../sql-reference/data-types/date.md) | `UINT16` |
|
||||
| `DATE64`, `TIMESTAMP` | [DateTime](../sql-reference/data-types/datetime.md) | `UINT32` |
|
||||
| `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `STRING` |
|
||||
| — | [FixedString](../sql-reference/data-types/fixedstring.md) | `STRING` |
|
||||
| `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `BINARY` |
|
||||
| — | [FixedString](../sql-reference/data-types/fixedstring.md) | `BINARY` |
|
||||
| `DECIMAL` | [Decimal](../sql-reference/data-types/decimal.md) | `DECIMAL` |
|
||||
| `LIST` | [Array](../sql-reference/data-types/array.md) | `LIST` |
|
||||
| `STRUCT` | [Tuple](../sql-reference/data-types/tuple.md) | `STRUCT` |
|
||||
| `MAP` | [Map](../sql-reference/data-types/map.md) | `MAP` |
|
||||
|
||||
Массивы могут быть вложенными и иметь в качестве аргумента значение типа `Nullable`.
|
||||
Массивы могут быть вложенными и иметь в качестве аргумента значение типа `Nullable`. Типы `Tuple` и `Map` также могут быть вложенными.
|
||||
|
||||
ClickHouse поддерживает настраиваемую точность для формата `Decimal`. При выполнении запроса `INSERT` ClickHouse обрабатывает тип данных Parquet `DECIMAL` как `Decimal128`.
|
||||
|
||||
@ -1218,12 +1220,17 @@ $ clickhouse-client --query="SELECT * FROM {some_table} FORMAT Parquet" > {some_
|
||||
| `DOUBLE` | [Float64](../sql-reference/data-types/float.md) | `FLOAT64` |
|
||||
| `DATE32` | [Date](../sql-reference/data-types/date.md) | `UINT16` |
|
||||
| `DATE64`, `TIMESTAMP` | [DateTime](../sql-reference/data-types/datetime.md) | `UINT32` |
|
||||
| `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `UTF8` |
|
||||
| `STRING`, `BINARY` | [FixedString](../sql-reference/data-types/fixedstring.md) | `UTF8` |
|
||||
| `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `BINARY` |
|
||||
| `STRING`, `BINARY` | [FixedString](../sql-reference/data-types/fixedstring.md) | `BINARY` |
|
||||
| `DECIMAL` | [Decimal](../sql-reference/data-types/decimal.md) | `DECIMAL` |
|
||||
| `DECIMAL256` | [Decimal256](../sql-reference/data-types/decimal.md)| `DECIMAL256` |
|
||||
| `LIST` | [Array](../sql-reference/data-types/array.md) | `LIST` |
|
||||
| `STRUCT` | [Tuple](../sql-reference/data-types/tuple.md) | `STRUCT` |
|
||||
| `MAP` | [Map](../sql-reference/data-types/map.md) | `MAP` |
|
||||
|
||||
Массивы могут быть вложенными и иметь в качестве аргумента значение типа `Nullable`.
|
||||
Массивы могут быть вложенными и иметь в качестве аргумента значение типа `Nullable`. Типы `Tuple` и `Map` также могут быть вложенными.
|
||||
|
||||
Тип `DICTIONARY` поддерживается для запросов `INSERT`. Для запросов `SELECT` есть настройка [output_format_arrow_low_cardinality_as_dictionary](../operations/settings/settings.md#output-format-arrow-low-cardinality-as-dictionary), которая позволяет выводить тип [LowCardinality](../sql-reference/data-types/lowcardinality.md) как `DICTIONARY`.
|
||||
|
||||
ClickHouse поддерживает настраиваемую точность для формата `Decimal`. При выполнении запроса `INSERT` ClickHouse обрабатывает тип данных Arrow `DECIMAL` как `Decimal128`.
|
||||
|
||||
@ -1276,8 +1283,10 @@ $ clickhouse-client --query="SELECT * FROM {some_table} FORMAT Arrow" > {filenam
|
||||
| `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `BINARY` |
|
||||
| `DECIMAL` | [Decimal](../sql-reference/data-types/decimal.md) | `DECIMAL` |
|
||||
| `LIST` | [Array](../sql-reference/data-types/array.md) | `LIST` |
|
||||
| `STRUCT` | [Tuple](../sql-reference/data-types/tuple.md) | `STRUCT` |
|
||||
| `MAP` | [Map](../sql-reference/data-types/map.md) | `MAP` |
|
||||
|
||||
Массивы могут быть вложенными и иметь в качестве аргумента значение типа `Nullable`.
|
||||
Массивы могут быть вложенными и иметь в качестве аргумента значение типа `Nullable`. Типы `Tuple` и `Map` также могут быть вложенными.
|
||||
|
||||
ClickHouse поддерживает настраиваемую точность для формата `Decimal`. При выполнении запроса `INSERT` ClickHouse обрабатывает тип данных ORC `DECIMAL` как `Decimal128`.
|
||||
|
||||
|
@ -3066,3 +3066,14 @@ SETTINGS index_granularity = 8192 │
|
||||
**Использование**
|
||||
|
||||
Если установлено значение `0`, то табличная функция не делает Nullable столбцы, а вместо NULL выставляет значения по умолчанию для скалярного типа. Это также применимо для значений NULL внутри массивов.
|
||||
|
||||
## output_format_arrow_low_cardinality_as_dictionary {#output-format-arrow-low-cardinality-as-dictionary}
|
||||
|
||||
Позволяет конвертировать тип [LowCardinality](../../sql-reference/data-types/lowcardinality.md) в тип `DICTIONARY` формата [Arrow](../../interfaces/formats.md#data-format-arrow) для запросов `SELECT`.
|
||||
|
||||
Возможные значения:
|
||||
|
||||
- 0 — тип `LowCardinality` не конвертируется в тип `DICTIONARY`.
|
||||
- 1 — тип `LowCardinality` конвертируется в тип `DICTIONARY`.
|
||||
|
||||
Значение по умолчанию: `0`.
|
||||
|
@ -15,7 +15,7 @@ LowCardinality(data_type)
|
||||
|
||||
**Параметры**
|
||||
|
||||
- `data_type` — [String](string.md), [FixedString](fixedstring.md), [Date](date.md), [DateTime](datetime.md) и числа за исключением типа [Decimal](decimal.md). `LowCardinality` неэффективен для некоторых типов данных, см. описание настройки [allow_suspicious_low_cardinality_types](../../operations/settings/settings.md#allow_suspicious_low_cardinality_types).
|
||||
- `data_type` — [String](string.md), [FixedString](fixedstring.md), [Date](date.md), [DateTime](datetime.md) и числа за исключением типа [Decimal](decimal.md). `LowCardinality` неэффективен для некоторых типов данных, см. описание настройки [allow_suspicious_low_cardinality_types](../../operations/settings/settings.md#allow_suspicious_low_cardinality_types).
|
||||
|
||||
## Описание {#lowcardinality-dscr}
|
||||
|
||||
@ -23,11 +23,11 @@ LowCardinality(data_type)
|
||||
|
||||
Эффективность использования типа данных `LowCarditality` зависит от разнообразия данных. Если словарь содержит менее 10 000 различных значений, ClickHouse в основном показывает более высокую эффективность чтения и хранения данных. Если же словарь содержит более 100 000 различных значений, ClickHouse может работать хуже, чем при использовании обычных типов данных.
|
||||
|
||||
При работе со строками, использование `LowCardinality` вместо [Enum](enum.md) обеспечивает большую гибкость в использовании и часто показывает такую же или более высокую эффективность.
|
||||
При работе со строками использование `LowCardinality` вместо [Enum](enum.md) обеспечивает большую гибкость в использовании и часто показывает такую же или более высокую эффективность.
|
||||
|
||||
## Пример
|
||||
|
||||
Создать таблицу со столбцами типа `LowCardinality`:
|
||||
Создание таблицы со столбцами типа `LowCardinality`:
|
||||
|
||||
```sql
|
||||
CREATE TABLE lc_t
|
||||
@ -43,18 +43,18 @@ ORDER BY id
|
||||
|
||||
Настройки:
|
||||
|
||||
- [low_cardinality_max_dictionary_size](../../operations/settings/settings.md#low_cardinality_max_dictionary_size)
|
||||
- [low_cardinality_use_single_dictionary_for_part](../../operations/settings/settings.md#low_cardinality_use_single_dictionary_for_part)
|
||||
- [low_cardinality_allow_in_native_format](../../operations/settings/settings.md#low_cardinality_allow_in_native_format)
|
||||
- [allow_suspicious_low_cardinality_types](../../operations/settings/settings.md#allow_suspicious_low_cardinality_types)
|
||||
- [low_cardinality_max_dictionary_size](../../operations/settings/settings.md#low_cardinality_max_dictionary_size)
|
||||
- [low_cardinality_use_single_dictionary_for_part](../../operations/settings/settings.md#low_cardinality_use_single_dictionary_for_part)
|
||||
- [low_cardinality_allow_in_native_format](../../operations/settings/settings.md#low_cardinality_allow_in_native_format)
|
||||
- [allow_suspicious_low_cardinality_types](../../operations/settings/settings.md#allow_suspicious_low_cardinality_types)
|
||||
- [output_format_arrow_low_cardinality_as_dictionary](../../operations/settings/settings.md#output-format-arrow-low-cardinality-as-dictionary)
|
||||
|
||||
Функции:
|
||||
|
||||
- [toLowCardinality](../functions/type-conversion-functions.md#tolowcardinality)
|
||||
- [toLowCardinality](../functions/type-conversion-functions.md#tolowcardinality)
|
||||
|
||||
## Смотрите также
|
||||
|
||||
- [A Magical Mystery Tour of the LowCardinality Data Type](https://www.altinity.com/blog/2019/3/27/low-cardinality).
|
||||
- [Reducing Clickhouse Storage Cost with the Low Cardinality Type – Lessons from an Instana Engineer](https://www.instana.com/blog/reducing-clickhouse-storage-cost-with-the-low-cardinality-type-lessons-from-an-instana-engineer/).
|
||||
- [String Optimization (video presentation in Russian)](https://youtu.be/rqf-ILRgBdY?list=PL0Z2YDlm0b3iwXCpEFiOOYmwXzVmjJfEt). [Slides in English](https://github.com/yandex/clickhouse-presentations/raw/master/meetup19/string_optimization.pdf).
|
||||
|
||||
- [A Magical Mystery Tour of the LowCardinality Data Type](https://www.altinity.com/blog/2019/3/27/low-cardinality).
|
||||
- [Reducing Clickhouse Storage Cost with the Low Cardinality Type – Lessons from an Instana Engineer](https://www.instana.com/blog/reducing-clickhouse-storage-cost-with-the-low-cardinality-type-lessons-from-an-instana-engineer/).
|
||||
- [String Optimization (video presentation in Russian)](https://youtu.be/rqf-ILRgBdY?list=PL0Z2YDlm0b3iwXCpEFiOOYmwXzVmjJfEt). [Slides in English](https://github.com/yandex/clickhouse-presentations/raw/master/meetup19/string_optimization.pdf).
|
||||
|
Loading…
Reference in New Issue
Block a user