CLICKHOUSEDOCS-561: Repaired master

This commit is contained in:
Sergei Shtykov 2020-02-26 10:59:07 +03:00
parent 93c5062782
commit 11fb904719
7 changed files with 295 additions and 3 deletions

View File

@ -28,6 +28,11 @@ Format | INSERT | SELECT
[PrettyCompactMonoBlock](formats.md#prettycompactmonoblock) | ✗ | ✔ |
[PrettyNoEscapes](formats.md#prettynoescapes) | ✗ | ✔ |
[PrettySpace](formats.md#prettyspace) | ✗ | ✔ |
[Protobuf](#protobuf) | ✔ | ✔ |
[Avro](#data-format-avro) | ✔ | ✔ |
[AvroConfluent](#data-format-avro-confluent) | ✔ | ✗ |
[Parquet](#data-format-parquet) | ✔ | ✔ |
[ORC](#data-format-orc) | ✔ | ✗ |
[RowBinary](formats.md#rowbinary) | ✔ | ✔ |
[Native](formats.md#native) | ✔ | ✔ |
[Null](formats.md#null) | ✗ | ✔ |
@ -750,4 +755,273 @@ struct Message {
</div>
## Protobuf {#protobuf}
Protobuf is a [Protocol Buffers](https://developers.google.com/protocol-buffers/) format.
This format requires an external format schema. The schema is cached between queries.
ClickHouse supports both `proto2` and `proto3` syntaxes. Repeated/optional/required fields are supported.
Usage examples:
```sql
SELECT * FROM test.table FORMAT Protobuf SETTINGS format_schema = 'schemafile:MessageType'
```
```bash
cat protobuf_messages.bin | clickhouse-client --query "INSERT INTO test.table FORMAT Protobuf SETTINGS format_schema='schemafile:MessageType'"
```
where the file `schemafile.proto` looks like this:
```protobuf
syntax = "proto3";
message MessageType {
string name = 1;
string surname = 2;
uint32 birthDate = 3;
repeated string phoneNumbers = 4;
};
```
To find the correspondence between table columns and fields of the Protocol Buffers message type, ClickHouse compares their names.
This comparison is case-insensitive, and the characters `_` (underscore) and `.` (dot) are considered equal.
If the types of a column and a field of the Protocol Buffers message differ, the necessary conversion is applied.
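For instance, given the `MessageType` schema above, a matching table might be declared as follows. This is a minimal sketch: the engine and sorting key are placeholders, and `birthDate` is typed as `Date` here only to illustrate that a conversion from the protobuf `uint32` field would be applied.
```bash
clickhouse-client --query "
CREATE TABLE test.table
(
    name         String,
    surname      String,
    birthDate    Date,
    phoneNumbers Array(String)
) ENGINE = MergeTree() ORDER BY (name, surname)
"
```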
Nested messages are supported. For example, for the field `z` in the following message type
```protobuf
message MessageType {
  message XType {
    message YType {
      int32 z = 1;
    };
    repeated YType y = 1;
  };
  XType x = 1;
};
```
ClickHouse tries to find a column named `x.y.z` (or `x_y_z` or `X.y_Z` and so on).
Nested messages are suitable for input or output of [nested data structures](../data_types/nested_data_structures/nested.md).
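As a sketch, the repeated nested message above could be read into a table declared like this (hypothetical table name; the column is an `Array` because `y` is a repeated field):
```bash
clickhouse-client --query "
CREATE TABLE test.nested_example
(
    \`x.y.z\` Array(Int32)
) ENGINE = Memory
"
```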
Default values defined in a protobuf schema like this
```protobuf
syntax = "proto2";
message MessageType {
optional int32 result_per_page = 3 [default = 10];
}
```
are not applied; the [table defaults](../query_language/create.md#create-default-values) are used instead, as the sketch below shows.
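Here, a hypothetical table declares its own default, so rows decoded from messages that omit `result_per_page` receive `50`, not the `10` from the protobuf schema:
```bash
clickhouse-client --query "
CREATE TABLE test.paging
(
    result_per_page Int32 DEFAULT 50
) ENGINE = Memory
"
```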
ClickHouse inputs and outputs protobuf messages in the `length-delimited` format.
This means that before every message, its length must be written as a [varint](https://developers.google.com/protocol-buffers/docs/encoding#varints).
See also [how to read/write length-delimited protobuf messages in popular languages](https://cwiki.apache.org/confluence/display/GEODE/Delimiting+Protobuf+Messages).
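A rough sketch of producing such a stream in the shell, assuming `protoc` and GNU `stat` are available and the message is shorter than 128 bytes (so its length fits in a single-byte varint):
```bash
# Encode one text-format message to binary with the schema above.
echo 'name: "Ivan" surname: "Petrov"' \
    | protoc --encode=MessageType schemafile.proto > msg.bin

# Prepend the message length as a varint; one raw byte is a valid
# varint for any length below 128.
size=$(stat -c%s msg.bin)
{ printf "\\$(printf '%03o' "$size")"; cat msg.bin; } > delimited.bin

cat delimited.bin | clickhouse-client --query \
    "INSERT INTO test.table FORMAT Protobuf SETTINGS format_schema='schemafile:MessageType'"
```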
## Avro {#data-format-avro}
[Apache Avro](http://avro.apache.org/) is a row-oriented data serialization framework developed within Apache's Hadoop project.
ClickHouse Avro format supports reading and writing [Avro data files](http://avro.apache.org/docs/current/spec.html#Object+Container+Files).
### Data Types Matching
The table below shows supported data types and how they match ClickHouse [data types](../data_types/index.md) in `INSERT` and `SELECT` queries.
| Avro data type `INSERT` | ClickHouse data type | Avro data type `SELECT` |
| -------------------- | -------------------- | ------------------ |
| `boolean`, `int`, `long`, `float`, `double` | [Int(8\|16\|32\)](../data_types/int_uint.md), [UInt(8\|16\|32)](../data_types/int_uint.md) | `int` |
| `boolean`, `int`, `long`, `float`, `double` | [Int64](../data_types/int_uint.md), [UInt64](../data_types/int_uint.md) | `long` |
| `boolean`, `int`, `long`, `float`, `double` | [Float32](../data_types/float.md) | `float` |
| `boolean`, `int`, `long`, `float`, `double` | [Float64](../data_types/float.md) | `double` |
| `bytes`, `string`, `fixed`, `enum` | [String](../data_types/string.md) | `bytes` |
| `bytes`, `string`, `fixed` | [FixedString(N)](../data_types/fixedstring.md) | `fixed(N)` |
| `enum` | [Enum(8\|16)](../data_types/enum.md) | `enum` |
| `array(T)` | [Array(T)](../data_types/array.md) | `array(T)` |
| `union(null, T)`, `union(T, null)` | [Nullable(T)](../data_types/nullable.md) | `union(null, T)`|
| `null` | [Nullable(Nothing)](../data_types/special_data_types/nothing.md) | `null` |
| `int (date)` * | [Date](../data_types/date.md) | `int (date)` * |
| `long (timestamp-millis)` * | [DateTime64(3)](../data_types/datetime.md) | `long (timestamp-millis)` * |
| `long (timestamp-micros)` * | [DateTime64(6)](../data_types/datetime.md) | `long (timestamp-micros)` * |
\* [Avro logical types](http://avro.apache.org/docs/current/spec.html#Logical+Types)
Unsupported Avro data types: `record` (non-root), `map`
Unsupported Avro logical data types: `uuid`, `time-millis`, `time-micros`, `duration`
### Inserting Data
To insert data from an Avro file into a ClickHouse table:
```bash
$ cat file.avro | clickhouse-client --query="INSERT INTO {some_table} FORMAT Avro"
```
The root schema of the input Avro file must be of the `record` type.
To find the correspondence between table columns and the fields of the Avro schema, ClickHouse compares their names. This comparison is case-sensitive.
Unused fields are skipped.
Data types of ClickHouse table columns can differ from those of the corresponding fields of the inserted Avro data. When inserting data, ClickHouse interprets data types according to the table above and then [casts](../query_language/functions/type_conversion_functions.md#type_conversion_function-cast) the data to the corresponding column type.
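For example (a sketch; both the file and the table are hypothetical), an Avro `long` field can be inserted into a `UInt32` column: ClickHouse reads the value as `long` according to the table above, then casts it to `UInt32`:
```bash
clickhouse-client --query "CREATE TABLE test.avro_ids (id UInt32) ENGINE = Memory"
cat ids.avro | clickhouse-client --query "INSERT INTO test.avro_ids FORMAT Avro"
```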
### Selecting Data
To select data from a ClickHouse table into an Avro file:
```bash
$ clickhouse-client --query="SELECT * FROM {some_table} FORMAT Avro" > file.avro
```
Column names must:
- start with `[A-Za-z_]`
- subsequently contain only `[A-Za-z0-9_]`
Output Avro file compression and sync interval can be configured with [output_format_avro_codec](../operations/settings/settings.md#settings-output_format_avro_codec) and [output_format_avro_sync_interval](../operations/settings/settings.md#settings-output_format_avro_sync_interval) respectively.
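For example, a sketch that writes snappy-compressed Avro with a 32 KiB sync interval; both settings can also be passed as `clickhouse-client` options, and the values here are illustrative:
```bash
clickhouse-client \
    --output_format_avro_codec snappy \
    --output_format_avro_sync_interval 32768 \
    --query "SELECT * FROM {some_table} FORMAT Avro" > file.avro
```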
## AvroConfluent {#data-format-avro-confluent}
AvroConfluent supports decoding single-object Avro messages commonly used with [Kafka](https://kafka.apache.org/) and [Confluent Schema Registry](https://docs.confluent.io/current/schema-registry/index.html).
Each Avro message embeds a schema id that can be resolved to the actual schema with the help of the Schema Registry.
Schemas are cached once resolved.
The Schema Registry URL is configured with [format_avro_schema_registry_url](../operations/settings/settings.md#settings-format_avro_schema_registry_url).
### Data Types Matching
Same as [Avro](#data-format-avro).
### Usage
To quickly verify schema resolution you can use [kafkacat](https://github.com/edenhill/kafkacat) with [clickhouse-local](../operations/utils/clickhouse-local.md):
```bash
$ kafkacat -b kafka-broker -C -t topic1 -o beginning -f '%s' -c 3 | clickhouse-local --input-format AvroConfluent --format_avro_schema_registry_url 'http://schema-registry' -S "field1 Int64, field2 String" -q 'select * from table'
1 a
2 b
3 c
```
To use `AvroConfluent` with [Kafka](../operations/table_engines/kafka.md):
```sql
CREATE TABLE topic1_stream
(
field1 String,
field2 String
)
ENGINE = Kafka()
SETTINGS
kafka_broker_list = 'kafka-broker',
kafka_topic_list = 'topic1',
kafka_group_name = 'group1',
kafka_format = 'AvroConfluent';
SET format_avro_schema_registry_url = 'http://schema-registry';
SELECT * FROM topic1_stream;
```
!!! note "Warning"
    Setting `format_avro_schema_registry_url` needs to be configured in `users.xml` to maintain its value after a restart.
## Parquet {#data-format-parquet}
[Apache Parquet](http://parquet.apache.org/) is a columnar storage format widespread in the Hadoop ecosystem. ClickHouse supports read and write operations for this format.
### Data Types Matching
The table below shows supported data types and how they match ClickHouse [data types](../data_types/index.md) in `INSERT` and `SELECT` queries.
| Parquet data type (`INSERT`) | ClickHouse data type | Parquet data type (`SELECT`) |
| -------------------- | ------------------ | ---- |
| `UINT8`, `BOOL` | [UInt8](../data_types/int_uint.md) | `UINT8` |
| `INT8` | [Int8](../data_types/int_uint.md) | `INT8` |
| `UINT16` | [UInt16](../data_types/int_uint.md) | `UINT16` |
| `INT16` | [Int16](../data_types/int_uint.md) | `INT16` |
| `UINT32` | [UInt32](../data_types/int_uint.md) | `UINT32` |
| `INT32` | [Int32](../data_types/int_uint.md) | `INT32` |
| `UINT64` | [UInt64](../data_types/int_uint.md) | `UINT64` |
| `INT64` | [Int64](../data_types/int_uint.md) | `INT64` |
| `FLOAT`, `HALF_FLOAT` | [Float32](../data_types/float.md) | `FLOAT` |
| `DOUBLE` | [Float64](../data_types/float.md) | `DOUBLE` |
| `DATE32` | [Date](../data_types/date.md) | `UINT16` |
| `DATE64`, `TIMESTAMP` | [DateTime](../data_types/datetime.md) | `UINT32` |
| `STRING`, `BINARY` | [String](../data_types/string.md) | `STRING` |
| — | [FixedString](../data_types/fixedstring.md) | `STRING` |
| `DECIMAL` | [Decimal](../data_types/decimal.md) | `DECIMAL` |
ClickHouse supports configurable precision of the `Decimal` type. The `INSERT` query treats the Parquet `DECIMAL` type as the ClickHouse `Decimal128` type.
Unsupported Parquet data types: `TIME32`, `FIXED_SIZE_BINARY`, `JSON`, `UUID`, `ENUM`.
Data types of ClickHouse table columns can differ from those of the corresponding fields of the inserted Parquet data. When inserting data, ClickHouse interprets data types according to the table above and then [casts](../query_language/functions/type_conversion_functions.md#type_conversion_function-cast) the data to the data type set for the ClickHouse table column.
### Inserting and Selecting Data
You can insert Parquet data from a file into a ClickHouse table with the following command:
```bash
$ cat {filename} | clickhouse-client --query="INSERT INTO {some_table} FORMAT Parquet"
```
You can select data from a ClickHouse table and save it to a file in the Parquet format with the following command:
```bash
$ clickhouse-client --query="SELECT * FROM {some_table} FORMAT Parquet" > {some_file.pq}
```
To exchange data with Hadoop, you can use the [HDFS table engine](../operations/table_engines/hdfs.md), sketched below.
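For instance, a sketch of a Parquet-backed HDFS table; the URI and columns are placeholders:
```bash
clickhouse-client --query "
CREATE TABLE hdfs_parquet (id UInt32, name String)
ENGINE = HDFS('hdfs://namenode:8020/clickhouse/file.parquet', 'Parquet')
"
```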
## ORC {#data-format-orc}
[Apache ORC](https://orc.apache.org/) is a columnar storage format widespread in the Hadoop ecosystem. You can only insert data in this format into ClickHouse.
### Data Types Matching
The table below shows supported data types and how they match ClickHouse [data types](../data_types/index.md) in `INSERT` queries.
| ORC data type (`INSERT`) | ClickHouse data type |
| -------------------- | ------------------ |
| `UINT8`, `BOOL` | [UInt8](../data_types/int_uint.md) |
| `INT8` | [Int8](../data_types/int_uint.md) |
| `UINT16` | [UInt16](../data_types/int_uint.md) |
| `INT16` | [Int16](../data_types/int_uint.md) |
| `UINT32` | [UInt32](../data_types/int_uint.md) |
| `INT32` | [Int32](../data_types/int_uint.md) |
| `UINT64` | [UInt64](../data_types/int_uint.md) |
| `INT64` | [Int64](../data_types/int_uint.md) |
| `FLOAT`, `HALF_FLOAT` | [Float32](../data_types/float.md) |
| `DOUBLE` | [Float64](../data_types/float.md) |
| `DATE32` | [Date](../data_types/date.md) |
| `DATE64`, `TIMESTAMP` | [DateTime](../data_types/datetime.md) |
| `STRING`, `BINARY` | [String](../data_types/string.md) |
| `DECIMAL` | [Decimal](../data_types/decimal.md) |
ClickHouse supports configurable precision of the `Decimal` type. The `INSERT` query treats the ORC `DECIMAL` type as the ClickHouse `Decimal128` type.
Unsupported ORC data types: `TIME32`, `FIXED_SIZE_BINARY`, `JSON`, `UUID`, `ENUM`.
The data types of ClickHouse table columns don't have to match the corresponding ORC data fields. When inserting data, ClickHouse interprets data types according to the table above and then [casts](../query_language/functions/type_conversion_functions/#type_conversion_function-cast) the data to the data type set for the ClickHouse table column.
### Inserting Data
You can insert ORC data from a file into a ClickHouse table with the following command:
```bash
$ cat filename.orc | clickhouse-client --query="INSERT INTO some_table FORMAT ORC"
```
To exchange data with Hadoop, you can use the [HDFS table engine](../operations/table_engines/hdfs.md).
[Original article](https://clickhouse.tech/docs/fa/interfaces/formats/) <!--hide-->

View File

@ -26,6 +26,7 @@ nav:
- ' کلاینت Command-line': 'interfaces/cli.md'
- 'Native interface (TCP)': 'interfaces/tcp.md'
- 'HTTP interface': 'interfaces/http.md'
- 'MySQL Interface': 'interfaces/mysql.md'
- ' فرمت های Input و Output': 'interfaces/formats.md'
- ' درایور JDBC': 'interfaces/jdbc.md'
- ' درایور ODBC': 'interfaces/odbc.md'
@ -183,6 +184,10 @@ nav:
- 'Operators': 'query_language/operators.md'
- 'General Syntax': 'query_language/syntax.md'
- 'Guides':
- 'Overview': 'guides/index.md'
- 'Applying CatBoost Models': 'guides/apply_catboost_model.md'
- 'Operations':
- 'Introduction': 'operations/index.md'
- 'Requirements': 'operations/requirements.md'
@ -225,6 +230,8 @@ nav:
- 'Browse ClickHouse Source Code': 'development/browse_code.md'
- 'How to Build ClickHouse on Linux': 'development/build.md'
- 'How to Build ClickHouse on Mac OS X': 'development/build_osx.md'
- 'How to Build ClickHouse on Linux for Mac OS X': 'development/build_cross_osx.md'
- 'How to Build ClickHouse on Linux for AARCH64 (ARM64)': 'development/build_cross_arm.md'
- 'How to Write C++ code': 'development/style.md'
- 'How to Run ClickHouse Tests': 'development/tests.md'
- 'Third-Party Libraries Used': 'development/contrib.md'

View File

@ -26,6 +26,7 @@ nav:
- 'Command-Line Client': 'interfaces/cli.md'
- 'Native Interface (TCP)': 'interfaces/tcp.md'
- 'HTTP Interface': 'interfaces/http.md'
- 'MySQL Interface': 'interfaces/mysql.md'
- 'Input and Output Formats': 'interfaces/formats.md'
- 'JDBC Driver': 'interfaces/jdbc.md'
- 'ODBC Driver': 'interfaces/odbc.md'

View File

@ -41,6 +41,7 @@ nav:
- 'Движки баз данных':
- 'Введение': 'database_engines/index.md'
- 'MySQL': 'database_engines/mysql.md'
- 'Lazy': 'database_engines/lazy.md'
- 'Движки таблиц':
- 'Введение': 'operations/table_engines/index.md'
@ -218,6 +219,7 @@ nav:
- 'Введение': 'operations/utils/index.md'
- 'clickhouse-copier': 'operations/utils/clickhouse-copier.md'
- 'clickhouse-local': 'operations/utils/clickhouse-local.md'
- 'clickhouse-benchmark': 'operations/utils/clickhouse-benchmark.md'
- 'Разработка':
- 'hidden': 'development/index.md'
@ -227,6 +229,7 @@ nav:
- 'Как собрать ClickHouse на Linux': 'development/build.md'
- 'Как собрать ClickHouse на Mac OS X': 'development/build_osx.md'
- 'Как собрать ClickHouse на Linux для Mac OS X': 'development/build_cross_osx.md'
- 'Как собрать ClickHouse на Linux для AARCH64 (ARM64)': 'development/build_cross_arm.md'
- 'Как писать код на C++': 'development/style.md'
- 'Как запустить тесты': 'development/tests.md'
- 'Сторонние библиотеки': 'development/contrib.md'

View File

@ -26,6 +26,7 @@ nav:
- '命令行客户端接口': 'interfaces/cli.md'
- '原生客户端接口 (TCP)': 'interfaces/tcp.md'
- 'HTTP 客户端接口': 'interfaces/http.md'
- 'MySQL 客户端接口': 'interfaces/mysql.md'
- '输入输出格式': 'interfaces/formats.md'
- 'JDBC 驱动': 'interfaces/jdbc.md'
- 'ODBC 驱动': 'interfaces/odbc.md'
@ -69,6 +70,7 @@ nav:
- '数据库引擎':
- '介绍': 'database_engines/index.md'
- 'MySQL': 'database_engines/mysql.md'
- 'Lazy': 'database_engines/lazy.md'
- '表引擎':
- '介绍': 'operations/table_engines/index.md'
@ -182,6 +184,10 @@ nav:
- '操作符': 'query_language/operators.md'
- '语法说明': 'query_language/syntax.md'
- 'Guides':
- 'Overview': 'guides/index.md'
- 'Applying CatBoost Models': 'guides/apply_catboost_model.md'
- '运维':
- '介绍': 'operations/index.md'
- '环境要求': 'operations/requirements.md'
@ -225,6 +231,7 @@ nav:
- '如何在Linux中编译ClickHouse': 'development/build.md'
- '如何在Mac OS X中编译ClickHouse': 'development/build_osx.md'
- '如何在Linux中编译Mac OS X ClickHouse': 'development/build_cross_osx.md'
- '如何在Linux中编译AARCH64 (ARM64) ClickHouse': 'development/build_cross_arm.md'
- '如何编写C++代码': 'development/style.md'
- '如何运行ClickHouse测试': 'development/tests.md'
- '使用的第三方库': 'development/contrib.md'

View File

@ -29,7 +29,7 @@ ClickHouse 可以接受多种数据格式,可以在 (`INSERT`) 以及 (`SELECT
| [PrettySpace](#prettyspace) | ✗ | ✔ |
| [Protobuf](#protobuf) | ✔ | ✔ |
| [Avro](#data-format-avro) | ✔ | ✔ |
| AvroConfluent | ✔ | ✗ |
| [AvroConfluent](#data-format-avro-confluent) | ✔ | ✗ |
| [Parquet](#data-format-parquet) | ✔ | ✔ |
| [ORC](#data-format-orc) | ✔ | ✗ |
| [RowBinary](#rowbinary) | ✔ | ✔ |
@ -914,7 +914,7 @@ Column names must:
Output Avro file compression and sync interval can be configured with [output_format_avro_codec](../operations/settings/settings.md#settings-output_format_avro_codec) and [output_format_avro_sync_interval](../operations/settings/settings.md#settings-output_format_avro_sync_interval) respectively.
## AvroConfluent
## AvroConfluent {#data-format-avro-confluent}
AvroConfluent supports decoding single-object Avro messages commonly used with [Kafka](https://kafka.apache.org/) and [Confluent Schema Registry](https://docs.confluent.io/current/schema-registry/index.html).

View File

@ -652,7 +652,7 @@ SELECT replicate(1, ['a', 'b', 'c'])
使用指定的连接键从Join类型引擎的表中获取数据。
## modelEvaluate(model_name, ...)
## modelEvaluate(model_name, ...) {#function-modelevaluate}
使用外部模型计算。
接受模型的名称以及模型的参数。返回Float64类型的值。