diff --git a/docs/fa/interfaces/formats.md b/docs/fa/interfaces/formats.md
index 2253ab3d3d7..5cea6162263 100644
--- a/docs/fa/interfaces/formats.md
+++ b/docs/fa/interfaces/formats.md
@@ -28,6 +28,11 @@ Format | INSERT | SELECT
 [PrettyCompactMonoBlock](formats.md#prettycompactmonoblock) | ✗ | ✔ |
 [PrettyNoEscapes](formats.md#prettynoescapes) | ✗ | ✔ |
 [PrettySpace](formats.md#prettyspace) | ✗ | ✔ |
+[Protobuf](#protobuf) | ✔ | ✔ |
+[Avro](#data-format-avro) | ✔ | ✔ |
+[AvroConfluent](#data-format-avro-confluent) | ✔ | ✗ |
+[Parquet](#data-format-parquet) | ✔ | ✔ |
+[ORC](#data-format-orc) | ✔ | ✗ |
 [RowBinary](formats.md#rowbinary) | ✔ | ✔ |
 [Native](formats.md#native) | ✔ | ✔ |
 [Null](formats.md#null) | ✗ | ✔ |
@@ -750,4 +755,273 @@ struct Message {
+
+## Protobuf {#protobuf}
+
+Protobuf is the [Protocol Buffers](https://developers.google.com/protocol-buffers/) format.
+
+This format requires an external format schema. The schema is cached between queries.
+ClickHouse supports both the `proto2` and `proto3` syntaxes. Repeated, optional, and required fields are supported.
+
+Usage examples:
+
+```sql
+SELECT * FROM test.table FORMAT Protobuf SETTINGS format_schema = 'schemafile:MessageType'
+```
+
+```bash
+cat protobuf_messages.bin | clickhouse-client --query "INSERT INTO test.table FORMAT Protobuf SETTINGS format_schema='schemafile:MessageType'"
+```
+
+where the file `schemafile.proto` looks like this:
+
+```capnp
+syntax = "proto3";
+
+message MessageType {
+  string name = 1;
+  string surname = 2;
+  uint32 birthDate = 3;
+  repeated string phoneNumbers = 4;
+};
+```
+
+To find the correspondence between table columns and fields of a Protocol Buffers message type, ClickHouse compares their names.
+This comparison is case-insensitive, and the characters `_` (underscore) and `.` (dot) are considered equal.
+If the types of a column and a field of the Protocol Buffers message differ, the necessary conversion is applied.
+
+Nested messages are supported. For example, for the field `z` in the following message type
+
+```capnp
+message MessageType {
+  message XType {
+    message YType {
+      int32 z = 1;
+    };
+    repeated YType y = 1;
+  };
+  XType x = 1;
+};
+```
+
+ClickHouse tries to find a column named `x.y.z` (or `x_y_z` or `X.y_Z` and so on).
+Nested messages are suitable for input or output of [nested data structures](../data_types/nested_data_structures/nested.md).
+
+Default values defined in a protobuf schema like this
+
+```capnp
+syntax = "proto2";
+
+message MessageType {
+  optional int32 result_per_page = 3 [default = 10];
+}
+```
+
+are not applied; the [table defaults](../query_language/create.md#create-default-values) are used instead.
+
+ClickHouse inputs and outputs protobuf messages in the `length-delimited` format.
+This means that every message is preceded by its length, written as a [varint](https://developers.google.com/protocol-buffers/docs/encoding#varints).
+See also [how to read/write length-delimited protobuf messages in popular languages](https://cwiki.apache.org/confluence/display/GEODE/Delimiting+Protobuf+Messages).
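+
+For example, a table can be exported in the same length-delimited form from the command line. A minimal sketch, reusing the table and schema file from the examples above:
+
+```bash
+clickhouse-client --query "SELECT * FROM test.table FORMAT Protobuf SETTINGS format_schema = 'schemafile:MessageType'" > protobuf_messages.bin
+```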
+
+## Avro {#data-format-avro}
+
+[Apache Avro](http://avro.apache.org/) is a row-oriented data serialization framework developed within Apache's Hadoop project.
+
+The ClickHouse Avro format supports reading and writing [Avro data files](http://avro.apache.org/docs/current/spec.html#Object+Container+Files).
+
+### Data Types Matching
+
+The table below shows supported data types and how they match ClickHouse [data types](../data_types/index.md) in `INSERT` and `SELECT` queries.
+
+| Avro data type (`INSERT`) | ClickHouse data type | Avro data type (`SELECT`) |
+| -------------------- | -------------------- | ------------------ |
+| `boolean`, `int`, `long`, `float`, `double` | [Int(8\|16\|32)](../data_types/int_uint.md), [UInt(8\|16\|32)](../data_types/int_uint.md) | `int` |
+| `boolean`, `int`, `long`, `float`, `double` | [Int64](../data_types/int_uint.md), [UInt64](../data_types/int_uint.md) | `long` |
+| `boolean`, `int`, `long`, `float`, `double` | [Float32](../data_types/float.md) | `float` |
+| `boolean`, `int`, `long`, `float`, `double` | [Float64](../data_types/float.md) | `double` |
+| `bytes`, `string`, `fixed`, `enum` | [String](../data_types/string.md) | `bytes` |
+| `bytes`, `string`, `fixed` | [FixedString(N)](../data_types/fixedstring.md) | `fixed(N)` |
+| `enum` | [Enum(8\|16)](../data_types/enum.md) | `enum` |
+| `array(T)` | [Array(T)](../data_types/array.md) | `array(T)` |
+| `union(null, T)`, `union(T, null)` | [Nullable(T)](../data_types/nullable.md) | `union(null, T)` |
+| `null` | [Nullable(Nothing)](../data_types/special_data_types/nothing.md) | `null` |
+| `int (date)` * | [Date](../data_types/date.md) | `int (date)` * |
+| `long (timestamp-millis)` * | [DateTime64(3)](../data_types/datetime.md) | `long (timestamp-millis)` * |
+| `long (timestamp-micros)` * | [DateTime64(6)](../data_types/datetime.md) | `long (timestamp-micros)` * |
+
+\* [Avro logical types](http://avro.apache.org/docs/current/spec.html#Logical+Types)
+
+Unsupported Avro data types: `record` (non-root), `map`.
+
+Unsupported Avro logical data types: `uuid`, `time-millis`, `time-micros`, `duration`.
+
+### Inserting Data
+
+To insert data from an Avro file into a ClickHouse table:
+
+```bash
+$ cat file.avro | clickhouse-client --query="INSERT INTO {some_table} FORMAT Avro"
+```
+
+The root schema of the input Avro file must be of type `record`.
+
+To find the correspondence between table columns and fields of the Avro schema, ClickHouse compares their names. This comparison is case-sensitive.
+Unused fields are skipped.
+
+Data types of ClickHouse table columns can differ from the corresponding fields of the Avro data inserted. When inserting data, ClickHouse interprets data types according to the table above and then [casts](../query_language/functions/type_conversion_functions.md#type_conversion_function-cast) the data to the corresponding column type.
+
+### Selecting Data
+
+To select data from a ClickHouse table into an Avro file:
+
+```bash
+$ clickhouse-client --query="SELECT * FROM {some_table} FORMAT Avro" > file.avro
+```
+
+Column names must:
+
+- start with `[A-Za-z_]`
+- subsequently contain only `[A-Za-z0-9_]`
+
+Output Avro file compression and sync interval can be configured with [output_format_avro_codec](../operations/settings/settings.md#settings-output_format_avro_codec) and [output_format_avro_sync_interval](../operations/settings/settings.md#settings-output_format_avro_sync_interval) respectively.
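+
+As a quick sanity check, data can be round-tripped through the Avro format. A sketch, assuming `some_table_copy` is a hypothetical table with the same structure as `some_table`:
+
+```bash
+$ clickhouse-client --query="SELECT * FROM some_table FORMAT Avro" > data.avro
+$ cat data.avro | clickhouse-client --query="INSERT INTO some_table_copy FORMAT Avro"
+```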
+
+## AvroConfluent {#data-format-avro-confluent}
+
+AvroConfluent supports decoding single-object Avro messages commonly used with [Kafka](https://kafka.apache.org/) and the [Confluent Schema Registry](https://docs.confluent.io/current/schema-registry/index.html).
+
+Each Avro message embeds a schema ID that can be resolved to the actual schema with the help of the Schema Registry.
+
+Schemas are cached once resolved.
+
+The Schema Registry URL is configured with [format_avro_schema_registry_url](../operations/settings/settings.md#settings-format_avro_schema_registry_url).
+
+### Data Types Matching
+
+Same as [Avro](#data-format-avro).
+
+### Usage
+
+To quickly verify schema resolution, you can use [kafkacat](https://github.com/edenhill/kafkacat) with [clickhouse-local](../operations/utils/clickhouse-local.md):
+
+```bash
+$ kafkacat -b kafka-broker -C -t topic1 -o beginning -f '%s' -c 3 | clickhouse-local --input-format AvroConfluent --format_avro_schema_registry_url 'http://schema-registry' -S "field1 Int64, field2 String" -q 'select * from table'
+1 a
+2 b
+3 c
+```
+
+To use `AvroConfluent` with [Kafka](../operations/table_engines/kafka.md):
+
+```sql
+CREATE TABLE topic1_stream
+(
+    field1 String,
+    field2 String
+)
+ENGINE = Kafka()
+SETTINGS
+kafka_broker_list = 'kafka-broker',
+kafka_topic_list = 'topic1',
+kafka_group_name = 'group1',
+kafka_format = 'AvroConfluent';
+
+SET format_avro_schema_registry_url = 'http://schema-registry';
+
+SELECT * FROM topic1_stream;
+```
+
+!!! note "Warning"
+    Setting `format_avro_schema_registry_url` needs to be configured in `users.xml` to maintain its value after a restart.
+
+## Parquet {#data-format-parquet}
+
+[Apache Parquet](http://parquet.apache.org/) is a columnar storage format widespread in the Hadoop ecosystem. ClickHouse supports read and write operations for this format.
+
+### Data Types Matching
+
+The table below shows supported data types and how they match ClickHouse [data types](../data_types/index.md) in `INSERT` and `SELECT` queries.
+
+| Parquet data type (`INSERT`) | ClickHouse data type | Parquet data type (`SELECT`) |
+| -------------------- | ------------------ | ---- |
+| `UINT8`, `BOOL` | [UInt8](../data_types/int_uint.md) | `UINT8` |
+| `INT8` | [Int8](../data_types/int_uint.md) | `INT8` |
+| `UINT16` | [UInt16](../data_types/int_uint.md) | `UINT16` |
+| `INT16` | [Int16](../data_types/int_uint.md) | `INT16` |
+| `UINT32` | [UInt32](../data_types/int_uint.md) | `UINT32` |
+| `INT32` | [Int32](../data_types/int_uint.md) | `INT32` |
+| `UINT64` | [UInt64](../data_types/int_uint.md) | `UINT64` |
+| `INT64` | [Int64](../data_types/int_uint.md) | `INT64` |
+| `FLOAT`, `HALF_FLOAT` | [Float32](../data_types/float.md) | `FLOAT` |
+| `DOUBLE` | [Float64](../data_types/float.md) | `DOUBLE` |
+| `DATE32` | [Date](../data_types/date.md) | `UINT16` |
+| `DATE64`, `TIMESTAMP` | [DateTime](../data_types/datetime.md) | `UINT32` |
+| `STRING`, `BINARY` | [String](../data_types/string.md) | `STRING` |
+| — | [FixedString](../data_types/fixedstring.md) | `STRING` |
+| `DECIMAL` | [Decimal](../data_types/decimal.md) | `DECIMAL` |
+
+ClickHouse supports configurable precision of the `Decimal` type. The `INSERT` query treats the Parquet `DECIMAL` type as the ClickHouse `Decimal128` type.
+
+Unsupported Parquet data types: `TIME32`, `FIXED_SIZE_BINARY`, `JSON`, `UUID`, `ENUM`.
+
+Data types of ClickHouse table columns can differ from the corresponding fields of the Parquet data inserted. When inserting data, ClickHouse interprets data types according to the table above and then [casts](../query_language/functions/type_conversion_functions.md#type_conversion_function-cast) the data to the data type set for the ClickHouse table column.
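+
+To check how this mapping applies to a particular file before inserting it, you can parse the file locally with [clickhouse-local](../operations/utils/clickhouse-local.md). A sketch, with a hypothetical file name and column list:
+
+```bash
+$ clickhouse-local --input-format Parquet -S "name String, value UInt32" -q "SELECT * FROM table" < data.parquet
+```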
+
+### Inserting and Selecting Data
+
+You can insert Parquet data from a file into a ClickHouse table with the following command:
+
+```bash
+$ cat {filename} | clickhouse-client --query="INSERT INTO {some_table} FORMAT Parquet"
+```
+
+You can select data from a ClickHouse table and save it to a file in the Parquet format with the following command:
+
+```bash
+$ clickhouse-client --query="SELECT * FROM {some_table} FORMAT Parquet" > {some_file.pq}
+```
+
+To exchange data with Hadoop, you can use the [HDFS table engine](../operations/table_engines/hdfs.md).
+
+## ORC {#data-format-orc}
+
+[Apache ORC](https://orc.apache.org/) is a columnar storage format widespread in the Hadoop ecosystem. You can only insert data in this format into ClickHouse.
+
+### Data Types Matching
+
+The table below shows supported data types and how they match ClickHouse [data types](../data_types/index.md) in `INSERT` queries.
+
+| ORC data type (`INSERT`) | ClickHouse data type |
+| -------------------- | ------------------ |
+| `UINT8`, `BOOL` | [UInt8](../data_types/int_uint.md) |
+| `INT8` | [Int8](../data_types/int_uint.md) |
+| `UINT16` | [UInt16](../data_types/int_uint.md) |
+| `INT16` | [Int16](../data_types/int_uint.md) |
+| `UINT32` | [UInt32](../data_types/int_uint.md) |
+| `INT32` | [Int32](../data_types/int_uint.md) |
+| `UINT64` | [UInt64](../data_types/int_uint.md) |
+| `INT64` | [Int64](../data_types/int_uint.md) |
+| `FLOAT`, `HALF_FLOAT` | [Float32](../data_types/float.md) |
+| `DOUBLE` | [Float64](../data_types/float.md) |
+| `DATE32` | [Date](../data_types/date.md) |
+| `DATE64`, `TIMESTAMP` | [DateTime](../data_types/datetime.md) |
+| `STRING`, `BINARY` | [String](../data_types/string.md) |
+| `DECIMAL` | [Decimal](../data_types/decimal.md) |
+
+ClickHouse supports configurable precision of the `Decimal` type. The `INSERT` query treats the ORC `DECIMAL` type as the ClickHouse `Decimal128` type.
+
+Unsupported ORC data types: `TIME32`, `FIXED_SIZE_BINARY`, `JSON`, `UUID`, `ENUM`.
+
+The data types of ClickHouse table columns don't have to match the corresponding ORC data fields. When inserting data, ClickHouse interprets data types according to the table above and then [casts](../query_language/functions/type_conversion_functions.md#type_conversion_function-cast) the data to the data type set for the ClickHouse table column.
+
+### Inserting Data
+
+You can insert ORC data from a file into a ClickHouse table with the following command:
+
+```bash
+$ cat filename.orc | clickhouse-client --query="INSERT INTO some_table FORMAT ORC"
+```
+
+To exchange data with Hadoop, you can use the [HDFS table engine](../operations/table_engines/hdfs.md).
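+
+For example, an ORC file already stored in HDFS can be loaded without an intermediate local copy. A sketch using the `hdfs` table function, with a hypothetical URI and column list:
+
+```bash
+$ clickhouse-client --query="INSERT INTO some_table SELECT * FROM hdfs('hdfs://namenode:9000/path/file.orc', 'ORC', 'name String, value UInt32')"
+```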
+
 [مقاله اصلی](https://clickhouse.tech/docs/fa/interfaces/formats/)
diff --git a/docs/toc_fa.yml b/docs/toc_fa.yml
index 1d8c4a43bd3..5cafefc6e32 100644
--- a/docs/toc_fa.yml
+++ b/docs/toc_fa.yml
@@ -26,6 +26,7 @@ nav:
 - ' کلاینت Command-line': 'interfaces/cli.md'
 - 'Native interface (TCP)': 'interfaces/tcp.md'
 - 'HTTP interface': 'interfaces/http.md'
+ - 'MySQL Interface': 'interfaces/mysql.md'
 - ' فرمت های Input و Output': 'interfaces/formats.md'
 - ' درایور JDBC': 'interfaces/jdbc.md'
 - ' درایور ODBC': 'interfaces/odbc.md'
@@ -183,6 +184,10 @@ nav:
 - 'Operators': 'query_language/operators.md'
 - 'General Syntax': 'query_language/syntax.md'
+- 'Guides':
+  - 'Overview': 'guides/index.md'
+  - 'Applying CatBoost Models': 'guides/apply_catboost_model.md'
+
 - 'Operations':
 - 'Introduction': 'operations/index.md'
 - 'Requirements': 'operations/requirements.md'
@@ -225,6 +230,8 @@ nav:
 - 'Browse ClickHouse Source Code': 'development/browse_code.md'
 - 'How to Build ClickHouse on Linux': 'development/build.md'
 - 'How to Build ClickHouse on Mac OS X': 'development/build_osx.md'
+ - 'How to Build ClickHouse on Linux for Mac OS X': 'development/build_cross_osx.md'
+ - 'How to Build ClickHouse on Linux for AARCH64 (ARM64)': 'development/build_cross_arm.md'
 - 'How to Write C++ code': 'development/style.md'
 - 'How to Run ClickHouse Tests': 'development/tests.md'
 - 'Third-Party Libraries Used': 'development/contrib.md'
diff --git a/docs/toc_ja.yml b/docs/toc_ja.yml
index 5d77c2bd7b7..645791e6959 100644
--- a/docs/toc_ja.yml
+++ b/docs/toc_ja.yml
@@ -26,6 +26,7 @@ nav:
 - 'Command-Line Client': 'interfaces/cli.md'
 - 'Native Interface (TCP)': 'interfaces/tcp.md'
 - 'HTTP Interface': 'interfaces/http.md'
+ - 'MySQL Interface': 'interfaces/mysql.md'
 - 'Input and Output Formats': 'interfaces/formats.md'
 - 'JDBC Driver': 'interfaces/jdbc.md'
 - 'ODBC Driver': 'interfaces/odbc.md'
diff --git a/docs/toc_ru.yml b/docs/toc_ru.yml
index 94f8c8a79bb..7b59a2aba38 100644
--- a/docs/toc_ru.yml
+++ b/docs/toc_ru.yml
@@ -41,6 +41,7 @@ nav:
 - 'Движки баз данных':
 - 'Введение': 'database_engines/index.md'
 - 'MySQL': 'database_engines/mysql.md'
+ - 'Lazy': 'database_engines/lazy.md'
 
 - 'Движки таблиц':
 - 'Введение': 'operations/table_engines/index.md'
@@ -218,6 +219,7 @@ nav:
 - 'Введение': 'operations/utils/index.md'
 - 'clickhouse-copier': 'operations/utils/clickhouse-copier.md'
 - 'clickhouse-local': 'operations/utils/clickhouse-local.md'
+ - 'clickhouse-benchmark': 'operations/utils/clickhouse-benchmark.md'
 
 - 'Разработка':
 - 'hidden': 'development/index.md'
@@ -227,6 +229,7 @@ nav:
 - 'Как собрать ClickHouse на Linux': 'development/build.md'
 - 'Как собрать ClickHouse на Mac OS X': 'development/build_osx.md'
 - 'Как собрать ClickHouse на Linux для Mac OS X': 'development/build_cross_osx.md'
+ - 'Как собрать ClickHouse на Linux для AARCH64 (ARM64)': 'development/build_cross_arm.md'
 - 'Как писать код на C++': 'development/style.md'
 - 'Как запустить тесты': 'development/tests.md'
 - 'Сторонние библиотеки': 'development/contrib.md'
diff --git a/docs/toc_zh.yml b/docs/toc_zh.yml
index 43c93333f08..de7cbb8ec5e 100644
--- a/docs/toc_zh.yml
+++ b/docs/toc_zh.yml
@@ -26,6 +26,7 @@ nav:
 - '命令行客户端接口': 'interfaces/cli.md'
 - '原生客户端接口 (TCP)': 'interfaces/tcp.md'
 - 'HTTP 客户端接口': 'interfaces/http.md'
+ - 'MySQL 客户端接口': 'interfaces/mysql.md'
 - '输入输出格式': 'interfaces/formats.md'
 - 'JDBC 驱动': 'interfaces/jdbc.md'
 - 'ODBC 驱动': 'interfaces/odbc.md'
@@ -69,6 +70,7 @@ nav:
 - '数据库引擎':
 - '介绍': 'database_engines/index.md'
 - 'MySQL': 'database_engines/mysql.md'
+ - 'Lazy': 'database_engines/lazy.md'
 
 - '表引擎':
 - '介绍': 'operations/table_engines/index.md'
@@ -182,6 +184,10 @@ nav:
 - '操作符': 'query_language/operators.md'
 - '语法说明': 'query_language/syntax.md'
+- 'Guides':
+  - 'Overview': 'guides/index.md'
+  - 'Applying CatBoost Models': 'guides/apply_catboost_model.md'
+
 - '运维':
 - '介绍': 'operations/index.md'
 - '环境要求': 'operations/requirements.md'
@@ -225,6 +231,7 @@ nav:
 - '如何在Linux中编译ClickHouse': 'development/build.md'
 - '如何在Mac OS X中编译ClickHouse': 'development/build_osx.md'
 - '如何在Linux中编译Mac OS X ClickHouse': 'development/build_cross_osx.md'
+ - '如何在Linux中编译AARCH64 (ARM64) ClickHouse': 'development/build_cross_arm.md'
 - '如何编写C++代码': 'development/style.md'
 - '如何运行ClickHouse测试': 'development/tests.md'
 - '使用的第三方库': 'development/contrib.md'
diff --git a/docs/zh/interfaces/formats.md b/docs/zh/interfaces/formats.md
index 38ce513a104..7b7f839447a 100644
--- a/docs/zh/interfaces/formats.md
+++ b/docs/zh/interfaces/formats.md
@@ -29,7 +29,7 @@ ClickHouse 可以接受多种数据格式,可以在 (`INSERT`) 以及 (`SELECT`
 | [PrettySpace](#prettyspace) | ✗ | ✔ |
 | [Protobuf](#protobuf) | ✔ | ✔ |
 | [Avro](#data-format-avro) | ✔ | ✔ |
-| AvroConfluent | ✔ | ✗ |
+| [AvroConfluent](#data-format-avro-confluent) | ✔ | ✗ |
 | [Parquet](#data-format-parquet) | ✔ | ✔ |
 | [ORC](#data-format-orc) | ✔ | ✗ |
 | [RowBinary](#rowbinary) | ✔ | ✔ |
@@ -914,7 +914,7 @@ Column names must:
 
 Output Avro file compression and sync interval can be configured with [output_format_avro_codec](../operations/settings/settings.md#settings-output_format_avro_codec) and [output_format_avro_sync_interval](../operations/settings/settings.md#settings-output_format_avro_sync_interval) respectively.
 
-## AvroConfluent
+## AvroConfluent {#data-format-avro-confluent}
 
 AvroConfluent supports decoding single-object Avro messages commonly used with [Kafka](https://kafka.apache.org/) and [Confluent Schema Registry](https://docs.confluent.io/current/schema-registry/index.html).
 
diff --git a/docs/zh/query_language/functions/other_functions.md b/docs/zh/query_language/functions/other_functions.md
index d72cbe7ea38..a93079f4af3 100644
--- a/docs/zh/query_language/functions/other_functions.md
+++ b/docs/zh/query_language/functions/other_functions.md
@@ -652,7 +652,7 @@
 
 使用指定的连接键从Join类型引擎的表中获取数据。
 
-## modelEvaluate(model_name, ...)
+## modelEvaluate(model_name, ...) {#function-modelevaluate}
 
 使用外部模型计算。接受模型的名称以及模型的参数。返回Float64类型的值。