CLICKHOUSEDOCS-561: Repaired master

This commit is contained in:
Sergei Shtykov 2020-02-26 10:59:07 +03:00
parent 93c5062782
commit 11fb904719
7 changed files with 295 additions and 3 deletions

View File

@ -28,6 +28,11 @@ Format | INSERT | SELECT
[PrettyCompactMonoBlock](formats.md#prettycompactmonoblock) | ✗ | ✔ |
[PrettyNoEscapes](formats.md#prettynoescapes) | ✗ | ✔ |
[PrettySpace](formats.md#prettyspace) | ✗ | ✔ |
[Protobuf](#protobuf) | ✔ | ✔ |
[Avro](#data-format-avro) | ✔ | ✔ |
[AvroConfluent](#data-format-avro-confluent) | ✔ | ✗ |
[Parquet](#data-format-parquet) | ✔ | ✔ |
[ORC](#data-format-orc) | ✔ | ✗ |
[RowBinary](formats.md#rowbinary) | ✔ | ✔ |
[Native](formats.md#native) | ✔ | ✔ |
[Null](formats.md#null) | ✗ | ✔ |
@ -750,4 +755,273 @@ struct Message {
</div>
## Protobuf {#protobuf}
Protobuf is a [Protocol Buffers](https://developers.google.com/protocol-buffers/) format.
This format requires an external format schema. The schema is cached between queries.
ClickHouse supports both `proto2` and `proto3` syntaxes. Repeated/optional/required fields are supported.
Usage examples:
```sql
SELECT * FROM test.table FORMAT Protobuf SETTINGS format_schema = 'schemafile:MessageType'
```
```bash
cat protobuf_messages.bin | clickhouse-client --query "INSERT INTO test.table FORMAT Protobuf SETTINGS format_schema='schemafile:MessageType'"
```
where the file `schemafile.proto` looks like this:
```protobuf
syntax = "proto3";
message MessageType {
string name = 1;
string surname = 2;
uint32 birthDate = 3;
repeated string phoneNumbers = 4;
};
```
To find the correspondence between table columns and fields of the Protocol Buffers message type, ClickHouse compares their names.
This comparison is case-insensitive, and the characters `_` (underscore) and `.` (dot) are considered equal.
If the types of a column and a field of the Protocol Buffers message differ, the necessary conversion is applied.
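For instance, given the `MessageType` schema above, a matching table might be declared as follows. This is a minimal sketch: the engine and sorting key are placeholders, and `birthDate` is typed as `Date` here only to illustrate that a conversion from the protobuf `uint32` field would be applied.
```bash
clickhouse-client --query "
CREATE TABLE test.table
(
    name         String,
    surname      String,
    birthDate    Date,
    phoneNumbers Array(String)
) ENGINE = MergeTree() ORDER BY (name, surname)
"
```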
Nested messages are supported. For example, for the field `z` in the following message type
```protobuf
message MessageType {
  message XType {
    message YType {
      int32 z = 1;
    };
    repeated YType y = 1;
  };
  XType x = 1;
};
```
ClickHouse tries to find a column named `x.y.z` (or `x_y_z` or `X.y_Z` and so on).
Nested messages are suitable for input or output of [nested data structures](../data_types/nested_data_structures/nested.md).
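As a sketch, the repeated nested message above could be read into a table declared like this (hypothetical table name; the column is an `Array` because `y` is a repeated field):
```bash
clickhouse-client --query "
CREATE TABLE test.nested_example
(
    \`x.y.z\` Array(Int32)
) ENGINE = Memory
"
```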
Default values defined in a protobuf schema like this
```protobuf
syntax = "proto2";
message MessageType {
optional int32 result_per_page = 3 [default = 10];
}
```
are not applied; the [table defaults](../query_language/create.md#create-default-values) are used instead, as the sketch below shows.
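Here, a hypothetical table declares its own default, so rows decoded from messages that omit `result_per_page` receive `50`, not the `10` from the protobuf schema:
```bash
clickhouse-client --query "
CREATE TABLE test.paging
(
    result_per_page Int32 DEFAULT 50
) ENGINE = Memory
"
```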
ClickHouse inputs and outputs protobuf messages in the `length-delimited` format.
This means that before every message, its length must be written as a [varint](https://developers.google.com/protocol-buffers/docs/encoding#varints).
See also [how to read/write length-delimited protobuf messages in popular languages](https://cwiki.apache.org/confluence/display/GEODE/Delimiting+Protobuf+Messages).
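A rough sketch of producing such a stream in the shell, assuming `protoc` and GNU `stat` are available and the message is shorter than 128 bytes (so its length fits in a single-byte varint):
```bash
# Encode one text-format message to binary with the schema above.
echo 'name: "Ivan" surname: "Petrov"' \
    | protoc --encode=MessageType schemafile.proto > msg.bin

# Prepend the message length as a varint; one raw byte is a valid
# varint for any length below 128.
size=$(stat -c%s msg.bin)
{ printf "\\$(printf '%03o' "$size")"; cat msg.bin; } > delimited.bin

cat delimited.bin | clickhouse-client --query \
    "INSERT INTO test.table FORMAT Protobuf SETTINGS format_schema='schemafile:MessageType'"
```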
## Avro {#data-format-avro}
[Apache Avro](http://avro.apache.org/) is a row-oriented data serialization framework developed within Apache's Hadoop project.
ClickHouse Avro format supports reading and writing [Avro data files](http://avro.apache.org/docs/current/spec.html#Object+Container+Files).
### Data Types Matching
The table below shows supported data types and how they match ClickHouse [data types](../data_types/index.md) in `INSERT` and `SELECT` queries.
| Avro data type `INSERT` | ClickHouse data type | Avro data type `SELECT` |
| -------------------- | -------------------- | ------------------ |
| `boolean`, `int`, `long`, `float`, `double` | [Int(8\|16\|32\)](../data_types/int_uint.md), [UInt(8\|16\|32)](../data_types/int_uint.md) | `int` |
| `boolean`, `int`, `long`, `float`, `double` | [Int64](../data_types/int_uint.md), [UInt64](../data_types/int_uint.md) | `long` |
| `boolean`, `int`, `long`, `float`, `double` | [Float32](../data_types/float.md) | `float` |
| `boolean`, `int`, `long`, `float`, `double` | [Float64](../data_types/float.md) | `double` |
| `bytes`, `string`, `fixed`, `enum` | [String](../data_types/string.md) | `bytes` |
| `bytes`, `string`, `fixed` | [FixedString(N)](../data_types/fixedstring.md) | `fixed(N)` |
| `enum` | [Enum(8\|16)](../data_types/enum.md) | `enum` |
| `array(T)` | [Array(T)](../data_types/array.md) | `array(T)` |
| `union(null, T)`, `union(T, null)` | [Nullable(T)](../data_types/nullable.md) | `union(null, T)`|
| `null` | [Nullable(Nothing)](../data_types/special_data_types/nothing.md) | `null` |
| `int (date)` * | [Date](../data_types/date.md) | `int (date)` * |
| `long (timestamp-millis)` * | [DateTime64(3)](../data_types/datetime.md) | `long (timestamp-millis)` * |
| `long (timestamp-micros)` * | [DateTime64(6)](../data_types/datetime.md) | `long (timestamp-micros)` * |
\* [Avro logical types](http://avro.apache.org/docs/current/spec.html#Logical+Types)
Unsupported Avro data types: `record` (non-root), `map`
Unsupported Avro logical data types: `uuid`, `time-millis`, `time-micros`, `duration`
### Inserting Data
To insert data from an Avro file into a ClickHouse table:
```bash
$ cat file.avro | clickhouse-client --query="INSERT INTO {some_table} FORMAT Avro"
```
The root schema of the input Avro file must be of the `record` type.
To find the correspondence between table columns and the fields of the Avro schema, ClickHouse compares their names. This comparison is case-sensitive.
Unused fields are skipped.
Data types of ClickHouse table columns can differ from those of the corresponding fields of the inserted Avro data. When inserting data, ClickHouse interprets data types according to the table above and then [casts](../query_language/functions/type_conversion_functions.md#type_conversion_function-cast) the data to the corresponding column type.
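For example (a sketch; both the file and the table are hypothetical), an Avro `long` field can be inserted into a `UInt32` column: ClickHouse reads the value as `long` according to the table above, then casts it to `UInt32`:
```bash
clickhouse-client --query "CREATE TABLE test.avro_ids (id UInt32) ENGINE = Memory"
cat ids.avro | clickhouse-client --query "INSERT INTO test.avro_ids FORMAT Avro"
```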
### Selecting Data
To select data from a ClickHouse table into an Avro file:
```bash
$ clickhouse-client --query="SELECT * FROM {some_table} FORMAT Avro" > file.avro
```
Column names must:
- start with `[A-Za-z_]`
- subsequently contain only `[A-Za-z0-9_]`
Output Avro file compression and sync interval can be configured with [output_format_avro_codec](../operations/settings/settings.md#settings-output_format_avro_codec) and [output_format_avro_sync_interval](../operations/settings/settings.md#settings-output_format_avro_sync_interval) respectively.
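For example, a sketch that writes snappy-compressed Avro with a 32 KiB sync interval; both settings can also be passed as `clickhouse-client` options, and the values here are illustrative:
```bash
clickhouse-client \
    --output_format_avro_codec snappy \
    --output_format_avro_sync_interval 32768 \
    --query "SELECT * FROM {some_table} FORMAT Avro" > file.avro
```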
## AvroConfluent {#data-format-avro-confluent}
AvroConfluent supports decoding single-object Avro messages commonly used with [Kafka](https://kafka.apache.org/) and [Confluent Schema Registry](https://docs.confluent.io/current/schema-registry/index.html).
Each Avro message embeds a schema id that can be resolved to the actual schema with the help of the Schema Registry.
Schemas are cached once resolved.
The Schema Registry URL is configured with [format_avro_schema_registry_url](../operations/settings/settings.md#settings-format_avro_schema_registry_url).
### Data Types Matching
Same as [Avro](#data-format-avro).
### Usage
To quickly verify schema resolution you can use [kafkacat](https://github.com/edenhill/kafkacat) with [clickhouse-local](../operations/utils/clickhouse-local.md):
```bash
$ kafkacat -b kafka-broker -C -t topic1 -o beginning -f '%s' -c 3 | clickhouse-local --input-format AvroConfluent --format_avro_schema_registry_url 'http://schema-registry' -S "field1 Int64, field2 String" -q 'select * from table'
1 a
2 b
3 c
```
To use `AvroConfluent` with [Kafka](../operations/table_engines/kafka.md):
```sql
CREATE TABLE topic1_stream
(
field1 String,
field2 String
)
ENGINE = Kafka()
SETTINGS
kafka_broker_list = 'kafka-broker',
kafka_topic_list = 'topic1',
kafka_group_name = 'group1',
kafka_format = 'AvroConfluent';
SET format_avro_schema_registry_url = 'http://schema-registry';
SELECT * FROM topic1_stream;
```
!!! note "Warning"
    Setting `format_avro_schema_registry_url` needs to be configured in `users.xml` to maintain its value after a restart.
## Parquet {#data-format-parquet}
[Apache Parquet](http://parquet.apache.org/) is a columnar storage format widespread in the Hadoop ecosystem. ClickHouse supports read and write operations for this format.
### Data Types Matching
The table below shows supported data types and how they match ClickHouse [data types](../data_types/index.md) in `INSERT` and `SELECT` queries.
| Parquet data type (`INSERT`) | ClickHouse data type | Parquet data type (`SELECT`) |
| -------------------- | ------------------ | ---- |
| `UINT8`, `BOOL` | [UInt8](../data_types/int_uint.md) | `UINT8` |
| `INT8` | [Int8](../data_types/int_uint.md) | `INT8` |
| `UINT16` | [UInt16](../data_types/int_uint.md) | `UINT16` |
| `INT16` | [Int16](../data_types/int_uint.md) | `INT16` |
| `UINT32` | [UInt32](../data_types/int_uint.md) | `UINT32` |
| `INT32` | [Int32](../data_types/int_uint.md) | `INT32` |
| `UINT64` | [UInt64](../data_types/int_uint.md) | `UINT64` |
| `INT64` | [Int64](../data_types/int_uint.md) | `INT64` |
| `FLOAT`, `HALF_FLOAT` | [Float32](../data_types/float.md) | `FLOAT` |
| `DOUBLE` | [Float64](../data_types/float.md) | `DOUBLE` |
| `DATE32` | [Date](../data_types/date.md) | `UINT16` |
| `DATE64`, `TIMESTAMP` | [DateTime](../data_types/datetime.md) | `UINT32` |
| `STRING`, `BINARY` | [String](../data_types/string.md) | `STRING` |
| — | [FixedString](../data_types/fixedstring.md) | `STRING` |
| `DECIMAL` | [Decimal](../data_types/decimal.md) | `DECIMAL` |
ClickHouse supports configurable precision of the `Decimal` type. The `INSERT` query treats the Parquet `DECIMAL` type as the ClickHouse `Decimal128` type.
Unsupported Parquet data types: `TIME32`, `FIXED_SIZE_BINARY`, `JSON`, `UUID`, `ENUM`.
Data types of ClickHouse table columns can differ from those of the corresponding fields of the inserted Parquet data. When inserting data, ClickHouse interprets data types according to the table above and then [casts](../query_language/functions/type_conversion_functions.md#type_conversion_function-cast) the data to the data type set for the ClickHouse table column.
### Inserting and Selecting Data
You can insert Parquet data from a file into a ClickHouse table with the following command:
```bash
$ cat {filename} | clickhouse-client --query="INSERT INTO {some_table} FORMAT Parquet"
```
You can select data from a ClickHouse table and save it to a file in the Parquet format with the following command:
```bash
$ clickhouse-client --query="SELECT * FROM {some_table} FORMAT Parquet" > {some_file.pq}
```
To exchange data with Hadoop, you can use the [HDFS table engine](../operations/table_engines/hdfs.md), sketched below.
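For instance, a sketch of a Parquet-backed HDFS table; the URI and columns are placeholders:
```bash
clickhouse-client --query "
CREATE TABLE hdfs_parquet (id UInt32, name String)
ENGINE = HDFS('hdfs://namenode:8020/clickhouse/file.parquet', 'Parquet')
"
```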
## ORC {#data-format-orc}
[Apache ORC](https://orc.apache.org/) is a columnar storage format widespread in the Hadoop ecosystem. You can only insert data in this format into ClickHouse.
### Data Types Matching
The table below shows supported data types and how they match ClickHouse [data types](../data_types/index.md) in `INSERT` queries.
| ORC data type (`INSERT`) | ClickHouse data type |
| -------------------- | ------------------ |
| `UINT8`, `BOOL` | [UInt8](../data_types/int_uint.md) |
| `INT8` | [Int8](../data_types/int_uint.md) |
| `UINT16` | [UInt16](../data_types/int_uint.md) |
| `INT16` | [Int16](../data_types/int_uint.md) |
| `UINT32` | [UInt32](../data_types/int_uint.md) |
| `INT32` | [Int32](../data_types/int_uint.md) |
| `UINT64` | [UInt64](../data_types/int_uint.md) |
| `INT64` | [Int64](../data_types/int_uint.md) |
| `FLOAT`, `HALF_FLOAT` | [Float32](../data_types/float.md) |
| `DOUBLE` | [Float64](../data_types/float.md) |
| `DATE32` | [Date](../data_types/date.md) |
| `DATE64`, `TIMESTAMP` | [DateTime](../data_types/datetime.md) |
| `STRING`, `BINARY` | [String](../data_types/string.md) |
| `DECIMAL` | [Decimal](../data_types/decimal.md) |
ClickHouse supports configurable precision of the `Decimal` type. The `INSERT` query treats the ORC `DECIMAL` type as the ClickHouse `Decimal128` type.
Unsupported ORC data types: `TIME32`, `FIXED_SIZE_BINARY`, `JSON`, `UUID`, `ENUM`.
The data types of ClickHouse table columns don't have to match the corresponding ORC data fields. When inserting data, ClickHouse interprets data types according to the table above and then [casts](../query_language/functions/type_conversion_functions/#type_conversion_function-cast) the data to the data type set for the ClickHouse table column.
### Inserting Data
You can insert ORC data from a file into a ClickHouse table with the following command:
```bash
$ cat filename.orc | clickhouse-client --query="INSERT INTO some_table FORMAT ORC"
```
To exchange data with Hadoop, you can use the [HDFS table engine](../operations/table_engines/hdfs.md).
[Original article](https://clickhouse.tech/docs/fa/interfaces/formats/) <!--hide-->

View File

@ -26,6 +26,7 @@ nav:
- ' کلاینت Command-line': 'interfaces/cli.md'
- 'Native interface (TCP)': 'interfaces/tcp.md'
- 'HTTP interface': 'interfaces/http.md'
- 'MySQL Interface': 'interfaces/mysql.md'
- ' فرمت های Input و Output': 'interfaces/formats.md'
- ' درایور JDBC': 'interfaces/jdbc.md'
- ' درایور ODBC': 'interfaces/odbc.md'
@ -183,6 +184,10 @@ nav:
- 'Operators': 'query_language/operators.md'
- 'General Syntax': 'query_language/syntax.md'
- 'Guides':
- 'Overview': 'guides/index.md'
- 'Applying CatBoost Models': 'guides/apply_catboost_model.md'
- 'Operations':
- 'Introduction': 'operations/index.md'
- 'Requirements': 'operations/requirements.md'
@ -225,6 +230,8 @@ nav:
- 'Browse ClickHouse Source Code': 'development/browse_code.md'
- 'How to Build ClickHouse on Linux': 'development/build.md'
- 'How to Build ClickHouse on Mac OS X': 'development/build_osx.md'
- 'How to Build ClickHouse on Linux for Mac OS X': 'development/build_cross_osx.md'
- 'How to Build ClickHouse on Linux for AARCH64 (ARM64)': 'development/build_cross_arm.md'
- 'How to Write C++ code': 'development/style.md'
- 'How to Run ClickHouse Tests': 'development/tests.md'
- 'Third-Party Libraries Used': 'development/contrib.md'

View File

@ -26,6 +26,7 @@ nav:
- 'Command-Line Client': 'interfaces/cli.md'
- 'Native Interface (TCP)': 'interfaces/tcp.md'
- 'HTTP Interface': 'interfaces/http.md'
- 'MySQL Interface': 'interfaces/mysql.md'
- 'Input and Output Formats': 'interfaces/formats.md'
- 'JDBC Driver': 'interfaces/jdbc.md'
- 'ODBC Driver': 'interfaces/odbc.md'

View File

@ -41,6 +41,7 @@ nav:
- 'Движки баз данных':
- 'Введение': 'database_engines/index.md'
- 'MySQL': 'database_engines/mysql.md'
- 'Lazy': 'database_engines/lazy.md'
- 'Движки таблиц':
- 'Введение': 'operations/table_engines/index.md'
@ -218,6 +219,7 @@ nav:
- 'Введение': 'operations/utils/index.md'
- 'clickhouse-copier': 'operations/utils/clickhouse-copier.md'
- 'clickhouse-local': 'operations/utils/clickhouse-local.md'
- 'clickhouse-benchmark': 'operations/utils/clickhouse-benchmark.md'
- 'Разработка':
- 'hidden': 'development/index.md'
@ -227,6 +229,7 @@ nav:
- 'Как собрать ClickHouse на Linux': 'development/build.md'
- 'Как собрать ClickHouse на Mac OS X': 'development/build_osx.md'
- 'Как собрать ClickHouse на Linux для Mac OS X': 'development/build_cross_osx.md'
- 'Как собрать ClickHouse на Linux для AARCH64 (ARM64)': 'development/build_cross_arm.md'
- 'Как писать код на C++': 'development/style.md'
- 'Как запустить тесты': 'development/tests.md'
- 'Сторонние библиотеки': 'development/contrib.md'

View File

@ -26,6 +26,7 @@ nav:
- '命令行客户端接口': 'interfaces/cli.md'
- '原生客户端接口 (TCP)': 'interfaces/tcp.md'
- 'HTTP 客户端接口': 'interfaces/http.md'
- 'MySQL 客户端接口': 'interfaces/mysql.md'
- '输入输出格式': 'interfaces/formats.md'
- 'JDBC 驱动': 'interfaces/jdbc.md'
- 'ODBC 驱动': 'interfaces/odbc.md'
@ -69,6 +70,7 @@ nav:
- '数据库引擎':
- '介绍': 'database_engines/index.md'
- 'MySQL': 'database_engines/mysql.md'
- 'Lazy': 'database_engines/lazy.md'
- '表引擎':
- '介绍': 'operations/table_engines/index.md'
@ -182,6 +184,10 @@ nav:
- '操作符': 'query_language/operators.md'
- '语法说明': 'query_language/syntax.md'
- 'Guides':
- 'Overview': 'guides/index.md'
- 'Applying CatBoost Models': 'guides/apply_catboost_model.md'
- '运维':
- '介绍': 'operations/index.md'
- '环境要求': 'operations/requirements.md'
@ -225,6 +231,7 @@ nav:
- '如何在Linux中编译ClickHouse': 'development/build.md'
- '如何在Mac OS X中编译ClickHouse': 'development/build_osx.md'
- '如何在Linux中编译Mac OS X ClickHouse': 'development/build_cross_osx.md'
- '如何在Linux中编译AARCH64 (ARM64) ClickHouse': 'development/build_cross_arm.md'
- '如何编写C++代码': 'development/style.md'
- '如何运行ClickHouse测试': 'development/tests.md'
- '使用的第三方库': 'development/contrib.md'

View File

@ -29,7 +29,7 @@ ClickHouse 可以接受多种数据格式,可以在 (`INSERT`) 以及 (`SELECT
| [PrettySpace](#prettyspace) | ✗ | ✔ |
| [Protobuf](#protobuf) | ✔ | ✔ |
| [Avro](#data-format-avro) | ✔ | ✔ |
| AvroConfluent | ✔ | ✗ |
| [AvroConfluent](#data-format-avro-confluent) | ✔ | ✗ |
| [Parquet](#data-format-parquet) | ✔ | ✔ |
| [ORC](#data-format-orc) | ✔ | ✗ |
| [RowBinary](#rowbinary) | ✔ | ✔ |
@ -914,7 +914,7 @@ Column names must:
Output Avro file compression and sync interval can be configured with [output_format_avro_codec](../operations/settings/settings.md#settings-output_format_avro_codec) and [output_format_avro_sync_interval](../operations/settings/settings.md#settings-output_format_avro_sync_interval) respectively.
## AvroConfluent
## AvroConfluent {#data-format-avro-confluent}
AvroConfluent supports decoding single-object Avro messages commonly used with [Kafka](https://kafka.apache.org/) and [Confluent Schema Registry](https://docs.confluent.io/current/schema-registry/index.html).

View File

@ -652,7 +652,7 @@ SELECT replicate(1, ['a', 'b', 'c'])
使用指定的连接键从Join类型引擎的表中获取数据。
## modelEvaluate(model_name, ...)
## modelEvaluate(model_name, ...) {#function-modelevaluate}
使用外部模型计算。
接受模型的名称以及模型的参数。返回Float64类型的值。