Add CapnProto format

Documented the CapnProto format.
Dmitriy 2021-11-14 15:27:35 +03:00
parent 1579bcfd20
commit caf2aec7b8
2 changed files with 52 additions and 6 deletions


@ -61,7 +61,7 @@ The supported formats are:
| [Native](#native) | ✔ | ✔ |
| [Null](#null) | ✗ | ✔ |
| [XML](#xml) | ✗ | ✔ |
| [CapnProto](#capnproto) | ✔ | |
| [CapnProto](#capnproto) | ✔ | |
| [LineAsString](#lineasstring) | ✔ | ✗ |
| [Regexp](#data-format-regexp) | ✔ | ✗ |
| [RawBLOB](#rawblob) | ✔ | ✔ |
@ -1092,12 +1092,44 @@ Arrays are output as `<array><elem>Hello</elem><elem>World</elem>...</array>`,an
## CapnProto {#capnproto}
Capn Proto is a binary message format similar to Protocol Buffers and Thrift, but not like JSON or MessagePack.
CapnProto is a binary message format similar to Protocol Buffers and Thrift, but unlike JSON or MessagePack.
Capn Proto messages are strictly typed and not self-describing, meaning they need an external schema description. The schema is applied on the fly and cached for each query.
CapnProto messages are strictly typed and not self-describing, meaning they need an external schema description. The schema is applied on the fly and cached for each query.
Deserialization is efficient and usually does not increase the system load.
See also [Format Schema](#formatschema).
### Data Types Matching {#data_types-matching-capnproto}
The table below shows supported data types and how they match ClickHouse [data types](../sql-reference/data-types/index.md) in `INSERT` and `SELECT` queries.
| CapnProto data type (`INSERT`) | ClickHouse data type | CapnProto data type (`SELECT`) |
|--------------------------------|-----------------------------------------------------------|--------------------------------|
| `UINT8`, `BOOL` | [UInt8](../sql-reference/data-types/int-uint.md) | `UINT8` |
| `INT8` | [Int8](../sql-reference/data-types/int-uint.md) | `INT8` |
| `UINT16` | [UInt16](../sql-reference/data-types/int-uint.md), [Date](../sql-reference/data-types/date.md) | `UINT16` |
| `INT16` | [Int16](../sql-reference/data-types/int-uint.md) | `INT16` |
| `UINT32` | [UInt32](../sql-reference/data-types/int-uint.md), [DateTime](../sql-reference/data-types/datetime.md) | `UINT32` |
| `INT32` | [Int32](../sql-reference/data-types/int-uint.md) | `INT32` |
| `UINT64` | [UInt64](../sql-reference/data-types/int-uint.md) | `UINT64` |
| `INT64` | [Int64](../sql-reference/data-types/int-uint.md), [DateTime64](../sql-reference/data-types/datetime64.md) | `INT64` |
| `FLOAT32` | [Float32](../sql-reference/data-types/float.md) | `FLOAT32` |
| `FLOAT64` | [Float64](../sql-reference/data-types/float.md) | `FLOAT64` |
| `Text`, `Data` | [String](../sql-reference/data-types/string.md), [FixedString](../sql-reference/data-types/fixedstring.md) | `Text`, `Data` |
| `union(T, Void)`, `union(Void, T)` | [Nullable(T)](../sql-reference/data-types/nullable.md) | `union(T, Void)`, `union(Void, T)` |
| `Enum` | [Enum(8\|16)](../sql-reference/data-types/enum.md) | `Enum` |
| `LIST` | [Array](../sql-reference/data-types/array.md) | `LIST` |
| `STRUCT` | [Tuple](../sql-reference/data-types/tuple.md) | `STRUCT` |
To work with `Enum` in the CapnProto format, use the [format_capn_proto_enum_comparising_mode](../operations/settings/settings.md#format-capn-proto-enum-comparising-mode) setting.
### Inserting and Selecting Data {#inserting-and-selecting-data-capnproto}
You can insert CapnProto data from a file into a ClickHouse table with the following command:
``` bash
$ cat capnproto_messages.bin | clickhouse-client --query "INSERT INTO test.hits FORMAT CapnProto SETTINGS format_schema='schema:Message'"
$ cat capnproto_messages.bin | clickhouse-client --query "INSERT INTO db.hits FORMAT CapnProto SETTINGS format_schema = 'schema:Message'"
```
Where `schema.capnp` looks like this:
@ -1109,9 +1141,11 @@ struct Message {
}
```
Deserialization is effective and usually does not increase the system load.
You can select data from a ClickHouse table and save it to a file in the CapnProto format with the following command:
See also [Format Schema](#formatschema).
``` bash
$ clickhouse-client --query "SELECT * FROM capnp_tuples FORMAT CapnProto SETTINGS format_schema = '$CLIENT_SCHEMADIR/02030_capnp_tuples:Message'"
```
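The two commands above share one pattern: the query ends with `FORMAT CapnProto` and a `format_schema` setting written as `schema:Message`. The following Python sketch only assembles such a command line and is not part of ClickHouse; the helper name `build_client_argv` and the output file name are hypothetical.

```python
import subprocess  # only needed if the commented-out call below is enabled


def build_client_argv(sql, schema_name, message_type):
    """Assemble a clickhouse-client command line for a CapnProto query.

    schema_name and message_type are joined as 'schema:Message',
    following the examples above.
    """
    format_schema = f"{schema_name}:{message_type}"
    query = f"{sql} FORMAT CapnProto SETTINGS format_schema = '{format_schema}'"
    # The query is passed as a single argument after --query (no '=' sign).
    return ["clickhouse-client", "--query", query]


argv = build_client_argv("SELECT * FROM db.hits", "schema", "Message")

# Saving the result to a file (hypothetical file name), left disabled
# because it needs a running ClickHouse server:
# with open("hits.bin", "wb") as out:
#     subprocess.run(argv, stdout=out, check=True)
```

Passing the query as one list element avoids shell quoting problems around the single quotes inside the `format_schema` value.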
## Protobuf {#protobuf}


@ -4048,3 +4048,15 @@ Possible values:
- 0 — Timeout disabled.
Default value: `0`.
## format_capn_proto_enum_comparising_mode {#format-capn-proto-enum-comparising-mode}
Determines how to map the ClickHouse `Enum` data type to the CapnProto `Enum` data type from the schema.
Possible values:
- `'by_values'` — Values in enums should be the same; names can be different.
- `'by_names'` — Names in enums should be the same; values can be different.
- `'by_name_case_insensitive'` — Names in enums should be the same when compared case-insensitively; values can be different.
Default value: `'by_values'`.
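The three modes can be illustrated with a small Python model. This is a sketch of the described semantics, not ClickHouse's implementation: the function `enums_match` and the dict representation of an enum are hypothetical, and "should be the same" is interpreted here as set equality, which is an assumption.

```python
def enums_match(clickhouse_enum, capnp_enum, mode="by_values"):
    """Compare two enums given as dicts mapping name -> numeric value."""
    if mode == "by_values":
        # Only the numeric values are compared; names are ignored.
        return set(clickhouse_enum.values()) == set(capnp_enum.values())
    if mode == "by_names":
        # Only the names are compared, case-sensitively; values are ignored.
        return set(clickhouse_enum) == set(capnp_enum)
    if mode == "by_name_case_insensitive":
        # Names are lowercased before comparison; values are ignored.
        lowered_ch = {name.lower() for name in clickhouse_enum}
        lowered_capnp = {name.lower() for name in capnp_enum}
        return lowered_ch == lowered_capnp
    raise ValueError(f"unknown mode: {mode}")


ch_enum = {"first": 0, "second": 1}
capnp_enum = {"First": 1, "Second": 0}

by_values = enums_match(ch_enum, capnp_enum, "by_values")
by_names = enums_match(ch_enum, capnp_enum, "by_names")
by_names_ci = enums_match(ch_enum, capnp_enum, "by_name_case_insensitive")
```

In this example the two enums match in `by_values` and `by_name_case_insensitive` modes but not in `by_names` mode, because the names differ only in letter case.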