Update docs

This commit is contained in:
avogar 2024-08-13 13:56:08 +00:00
parent 710cf1a223
commit 0b728e7547

View File

@ -12,58 +12,59 @@ This specification describes the binary format that can be used for binary encod
The table below describes how each data type is represented in binary format. Each data type encoding consist of 1 byte that indicates the type and some optional additional information.
`var_uint` in the binary encoding means that the size is encoded using Variable-Length Quantity compression.
| ClickHouse data type | Binary encoding |
|-----------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `Nothing` | `0x00` |
| `UInt8` | `0x01` |
| `UInt16` | `0x02` |
| `UInt32` | `0x03` |
| `UInt64` | `0x04` |
| `UInt128` | `0x05` |
| `UInt256` | `0x06` |
| `Int8` | `0x07` |
| `Int16` | `0x08` |
| `Int32` | `0x09` |
| `Int64` | `0x0A` |
| `Int128` | `0x0B` |
| `Int256` | `0x0C` |
| `Float32` | `0x0D` |
| `Float64` | `0x0E` |
| `Date` | `0x0F` |
| `Date32` | `0x10` |
| `DateTime` | `0x11` |
| `DateTime(time_zone)` | `0x12<var_uint_time_zone_name_size><time_zone_name_data>` |
| `DateTime64(P)` | `0x13<uint8_precision>` |
| `DateTime64(P, time_zone)` | `0x14<uint8_precision><var_uint_time_zone_name_size><time_zone_name_data>` |
| `String` | `0x15` |
| `FixedString(N)` | `0x16<var_uint_size>` |
| `Enum8` | `0x17<var_uint_number_of_elements><var_uint_name_size_1><name_data_1><int8_value_1>...<var_uint_name_size_N><name_data_N><int8_value_N>` |
| `Enum16` | `0x18<var_uint_number_of_elements><var_uint_name_size_1><name_data_1><int16_little_endian_value_1>...><var_uint_name_size_N><name_data_N><int16_little_endian_value_N>` |
| `Decimal32(P, S)` | `0x19<uint8_precision><uint8_scale>` |
| `Decimal64(P, S)` | `0x1A<uint8_precision><uint8_scale>` |
| `Decimal128(P, S)` | `0x1B<uint8_precision><uint8_scale>` |
| `Decimal256(P, S)` | `0x1C<uint8_precision><uint8_scale>` |
| `UUID` | `0x1D` |
| `Array(T)` | `0x1E<nested_type_encoding>` |
| `Tuple(T1, ..., TN)` | `0x1F<var_uint_number_of_elements><nested_type_encoding_1>...<nested_type_encoding_N>` |
| `Tuple(name1 T1, ..., nameN TN)` | `0x20<var_uint_number_of_elements><var_uint_name_size_1><name_data_1><nested_type_encoding_1>...<var_uint_name_size_N><name_data_N><nested_type_encoding_N>` |
| `Set` | `0x21` |
| `Interval` | `0x22<interval_kind>` (see [interval kind binary encoding](#interval-kind-binary-encoding)) |
| `Nullable(T)` | `0x23<nested_type_encoding>` |
| `Function` | `0x24<var_uint_number_of_arguments><argument_type_encoding_1>...<argument_type_encoding_N><return_type_encoding>` |
| `AggregateFunction(function_name(param_1, ..., param_N), arg_T1, ..., arg_TN)` | `0x25<var_uint_version><var_uint_function_name_size><function_name_data><var_uint_number_of_parameters><param_1>...<param_N><var_uint_number_of_arguments><argument_type_encoding_1>...<argument_type_encoding_N>` (see [aggregate function parameter binary encoding](#aggregate-function-parameter-binary-encoding)) |
| `LowCardinality(T)` | `0x26<nested_type_encoding>` |
| `Map(K, V)` | `0x27<key_type_encoding><value_type_encoding>` |
| `IPv4` | `0x28` |
| `IPv6` | `0x29` |
| `Variant(T1, ..., TN)` | `0x2A<var_uint_number_of_variants><variant_type_encoding_1>...<variant_type_encoding_N>` |
| `Dynamic(max_types=N)` | `0x2B<uint8_max_types>` |
| `Custom type` (`Ring`, `Polygon`, etc) | `0x2C<var_uint_type_name_size><type_name_data>` |
| `Bool` | `0x2D` |
| `SimpleAggregateFunction(function_name(param_1, ..., param_N), arg_T1, ..., arg_TN)` | `0x2E<var_uint_function_name_size><function_name_data><var_uint_number_of_parameters><param_1>...<param_N><var_uint_number_of_arguments><argument_type_encoding_1>...<argument_type_encoding_N>` (see [aggregate function parameter binary encoding](#aggregate-function-parameter-binary-encoding)) |
| `Nested(name1 T1, ..., nameN TN)` | `0x2F<var_uint_number_of_elements><var_uint_name_size_1><name_data_1><nested_type_encoding_1>...<var_uint_name_size_N><name_data_N><nested_type_encoding_N>` |
| `JSON(max_dynamic_paths=N, max_dynamic_types=M, path Type, SKIP skip_path, SKIP REGEXP skip_path_regexp)` | `0x30<var_int_max_dynamic_paths><uint8_max_dynamic_types><var_uint_number_of_typed_paths><var_uint_path_name_size_1><path_name_data_1><encoded_type_1>...<var_uint_number_of_skip_paths><var_uint_skip_path_size_1><skip_path_data_1>...<var_uint_number_of_skip_path_regexps><var_uint_skip_path_regexp_size_1><skip_path_data_regexp_1>...` |
| ClickHouse data type | Binary encoding |
|-----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `Nothing` | `0x00` |
| `UInt8` | `0x01` |
| `UInt16` | `0x02` |
| `UInt32` | `0x03` |
| `UInt64` | `0x04` |
| `UInt128` | `0x05` |
| `UInt256` | `0x06` |
| `Int8` | `0x07` |
| `Int16` | `0x08` |
| `Int32` | `0x09` |
| `Int64` | `0x0A` |
| `Int128` | `0x0B` |
| `Int256` | `0x0C` |
| `Float32` | `0x0D` |
| `Float64` | `0x0E` |
| `Date` | `0x0F` |
| `Date32` | `0x10` |
| `DateTime` | `0x11` |
| `DateTime(time_zone)` | `0x12<var_uint_time_zone_name_size><time_zone_name_data>` |
| `DateTime64(P)` | `0x13<uint8_precision>` |
| `DateTime64(P, time_zone)` | `0x14<uint8_precision><var_uint_time_zone_name_size><time_zone_name_data>` |
| `String` | `0x15` |
| `FixedString(N)` | `0x16<var_uint_size>` |
| `Enum8` | `0x17<var_uint_number_of_elements><var_uint_name_size_1><name_data_1><int8_value_1>...<var_uint_name_size_N><name_data_N><int8_value_N>` |
| `Enum16` | `0x18<var_uint_number_of_elements><var_uint_name_size_1><name_data_1><int16_little_endian_value_1>...><var_uint_name_size_N><name_data_N><int16_little_endian_value_N>` |
| `Decimal32(P, S)` | `0x19<uint8_precision><uint8_scale>` |
| `Decimal64(P, S)` | `0x1A<uint8_precision><uint8_scale>` |
| `Decimal128(P, S)` | `0x1B<uint8_precision><uint8_scale>` |
| `Decimal256(P, S)` | `0x1C<uint8_precision><uint8_scale>` |
| `UUID` | `0x1D` |
| `Array(T)` | `0x1E<nested_type_encoding>` |
| `Tuple(T1, ..., TN)` | `0x1F<var_uint_number_of_elements><nested_type_encoding_1>...<nested_type_encoding_N>` |
| `Tuple(name1 T1, ..., nameN TN)` | `0x20<var_uint_number_of_elements><var_uint_name_size_1><name_data_1><nested_type_encoding_1>...<var_uint_name_size_N><name_data_N><nested_type_encoding_N>` |
| `Set` | `0x21` |
| `Interval` | `0x22<interval_kind>` (see [interval kind binary encoding](#interval-kind-binary-encoding)) |
| `Nullable(T)` | `0x23<nested_type_encoding>` |
| `Function` | `0x24<var_uint_number_of_arguments><argument_type_encoding_1>...<argument_type_encoding_N><return_type_encoding>` |
| `AggregateFunction(function_name(param_1, ..., param_N), arg_T1, ..., arg_TN)` | `0x25<var_uint_version><var_uint_function_name_size><function_name_data><var_uint_number_of_parameters><param_1>...<param_N><var_uint_number_of_arguments><argument_type_encoding_1>...<argument_type_encoding_N>` (see [aggregate function parameter binary encoding](#aggregate-function-parameter-binary-encoding)) |
| `LowCardinality(T)` | `0x26<nested_type_encoding>` |
| `Map(K, V)` | `0x27<key_type_encoding><value_type_encoding>` |
| `IPv4` | `0x28` |
| `IPv6` | `0x29` |
| `Variant(T1, ..., TN)` | `0x2A<var_uint_number_of_variants><variant_type_encoding_1>...<variant_type_encoding_N>` |
| `Dynamic(max_types=N)` | `0x2B<uint8_max_types>` |
| `Custom type` (`Ring`, `Polygon`, etc) | `0x2C<var_uint_type_name_size><type_name_data>` |
| `Bool` | `0x2D` |
| `SimpleAggregateFunction(function_name(param_1, ..., param_N), arg_T1, ..., arg_TN)` | `0x2E<var_uint_function_name_size><function_name_data><var_uint_number_of_parameters><param_1>...<param_N><var_uint_number_of_arguments><argument_type_encoding_1>...<argument_type_encoding_N>` (see [aggregate function parameter binary encoding](#aggregate-function-parameter-binary-encoding)) |
| `Nested(name1 T1, ..., nameN TN)` | `0x2F<var_uint_number_of_elements><var_uint_name_size_1><name_data_1><nested_type_encoding_1>...<var_uint_name_size_N><name_data_N><nested_type_encoding_N>` |
| `JSON(max_dynamic_paths=N, max_dynamic_types=M, path Type, SKIP skip_path, SKIP REGEXP skip_path_regexp)` | `0x30<uint8_serialization_version><var_int_max_dynamic_paths><uint8_max_dynamic_types><var_uint_number_of_typed_paths><var_uint_path_name_size_1><path_name_data_1><encoded_type_1>...<var_uint_number_of_skip_paths><var_uint_skip_path_size_1><skip_path_data_1>...<var_uint_number_of_skip_path_regexps><var_uint_skip_path_regexp_size_1><skip_path_data_regexp_1>...` |
For type `JSON` byte `uint8_serialization_version` indicates the version of the serialization. Right now the version is always 0 but can change in future if new arguments will be introduced for `JSON` type.
### Interval kind binary encoding