Support Enum output/input in BSONEachRow, allow all map key types and avoid extra calculations

This commit is contained in:
avogar 2023-03-28 17:57:23 +00:00
parent f7e29d1e92
commit e7ff6e85c2
5 changed files with 46 additions and 25 deletions

View File

@ -1235,8 +1235,8 @@ For output it uses the following correspondence between ClickHouse types and BSO
| ClickHouse type | BSON Type | | ClickHouse type | BSON Type |
|-----------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------| |-----------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------|
| [Bool](/docs/en/sql-reference/data-types/boolean.md) | `\x08` boolean | | [Bool](/docs/en/sql-reference/data-types/boolean.md) | `\x08` boolean |
| [Int8/UInt8](/docs/en/sql-reference/data-types/int-uint.md) | `\x10` int32 | | [Int8/UInt8](/docs/en/sql-reference/data-types/int-uint.md)/[Enum8](/docs/en/sql-reference/data-types/enum.md) | `\x10` int32 |
| [Int16UInt16](/docs/en/sql-reference/data-types/int-uint.md) | `\x10` int32 | | [Int16/UInt16(/docs/en/sql-reference/data-types/int-uint.md)/[Enum16](/docs/en/sql-reference/data-types/enum.md) | `\x10` int32 |
| [Int32](/docs/en/sql-reference/data-types/int-uint.md) | `\x10` int32 | | [Int32](/docs/en/sql-reference/data-types/int-uint.md) | `\x10` int32 |
| [UInt32](/docs/en/sql-reference/data-types/int-uint.md) | `\x12` int64 | | [UInt32](/docs/en/sql-reference/data-types/int-uint.md) | `\x12` int64 |
| [Int64/UInt64](/docs/en/sql-reference/data-types/int-uint.md) | `\x12` int64 | | [Int64/UInt64](/docs/en/sql-reference/data-types/int-uint.md) | `\x12` int64 |
@ -1255,30 +1255,30 @@ For output it uses the following correspondence between ClickHouse types and BSO
| [Array](/docs/en/sql-reference/data-types/array.md) | `\x04` array | | [Array](/docs/en/sql-reference/data-types/array.md) | `\x04` array |
| [Tuple](/docs/en/sql-reference/data-types/tuple.md) | `\x04` array | | [Tuple](/docs/en/sql-reference/data-types/tuple.md) | `\x04` array |
| [Named Tuple](/docs/en/sql-reference/data-types/tuple.md) | `\x03` document | | [Named Tuple](/docs/en/sql-reference/data-types/tuple.md) | `\x03` document |
| [Map](/docs/en/sql-reference/data-types/map.md) (with String keys) | `\x03` document | | [Map](/docs/en/sql-reference/data-types/map.md) | `\x03` document |
| [IPv4](/docs/en/sql-reference/data-types/domains/ipv4.md) | `\x10` int32 | | [IPv4](/docs/en/sql-reference/data-types/domains/ipv4.md) | `\x10` int32 |
| [IPv6](/docs/en/sql-reference/data-types/domains/ipv6.md) | `\x05` binary, `\x00` binary subtype | | [IPv6](/docs/en/sql-reference/data-types/domains/ipv6.md) | `\x05` binary, `\x00` binary subtype |
For input it uses the following correspondence between BSON types and ClickHouse types: For input it uses the following correspondence between BSON types and ClickHouse types:
| BSON Type | ClickHouse Type | | BSON Type | ClickHouse Type |
|------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| |------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `\x01` double | [Float32/Float64](/docs/en/sql-reference/data-types/float.md) | | `\x01` double | [Float32/Float64](/docs/en/sql-reference/data-types/float.md) |
| `\x02` string | [String](/docs/en/sql-reference/data-types/string.md)/[FixedString](/docs/en/sql-reference/data-types/fixedstring.md) | | `\x02` string | [String](/docs/en/sql-reference/data-types/string.md)/[FixedString](/docs/en/sql-reference/data-types/fixedstring.md) |
| `\x03` document | [Map](/docs/en/sql-reference/data-types/map.md)/[Named Tuple](/docs/en/sql-reference/data-types/tuple.md) | | `\x03` document | [Map](/docs/en/sql-reference/data-types/map.md)/[Named Tuple](/docs/en/sql-reference/data-types/tuple.md) |
| `\x04` array | [Array](/docs/en/sql-reference/data-types/array.md)/[Tuple](/docs/en/sql-reference/data-types/tuple.md) | | `\x04` array | [Array](/docs/en/sql-reference/data-types/array.md)/[Tuple](/docs/en/sql-reference/data-types/tuple.md) |
| `\x05` binary, `\x00` binary subtype | [String](/docs/en/sql-reference/data-types/string.md)/[FixedString](/docs/en/sql-reference/data-types/fixedstring.md)/[IPv6](/docs/en/sql-reference/data-types/domains/ipv6.md) | | `\x05` binary, `\x00` binary subtype | [String](/docs/en/sql-reference/data-types/string.md)/[FixedString](/docs/en/sql-reference/data-types/fixedstring.md)/[IPv6](/docs/en/sql-reference/data-types/domains/ipv6.md) |
| `\x05` binary, `\x02` old binary subtype | [String](/docs/en/sql-reference/data-types/string.md)/[FixedString](/docs/en/sql-reference/data-types/fixedstring.md) | | `\x05` binary, `\x02` old binary subtype | [String](/docs/en/sql-reference/data-types/string.md)/[FixedString](/docs/en/sql-reference/data-types/fixedstring.md) |
| `\x05` binary, `\x03` old uuid subtype | [UUID](/docs/en/sql-reference/data-types/uuid.md) | | `\x05` binary, `\x03` old uuid subtype | [UUID](/docs/en/sql-reference/data-types/uuid.md) |
| `\x05` binary, `\x04` uuid subtype | [UUID](/docs/en/sql-reference/data-types/uuid.md) | | `\x05` binary, `\x04` uuid subtype | [UUID](/docs/en/sql-reference/data-types/uuid.md) |
| `\x07` ObjectId | [String](/docs/en/sql-reference/data-types/string.md)/[FixedString](/docs/en/sql-reference/data-types/fixedstring.md) | | `\x07` ObjectId | [String](/docs/en/sql-reference/data-types/string.md)/[FixedString](/docs/en/sql-reference/data-types/fixedstring.md) |
| `\x08` boolean | [Bool](/docs/en/sql-reference/data-types/boolean.md) | | `\x08` boolean | [Bool](/docs/en/sql-reference/data-types/boolean.md) |
| `\x09` datetime | [DateTime64](/docs/en/sql-reference/data-types/datetime64.md) | | `\x09` datetime | [DateTime64](/docs/en/sql-reference/data-types/datetime64.md) |
| `\x0A` null value | [NULL](/docs/en/sql-reference/data-types/nullable.md) | | `\x0A` null value | [NULL](/docs/en/sql-reference/data-types/nullable.md) |
| `\x0D` JavaScript code | [String](/docs/en/sql-reference/data-types/string.md)/[FixedString](/docs/en/sql-reference/data-types/fixedstring.md) | | `\x0D` JavaScript code | [String](/docs/en/sql-reference/data-types/string.md)/[FixedString](/docs/en/sql-reference/data-types/fixedstring.md) |
| `\x0E` symbol | [String](/docs/en/sql-reference/data-types/string.md)/[FixedString](/docs/en/sql-reference/data-types/fixedstring.md) | | `\x0E` symbol | [String](/docs/en/sql-reference/data-types/string.md)/[FixedString](/docs/en/sql-reference/data-types/fixedstring.md) |
| `\x10` int32 | [Int32/UInt32](/docs/en/sql-reference/data-types/int-uint.md)/[Decimal32](/docs/en/sql-reference/data-types/decimal.md)/[IPv4](/docs/en/sql-reference/data-types/domains/ipv4.md) | | `\x10` int32 | [Int32/UInt32](/docs/en/sql-reference/data-types/int-uint.md)/[Decimal32](/docs/en/sql-reference/data-types/decimal.md)/[IPv4](/docs/en/sql-reference/data-types/domains/ipv4.md)/[Enum8/Enum16](/docs/en/sql-reference/data-types/enum.md) |
| `\x12` int64 | [Int64/UInt64](/docs/en/sql-reference/data-types/int-uint.md)/[Decimal64](/docs/en/sql-reference/data-types/decimal.md)/[DateTime64](/docs/en/sql-reference/data-types/datetime64.md) | | `\x12` int64 | [Int64/UInt64](/docs/en/sql-reference/data-types/int-uint.md)/[Decimal64](/docs/en/sql-reference/data-types/decimal.md)/[DateTime64](/docs/en/sql-reference/data-types/datetime64.md) |
Other BSON types are not supported. Also, it performs conversion between different integer types (for example, you can insert BSON int32 value into ClickHouse UInt8). Other BSON types are not supported. Also, it performs conversion between different integer types (for example, you can insert BSON int32 value into ClickHouse UInt8).
Big integers and decimals (Int128/UInt128/Int256/UInt256/Decimal128/Decimal256) can be parsed from BSON Binary value with `\x00` binary subtype. In this case this format will validate that the size of binary data equals the size of expected value. Big integers and decimals (Int128/UInt128/Int256/UInt256/Decimal128/Decimal256) can be parsed from BSON Binary value with `\x00` binary subtype. In this case this format will validate that the size of binary data equals the size of expected value.

View File

@ -39,7 +39,8 @@ static String toValidUTF8String(const String & name)
WriteBufferValidUTF8 validating_buf(buf); WriteBufferValidUTF8 validating_buf(buf);
writeString(name, validating_buf); writeString(name, validating_buf);
validating_buf.finalize(); validating_buf.finalize();
return buf.str(); /// Return value without quotes
return buf.str().substr(1, buf.str().size() - 2);
} }
BSONEachRowRowOutputFormat::BSONEachRowRowOutputFormat( BSONEachRowRowOutputFormat::BSONEachRowRowOutputFormat(

View File

@ -17,8 +17,8 @@ namespace DB
* *
* ClickHouse type | BSON Type * ClickHouse type | BSON Type
* Bool | \x08 boolean * Bool | \x08 boolean
* Int8/UInt8 | \x10 int32 * Int8/UInt8/Enum8 | \x10 int32
* Int16UInt16 | \x10 int32 * Int16UInt16/Enum16 | \x10 int32
* Int32 | \x10 int32 * Int32 | \x10 int32
* UInt32 | \x12 int64 * UInt32 | \x12 int64
* Int64 | \x12 int64 * Int64 | \x12 int64
@ -38,7 +38,7 @@ namespace DB
* Array | \x04 array * Array | \x04 array
* Tuple | \x04 array * Tuple | \x04 array
* Named Tuple | \x03 document * Named Tuple | \x03 document
* Map (with String keys) | \x03 document * Map | \x03 document
* *
* Note: on Big-Endian platforms this format will not work properly. * Note: on Big-Endian platforms this format will not work properly.
*/ */

View File

@ -0,0 +1,5 @@
{'a\\u0000b':42}
c1 Nullable(Int32)
c2 Nullable(Int32)
c3 Map(String, Nullable(Int32))
a b {42:42}

View File

@ -0,0 +1,15 @@
#!/usr/bin/env bash
# Tags: no-fasttest
CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
# shellcheck source=../shell_config.sh
. "$CURDIR"/../shell_config.sh
$CLICKHOUSE_LOCAL -q "select map('a\0b', 42) as c1 format BSONEachRow" | $CLICKHOUSE_LOCAL --input-format BSONEachRow --table test --structure "c1 Map(String, UInt32)" -q "select * from test"
$CLICKHOUSE_LOCAL -q "select 'a'::Enum8('a' = 1) as c1, 'b'::Enum16('b' = 1) as c2, map(42, 42) as c3 format BSONEachRow" | $CLICKHOUSE_LOCAL --input-format BSONEachRow --table test -q "desc test"
$CLICKHOUSE_LOCAL -q "select 'a'::Enum8('a' = 1) as c1, 'b'::Enum16('b' = 1) as c2, map(42, 42) as c3 format BSONEachRow" | $CLICKHOUSE_LOCAL --input-format BSONEachRow --table test --structure "c1 Enum8('a' = 1), c2 Enum16('b' = 1), c3 Map(UInt32, UInt32)" -q "select * from test"