Merge branch 'master' into mvcc_prototype

This commit is contained in:
Alexander Tokmakov 2022-03-17 15:24:32 +01:00
commit d04dc03fa4
349 changed files with 8612 additions and 1468 deletions

1
.gitattributes vendored
View File

@ -1,2 +1,3 @@
contrib/* linguist-vendored
*.h linguist-language=C++
tests/queries/0_stateless/data_json/* binary

View File

@ -1,3 +1,130 @@
### ClickHouse release v22.3-lts, 2022-03-17
#### Backward Incompatible Change
* Make `arrayCompact` function behave as other higher-order functions: perform compaction not of lambda function results but on the original array. If you're using nontrivial lambda functions in arrayCompact you may restore old behaviour by wrapping `arrayCompact` arguments into `arrayMap`. Closes [#34010](https://github.com/ClickHouse/ClickHouse/issues/34010) [#18535](https://github.com/ClickHouse/ClickHouse/issues/18535) [#14778](https://github.com/ClickHouse/ClickHouse/issues/14778). [#34795](https://github.com/ClickHouse/ClickHouse/pull/34795) ([Alexandre Snarskii](https://github.com/snar)).
* Change implementation specific behavior on overflow of function `toDatetime`. It will be saturated to the nearest min/max supported instant of datetime instead of wraparound. This change is highlighted as "backward incompatible" because someone may unintentionally rely on the old behavior. [#32898](https://github.com/ClickHouse/ClickHouse/pull/32898) ([HaiBo Li](https://github.com/marising)).
#### New Feature
* Support for caching data locally for remote filesystems. It can be enabled for `s3` disks. Closes [#28961](https://github.com/ClickHouse/ClickHouse/issues/28961). [#33717](https://github.com/ClickHouse/ClickHouse/pull/33717) ([Kseniia Sumarokova](https://github.com/kssenii)). In the meantime, we enabled the test suite on s3 filesystem and no more known issues exist, so it is started to be production ready.
* Add new table function `hive`. It can be used as follows `hive('<hive metastore url>', '<hive database>', '<hive table name>', '<columns definition>', '<partition columns>')` for example `SELECT * FROM hive('thrift://hivetest:9083', 'test', 'demo', 'id Nullable(String), score Nullable(Int32), day Nullable(String)', 'day')`. [#34946](https://github.com/ClickHouse/ClickHouse/pull/34946) ([lgbo](https://github.com/lgbo-ustc)).
* Support authentication of users connected via SSL by their X.509 certificate. [#31484](https://github.com/ClickHouse/ClickHouse/pull/31484) ([eungenue](https://github.com/eungenue)).
* Support schema inference for inserting into table functions `file`/`hdfs`/`s3`/`url`. [#34732](https://github.com/ClickHouse/ClickHouse/pull/34732) ([Kruglov Pavel](https://github.com/Avogar)).
* Now you can read `system.zookeeper` table without restrictions on path or using `like` expression. This reads can generate quite heavy load for zookeeper so to enable this ability you have to enable setting `allow_unrestricted_reads_from_keeper`. [#34609](https://github.com/ClickHouse/ClickHouse/pull/34609) ([Sergei Trifonov](https://github.com/serxa)).
* Display CPU and memory metrics in clickhouse-local. Close [#34545](https://github.com/ClickHouse/ClickHouse/issues/34545). [#34605](https://github.com/ClickHouse/ClickHouse/pull/34605) ([李扬](https://github.com/taiyang-li)).
* Implement `startsWith` and `endsWith` function for arrays, closes [#33982](https://github.com/ClickHouse/ClickHouse/issues/33982). [#34368](https://github.com/ClickHouse/ClickHouse/pull/34368) ([usurai](https://github.com/usurai)).
* Add three functions for Map data type: 1. `mapReplace(map1, map2)` - replaces values for keys in map1 with the values of the corresponding keys in map2; adds keys from map2 that don't exist in map1. 2. `mapFilter` 3. `mapMap`. mapFilter and mapMap are higher order functions, accepting two arguments, the first argument is a lambda function with k, v pair as arguments, the second argument is a column of type Map. [#33698](https://github.com/ClickHouse/ClickHouse/pull/33698) ([hexiaoting](https://github.com/hexiaoting)).
* Allow getting default user and password for clickhouse-client from the `CLICKHOUSE_USER` and `CLICKHOUSE_PASSWORD` environment variables. Close [#34538](https://github.com/ClickHouse/ClickHouse/issues/34538). [#34947](https://github.com/ClickHouse/ClickHouse/pull/34947) ([DR](https://github.com/freedomDR)).
#### Experimental Feature
* New data type `Object(<schema_format>)`, which supports storing of semi-structured data (for now JSON only). Data is written to such types as string. Then all paths are extracted according to format of semi-structured data and written as separate columns in most optimal types, that can store all their values. Those columns can be queried by names that match paths in source data. E.g `data.key1.key2` or with cast operator `data.key1.key2::Int64`.
* Add `database_replicated_allow_only_replicated_engine` setting. When enabled, it only allowed to only create `Replicated` tables or tables with stateless engines in `Replicated` databases. [#35214](https://github.com/ClickHouse/ClickHouse/pull/35214) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). Note that `Replicated` database is still an experimental feature.
#### Performance Improvement
* Improve performance of insertion into `MergeTree` tables by optimizing sorting. Up to 2x improvement is observed on realistic benchmarks. [#34750](https://github.com/ClickHouse/ClickHouse/pull/34750) ([Maksim Kita](https://github.com/kitaisreal)).
* Columns pruning when reading Parquet, ORC and Arrow files from URL and S3. Closes [#34163](https://github.com/ClickHouse/ClickHouse/issues/34163). [#34849](https://github.com/ClickHouse/ClickHouse/pull/34849) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Columns pruning when reading Parquet, ORC and Arrow files from Hive. [#34954](https://github.com/ClickHouse/ClickHouse/pull/34954) ([lgbo](https://github.com/lgbo-ustc)).
* A bunch of performance optimizations from a performance superhero. Improve performance of processing queries with large `IN` section. Improve performance of `direct` dictionary if its source is `ClickHouse`. Improve performance of `detectCharset `, `detectLanguageUnknown ` functions. [#34888](https://github.com/ClickHouse/ClickHouse/pull/34888) ([Maksim Kita](https://github.com/kitaisreal)).
* Improve performance of `any` aggregate function by using more batching. [#34760](https://github.com/ClickHouse/ClickHouse/pull/34760) ([Raúl Marín](https://github.com/Algunenano)).
* Multiple improvements for performance of `clickhouse-keeper`: less locking [#35010](https://github.com/ClickHouse/ClickHouse/pull/35010) ([zhanglistar](https://github.com/zhanglistar)), lower memory usage by streaming reading and writing of snapshot instead of full copy. [#34584](https://github.com/ClickHouse/ClickHouse/pull/34584) ([zhanglistar](https://github.com/zhanglistar)), optimizing compaction of log store in the RAFT implementation. [#34534](https://github.com/ClickHouse/ClickHouse/pull/34534) ([zhanglistar](https://github.com/zhanglistar)), versioning of the internal data structure [#34486](https://github.com/ClickHouse/ClickHouse/pull/34486) ([zhanglistar](https://github.com/zhanglistar)).
#### Improvement
* Allow asynchronous inserts to table functions. Fixes [#34864](https://github.com/ClickHouse/ClickHouse/issues/34864). [#34866](https://github.com/ClickHouse/ClickHouse/pull/34866) ([Anton Popov](https://github.com/CurtizJ)).
* Implicit type casting of the key argument for functions `dictGetHierarchy`, `dictIsIn`, `dictGetChildren`, `dictGetDescendants`. Closes [#34970](https://github.com/ClickHouse/ClickHouse/issues/34970). [#35027](https://github.com/ClickHouse/ClickHouse/pull/35027) ([Maksim Kita](https://github.com/kitaisreal)).
* `EXPLAIN AST` query can output AST in form of a graph in Graphviz format: `EXPLAIN AST graph = 1 SELECT * FROM system.parts`. [#35173](https://github.com/ClickHouse/ClickHouse/pull/35173) ([李扬](https://github.com/taiyang-li)).
* When large files were written with `s3` table function or table engine, the content type on the files was mistakenly set to `application/xml` due to a bug in the AWS SDK. This closes [#33964](https://github.com/ClickHouse/ClickHouse/issues/33964). [#34433](https://github.com/ClickHouse/ClickHouse/pull/34433) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Change restrictive row policies a bit to make them an easier alternative to permissive policies in easy cases. If for a particular table only restrictive policies exist (without permissive policies) users will be able to see some rows. Also `SHOW CREATE ROW POLICY` will always show `AS permissive` or `AS restrictive` in row policy's definition. [#34596](https://github.com/ClickHouse/ClickHouse/pull/34596) ([Vitaly Baranov](https://github.com/vitlibar)).
* Improve schema inference with globs in File/S3/HDFS/URL engines. Try to use the next path for schema inference in case of error. [#34465](https://github.com/ClickHouse/ClickHouse/pull/34465) ([Kruglov Pavel](https://github.com/Avogar)).
* Play UI now correctly detects the preferred light/dark theme from the OS. [#35068](https://github.com/ClickHouse/ClickHouse/pull/35068) ([peledni](https://github.com/peledni)).
* Added `date_time_input_format = 'best_effort_us'`. Closes [#34799](https://github.com/ClickHouse/ClickHouse/issues/34799). [#34982](https://github.com/ClickHouse/ClickHouse/pull/34982) ([WenYao](https://github.com/Cai-Yao)).
* A new settings called `allow_plaintext_password` and `allow_no_password` are added in server configuration which turn on/off authentication types that can be potentially insecure in some environments. They are allowed by default. [#34738](https://github.com/ClickHouse/ClickHouse/pull/34738) ([Heena Bansal](https://github.com/HeenaBansal2009)).
* Support for `DateTime64` data type in `Arrow` format, closes [#8280](https://github.com/ClickHouse/ClickHouse/issues/8280) and closes [#28574](https://github.com/ClickHouse/ClickHouse/issues/28574). [#34561](https://github.com/ClickHouse/ClickHouse/pull/34561) ([李扬](https://github.com/taiyang-li)).
* Reload `remote_url_allow_hosts` (filtering of outgoing connections) on config update. [#35294](https://github.com/ClickHouse/ClickHouse/pull/35294) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Support `--testmode` parameter for `clickhouse-local`. This parameter enables interpretation of test hints that we use in functional tests. [#35264](https://github.com/ClickHouse/ClickHouse/pull/35264) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Add `distributed_depth` to query log. It is like a more detailed variant of `is_initial_query` [#35207](https://github.com/ClickHouse/ClickHouse/pull/35207) ([李扬](https://github.com/taiyang-li)).
* Respect `remote_url_allow_hosts` for `MySQL` and `PostgreSQL` table functions. [#35191](https://github.com/ClickHouse/ClickHouse/pull/35191) ([Heena Bansal](https://github.com/HeenaBansal2009)).
* Added `disk_name` field to `system.part_log`. [#35178](https://github.com/ClickHouse/ClickHouse/pull/35178) ([Artyom Yurkov](https://github.com/Varinara)).
* Do not retry non-rertiable errors when querying remote URLs. Closes [#35161](https://github.com/ClickHouse/ClickHouse/issues/35161). [#35172](https://github.com/ClickHouse/ClickHouse/pull/35172) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Support distributed INSERT SELECT queries (the setting `parallel_distributed_insert_select`) table function `view()`. [#35132](https://github.com/ClickHouse/ClickHouse/pull/35132) ([Azat Khuzhin](https://github.com/azat)).
* More precise memory tracking during `INSERT` into `Buffer` with `AggregateFunction`. [#35072](https://github.com/ClickHouse/ClickHouse/pull/35072) ([Azat Khuzhin](https://github.com/azat)).
* Avoid division by zero in Query Profiler if Linux kernel has a bug. Closes [#34787](https://github.com/ClickHouse/ClickHouse/issues/34787). [#35032](https://github.com/ClickHouse/ClickHouse/pull/35032) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add more sanity checks for keeper configuration: now mixing of localhost and non-local servers is not allowed, also add checks for same value of internal raft port and keeper client port. [#35004](https://github.com/ClickHouse/ClickHouse/pull/35004) ([alesapin](https://github.com/alesapin)).
* Currently, if the user changes the settings of the system tables there will be tons of logs and ClickHouse will rename the tables every minute. This fixes [#34929](https://github.com/ClickHouse/ClickHouse/issues/34929). [#34949](https://github.com/ClickHouse/ClickHouse/pull/34949) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Use connection pool for Hive metastore client. [#34940](https://github.com/ClickHouse/ClickHouse/pull/34940) ([lgbo](https://github.com/lgbo-ustc)).
* Ignore per-column `TTL` in `CREATE TABLE AS` if new table engine does not support it (i.e. if the engine is not of `MergeTree` family). [#34938](https://github.com/ClickHouse/ClickHouse/pull/34938) ([Azat Khuzhin](https://github.com/azat)).
* Allow `LowCardinality` strings for `ngrambf_v1`/`tokenbf_v1` indexes. Closes [#21865](https://github.com/ClickHouse/ClickHouse/issues/21865). [#34911](https://github.com/ClickHouse/ClickHouse/pull/34911) ([Lars Hiller Eidnes](https://github.com/larspars)).
* Allow opening empty sqlite db if the file doesn't exist. Closes [#33367](https://github.com/ClickHouse/ClickHouse/issues/33367). [#34907](https://github.com/ClickHouse/ClickHouse/pull/34907) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Implement memory statistics for FreeBSD - this is required for `max_server_memory_usage` to work correctly. [#34902](https://github.com/ClickHouse/ClickHouse/pull/34902) ([Alexandre Snarskii](https://github.com/snar)).
* In previous versions the progress bar in clickhouse-client can jump forward near 50% for no reason. This closes [#34324](https://github.com/ClickHouse/ClickHouse/issues/34324). [#34801](https://github.com/ClickHouse/ClickHouse/pull/34801) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Now `ALTER TABLE DROP COLUMN columnX` queries for `MergeTree` table engines will work instantly when `columnX` is an `ALIAS` column. Fixes [#34660](https://github.com/ClickHouse/ClickHouse/issues/34660). [#34786](https://github.com/ClickHouse/ClickHouse/pull/34786) ([alesapin](https://github.com/alesapin)).
* Show hints when user mistyped the name of a data skipping index. Closes [#29698](https://github.com/ClickHouse/ClickHouse/issues/29698). [#34764](https://github.com/ClickHouse/ClickHouse/pull/34764) ([flynn](https://github.com/ucasfl)).
* Support `remote()`/`cluster()` table functions for `parallel_distributed_insert_select`. [#34728](https://github.com/ClickHouse/ClickHouse/pull/34728) ([Azat Khuzhin](https://github.com/azat)).
* Do not reset logging that configured via `--log-file`/`--errorlog-file` command line options in case of empty configuration in the config file. [#34718](https://github.com/ClickHouse/ClickHouse/pull/34718) ([Amos Bird](https://github.com/amosbird)).
* Extract schema only once on table creation and prevent reading from local files/external sources to extract schema on each server startup. [#34684](https://github.com/ClickHouse/ClickHouse/pull/34684) ([Kruglov Pavel](https://github.com/Avogar)).
* Allow specifying argument names for executable UDFs. This is necessary for formats where argument name is part of serialization, like `Native`, `JSONEachRow`. Closes [#34604](https://github.com/ClickHouse/ClickHouse/issues/34604). [#34653](https://github.com/ClickHouse/ClickHouse/pull/34653) ([Maksim Kita](https://github.com/kitaisreal)).
* `MaterializedMySQL` (experimental feature) now supports `materialized_mysql_tables_list` (a comma-separated list of MySQL database tables, which will be replicated by the MaterializedMySQL database engine. Default value: empty list — means all the tables will be replicated), mentioned at [#32977](https://github.com/ClickHouse/ClickHouse/issues/32977). [#34487](https://github.com/ClickHouse/ClickHouse/pull/34487) ([zzsmdfj](https://github.com/zzsmdfj)).
* Improve OpenTelemetry span logs for INSERT operation on distributed table. [#34480](https://github.com/ClickHouse/ClickHouse/pull/34480) ([Frank Chen](https://github.com/FrankChen021)).
* Make the znode `ctime` and `mtime` consistent between servers in ClickHouse Keeper. [#33441](https://github.com/ClickHouse/ClickHouse/pull/33441) ([小路](https://github.com/nicelulu)).
#### Build/Testing/Packaging Improvement
* Package repository is migrated to JFrog Artifactory (**Mikhail f. Shiryaev**).
* Randomize some settings in functional tests, so more possible combinations of settings will be tested. This is yet another fuzzing method to ensure better test coverage. This closes [#32268](https://github.com/ClickHouse/ClickHouse/issues/32268). [#34092](https://github.com/ClickHouse/ClickHouse/pull/34092) ([Kruglov Pavel](https://github.com/Avogar)).
* Drop PVS-Studio from our CI. [#34680](https://github.com/ClickHouse/ClickHouse/pull/34680) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Add an ability to build stripped binaries with CMake. In previous versions it was performed by dh-tools. [#35196](https://github.com/ClickHouse/ClickHouse/pull/35196) ([alesapin](https://github.com/alesapin)).
* Smaller "fat-free" `clickhouse-keeper` build. [#35031](https://github.com/ClickHouse/ClickHouse/pull/35031) ([alesapin](https://github.com/alesapin)).
* Use @robot-clickhouse as an author and committer for PRs like https://github.com/ClickHouse/ClickHouse/pull/34685. [#34793](https://github.com/ClickHouse/ClickHouse/pull/34793) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Limit DWARF version for debug info by 4 max, because our internal stack symbolizer cannot parse DWARF version 5. This makes sense if you compile ClickHouse with clang-15. [#34777](https://github.com/ClickHouse/ClickHouse/pull/34777) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove `clickhouse-test` debian package as unneeded complication. CI use tests from repository and standalone testing via deb package is no longer supported. [#34606](https://github.com/ClickHouse/ClickHouse/pull/34606) ([Ilya Yatsishin](https://github.com/qoega)).
#### Bug Fix (user-visible misbehaviour in official stable or prestable release)
* A fix for HDFS integration: When the inner buffer size is too small, NEED_MORE_INPUT in `HadoopSnappyDecoder` will run multi times (>=3) for one compressed block. This makes the input data be copied into the wrong place in `HadoopSnappyDecoder::buffer`. [#35116](https://github.com/ClickHouse/ClickHouse/pull/35116) ([lgbo](https://github.com/lgbo-ustc)).
* Ignore obsolete grants in ATTACH GRANT statements. This PR fixes [#34815](https://github.com/ClickHouse/ClickHouse/issues/34815). [#34855](https://github.com/ClickHouse/ClickHouse/pull/34855) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix segfault in Postgres database when getting create table query if database was created using named collections. Closes [#35312](https://github.com/ClickHouse/ClickHouse/issues/35312). [#35313](https://github.com/ClickHouse/ClickHouse/pull/35313) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix partial merge join duplicate rows bug, close [#31009](https://github.com/ClickHouse/ClickHouse/issues/31009). [#35311](https://github.com/ClickHouse/ClickHouse/pull/35311) ([Vladimir C](https://github.com/vdimir)).
* Fix possible `Assertion 'position() != working_buffer.end()' failed` while using bzip2 compression with small `max_read_buffer_size` setting value. The bug was found in https://github.com/ClickHouse/ClickHouse/pull/35047. [#35300](https://github.com/ClickHouse/ClickHouse/pull/35300) ([Kruglov Pavel](https://github.com/Avogar)). While using lz4 compression with a small max_read_buffer_size setting value. [#35296](https://github.com/ClickHouse/ClickHouse/pull/35296) ([Kruglov Pavel](https://github.com/Avogar)). While using lzma compression with small `max_read_buffer_size` setting value. [#35295](https://github.com/ClickHouse/ClickHouse/pull/35295) ([Kruglov Pavel](https://github.com/Avogar)). While using `brotli` compression with a small `max_read_buffer_size` setting value. The bug was found in https://github.com/ClickHouse/ClickHouse/pull/35047. [#35281](https://github.com/ClickHouse/ClickHouse/pull/35281) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix possible segfault in `JSONEachRow` schema inference. [#35291](https://github.com/ClickHouse/ClickHouse/pull/35291) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix `CHECK TABLE` query in case when sparse columns are enabled in table. [#35274](https://github.com/ClickHouse/ClickHouse/pull/35274) ([Anton Popov](https://github.com/CurtizJ)).
* Avoid std::terminate in case of exception in reading from remote VFS. [#35257](https://github.com/ClickHouse/ClickHouse/pull/35257) ([Azat Khuzhin](https://github.com/azat)).
* Fix reading port from config, close [#34776](https://github.com/ClickHouse/ClickHouse/issues/34776). [#35193](https://github.com/ClickHouse/ClickHouse/pull/35193) ([Vladimir C](https://github.com/vdimir)).
* Fix error in query with `WITH TOTALS` in case if `HAVING` returned empty result. This fixes [#33711](https://github.com/ClickHouse/ClickHouse/issues/33711). [#35186](https://github.com/ClickHouse/ClickHouse/pull/35186) ([Amos Bird](https://github.com/amosbird)).
* Fix a corner case of `replaceRegexpAll`, close [#35117](https://github.com/ClickHouse/ClickHouse/issues/35117). [#35182](https://github.com/ClickHouse/ClickHouse/pull/35182) ([Vladimir C](https://github.com/vdimir)).
* Schema inference didn't work properly on case of `INSERT INTO FUNCTION s3(...) FROM ...`, it tried to read schema from s3 file instead of from select query. [#35176](https://github.com/ClickHouse/ClickHouse/pull/35176) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix MaterializedPostgreSQL (experimental feature) `table overrides` for partition by, etc. Closes [#35048](https://github.com/ClickHouse/ClickHouse/issues/35048). [#35162](https://github.com/ClickHouse/ClickHouse/pull/35162) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix MaterializedPostgreSQL (experimental feature) adding new table to replication (ATTACH TABLE) after manually removing (DETACH TABLE). Closes [#33800](https://github.com/ClickHouse/ClickHouse/issues/33800). Closes [#34922](https://github.com/ClickHouse/ClickHouse/issues/34922). Closes [#34315](https://github.com/ClickHouse/ClickHouse/issues/34315). [#35158](https://github.com/ClickHouse/ClickHouse/pull/35158) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix partition pruning error when non-monotonic function is used with IN operator. This fixes [#35136](https://github.com/ClickHouse/ClickHouse/issues/35136). [#35146](https://github.com/ClickHouse/ClickHouse/pull/35146) ([Amos Bird](https://github.com/amosbird)).
* Fixed slightly incorrect translation of YAML configs to XML. [#35135](https://github.com/ClickHouse/ClickHouse/pull/35135) ([Miel Donkers](https://github.com/mdonkers)).
* Fix `optimize_skip_unused_shards_rewrite_in` for signed columns and negative values. [#35134](https://github.com/ClickHouse/ClickHouse/pull/35134) ([Azat Khuzhin](https://github.com/azat)).
* The `update_lag` external dictionary configuration option was unusable showing the error message ``Unexpected key `update_lag` in dictionary source configuration``. [#35089](https://github.com/ClickHouse/ClickHouse/pull/35089) ([Jason Chu](https://github.com/1lann)).
* Avoid possible deadlock on server shutdown. [#35081](https://github.com/ClickHouse/ClickHouse/pull/35081) ([Azat Khuzhin](https://github.com/azat)).
* Fix missing alias after function is optimized to a subcolumn when setting `optimize_functions_to_subcolumns` is enabled. Closes [#33798](https://github.com/ClickHouse/ClickHouse/issues/33798). [#35079](https://github.com/ClickHouse/ClickHouse/pull/35079) ([qieqieplus](https://github.com/qieqieplus)).
* Fix reading from `system.asynchronous_inserts` table if there exists asynchronous insert into table function. [#35050](https://github.com/ClickHouse/ClickHouse/pull/35050) ([Anton Popov](https://github.com/CurtizJ)).
* Fix possible exception `Reading for MergeTree family tables must be done with last position boundary` (relevant to operation on remote VFS). Closes [#34979](https://github.com/ClickHouse/ClickHouse/issues/34979). [#35001](https://github.com/ClickHouse/ClickHouse/pull/35001) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix unexpected result when use -State type aggregate function in window frame. [#34999](https://github.com/ClickHouse/ClickHouse/pull/34999) ([metahys](https://github.com/metahys)).
* Fix possible segfault in FileLog (experimental feature). Closes [#30749](https://github.com/ClickHouse/ClickHouse/issues/30749). [#34996](https://github.com/ClickHouse/ClickHouse/pull/34996) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix possible rare error `Cannot push block to port which already has data`. [#34993](https://github.com/ClickHouse/ClickHouse/pull/34993) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix wrong schema inference for unquoted dates in CSV. Closes [#34768](https://github.com/ClickHouse/ClickHouse/issues/34768). [#34961](https://github.com/ClickHouse/ClickHouse/pull/34961) ([Kruglov Pavel](https://github.com/Avogar)).
* Integration with Hive: Fix unexpected result when use `in` in `where` in hive query. [#34945](https://github.com/ClickHouse/ClickHouse/pull/34945) ([lgbo](https://github.com/lgbo-ustc)).
* Avoid busy polling in ClickHouse Keeper while searching for changelog files to delete. [#34931](https://github.com/ClickHouse/ClickHouse/pull/34931) ([Azat Khuzhin](https://github.com/azat)).
* Fix DateTime64 conversion from PostgreSQL. Closes [#33364](https://github.com/ClickHouse/ClickHouse/issues/33364). [#34910](https://github.com/ClickHouse/ClickHouse/pull/34910) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix possible "Part directory doesn't exist" during `INSERT` into MergeTree table backed by VFS over s3. [#34876](https://github.com/ClickHouse/ClickHouse/pull/34876) ([Azat Khuzhin](https://github.com/azat)).
* Support DDLs like CREATE USER to be executed on cross replicated cluster. [#34860](https://github.com/ClickHouse/ClickHouse/pull/34860) ([Jianmei Zhang](https://github.com/zhangjmruc)).
* Fix bugs for multiple columns group by in `WindowView` (experimental feature). [#34859](https://github.com/ClickHouse/ClickHouse/pull/34859) ([vxider](https://github.com/Vxider)).
* Fix possible failures in S2 functions when queries contain const columns. [#34745](https://github.com/ClickHouse/ClickHouse/pull/34745) ([Bharat Nallan](https://github.com/bharatnc)).
* Fix bug for H3 funcs containing const columns which cause queries to fail. [#34743](https://github.com/ClickHouse/ClickHouse/pull/34743) ([Bharat Nallan](https://github.com/bharatnc)).
* Fix `No such file or directory` with enabled `fsync_part_directory` and vertical merge. [#34739](https://github.com/ClickHouse/ClickHouse/pull/34739) ([Azat Khuzhin](https://github.com/azat)).
* Fix serialization/printing for system queries `RELOAD MODEL`, `RELOAD FUNCTION`, `RESTART DISK` when used `ON CLUSTER`. Closes [#34514](https://github.com/ClickHouse/ClickHouse/issues/34514). [#34696](https://github.com/ClickHouse/ClickHouse/pull/34696) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix `allow_experimental_projection_optimization` with `enable_global_with_statement` (before it may lead to `Stack size too large` error in case of multiple expressions in `WITH` clause, and also it executes scalar subqueries again and again, so not it will be more optimal). [#34650](https://github.com/ClickHouse/ClickHouse/pull/34650) ([Azat Khuzhin](https://github.com/azat)).
* Stop to select part for mutate when the other replica has already updated the transaction log for `ReplatedMergeTree` engine. [#34633](https://github.com/ClickHouse/ClickHouse/pull/34633) ([Jianmei Zhang](https://github.com/zhangjmruc)).
* Fix incorrect result of trivial count query when part movement feature is used [#34089](https://github.com/ClickHouse/ClickHouse/issues/34089). [#34385](https://github.com/ClickHouse/ClickHouse/pull/34385) ([nvartolomei](https://github.com/nvartolomei)).
* Fix inconsistency of `max_query_size` limitation in distributed subqueries. [#34078](https://github.com/ClickHouse/ClickHouse/pull/34078) ([Chao Ma](https://github.com/godliness)).
### ClickHouse release v22.2, 2022-02-17
#### Upgrade Notes

View File

@ -55,7 +55,7 @@ Internal coordination settings are located in `<keeper_server>.<coordination_set
- `auto_forwarding` — Allow to forward write requests from followers to the leader (default: true).
- `shutdown_timeout` — Wait to finish internal connections and shutdown (ms) (default: 5000).
- `startup_timeout` — If the server doesn't connect to other quorum participants in the specified timeout it will terminate (ms) (default: 30000).
- `four_letter_word_white_list` — White list of 4lw commands (default: "conf,cons,crst,envi,ruok,srst,srvr,stat,wchc,wchs,dirs,mntr,isro").
- `four_letter_word_white_list` — White list of 4lw commands (default: "conf,cons,crst,envi,ruok,srst,srvr,stat,wchs,dirs,mntr,isro").
Quorum configuration is located in `<keeper_server>.<raft_configuration>` section and contain servers description.
@ -121,7 +121,7 @@ clickhouse keeper --config /etc/your_path_to_config/config.xml
ClickHouse Keeper also provides 4lw commands which are almost the same with Zookeeper. Each command is composed of four letters such as `mntr`, `stat` etc. There are some more interesting commands: `stat` gives some general information about the server and connected clients, while `srvr` and `cons` give extended details on server and connections respectively.
The 4lw commands has a white list configuration `four_letter_word_white_list` which has default value "conf,cons,crst,envi,ruok,srst,srvr,stat,wchc,wchs,dirs,mntr,isro".
The 4lw commands has a white list configuration `four_letter_word_white_list` which has default value "conf,cons,crst,envi,ruok,srst,srvr,stat,wchs,dirs,mntr,isro".
You can issue the commands to ClickHouse Keeper via telnet or nc, at the client port.

View File

@ -3290,6 +3290,19 @@ Possible values:
Default value: `16`.
## max_insert_delayed_streams_for_parallel_write {#max-insert-delayed-streams-for-parallel-write}
The maximum number of streams (columns) to delay final part flush.
It makes difference only if underlying storage supports parallel write (i.e. S3), otherwise it will not give any benefit.
Possible values:
- Positive integer.
- 0 or 1 — Disabled.
Default value: `1000` for S3 and `0` otherwise.
## opentelemetry_start_trace_probability {#opentelemetry-start-trace-probability}
Sets the probability that the ClickHouse can start a trace for executed queries (if no parent [trace context](https://www.w3.org/TR/trace-context/) is supplied).

View File

@ -1026,4 +1026,185 @@ Result:
│ 41162 │
└─────────────┘
```
## h3PointDistM {#h3pointdistm}
Returns the "great circle" or "haversine" distance between pairs of GeoCoord points (latitude/longitude) pairs in meters.
**Syntax**
``` sql
h3PointDistM(lat1, lon1, lat2, lon2)
```
**Arguments**
- `lat1`, `lon1` — Latitude and Longitude of point1 in degrees. Type: [Float64](../../../sql-reference/data-types/float.md).
- `lat2`, `lon2` — Latitude and Longitude of point2 in degrees. Type: [Float64](../../../sql-reference/data-types/float.md).
**Returned values**
- Haversine or great circle distance in meters.
Type: [Float64](../../../sql-reference/data-types/float.md).
**Example**
Query:
``` sql
select h3PointDistM(-10.0 ,0.0, 10.0, 0.0) as h3PointDistM;
```
Result:
``` text
┌──────h3PointDistM─┐
│ 2223901.039504589 │
└───────────────────┘
```
## h3PointDistKm {#h3pointdistkm}
Returns the "great circle" or "haversine" distance between pairs of GeoCoord points (latitude/longitude) pairs in kilometers.
**Syntax**
``` sql
h3PointDistKm(lat1, lon1, lat2, lon2)
```
**Arguments**
- `lat1`, `lon1` — Latitude and Longitude of point1 in degrees. Type: [Float64](../../../sql-reference/data-types/float.md).
- `lat2`, `lon2` — Latitude and Longitude of point2 in degrees. Type: [Float64](../../../sql-reference/data-types/float.md).
**Returned values**
- Haversine or great circle distance in kilometers.
Type: [Float64](../../../sql-reference/data-types/float.md).
**Example**
Query:
``` sql
select h3PointDistKm(-10.0 ,0.0, 10.0, 0.0) as h3PointDistKm;
```
Result:
``` text
┌─────h3PointDistKm─┐
│ 2223.901039504589 │
└───────────────────┘
```
## h3PointDistRads {#h3pointdistrads}
Returns the "great circle" or "haversine" distance between pairs of GeoCoord points (latitude/longitude) pairs in radians.
**Syntax**
``` sql
h3PointDistRads(lat1, lon1, lat2, lon2)
```
**Arguments**
- `lat1`, `lon1` — Latitude and Longitude of point1 in degrees. Type: [Float64](../../../sql-reference/data-types/float.md).
- `lat2`, `lon2` — Latitude and Longitude of point2 in degrees. Type: [Float64](../../../sql-reference/data-types/float.md).
**Returned values**
- Haversine or great circle distance in radians.
Type: [Float64](../../../sql-reference/data-types/float.md).
**Example**
Query:
``` sql
select h3PointDistRads(-10.0 ,0.0, 10.0, 0.0) as h3PointDistRads;
```
Result:
``` text
┌────h3PointDistRads─┐
│ 0.3490658503988659 │
└────────────────────┘
```
## h3GetRes0Indexes {#h3getres0indexes}
Returns an array of all the resolution 0 H3 indexes.
**Syntax**
``` sql
h3GetRes0Indexes()
```
**Returned values**
- Array of all the resolution 0 H3 indexes.
Type: [Array](../../../sql-reference/data-types/array.md)([UInt64](../../../sql-reference/data-types/int-uint.md)).
**Example**
Query:
``` sql
select h3GetRes0Indexes as indexes ;
```
Result:
``` text
┌─indexes─────────────────────────────────────┐
│ [576495936675512319,576531121047601151,....]│
└─────────────────────────────────────────────┘
```
## h3GetPentagonIndexes {#h3getpentagonindexes}
Returns all the pentagon H3 indexes at the specified resolution.
**Syntax**
``` sql
h3GetPentagonIndexes(resolution)
```
**Parameter**
- `resolution` — Index resolution. Range: `[0, 15]`. Type: [UInt8](../../../sql-reference/data-types/int-uint.md).
**Returned value**
- Array of all pentagon H3 indexes.
Type: [Array](../../../sql-reference/data-types/array.md)([UInt64](../../../sql-reference/data-types/int-uint.md)).
**Example**
Query:
``` sql
SELECT h3GetPentagonIndexes(3) AS indexes;
```
Result:
``` text
┌─indexes────────────────────────────────────────────────────────┐
│ [590112357393367039,590464201114255359,590816044835143679,...] │
└────────────────────────────────────────────────────────────────┘
```
[Original article](https://clickhouse.com/docs/en/sql-reference/functions/geo/h3) <!--hide-->

View File

@ -13,10 +13,18 @@ Alias: `INET_NTOA`.
## IPv4StringToNum(s) {#ipv4stringtonums}
The reverse function of IPv4NumToString. If the IPv4 address has an invalid format, it returns 0.
The reverse function of IPv4NumToString. If the IPv4 address has an invalid format, it throws exception.
Alias: `INET_ATON`.
## IPv4StringToNumOrDefault(s) {#ipv4stringtonums}
Same as `IPv4StringToNum`, but if the IPv4 address has an invalid format, it returns 0.
## IPv4StringToNumOrNull(s) {#ipv4stringtonums}
Same as `IPv4StringToNum`, but if the IPv4 address has an invalid format, it returns null.
## IPv4NumToStringClassC(num) {#ipv4numtostringclasscnum}
Similar to IPv4NumToString, but using xxx instead of the last octet.
@ -123,7 +131,7 @@ LIMIT 10
## IPv6StringToNum {#ipv6stringtonums}
The reverse function of [IPv6NumToString](#ipv6numtostringx). If the IPv6 address has an invalid format, it returns a string of null bytes.
The reverse function of [IPv6NumToString](#ipv6numtostringx). If the IPv6 address has an invalid format, it throws exception.
If the input string contains a valid IPv4 address, returns its IPv6 equivalent.
HEX can be uppercase or lowercase.
@ -168,6 +176,14 @@ Result:
- [cutIPv6](#cutipv6x-bytestocutforipv6-bytestocutforipv4).
## IPv6StringToNumOrDefault(s) {#ipv6stringtonums}
Same as `IPv6StringToNum`, but if the IPv6 address has an invalid format, it returns 0.
## IPv6StringToNumOrNull(s) {#ipv6stringtonums}
Same as `IPv6StringToNum`, but if the IPv6 address has an invalid format, it returns null.
## IPv4ToIPv6(x) {#ipv4toipv6x}
Takes a `UInt32` number. Interprets it as an IPv4 address in [big endian](https://en.wikipedia.org/wiki/Endianness). Returns a `FixedString(16)` value containing the IPv6 address in binary format. Examples:
@ -261,6 +277,14 @@ SELECT
└───────────────────────────────────┴──────────────────────────┘
```
## toIPv4OrDefault(string) {#toipv4ordefaultstring}
Same as `toIPv4`, but if the IPv4 address has an invalid format, it returns 0.
## toIPv4OrNull(string) {#toipv4ornullstring}
Same as `toIPv4`, but if the IPv4 address has an invalid format, it returns null.
## toIPv6 {#toipv6string}
Converts a string form of IPv6 address to [IPv6](../../sql-reference/data-types/domains/ipv6.md) type. If the IPv6 address has an invalid format, returns an empty value.
@ -317,6 +341,14 @@ Result:
└─────────────────────┘
```
## IPv6StringToNumOrDefault(s) {#toipv6ordefaultstring}
Same as `toIPv6`, but if the IPv6 address has an invalid format, it returns 0.
## IPv6StringToNumOrNull(s) {#toipv6ornullstring}
Same as `toIPv6`, but if the IPv6 address has an invalid format, it returns null.
## isIPv4String {#isipv4string}
Determines whether the input string is an IPv4 address or not. If `string` is IPv6 address returns `0`.

View File

@ -1,4 +1,4 @@
Babel==2.8.0
Babel==2.9.1
backports-abc==0.5
backports.functools-lru-cache==1.6.1
beautifulsoup4==4.9.1
@ -10,22 +10,22 @@ cssmin==0.2.0
future==0.18.2
htmlmin==0.1.12
idna==2.10
Jinja2>=2.11.3
Jinja2>=3.0.3
jinja2-highlight==0.6.1
jsmin==3.0.0
livereload==2.6.2
livereload==2.6.3
Markdown==3.3.2
MarkupSafe==1.1.1
MarkupSafe==2.1.0
mkdocs==1.1.2
mkdocs-htmlproofer-plugin==0.0.3
mkdocs-macros-plugin==0.4.20
nltk==3.5
nltk==3.7
nose==1.3.7
protobuf==3.14.0
numpy==1.21.2
pymdown-extensions==8.0
python-slugify==4.0.1
PyYAML==5.4.1
PyYAML==6.0
repackage==0.7.3
requests==2.25.1
singledispatch==3.4.0.3
@ -34,5 +34,6 @@ soupsieve==2.0.1
termcolor==1.1.0
tornado==6.1
Unidecode==1.1.1
urllib3>=1.26.5
Pygments>=2.7.4
urllib3>=1.26.8
Pygments>=2.11.2

View File

@ -1,10 +1,5 @@
---
machine_translated: true
machine_translated_rev: 5decc73b5dc60054f19087d3690c4eb99446a6c3
---
# system.numbers_mt {#system-numbers-mt}
# 系统。numbers_mt {#system-numbers-mt}
一样的 [系统。数字](../../operations/system-tables/numbers.md) 但读取是并行的。 这些数字可以以任何顺序返回。
与[system.numbers](../../operations/system-tables/numbers.md)相似,但读取是并行的。 这些数字可以以任何顺序返回。
用于测试。

View File

@ -38,7 +38,8 @@ struct AggregateFunctionWithProperties
AggregateFunctionWithProperties(const AggregateFunctionWithProperties &) = default;
AggregateFunctionWithProperties & operator = (const AggregateFunctionWithProperties &) = default;
template <typename Creator, std::enable_if_t<!std::is_same_v<Creator, AggregateFunctionWithProperties>> * = nullptr>
template <typename Creator>
requires (!std::is_same_v<Creator, AggregateFunctionWithProperties>)
AggregateFunctionWithProperties(Creator creator_, AggregateFunctionProperties properties_ = {}) /// NOLINT
: creator(std::forward<Creator>(creator_)), properties(std::move(properties_))
{

View File

@ -569,6 +569,14 @@ if (ENABLE_TESTS)
clickhouse_common_zookeeper
string_utils)
if (TARGET ch_contrib::simdjson)
target_link_libraries(unit_tests_dbms PRIVATE ch_contrib::simdjson)
endif()
if(TARGET ch_contrib::rapidjson)
target_include_directories(unit_tests_dbms PRIVATE ch_contrib::rapidjson)
endif()
if (TARGET ch_contrib::yaml_cpp)
target_link_libraries(unit_tests_dbms PRIVATE ch_contrib::yaml_cpp)
endif()

View File

@ -1092,10 +1092,11 @@ void ClientBase::sendData(Block & sample, const ColumnsDescription & columns_des
try
{
auto metadata = storage->getInMemoryMetadataPtr();
sendDataFromPipe(
storage->read(
sample.getNames(),
storage->getInMemoryMetadataPtr(),
storage->getStorageSnapshot(metadata),
query_info,
global_context,
{},

View File

@ -297,7 +297,7 @@ ColumnPtr ColumnAggregateFunction::filter(const Filter & filter, ssize_t result_
{
size_t size = data.size();
if (size != filter.size())
throw Exception("Size of filter doesn't match size of column.", ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH);
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH, "Size of filter ({}) doesn't match size of column ({})", filter.size(), size);
if (size == 0)
return cloneEmpty();

View File

@ -608,7 +608,7 @@ ColumnPtr ColumnArray::filterString(const Filter & filt, ssize_t result_size_hin
{
size_t col_size = getOffsets().size();
if (col_size != filt.size())
throw Exception("Size of filter doesn't match size of column.", ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH);
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH, "Size of filter ({}) doesn't match size of column ({})", filt.size(), col_size);
if (0 == col_size)
return ColumnArray::create(data);
@ -676,7 +676,7 @@ ColumnPtr ColumnArray::filterGeneric(const Filter & filt, ssize_t result_size_hi
{
size_t size = getOffsets().size();
if (size != filt.size())
throw Exception("Size of filter doesn't match size of column.", ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH);
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH, "Size of filter ({}) doesn't match size of column ({})", filt.size(), size);
if (size == 0)
return ColumnArray::create(data);
@ -1189,4 +1189,12 @@ void ColumnArray::gather(ColumnGathererStream & gatherer)
gatherer.gather(*this);
}
size_t ColumnArray::getNumberOfDimensions() const
{
const auto * nested_array = checkAndGetColumn<ColumnArray>(*data);
if (!nested_array)
return 1;
return 1 + nested_array->getNumberOfDimensions(); /// Every modern C++ compiler optimizes tail recursion.
}
}

View File

@ -169,6 +169,8 @@ public:
bool isCollationSupported() const override { return getData().isCollationSupported(); }
size_t getNumberOfDimensions() const;
private:
WrappedPtr data;
WrappedPtr offsets;

View File

@ -266,7 +266,7 @@ ColumnPtr ColumnDecimal<T>::filter(const IColumn::Filter & filt, ssize_t result_
{
size_t size = data.size();
if (size != filt.size())
throw Exception("Size of filter doesn't match size of column.", ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH);
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH, "Size of filter ({}) doesn't match size of column ({})", filt.size(), size);
auto res = this->create(0, scale);
Container & res_data = res->getData();

View File

@ -207,7 +207,7 @@ ColumnPtr ColumnFixedString::filter(const IColumn::Filter & filt, ssize_t result
{
size_t col_size = size();
if (col_size != filt.size())
throw Exception("Size of filter doesn't match size of column.", ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH);
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH, "Size of filter ({}) doesn't match size of column ({})", filt.size(), col_size);
auto res = ColumnFixedString::create(n);

View File

@ -144,15 +144,15 @@ public:
double getRatioOfDefaultRows(double sample_ratio) const override
{
return null_map->getRatioOfDefaultRows(sample_ratio);
return getRatioOfDefaultRowsImpl<ColumnNullable>(sample_ratio);
}
void getIndicesOfNonDefaultRows(Offsets & indices, size_t from, size_t limit) const override
{
null_map->getIndicesOfNonDefaultRows(indices, from, limit);
getIndicesOfNonDefaultRowsImpl<ColumnNullable>(indices, from, limit);
}
ColumnPtr createWithOffsets(const IColumn::Offsets & offsets, const Field & default_field, size_t total_rows, size_t shift) const override;
ColumnPtr createWithOffsets(const Offsets & offsets, const Field & default_field, size_t total_rows, size_t shift) const override;
bool isNullable() const override { return true; }
bool isFixedAndContiguous() const override { return false; }

View File

@ -0,0 +1,780 @@
#include <Core/Field.h>
#include <Columns/ColumnObject.h>
#include <Columns/ColumnsNumber.h>
#include <Columns/ColumnArray.h>
#include <DataTypes/ObjectUtils.h>
#include <DataTypes/getLeastSupertype.h>
#include <DataTypes/DataTypeNothing.h>
#include <DataTypes/DataTypeNullable.h>
#include <DataTypes/DataTypeFactory.h>
#include <DataTypes/NestedUtils.h>
#include <Interpreters/castColumn.h>
#include <Interpreters/convertFieldToType.h>
#include <Common/HashTable/HashSet.h>
namespace DB
{
namespace ErrorCodes
{
extern const int LOGICAL_ERROR;
extern const int ILLEGAL_COLUMN;
extern const int DUPLICATE_COLUMN;
extern const int NUMBER_OF_DIMENSIONS_MISMATHED;
extern const int NOT_IMPLEMENTED;
extern const int SIZES_OF_COLUMNS_DOESNT_MATCH;
}
namespace
{
/// Recreates column with default scalar values and keeps sizes of arrays.
ColumnPtr recreateColumnWithDefaultValues(
const ColumnPtr & column, const DataTypePtr & scalar_type, size_t num_dimensions)
{
const auto * column_array = checkAndGetColumn<ColumnArray>(column.get());
if (column_array && num_dimensions)
{
return ColumnArray::create(
recreateColumnWithDefaultValues(
column_array->getDataPtr(), scalar_type, num_dimensions - 1),
IColumn::mutate(column_array->getOffsetsPtr()));
}
return createArrayOfType(scalar_type, num_dimensions)->createColumn()->cloneResized(column->size());
}
/// Replaces NULL fields to given field or empty array.
class FieldVisitorReplaceNull : public StaticVisitor<Field>
{
public:
explicit FieldVisitorReplaceNull(
const Field & replacement_, size_t num_dimensions_)
: replacement(replacement_)
, num_dimensions(num_dimensions_)
{
}
Field operator()(const Null &) const
{
return num_dimensions
? createEmptyArrayField(num_dimensions)
: replacement;
}
Field operator()(const Array & x) const
{
assert(num_dimensions > 0);
const size_t size = x.size();
Array res(size);
for (size_t i = 0; i < size; ++i)
res[i] = applyVisitor(FieldVisitorReplaceNull(replacement, num_dimensions - 1), x[i]);
return res;
}
template <typename T>
Field operator()(const T & x) const { return x; }
private:
const Field & replacement;
size_t num_dimensions;
};
/// Calculates number of dimensions in array field.
/// Returns 0 for scalar fields.
class FieldVisitorToNumberOfDimensions : public StaticVisitor<size_t>
{
public:
size_t operator()(const Array & x) const
{
const size_t size = x.size();
std::optional<size_t> dimensions;
for (size_t i = 0; i < size; ++i)
{
/// Do not count Nulls, because they will be replaced by default
/// values with proper number of dimensions.
if (x[i].isNull())
continue;
size_t current_dimensions = applyVisitor(*this, x[i]);
if (!dimensions)
dimensions = current_dimensions;
else if (current_dimensions != *dimensions)
throw Exception(ErrorCodes::NUMBER_OF_DIMENSIONS_MISMATHED,
"Number of dimensions mismatched among array elements");
}
return 1 + dimensions.value_or(0);
}
template <typename T>
size_t operator()(const T &) const { return 0; }
};
/// Visitor that allows to get type of scalar field
/// or least common type of scalars in array.
/// More optimized version of FieldToDataType.
class FieldVisitorToScalarType : public StaticVisitor<>
{
public:
using FieldType = Field::Types::Which;
void operator()(const Array & x)
{
size_t size = x.size();
for (size_t i = 0; i < size; ++i)
applyVisitor(*this, x[i]);
}
void operator()(const UInt64 & x)
{
field_types.insert(FieldType::UInt64);
if (x <= std::numeric_limits<UInt8>::max())
type_indexes.insert(TypeIndex::UInt8);
else if (x <= std::numeric_limits<UInt16>::max())
type_indexes.insert(TypeIndex::UInt16);
else if (x <= std::numeric_limits<UInt32>::max())
type_indexes.insert(TypeIndex::UInt32);
else
type_indexes.insert(TypeIndex::UInt64);
}
void operator()(const Int64 & x)
{
field_types.insert(FieldType::Int64);
if (x <= std::numeric_limits<Int8>::max() && x >= std::numeric_limits<Int8>::min())
type_indexes.insert(TypeIndex::Int8);
else if (x <= std::numeric_limits<Int16>::max() && x >= std::numeric_limits<Int16>::min())
type_indexes.insert(TypeIndex::Int16);
else if (x <= std::numeric_limits<Int32>::max() && x >= std::numeric_limits<Int32>::min())
type_indexes.insert(TypeIndex::Int32);
else
type_indexes.insert(TypeIndex::Int64);
}
void operator()(const Null &)
{
have_nulls = true;
}
template <typename T>
void operator()(const T &)
{
field_types.insert(Field::TypeToEnum<NearestFieldType<T>>::value);
type_indexes.insert(TypeToTypeIndex<NearestFieldType<T>>);
}
DataTypePtr getScalarType() const { return getLeastSupertype(type_indexes, true); }
bool haveNulls() const { return have_nulls; }
bool needConvertField() const { return field_types.size() > 1; }
private:
TypeIndexSet type_indexes;
std::unordered_set<FieldType> field_types;
bool have_nulls = false;
};
}
FieldInfo getFieldInfo(const Field & field)
{
FieldVisitorToScalarType to_scalar_type_visitor;
applyVisitor(to_scalar_type_visitor, field);
return
{
to_scalar_type_visitor.getScalarType(),
to_scalar_type_visitor.haveNulls(),
to_scalar_type_visitor.needConvertField(),
applyVisitor(FieldVisitorToNumberOfDimensions(), field),
};
}
ColumnObject::Subcolumn::Subcolumn(MutableColumnPtr && data_, bool is_nullable_)
: least_common_type(getDataTypeByColumn(*data_))
, is_nullable(is_nullable_)
{
data.push_back(std::move(data_));
}
ColumnObject::Subcolumn::Subcolumn(
size_t size_, bool is_nullable_)
: least_common_type(std::make_shared<DataTypeNothing>())
, is_nullable(is_nullable_)
, num_of_defaults_in_prefix(size_)
{
}
size_t ColumnObject::Subcolumn::Subcolumn::size() const
{
size_t res = num_of_defaults_in_prefix;
for (const auto & part : data)
res += part->size();
return res;
}
size_t ColumnObject::Subcolumn::Subcolumn::byteSize() const
{
size_t res = 0;
for (const auto & part : data)
res += part->byteSize();
return res;
}
size_t ColumnObject::Subcolumn::Subcolumn::allocatedBytes() const
{
size_t res = 0;
for (const auto & part : data)
res += part->allocatedBytes();
return res;
}
void ColumnObject::Subcolumn::checkTypes() const
{
DataTypes prefix_types;
prefix_types.reserve(data.size());
for (size_t i = 0; i < data.size(); ++i)
{
auto current_type = getDataTypeByColumn(*data[i]);
prefix_types.push_back(current_type);
auto prefix_common_type = getLeastSupertype(prefix_types);
if (!prefix_common_type->equals(*current_type))
throw Exception(ErrorCodes::LOGICAL_ERROR,
"Data type {} of column at position {} cannot represent all columns from i-th prefix",
current_type->getName(), i);
}
}
void ColumnObject::Subcolumn::insert(Field field)
{
auto info = getFieldInfo(field);
insert(std::move(field), std::move(info));
}
void ColumnObject::Subcolumn::insert(Field field, FieldInfo info)
{
auto base_type = info.scalar_type;
if (isNothing(base_type) && info.num_dimensions == 0)
{
insertDefault();
return;
}
auto column_dim = getNumberOfDimensions(*least_common_type);
auto value_dim = info.num_dimensions;
if (isNothing(least_common_type))
column_dim = value_dim;
if (field.isNull())
value_dim = column_dim;
if (value_dim != column_dim)
throw Exception(ErrorCodes::NUMBER_OF_DIMENSIONS_MISMATHED,
"Dimension of types mismatched between inserted value and column. "
"Dimension of value: {}. Dimension of column: {}",
value_dim, column_dim);
if (is_nullable)
base_type = makeNullable(base_type);
if (!is_nullable && info.have_nulls)
field = applyVisitor(FieldVisitorReplaceNull(base_type->getDefault(), value_dim), std::move(field));
auto value_type = createArrayOfType(base_type, value_dim);
bool type_changed = false;
if (data.empty())
{
data.push_back(value_type->createColumn());
least_common_type = value_type;
}
else if (!least_common_type->equals(*value_type))
{
value_type = getLeastSupertype(DataTypes{value_type, least_common_type}, true);
type_changed = true;
if (!least_common_type->equals(*value_type))
{
data.push_back(value_type->createColumn());
least_common_type = value_type;
}
}
if (type_changed || info.need_convert)
field = convertFieldToTypeOrThrow(field, *value_type);
data.back()->insert(field);
}
void ColumnObject::Subcolumn::insertRangeFrom(const Subcolumn & src, size_t start, size_t length)
{
assert(src.isFinalized());
const auto & src_column = src.data.back();
const auto & src_type = src.least_common_type;
if (data.empty())
{
least_common_type = src_type;
data.push_back(src_type->createColumn());
data.back()->insertRangeFrom(*src_column, start, length);
}
else if (least_common_type->equals(*src_type))
{
data.back()->insertRangeFrom(*src_column, start, length);
}
else
{
auto new_least_common_type = getLeastSupertype(DataTypes{least_common_type, src_type}, true);
auto casted_column = castColumn({src_column, src_type, ""}, new_least_common_type);
if (!least_common_type->equals(*new_least_common_type))
{
least_common_type = new_least_common_type;
data.push_back(least_common_type->createColumn());
}
data.back()->insertRangeFrom(*casted_column, start, length);
}
}
void ColumnObject::Subcolumn::finalize()
{
if (isFinalized() || data.empty())
return;
const auto & to_type = least_common_type;
auto result_column = to_type->createColumn();
if (num_of_defaults_in_prefix)
result_column->insertManyDefaults(num_of_defaults_in_prefix);
for (auto & part : data)
{
auto from_type = getDataTypeByColumn(*part);
size_t part_size = part->size();
if (!from_type->equals(*to_type))
{
auto offsets = ColumnUInt64::create();
auto & offsets_data = offsets->getData();
/// We need to convert only non-default values and then recreate column
/// with default value of new type, because default values (which represents misses in data)
/// may be inconsistent between types (e.g "0" in UInt64 and empty string in String).
part->getIndicesOfNonDefaultRows(offsets_data, 0, part_size);
if (offsets->size() == part_size)
{
part = castColumn({part, from_type, ""}, to_type);
}
else
{
auto values = part->index(*offsets, offsets->size());
values = castColumn({values, from_type, ""}, to_type);
part = values->createWithOffsets(offsets_data, to_type->getDefault(), part_size, /*shift=*/ 0);
}
}
result_column->insertRangeFrom(*part, 0, part_size);
}
data = { std::move(result_column) };
num_of_defaults_in_prefix = 0;
}
void ColumnObject::Subcolumn::insertDefault()
{
if (data.empty())
++num_of_defaults_in_prefix;
else
data.back()->insertDefault();
}
void ColumnObject::Subcolumn::insertManyDefaults(size_t length)
{
if (data.empty())
num_of_defaults_in_prefix += length;
else
data.back()->insertManyDefaults(length);
}
void ColumnObject::Subcolumn::popBack(size_t n)
{
assert(n <= size());
size_t num_removed = 0;
for (auto it = data.rbegin(); it != data.rend(); ++it)
{
if (n == 0)
break;
auto & column = *it;
if (n < column->size())
{
column->popBack(n);
n = 0;
}
else
{
++num_removed;
n -= column->size();
}
}
data.resize(data.size() - num_removed);
num_of_defaults_in_prefix -= n;
}
Field ColumnObject::Subcolumn::getLastField() const
{
if (data.empty())
return Field();
const auto & last_part = data.back();
assert(!last_part->empty());
return (*last_part)[last_part->size() - 1];
}
ColumnObject::Subcolumn ColumnObject::Subcolumn::recreateWithDefaultValues(const FieldInfo & field_info) const
{
auto scalar_type = field_info.scalar_type;
if (is_nullable)
scalar_type = makeNullable(scalar_type);
Subcolumn new_subcolumn;
new_subcolumn.least_common_type = createArrayOfType(scalar_type, field_info.num_dimensions);
new_subcolumn.is_nullable = is_nullable;
new_subcolumn.num_of_defaults_in_prefix = num_of_defaults_in_prefix;
new_subcolumn.data.reserve(data.size());
for (const auto & part : data)
new_subcolumn.data.push_back(recreateColumnWithDefaultValues(
part, scalar_type, field_info.num_dimensions));
return new_subcolumn;
}
IColumn & ColumnObject::Subcolumn::getFinalizedColumn()
{
assert(isFinalized());
return *data[0];
}
const IColumn & ColumnObject::Subcolumn::getFinalizedColumn() const
{
assert(isFinalized());
return *data[0];
}
const ColumnPtr & ColumnObject::Subcolumn::getFinalizedColumnPtr() const
{
assert(isFinalized());
return data[0];
}
ColumnObject::ColumnObject(bool is_nullable_)
: is_nullable(is_nullable_)
, num_rows(0)
{
}
ColumnObject::ColumnObject(SubcolumnsTree && subcolumns_, bool is_nullable_)
: is_nullable(is_nullable_)
, subcolumns(std::move(subcolumns_))
, num_rows(subcolumns.empty() ? 0 : (*subcolumns.begin())->data.size())
{
checkConsistency();
}
void ColumnObject::checkConsistency() const
{
if (subcolumns.empty())
return;
for (const auto & leaf : subcolumns)
{
if (num_rows != leaf->data.size())
{
throw Exception(ErrorCodes::LOGICAL_ERROR, "Sizes of subcolumns are inconsistent in ColumnObject."
" Subcolumn '{}' has {} rows, but expected size is {}",
leaf->path.getPath(), leaf->data.size(), num_rows);
}
}
}
size_t ColumnObject::size() const
{
#ifndef NDEBUG
checkConsistency();
#endif
return num_rows;
}
MutableColumnPtr ColumnObject::cloneResized(size_t new_size) const
{
/// cloneResized with new_size == 0 is used for cloneEmpty().
if (new_size != 0)
throw Exception(ErrorCodes::NOT_IMPLEMENTED,
"ColumnObject doesn't support resize to non-zero length");
return ColumnObject::create(is_nullable);
}
size_t ColumnObject::byteSize() const
{
size_t res = 0;
for (const auto & entry : subcolumns)
res += entry->data.byteSize();
return res;
}
size_t ColumnObject::allocatedBytes() const
{
size_t res = 0;
for (const auto & entry : subcolumns)
res += entry->data.allocatedBytes();
return res;
}
void ColumnObject::forEachSubcolumn(ColumnCallback callback)
{
if (!isFinalized())
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot iterate over non-finalized ColumnObject");
for (auto & entry : subcolumns)
callback(entry->data.data.back());
}
void ColumnObject::insert(const Field & field)
{
const auto & object = field.get<const Object &>();
HashSet<StringRef, StringRefHash> inserted;
size_t old_size = size();
for (const auto & [key_str, value] : object)
{
PathInData key(key_str);
inserted.insert(key_str);
if (!hasSubcolumn(key))
addSubcolumn(key, old_size);
auto & subcolumn = getSubcolumn(key);
subcolumn.insert(value);
}
for (auto & entry : subcolumns)
if (!inserted.has(entry->path.getPath()))
entry->data.insertDefault();
++num_rows;
}
void ColumnObject::insertDefault()
{
for (auto & entry : subcolumns)
entry->data.insertDefault();
++num_rows;
}
Field ColumnObject::operator[](size_t n) const
{
if (!isFinalized())
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot get Field from non-finalized ColumnObject");
Object object;
for (const auto & entry : subcolumns)
object[entry->path.getPath()] = (*entry->data.data.back())[n];
return object;
}
void ColumnObject::get(size_t n, Field & res) const
{
if (!isFinalized())
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot get Field from non-finalized ColumnObject");
auto & object = res.get<Object &>();
for (const auto & entry : subcolumns)
{
auto it = object.try_emplace(entry->path.getPath()).first;
entry->data.data.back()->get(n, it->second);
}
}
void ColumnObject::insertRangeFrom(const IColumn & src, size_t start, size_t length)
{
const auto & src_object = assert_cast<const ColumnObject &>(src);
for (auto & entry : subcolumns)
{
if (src_object.hasSubcolumn(entry->path))
entry->data.insertRangeFrom(src_object.getSubcolumn(entry->path), start, length);
else
entry->data.insertManyDefaults(length);
}
num_rows += length;
finalize();
}
ColumnPtr ColumnObject::replicate(const Offsets & offsets) const
{
if (!isFinalized())
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot replicate non-finalized ColumnObject");
auto res_column = ColumnObject::create(is_nullable);
for (const auto & entry : subcolumns)
{
auto replicated_data = entry->data.data.back()->replicate(offsets)->assumeMutable();
res_column->addSubcolumn(entry->path, std::move(replicated_data));
}
return res_column;
}
void ColumnObject::popBack(size_t length)
{
for (auto & entry : subcolumns)
entry->data.popBack(length);
num_rows -= length;
}
const ColumnObject::Subcolumn & ColumnObject::getSubcolumn(const PathInData & key) const
{
if (const auto * node = subcolumns.findLeaf(key))
return node->data;
throw Exception(ErrorCodes::ILLEGAL_COLUMN, "There is no subcolumn {} in ColumnObject", key.getPath());
}
ColumnObject::Subcolumn & ColumnObject::getSubcolumn(const PathInData & key)
{
if (const auto * node = subcolumns.findLeaf(key))
return const_cast<SubcolumnsTree::Node *>(node)->data;
throw Exception(ErrorCodes::ILLEGAL_COLUMN, "There is no subcolumn {} in ColumnObject", key.getPath());
}
bool ColumnObject::hasSubcolumn(const PathInData & key) const
{
return subcolumns.findLeaf(key) != nullptr;
}
void ColumnObject::addSubcolumn(const PathInData & key, MutableColumnPtr && subcolumn)
{
size_t new_size = subcolumn->size();
bool inserted = subcolumns.add(key, Subcolumn(std::move(subcolumn), is_nullable));
if (!inserted)
throw Exception(ErrorCodes::DUPLICATE_COLUMN, "Subcolumn '{}' already exists", key.getPath());
if (num_rows == 0)
num_rows = new_size;
else if (new_size != num_rows)
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH,
"Size of subcolumn {} ({}) is inconsistent with column size ({})",
key.getPath(), new_size, num_rows);
}
void ColumnObject::addSubcolumn(const PathInData & key, size_t new_size)
{
bool inserted = subcolumns.add(key, Subcolumn(new_size, is_nullable));
if (!inserted)
throw Exception(ErrorCodes::DUPLICATE_COLUMN, "Subcolumn '{}' already exists", key.getPath());
if (num_rows == 0)
num_rows = new_size;
else if (new_size != num_rows)
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH,
"Required size of subcolumn {} ({}) is inconsistent with column size ({})",
key.getPath(), new_size, num_rows);
}
void ColumnObject::addNestedSubcolumn(const PathInData & key, const FieldInfo & field_info, size_t new_size)
{
if (!key.hasNested())
throw Exception(ErrorCodes::LOGICAL_ERROR,
"Cannot add Nested subcolumn, because path doesn't contain Nested");
bool inserted = false;
/// We find node that represents the same Nested type as @key.
const auto * nested_node = subcolumns.findBestMatch(key);
if (nested_node)
{
/// Find any leaf of Nested subcolumn.
const auto * leaf = subcolumns.findLeaf(nested_node, [&](const auto &) { return true; });
assert(leaf);
/// Recreate subcolumn with default values and the same sizes of arrays.
auto new_subcolumn = leaf->data.recreateWithDefaultValues(field_info);
/// It's possible that we have already inserted value from current row
/// to this subcolumn. So, adjust size to expected.
if (new_subcolumn.size() > new_size)
new_subcolumn.popBack(new_subcolumn.size() - new_size);
assert(new_subcolumn.size() == new_size);
inserted = subcolumns.add(key, new_subcolumn);
}
else
{
/// If node was not found just add subcolumn with empty arrays.
inserted = subcolumns.add(key, Subcolumn(new_size, is_nullable));
}
if (!inserted)
throw Exception(ErrorCodes::DUPLICATE_COLUMN, "Subcolumn '{}' already exists", key.getPath());
if (num_rows == 0)
num_rows = new_size;
}
PathsInData ColumnObject::getKeys() const
{
PathsInData keys;
keys.reserve(subcolumns.size());
for (const auto & entry : subcolumns)
keys.emplace_back(entry->path);
return keys;
}
bool ColumnObject::isFinalized() const
{
return std::all_of(subcolumns.begin(), subcolumns.end(),
[](const auto & entry) { return entry->data.isFinalized(); });
}
void ColumnObject::finalize()
{
size_t old_size = size();
SubcolumnsTree new_subcolumns;
for (auto && entry : subcolumns)
{
const auto & least_common_type = entry->data.getLeastCommonType();
/// Do not add subcolumns, which consists only from NULLs.
if (isNothing(getBaseTypeOfArray(least_common_type)))
continue;
entry->data.finalize();
new_subcolumns.add(entry->path, entry->data);
}
/// If all subcolumns were skipped add a dummy subcolumn,
/// because Tuple type must have at least one element.
if (new_subcolumns.empty())
new_subcolumns.add(PathInData{COLUMN_NAME_DUMMY}, Subcolumn{ColumnUInt8::create(old_size, 0), is_nullable});
std::swap(subcolumns, new_subcolumns);
checkObjectHasNoAmbiguosPaths(getKeys());
}
}

219
src/Columns/ColumnObject.h Normal file
View File

@ -0,0 +1,219 @@
#pragma once
#include <Core/Field.h>
#include <Core/Names.h>
#include <Columns/IColumn.h>
#include <Common/PODArray.h>
#include <Common/HashTable/HashMap.h>
#include <DataTypes/Serializations/JSONDataParser.h>
#include <DataTypes/Serializations/SubcolumnsTree.h>
#include <DataTypes/IDataType.h>
namespace DB
{
namespace ErrorCodes
{
extern const int LOGICAL_ERROR;
}
/// Info that represents a scalar or array field in a decomposed view.
/// It allows to recreate field with different number
/// of dimensions or nullability.
struct FieldInfo
{
/// The common type of of all scalars in field.
DataTypePtr scalar_type;
/// Do we have NULL scalar in field.
bool have_nulls;
/// If true then we have scalars with different types in array and
/// we need to convert scalars to the common type.
bool need_convert;
/// Number of dimension in array. 0 if field is scalar.
size_t num_dimensions;
};
FieldInfo getFieldInfo(const Field & field);
/** A column that represents object with dynamic set of subcolumns.
* Subcolumns are identified by paths in document and are stored in
* a trie-like structure. ColumnObject is not suitable for writing into tables
* and it should be converted to Tuple with fixed set of subcolumns before that.
*/
class ColumnObject final : public COWHelper<IColumn, ColumnObject>
{
public:
/** Class that represents one subcolumn.
* It stores values in several parts of column
* and keeps current common type of all parts.
* We add a new column part with a new type, when we insert a field,
* which can't be converted to the current common type.
* After insertion of all values subcolumn should be finalized
* for writing and other operations.
*/
class Subcolumn
{
public:
Subcolumn() = default;
Subcolumn(size_t size_, bool is_nullable_);
Subcolumn(MutableColumnPtr && data_, bool is_nullable_);
size_t size() const;
size_t byteSize() const;
size_t allocatedBytes() const;
bool isFinalized() const { return data.size() == 1 && num_of_defaults_in_prefix == 0; }
const DataTypePtr & getLeastCommonType() const { return least_common_type; }
/// Checks the consistency of column's parts stored in @data.
void checkTypes() const;
/// Inserts a field, which scalars can be arbitrary, but number of
/// dimensions should be consistent with current common type.
void insert(Field field);
void insert(Field field, FieldInfo info);
void insertDefault();
void insertManyDefaults(size_t length);
void insertRangeFrom(const Subcolumn & src, size_t start, size_t length);
void popBack(size_t n);
/// Converts all column's parts to the common type and
/// creates a single column that stores all values.
void finalize();
/// Returns last inserted field.
Field getLastField() const;
/// Recreates subcolumn with default scalar values and keeps sizes of arrays.
/// Used to create columns of type Nested with consistent array sizes.
Subcolumn recreateWithDefaultValues(const FieldInfo & field_info) const;
/// Returns single column if subcolumn in finalizes.
/// Otherwise -- undefined behaviour.
IColumn & getFinalizedColumn();
const IColumn & getFinalizedColumn() const;
const ColumnPtr & getFinalizedColumnPtr() const;
friend class ColumnObject;
private:
/// Current least common type of all values inserted to this subcolumn.
DataTypePtr least_common_type;
/// If true then common type type of subcolumn is Nullable
/// and default values are NULLs.
bool is_nullable = false;
/// Parts of column. Parts should be in increasing order in terms of subtypes/supertypes.
/// That means that the least common type for i-th prefix is the type of i-th part
/// and it's the supertype for all type of column from 0 to i-1.
std::vector<WrappedPtr> data;
/// Until we insert any non-default field we don't know further
/// least common type and we count number of defaults in prefix,
/// which will be converted to the default type of final common type.
size_t num_of_defaults_in_prefix = 0;
};
using SubcolumnsTree = SubcolumnsTree<Subcolumn>;
private:
/// If true then all subcolumns are nullable.
const bool is_nullable;
SubcolumnsTree subcolumns;
size_t num_rows;
public:
static constexpr auto COLUMN_NAME_DUMMY = "_dummy";
explicit ColumnObject(bool is_nullable_);
ColumnObject(SubcolumnsTree && subcolumns_, bool is_nullable_);
/// Checks that all subcolumns have consistent sizes.
void checkConsistency() const;
bool hasSubcolumn(const PathInData & key) const;
const Subcolumn & getSubcolumn(const PathInData & key) const;
Subcolumn & getSubcolumn(const PathInData & key);
void incrementNumRows() { ++num_rows; }
/// Adds a subcolumn from existing IColumn.
void addSubcolumn(const PathInData & key, MutableColumnPtr && subcolumn);
/// Adds a subcolumn of specific size with default values.
void addSubcolumn(const PathInData & key, size_t new_size);
/// Adds a subcolumn of type Nested of specific size with default values.
/// It cares about consistency of sizes of Nested arrays.
void addNestedSubcolumn(const PathInData & key, const FieldInfo & field_info, size_t new_size);
const SubcolumnsTree & getSubcolumns() const { return subcolumns; }
SubcolumnsTree & getSubcolumns() { return subcolumns; }
PathsInData getKeys() const;
/// Finalizes all subcolumns.
void finalize();
bool isFinalized() const;
/// Part of interface
const char * getFamilyName() const override { return "Object"; }
TypeIndex getDataType() const override { return TypeIndex::Object; }
size_t size() const override;
MutableColumnPtr cloneResized(size_t new_size) const override;
size_t byteSize() const override;
size_t allocatedBytes() const override;
void forEachSubcolumn(ColumnCallback callback) override;
void insert(const Field & field) override;
void insertDefault() override;
void insertRangeFrom(const IColumn & src, size_t start, size_t length) override;
ColumnPtr replicate(const Offsets & offsets) const override;
void popBack(size_t length) override;
Field operator[](size_t n) const override;
void get(size_t n, Field & res) const override;
/// All other methods throw exception.
ColumnPtr decompress() const override { throwMustBeConcrete(); }
StringRef getDataAt(size_t) const override { throwMustBeConcrete(); }
bool isDefaultAt(size_t) const override { throwMustBeConcrete(); }
void insertData(const char *, size_t) override { throwMustBeConcrete(); }
StringRef serializeValueIntoArena(size_t, Arena &, char const *&) const override { throwMustBeConcrete(); }
const char * deserializeAndInsertFromArena(const char *) override { throwMustBeConcrete(); }
const char * skipSerializedInArena(const char *) const override { throwMustBeConcrete(); }
void updateHashWithValue(size_t, SipHash &) const override { throwMustBeConcrete(); }
void updateWeakHash32(WeakHash32 &) const override { throwMustBeConcrete(); }
void updateHashFast(SipHash &) const override { throwMustBeConcrete(); }
ColumnPtr filter(const Filter &, ssize_t) const override { throwMustBeConcrete(); }
void expand(const Filter &, bool) override { throwMustBeConcrete(); }
ColumnPtr permute(const Permutation &, size_t) const override { throwMustBeConcrete(); }
ColumnPtr index(const IColumn &, size_t) const override { throwMustBeConcrete(); }
int compareAt(size_t, size_t, const IColumn &, int) const override { throwMustBeConcrete(); }
void compareColumn(const IColumn &, size_t, PaddedPODArray<UInt64> *, PaddedPODArray<Int8> &, int, int) const override { throwMustBeConcrete(); }
bool hasEqualValues() const override { throwMustBeConcrete(); }
void getPermutation(PermutationSortDirection, PermutationSortStability, size_t, int, Permutation &) const override { throwMustBeConcrete(); }
void updatePermutation(PermutationSortDirection, PermutationSortStability, size_t, int, Permutation &, EqualRanges &) const override { throwMustBeConcrete(); }
MutableColumns scatter(ColumnIndex, const Selector &) const override { throwMustBeConcrete(); }
void gather(ColumnGathererStream &) override { throwMustBeConcrete(); }
void getExtremes(Field &, Field &) const override { throwMustBeConcrete(); }
size_t byteSizeAt(size_t) const override { throwMustBeConcrete(); }
double getRatioOfDefaultRows(double) const override { throwMustBeConcrete(); }
void getIndicesOfNonDefaultRows(Offsets &, size_t, size_t) const override { throwMustBeConcrete(); }
private:
[[noreturn]] static void throwMustBeConcrete()
{
throw Exception("ColumnObject must be converted to ColumnTuple before use", ErrorCodes::LOGICAL_ERROR);
}
};
}

View File

@ -288,7 +288,7 @@ void ColumnSparse::popBack(size_t n)
ColumnPtr ColumnSparse::filter(const Filter & filt, ssize_t) const
{
if (_size != filt.size())
throw Exception("Size of filter doesn't match size of column.", ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH);
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH, "Size of filter ({}) doesn't match size of column ({})", filt.size(), _size);
if (offsets->empty())
{

View File

@ -381,7 +381,7 @@ ColumnPtr ColumnVector<T>::filter(const IColumn::Filter & filt, ssize_t result_s
{
size_t size = data.size();
if (size != filt.size())
throw Exception("Size of filter doesn't match size of column.", ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH);
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH, "Size of filter ({}) doesn't match size of column ({})", filt.size(), size);
auto res = this->create();
Container & res_data = res->getData();
@ -450,7 +450,7 @@ void ColumnVector<T>::applyZeroMap(const IColumn::Filter & filt, bool inverted)
{
size_t size = data.size();
if (size != filt.size())
throw Exception("Size of filter doesn't match size of column.", ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH);
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH, "Size of filter ({}) doesn't match size of column ({})", filt.size(), size);
const UInt8 * filt_pos = filt.data();
const UInt8 * filt_end = filt_pos + size;

View File

@ -192,7 +192,7 @@ namespace
{
const size_t size = src_offsets.size();
if (size != filt.size())
throw Exception("Size of filter doesn't match size of column.", ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH);
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH, "Size of filter ({}) doesn't match size of column ({})", filt.size(), size);
ResultOffsetsBuilder result_offsets_builder(res_offsets);

View File

@ -883,8 +883,8 @@ public:
return toDayNum(years_lut[year - DATE_LUT_MIN_YEAR]);
}
template <typename Date,
typename = std::enable_if_t<std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>>>
template <typename Date>
requires std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>
inline auto toStartOfQuarterInterval(Date d, UInt64 quarters) const
{
if (quarters == 1)
@ -892,8 +892,8 @@ public:
return toStartOfMonthInterval(d, quarters * 3);
}
template <typename Date,
typename = std::enable_if_t<std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>>>
template <typename Date>
requires std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>
inline auto toStartOfMonthInterval(Date d, UInt64 months) const
{
if (months == 1)
@ -906,8 +906,8 @@ public:
return toDayNum(years_months_lut[month_total_index / months * months]);
}
template <typename Date,
typename = std::enable_if_t<std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>>>
template <typename Date>
requires std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>
inline auto toStartOfWeekInterval(Date d, UInt64 weeks) const
{
if (weeks == 1)
@ -920,8 +920,8 @@ public:
return ExtendedDayNum(4 + (d - 4) / days * days);
}
template <typename Date,
typename = std::enable_if_t<std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>>>
template <typename Date>
requires std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>
inline Time toStartOfDayInterval(Date d, UInt64 days) const
{
if (days == 1)
@ -1219,10 +1219,8 @@ public:
/// If resulting month has less deys than source month, then saturation can happen.
/// Example: 31 Aug + 1 month = 30 Sep.
template <
typename DateTime,
typename
= std::enable_if_t<std::is_same_v<DateTime, UInt32> || std::is_same_v<DateTime, Int64> || std::is_same_v<DateTime, time_t>>>
template <typename DateTime>
requires std::is_same_v<DateTime, UInt32> || std::is_same_v<DateTime, Int64> || std::is_same_v<DateTime, time_t>
inline Time NO_SANITIZE_UNDEFINED addMonths(DateTime t, Int64 delta) const
{
const auto result_day = addMonthsIndex(t, delta);
@ -1247,8 +1245,8 @@ public:
return res;
}
template <typename Date,
typename = std::enable_if_t<std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>>>
template <typename Date>
requires std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>
inline auto NO_SANITIZE_UNDEFINED addMonths(Date d, Int64 delta) const
{
if constexpr (std::is_same_v<Date, DayNum>)
@ -1280,10 +1278,8 @@ public:
}
/// Saturation can occur if 29 Feb is mapped to non-leap year.
template <
typename DateTime,
typename
= std::enable_if_t<std::is_same_v<DateTime, UInt32> || std::is_same_v<DateTime, Int64> || std::is_same_v<DateTime, time_t>>>
template <typename DateTime>
requires std::is_same_v<DateTime, UInt32> || std::is_same_v<DateTime, Int64> || std::is_same_v<DateTime, time_t>
inline Time addYears(DateTime t, Int64 delta) const
{
auto result_day = addYearsIndex(t, delta);
@ -1308,8 +1304,8 @@ public:
return res;
}
template <typename Date,
typename = std::enable_if_t<std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>>>
template <typename Date>
requires std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>
inline auto addYears(Date d, Int64 delta) const
{
if constexpr (std::is_same_v<Date, DayNum>)

View File

@ -613,8 +613,9 @@
M(642, CANNOT_PACK_ARCHIVE) \
M(643, CANNOT_UNPACK_ARCHIVE) \
M(644, REMOTE_FS_OBJECT_CACHE_ERROR) \
M(645, INVALID_TRANSACTION) \
M(646, SERIALIZATION_ERROR) \
M(645, NUMBER_OF_DIMENSIONS_MISMATHED) \
M(646, INVALID_TRANSACTION) \
M(647, SERIALIZATION_ERROR) \
\
M(999, KEEPER_EXCEPTION) \
M(1000, POCO_EXCEPTION) \

View File

@ -205,7 +205,8 @@ void rethrowFirstException(const Exceptions & exceptions);
template <typename T>
std::enable_if_t<std::is_pointer_v<T>, T> exception_cast(std::exception_ptr e)
requires std::is_pointer_v<T>
T exception_cast(std::exception_ptr e)
{
try
{

View File

@ -46,6 +46,11 @@ public:
throw Exception("Cannot convert Map to " + demangle(typeid(T).name()), ErrorCodes::CANNOT_CONVERT_TYPE);
}
T operator() (const Object &) const
{
throw Exception("Cannot convert Object to " + demangle(typeid(T).name()), ErrorCodes::CANNOT_CONVERT_TYPE);
}
T operator() (const UInt64 & x) const { return T(x); }
T operator() (const Int64 & x) const { return T(x); }
T operator() (const Int128 & x) const { return T(x); }
@ -113,7 +118,8 @@ public:
throw Exception("Cannot convert AggregateFunctionStateData to " + demangle(typeid(T).name()), ErrorCodes::CANNOT_CONVERT_TYPE);
}
template <typename U, typename = std::enable_if_t<is_big_int_v<U>> >
template <typename U>
requires is_big_int_v<U>
T operator() (const U & x) const
{
if constexpr (is_decimal<T>)

View File

@ -95,6 +95,23 @@ String FieldVisitorDump::operator() (const Map & x) const
return wb.str();
}
String FieldVisitorDump::operator() (const Object & x) const
{
WriteBufferFromOwnString wb;
wb << "Object_(";
for (auto it = x.begin(); it != x.end(); ++it)
{
if (it != x.begin())
wb << ", ";
wb << "(" << it->first << ", " << applyVisitor(*this, it->second) << ")";
}
wb << ')';
return wb.str();
}
String FieldVisitorDump::operator() (const AggregateFunctionStateData & x) const
{
WriteBufferFromOwnString wb;

View File

@ -22,6 +22,7 @@ public:
String operator() (const Array & x) const;
String operator() (const Tuple & x) const;
String operator() (const Map & x) const;
String operator() (const Object & x) const;
String operator() (const DecimalField<Decimal32> & x) const;
String operator() (const DecimalField<Decimal64> & x) const;
String operator() (const DecimalField<Decimal128> & x) const;

View File

@ -94,6 +94,19 @@ void FieldVisitorHash::operator() (const Array & x) const
applyVisitor(*this, elem);
}
void FieldVisitorHash::operator() (const Object & x) const
{
UInt8 type = Field::Types::Object;
hash.update(type);
hash.update(x.size());
for (const auto & [key, value]: x)
{
hash.update(key);
applyVisitor(*this, value);
}
}
void FieldVisitorHash::operator() (const DecimalField<Decimal32> & x) const
{
UInt8 type = Field::Types::Decimal32;

View File

@ -28,6 +28,7 @@ public:
void operator() (const Array & x) const;
void operator() (const Tuple & x) const;
void operator() (const Map & x) const;
void operator() (const Object & x) const;
void operator() (const DecimalField<Decimal32> & x) const;
void operator() (const DecimalField<Decimal64> & x) const;
void operator() (const DecimalField<Decimal128> & x) const;

View File

@ -26,6 +26,7 @@ bool FieldVisitorSum::operator() (String &) const { throw Exception("Cannot sum
bool FieldVisitorSum::operator() (Array &) const { throw Exception("Cannot sum Arrays", ErrorCodes::LOGICAL_ERROR); }
bool FieldVisitorSum::operator() (Tuple &) const { throw Exception("Cannot sum Tuples", ErrorCodes::LOGICAL_ERROR); }
bool FieldVisitorSum::operator() (Map &) const { throw Exception("Cannot sum Maps", ErrorCodes::LOGICAL_ERROR); }
bool FieldVisitorSum::operator() (Object &) const { throw Exception("Cannot sum Objects", ErrorCodes::LOGICAL_ERROR); }
bool FieldVisitorSum::operator() (UUID &) const { throw Exception("Cannot sum UUIDs", ErrorCodes::LOGICAL_ERROR); }
bool FieldVisitorSum::operator() (AggregateFunctionStateData &) const

View File

@ -25,6 +25,7 @@ public:
bool operator() (Array &) const;
bool operator() (Tuple &) const;
bool operator() (Map &) const;
bool operator() (Object &) const;
bool operator() (UUID &) const;
bool operator() (AggregateFunctionStateData &) const;
bool operator() (bool &) const;
@ -36,7 +37,8 @@ public:
return x.getValue() != T(0);
}
template <typename T, typename = std::enable_if_t<is_big_int_v<T>> >
template <typename T>
requires is_big_int_v<T>
bool operator() (T & x) const
{
x += rhs.reinterpret<T>();

View File

@ -126,5 +126,24 @@ String FieldVisitorToString::operator() (const Map & x) const
return wb.str();
}
String FieldVisitorToString::operator() (const Object & x) const
{
WriteBufferFromOwnString wb;
wb << '{';
for (auto it = x.begin(); it != x.end(); ++it)
{
if (it != x.begin())
wb << ", ";
writeDoubleQuoted(it->first, wb);
wb << ": " << applyVisitor(*this, it->second);
}
wb << '}';
return wb.str();
}
}

View File

@ -22,6 +22,7 @@ public:
String operator() (const Array & x) const;
String operator() (const Tuple & x) const;
String operator() (const Map & x) const;
String operator() (const Object & x) const;
String operator() (const DecimalField<Decimal32> & x) const;
String operator() (const DecimalField<Decimal64> & x) const;
String operator() (const DecimalField<Decimal128> & x) const;

View File

@ -66,6 +66,20 @@ void FieldVisitorWriteBinary::operator() (const Map & x, WriteBuffer & buf) cons
}
}
void FieldVisitorWriteBinary::operator() (const Object & x, WriteBuffer & buf) const
{
const size_t size = x.size();
writeBinary(size, buf);
for (const auto & [key, value] : x)
{
const UInt8 type = value.getType();
writeBinary(type, buf);
writeBinary(key, buf);
Field::dispatch([&buf] (const auto & val) { FieldVisitorWriteBinary()(val, buf); }, value);
}
}
void FieldVisitorWriteBinary::operator()(const bool & x, WriteBuffer & buf) const
{
writeBinary(UInt8(x), buf);

View File

@ -21,6 +21,7 @@ public:
void operator() (const Array & x, WriteBuffer & buf) const;
void operator() (const Tuple & x, WriteBuffer & buf) const;
void operator() (const Map & x, WriteBuffer & buf) const;
void operator() (const Object & x, WriteBuffer & buf) const;
void operator() (const DecimalField<Decimal32> & x, WriteBuffer & buf) const;
void operator() (const DecimalField<Decimal64> & x, WriteBuffer & buf) const;
void operator() (const DecimalField<Decimal128> & x, WriteBuffer & buf) const;

View File

@ -130,6 +130,7 @@ public:
IntervalTree() { nodes.resize(1); }
template <typename TValue = Value, std::enable_if_t<std::is_same_v<TValue, IntervalTreeVoidValue>, bool> = true>
requires std::is_same_v<Value, IntervalTreeVoidValue>
ALWAYS_INLINE bool emplace(Interval interval)
{
assert(!tree_is_built);

View File

@ -76,7 +76,8 @@ public:
void add(const char * value) { add(std::make_unique<JSONString>(value)); }
void add(bool value) { add(std::make_unique<JSONBool>(std::move(value))); }
template <typename T, std::enable_if_t<std::is_arithmetic_v<T>, bool> = true>
template <typename T>
requires std::is_arithmetic_v<T>
void add(T value) { add(std::make_unique<JSONNumber<T>>(value)); }
void format(const FormatSettings & settings, FormatContext & context) override;
@ -100,7 +101,8 @@ public:
void add(std::string key, std::string_view value) { add(std::move(key), std::make_unique<JSONString>(value)); }
void add(std::string key, bool value) { add(std::move(key), std::make_unique<JSONBool>(std::move(value))); }
template <typename T, std::enable_if_t<std::is_arithmetic_v<T>, bool> = true>
template <typename T>
requires std::is_arithmetic_v<T>
void add(std::string key, T value) { add(std::move(key), std::make_unique<JSONNumber<T>>(value)); }
void format(const FormatSettings & settings, FormatContext & context) override;

View File

@ -82,7 +82,8 @@ private:
#endif
public:
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
StringSearcher(const CharT * needle_, const size_t needle_size_)
: needle{reinterpret_cast<const uint8_t *>(needle_)}, needle_size{needle_size_}
{
@ -191,7 +192,8 @@ public:
#endif
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
ALWAYS_INLINE bool compareTrivial(const CharT * haystack_pos, const CharT * const haystack_end, const uint8_t * needle_pos) const
{
while (haystack_pos < haystack_end && needle_pos < needle_end)
@ -217,7 +219,8 @@ public:
return needle_pos == needle_end;
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
ALWAYS_INLINE bool compare(const CharT * /*haystack*/, const CharT * haystack_end, const CharT * pos) const
{
@ -262,7 +265,8 @@ public:
/** Returns haystack_end if not found.
*/
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
const CharT * search(const CharT * haystack, const CharT * const haystack_end) const
{
if (0 == needle_size)
@ -338,7 +342,8 @@ public:
return haystack_end;
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
const CharT * search(const CharT * haystack, const size_t haystack_size) const
{
return search(haystack, haystack + haystack_size);
@ -367,7 +372,8 @@ private:
#endif
public:
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
StringSearcher(const CharT * needle_, const size_t needle_size)
: needle{reinterpret_cast<const uint8_t *>(needle_)}, needle_end{needle + needle_size}
{
@ -399,7 +405,8 @@ public:
#endif
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
ALWAYS_INLINE bool compare(const CharT * /*haystack*/, const CharT * /*haystack_end*/, const CharT * pos) const
{
#ifdef __SSE4_1__
@ -453,7 +460,8 @@ public:
return false;
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
const CharT * search(const CharT * haystack, const CharT * const haystack_end) const
{
if (needle == needle_end)
@ -540,7 +548,8 @@ public:
return haystack_end;
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
const CharT * search(const CharT * haystack, const size_t haystack_size) const
{
return search(haystack, haystack + haystack_size);
@ -568,7 +577,8 @@ private:
#endif
public:
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
StringSearcher(const CharT * needle_, const size_t needle_size)
: needle{reinterpret_cast<const uint8_t *>(needle_)}, needle_end{needle + needle_size}
{
@ -596,7 +606,8 @@ public:
#endif
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
ALWAYS_INLINE bool compare(const CharT * /*haystack*/, const CharT * /*haystack_end*/, const CharT * pos) const
{
#ifdef __SSE4_1__
@ -642,7 +653,8 @@ public:
return false;
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
const CharT * search(const CharT * haystack, const CharT * const haystack_end) const
{
if (needle == needle_end)
@ -722,7 +734,8 @@ public:
return haystack_end;
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
const CharT * search(const CharT * haystack, const size_t haystack_size) const
{
return search(haystack, haystack + haystack_size);
@ -740,7 +753,8 @@ class TokenSearcher : public StringSearcherBase
size_t needle_size;
public:
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
TokenSearcher(const CharT * needle_, const size_t needle_size_)
: searcher{needle_, needle_size_},
needle_size(needle_size_)
@ -752,7 +766,8 @@ public:
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
ALWAYS_INLINE bool compare(const CharT * haystack, const CharT * haystack_end, const CharT * pos) const
{
// use searcher only if pos is in the beginning of token and pos + searcher.needle_size is end of token.
@ -762,7 +777,8 @@ public:
return false;
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
const CharT * search(const CharT * haystack, const CharT * const haystack_end) const
{
// use searcher.search(), then verify that returned value is a token
@ -781,13 +797,15 @@ public:
return haystack_end;
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
const CharT * search(const CharT * haystack, const size_t haystack_size) const
{
return search(haystack, haystack + haystack_size);
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
ALWAYS_INLINE bool isToken(const CharT * haystack, const CharT * const haystack_end, const CharT* p) const
{
return (p == haystack || isTokenSeparator(*(p - 1)))
@ -819,11 +837,13 @@ struct LibCASCIICaseSensitiveStringSearcher : public StringSearcherBase
{
const char * const needle;
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
LibCASCIICaseSensitiveStringSearcher(const CharT * const needle_, const size_t /* needle_size */)
: needle(reinterpret_cast<const char *>(needle_)) {}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
const CharT * search(const CharT * haystack, const CharT * const haystack_end) const
{
const auto * res = strstr(reinterpret_cast<const char *>(haystack), reinterpret_cast<const char *>(needle));
@ -832,7 +852,8 @@ struct LibCASCIICaseSensitiveStringSearcher : public StringSearcherBase
return reinterpret_cast<const CharT *>(res);
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
const CharT * search(const CharT * haystack, const size_t haystack_size) const
{
return search(haystack, haystack + haystack_size);
@ -843,11 +864,13 @@ struct LibCASCIICaseInsensitiveStringSearcher : public StringSearcherBase
{
const char * const needle;
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
LibCASCIICaseInsensitiveStringSearcher(const CharT * const needle_, const size_t /* needle_size */)
: needle(reinterpret_cast<const char *>(needle_)) {}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
const CharT * search(const CharT * haystack, const CharT * const haystack_end) const
{
const auto * res = strcasestr(reinterpret_cast<const char *>(haystack), reinterpret_cast<const char *>(needle));
@ -856,7 +879,8 @@ struct LibCASCIICaseInsensitiveStringSearcher : public StringSearcherBase
return reinterpret_cast<const CharT *>(res);
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
const CharT * search(const CharT * haystack, const size_t haystack_size) const
{
return search(haystack, haystack + haystack_size);

View File

@ -75,7 +75,8 @@ inline size_t countCodePoints(const UInt8 * data, size_t size)
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
size_t convertCodePointToUTF8(int code_point, CharT * out_bytes, size_t out_length)
{
static const Poco::UTF8Encoding utf8;
@ -84,7 +85,8 @@ size_t convertCodePointToUTF8(int code_point, CharT * out_bytes, size_t out_leng
return res;
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
std::optional<uint32_t> convertUTF8ToCodePoint(const CharT * in_bytes, size_t in_length)
{
static const Poco::UTF8Encoding utf8;

View File

@ -13,6 +13,9 @@
#cmakedefine01 USE_CASSANDRA
#cmakedefine01 USE_SENTRY
#cmakedefine01 USE_GRPC
#cmakedefine01 USE_SIMDJSON
#cmakedefine01 USE_RAPIDJSON
#cmakedefine01 USE_DATASKETCHES
#cmakedefine01 USE_YAML_CPP
#cmakedefine01 CLICKHOUSE_SPLIT_BINARY

View File

@ -25,7 +25,8 @@ namespace DB
* In the rest, behaves like a dynamic_cast.
*/
template <typename To, typename From>
std::enable_if_t<std::is_reference_v<To>, To> typeid_cast(From & from)
requires std::is_reference_v<To>
To typeid_cast(From & from)
{
try
{
@ -43,7 +44,8 @@ std::enable_if_t<std::is_reference_v<To>, To> typeid_cast(From & from)
template <typename To, typename From>
std::enable_if_t<std::is_pointer_v<To>, To> typeid_cast(From * from)
requires std::is_pointer_v<To>
To typeid_cast(From * from)
{
try
{
@ -60,7 +62,8 @@ std::enable_if_t<std::is_pointer_v<To>, To> typeid_cast(From * from)
template <typename To, typename From>
std::enable_if_t<is_shared_ptr_v<To>, To> typeid_cast(const std::shared_ptr<From> & from)
requires is_shared_ptr_v<To>
To typeid_cast(const std::shared_ptr<From> & from)
{
try
{

View File

@ -37,7 +37,7 @@ void CoordinationSettings::loadFromConfig(const String & config_elem, const Poco
}
const String KeeperConfigurationAndSettings::DEFAULT_FOUR_LETTER_WORD_CMD = "conf,cons,crst,envi,ruok,srst,srvr,stat,wchc,wchs,dirs,mntr,isro";
const String KeeperConfigurationAndSettings::DEFAULT_FOUR_LETTER_WORD_CMD = "conf,cons,crst,envi,ruok,srst,srvr,stat,wchs,dirs,mntr,isro";
KeeperConfigurationAndSettings::KeeperConfigurationAndSettings()
: server_id(NOT_EXIST)

View File

@ -726,18 +726,6 @@ void convertToFullIfSparse(Block & block)
column.column = recursiveRemoveSparse(column.column);
}
ColumnPtr getColumnFromBlock(const Block & block, const NameAndTypePair & column)
{
auto current_column = block.getByName(column.getNameInStorage()).column;
current_column = current_column->decompress();
if (column.isSubcolumn())
return column.getTypeInStorage()->getSubcolumn(column.getSubcolumnName(), current_column);
return current_column;
}
Block materializeBlock(const Block & block)
{
if (!block)

View File

@ -196,10 +196,6 @@ void getBlocksDifference(const Block & lhs, const Block & rhs, std::string & out
void convertToFullIfSparse(Block & block);
/// Helps in-memory storages to extract columns from block.
/// Properly handles cases, when column is a subcolumn and when it is compressed.
ColumnPtr getColumnFromBlock(const Block & block, const NameAndTypePair & column);
/// Converts columns-constants to full columns ("materializes" them).
Block materializeBlock(const Block & block);
void materializeBlockInplace(Block & block);

View File

@ -115,8 +115,8 @@ private:
}
template <typename T, typename U>
static std::enable_if_t<is_decimal<T> && is_decimal<U>, Shift>
getScales(const DataTypePtr & left_type, const DataTypePtr & right_type)
requires is_decimal<T> && is_decimal<U>
static Shift getScales(const DataTypePtr & left_type, const DataTypePtr & right_type)
{
const DataTypeDecimalBase<T> * decimal0 = checkDecimalBase<T>(*left_type);
const DataTypeDecimalBase<U> * decimal1 = checkDecimalBase<U>(*right_type);
@ -137,8 +137,8 @@ private:
}
template <typename T, typename U>
static std::enable_if_t<is_decimal<T> && !is_decimal<U>, Shift>
getScales(const DataTypePtr & left_type, const DataTypePtr &)
requires is_decimal<T> && (!is_decimal<U>)
static Shift getScales(const DataTypePtr & left_type, const DataTypePtr &)
{
Shift shift;
const DataTypeDecimalBase<T> * decimal0 = checkDecimalBase<T>(*left_type);
@ -148,8 +148,8 @@ private:
}
template <typename T, typename U>
static std::enable_if_t<!is_decimal<T> && is_decimal<U>, Shift>
getScales(const DataTypePtr &, const DataTypePtr & right_type)
requires (!is_decimal<T>) && is_decimal<U>
static Shift getScales(const DataTypePtr &, const DataTypePtr & right_type)
{
Shift shift;
const DataTypeDecimalBase<U> * decimal1 = checkDecimalBase<U>(*right_type);

View File

@ -99,6 +99,12 @@ inline Field getBinaryValue(UInt8 type, ReadBuffer & buf)
readBinary(value, buf);
return value;
}
case Field::Types::Object:
{
Object value;
readBinary(value, buf);
return value;
}
case Field::Types::AggregateFunctionState:
{
AggregateFunctionStateData value;
@ -208,6 +214,40 @@ void writeText(const Map & x, WriteBuffer & buf)
writeFieldText(Field(x), buf);
}
void readBinary(Object & x, ReadBuffer & buf)
{
size_t size;
readBinary(size, buf);
for (size_t index = 0; index < size; ++index)
{
UInt8 type;
String key;
readBinary(type, buf);
readBinary(key, buf);
x[key] = getBinaryValue(type, buf);
}
}
void writeBinary(const Object & x, WriteBuffer & buf)
{
const size_t size = x.size();
writeBinary(size, buf);
for (const auto & [key, value] : x)
{
const UInt8 type = value.getType();
writeBinary(type, buf);
writeBinary(key, buf);
Field::dispatch([&buf] (const auto & val) { FieldVisitorWriteBinary()(val, buf); }, value);
}
}
void writeText(const Object & x, WriteBuffer & buf)
{
writeFieldText(Field(x), buf);
}
template <typename T>
void readQuoted(DecimalField<T> & x, ReadBuffer & buf)
{

View File

@ -3,6 +3,7 @@
#include <cassert>
#include <vector>
#include <algorithm>
#include <map>
#include <type_traits>
#include <functional>
@ -49,10 +50,22 @@ DEFINE_FIELD_VECTOR(Array);
DEFINE_FIELD_VECTOR(Tuple);
/// An array with the following structure: [(key1, value1), (key2, value2), ...]
DEFINE_FIELD_VECTOR(Map);
DEFINE_FIELD_VECTOR(Map); /// TODO: use map instead of vector.
#undef DEFINE_FIELD_VECTOR
using FieldMap = std::map<String, Field, std::less<String>, AllocatorWithMemoryTracking<std::pair<const String, Field>>>;
#define DEFINE_FIELD_MAP(X) \
struct X : public FieldMap \
{ \
using FieldMap::FieldMap; \
}
DEFINE_FIELD_MAP(Object);
#undef DEFINE_FIELD_MAP
struct AggregateFunctionStateData
{
String name; /// Name with arguments.
@ -219,6 +232,7 @@ template <> struct NearestFieldTypeImpl<String> { using Type = String; };
template <> struct NearestFieldTypeImpl<Array> { using Type = Array; };
template <> struct NearestFieldTypeImpl<Tuple> { using Type = Tuple; };
template <> struct NearestFieldTypeImpl<Map> { using Type = Map; };
template <> struct NearestFieldTypeImpl<Object> { using Type = Object; };
template <> struct NearestFieldTypeImpl<bool> { using Type = UInt64; };
template <> struct NearestFieldTypeImpl<Null> { using Type = Null; };
@ -283,6 +297,7 @@ public:
Map = 26,
UUID = 27,
Bool = 28,
Object = 29,
};
};
@ -472,6 +487,7 @@ public:
case Types::Array: return get<Array>() < rhs.get<Array>();
case Types::Tuple: return get<Tuple>() < rhs.get<Tuple>();
case Types::Map: return get<Map>() < rhs.get<Map>();
case Types::Object: return get<Object>() < rhs.get<Object>();
case Types::Decimal32: return get<DecimalField<Decimal32>>() < rhs.get<DecimalField<Decimal32>>();
case Types::Decimal64: return get<DecimalField<Decimal64>>() < rhs.get<DecimalField<Decimal64>>();
case Types::Decimal128: return get<DecimalField<Decimal128>>() < rhs.get<DecimalField<Decimal128>>();
@ -510,6 +526,7 @@ public:
case Types::Array: return get<Array>() <= rhs.get<Array>();
case Types::Tuple: return get<Tuple>() <= rhs.get<Tuple>();
case Types::Map: return get<Map>() <= rhs.get<Map>();
case Types::Object: return get<Object>() <= rhs.get<Object>();
case Types::Decimal32: return get<DecimalField<Decimal32>>() <= rhs.get<DecimalField<Decimal32>>();
case Types::Decimal64: return get<DecimalField<Decimal64>>() <= rhs.get<DecimalField<Decimal64>>();
case Types::Decimal128: return get<DecimalField<Decimal128>>() <= rhs.get<DecimalField<Decimal128>>();
@ -548,6 +565,7 @@ public:
case Types::Array: return get<Array>() == rhs.get<Array>();
case Types::Tuple: return get<Tuple>() == rhs.get<Tuple>();
case Types::Map: return get<Map>() == rhs.get<Map>();
case Types::Object: return get<Object>() == rhs.get<Object>();
case Types::UInt128: return get<UInt128>() == rhs.get<UInt128>();
case Types::UInt256: return get<UInt256>() == rhs.get<UInt256>();
case Types::Int128: return get<Int128>() == rhs.get<Int128>();
@ -597,6 +615,7 @@ public:
bool value = bool(field.template get<UInt64>());
return f(value);
}
case Types::Object: return f(field.template get<Object>());
case Types::Decimal32: return f(field.template get<DecimalField<Decimal32>>());
case Types::Decimal64: return f(field.template get<DecimalField<Decimal64>>());
case Types::Decimal128: return f(field.template get<DecimalField<Decimal128>>());
@ -713,6 +732,9 @@ private:
case Types::Map:
destroy<Map>();
break;
case Types::Object:
destroy<Object>();
break;
case Types::AggregateFunctionState:
destroy<AggregateFunctionStateData>();
break;
@ -737,26 +759,27 @@ private:
using Row = std::vector<Field>;
template <> struct Field::TypeToEnum<Null> { static const Types::Which value = Types::Null; };
template <> struct Field::TypeToEnum<UInt64> { static const Types::Which value = Types::UInt64; };
template <> struct Field::TypeToEnum<UInt128> { static const Types::Which value = Types::UInt128; };
template <> struct Field::TypeToEnum<UInt256> { static const Types::Which value = Types::UInt256; };
template <> struct Field::TypeToEnum<Int64> { static const Types::Which value = Types::Int64; };
template <> struct Field::TypeToEnum<Int128> { static const Types::Which value = Types::Int128; };
template <> struct Field::TypeToEnum<Int256> { static const Types::Which value = Types::Int256; };
template <> struct Field::TypeToEnum<UUID> { static const Types::Which value = Types::UUID; };
template <> struct Field::TypeToEnum<Float64> { static const Types::Which value = Types::Float64; };
template <> struct Field::TypeToEnum<String> { static const Types::Which value = Types::String; };
template <> struct Field::TypeToEnum<Array> { static const Types::Which value = Types::Array; };
template <> struct Field::TypeToEnum<Tuple> { static const Types::Which value = Types::Tuple; };
template <> struct Field::TypeToEnum<Map> { static const Types::Which value = Types::Map; };
template <> struct Field::TypeToEnum<DecimalField<Decimal32>>{ static const Types::Which value = Types::Decimal32; };
template <> struct Field::TypeToEnum<DecimalField<Decimal64>>{ static const Types::Which value = Types::Decimal64; };
template <> struct Field::TypeToEnum<DecimalField<Decimal128>>{ static const Types::Which value = Types::Decimal128; };
template <> struct Field::TypeToEnum<DecimalField<Decimal256>>{ static const Types::Which value = Types::Decimal256; };
template <> struct Field::TypeToEnum<DecimalField<DateTime64>>{ static const Types::Which value = Types::Decimal64; };
template <> struct Field::TypeToEnum<AggregateFunctionStateData>{ static const Types::Which value = Types::AggregateFunctionState; };
template <> struct Field::TypeToEnum<bool>{ static const Types::Which value = Types::Bool; };
template <> struct Field::TypeToEnum<Null> { static constexpr Types::Which value = Types::Null; };
template <> struct Field::TypeToEnum<UInt64> { static constexpr Types::Which value = Types::UInt64; };
template <> struct Field::TypeToEnum<UInt128> { static constexpr Types::Which value = Types::UInt128; };
template <> struct Field::TypeToEnum<UInt256> { static constexpr Types::Which value = Types::UInt256; };
template <> struct Field::TypeToEnum<Int64> { static constexpr Types::Which value = Types::Int64; };
template <> struct Field::TypeToEnum<Int128> { static constexpr Types::Which value = Types::Int128; };
template <> struct Field::TypeToEnum<Int256> { static constexpr Types::Which value = Types::Int256; };
template <> struct Field::TypeToEnum<UUID> { static constexpr Types::Which value = Types::UUID; };
template <> struct Field::TypeToEnum<Float64> { static constexpr Types::Which value = Types::Float64; };
template <> struct Field::TypeToEnum<String> { static constexpr Types::Which value = Types::String; };
template <> struct Field::TypeToEnum<Array> { static constexpr Types::Which value = Types::Array; };
template <> struct Field::TypeToEnum<Tuple> { static constexpr Types::Which value = Types::Tuple; };
template <> struct Field::TypeToEnum<Map> { static constexpr Types::Which value = Types::Map; };
template <> struct Field::TypeToEnum<Object> { static constexpr Types::Which value = Types::Object; };
template <> struct Field::TypeToEnum<DecimalField<Decimal32>>{ static constexpr Types::Which value = Types::Decimal32; };
template <> struct Field::TypeToEnum<DecimalField<Decimal64>>{ static constexpr Types::Which value = Types::Decimal64; };
template <> struct Field::TypeToEnum<DecimalField<Decimal128>>{ static constexpr Types::Which value = Types::Decimal128; };
template <> struct Field::TypeToEnum<DecimalField<Decimal256>>{ static constexpr Types::Which value = Types::Decimal256; };
template <> struct Field::TypeToEnum<DecimalField<DateTime64>>{ static constexpr Types::Which value = Types::Decimal64; };
template <> struct Field::TypeToEnum<AggregateFunctionStateData>{ static constexpr Types::Which value = Types::AggregateFunctionState; };
template <> struct Field::TypeToEnum<bool>{ static constexpr Types::Which value = Types::Bool; };
template <> struct Field::EnumToType<Field::Types::Null> { using Type = Null; };
template <> struct Field::EnumToType<Field::Types::UInt64> { using Type = UInt64; };
@ -771,6 +794,7 @@ template <> struct Field::EnumToType<Field::Types::String> { using Type = Strin
template <> struct Field::EnumToType<Field::Types::Array> { using Type = Array; };
template <> struct Field::EnumToType<Field::Types::Tuple> { using Type = Tuple; };
template <> struct Field::EnumToType<Field::Types::Map> { using Type = Map; };
template <> struct Field::EnumToType<Field::Types::Object> { using Type = Object; };
template <> struct Field::EnumToType<Field::Types::Decimal32> { using Type = DecimalField<Decimal32>; };
template <> struct Field::EnumToType<Field::Types::Decimal64> { using Type = DecimalField<Decimal64>; };
template <> struct Field::EnumToType<Field::Types::Decimal128> { using Type = DecimalField<Decimal128>; };
@ -931,34 +955,39 @@ class WriteBuffer;
/// It is assumed that all elements of the array have the same type.
void readBinary(Array & x, ReadBuffer & buf);
[[noreturn]] inline void readText(Array &, ReadBuffer &) { throw Exception("Cannot read Array.", ErrorCodes::NOT_IMPLEMENTED); }
[[noreturn]] inline void readQuoted(Array &, ReadBuffer &) { throw Exception("Cannot read Array.", ErrorCodes::NOT_IMPLEMENTED); }
/// It is assumed that all elements of the array have the same type.
/// Also write size and type into buf. UInt64 and Int64 is written in variadic size form
void writeBinary(const Array & x, WriteBuffer & buf);
void writeText(const Array & x, WriteBuffer & buf);
[[noreturn]] inline void writeQuoted(const Array &, WriteBuffer &) { throw Exception("Cannot write Array quoted.", ErrorCodes::NOT_IMPLEMENTED); }
void readBinary(Tuple & x, ReadBuffer & buf);
[[noreturn]] inline void readText(Tuple &, ReadBuffer &) { throw Exception("Cannot read Tuple.", ErrorCodes::NOT_IMPLEMENTED); }
[[noreturn]] inline void readQuoted(Tuple &, ReadBuffer &) { throw Exception("Cannot read Tuple.", ErrorCodes::NOT_IMPLEMENTED); }
void writeBinary(const Tuple & x, WriteBuffer & buf);
void writeText(const Tuple & x, WriteBuffer & buf);
[[noreturn]] inline void writeQuoted(const Tuple &, WriteBuffer &) { throw Exception("Cannot write Tuple quoted.", ErrorCodes::NOT_IMPLEMENTED); }
void readBinary(Map & x, ReadBuffer & buf);
[[noreturn]] inline void readText(Map &, ReadBuffer &) { throw Exception("Cannot read Map.", ErrorCodes::NOT_IMPLEMENTED); }
[[noreturn]] inline void readQuoted(Map &, ReadBuffer &) { throw Exception("Cannot read Map.", ErrorCodes::NOT_IMPLEMENTED); }
void writeBinary(const Map & x, WriteBuffer & buf);
void writeText(const Map & x, WriteBuffer & buf);
[[noreturn]] inline void writeQuoted(const Map &, WriteBuffer &) { throw Exception("Cannot write Map quoted.", ErrorCodes::NOT_IMPLEMENTED); }
void readBinary(Object & x, ReadBuffer & buf);
[[noreturn]] inline void readText(Object &, ReadBuffer &) { throw Exception("Cannot read Object.", ErrorCodes::NOT_IMPLEMENTED); }
[[noreturn]] inline void readQuoted(Object &, ReadBuffer &) { throw Exception("Cannot read Object.", ErrorCodes::NOT_IMPLEMENTED); }
void writeBinary(const Object & x, WriteBuffer & buf);
void writeText(const Object & x, WriteBuffer & buf);
[[noreturn]] inline void writeQuoted(const Object &, WriteBuffer &) { throw Exception("Cannot write Object quoted.", ErrorCodes::NOT_IMPLEMENTED); }
__attribute__ ((noreturn)) inline void writeText(const AggregateFunctionStateData &, WriteBuffer &)
{
// This probably doesn't make any sense, but we have to have it for
@ -977,8 +1006,6 @@ void readQuoted(DecimalField<T> & x, ReadBuffer & buf);
void writeFieldText(const Field & x, WriteBuffer & buf);
[[noreturn]] inline void writeQuoted(const Tuple &, WriteBuffer &) { throw Exception("Cannot write Tuple quoted.", ErrorCodes::NOT_IMPLEMENTED); }
String toString(const Field & x);
}

View File

@ -53,7 +53,8 @@ struct MultiEnum
return bitset;
}
template <typename ValueType, typename = std::enable_if_t<std::is_convertible_v<ValueType, StorageType>>>
template <typename ValueType>
requires std::is_convertible_v<ValueType, StorageType>
void setValue(ValueType new_value)
{
// Can't set value from any enum avoid confusion
@ -66,7 +67,8 @@ struct MultiEnum
return bitset == other.bitset;
}
template <typename ValueType, typename = std::enable_if_t<std::is_convertible_v<ValueType, StorageType>>>
template <typename ValueType>
requires std::is_convertible_v<ValueType, StorageType>
bool operator==(ValueType other) const
{
// Shouldn't be comparable with any enum to avoid confusion
@ -80,13 +82,15 @@ struct MultiEnum
return !(*this == other);
}
template <typename ValueType, typename = std::enable_if_t<std::is_convertible_v<ValueType, StorageType>>>
template <typename ValueType>
requires std::is_convertible_v<ValueType, StorageType>
friend bool operator==(ValueType left, MultiEnum right)
{
return right.operator==(left);
}
template <typename L, typename = typename std::enable_if<!std::is_same_v<L, MultiEnum>>::type>
template <typename L>
requires (!std::is_same_v<L, MultiEnum>)
friend bool operator!=(L left, MultiEnum right)
{
return !(right.operator==(left));

View File

@ -44,6 +44,7 @@ class IColumn;
M(UInt64, min_insert_block_size_bytes_for_materialized_views, 0, "Like min_insert_block_size_bytes, but applied only during pushing to MATERIALIZED VIEW (default: min_insert_block_size_bytes)", 0) \
M(UInt64, max_joined_block_size_rows, DEFAULT_BLOCK_SIZE, "Maximum block size for JOIN result (if join algorithm supports it). 0 means unlimited.", 0) \
M(UInt64, max_insert_threads, 0, "The maximum number of threads to execute the INSERT SELECT query. Values 0 or 1 means that INSERT SELECT is not run in parallel. Higher values will lead to higher memory usage. Parallel INSERT SELECT has effect only if the SELECT part is run on parallel, see 'max_threads' setting.", 0) \
M(UInt64, max_insert_delayed_streams_for_parallel_write, 0, "The maximum number of streams (columns) to delay final part flush. Default - auto (1000 in case of underlying storage supports parallel write, for example S3 and disabled otherwise)", 0) \
M(UInt64, max_final_threads, 16, "The maximum number of threads to read from table with FINAL.", 0) \
M(MaxThreads, max_threads, 0, "The maximum number of threads to execute the request. By default, it is determined automatically.", 0) \
M(UInt64, max_read_buffer_size, DBMS_DEFAULT_BUFFER_SIZE, "The maximum size of the buffer to read from the filesystem.", 0) \
@ -136,7 +137,7 @@ class IColumn;
\
M(Bool, skip_unavailable_shards, false, "If true, ClickHouse silently skips unavailable shards and nodes unresolvable through DNS. Shard is marked as unavailable when none of the replicas can be reached.", 0) \
\
M(UInt64, parallel_distributed_insert_select, 0, "Process distributed INSERT SELECT query in the same cluster on local tables on every shard, if 1 SELECT is executed on each shard, if 2 SELECT and INSERT is executed on each shard", 0) \
M(UInt64, parallel_distributed_insert_select, 0, "Process distributed INSERT SELECT query in the same cluster on local tables on every shard; if set to 1 - SELECT is executed on each shard; if set to 2 - SELECT and INSERT are executed on each shard", 0) \
M(UInt64, distributed_group_by_no_merge, 0, "If 1, Do not merge aggregation states from different servers for distributed queries (shards will process query up to the Complete stage, initiator just proxies the data from the shards). If 2 the initiator will apply ORDER BY and LIMIT stages (it is not in case when shard process query up to the Complete stage)", 0) \
M(UInt64, distributed_push_down_limit, 1, "If 1, LIMIT will be applied on each shard separatelly. Usually you don't need to use it, since this will be done automatically if it is possible, i.e. for simple query SELECT FROM LIMIT.", 0) \
M(Bool, optimize_distributed_group_by_sharding_key, true, "Optimize GROUP BY sharding_key queries (by avoiding costly aggregation on the initiator server).", 0) \
@ -471,6 +472,7 @@ class IColumn;
M(Bool, allow_experimental_geo_types, false, "Allow geo data types such as Point, Ring, Polygon, MultiPolygon", 0) \
M(Bool, data_type_default_nullable, false, "Data types without NULL or NOT NULL will make Nullable", 0) \
M(Bool, cast_keep_nullable, false, "CAST operator keep Nullable for result data type", 0) \
M(Bool, cast_ipv4_ipv6_default_on_conversion_error, false, "CAST operator into IPv4, CAST operator into IPV6 type, toIPv4, toIPv6 functions will return default value instead of throwing exception on conversion error.", 0) \
M(Bool, alter_partition_verbose_result, false, "Output information about affected parts. Currently works only for FREEZE and ATTACH commands.", 0) \
M(Bool, allow_experimental_database_materialized_mysql, false, "Allow to create database with Engine=MaterializedMySQL(...).", 0) \
M(Bool, allow_experimental_database_materialized_postgresql, false, "Allow to create database with Engine=MaterializedPostgreSQL(...).", 0) \
@ -490,6 +492,7 @@ class IColumn;
M(Bool, force_optimize_projection, false, "If projection optimization is enabled, SELECT queries need to use projection", 0) \
M(Bool, async_socket_for_remote, true, "Asynchronously read from socket executing remote query", 0) \
M(Bool, insert_null_as_default, true, "Insert DEFAULT values instead of NULL in INSERT SELECT (UNION ALL)", 0) \
M(Bool, describe_extend_object_types, false, "Deduce concrete type of columns of type Object in DESCRIBE query", 0) \
M(Bool, describe_include_subcolumns, false, "If true, subcolumns of all table columns will be included into result of DESCRIBE query", 0) \
\
M(Bool, optimize_rewrite_sum_if_to_count_if, true, "Rewrite sumIf() and sum(if()) function countIf() function when logically equivalent", 0) \
@ -551,7 +554,7 @@ class IColumn;
M(UInt64, remote_fs_read_max_backoff_ms, 10000, "Max wait time when trying to read data for remote disk", 0) \
M(UInt64, remote_fs_read_backoff_max_tries, 5, "Max attempts to read with backoff", 0) \
M(Bool, remote_fs_enable_cache, true, "Use cache for remote filesystem. This setting does not turn on/off cache for disks (must me done via disk config), but allows to bypass cache for some queries if intended", 0) \
M(UInt64, remote_fs_cache_max_wait_sec, 5, "Allow to wait a most this number of seconds for download of current remote_fs_buffer_size bytes, and skip cache if exceeded", 0) \
M(UInt64, remote_fs_cache_max_wait_sec, 5, "Allow to wait at most this number of seconds for download of current remote_fs_buffer_size bytes, and skip cache if exceeded", 0) \
\
M(UInt64, http_max_tries, 10, "Max attempts to read via http.", 0) \
M(UInt64, http_retry_initial_backoff_ms, 100, "Min milliseconds for backoff, when retrying read via http", 0) \
@ -566,6 +569,7 @@ class IColumn;
/** Experimental functions */ \
M(Bool, allow_experimental_funnel_functions, false, "Enable experimental functions for funnel analysis.", 0) \
M(Bool, allow_experimental_nlp_functions, false, "Enable experimental functions for natural language processing.", 0) \
M(Bool, allow_experimental_object_type, false, "Allow Object and JSON data types", 0) \
M(String, insert_deduplication_token, "", "If not empty, used for duplicate detection instead of data digest", 0) \
M(Bool, throw_on_unsupported_query_inside_transaction, true, "Throw exception if unsupported query is used inside transaction", 0) \
// End of COMMON_SETTINGS

View File

@ -87,6 +87,7 @@ enum class TypeIndex
AggregateFunction,
LowCardinality,
Map,
Object,
};
#if !defined(__clang__)
#pragma GCC diagnostic pop

View File

@ -15,6 +15,8 @@
#cmakedefine01 USE_NURAFT
#cmakedefine01 USE_NLP
#cmakedefine01 USE_KRB5
#cmakedefine01 USE_SIMDJSON
#cmakedefine01 USE_RAPIDJSON
#cmakedefine01 USE_FILELOG
#cmakedefine01 USE_ODBC
#cmakedefine01 USE_REPLXX

View File

@ -7,7 +7,8 @@ namespace DB
// Use template to disable implicit casting for certain overloaded types such as Field, which leads
// to overload resolution ambiguity.
class Field;
template <typename T, typename U = std::enable_if_t<std::is_same_v<T, Field>>>
template <typename T>
requires std::is_same_v<T, Field>
std::ostream & operator<<(std::ostream & stream, const T & what);
struct NameAndTypePair;

View File

@ -1,3 +1,5 @@
add_subdirectory (Serializations)
if (ENABLE_EXAMPLES)
add_subdirectory(examples)
add_subdirectory (examples)
endif ()

View File

@ -213,6 +213,7 @@ DataTypeFactory::DataTypeFactory()
registerDataTypeDomainSimpleAggregateFunction(*this);
registerDataTypeDomainGeo(*this);
registerDataTypeMap(*this);
registerDataTypeObject(*this);
}
DataTypeFactory & DataTypeFactory::instance()

View File

@ -87,5 +87,6 @@ void registerDataTypeDomainIPv4AndIPv6(DataTypeFactory & factory);
void registerDataTypeDomainBool(DataTypeFactory & factory);
void registerDataTypeDomainSimpleAggregateFunction(DataTypeFactory & factory);
void registerDataTypeDomainGeo(DataTypeFactory & factory);
void registerDataTypeObject(DataTypeFactory & factory);
}

View File

@ -0,0 +1,83 @@
#include <DataTypes/DataTypeObject.h>
#include <DataTypes/DataTypeFactory.h>
#include <DataTypes/Serializations/SerializationObject.h>
#include <Parsers/IAST.h>
#include <Parsers/ASTLiteral.h>
#include <Parsers/ASTFunction.h>
#include <IO/Operators.h>
namespace DB
{
namespace ErrorCodes
{
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
extern const int UNEXPECTED_AST_STRUCTURE;
}
DataTypeObject::DataTypeObject(const String & schema_format_, bool is_nullable_)
: schema_format(Poco::toLower(schema_format_))
, is_nullable(is_nullable_)
, default_serialization(getObjectSerialization(schema_format))
{
}
bool DataTypeObject::equals(const IDataType & rhs) const
{
if (const auto * object = typeid_cast<const DataTypeObject *>(&rhs))
return schema_format == object->schema_format && is_nullable == object->is_nullable;
return false;
}
SerializationPtr DataTypeObject::doGetDefaultSerialization() const
{
return default_serialization;
}
String DataTypeObject::doGetName() const
{
WriteBufferFromOwnString out;
if (is_nullable)
out << "Object(Nullable(" << quote << schema_format << "))";
else
out << "Object(" << quote << schema_format << ")";
return out.str();
}
static DataTypePtr create(const ASTPtr & arguments)
{
if (!arguments || arguments->children.size() != 1)
throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH,
"Object data type family must have one argument - name of schema format");
ASTPtr schema_argument = arguments->children[0];
bool is_nullable = false;
if (const auto * func = schema_argument->as<ASTFunction>())
{
if (func->name != "Nullable" || func->arguments->children.size() != 1)
throw Exception(ErrorCodes::UNEXPECTED_AST_STRUCTURE,
"Expected 'Nullable(<schema_name>)' as parameter for type Object", func->name);
schema_argument = func->arguments->children[0];
is_nullable = true;
}
const auto * literal = schema_argument->as<ASTLiteral>();
if (!literal || literal->value.getType() != Field::Types::String)
throw Exception(ErrorCodes::UNEXPECTED_AST_STRUCTURE,
"Object data type family must have a const string as its schema name parameter");
return std::make_shared<DataTypeObject>(literal->value.get<const String &>(), is_nullable);
}
void registerDataTypeObject(DataTypeFactory & factory)
{
factory.registerDataType("Object", create);
factory.registerSimpleDataType("JSON",
[] { return std::make_shared<DataTypeObject>("JSON", false); },
DataTypeFactory::CaseInsensitive);
}
}

View File

@ -0,0 +1,46 @@
#pragma once
#include <DataTypes/IDataType.h>
#include <Core/Field.h>
#include <Columns/ColumnObject.h>
namespace DB
{
namespace ErrorCodes
{
extern const int NOT_IMPLEMENTED;
}
class DataTypeObject : public IDataType
{
private:
String schema_format;
bool is_nullable;
SerializationPtr default_serialization;
public:
DataTypeObject(const String & schema_format_, bool is_nullable_);
const char * getFamilyName() const override { return "Object"; }
String doGetName() const override;
TypeIndex getTypeId() const override { return TypeIndex::Object; }
MutableColumnPtr createColumn() const override { return ColumnObject::create(is_nullable); }
Field getDefault() const override
{
throw Exception("Method getDefault() is not implemented for data type " + getName(), ErrorCodes::NOT_IMPLEMENTED);
}
bool haveSubtypes() const override { return false; }
bool equals(const IDataType & rhs) const override;
bool isParametric() const override { return true; }
SerializationPtr doGetDefaultSerialization() const override;
bool hasNullableSubcolumns() const { return is_nullable; }
};
}

View File

@ -1,6 +1,7 @@
#include <DataTypes/FieldToDataType.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypeMap.h>
#include <DataTypes/DataTypeObject.h>
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypesDecimal.h>
#include <DataTypes/DataTypeString.h>
@ -108,12 +109,11 @@ DataTypePtr FieldToDataType::operator() (const Array & x) const
element_types.reserve(x.size());
for (const Field & elem : x)
element_types.emplace_back(applyVisitor(FieldToDataType(), elem));
element_types.emplace_back(applyVisitor(FieldToDataType(allow_convertion_to_string), elem));
return std::make_shared<DataTypeArray>(getLeastSupertype(element_types));
return std::make_shared<DataTypeArray>(getLeastSupertype(element_types, allow_convertion_to_string));
}
DataTypePtr FieldToDataType::operator() (const Tuple & tuple) const
{
if (tuple.empty())
@ -123,7 +123,7 @@ DataTypePtr FieldToDataType::operator() (const Tuple & tuple) const
element_types.reserve(tuple.size());
for (const auto & element : tuple)
element_types.push_back(applyVisitor(FieldToDataType(), element));
element_types.push_back(applyVisitor(FieldToDataType(allow_convertion_to_string), element));
return std::make_shared<DataTypeTuple>(element_types);
}
@ -139,11 +139,19 @@ DataTypePtr FieldToDataType::operator() (const Map & map) const
{
const auto & tuple = elem.safeGet<const Tuple &>();
assert(tuple.size() == 2);
key_types.push_back(applyVisitor(FieldToDataType(), tuple[0]));
value_types.push_back(applyVisitor(FieldToDataType(), tuple[1]));
key_types.push_back(applyVisitor(FieldToDataType(allow_convertion_to_string), tuple[0]));
value_types.push_back(applyVisitor(FieldToDataType(allow_convertion_to_string), tuple[1]));
}
return std::make_shared<DataTypeMap>(getLeastSupertype(key_types), getLeastSupertype(value_types));
return std::make_shared<DataTypeMap>(
getLeastSupertype(key_types, allow_convertion_to_string),
getLeastSupertype(value_types, allow_convertion_to_string));
}
DataTypePtr FieldToDataType::operator() (const Object &) const
{
/// TODO: Do we need different parameters for type Object?
return std::make_shared<DataTypeObject>("json", false);
}
DataTypePtr FieldToDataType::operator() (const AggregateFunctionStateData & x) const

View File

@ -20,26 +20,34 @@ using DataTypePtr = std::shared_ptr<const IDataType>;
class FieldToDataType : public StaticVisitor<DataTypePtr>
{
public:
FieldToDataType(bool allow_convertion_to_string_ = false)
: allow_convertion_to_string(allow_convertion_to_string_)
{
}
DataTypePtr operator() (const Null & x) const;
DataTypePtr operator() (const UInt64 & x) const;
DataTypePtr operator() (const UInt128 & x) const;
DataTypePtr operator() (const UInt256 & x) const;
DataTypePtr operator() (const Int64 & x) const;
DataTypePtr operator() (const Int128 & x) const;
DataTypePtr operator() (const Int256 & x) const;
DataTypePtr operator() (const UUID & x) const;
DataTypePtr operator() (const Float64 & x) const;
DataTypePtr operator() (const String & x) const;
DataTypePtr operator() (const Array & x) const;
DataTypePtr operator() (const Tuple & tuple) const;
DataTypePtr operator() (const Map & map) const;
DataTypePtr operator() (const Object & map) const;
DataTypePtr operator() (const DecimalField<Decimal32> & x) const;
DataTypePtr operator() (const DecimalField<Decimal64> & x) const;
DataTypePtr operator() (const DecimalField<Decimal128> & x) const;
DataTypePtr operator() (const DecimalField<Decimal256> & x) const;
DataTypePtr operator() (const AggregateFunctionStateData & x) const;
DataTypePtr operator() (const UInt256 & x) const;
DataTypePtr operator() (const Int256 & x) const;
DataTypePtr operator() (const bool & x) const;
private:
bool allow_convertion_to_string;
};
}

View File

@ -126,19 +126,25 @@ DataTypePtr IDataType::tryGetSubcolumnType(const String & subcolumn_name) const
DataTypePtr IDataType::getSubcolumnType(const String & subcolumn_name) const
{
SubstreamData data = { getDefaultSerialization(), getPtr(), nullptr, nullptr };
return getForSubcolumn<DataTypePtr>(subcolumn_name, data, &SubstreamData::type);
return getForSubcolumn<DataTypePtr>(subcolumn_name, data, &SubstreamData::type, true);
}
SerializationPtr IDataType::getSubcolumnSerialization(const String & subcolumn_name, const SerializationPtr & serialization) const
ColumnPtr IDataType::tryGetSubcolumn(const String & subcolumn_name, const ColumnPtr & column) const
{
SubstreamData data = { serialization, nullptr, nullptr, nullptr };
return getForSubcolumn<SerializationPtr>(subcolumn_name, data, &SubstreamData::serialization);
SubstreamData data = { getDefaultSerialization(), nullptr, column, nullptr };
return getForSubcolumn<ColumnPtr>(subcolumn_name, data, &SubstreamData::column, false);
}
ColumnPtr IDataType::getSubcolumn(const String & subcolumn_name, const ColumnPtr & column) const
{
SubstreamData data = { getDefaultSerialization(), nullptr, column, nullptr };
return getForSubcolumn<ColumnPtr>(subcolumn_name, data, &SubstreamData::column);
return getForSubcolumn<ColumnPtr>(subcolumn_name, data, &SubstreamData::column, true);
}
SerializationPtr IDataType::getSubcolumnSerialization(const String & subcolumn_name, const SerializationPtr & serialization) const
{
SubstreamData data = { serialization, nullptr, nullptr, nullptr };
return getForSubcolumn<SerializationPtr>(subcolumn_name, data, &SubstreamData::serialization, true);
}
Names IDataType::getSubcolumnNames() const

View File

@ -82,9 +82,11 @@ public:
DataTypePtr tryGetSubcolumnType(const String & subcolumn_name) const;
DataTypePtr getSubcolumnType(const String & subcolumn_name) const;
SerializationPtr getSubcolumnSerialization(const String & subcolumn_name, const SerializationPtr & serialization) const;
ColumnPtr tryGetSubcolumn(const String & subcolumn_name, const ColumnPtr & column) const;
ColumnPtr getSubcolumn(const String & subcolumn_name, const ColumnPtr & column) const;
SerializationPtr getSubcolumnSerialization(const String & subcolumn_name, const SerializationPtr & serialization) const;
using SubstreamData = ISerialization::SubstreamData;
using SubstreamPath = ISerialization::SubstreamPath;
@ -309,7 +311,7 @@ private:
const String & subcolumn_name,
const SubstreamData & data,
Ptr SubstreamData::*member,
bool throw_if_null = true) const;
bool throw_if_null) const;
};
@ -373,11 +375,13 @@ struct WhichDataType
constexpr bool isMap() const {return idx == TypeIndex::Map; }
constexpr bool isSet() const { return idx == TypeIndex::Set; }
constexpr bool isInterval() const { return idx == TypeIndex::Interval; }
constexpr bool isObject() const { return idx == TypeIndex::Object; }
constexpr bool isNothing() const { return idx == TypeIndex::Nothing; }
constexpr bool isNullable() const { return idx == TypeIndex::Nullable; }
constexpr bool isFunction() const { return idx == TypeIndex::Function; }
constexpr bool isAggregateFunction() const { return idx == TypeIndex::AggregateFunction; }
constexpr bool isSimple() const { return isInt() || isUInt() || isFloat() || isString(); }
constexpr bool isLowCarnality() const { return idx == TypeIndex::LowCardinality; }
};
@ -399,10 +403,16 @@ inline bool isEnum(const DataTypePtr & data_type) { return WhichDataType(data_ty
inline bool isDecimal(const DataTypePtr & data_type) { return WhichDataType(data_type).isDecimal(); }
inline bool isTuple(const DataTypePtr & data_type) { return WhichDataType(data_type).isTuple(); }
inline bool isArray(const DataTypePtr & data_type) { return WhichDataType(data_type).isArray(); }
inline bool isMap(const DataTypePtr & data_type) { return WhichDataType(data_type).isMap(); }
inline bool isMap(const DataTypePtr & data_type) {return WhichDataType(data_type).isMap(); }
inline bool isNothing(const DataTypePtr & data_type) { return WhichDataType(data_type).isNothing(); }
inline bool isUUID(const DataTypePtr & data_type) { return WhichDataType(data_type).isUUID(); }
template <typename T>
inline bool isObject(const T & data_type)
{
return WhichDataType(data_type).isObject();
}
template <typename T>
inline bool isUInt8(const T & data_type)
{

View File

@ -30,6 +30,12 @@ namespace Nested
std::string concatenateName(const std::string & nested_table_name, const std::string & nested_field_name)
{
if (nested_table_name.empty())
return nested_field_name;
if (nested_field_name.empty())
return nested_table_name;
return nested_table_name + "." + nested_field_name;
}

View File

@ -0,0 +1,703 @@
#include <DataTypes/ObjectUtils.h>
#include <DataTypes/DataTypeObject.h>
#include <DataTypes/DataTypeNothing.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypeNullable.h>
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypeNested.h>
#include <DataTypes/DataTypeFactory.h>
#include <DataTypes/getLeastSupertype.h>
#include <DataTypes/NestedUtils.h>
#include <Columns/ColumnObject.h>
#include <Columns/ColumnTuple.h>
#include <Columns/ColumnArray.h>
#include <Columns/ColumnNullable.h>
#include <Parsers/ASTSelectQuery.h>
#include <Parsers/ASTExpressionList.h>
#include <Parsers/ASTLiteral.h>
#include <Parsers/ASTFunction.h>
#include <IO/Operators.h>
namespace DB
{
namespace ErrorCodes
{
extern const int TYPE_MISMATCH;
extern const int LOGICAL_ERROR;
extern const int DUPLICATE_COLUMN;
}
size_t getNumberOfDimensions(const IDataType & type)
{
if (const auto * type_array = typeid_cast<const DataTypeArray *>(&type))
return type_array->getNumberOfDimensions();
return 0;
}
size_t getNumberOfDimensions(const IColumn & column)
{
if (const auto * column_array = checkAndGetColumn<ColumnArray>(column))
return column_array->getNumberOfDimensions();
return 0;
}
DataTypePtr getBaseTypeOfArray(const DataTypePtr & type)
{
/// Get raw pointers to avoid extra copying of type pointers.
const DataTypeArray * last_array = nullptr;
const auto * current_type = type.get();
while (const auto * type_array = typeid_cast<const DataTypeArray *>(current_type))
{
current_type = type_array->getNestedType().get();
last_array = type_array;
}
return last_array ? last_array->getNestedType() : type;
}
ColumnPtr getBaseColumnOfArray(const ColumnPtr & column)
{
/// Get raw pointers to avoid extra copying of column pointers.
const ColumnArray * last_array = nullptr;
const auto * current_column = column.get();
while (const auto * column_array = checkAndGetColumn<ColumnArray>(current_column))
{
current_column = &column_array->getData();
last_array = column_array;
}
return last_array ? last_array->getDataPtr() : column;
}
DataTypePtr createArrayOfType(DataTypePtr type, size_t num_dimensions)
{
for (size_t i = 0; i < num_dimensions; ++i)
type = std::make_shared<DataTypeArray>(std::move(type));
return type;
}
ColumnPtr createArrayOfColumn(ColumnPtr column, size_t num_dimensions)
{
for (size_t i = 0; i < num_dimensions; ++i)
column = ColumnArray::create(column);
return column;
}
Array createEmptyArrayField(size_t num_dimensions)
{
if (num_dimensions == 0)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot create array field with 0 dimensions");
Array array;
Array * current_array = &array;
for (size_t i = 1; i < num_dimensions; ++i)
{
current_array->push_back(Array());
current_array = &current_array->back().get<Array &>();
}
return array;
}
DataTypePtr getDataTypeByColumn(const IColumn & column)
{
auto idx = column.getDataType();
if (WhichDataType(idx).isSimple())
return DataTypeFactory::instance().get(String(magic_enum::enum_name(idx)));
if (const auto * column_array = checkAndGetColumn<ColumnArray>(&column))
return std::make_shared<DataTypeArray>(getDataTypeByColumn(column_array->getData()));
if (const auto * column_nullable = checkAndGetColumn<ColumnNullable>(&column))
return makeNullable(getDataTypeByColumn(column_nullable->getNestedColumn()));
/// TODO: add more types.
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot get data type of column {}", column.getFamilyName());
}
template <size_t I, typename Tuple>
static auto extractVector(const std::vector<Tuple> & vec)
{
static_assert(I < std::tuple_size_v<Tuple>);
std::vector<std::tuple_element_t<I, Tuple>> res;
res.reserve(vec.size());
for (const auto & elem : vec)
res.emplace_back(std::get<I>(elem));
return res;
}
void convertObjectsToTuples(NamesAndTypesList & columns_list, Block & block, const NamesAndTypesList & extended_storage_columns)
{
std::unordered_map<String, DataTypePtr> storage_columns_map;
for (const auto & [name, type] : extended_storage_columns)
storage_columns_map[name] = type;
for (auto & name_type : columns_list)
{
if (!isObject(name_type.type))
continue;
auto & column = block.getByName(name_type.name);
if (!isObject(column.type))
throw Exception(ErrorCodes::TYPE_MISMATCH,
"Type for column '{}' mismatch in columns list and in block. In list: {}, in block: {}",
name_type.name, name_type.type->getName(), column.type->getName());
const auto & column_object = assert_cast<const ColumnObject &>(*column.column);
const auto & subcolumns = column_object.getSubcolumns();
if (!column_object.isFinalized())
throw Exception(ErrorCodes::LOGICAL_ERROR,
"Cannot convert to tuple column '{}' from type {}. Column should be finalized first",
name_type.name, name_type.type->getName());
PathsInData tuple_paths;
DataTypes tuple_types;
Columns tuple_columns;
for (const auto & entry : subcolumns)
{
tuple_paths.emplace_back(entry->path);
tuple_types.emplace_back(entry->data.getLeastCommonType());
tuple_columns.emplace_back(entry->data.getFinalizedColumnPtr());
}
auto it = storage_columns_map.find(name_type.name);
if (it == storage_columns_map.end())
throw Exception(ErrorCodes::LOGICAL_ERROR, "Column '{}' not found in storage", name_type.name);
std::tie(column.column, column.type) = unflattenTuple(tuple_paths, tuple_types, tuple_columns);
name_type.type = column.type;
/// Check that constructed Tuple type and type in storage are compatible.
getLeastCommonTypeForObject({column.type, it->second}, true);
}
}
static bool isPrefix(const PathInData::Parts & prefix, const PathInData::Parts & parts)
{
if (prefix.size() > parts.size())
return false;
for (size_t i = 0; i < prefix.size(); ++i)
if (prefix[i].key != parts[i].key)
return false;
return true;
}
void checkObjectHasNoAmbiguosPaths(const PathsInData & paths)
{
size_t size = paths.size();
for (size_t i = 0; i < size; ++i)
{
for (size_t j = 0; j < i; ++j)
{
if (isPrefix(paths[i].getParts(), paths[j].getParts())
|| isPrefix(paths[j].getParts(), paths[i].getParts()))
throw Exception(ErrorCodes::DUPLICATE_COLUMN,
"Data in Object has ambiguous paths: '{}' and '{}'",
paths[i].getPath(), paths[j].getPath());
}
}
}
DataTypePtr getLeastCommonTypeForObject(const DataTypes & types, bool check_ambiguos_paths)
{
if (types.empty())
return nullptr;
bool all_equal = true;
for (size_t i = 1; i < types.size(); ++i)
{
if (!types[i]->equals(*types[0]))
{
all_equal = false;
break;
}
}
if (all_equal)
return types[0];
/// Types of subcolumns by path from all tuples.
std::unordered_map<PathInData, DataTypes, PathInData::Hash> subcolumns_types;
/// First we flatten tuples, then get common type for paths
/// and finally unflatten paths and create new tuple type.
for (const auto & type : types)
{
const auto * type_tuple = typeid_cast<const DataTypeTuple *>(type.get());
if (!type_tuple)
throw Exception(ErrorCodes::LOGICAL_ERROR,
"Least common type for object can be deduced only from tuples, but {} given", type->getName());
auto [tuple_paths, tuple_types] = flattenTuple(type);
assert(tuple_paths.size() == tuple_types.size());
for (size_t i = 0; i < tuple_paths.size(); ++i)
subcolumns_types[tuple_paths[i]].push_back(tuple_types[i]);
}
PathsInData tuple_paths;
DataTypes tuple_types;
/// Get the least common type for all paths.
for (const auto & [key, subtypes] : subcolumns_types)
{
assert(!subtypes.empty());
if (key.getPath() == ColumnObject::COLUMN_NAME_DUMMY)
continue;
size_t first_dim = getNumberOfDimensions(*subtypes[0]);
for (size_t i = 1; i < subtypes.size(); ++i)
if (first_dim != getNumberOfDimensions(*subtypes[i]))
throw Exception(ErrorCodes::TYPE_MISMATCH,
"Uncompatible types of subcolumn '{}': {} and {}",
key.getPath(), subtypes[0]->getName(), subtypes[i]->getName());
tuple_paths.emplace_back(key);
tuple_types.emplace_back(getLeastSupertype(subtypes, /*allow_conversion_to_string=*/ true));
}
if (tuple_paths.empty())
{
tuple_paths.emplace_back(ColumnObject::COLUMN_NAME_DUMMY);
tuple_types.emplace_back(std::make_shared<DataTypeUInt8>());
}
if (check_ambiguos_paths)
checkObjectHasNoAmbiguosPaths(tuple_paths);
return unflattenTuple(tuple_paths, tuple_types);
}
NameSet getNamesOfObjectColumns(const NamesAndTypesList & columns_list)
{
NameSet res;
for (const auto & [name, type] : columns_list)
if (isObject(type))
res.insert(name);
return res;
}
bool hasObjectColumns(const ColumnsDescription & columns)
{
return std::any_of(columns.begin(), columns.end(), [](const auto & column) { return isObject(column.type); });
}
void extendObjectColumns(NamesAndTypesList & columns_list, const ColumnsDescription & object_columns, bool with_subcolumns)
{
NamesAndTypesList subcolumns_list;
for (auto & column : columns_list)
{
auto object_column = object_columns.tryGetColumn(GetColumnsOptions::All, column.name);
if (object_column)
{
column.type = object_column->type;
if (with_subcolumns)
subcolumns_list.splice(subcolumns_list.end(), object_columns.getSubcolumns(column.name));
}
}
columns_list.splice(columns_list.end(), std::move(subcolumns_list));
}
void updateObjectColumns(ColumnsDescription & object_columns, const NamesAndTypesList & new_columns)
{
for (const auto & new_column : new_columns)
{
auto object_column = object_columns.tryGetColumn(GetColumnsOptions::All, new_column.name);
if (object_column && !object_column->type->equals(*new_column.type))
{
object_columns.modify(new_column.name, [&](auto & column)
{
column.type = getLeastCommonTypeForObject({object_column->type, new_column.type});
});
}
}
}
namespace
{
void flattenTupleImpl(
PathInDataBuilder & builder,
DataTypePtr type,
std::vector<PathInData::Parts> & new_paths,
DataTypes & new_types)
{
if (const auto * type_tuple = typeid_cast<const DataTypeTuple *>(type.get()))
{
const auto & tuple_names = type_tuple->getElementNames();
const auto & tuple_types = type_tuple->getElements();
for (size_t i = 0; i < tuple_names.size(); ++i)
{
builder.append(tuple_names[i], false);
flattenTupleImpl(builder, tuple_types[i], new_paths, new_types);
builder.popBack();
}
}
else if (const auto * type_array = typeid_cast<const DataTypeArray *>(type.get()))
{
PathInDataBuilder element_builder;
std::vector<PathInData::Parts> element_paths;
DataTypes element_types;
flattenTupleImpl(element_builder, type_array->getNestedType(), element_paths, element_types);
assert(element_paths.size() == element_types.size());
for (size_t i = 0; i < element_paths.size(); ++i)
{
builder.append(element_paths[i], true);
new_paths.emplace_back(builder.getParts());
new_types.emplace_back(std::make_shared<DataTypeArray>(element_types[i]));
builder.popBack(element_paths[i].size());
}
}
else
{
new_paths.emplace_back(builder.getParts());
new_types.emplace_back(type);
}
}
/// @offsets_columns are used as stack of array offsets and allows to recreate Array columns.
void flattenTupleImpl(const ColumnPtr & column, Columns & new_columns, Columns & offsets_columns)
{
if (const auto * column_tuple = checkAndGetColumn<ColumnTuple>(column.get()))
{
const auto & subcolumns = column_tuple->getColumns();
for (const auto & subcolumn : subcolumns)
flattenTupleImpl(subcolumn, new_columns, offsets_columns);
}
else if (const auto * column_array = checkAndGetColumn<ColumnArray>(column.get()))
{
offsets_columns.push_back(column_array->getOffsetsPtr());
flattenTupleImpl(column_array->getDataPtr(), new_columns, offsets_columns);
offsets_columns.pop_back();
}
else
{
if (!offsets_columns.empty())
{
auto new_column = ColumnArray::create(column, offsets_columns.back());
for (auto it = offsets_columns.rbegin() + 1; it != offsets_columns.rend(); ++it)
new_column = ColumnArray::create(new_column, *it);
new_columns.push_back(std::move(new_column));
}
else
{
new_columns.push_back(column);
}
}
}
DataTypePtr reduceNumberOfDimensions(DataTypePtr type, size_t dimensions_to_reduce)
{
while (dimensions_to_reduce--)
{
const auto * type_array = typeid_cast<const DataTypeArray *>(type.get());
if (!type_array)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Not enough dimensions to reduce");
type = type_array->getNestedType();
}
return type;
}
ColumnPtr reduceNumberOfDimensions(ColumnPtr column, size_t dimensions_to_reduce)
{
while (dimensions_to_reduce--)
{
const auto * column_array = typeid_cast<const ColumnArray *>(column.get());
if (!column_array)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Not enough dimensions to reduce");
column = column_array->getDataPtr();
}
return column;
}
/// We save intermediate column, type and number of array
/// dimensions for each intermediate node in path in subcolumns tree.
struct ColumnWithTypeAndDimensions
{
ColumnPtr column;
DataTypePtr type;
size_t array_dimensions;
};
using SubcolumnsTreeWithColumns = SubcolumnsTree<ColumnWithTypeAndDimensions>;
using Node = SubcolumnsTreeWithColumns::Node;
/// Creates data type and column from tree of subcolumns.
ColumnWithTypeAndDimensions createTypeFromNode(const Node * node)
{
auto collect_tuple_elemets = [](const auto & children)
{
std::vector<std::tuple<String, ColumnWithTypeAndDimensions>> tuple_elements;
tuple_elements.reserve(children.size());
for (const auto & [name, child] : children)
{
auto column = createTypeFromNode(child.get());
tuple_elements.emplace_back(name, std::move(column));
}
/// Sort to always create the same type for the same set of subcolumns.
std::sort(tuple_elements.begin(), tuple_elements.end(),
[](const auto & lhs, const auto & rhs) { return std::get<0>(lhs) < std::get<0>(rhs); });
auto tuple_names = extractVector<0>(tuple_elements);
auto tuple_columns = extractVector<1>(tuple_elements);
return std::make_tuple(std::move(tuple_names), std::move(tuple_columns));
};
if (node->kind == Node::SCALAR)
{
return node->data;
}
else if (node->kind == Node::NESTED)
{
auto [tuple_names, tuple_columns] = collect_tuple_elemets(node->children);
Columns offsets_columns;
offsets_columns.reserve(tuple_columns[0].array_dimensions + 1);
/// If we have a Nested node and child node with anonymous array levels
/// we need to push a Nested type through all array levels.
/// Example: { "k1": [[{"k2": 1, "k3": 2}] } should be parsed as
/// `k1 Array(Nested(k2 Int, k3 Int))` and k1 is marked as Nested
/// and `k2` and `k3` has anonymous_array_level = 1 in that case.
const auto & current_array = assert_cast<const ColumnArray &>(*node->data.column);
offsets_columns.push_back(current_array.getOffsetsPtr());
auto first_column = tuple_columns[0].column;
for (size_t i = 0; i < tuple_columns[0].array_dimensions; ++i)
{
const auto & column_array = assert_cast<const ColumnArray &>(*first_column);
offsets_columns.push_back(column_array.getOffsetsPtr());
first_column = column_array.getDataPtr();
}
size_t num_elements = tuple_columns.size();
Columns tuple_elements_columns(num_elements);
DataTypes tuple_elements_types(num_elements);
/// Reduce extra array dimensions to get columns and types of Nested elements.
for (size_t i = 0; i < num_elements; ++i)
{
assert(tuple_columns[i].array_dimensions == tuple_columns[0].array_dimensions);
tuple_elements_columns[i] = reduceNumberOfDimensions(tuple_columns[i].column, tuple_columns[i].array_dimensions);
tuple_elements_types[i] = reduceNumberOfDimensions(tuple_columns[i].type, tuple_columns[i].array_dimensions);
}
auto result_column = ColumnArray::create(ColumnTuple::create(tuple_elements_columns), offsets_columns.back());
auto result_type = createNested(tuple_elements_types, tuple_names);
/// Recreate result Array type and Array column.
for (auto it = offsets_columns.rbegin() + 1; it != offsets_columns.rend(); ++it)
{
result_column = ColumnArray::create(result_column, *it);
result_type = std::make_shared<DataTypeArray>(result_type);
}
return {result_column, result_type, tuple_columns[0].array_dimensions};
}
else
{
auto [tuple_names, tuple_columns] = collect_tuple_elemets(node->children);
size_t num_elements = tuple_columns.size();
Columns tuple_elements_columns(num_elements);
DataTypes tuple_elements_types(num_elements);
for (size_t i = 0; i < tuple_columns.size(); ++i)
{
assert(tuple_columns[i].array_dimensions == tuple_columns[0].array_dimensions);
tuple_elements_columns[i] = tuple_columns[i].column;
tuple_elements_types[i] = tuple_columns[i].type;
}
auto result_column = ColumnTuple::create(tuple_elements_columns);
auto result_type = std::make_shared<DataTypeTuple>(tuple_elements_types, tuple_names);
return {result_column, result_type, tuple_columns[0].array_dimensions};
}
}
}
std::pair<PathsInData, DataTypes> flattenTuple(const DataTypePtr & type)
{
std::vector<PathInData::Parts> new_path_parts;
DataTypes new_types;
PathInDataBuilder builder;
flattenTupleImpl(builder, type, new_path_parts, new_types);
PathsInData new_paths(new_path_parts.begin(), new_path_parts.end());
return {new_paths, new_types};
}
ColumnPtr flattenTuple(const ColumnPtr & column)
{
Columns new_columns;
Columns offsets_columns;
flattenTupleImpl(column, new_columns, offsets_columns);
return ColumnTuple::create(new_columns);
}
DataTypePtr unflattenTuple(const PathsInData & paths, const DataTypes & tuple_types)
{
assert(paths.size() == tuple_types.size());
Columns tuple_columns;
tuple_columns.reserve(tuple_types.size());
for (const auto & type : tuple_types)
tuple_columns.emplace_back(type->createColumn());
return unflattenTuple(paths, tuple_types, tuple_columns).second;
}
std::pair<ColumnPtr, DataTypePtr> unflattenTuple(
const PathsInData & paths,
const DataTypes & tuple_types,
const Columns & tuple_columns)
{
assert(paths.size() == tuple_types.size());
assert(paths.size() == tuple_columns.size());
/// We add all paths to the subcolumn tree and then create a type from it.
/// The tree stores column, type and number of array dimensions
/// for each intermediate node.
SubcolumnsTreeWithColumns tree;
for (size_t i = 0; i < paths.size(); ++i)
{
auto column = tuple_columns[i];
auto type = tuple_types[i];
const auto & parts = paths[i].getParts();
size_t num_parts = parts.size();
size_t pos = 0;
tree.add(paths[i], [&](Node::Kind kind, bool exists) -> std::shared_ptr<Node>
{
if (pos >= num_parts)
throw Exception(ErrorCodes::LOGICAL_ERROR,
"Not enough name parts for path {}. Expected at least {}, got {}",
paths[i].getPath(), pos + 1, num_parts);
size_t array_dimensions = kind == Node::NESTED ? 1 : parts[pos].anonymous_array_level;
ColumnWithTypeAndDimensions current_column{column, type, array_dimensions};
/// Get type and column for next node.
if (array_dimensions)
{
type = reduceNumberOfDimensions(type, array_dimensions);
column = reduceNumberOfDimensions(column, array_dimensions);
}
++pos;
if (exists)
return nullptr;
return kind == Node::SCALAR
? std::make_shared<Node>(kind, current_column, paths[i])
: std::make_shared<Node>(kind, current_column);
});
}
auto [column, type, _] = createTypeFromNode(tree.getRoot());
return std::make_pair(std::move(column), std::move(type));
}
static void addConstantToWithClause(const ASTPtr & query, const String & column_name, const DataTypePtr & data_type)
{
auto & select = query->as<ASTSelectQuery &>();
if (!select.with())
select.setExpression(ASTSelectQuery::Expression::WITH, std::make_shared<ASTExpressionList>());
/// TODO: avoid materialize
auto node = makeASTFunction("materialize",
makeASTFunction("CAST",
std::make_shared<ASTLiteral>(data_type->getDefault()),
std::make_shared<ASTLiteral>(data_type->getName())));
node->alias = column_name;
node->prefer_alias_to_column_name = true;
select.with()->children.push_back(std::move(node));
}
/// @expected_columns and @available_columns contain descriptions
/// of extended Object columns.
void replaceMissedSubcolumnsByConstants(
const ColumnsDescription & expected_columns,
const ColumnsDescription & available_columns,
ASTPtr query)
{
NamesAndTypes missed_names_types;
/// Find all subcolumns that are in @expected_columns, but not in @available_columns.
for (const auto & column : available_columns)
{
auto expected_column = expected_columns.getColumn(GetColumnsOptions::All, column.name);
/// Extract all paths from both descriptions to easily check existence of subcolumns.
auto [available_paths, available_types] = flattenTuple(column.type);
auto [expected_paths, expected_types] = flattenTuple(expected_column.type);
auto extract_names_and_types = [&column](const auto & paths, const auto & types)
{
NamesAndTypes res;
res.reserve(paths.size());
for (size_t i = 0; i < paths.size(); ++i)
{
auto full_name = Nested::concatenateName(column.name, paths[i].getPath());
res.emplace_back(full_name, types[i]);
}
std::sort(res.begin(), res.end());
return res;
};
auto available_names_types = extract_names_and_types(available_paths, available_types);
auto expected_names_types = extract_names_and_types(expected_paths, expected_types);
std::set_difference(
expected_names_types.begin(), expected_names_types.end(),
available_names_types.begin(), available_names_types.end(),
std::back_inserter(missed_names_types),
[](const auto & lhs, const auto & rhs) { return lhs.name < rhs.name; });
}
if (missed_names_types.empty())
return;
IdentifierNameSet identifiers;
query->collectIdentifierNames(identifiers);
/// Replace missed subcolumns to default literals of theirs type.
for (const auto & [name, type] : missed_names_types)
if (identifiers.count(name))
addConstantToWithClause(query, name, type);
}
void finalizeObjectColumns(MutableColumns & columns)
{
for (auto & column : columns)
if (auto * column_object = typeid_cast<ColumnObject *>(column.get()))
column_object->finalize();
}
}

140
src/DataTypes/ObjectUtils.h Normal file
View File

@ -0,0 +1,140 @@
#pragma once
#include <Core/Block.h>
#include <Core/NamesAndTypes.h>
#include <Common/FieldVisitors.h>
#include <Storages/ColumnsDescription.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/Serializations/JSONDataParser.h>
#include <DataTypes/DataTypesNumber.h>
#include <Columns/ColumnObject.h>
namespace DB
{
/// Returns number of dimensions in Array type. 0 if type is not array.
size_t getNumberOfDimensions(const IDataType & type);
/// Returns number of dimensions in Array column. 0 if column is not array.
size_t getNumberOfDimensions(const IColumn & column);
/// Returns type of scalars of Array of arbitrary dimensions.
DataTypePtr getBaseTypeOfArray(const DataTypePtr & type);
/// Returns Array type with requested scalar type and number of dimensions.
DataTypePtr createArrayOfType(DataTypePtr type, size_t num_dimensions);
/// Returns column of scalars of Array of arbitrary dimensions.
ColumnPtr getBaseColumnOfArray(const ColumnPtr & column);
/// Returns empty Array column with requested scalar column and number of dimensions.
ColumnPtr createArrayOfColumn(const ColumnPtr & column, size_t num_dimensions);
/// Returns Array with requested number of dimensions and no scalars.
Array createEmptyArrayField(size_t num_dimensions);
/// Tries to get data type by column. Only limited subset of types is supported
DataTypePtr getDataTypeByColumn(const IColumn & column);
/// Converts Object types and columns to Tuples in @columns_list and @block
/// and checks that types are consistent with types in @extended_storage_columns.
void convertObjectsToTuples(NamesAndTypesList & columns_list, Block & block, const NamesAndTypesList & extended_storage_columns);
/// Checks that each path is not the prefix of any other path.
void checkObjectHasNoAmbiguosPaths(const PathsInData & paths);
/// Receives several Tuple types and deduces the least common type among them.
DataTypePtr getLeastCommonTypeForObject(const DataTypes & types, bool check_ambiguos_paths = false);
/// Converts types of object columns to tuples in @columns_list
/// according to @object_columns and adds all tuple's subcolumns if needed.
void extendObjectColumns(NamesAndTypesList & columns_list, const ColumnsDescription & object_columns, bool with_subcolumns);
NameSet getNamesOfObjectColumns(const NamesAndTypesList & columns_list);
bool hasObjectColumns(const ColumnsDescription & columns);
void finalizeObjectColumns(MutableColumns & columns);
/// Updates types of objects in @object_columns inplace
/// according to types in new_columns.
void updateObjectColumns(ColumnsDescription & object_columns, const NamesAndTypesList & new_columns);
using DataTypeTuplePtr = std::shared_ptr<DataTypeTuple>;
/// Flattens nested Tuple to plain Tuple. I.e extracts all paths and types from tuple.
/// E.g. Tuple(t Tuple(c1 UInt32, c2 String), c3 UInt64) -> Tuple(t.c1 UInt32, t.c2 String, c3 UInt32)
std::pair<PathsInData, DataTypes> flattenTuple(const DataTypePtr & type);
/// Flattens nested Tuple column to plain Tuple column.
ColumnPtr flattenTuple(const ColumnPtr & column);
/// The reverse operation to 'flattenTuple'.
/// Creates nested Tuple from all paths and types.
/// E.g. Tuple(t.c1 UInt32, t.c2 String, c3 UInt32) -> Tuple(t Tuple(c1 UInt32, c2 String), c3 UInt64)
DataTypePtr unflattenTuple(
const PathsInData & paths,
const DataTypes & tuple_types);
std::pair<ColumnPtr, DataTypePtr> unflattenTuple(
const PathsInData & paths,
const DataTypes & tuple_types,
const Columns & tuple_columns);
/// For all columns which exist in @expected_columns and
/// don't exist in @available_columns adds to WITH clause
/// an alias with column name to literal of default value of column type.
void replaceMissedSubcolumnsByConstants(
const ColumnsDescription & expected_columns,
const ColumnsDescription & available_columns,
ASTPtr query);
/// Receives range of objects, which contains collections
/// of columns-like objects (e.g. ColumnsDescription or NamesAndTypesList)
/// and deduces the common types of object columns for all entries.
/// @entry_columns_getter should extract reference to collection of
/// columns-like objects from entry to which Iterator points.
/// columns-like object should have fields "name" and "type".
template <typename Iterator, typename EntryColumnsGetter>
ColumnsDescription getObjectColumns(
Iterator begin, Iterator end,
const ColumnsDescription & storage_columns,
EntryColumnsGetter && entry_columns_getter)
{
ColumnsDescription res;
if (begin == end)
{
for (const auto & column : storage_columns)
{
if (isObject(column.type))
{
auto tuple_type = std::make_shared<DataTypeTuple>(
DataTypes{std::make_shared<DataTypeUInt8>()},
Names{ColumnObject::COLUMN_NAME_DUMMY});
res.add({column.name, std::move(tuple_type)});
}
}
return res;
}
std::unordered_map<String, DataTypes> types_in_entries;
for (auto it = begin; it != end; ++it)
{
const auto & entry_columns = entry_columns_getter(*it);
for (const auto & column : entry_columns)
{
auto storage_column = storage_columns.tryGetPhysical(column.name);
if (storage_column && isObject(storage_column->type))
types_in_entries[column.name].push_back(column.type);
}
}
for (const auto & [name, types] : types_in_entries)
res.add({name, getLeastCommonTypeForObject(types)});
return res;
}
}

View File

@ -0,0 +1,3 @@
if (ENABLE_TESTS)
add_subdirectory (tests)
endif ()

View File

@ -172,6 +172,10 @@ String getNameForSubstreamPath(
else
stream_name += "." + it->tuple_element_name;
}
else if (it->type == Substream::ObjectElement)
{
stream_name += escapeForFileName(".") + escapeForFileName(it->object_key_name);
}
}
return stream_name;

View File

@ -125,6 +125,9 @@ public:
SparseElements,
SparseOffsets,
ObjectStructure,
ObjectElement,
Regular,
};
@ -133,6 +136,9 @@ public:
/// Index of tuple element, starting at 1 or name.
String tuple_element_name;
/// Name of subcolumn of object column.
String object_key_name;
/// Do we need to escape a dot in filenames for tuple elements.
bool escape_tuple_delimiter = true;

View File

@ -0,0 +1,183 @@
#pragma once
#include <IO/ReadHelpers.h>
#include <Common/HashTable/HashMap.h>
#include <Common/checkStackSize.h>
#include <DataTypes/Serializations/PathInData.h>
namespace DB
{
namespace ErrorCodes
{
extern const int LOGICAL_ERROR;
}
class ReadBuffer;
class WriteBuffer;
template <typename Element>
static Field getValueAsField(const Element & element)
{
if (element.isBool()) return element.getBool();
if (element.isInt64()) return element.getInt64();
if (element.isUInt64()) return element.getUInt64();
if (element.isDouble()) return element.getDouble();
if (element.isString()) return element.getString();
if (element.isNull()) return Field();
throw Exception(ErrorCodes::LOGICAL_ERROR, "Unsupported type of JSON field");
}
template <typename ParserImpl>
class JSONDataParser
{
public:
using Element = typename ParserImpl::Element;
void readJSON(String & s, ReadBuffer & buf)
{
readJSONObjectPossiblyInvalid(s, buf);
}
std::optional<ParseResult> parse(const char * begin, size_t length)
{
std::string_view json{begin, length};
Element document;
if (!parser.parse(json, document))
return {};
ParseResult result;
PathInDataBuilder builder;
std::vector<PathInData::Parts> paths;
traverse(document, builder, paths, result.values);
result.paths.reserve(paths.size());
for (auto && path : paths)
result.paths.emplace_back(std::move(path));
return result;
}
private:
void traverse(
const Element & element,
PathInDataBuilder & builder,
std::vector<PathInData::Parts> & paths,
std::vector<Field> & values)
{
checkStackSize();
if (element.isObject())
{
auto object = element.getObject();
paths.reserve(paths.size() + object.size());
values.reserve(values.size() + object.size());
for (auto it = object.begin(); it != object.end(); ++it)
{
const auto & [key, value] = *it;
traverse(value, builder.append(key, false), paths, values);
builder.popBack();
}
}
else if (element.isArray())
{
auto array = element.getArray();
using PathPartsWithArray = std::pair<PathInData::Parts, Array>;
using PathToArray = HashMapWithStackMemory<UInt128, PathPartsWithArray, UInt128TrivialHash, 5>;
/// Traverse elements of array and collect an array
/// of fields by each path.
PathToArray arrays_by_path;
Arena strings_pool;
size_t current_size = 0;
for (auto it = array.begin(); it != array.end(); ++it)
{
std::vector<PathInData::Parts> element_paths;
std::vector<Field> element_values;
PathInDataBuilder element_builder;
traverse(*it, element_builder, element_paths, element_values);
size_t size = element_paths.size();
size_t keys_to_update = arrays_by_path.size();
for (size_t i = 0; i < size; ++i)
{
UInt128 hash = PathInData::getPartsHash(element_paths[i]);
if (auto * found = arrays_by_path.find(hash))
{
auto & path_array = found->getMapped().second;
assert(path_array.size() == current_size);
path_array.push_back(std::move(element_values[i]));
--keys_to_update;
}
else
{
/// We found a new key. Add and empty array with current size.
Array path_array;
path_array.reserve(array.size());
path_array.resize(current_size);
path_array.push_back(std::move(element_values[i]));
auto & elem = arrays_by_path[hash];
elem.first = std::move(element_paths[i]);
elem.second = std::move(path_array);
}
}
/// If some of the keys are missed in current element,
/// add default values for them.
if (keys_to_update)
{
for (auto & [_, value] : arrays_by_path)
{
auto & path_array = value.second;
assert(path_array.size() == current_size || path_array.size() == current_size + 1);
if (path_array.size() == current_size)
path_array.push_back(Field());
}
}
++current_size;
}
if (arrays_by_path.empty())
{
paths.push_back(builder.getParts());
values.push_back(Array());
}
else
{
paths.reserve(paths.size() + arrays_by_path.size());
values.reserve(values.size() + arrays_by_path.size());
for (auto && [_, value] : arrays_by_path)
{
auto && [path, path_array] = value;
/// Merge prefix path and path of array element.
paths.push_back(builder.append(path, true).getParts());
values.push_back(std::move(path_array));
builder.popBack(path.size());
}
}
}
else
{
paths.push_back(builder.getParts());
values.push_back(getValueAsField(element));
}
}
ParserImpl parser;
};
}

View File

@ -0,0 +1,199 @@
#include <DataTypes/Serializations/PathInData.h>
#include <DataTypes/NestedUtils.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypeArray.h>
#include <Columns/ColumnTuple.h>
#include <Columns/ColumnArray.h>
#include <Common/SipHash.h>
#include <IO/ReadHelpers.h>
#include <IO/WriteHelpers.h>
#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string.hpp>
namespace DB
{
PathInData::PathInData(std::string_view path_)
: path(path_)
{
const char * begin = path.data();
const char * end = path.data() + path.size();
for (const char * it = path.data(); it != end; ++it)
{
if (*it == '.')
{
size_t size = static_cast<size_t>(it - begin);
parts.emplace_back(std::string_view{begin, size}, false, 0);
begin = it + 1;
}
}
size_t size = static_cast<size_t>(end - begin);
parts.emplace_back(std::string_view{begin, size}, false, 0.);
}
PathInData::PathInData(const Parts & parts_)
: path(buildPath(parts_))
, parts(buildParts(path, parts_))
{
}
PathInData::PathInData(const PathInData & other)
: path(other.path)
, parts(buildParts(path, other.getParts()))
{
}
PathInData & PathInData::operator=(const PathInData & other)
{
if (this != &other)
{
path = other.path;
parts = buildParts(path, other.parts);
}
return *this;
}
UInt128 PathInData::getPartsHash(const Parts & parts_)
{
SipHash hash;
hash.update(parts_.size());
for (const auto & part : parts_)
{
hash.update(part.key.data(), part.key.length());
hash.update(part.is_nested);
hash.update(part.anonymous_array_level);
}
UInt128 res;
hash.get128(res);
return res;
}
void PathInData::writeBinary(WriteBuffer & out) const
{
writeVarUInt(parts.size(), out);
for (const auto & part : parts)
{
writeStringBinary(part.key, out);
writeVarUInt(part.is_nested, out);
writeVarUInt(part.anonymous_array_level, out);
}
}
void PathInData::readBinary(ReadBuffer & in)
{
size_t num_parts;
readVarUInt(num_parts, in);
Arena arena;
Parts temp_parts;
temp_parts.reserve(num_parts);
for (size_t i = 0; i < num_parts; ++i)
{
bool is_nested;
UInt8 anonymous_array_level;
auto ref = readStringBinaryInto(arena, in);
readVarUInt(is_nested, in);
readVarUInt(anonymous_array_level, in);
temp_parts.emplace_back(static_cast<std::string_view>(ref), is_nested, anonymous_array_level);
}
/// Recreate path and parts.
path = buildPath(temp_parts);
parts = buildParts(path, temp_parts);
}
String PathInData::buildPath(const Parts & other_parts)
{
if (other_parts.empty())
return "";
String res;
auto it = other_parts.begin();
res += it->key;
++it;
for (; it != other_parts.end(); ++it)
{
res += ".";
res += it->key;
}
return res;
}
PathInData::Parts PathInData::buildParts(const String & other_path, const Parts & other_parts)
{
if (other_parts.empty())
return {};
Parts res;
const char * begin = other_path.data();
for (const auto & part : other_parts)
{
res.emplace_back(std::string_view{begin, part.key.length()}, part.is_nested, part.anonymous_array_level);
begin += part.key.length() + 1;
}
return res;
}
size_t PathInData::Hash::operator()(const PathInData & value) const
{
auto hash = getPartsHash(value.parts);
return hash.items[0] ^ hash.items[1];
}
PathInDataBuilder & PathInDataBuilder::append(std::string_view key, bool is_array)
{
if (parts.empty())
current_anonymous_array_level += is_array;
if (!key.empty())
{
if (!parts.empty())
parts.back().is_nested = is_array;
parts.emplace_back(key, false, current_anonymous_array_level);
current_anonymous_array_level = 0;
}
return *this;
}
PathInDataBuilder & PathInDataBuilder::append(const PathInData::Parts & path, bool is_array)
{
if (parts.empty())
current_anonymous_array_level += is_array;
if (!path.empty())
{
if (!parts.empty())
parts.back().is_nested = is_array;
auto it = parts.insert(parts.end(), path.begin(), path.end());
for (; it != parts.end(); ++it)
it->anonymous_array_level += current_anonymous_array_level;
current_anonymous_array_level = 0;
}
return *this;
}
void PathInDataBuilder::popBack()
{
parts.pop_back();
}
void PathInDataBuilder::popBack(size_t n)
{
assert(n <= parts.size());
parts.resize(parts.size() - n);
}
}

View File

@ -0,0 +1,112 @@
#pragma once
#include <Core/Types.h>
#include <Core/Field.h>
#include <bitset>
namespace DB
{
class ReadBuffer;
class WriteBuffer;
/// Class that represents path in document, e.g. JSON.
class PathInData
{
public:
struct Part
{
Part() = default;
Part(std::string_view key_, bool is_nested_, UInt8 anonymous_array_level_)
: key(key_), is_nested(is_nested_), anonymous_array_level(anonymous_array_level_)
{
}
/// Name of part of path.
std::string_view key;
/// If this part is Nested, i.e. element
/// related to this key is the array of objects.
bool is_nested = false;
/// Number of array levels between current key and previous key.
/// E.g. in JSON {"k1": [[[{"k2": 1, "k3": 2}]]]}
/// "k1" is nested and has anonymous_array_level = 0.
/// "k2" and "k3" are not nested and have anonymous_array_level = 2.
UInt8 anonymous_array_level = 0;
bool operator==(const Part & other) const = default;
};
using Parts = std::vector<Part>;
PathInData() = default;
explicit PathInData(std::string_view path_);
explicit PathInData(const Parts & parts_);
PathInData(const PathInData & other);
PathInData & operator=(const PathInData & other);
static UInt128 getPartsHash(const Parts & parts_);
bool empty() const { return parts.empty(); }
const String & getPath() const { return path; }
const Parts & getParts() const { return parts; }
bool isNested(size_t i) const { return parts[i].is_nested; }
bool hasNested() const { return std::any_of(parts.begin(), parts.end(), [](const auto & part) { return part.is_nested; }); }
void writeBinary(WriteBuffer & out) const;
void readBinary(ReadBuffer & in);
bool operator==(const PathInData & other) const { return parts == other.parts; }
struct Hash { size_t operator()(const PathInData & value) const; };
private:
/// Creates full path from parts.
static String buildPath(const Parts & other_parts);
/// Creates new parts full from full path with correct string pointers.
static Parts buildParts(const String & other_path, const Parts & other_parts);
/// The full path. Parts are separated by dots.
String path;
/// Parts of the path. All string_view-s in parts must point to the @path.
Parts parts;
};
class PathInDataBuilder
{
public:
const PathInData::Parts & getParts() const { return parts; }
PathInDataBuilder & append(std::string_view key, bool is_array);
PathInDataBuilder & append(const PathInData::Parts & path, bool is_array);
void popBack();
void popBack(size_t n);
private:
PathInData::Parts parts;
/// Number of array levels without key to which
/// next non-empty key will be nested.
/// Example: for JSON { "k1": [[{"k2": 1, "k3": 2}] }
// `k2` and `k3` has anonymous_array_level = 1 in that case.
size_t current_anonymous_array_level = 0;
};
using PathsInData = std::vector<PathInData>;
/// Result of parsing of a document.
/// Contains all paths extracted from document
/// and values which are related to them.
struct ParseResult
{
std::vector<PathInData> paths;
std::vector<Field> values;
};
}

View File

@ -0,0 +1,460 @@
#include <DataTypes/Serializations/SerializationObject.h>
#include <DataTypes/Serializations/JSONDataParser.h>
#include <DataTypes/DataTypeString.h>
#include <DataTypes/DataTypeNullable.h>
#include <DataTypes/ObjectUtils.h>
#include <DataTypes/DataTypeFactory.h>
#include <DataTypes/NestedUtils.h>
#include <Common/JSONParsers/SimdJSONParser.h>
#include <Common/JSONParsers/RapidJSONParser.h>
#include <Common/HashTable/HashSet.h>
#include <Columns/ColumnObject.h>
#include <Common/FieldVisitorToString.h>
#include <IO/ReadHelpers.h>
#include <IO/WriteHelpers.h>
#include <IO/VarInt.h>
namespace DB
{
namespace ErrorCodes
{
extern const int NOT_IMPLEMENTED;
extern const int INCORRECT_DATA;
extern const int CANNOT_READ_ALL_DATA;
extern const int LOGICAL_ERROR;
}
namespace
{
/// Visitor that keeps @num_dimensions_to_keep dimensions in arrays
/// and replaces all scalars or nested arrays to @replacement at that level.
class FieldVisitorReplaceScalars : public StaticVisitor<Field>
{
public:
FieldVisitorReplaceScalars(const Field & replacement_, size_t num_dimensions_to_keep_)
: replacement(replacement_), num_dimensions_to_keep(num_dimensions_to_keep_)
{
}
template <typename T>
Field operator()(const T & x) const
{
if constexpr (std::is_same_v<T, Array>)
{
if (num_dimensions_to_keep == 0)
return replacement;
const size_t size = x.size();
Array res(size);
for (size_t i = 0; i < size; ++i)
res[i] = applyVisitor(FieldVisitorReplaceScalars(replacement, num_dimensions_to_keep - 1), x[i]);
return res;
}
else
return replacement;
}
private:
const Field & replacement;
size_t num_dimensions_to_keep;
};
using Node = typename ColumnObject::SubcolumnsTree::Node;
/// Finds a subcolumn from the same Nested type as @entry and inserts
/// an array with default values with consistent sizes as in Nested type.
bool tryInsertDefaultFromNested(
std::shared_ptr<Node> entry, const ColumnObject::SubcolumnsTree & subcolumns)
{
if (!entry->path.hasNested())
return false;
const Node * current_node = subcolumns.findLeaf(entry->path);
const Node * leaf = nullptr;
size_t num_skipped_nested = 0;
while (current_node)
{
/// Try to find the first Nested up to the current node.
const auto * node_nested = subcolumns.findParent(current_node,
[](const auto & candidate) { return candidate.isNested(); });
if (!node_nested)
break;
/// If there are no leaves, skip current node and find
/// the next node up to the current.
leaf = subcolumns.findLeaf(node_nested,
[&](const auto & candidate)
{
return candidate.data.size() == entry->data.size() + 1;
});
if (leaf)
break;
current_node = node_nested->parent;
++num_skipped_nested;
}
if (!leaf)
return false;
auto last_field = leaf->data.getLastField();
if (last_field.isNull())
return false;
const auto & least_common_type = entry->data.getLeastCommonType();
size_t num_dimensions = getNumberOfDimensions(*least_common_type);
assert(num_skipped_nested < num_dimensions);
/// Replace scalars to default values with consistent array sizes.
size_t num_dimensions_to_keep = num_dimensions - num_skipped_nested;
auto default_scalar = num_skipped_nested
? createEmptyArrayField(num_skipped_nested)
: getBaseTypeOfArray(least_common_type)->getDefault();
auto default_field = applyVisitor(FieldVisitorReplaceScalars(default_scalar, num_dimensions_to_keep), last_field);
entry->data.insert(std::move(default_field));
return true;
}
}
template <typename Parser>
template <typename Reader>
void SerializationObject<Parser>::deserializeTextImpl(IColumn & column, Reader && reader) const
{
auto & column_object = assert_cast<ColumnObject &>(column);
String buf;
reader(buf);
auto result = parser.parse(buf.data(), buf.size());
if (!result)
throw Exception(ErrorCodes::INCORRECT_DATA, "Cannot parse object");
auto & [paths, values] = *result;
assert(paths.size() == values.size());
HashSet<StringRef, StringRefHash> paths_set;
size_t column_size = column_object.size();
for (size_t i = 0; i < paths.size(); ++i)
{
auto field_info = getFieldInfo(values[i]);
if (isNothing(field_info.scalar_type))
continue;
if (!paths_set.insert(paths[i].getPath()).second)
throw Exception(ErrorCodes::INCORRECT_DATA,
"Object has ambiguous path: {}", paths[i].getPath());
if (!column_object.hasSubcolumn(paths[i]))
{
if (paths[i].hasNested())
column_object.addNestedSubcolumn(paths[i], field_info, column_size);
else
column_object.addSubcolumn(paths[i], column_size);
}
auto & subcolumn = column_object.getSubcolumn(paths[i]);
assert(subcolumn.size() == column_size);
subcolumn.insert(std::move(values[i]), std::move(field_info));
}
/// Insert default values to missed subcolumns.
const auto & subcolumns = column_object.getSubcolumns();
for (const auto & entry : subcolumns)
{
if (!paths_set.has(entry->path.getPath()))
{
bool inserted = tryInsertDefaultFromNested(entry, subcolumns);
if (!inserted)
entry->data.insertDefault();
}
}
column_object.incrementNumRows();
}
template <typename Parser>
void SerializationObject<Parser>::deserializeWholeText(IColumn & column, ReadBuffer & istr, const FormatSettings &) const
{
deserializeTextImpl(column, [&](String & s) { readStringInto(s, istr); });
}
template <typename Parser>
void SerializationObject<Parser>::deserializeTextEscaped(IColumn & column, ReadBuffer & istr, const FormatSettings &) const
{
deserializeTextImpl(column, [&](String & s) { readEscapedStringInto(s, istr); });
}
template <typename Parser>
void SerializationObject<Parser>::deserializeTextQuoted(IColumn & column, ReadBuffer & istr, const FormatSettings &) const
{
deserializeTextImpl(column, [&](String & s) { readQuotedStringInto<true>(s, istr); });
}
template <typename Parser>
void SerializationObject<Parser>::deserializeTextJSON(IColumn & column, ReadBuffer & istr, const FormatSettings &) const
{
deserializeTextImpl(column, [&](String & s) { parser.readJSON(s, istr); });
}
template <typename Parser>
void SerializationObject<Parser>::deserializeTextCSV(IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const
{
deserializeTextImpl(column, [&](String & s) { readCSVStringInto(s, istr, settings.csv); });
}
template <typename Parser>
template <typename TSettings, typename TStatePtr>
void SerializationObject<Parser>::checkSerializationIsSupported(const TSettings & settings, const TStatePtr & state) const
{
if (settings.position_independent_encoding)
throw Exception(ErrorCodes::NOT_IMPLEMENTED,
"DataTypeObject doesn't support serialization with position independent encoding");
if (state)
throw Exception(ErrorCodes::NOT_IMPLEMENTED,
"DataTypeObject doesn't support serialization with non-trivial state");
}
template <typename Parser>
void SerializationObject<Parser>::serializeBinaryBulkStatePrefix(
SerializeBinaryBulkSettings & settings,
SerializeBinaryBulkStatePtr & state) const
{
checkSerializationIsSupported(settings, state);
}
template <typename Parser>
void SerializationObject<Parser>::serializeBinaryBulkStateSuffix(
SerializeBinaryBulkSettings & settings,
SerializeBinaryBulkStatePtr & state) const
{
checkSerializationIsSupported(settings, state);
}
template <typename Parser>
void SerializationObject<Parser>::deserializeBinaryBulkStatePrefix(
DeserializeBinaryBulkSettings & settings,
DeserializeBinaryBulkStatePtr & state) const
{
checkSerializationIsSupported(settings, state);
}
template <typename Parser>
void SerializationObject<Parser>::serializeBinaryBulkWithMultipleStreams(
const IColumn & column,
size_t offset,
size_t limit,
SerializeBinaryBulkSettings & settings,
SerializeBinaryBulkStatePtr & state) const
{
checkSerializationIsSupported(settings, state);
const auto & column_object = assert_cast<const ColumnObject &>(column);
if (!column_object.isFinalized())
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot write non-finalized ColumnObject");
settings.path.push_back(Substream::ObjectStructure);
if (auto * stream = settings.getter(settings.path))
writeVarUInt(column_object.getSubcolumns().size(), *stream);
const auto & subcolumns = column_object.getSubcolumns();
for (const auto & entry : subcolumns)
{
settings.path.back() = Substream::ObjectStructure;
settings.path.back().object_key_name = entry->path.getPath();
const auto & type = entry->data.getLeastCommonType();
if (auto * stream = settings.getter(settings.path))
{
entry->path.writeBinary(*stream);
writeStringBinary(type->getName(), *stream);
}
settings.path.back() = Substream::ObjectElement;
if (auto * stream = settings.getter(settings.path))
{
auto serialization = type->getDefaultSerialization();
serialization->serializeBinaryBulkWithMultipleStreams(
entry->data.getFinalizedColumn(), offset, limit, settings, state);
}
}
settings.path.pop_back();
}
template <typename Parser>
void SerializationObject<Parser>::deserializeBinaryBulkWithMultipleStreams(
ColumnPtr & column,
size_t limit,
DeserializeBinaryBulkSettings & settings,
DeserializeBinaryBulkStatePtr & state,
SubstreamsCache * cache) const
{
checkSerializationIsSupported(settings, state);
if (!column->empty())
throw Exception(ErrorCodes::NOT_IMPLEMENTED,
"DataTypeObject cannot be deserialized to non-empty column");
auto mutable_column = column->assumeMutable();
auto & column_object = typeid_cast<ColumnObject &>(*mutable_column);
size_t num_subcolumns = 0;
settings.path.push_back(Substream::ObjectStructure);
if (auto * stream = settings.getter(settings.path))
readVarUInt(num_subcolumns, *stream);
settings.path.back() = Substream::ObjectElement;
for (size_t i = 0; i < num_subcolumns; ++i)
{
PathInData key;
String type_name;
settings.path.back() = Substream::ObjectStructure;
if (auto * stream = settings.getter(settings.path))
{
key.readBinary(*stream);
readStringBinary(type_name, *stream);
}
else
{
throw Exception(ErrorCodes::CANNOT_READ_ALL_DATA,
"Cannot read structure of DataTypeObject, because its stream is missing");
}
settings.path.back() = Substream::ObjectElement;
settings.path.back().object_key_name = key.getPath();
if (auto * stream = settings.getter(settings.path))
{
auto type = DataTypeFactory::instance().get(type_name);
auto serialization = type->getDefaultSerialization();
ColumnPtr subcolumn_data = type->createColumn();
serialization->deserializeBinaryBulkWithMultipleStreams(subcolumn_data, limit, settings, state, cache);
column_object.addSubcolumn(key, subcolumn_data->assumeMutable());
}
else
{
throw Exception(ErrorCodes::CANNOT_READ_ALL_DATA,
"Cannot read subcolumn '{}' of DataTypeObject, because its stream is missing", key.getPath());
}
}
settings.path.pop_back();
column_object.checkConsistency();
column_object.finalize();
column = std::move(mutable_column);
}
template <typename Parser>
void SerializationObject<Parser>::serializeBinary(const Field &, WriteBuffer &) const
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Not implemented for SerializationObject");
}
template <typename Parser>
void SerializationObject<Parser>::deserializeBinary(Field &, ReadBuffer &) const
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Not implemented for SerializationObject");
}
template <typename Parser>
void SerializationObject<Parser>::serializeBinary(const IColumn &, size_t, WriteBuffer &) const
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Not implemented for SerializationObject");
}
template <typename Parser>
void SerializationObject<Parser>::deserializeBinary(IColumn &, ReadBuffer &) const
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Not implemented for SerializationObject");
}
/// TODO: use format different of JSON in serializations.
template <typename Parser>
void SerializationObject<Parser>::serializeTextImpl(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const
{
const auto & column_object = assert_cast<const ColumnObject &>(column);
const auto & subcolumns = column_object.getSubcolumns();
writeChar('{', ostr);
for (auto it = subcolumns.begin(); it != subcolumns.end(); ++it)
{
if (it != subcolumns.begin())
writeCString(",", ostr);
writeDoubleQuoted((*it)->path.getPath(), ostr);
writeChar(':', ostr);
auto serialization = (*it)->data.getLeastCommonType()->getDefaultSerialization();
serialization->serializeTextJSON((*it)->data.getFinalizedColumn(), row_num, ostr, settings);
}
writeChar('}', ostr);
}
template <typename Parser>
void SerializationObject<Parser>::serializeText(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const
{
serializeTextImpl(column, row_num, ostr, settings);
}
template <typename Parser>
void SerializationObject<Parser>::serializeTextEscaped(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const
{
WriteBufferFromOwnString ostr_str;
serializeTextImpl(column, row_num, ostr_str, settings);
writeEscapedString(ostr_str.str(), ostr);
}
template <typename Parser>
void SerializationObject<Parser>::serializeTextQuoted(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const
{
WriteBufferFromOwnString ostr_str;
serializeTextImpl(column, row_num, ostr_str, settings);
writeQuotedString(ostr_str.str(), ostr);
}
template <typename Parser>
void SerializationObject<Parser>::serializeTextJSON(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const
{
serializeTextImpl(column, row_num, ostr, settings);
}
template <typename Parser>
void SerializationObject<Parser>::serializeTextCSV(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const
{
WriteBufferFromOwnString ostr_str;
serializeTextImpl(column, row_num, ostr_str, settings);
writeCSVString(ostr_str.str(), ostr);
}
SerializationPtr getObjectSerialization(const String & schema_format)
{
if (schema_format == "json")
{
#if USE_SIMDJSON
return std::make_shared<SerializationObject<JSONDataParser<SimdJSONParser>>>();
#elif USE_RAPIDJSON
return std::make_shared<SerializationObject<JSONDataParser<RapidJSONParser>>>();
#else
throw Exception(ErrorCodes::NOT_IMPLEMENTED,
"To use data type Object with JSON format ClickHouse should be built with Simdjson or Rapidjson");
#endif
}
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Unknown schema format '{}'", schema_format);
}
}

View File

@ -0,0 +1,73 @@
#pragma once
#include <DataTypes/Serializations/SimpleTextSerialization.h>
namespace DB
{
/// Serialization for data type Object.
/// Supported only test serialization/deserialization.
/// and binary bulk serialization/deserialization without position independent
/// encoding, i.e. serialization/deserialization into Native format.
template <typename Parser>
class SerializationObject : public ISerialization
{
public:
void serializeBinaryBulkStatePrefix(
SerializeBinaryBulkSettings & settings,
SerializeBinaryBulkStatePtr & state) const override;
void serializeBinaryBulkStateSuffix(
SerializeBinaryBulkSettings & settings,
SerializeBinaryBulkStatePtr & state) const override;
void deserializeBinaryBulkStatePrefix(
DeserializeBinaryBulkSettings & settings,
DeserializeBinaryBulkStatePtr & state) const override;
void serializeBinaryBulkWithMultipleStreams(
const IColumn & column,
size_t offset,
size_t limit,
SerializeBinaryBulkSettings & settings,
SerializeBinaryBulkStatePtr & state) const override;
void deserializeBinaryBulkWithMultipleStreams(
ColumnPtr & column,
size_t limit,
DeserializeBinaryBulkSettings & settings,
DeserializeBinaryBulkStatePtr & state,
SubstreamsCache * cache) const override;
void serializeBinary(const Field & field, WriteBuffer & ostr) const override;
void deserializeBinary(Field & field, ReadBuffer & istr) const override;
void serializeBinary(const IColumn & column, size_t row_num, WriteBuffer & ostr) const override;
void deserializeBinary(IColumn & column, ReadBuffer & istr) const override;
void serializeText(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const override;
void serializeTextEscaped(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const override;
void serializeTextQuoted(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const override;
void serializeTextJSON(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const override;
void serializeTextCSV(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const override;
void deserializeWholeText(IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const override;
void deserializeTextEscaped(IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const override;
void deserializeTextQuoted(IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const override;
void deserializeTextJSON(IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const override;
void deserializeTextCSV(IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const override;
private:
template <typename TSettings, typename TStatePtr>
void checkSerializationIsSupported(const TSettings & settings, const TStatePtr & state) const;
template <typename Reader>
void deserializeTextImpl(IColumn & column, Reader && reader) const;
void serializeTextImpl(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const;
mutable Parser parser;
};
SerializationPtr getObjectSerialization(const String & schema_format);
}

View File

@ -0,0 +1,209 @@
#pragma once
#include <DataTypes/Serializations/PathInData.h>
#include <DataTypes/IDataType.h>
#include <Columns/IColumn.h>
#include <unordered_map>
namespace DB
{
/// Tree that represents paths in document
/// with additional data in nodes.
template <typename NodeData>
class SubcolumnsTree
{
public:
struct Node
{
enum Kind
{
TUPLE,
NESTED,
SCALAR,
};
explicit Node(Kind kind_) : kind(kind_) {}
Node(Kind kind_, const NodeData & data_) : kind(kind_), data(data_) {}
Node(Kind kind_, const NodeData & data_, const PathInData & path_)
: kind(kind_), data(data_), path(path_) {}
Kind kind = TUPLE;
const Node * parent = nullptr;
std::map<String, std::shared_ptr<Node>, std::less<>> children;
NodeData data;
PathInData path;
bool isNested() const { return kind == NESTED; }
bool isScalar() const { return kind == SCALAR; }
void addChild(const String & key, std::shared_ptr<Node> next_node)
{
next_node->parent = this;
children[key] = std::move(next_node);
}
};
using NodeKind = typename Node::Kind;
using NodePtr = std::shared_ptr<Node>;
/// Add a leaf without any data in other nodes.
bool add(const PathInData & path, const NodeData & leaf_data)
{
return add(path, [&](NodeKind kind, bool exists) -> NodePtr
{
if (exists)
return nullptr;
if (kind == Node::SCALAR)
return std::make_shared<Node>(kind, leaf_data, path);
return std::make_shared<Node>(kind);
});
}
/// Callback for creation of node. Receives kind of node and
/// flag, which is true if node already exists.
using NodeCreator = std::function<NodePtr(NodeKind, bool)>;
bool add(const PathInData & path, const NodeCreator & node_creator)
{
const auto & parts = path.getParts();
if (parts.empty())
return false;
if (!root)
root = std::make_shared<Node>(Node::TUPLE);
Node * current_node = root.get();
for (size_t i = 0; i < parts.size() - 1; ++i)
{
assert(current_node->kind != Node::SCALAR);
auto it = current_node->children.find(parts[i].key);
if (it != current_node->children.end())
{
current_node = it->second.get();
node_creator(current_node->kind, true);
if (current_node->isNested() != parts[i].is_nested)
return false;
}
else
{
auto next_kind = parts[i].is_nested ? Node::NESTED : Node::TUPLE;
auto next_node = node_creator(next_kind, false);
current_node->addChild(String(parts[i].key), next_node);
current_node = next_node.get();
}
}
auto it = current_node->children.find(parts.back().key);
if (it != current_node->children.end())
return false;
auto next_node = node_creator(Node::SCALAR, false);
current_node->addChild(String(parts.back().key), next_node);
leaves.push_back(std::move(next_node));
return true;
}
/// Find node that matches the path the best.
const Node * findBestMatch(const PathInData & path) const
{
return findImpl(path, false);
}
/// Find node that matches the path exactly.
const Node * findExact(const PathInData & path) const
{
return findImpl(path, true);
}
/// Find leaf by path.
const Node * findLeaf(const PathInData & path) const
{
const auto * candidate = findExact(path);
if (!candidate || !candidate->isScalar())
return nullptr;
return candidate;
}
using NodePredicate = std::function<bool(const Node &)>;
/// Finds leaf that satisfies the predicate.
const Node * findLeaf(const NodePredicate & predicate)
{
return findLeaf(root.get(), predicate);
}
static const Node * findLeaf(const Node * node, const NodePredicate & predicate)
{
if (!node)
return nullptr;
if (node->isScalar())
return predicate(*node) ? node : nullptr;
for (const auto & [_, child] : node->children)
if (const auto * leaf = findLeaf(child.get(), predicate))
return leaf;
return nullptr;
}
/// Find first parent node that satisfies the predicate.
static const Node * findParent(const Node * node, const NodePredicate & predicate)
{
while (node && !predicate(*node))
node = node->parent;
return node;
}
bool empty() const { return root == nullptr; }
size_t size() const { return leaves.size(); }
using Nodes = std::vector<NodePtr>;
const Nodes & getLeaves() const { return leaves; }
const Node * getRoot() const { return root.get(); }
using iterator = typename Nodes::iterator;
using const_iterator = typename Nodes::const_iterator;
iterator begin() { return leaves.begin(); }
iterator end() { return leaves.end(); }
const_iterator begin() const { return leaves.begin(); }
const_iterator end() const { return leaves.end(); }
private:
const Node * findImpl(const PathInData & path, bool find_exact) const
{
if (!root)
return nullptr;
const auto & parts = path.getParts();
const Node * current_node = root.get();
for (const auto & part : parts)
{
auto it = current_node->children.find(part.key);
if (it == current_node->children.end())
return find_exact ? nullptr : current_node;
current_node = it->second.get();
}
return current_node;
}
NodePtr root;
Nodes leaves;
};
}

View File

@ -0,0 +1,216 @@
#include <DataTypes/Serializations/JSONDataParser.h>
#include <Common/JSONParsers/SimdJSONParser.h>
#include <IO/ReadBufferFromString.h>
#include <Common/FieldVisitorToString.h>
#include <ostream>
#include <gtest/gtest.h>
#if USE_SIMDJSON
using namespace DB;
const String json1 = R"({"k1" : 1, "k2" : {"k3" : "aa", "k4" : 2}})";
/// Nested(k2 String, k3 Nested(k4 String))
const String json2 =
R"({"k1" : [
{
"k2" : "aaa",
"k3" : [{ "k4" : "bbb" }, { "k4" : "ccc" }]
},
{
"k2" : "ddd",
"k3" : [{ "k4" : "eee" }, { "k4" : "fff" }]
}
]
})";
TEST(JSONDataParser, ReadJSON)
{
{
String json_bad = json1 + "aaaaaaa";
JSONDataParser<SimdJSONParser> parser;
ReadBufferFromString buf(json_bad);
String res;
parser.readJSON(res, buf);
ASSERT_EQ(json1, res);
}
{
String json_bad = json2 + "aaaaaaa";
JSONDataParser<SimdJSONParser> parser;
ReadBufferFromString buf(json_bad);
String res;
parser.readJSON(res, buf);
ASSERT_EQ(json2, res);
}
}
struct JSONPathAndValue
{
PathInData path;
Field value;
JSONPathAndValue(const PathInData & path_, const Field & value_)
: path(path_), value(value_)
{
}
bool operator==(const JSONPathAndValue & other) const = default;
bool operator<(const JSONPathAndValue & other) const { return path.getPath() < other.path.getPath(); }
};
static std::ostream & operator<<(std::ostream & ostr, const JSONPathAndValue & path_and_value)
{
ostr << "{ PathInData{";
bool first = true;
for (const auto & part : path_and_value.path.getParts())
{
ostr << (first ? "{" : ", {") << part.key << ", " << part.is_nested << ", " << part.anonymous_array_level << "}";
first = false;
}
ostr << "}, Field{" << applyVisitor(FieldVisitorToString(), path_and_value.value) << "} }";
return ostr;
}
using JSONValues = std::vector<JSONPathAndValue>;
static void check(
const String & json_str,
const String & tag,
JSONValues expected_values)
{
JSONDataParser<SimdJSONParser> parser;
auto res = parser.parse(json_str.data(), json_str.size());
ASSERT_TRUE(res.has_value()) << tag;
const auto & [paths, values] = *res;
ASSERT_EQ(paths.size(), expected_values.size()) << tag;
ASSERT_EQ(values.size(), expected_values.size()) << tag;
JSONValues result_values;
for (size_t i = 0; i < paths.size(); ++i)
result_values.emplace_back(paths[i], values[i]);
std::sort(expected_values.begin(), expected_values.end());
std::sort(result_values.begin(), result_values.end());
ASSERT_EQ(result_values, expected_values) << tag;
}
TEST(JSONDataParser, Parse)
{
{
check(json1, "json1",
{
{ PathInData{{{"k1", false, 0}}}, 1 },
{ PathInData{{{"k2", false, 0}, {"k3", false, 0}}}, "aa" },
{ PathInData{{{"k2", false, 0}, {"k4", false, 0}}}, 2 },
});
}
{
check(json2, "json2",
{
{ PathInData{{{"k1", true, 0}, {"k2", false, 0}}}, Array{"aaa", "ddd"} },
{ PathInData{{{"k1", true, 0}, {"k3", true, 0}, {"k4", false, 0}}}, Array{Array{"bbb", "ccc"}, Array{"eee", "fff"}} },
});
}
{
/// Nested(k2 Tuple(k3 Array(Int), k4 Array(Int)), k5 String)
const String json3 =
R"({"k1": [
{
"k2": {
"k3": [1, 2],
"k4": [3, 4]
},
"k5": "foo"
},
{
"k2": {
"k3": [5, 6],
"k4": [7, 8]
},
"k5": "bar"
}
]})";
check(json3, "json3",
{
{ PathInData{{{"k1", true, 0}, {"k5", false, 0}}}, Array{"foo", "bar"} },
{ PathInData{{{"k1", true, 0}, {"k2", false, 0}, {"k3", false, 0}}}, Array{Array{1, 2}, Array{5, 6}} },
{ PathInData{{{"k1", true, 0}, {"k2", false, 0}, {"k4", false, 0}}}, Array{Array{3, 4}, Array{7, 8}} },
});
}
{
/// Nested(k2 Nested(k3 Int, k4 Int), k5 String)
const String json4 =
R"({"k1": [
{
"k2": [{"k3": 1, "k4": 3}, {"k3": 2, "k4": 4}],
"k5": "foo"
},
{
"k2": [{"k3": 5, "k4": 7}, {"k3": 6, "k4": 8}],
"k5": "bar"
}
]})";
check(json4, "json4",
{
{ PathInData{{{"k1", true, 0}, {"k5", false, 0}}}, Array{"foo", "bar"} },
{ PathInData{{{"k1", true, 0}, {"k2", true, 0}, {"k3", false, 0}}}, Array{Array{1, 2}, Array{5, 6}} },
{ PathInData{{{"k1", true, 0}, {"k2", true, 0}, {"k4", false, 0}}}, Array{Array{3, 4}, Array{7, 8}} },
});
}
{
const String json5 = R"({"k1": [[1, 2, 3], [4, 5], [6]]})";
check(json5, "json5",
{
{ PathInData{{{"k1", false, 0}}}, Array{Array{1, 2, 3}, Array{4, 5}, Array{6}} }
});
}
{
/// Array(Nested(k2 Int, k3 Int))
const String json6 = R"({
"k1": [
[{"k2": 1, "k3": 2}, {"k2": 3, "k3": 4}],
[{"k2": 5, "k3": 6}]
]
})";
check(json6, "json6",
{
{ PathInData{{{"k1", true, 0}, {"k2", false, 1}}}, Array{Array{1, 3}, Array{5}} },
{ PathInData{{{"k1", true, 0}, {"k3", false, 1}}}, Array{Array{2, 4}, Array{6}} },
});
}
{
/// Nested(k2 Array(Int), k3 Array(Int))
const String json7 = R"({
"k1": [
{"k2": [1, 3], "k3": [2, 4]},
{"k2": [5], "k3": [6]}
]
})";
check(json7, "json7",
{
{ PathInData{{{"k1", true, 0}, {"k2", false, 0}}}, Array{Array{1, 3}, Array{5}} },
{ PathInData{{{"k1", true, 0}, {"k3", false, 0}}}, Array{Array{2, 4}, Array{6}} },
});
}
}
#endif

View File

@ -18,6 +18,8 @@
#include <DataTypes/DataTypeEnum.h>
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypesDecimal.h>
#include <DataTypes/DataTypeFactory.h>
#include <base/EnumReflection.h>
namespace DB
@ -30,28 +32,181 @@ namespace ErrorCodes
namespace
{
String getExceptionMessagePrefix(const DataTypes & types)
String typeToString(const DataTypePtr & type) { return type->getName(); }
String typeToString(const TypeIndex & type) { return String(magic_enum::enum_name(type)); }
template <typename DataTypes>
String getExceptionMessagePrefix(const DataTypes & types)
{
WriteBufferFromOwnString res;
res << "There is no supertype for types ";
bool first = true;
for (const auto & type : types)
{
WriteBufferFromOwnString res;
res << "There is no supertype for types ";
if (!first)
res << ", ";
first = false;
bool first = true;
for (const auto & type : types)
{
if (!first)
res << ", ";
first = false;
res << type->getName();
}
return res.str();
res << typeToString(type);
}
return res.str();
}
DataTypePtr getLeastSupertype(const DataTypes & types)
DataTypePtr getNumericType(const TypeIndexSet & types, bool allow_conversion_to_string)
{
auto throw_or_return = [&](std::string_view message, int error_code)
{
if (allow_conversion_to_string)
return std::make_shared<DataTypeString>();
throw Exception(String(message), error_code);
};
bool all_numbers = true;
size_t max_bits_of_signed_integer = 0;
size_t max_bits_of_unsigned_integer = 0;
size_t max_mantissa_bits_of_floating = 0;
auto maximize = [](size_t & what, size_t value)
{
if (value > what)
what = value;
};
for (const auto & type : types)
{
if (type == TypeIndex::UInt8)
maximize(max_bits_of_unsigned_integer, 8);
else if (type == TypeIndex::UInt16)
maximize(max_bits_of_unsigned_integer, 16);
else if (type == TypeIndex::UInt32)
maximize(max_bits_of_unsigned_integer, 32);
else if (type == TypeIndex::UInt64)
maximize(max_bits_of_unsigned_integer, 64);
else if (type == TypeIndex::UInt128)
maximize(max_bits_of_unsigned_integer, 128);
else if (type == TypeIndex::UInt256)
maximize(max_bits_of_unsigned_integer, 256);
else if (type == TypeIndex::Int8 || type == TypeIndex::Enum8)
maximize(max_bits_of_signed_integer, 8);
else if (type == TypeIndex::Int16 || type == TypeIndex::Enum16)
maximize(max_bits_of_signed_integer, 16);
else if (type == TypeIndex::Int32)
maximize(max_bits_of_signed_integer, 32);
else if (type == TypeIndex::Int64)
maximize(max_bits_of_signed_integer, 64);
else if (type == TypeIndex::Int128)
maximize(max_bits_of_signed_integer, 128);
else if (type == TypeIndex::Int256)
maximize(max_bits_of_signed_integer, 256);
else if (type == TypeIndex::Float32)
maximize(max_mantissa_bits_of_floating, 24);
else if (type == TypeIndex::Float64)
maximize(max_mantissa_bits_of_floating, 53);
else
all_numbers = false;
}
if (max_bits_of_signed_integer || max_bits_of_unsigned_integer || max_mantissa_bits_of_floating)
{
if (!all_numbers)
return throw_or_return(getExceptionMessagePrefix(types) + " because some of them are numbers and some of them are not", ErrorCodes::NO_COMMON_TYPE);
/// If there are signed and unsigned types of same bit-width, the result must be signed number with at least one more bit.
/// Example, common of Int32, UInt32 = Int64.
size_t min_bit_width_of_integer = std::max(max_bits_of_signed_integer, max_bits_of_unsigned_integer);
/// If unsigned is not covered by signed.
if (max_bits_of_signed_integer && max_bits_of_unsigned_integer >= max_bits_of_signed_integer) //-V1051
{
// Because 128 and 256 bit integers are significantly slower, we should not promote to them.
// But if we already have wide numbers, promotion is necessary.
if (min_bit_width_of_integer != 64)
++min_bit_width_of_integer;
else
return throw_or_return(
getExceptionMessagePrefix(types)
+ " because some of them are signed integers and some are unsigned integers,"
" but there is no signed integer type, that can exactly represent all required unsigned integer values",
ErrorCodes::NO_COMMON_TYPE);
}
/// If the result must be floating.
if (max_mantissa_bits_of_floating)
{
size_t min_mantissa_bits = std::max(min_bit_width_of_integer, max_mantissa_bits_of_floating);
if (min_mantissa_bits <= 24)
return std::make_shared<DataTypeFloat32>();
else if (min_mantissa_bits <= 53)
return std::make_shared<DataTypeFloat64>();
else
return throw_or_return(getExceptionMessagePrefix(types)
+ " because some of them are integers and some are floating point,"
" but there is no floating point type, that can exactly represent all required integers", ErrorCodes::NO_COMMON_TYPE);
}
/// If the result must be signed integer.
if (max_bits_of_signed_integer)
{
if (min_bit_width_of_integer <= 8)
return std::make_shared<DataTypeInt8>();
else if (min_bit_width_of_integer <= 16)
return std::make_shared<DataTypeInt16>();
else if (min_bit_width_of_integer <= 32)
return std::make_shared<DataTypeInt32>();
else if (min_bit_width_of_integer <= 64)
return std::make_shared<DataTypeInt64>();
else if (min_bit_width_of_integer <= 128)
return std::make_shared<DataTypeInt128>();
else if (min_bit_width_of_integer <= 256)
return std::make_shared<DataTypeInt256>();
else
return throw_or_return(getExceptionMessagePrefix(types)
+ " because some of them are signed integers and some are unsigned integers,"
" but there is no signed integer type, that can exactly represent all required unsigned integer values", ErrorCodes::NO_COMMON_TYPE);
}
/// All unsigned.
{
if (min_bit_width_of_integer <= 8)
return std::make_shared<DataTypeUInt8>();
else if (min_bit_width_of_integer <= 16)
return std::make_shared<DataTypeUInt16>();
else if (min_bit_width_of_integer <= 32)
return std::make_shared<DataTypeUInt32>();
else if (min_bit_width_of_integer <= 64)
return std::make_shared<DataTypeUInt64>();
else if (min_bit_width_of_integer <= 128)
return std::make_shared<DataTypeUInt128>();
else if (min_bit_width_of_integer <= 256)
return std::make_shared<DataTypeUInt256>();
else
return throw_or_return("Logical error: " + getExceptionMessagePrefix(types)
+ " but as all data types are unsigned integers, we must have found maximum unsigned integer type", ErrorCodes::NO_COMMON_TYPE);
}
}
return {};
}
}
DataTypePtr getLeastSupertype(const DataTypes & types, bool allow_conversion_to_string)
{
auto throw_or_return = [&](std::string_view message, int error_code)
{
if (allow_conversion_to_string)
return std::make_shared<DataTypeString>();
throw Exception(String(message), error_code);
};
/// Trivial cases
if (types.empty())
@ -88,7 +243,7 @@ DataTypePtr getLeastSupertype(const DataTypes & types)
non_nothing_types.emplace_back(type);
if (non_nothing_types.size() < types.size())
return getLeastSupertype(non_nothing_types);
return getLeastSupertype(non_nothing_types, allow_conversion_to_string);
}
/// For Arrays
@ -113,9 +268,9 @@ DataTypePtr getLeastSupertype(const DataTypes & types)
if (have_array)
{
if (!all_arrays)
throw Exception(getExceptionMessagePrefix(types) + " because some of them are Array and some of them are not", ErrorCodes::NO_COMMON_TYPE);
return throw_or_return(getExceptionMessagePrefix(types) + " because some of them are Array and some of them are not", ErrorCodes::NO_COMMON_TYPE);
return std::make_shared<DataTypeArray>(getLeastSupertype(nested_types));
return std::make_shared<DataTypeArray>(getLeastSupertype(nested_types, allow_conversion_to_string));
}
}
@ -139,7 +294,7 @@ DataTypePtr getLeastSupertype(const DataTypes & types)
nested_types[elem_idx].reserve(types.size());
}
else if (tuple_size != type_tuple->getElements().size())
throw Exception(getExceptionMessagePrefix(types) + " because Tuples have different sizes", ErrorCodes::NO_COMMON_TYPE);
return throw_or_return(getExceptionMessagePrefix(types) + " because Tuples have different sizes", ErrorCodes::NO_COMMON_TYPE);
have_tuple = true;
@ -153,11 +308,11 @@ DataTypePtr getLeastSupertype(const DataTypes & types)
if (have_tuple)
{
if (!all_tuples)
throw Exception(getExceptionMessagePrefix(types) + " because some of them are Tuple and some of them are not", ErrorCodes::NO_COMMON_TYPE);
return throw_or_return(getExceptionMessagePrefix(types) + " because some of them are Tuple and some of them are not", ErrorCodes::NO_COMMON_TYPE);
DataTypes common_tuple_types(tuple_size);
for (size_t elem_idx = 0; elem_idx < tuple_size; ++elem_idx)
common_tuple_types[elem_idx] = getLeastSupertype(nested_types[elem_idx]);
common_tuple_types[elem_idx] = getLeastSupertype(nested_types[elem_idx], allow_conversion_to_string);
return std::make_shared<DataTypeTuple>(common_tuple_types);
}
@ -187,9 +342,11 @@ DataTypePtr getLeastSupertype(const DataTypes & types)
if (have_maps)
{
if (!all_maps)
throw Exception(getExceptionMessagePrefix(types) + " because some of them are Maps and some of them are not", ErrorCodes::NO_COMMON_TYPE);
return throw_or_return(getExceptionMessagePrefix(types) + " because some of them are Maps and some of them are not", ErrorCodes::NO_COMMON_TYPE);
return std::make_shared<DataTypeMap>(getLeastSupertype(key_types), getLeastSupertype(value_types));
return std::make_shared<DataTypeMap>(
getLeastSupertype(key_types, allow_conversion_to_string),
getLeastSupertype(value_types, allow_conversion_to_string));
}
}
@ -220,9 +377,9 @@ DataTypePtr getLeastSupertype(const DataTypes & types)
if (have_low_cardinality)
{
if (have_not_low_cardinality)
return getLeastSupertype(nested_types);
return getLeastSupertype(nested_types, allow_conversion_to_string);
else
return std::make_shared<DataTypeLowCardinality>(getLeastSupertype(nested_types));
return std::make_shared<DataTypeLowCardinality>(getLeastSupertype(nested_types, allow_conversion_to_string));
}
}
@ -248,13 +405,13 @@ DataTypePtr getLeastSupertype(const DataTypes & types)
if (have_nullable)
{
return std::make_shared<DataTypeNullable>(getLeastSupertype(nested_types));
return std::make_shared<DataTypeNullable>(getLeastSupertype(nested_types, allow_conversion_to_string));
}
}
/// Non-recursive rules
std::unordered_set<TypeIndex> type_ids;
TypeIndexSet type_ids;
for (const auto & type : types)
type_ids.insert(type->getTypeId());
@ -268,7 +425,7 @@ DataTypePtr getLeastSupertype(const DataTypes & types)
{
bool all_strings = type_ids.size() == (have_string + have_fixed_string);
if (!all_strings)
throw Exception(getExceptionMessagePrefix(types) + " because some of them are String/FixedString and some of them are not", ErrorCodes::NO_COMMON_TYPE);
return throw_or_return(getExceptionMessagePrefix(types) + " because some of them are String/FixedString and some of them are not", ErrorCodes::NO_COMMON_TYPE);
return std::make_shared<DataTypeString>();
}
@ -285,7 +442,8 @@ DataTypePtr getLeastSupertype(const DataTypes & types)
{
bool all_date_or_datetime = type_ids.size() == (have_date + have_date32 + have_datetime + have_datetime64);
if (!all_date_or_datetime)
throw Exception(getExceptionMessagePrefix(types) + " because some of them are Date/Date32/DateTime/DateTime64 and some of them are not",
return throw_or_return(getExceptionMessagePrefix(types)
+ " because some of them are Date/Date32/DateTime/DateTime64 and some of them are not",
ErrorCodes::NO_COMMON_TYPE);
if (have_datetime64 == 0 && have_date32 == 0)
@ -362,7 +520,7 @@ DataTypePtr getLeastSupertype(const DataTypes & types)
}
if (num_supported != type_ids.size())
throw Exception(getExceptionMessagePrefix(types) + " because some of them have no lossless conversion to Decimal",
return throw_or_return(getExceptionMessagePrefix(types) + " because some of them have no lossless conversion to Decimal",
ErrorCodes::NO_COMMON_TYPE);
UInt32 max_scale = 0;
@ -385,7 +543,7 @@ DataTypePtr getLeastSupertype(const DataTypes & types)
}
if (min_precision > DataTypeDecimal<Decimal128>::maxPrecision())
throw Exception(getExceptionMessagePrefix(types) + " because the least supertype is Decimal("
return throw_or_return(getExceptionMessagePrefix(types) + " because the least supertype is Decimal("
+ toString(min_precision) + ',' + toString(max_scale) + ')',
ErrorCodes::NO_COMMON_TYPE);
@ -399,135 +557,56 @@ DataTypePtr getLeastSupertype(const DataTypes & types)
/// For numeric types, the most complicated part.
{
bool all_numbers = true;
size_t max_bits_of_signed_integer = 0;
size_t max_bits_of_unsigned_integer = 0;
size_t max_mantissa_bits_of_floating = 0;
auto maximize = [](size_t & what, size_t value)
{
if (value > what)
what = value;
};
for (const auto & type : types)
{
if (typeid_cast<const DataTypeUInt8 *>(type.get()))
maximize(max_bits_of_unsigned_integer, 8);
else if (typeid_cast<const DataTypeUInt16 *>(type.get()))
maximize(max_bits_of_unsigned_integer, 16);
else if (typeid_cast<const DataTypeUInt32 *>(type.get()))
maximize(max_bits_of_unsigned_integer, 32);
else if (typeid_cast<const DataTypeUInt64 *>(type.get()))
maximize(max_bits_of_unsigned_integer, 64);
else if (typeid_cast<const DataTypeUInt128 *>(type.get()))
maximize(max_bits_of_unsigned_integer, 128);
else if (typeid_cast<const DataTypeUInt256 *>(type.get()))
maximize(max_bits_of_unsigned_integer, 256);
else if (typeid_cast<const DataTypeInt8 *>(type.get()) || typeid_cast<const DataTypeEnum8 *>(type.get()))
maximize(max_bits_of_signed_integer, 8);
else if (typeid_cast<const DataTypeInt16 *>(type.get()) || typeid_cast<const DataTypeEnum16 *>(type.get()))
maximize(max_bits_of_signed_integer, 16);
else if (typeid_cast<const DataTypeInt32 *>(type.get()))
maximize(max_bits_of_signed_integer, 32);
else if (typeid_cast<const DataTypeInt64 *>(type.get()))
maximize(max_bits_of_signed_integer, 64);
else if (typeid_cast<const DataTypeInt128 *>(type.get()))
maximize(max_bits_of_signed_integer, 128);
else if (typeid_cast<const DataTypeInt256 *>(type.get()))
maximize(max_bits_of_signed_integer, 256);
else if (typeid_cast<const DataTypeFloat32 *>(type.get()))
maximize(max_mantissa_bits_of_floating, 24);
else if (typeid_cast<const DataTypeFloat64 *>(type.get()))
maximize(max_mantissa_bits_of_floating, 53);
else
all_numbers = false;
}
if (max_bits_of_signed_integer || max_bits_of_unsigned_integer || max_mantissa_bits_of_floating)
{
if (!all_numbers)
throw Exception(getExceptionMessagePrefix(types) + " because some of them are numbers and some of them are not", ErrorCodes::NO_COMMON_TYPE);
/// If there are signed and unsigned types of same bit-width, the result must be signed number with at least one more bit.
/// Example, common of Int32, UInt32 = Int64.
size_t min_bit_width_of_integer = std::max(max_bits_of_signed_integer, max_bits_of_unsigned_integer);
/// If unsigned is not covered by signed.
if (max_bits_of_signed_integer && max_bits_of_unsigned_integer >= max_bits_of_signed_integer) //-V1051
{
// Because 128 and 256 bit integers are significantly slower, we should not promote to them.
// But if we already have wide numbers, promotion is necessary.
if (min_bit_width_of_integer != 64)
++min_bit_width_of_integer;
else
throw Exception(
getExceptionMessagePrefix(types)
+ " because some of them are signed integers and some are unsigned integers,"
" but there is no signed integer type, that can exactly represent all required unsigned integer values",
ErrorCodes::NO_COMMON_TYPE);
}
/// If the result must be floating.
if (max_mantissa_bits_of_floating)
{
size_t min_mantissa_bits = std::max(min_bit_width_of_integer, max_mantissa_bits_of_floating);
if (min_mantissa_bits <= 24)
return std::make_shared<DataTypeFloat32>();
else if (min_mantissa_bits <= 53)
return std::make_shared<DataTypeFloat64>();
else
throw Exception(getExceptionMessagePrefix(types)
+ " because some of them are integers and some are floating point,"
" but there is no floating point type, that can exactly represent all required integers", ErrorCodes::NO_COMMON_TYPE);
}
/// If the result must be signed integer.
if (max_bits_of_signed_integer)
{
if (min_bit_width_of_integer <= 8)
return std::make_shared<DataTypeInt8>();
else if (min_bit_width_of_integer <= 16)
return std::make_shared<DataTypeInt16>();
else if (min_bit_width_of_integer <= 32)
return std::make_shared<DataTypeInt32>();
else if (min_bit_width_of_integer <= 64)
return std::make_shared<DataTypeInt64>();
else if (min_bit_width_of_integer <= 128)
return std::make_shared<DataTypeInt128>();
else if (min_bit_width_of_integer <= 256)
return std::make_shared<DataTypeInt256>();
else
throw Exception(getExceptionMessagePrefix(types)
+ " because some of them are signed integers and some are unsigned integers,"
" but there is no signed integer type, that can exactly represent all required unsigned integer values", ErrorCodes::NO_COMMON_TYPE);
}
/// All unsigned.
{
if (min_bit_width_of_integer <= 8)
return std::make_shared<DataTypeUInt8>();
else if (min_bit_width_of_integer <= 16)
return std::make_shared<DataTypeUInt16>();
else if (min_bit_width_of_integer <= 32)
return std::make_shared<DataTypeUInt32>();
else if (min_bit_width_of_integer <= 64)
return std::make_shared<DataTypeUInt64>();
else if (min_bit_width_of_integer <= 128)
return std::make_shared<DataTypeUInt128>();
else if (min_bit_width_of_integer <= 256)
return std::make_shared<DataTypeUInt256>();
else
throw Exception("Logical error: " + getExceptionMessagePrefix(types)
+ " but as all data types are unsigned integers, we must have found maximum unsigned integer type", ErrorCodes::NO_COMMON_TYPE);
}
}
auto numeric_type = getNumericType(type_ids, allow_conversion_to_string);
if (numeric_type)
return numeric_type;
}
/// All other data types (UUID, AggregateFunction, Enum...) are compatible only if they are the same (checked in trivial cases).
throw Exception(getExceptionMessagePrefix(types), ErrorCodes::NO_COMMON_TYPE);
return throw_or_return(getExceptionMessagePrefix(types), ErrorCodes::NO_COMMON_TYPE);
}
DataTypePtr getLeastSupertype(const TypeIndexSet & types, bool allow_conversion_to_string)
{
auto throw_or_return = [&](std::string_view message, int error_code)
{
if (allow_conversion_to_string)
return std::make_shared<DataTypeString>();
throw Exception(String(message), error_code);
};
TypeIndexSet types_set;
for (const auto & type : types)
{
if (WhichDataType(type).isNothing())
continue;
if (!WhichDataType(type).isSimple())
throw Exception(ErrorCodes::NO_COMMON_TYPE,
"Cannot get common type by type ids with parametric type {}", typeToString(type));
types_set.insert(type);
}
if (types_set.empty())
return std::make_shared<DataTypeNothing>();
if (types.count(TypeIndex::String))
{
if (types.size() != 1)
return throw_or_return(getExceptionMessagePrefix(types) + " because some of them are String and some of them are not", ErrorCodes::NO_COMMON_TYPE);
return std::make_shared<DataTypeString>();
}
/// For numeric types, the most complicated part.
auto numeric_type = getNumericType(types, allow_conversion_to_string);
if (numeric_type)
return numeric_type;
/// All other data types (UUID, AggregateFunction, Enum...) are compatible only if they are the same (checked in trivial cases).
return throw_or_return(getExceptionMessagePrefix(types), ErrorCodes::NO_COMMON_TYPE);
}
DataTypePtr tryGetLeastSupertype(const DataTypes & types)

View File

@ -7,12 +7,16 @@ namespace DB
{
/** Get data type that covers all possible values of passed data types.
* If there is no such data type, throws an exception.
* If there is no such data type, throws an exception
* or if 'allow_conversion_to_string' is true returns String as common type.
*
* Examples: least common supertype for UInt8, Int8 - Int16.
* Examples: there is no least common supertype for Array(UInt8), Int8.
*/
DataTypePtr getLeastSupertype(const DataTypes & types);
DataTypePtr getLeastSupertype(const DataTypes & types, bool allow_conversion_to_string = false);
using TypeIndexSet = std::unordered_set<TypeIndex>;
DataTypePtr getLeastSupertype(const TypeIndexSet & types, bool allow_conversion_to_string = false);
/// Same as above but return nullptr instead of throwing exception.
DataTypePtr tryGetLeastSupertype(const DataTypes & types);

View File

@ -248,6 +248,10 @@ public:
/// Overrode in remote fs disks.
virtual bool supportZeroCopyReplication() const = 0;
/// Whether this disk support parallel write
/// Overrode in remote fs disks.
virtual bool supportParallelWrite() const { return false; }
virtual bool isReadOnly() const { return false; }
/// Check if disk is broken. Broken disks will have 0 space and not be used.

View File

@ -4,7 +4,6 @@
#include <IO/ReadBufferFromFile.h>
#include <IO/ReadHelpers.h>
#include <IO/WriteBufferFromFile.h>
#include <IO/WriteBufferFromS3.h>
#include <IO/WriteHelpers.h>
#include <Common/createHardLink.h>
#include <Common/quoteString.h>

View File

@ -105,6 +105,8 @@ public:
bool supportZeroCopyReplication() const override { return true; }
bool supportParallelWrite() const override { return true; }
void shutdown() override;
void startup() override;

View File

@ -9,9 +9,9 @@
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypeMap.h>
#include <Functions/SimdJSONParser.h>
#include <Functions/RapidJSONParser.h>
#include <Functions/DummyJSONParser.h>
#include <Common/JSONParsers/SimdJSONParser.h>
#include <Common/JSONParsers/RapidJSONParser.h>
#include <Common/JSONParsers/DummyJSONParser.h>
#include <base/find_symbols.h>

View File

@ -74,6 +74,7 @@ void registerOutputFormatCapnProto(FormatFactory & factory);
void registerInputFormatRegexp(FormatFactory & factory);
void registerInputFormatJSONAsString(FormatFactory & factory);
void registerInputFormatJSONAsObject(FormatFactory & factory);
void registerInputFormatLineAsString(FormatFactory & factory);
void registerInputFormatCapnProto(FormatFactory & factory);
@ -84,6 +85,7 @@ void registerInputFormatHiveText(FormatFactory & factory);
/// Non trivial prefix and suffix checkers for disabling parallel parsing.
void registerNonTrivialPrefixAndSuffixCheckerJSONEachRow(FormatFactory & factory);
void registerNonTrivialPrefixAndSuffixCheckerJSONAsString(FormatFactory & factory);
void registerNonTrivialPrefixAndSuffixCheckerJSONAsObject(FormatFactory & factory);
void registerArrowSchemaReader(FormatFactory & factory);
void registerParquetSchemaReader(FormatFactory & factory);
@ -175,6 +177,7 @@ void registerFormats()
registerInputFormatRegexp(factory);
registerInputFormatJSONAsString(factory);
registerInputFormatLineAsString(factory);
registerInputFormatJSONAsObject(factory);
#if USE_HIVE
registerInputFormatHiveText(factory);
#endif
@ -183,6 +186,7 @@ void registerFormats()
registerNonTrivialPrefixAndSuffixCheckerJSONEachRow(factory);
registerNonTrivialPrefixAndSuffixCheckerJSONAsString(factory);
registerNonTrivialPrefixAndSuffixCheckerJSONAsObject(factory);
registerArrowSchemaReader(factory);
registerParquetSchemaReader(factory);

View File

@ -33,22 +33,27 @@ public:
ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {1}; }
explicit CastOverloadResolverImpl(std::optional<Diagnostic> diagnostic_, bool keep_nullable_)
: diagnostic(std::move(diagnostic_)), keep_nullable(keep_nullable_)
explicit CastOverloadResolverImpl(std::optional<Diagnostic> diagnostic_, bool keep_nullable_, bool cast_ipv4_ipv6_default_on_conversion_error_)
: diagnostic(std::move(diagnostic_))
, keep_nullable(keep_nullable_)
, cast_ipv4_ipv6_default_on_conversion_error(cast_ipv4_ipv6_default_on_conversion_error_)
{
}
static FunctionOverloadResolverPtr create(ContextPtr context)
{
const auto & settings_ref = context->getSettingsRef();
if constexpr (internal)
return createImpl();
return createImpl({}, context->getSettingsRef().cast_keep_nullable);
return createImpl({}, false /*keep_nullable*/, false /*cast_ipv4_ipv6_default_on_conversion_error*/);
return createImpl({}, settings_ref.cast_keep_nullable, settings_ref.cast_ipv4_ipv6_default_on_conversion_error);
}
static FunctionOverloadResolverPtr createImpl(std::optional<Diagnostic> diagnostic = {}, bool keep_nullable = false)
static FunctionOverloadResolverPtr createImpl(std::optional<Diagnostic> diagnostic = {}, bool keep_nullable = false, bool cast_ipv4_ipv6_default_on_conversion_error = false)
{
assert(!internal || !keep_nullable);
return std::make_unique<CastOverloadResolverImpl>(std::move(diagnostic), keep_nullable);
return std::make_unique<CastOverloadResolverImpl>(std::move(diagnostic), keep_nullable, cast_ipv4_ipv6_default_on_conversion_error);
}
protected:
@ -61,7 +66,7 @@ protected:
data_types[i] = arguments[i].type;
auto monotonicity = MonotonicityHelper::getMonotonicityInformation(arguments.front().type, return_type.get());
return std::make_unique<FunctionCast<FunctionName>>(name, std::move(monotonicity), data_types, return_type, diagnostic, cast_type);
return std::make_unique<FunctionCast<FunctionName>>(name, std::move(monotonicity), data_types, return_type, diagnostic, cast_type, cast_ipv4_ipv6_default_on_conversion_error);
}
DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override
@ -98,6 +103,7 @@ protected:
private:
std::optional<Diagnostic> diagnostic;
bool keep_nullable;
bool cast_ipv4_ipv6_default_on_conversion_error;
};
@ -115,7 +121,10 @@ struct CastInternalOverloadName
static constexpr auto accurate_cast_or_null_name = "accurate_CastOrNull";
};
template <CastType cast_type> using CastOverloadResolver = CastOverloadResolverImpl<cast_type, false, CastOverloadName, CastName>;
template <CastType cast_type> using CastInternalOverloadResolver = CastOverloadResolverImpl<cast_type, true, CastInternalOverloadName, CastInternalName>;
template <CastType cast_type>
using CastOverloadResolver = CastOverloadResolverImpl<cast_type, false, CastOverloadName, CastName>;
template <CastType cast_type>
using CastInternalOverloadResolver = CastOverloadResolverImpl<cast_type, true, CastInternalOverloadName, CastInternalName>;
}

View File

@ -8,13 +8,13 @@
#include <Core/Settings.h>
#include <DataTypes/DataTypeString.h>
#include <DataTypes/DataTypesNumber.h>
#include <Functions/DummyJSONParser.h>
#include <Common/JSONParsers/DummyJSONParser.h>
#include <Functions/IFunction.h>
#include <Functions/JSONPath/ASTs/ASTJSONPath.h>
#include <Functions/JSONPath/Generator/GeneratorJSONPath.h>
#include <Functions/JSONPath/Parsers/ParserJSONPath.h>
#include <Functions/RapidJSONParser.h>
#include <Functions/SimdJSONParser.h>
#include <Common/JSONParsers/RapidJSONParser.h>
#include <Common/JSONParsers/SimdJSONParser.h>
#include <Interpreters/Context.h>
#include <Parsers/IParser.h>
#include <Parsers/Lexer.h>

View File

@ -2,12 +2,15 @@
#pragma clang diagnostic ignored "-Wreserved-identifier"
#endif
#include <Functions/FunctionsCodingIP.h>
#include <Columns/ColumnArray.h>
#include <Columns/ColumnConst.h>
#include <Columns/ColumnFixedString.h>
#include <Columns/ColumnString.h>
#include <Columns/ColumnTuple.h>
#include <Columns/ColumnsNumber.h>
#include <Columns/ColumnNullable.h>
#include <DataTypes/DataTypeDate.h>
#include <DataTypes/DataTypeFactory.h>
#include <DataTypes/DataTypeFixedString.h>
@ -17,7 +20,7 @@
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionHelpers.h>
#include <Functions/IFunction.h>
#include <Interpreters/Context_fwd.h>
#include <Interpreters/Context.h>
#include <IO/WriteHelpers.h>
#include <Common/IPv6ToBinary.h>
#include <Common/formatIPv6.h>
@ -239,17 +242,19 @@ private:
}
};
template <IPStringToNumExceptionMode exception_mode>
class FunctionIPv6StringToNum : public IFunction
{
public:
static constexpr auto name = "IPv6StringToNum";
static FunctionPtr create(ContextPtr) { return std::make_shared<FunctionIPv6StringToNum>(); }
static constexpr auto name = exception_mode == IPStringToNumExceptionMode::Throw
? "IPv6StringToNum"
: (exception_mode == IPStringToNumExceptionMode::Default ? "IPv6StringToNumOrDefault" : "IPv6StringToNumOrNull");
static inline bool tryParseIPv4(const char * pos)
static FunctionPtr create(ContextPtr context) { return std::make_shared<FunctionIPv6StringToNum>(context); }
explicit FunctionIPv6StringToNum(ContextPtr context)
: cast_ipv4_ipv6_default_on_conversion_error(context->getSettingsRef().cast_ipv4_ipv6_default_on_conversion_error)
{
UInt32 result = 0;
return DB::parseIPv4(pos, reinterpret_cast<unsigned char *>(&result));
}
String getName() const override { return name; }
@ -258,62 +263,43 @@ public:
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; }
bool useDefaultImplementationForConstants() const override { return true; }
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
{
if (!isString(arguments[0]))
if (!isStringOrFixedString(arguments[0]))
{
throw Exception(
"Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Illegal type {} of argument of function {}", arguments[0]->getName(), getName());
}
return std::make_shared<DataTypeFixedString>(IPV6_BINARY_LENGTH);
auto result_type = std::make_shared<DataTypeFixedString>(IPV6_BINARY_LENGTH);
if constexpr (exception_mode == IPStringToNumExceptionMode::Null)
{
return makeNullable(result_type);
}
return result_type;
}
bool useDefaultImplementationForConstants() const override { return true; }
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override
{
const ColumnPtr & column = arguments[0].column;
if (const auto * col_in = checkAndGetColumn<ColumnString>(column.get()))
if constexpr (exception_mode == IPStringToNumExceptionMode::Throw)
{
auto col_res = ColumnFixedString::create(IPV6_BINARY_LENGTH);
auto & vec_res = col_res->getChars();
vec_res.resize(col_in->size() * IPV6_BINARY_LENGTH);
const ColumnString::Chars & vec_src = col_in->getChars();
const ColumnString::Offsets & offsets_src = col_in->getOffsets();
size_t src_offset = 0;
char src_ipv4_buf[sizeof("::ffff:") + IPV4_MAX_TEXT_LENGTH + 1] = "::ffff:";
for (size_t out_offset = 0, i = 0; out_offset < vec_res.size(); out_offset += IPV6_BINARY_LENGTH, ++i)
if (cast_ipv4_ipv6_default_on_conversion_error)
{
/// For both cases below: In case of failure, the function parseIPv6 fills vec_res with zero bytes.
/// If the source IP address is parsable as an IPv4 address, then transform it into a valid IPv6 address.
/// Keeping it simple by just prefixing `::ffff:` to the IPv4 address to represent it as a valid IPv6 address.
if (tryParseIPv4(reinterpret_cast<const char *>(&vec_src[src_offset])))
{
std::memcpy(
src_ipv4_buf + std::strlen("::ffff:"),
reinterpret_cast<const char *>(&vec_src[src_offset]),
std::min<UInt64>(offsets_src[i] - src_offset, IPV4_MAX_TEXT_LENGTH + 1));
parseIPv6(src_ipv4_buf, reinterpret_cast<unsigned char *>(&vec_res[out_offset]));
}
else
{
parseIPv6(
reinterpret_cast<const char *>(&vec_src[src_offset]), reinterpret_cast<unsigned char *>(&vec_res[out_offset]));
}
src_offset = offsets_src[i];
return convertToIPv6<IPStringToNumExceptionMode::Default>(column);
}
return col_res;
}
else
throw Exception("Illegal column " + arguments[0].column->getName()
+ " of argument of function " + getName(),
ErrorCodes::ILLEGAL_COLUMN);
return convertToIPv6<exception_mode>(column);
}
private:
bool cast_ipv4_ipv6_default_on_conversion_error = false;
};
@ -381,69 +367,64 @@ public:
}
};
template <IPStringToNumExceptionMode exception_mode>
class FunctionIPv4StringToNum : public IFunction
{
public:
static constexpr auto name = "IPv4StringToNum";
static FunctionPtr create(ContextPtr) { return std::make_shared<FunctionIPv4StringToNum>(); }
static constexpr auto name = exception_mode == IPStringToNumExceptionMode::Throw
? "IPv4StringToNum"
: (exception_mode == IPStringToNumExceptionMode::Default ? "IPv4StringToNumOrDefault" : "IPv4StringToNumOrNull");
String getName() const override
static FunctionPtr create(ContextPtr context) { return std::make_shared<FunctionIPv4StringToNum>(context); }
explicit FunctionIPv4StringToNum(ContextPtr context)
: cast_ipv4_ipv6_default_on_conversion_error(context->getSettingsRef().cast_ipv4_ipv6_default_on_conversion_error)
{
return name;
}
String getName() const override { return name; }
size_t getNumberOfArguments() const override { return 1; }
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; }
bool useDefaultImplementationForConstants() const override { return true; }
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
{
if (!isString(arguments[0]))
throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName(),
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
{
throw Exception(
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Illegal type {} of argument of function {}", arguments[0]->getName(), getName());
}
return std::make_shared<DataTypeUInt32>();
auto result_type = std::make_shared<DataTypeUInt32>();
if constexpr (exception_mode == IPStringToNumExceptionMode::Null)
{
return makeNullable(result_type);
}
return result_type;
}
static inline UInt32 parseIPv4(const char * pos)
{
UInt32 result = 0;
DB::parseIPv4(pos, reinterpret_cast<unsigned char*>(&result));
return result;
}
bool useDefaultImplementationForConstants() const override { return true; }
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override
{
const ColumnPtr & column = arguments[0].column;
if (const ColumnString * col = checkAndGetColumn<ColumnString>(column.get()))
if constexpr (exception_mode == IPStringToNumExceptionMode::Throw)
{
auto col_res = ColumnUInt32::create();
ColumnUInt32::Container & vec_res = col_res->getData();
vec_res.resize(col->size());
const ColumnString::Chars & vec_src = col->getChars();
const ColumnString::Offsets & offsets_src = col->getOffsets();
size_t prev_offset = 0;
for (size_t i = 0; i < vec_res.size(); ++i)
if (cast_ipv4_ipv6_default_on_conversion_error)
{
vec_res[i] = parseIPv4(reinterpret_cast<const char *>(&vec_src[prev_offset]));
prev_offset = offsets_src[i];
return convertToIPv4<IPStringToNumExceptionMode::Default>(column);
}
return col_res;
}
else
throw Exception("Illegal column " + arguments[0].column->getName()
+ " of argument of function " + getName(),
ErrorCodes::ILLEGAL_COLUMN);
return convertToIPv4<exception_mode>(column);
}
private:
bool cast_ipv4_ipv6_default_on_conversion_error = false;
};
@ -503,16 +484,21 @@ private:
}
};
class FunctionToIPv4 : public FunctionIPv4StringToNum
template <IPStringToNumExceptionMode exception_mode>
class FunctionToIPv4 : public FunctionIPv4StringToNum<exception_mode>
{
public:
static constexpr auto name = "toIPv4";
static FunctionPtr create(ContextPtr) { return std::make_shared<FunctionToIPv4>(); }
using Base = FunctionIPv4StringToNum<exception_mode>;
String getName() const override
{
return name;
}
static constexpr auto name = exception_mode == IPStringToNumExceptionMode::Throw
? "toIPv4"
: (exception_mode == IPStringToNumExceptionMode::Default ? "toIPv4OrDefault" : "toIPv4OrNull");
static FunctionPtr create(ContextPtr context) { return std::make_shared<FunctionToIPv4>(context); }
explicit FunctionToIPv4(ContextPtr context) : Base(context) { }
String getName() const override { return name; }
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; }
@ -521,18 +507,35 @@ public:
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
{
if (!isString(arguments[0]))
throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName(),
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
{
throw Exception(
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Illegal type {} of argument of function {}", arguments[0]->getName(), getName());
}
return DataTypeFactory::instance().get("IPv4");
auto result_type = DataTypeFactory::instance().get("IPv4");
if constexpr (exception_mode == IPStringToNumExceptionMode::Null)
{
return makeNullable(result_type);
}
return result_type;
}
};
class FunctionToIPv6 : public FunctionIPv6StringToNum
template <IPStringToNumExceptionMode exception_mode>
class FunctionToIPv6 : public FunctionIPv6StringToNum<exception_mode>
{
public:
static constexpr auto name = "toIPv6";
static FunctionPtr create(ContextPtr) { return std::make_shared<FunctionToIPv6>(); }
using Base = FunctionIPv6StringToNum<exception_mode>;
static constexpr auto name = exception_mode == IPStringToNumExceptionMode::Throw
? "toIPv6"
: (exception_mode == IPStringToNumExceptionMode::Default ? "toIPv6OrDefault" : "toIPv6OrNull");
static FunctionPtr create(ContextPtr context) { return std::make_shared<FunctionToIPv6>(context); }
explicit FunctionToIPv6(ContextPtr context) : Base(context) { }
String getName() const override { return name; }
@ -540,11 +543,20 @@ public:
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
{
if (!isString(arguments[0]))
throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName(),
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
if (!isStringOrFixedString(arguments[0]))
{
throw Exception(
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Illegal type {} of argument of function {}", arguments[0]->getName(), getName());
}
return DataTypeFactory::instance().get("IPv6");
auto result_type = DataTypeFactory::instance().get("IPv6");
if constexpr (exception_mode == IPStringToNumExceptionMode::Null)
{
return makeNullable(result_type);
}
return result_type;
}
};
@ -971,7 +983,7 @@ public:
}
};
class FunctionIsIPv4String : public FunctionIPv4StringToNum
class FunctionIsIPv4String : public IFunction
{
public:
static constexpr auto name = "isIPv4String";
@ -980,46 +992,51 @@ public:
String getName() const override { return name; }
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return true; }
size_t getNumberOfArguments() const override { return 1; }
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; }
bool useDefaultImplementationForConstants() const override { return true; }
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
{
if (!isString(arguments[0]))
throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName(),
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
throw Exception(
"Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
return std::make_shared<DataTypeUInt8>();
}
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override
{
const ColumnPtr & column = arguments[0].column;
if (const ColumnString * col = checkAndGetColumn<ColumnString>(column.get()))
const ColumnString * input_column = checkAndGetColumn<ColumnString>(arguments[0].column.get());
if (!input_column)
{
auto col_res = ColumnUInt8::create();
ColumnUInt8::Container & vec_res = col_res->getData();
vec_res.resize(col->size());
const ColumnString::Chars & vec_src = col->getChars();
const ColumnString::Offsets & offsets_src = col->getOffsets();
size_t prev_offset = 0;
UInt32 result = 0;
for (size_t i = 0; i < vec_res.size(); ++i)
{
vec_res[i] = DB::parseIPv4(reinterpret_cast<const char *>(&vec_src[prev_offset]), reinterpret_cast<unsigned char*>(&result));
prev_offset = offsets_src[i];
}
return col_res;
throw Exception(
ErrorCodes::ILLEGAL_COLUMN, "Illegal column {} of argument of function {}", arguments[0].column->getName(), getName());
}
else
throw Exception("Illegal column " + arguments[0].column->getName()
+ " of argument of function " + getName(),
ErrorCodes::ILLEGAL_COLUMN);
auto col_res = ColumnUInt8::create();
ColumnUInt8::Container & vec_res = col_res->getData();
vec_res.resize(input_column->size());
const ColumnString::Chars & vec_src = input_column->getChars();
const ColumnString::Offsets & offsets_src = input_column->getOffsets();
size_t prev_offset = 0;
UInt32 result = 0;
for (size_t i = 0; i < vec_res.size(); ++i)
{
vec_res[i] = DB::parseIPv4(reinterpret_cast<const char *>(&vec_src[prev_offset]), reinterpret_cast<unsigned char *>(&result));
prev_offset = offsets_src[i];
}
return col_res;
}
};
class FunctionIsIPv6String : public FunctionIPv6StringToNum
class FunctionIsIPv6String : public IFunction
{
public:
static constexpr auto name = "isIPv6String";
@ -1028,44 +1045,49 @@ public:
String getName() const override { return name; }
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return true; }
size_t getNumberOfArguments() const override { return 1; }
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; }
bool useDefaultImplementationForConstants() const override { return true; }
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
{
if (!isString(arguments[0]))
throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName(),
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
{
throw Exception(
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Illegal type {} of argument of function {}", arguments[0]->getName(), getName());
}
return std::make_shared<DataTypeUInt8>();
}
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override
{
const ColumnPtr & column = arguments[0].column;
if (const ColumnString * col = checkAndGetColumn<ColumnString>(column.get()))
const ColumnString * input_column = checkAndGetColumn<ColumnString>(arguments[0].column.get());
if (!input_column)
{
auto col_res = ColumnUInt8::create();
ColumnUInt8::Container & vec_res = col_res->getData();
vec_res.resize(col->size());
const ColumnString::Chars & vec_src = col->getChars();
const ColumnString::Offsets & offsets_src = col->getOffsets();
size_t prev_offset = 0;
char v[IPV6_BINARY_LENGTH];
for (size_t i = 0; i < vec_res.size(); ++i)
{
vec_res[i] = DB::parseIPv6(reinterpret_cast<const char *>(&vec_src[prev_offset]), reinterpret_cast<unsigned char*>(v));
prev_offset = offsets_src[i];
}
return col_res;
throw Exception(
ErrorCodes::ILLEGAL_COLUMN, "Illegal column {} of argument of function {}", arguments[0].column->getName(), getName());
}
else
throw Exception("Illegal column " + arguments[0].column->getName()
+ " of argument of function " + getName(),
ErrorCodes::ILLEGAL_COLUMN);
auto col_res = ColumnUInt8::create();
ColumnUInt8::Container & vec_res = col_res->getData();
vec_res.resize(input_column->size());
const ColumnString::Chars & vec_src = input_column->getChars();
const ColumnString::Offsets & offsets_src = input_column->getOffsets();
size_t prev_offset = 0;
char buffer[IPV6_BINARY_LENGTH];
for (size_t i = 0; i < vec_res.size(); ++i)
{
vec_res[i] = DB::parseIPv6(reinterpret_cast<const char *>(&vec_src[prev_offset]), reinterpret_cast<unsigned char *>(buffer));
prev_offset = offsets_src[i];
}
return col_res;
}
};
@ -1079,8 +1101,6 @@ void registerFunctionsCoding(FunctionFactory & factory)
factory.registerFunction<FunctionMACNumToString>();
factory.registerFunction<FunctionMACStringTo<ParseMACImpl>>();
factory.registerFunction<FunctionMACStringTo<ParseOUIImpl>>();
factory.registerFunction<FunctionToIPv4>();
factory.registerFunction<FunctionToIPv6>();
factory.registerFunction<FunctionIPv6CIDRToRange>();
factory.registerFunction<FunctionIPv4CIDRToRange>();
factory.registerFunction<FunctionIsIPv4String>();
@ -1089,14 +1109,26 @@ void registerFunctionsCoding(FunctionFactory & factory)
factory.registerFunction<FunctionIPv4NumToString<0, NameFunctionIPv4NumToString>>();
factory.registerFunction<FunctionIPv4NumToString<1, NameFunctionIPv4NumToStringClassC>>();
factory.registerFunction<FunctionIPv4StringToNum>();
factory.registerFunction<FunctionIPv6NumToString>();
factory.registerFunction<FunctionIPv6StringToNum>();
factory.registerFunction<FunctionIPv4StringToNum<IPStringToNumExceptionMode::Throw>>();
factory.registerFunction<FunctionIPv4StringToNum<IPStringToNumExceptionMode::Default>>();
factory.registerFunction<FunctionIPv4StringToNum<IPStringToNumExceptionMode::Null>>();
factory.registerFunction<FunctionToIPv4<IPStringToNumExceptionMode::Throw>>();
factory.registerFunction<FunctionToIPv4<IPStringToNumExceptionMode::Default>>();
factory.registerFunction<FunctionToIPv4<IPStringToNumExceptionMode::Null>>();
/// MysQL compatibility aliases:
factory.registerAlias("INET_ATON", FunctionIPv4StringToNum::name, FunctionFactory::CaseInsensitive);
factory.registerFunction<FunctionIPv6NumToString>();
factory.registerFunction<FunctionIPv6StringToNum<IPStringToNumExceptionMode::Throw>>();
factory.registerFunction<FunctionIPv6StringToNum<IPStringToNumExceptionMode::Default>>();
factory.registerFunction<FunctionIPv6StringToNum<IPStringToNumExceptionMode::Null>>();
factory.registerFunction<FunctionToIPv6<IPStringToNumExceptionMode::Throw>>();
factory.registerFunction<FunctionToIPv6<IPStringToNumExceptionMode::Default>>();
factory.registerFunction<FunctionToIPv6<IPStringToNumExceptionMode::Null>>();
/// MySQL compatibility aliases:
factory.registerAlias("INET_ATON", FunctionIPv4StringToNum<IPStringToNumExceptionMode::Throw>::name, FunctionFactory::CaseInsensitive);
factory.registerAlias("INET6_NTOA", FunctionIPv6NumToString::name, FunctionFactory::CaseInsensitive);
factory.registerAlias("INET6_ATON", FunctionIPv6StringToNum::name, FunctionFactory::CaseInsensitive);
factory.registerAlias("INET6_ATON", FunctionIPv6StringToNum<IPStringToNumExceptionMode::Throw>::name, FunctionFactory::CaseInsensitive);
factory.registerAlias("INET_NTOA", NameFunctionIPv4NumToString::name, FunctionFactory::CaseInsensitive);
}

View File

@ -0,0 +1,212 @@
#pragma once
#include <Common/formatIPv6.h>
#include <Columns/ColumnFixedString.h>
#include <Columns/ColumnNullable.h>
#include <Columns/ColumnString.h>
#include <Columns/ColumnsNumber.h>
namespace DB
{
namespace ErrorCodes
{
extern const int CANNOT_PARSE_DOMAIN_VALUE_FROM_STRING;
extern const int ILLEGAL_COLUMN;
}
enum class IPStringToNumExceptionMode : uint8_t
{
Throw,
Default,
Null
};
static inline bool tryParseIPv4(const char * pos, UInt32 & result_value)
{
return parseIPv4(pos, reinterpret_cast<unsigned char *>(&result_value));
}
namespace detail
{
template <IPStringToNumExceptionMode exception_mode, typename StringColumnType>
ColumnPtr convertToIPv6(const StringColumnType & string_column)
{
size_t column_size = string_column.size();
ColumnUInt8::MutablePtr col_null_map_to;
ColumnUInt8::Container * vec_null_map_to = nullptr;
if constexpr (exception_mode == IPStringToNumExceptionMode::Null)
{
col_null_map_to = ColumnUInt8::create(column_size, false);
vec_null_map_to = &col_null_map_to->getData();
}
auto col_res = ColumnFixedString::create(IPV6_BINARY_LENGTH);
auto & vec_res = col_res->getChars();
vec_res.resize(column_size * IPV6_BINARY_LENGTH);
using Chars = typename StringColumnType::Chars;
const Chars & vec_src = string_column.getChars();
size_t src_offset = 0;
char src_ipv4_buf[sizeof("::ffff:") + IPV4_MAX_TEXT_LENGTH + 1] = "::ffff:";
/// ColumnFixedString contains not null terminated strings. But functions parseIPv6, parseIPv4 expect null terminated string.
std::string fixed_string_buffer;
if constexpr (std::is_same_v<StringColumnType, ColumnFixedString>)
{
fixed_string_buffer.resize(string_column.getN());
}
for (size_t out_offset = 0, i = 0; out_offset < vec_res.size(); out_offset += IPV6_BINARY_LENGTH, ++i)
{
size_t src_next_offset = src_offset;
const char * src_value = nullptr;
unsigned char * res_value = reinterpret_cast<unsigned char *>(&vec_res[out_offset]);
if constexpr (std::is_same_v<StringColumnType, ColumnString>)
{
src_value = reinterpret_cast<const char *>(&vec_src[src_offset]);
src_next_offset = string_column.getOffsets()[i];
}
else if constexpr (std::is_same_v<StringColumnType, ColumnFixedString>)
{
size_t fixed_string_size = string_column.getN();
std::memcpy(fixed_string_buffer.data(), reinterpret_cast<const char *>(&vec_src[src_offset]), fixed_string_size);
src_value = fixed_string_buffer.data();
src_next_offset += fixed_string_size;
}
bool parse_result = false;
UInt32 dummy_result = 0;
/// For both cases below: In case of failure, the function parseIPv6 fills vec_res with zero bytes.
/// If the source IP address is parsable as an IPv4 address, then transform it into a valid IPv6 address.
/// Keeping it simple by just prefixing `::ffff:` to the IPv4 address to represent it as a valid IPv6 address.
if (tryParseIPv4(src_value, dummy_result))
{
std::memcpy(
src_ipv4_buf + std::strlen("::ffff:"),
src_value,
std::min<UInt64>(src_next_offset - src_offset, IPV4_MAX_TEXT_LENGTH + 1));
parse_result = parseIPv6(src_ipv4_buf, res_value);
}
else
{
parse_result = parseIPv6(src_value, res_value);
}
if (!parse_result)
{
if constexpr (exception_mode == IPStringToNumExceptionMode::Throw)
throw Exception("Invalid IPv6 value", ErrorCodes::CANNOT_PARSE_DOMAIN_VALUE_FROM_STRING);
else if constexpr (exception_mode == IPStringToNumExceptionMode::Default)
vec_res[i] = 0;
else if constexpr (exception_mode == IPStringToNumExceptionMode::Null)
(*vec_null_map_to)[i] = true;
}
src_offset = src_next_offset;
}
if constexpr (exception_mode == IPStringToNumExceptionMode::Null)
return ColumnNullable::create(std::move(col_res), std::move(col_null_map_to));
return col_res;
}
}
template <IPStringToNumExceptionMode exception_mode>
ColumnPtr convertToIPv6(ColumnPtr column)
{
size_t column_size = column->size();
auto col_res = ColumnFixedString::create(IPV6_BINARY_LENGTH);
auto & vec_res = col_res->getChars();
vec_res.resize(column_size * IPV6_BINARY_LENGTH);
if (const auto * column_input_string = checkAndGetColumn<ColumnString>(column.get()))
{
return detail::convertToIPv6<exception_mode>(*column_input_string);
}
else if (const auto * column_input_fixed_string = checkAndGetColumn<ColumnFixedString>(column.get()))
{
return detail::convertToIPv6<exception_mode>(*column_input_fixed_string);
}
else
{
throw Exception(ErrorCodes::ILLEGAL_COLUMN, "Illegal column type {}. Expected String or FixedString", column->getName());
}
}
template <IPStringToNumExceptionMode exception_mode>
ColumnPtr convertToIPv4(ColumnPtr column)
{
const ColumnString * column_string = checkAndGetColumn<ColumnString>(column.get());
if (!column_string)
{
throw Exception(ErrorCodes::ILLEGAL_COLUMN, "Illegal column type {}. Expected String.", column->getName());
}
size_t column_size = column_string->size();
ColumnUInt8::MutablePtr col_null_map_to;
ColumnUInt8::Container * vec_null_map_to = nullptr;
if constexpr (exception_mode == IPStringToNumExceptionMode::Null)
{
col_null_map_to = ColumnUInt8::create(column_size, false);
vec_null_map_to = &col_null_map_to->getData();
}
auto col_res = ColumnUInt32::create();
ColumnUInt32::Container & vec_res = col_res->getData();
vec_res.resize(column_size);
const ColumnString::Chars & vec_src = column_string->getChars();
const ColumnString::Offsets & offsets_src = column_string->getOffsets();
size_t prev_offset = 0;
for (size_t i = 0; i < vec_res.size(); ++i)
{
bool parse_result = tryParseIPv4(reinterpret_cast<const char *>(&vec_src[prev_offset]), vec_res[i]);
if (!parse_result)
{
if constexpr (exception_mode == IPStringToNumExceptionMode::Throw)
{
throw Exception("Invalid IPv4 value", ErrorCodes::CANNOT_PARSE_DOMAIN_VALUE_FROM_STRING);
}
else if constexpr (exception_mode == IPStringToNumExceptionMode::Default)
{
vec_res[i] = 0;
}
else if constexpr (exception_mode == IPStringToNumExceptionMode::Null)
{
(*vec_null_map_to)[i] = true;
vec_res[i] = 0;
}
}
prev_offset = offsets_src[i];
}
if constexpr (exception_mode == IPStringToNumExceptionMode::Null)
return ColumnNullable::create(std::move(col_res), std::move(col_null_map_to));
return col_res;
}
}

View File

@ -1055,7 +1055,7 @@ private:
ColumnPtr executeGeneric(const ColumnWithTypeAndName & c0, const ColumnWithTypeAndName & c1) const
{
DataTypePtr common_type = getLeastSupertype({c0.type, c1.type});
DataTypePtr common_type = getLeastSupertype(DataTypes{c0.type, c1.type});
ColumnPtr c0_converted = castColumn(c0, common_type);
ColumnPtr c1_converted = castColumn(c1, common_type);
@ -1228,7 +1228,7 @@ public:
// Comparing Date/Date32 and DateTime64 requires implicit conversion,
if (date_and_datetime && (isDateOrDate32(left_type) || isDateOrDate32(right_type)))
{
DataTypePtr common_type = getLeastSupertype({left_type, right_type});
DataTypePtr common_type = getLeastSupertype(DataTypes{left_type, right_type});
ColumnPtr c0_converted = castColumn(col_with_type_and_name_left, common_type);
ColumnPtr c1_converted = castColumn(col_with_type_and_name_right, common_type);
return executeDecimal({c0_converted, common_type, "left"}, {c1_converted, common_type, "right"});
@ -1258,7 +1258,7 @@ public:
}
else if (date_and_datetime)
{
DataTypePtr common_type = getLeastSupertype({left_type, right_type});
DataTypePtr common_type = getLeastSupertype(DataTypes{left_type, right_type});
ColumnPtr c0_converted = castColumn(col_with_type_and_name_left, common_type);
ColumnPtr c1_converted = castColumn(col_with_type_and_name_right, common_type);
if (!((res = executeNumLeftType<UInt32>(c0_converted.get(), c1_converted.get()))

View File

@ -25,6 +25,9 @@
#include <DataTypes/DataTypeUUID.h>
#include <DataTypes/DataTypeInterval.h>
#include <DataTypes/DataTypeAggregateFunction.h>
#include <DataTypes/DataTypeObject.h>
#include <DataTypes/ObjectUtils.h>
#include <DataTypes/DataTypeNested.h>
#include <DataTypes/Serializations/SerializationDecimal.h>
#include <Formats/FormatSettings.h>
#include <Columns/ColumnString.h>
@ -34,6 +37,7 @@
#include <Columns/ColumnNullable.h>
#include <Columns/ColumnTuple.h>
#include <Columns/ColumnMap.h>
#include <Columns/ColumnObject.h>
#include <Columns/ColumnsCommon.h>
#include <Columns/ColumnStringHelpers.h>
#include <Common/assert_cast.h>
@ -45,6 +49,7 @@
#include <Functions/DateTimeTransforms.h>
#include <Functions/toFixedString.h>
#include <Functions/TransformDateTime64.h>
#include <Functions/FunctionsCodingIP.h>
#include <DataTypes/DataTypeLowCardinality.h>
#include <Columns/ColumnLowCardinality.h>
#include <Interpreters/Context.h>
@ -2532,10 +2537,12 @@ public:
, const DataTypes & argument_types_
, const DataTypePtr & return_type_
, std::optional<Diagnostic> diagnostic_
, CastType cast_type_)
, CastType cast_type_
, bool cast_ipv4_ipv6_default_on_conversion_error_)
: cast_name(cast_name_), monotonicity_for_range(std::move(monotonicity_for_range_))
, argument_types(argument_types_), return_type(return_type_), diagnostic(std::move(diagnostic_))
, cast_type(cast_type_)
, cast_ipv4_ipv6_default_on_conversion_error(cast_ipv4_ipv6_default_on_conversion_error_)
{
}
@ -2584,6 +2591,7 @@ private:
std::optional<Diagnostic> diagnostic;
CastType cast_type;
bool cast_ipv4_ipv6_default_on_conversion_error;
static WrapperType createFunctionAdaptor(FunctionPtr function, const DataTypePtr & from_type)
{
@ -2934,20 +2942,61 @@ private:
throw Exception{"CAST AS Tuple can only be performed between tuple types or from String.\nLeft type: "
+ from_type_untyped->getName() + ", right type: " + to_type->getName(), ErrorCodes::TYPE_MISMATCH};
if (from_type->getElements().size() != to_type->getElements().size())
throw Exception{"CAST AS Tuple can only be performed between tuple types with the same number of elements or from String.\n"
"Left type: " + from_type->getName() + ", right type: " + to_type->getName(), ErrorCodes::TYPE_MISMATCH};
const auto & from_element_types = from_type->getElements();
const auto & to_element_types = to_type->getElements();
auto element_wrappers = getElementWrappers(from_element_types, to_element_types);
return [element_wrappers, from_element_types, to_element_types]
std::vector<WrapperType> element_wrappers;
std::vector<std::optional<size_t>> to_reverse_index;
/// For named tuples allow conversions for tuples with
/// different sets of elements. If element exists in @to_type
/// and doesn't exist in @to_type it will be filled by default values.
if (from_type->haveExplicitNames() && from_type->serializeNames()
&& to_type->haveExplicitNames() && to_type->serializeNames())
{
const auto & from_names = from_type->getElementNames();
std::unordered_map<String, size_t> from_positions;
from_positions.reserve(from_names.size());
for (size_t i = 0; i < from_names.size(); ++i)
from_positions[from_names[i]] = i;
const auto & to_names = to_type->getElementNames();
element_wrappers.reserve(to_names.size());
to_reverse_index.reserve(from_names.size());
for (size_t i = 0; i < to_names.size(); ++i)
{
auto it = from_positions.find(to_names[i]);
if (it != from_positions.end())
{
element_wrappers.emplace_back(prepareUnpackDictionaries(from_element_types[it->second], to_element_types[i]));
to_reverse_index.emplace_back(it->second);
}
else
{
element_wrappers.emplace_back();
to_reverse_index.emplace_back();
}
}
}
else
{
if (from_element_types.size() != to_element_types.size())
throw Exception{"CAST AS Tuple can only be performed between tuple types with the same number of elements or from String.\n"
"Left type: " + from_type->getName() + ", right type: " + to_type->getName(), ErrorCodes::TYPE_MISMATCH};
element_wrappers = getElementWrappers(from_element_types, to_element_types);
to_reverse_index.reserve(to_element_types.size());
for (size_t i = 0; i < to_element_types.size(); ++i)
to_reverse_index.emplace_back(i);
}
return [element_wrappers, from_element_types, to_element_types, to_reverse_index]
(ColumnsWithTypeAndName & arguments, const DataTypePtr &, const ColumnNullable * nullable_source, size_t input_rows_count) -> ColumnPtr
{
const auto * col = arguments.front().column.get();
size_t tuple_size = from_element_types.size();
size_t tuple_size = to_element_types.size();
const ColumnTuple & column_tuple = typeid_cast<const ColumnTuple &>(*col);
Columns converted_columns(tuple_size);
@ -2955,8 +3004,16 @@ private:
/// invoke conversion for each element
for (size_t i = 0; i < tuple_size; ++i)
{
ColumnsWithTypeAndName element = {{column_tuple.getColumns()[i], from_element_types[i], "" }};
converted_columns[i] = element_wrappers[i](element, to_element_types[i], nullable_source, input_rows_count);
if (to_reverse_index[i])
{
size_t from_idx = *to_reverse_index[i];
ColumnsWithTypeAndName element = {{column_tuple.getColumns()[from_idx], from_element_types[from_idx], "" }};
converted_columns[i] = element_wrappers[i](element, to_element_types[i], nullable_source, input_rows_count);
}
else
{
converted_columns[i] = to_element_types[i]->createColumn()->cloneResized(input_rows_count);
}
}
return ColumnTuple::create(converted_columns);
@ -3077,6 +3134,68 @@ private:
}
}
WrapperType createObjectWrapper(const DataTypePtr & from_type, const DataTypeObject * to_type) const
{
if (const auto * from_tuple = checkAndGetDataType<DataTypeTuple>(from_type.get()))
{
if (!from_tuple->haveExplicitNames())
throw Exception(ErrorCodes::TYPE_MISMATCH,
"Cast to Object can be performed only from flatten Named Tuple. Got: {}", from_type->getName());
PathsInData paths;
DataTypes from_types;
std::tie(paths, from_types) = flattenTuple(from_type);
auto to_types = from_types;
for (auto & type : to_types)
{
if (isTuple(type) || isNested(type))
throw Exception(ErrorCodes::TYPE_MISMATCH,
"Cast to Object can be performed only from flatten Named Tuple. Got: {}", from_type->getName());
type = recursiveRemoveLowCardinality(type);
}
return [element_wrappers = getElementWrappers(from_types, to_types),
has_nullable_subcolumns = to_type->hasNullableSubcolumns(), from_types, to_types, paths]
(ColumnsWithTypeAndName & arguments, const DataTypePtr &, const ColumnNullable * nullable_source, size_t input_rows_count)
{
size_t tuple_size = to_types.size();
auto flattened_column = flattenTuple(arguments.front().column);
const auto & column_tuple = assert_cast<const ColumnTuple &>(*flattened_column);
if (tuple_size != column_tuple.getColumns().size())
throw Exception(ErrorCodes::TYPE_MISMATCH,
"Expected tuple with {} subcolumn, but got {} subcolumns",
tuple_size, column_tuple.getColumns().size());
auto res = ColumnObject::create(has_nullable_subcolumns);
for (size_t i = 0; i < tuple_size; ++i)
{
ColumnsWithTypeAndName element = {{column_tuple.getColumns()[i], from_types[i], "" }};
auto converted_column = element_wrappers[i](element, to_types[i], nullable_source, input_rows_count);
res->addSubcolumn(paths[i], converted_column->assumeMutable());
}
return res;
};
}
else if (checkAndGetDataType<DataTypeString>(from_type.get()))
{
return [] (ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, const ColumnNullable * nullable_source, size_t input_rows_count)
{
auto res = ConvertImplGenericFromString<ColumnString>::execute(arguments, result_type, nullable_source, input_rows_count);
auto & res_object = assert_cast<ColumnObject &>(res->assumeMutableRef());
res_object.finalize();
return res;
};
}
throw Exception(ErrorCodes::TYPE_MISMATCH,
"Cast to Object can be performed only from flatten named tuple or string. Got: {}", from_type->getName());
}
template <typename FieldType>
WrapperType createEnumWrapper(const DataTypePtr & from_type, const DataTypeEnum<FieldType> * to_type) const
{
@ -3381,7 +3500,9 @@ private:
/// 'requested_result_is_nullable' is true if CAST to Nullable type is requested.
WrapperType prepareImpl(const DataTypePtr & from_type, const DataTypePtr & to_type, bool requested_result_is_nullable) const
{
if (from_type->equals(*to_type))
bool convert_to_ipv6 = to_type->getCustomName() && to_type->getCustomName()->getName() == "IPv6";
if (from_type->equals(*to_type) && !convert_to_ipv6)
{
if (isUInt8(from_type))
return createUInt8ToUInt8Wrapper(from_type, to_type);
@ -3449,7 +3570,9 @@ private:
return false;
};
auto make_custom_serialization_wrapper = [&](const auto & types) -> bool
bool cast_ipv4_ipv6_default_on_conversion_error_value = cast_ipv4_ipv6_default_on_conversion_error;
auto make_custom_serialization_wrapper = [&, cast_ipv4_ipv6_default_on_conversion_error_value](const auto & types) -> bool
{
using Types = std::decay_t<decltype(types)>;
using ToDataType = typename Types::RightType;
@ -3457,8 +3580,45 @@ private:
if constexpr (WhichDataType(FromDataType::type_id).isStringOrFixedString())
{
if (to_type->getCustomSerialization())
if (to_type->getCustomSerialization() && to_type->getCustomName())
{
if (to_type->getCustomName()->getName() == "IPv4")
{
ret = [cast_ipv4_ipv6_default_on_conversion_error_value](
ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, const ColumnNullable *, size_t)
-> ColumnPtr
{
if (!WhichDataType(result_type).isUInt32())
throw Exception(ErrorCodes::TYPE_MISMATCH, "Wrong result type {}. Expected UInt32", result_type->getName());
if (cast_ipv4_ipv6_default_on_conversion_error_value)
return convertToIPv4<IPStringToNumExceptionMode::Default>(arguments[0].column);
else
return convertToIPv4<IPStringToNumExceptionMode::Throw>(arguments[0].column);
};
return true;
}
if (to_type->getCustomName()->getName() == "IPv6")
{
ret = [cast_ipv4_ipv6_default_on_conversion_error_value](
ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, const ColumnNullable *, size_t)
-> ColumnPtr
{
if (!WhichDataType(result_type).isFixedString())
throw Exception(
ErrorCodes::TYPE_MISMATCH, "Wrong result type {}. Expected FixedString", result_type->getName());
if (cast_ipv4_ipv6_default_on_conversion_error_value)
return convertToIPv6<IPStringToNumExceptionMode::Default>(arguments[0].column);
else
return convertToIPv6<IPStringToNumExceptionMode::Throw>(arguments[0].column);
};
return true;
}
ret = &ConvertImplGenericFromString<typename FromDataType::ColumnType>::execute;
return true;
}
@ -3496,6 +3656,8 @@ private:
return createTupleWrapper(from_type, checkAndGetDataType<DataTypeTuple>(to_type.get()));
case TypeIndex::Map:
return createMapWrapper(from_type, checkAndGetDataType<DataTypeMap>(to_type.get()));
case TypeIndex::Object:
return createObjectWrapper(from_type, checkAndGetDataType<DataTypeObject>(to_type.get()));
case TypeIndex::AggregateFunction:
return createAggregateFunctionWrapper(from_type, checkAndGetDataType<DataTypeAggregateFunction>(to_type.get()));
default:

View File

@ -35,9 +35,9 @@
#include <Functions/FunctionFactory.h>
#include <Functions/IFunction.h>
#include <Functions/DummyJSONParser.h>
#include <Functions/SimdJSONParser.h>
#include <Functions/RapidJSONParser.h>
#include <Common/JSONParsers/DummyJSONParser.h>
#include <Common/JSONParsers/SimdJSONParser.h>
#include <Common/JSONParsers/RapidJSONParser.h>
#include <Functions/FunctionHelpers.h>
#include <Interpreters/Context.h>

View File

@ -663,7 +663,7 @@ public:
throw Exception{"Elements of array of second argument of function " + getName()
+ " must be numeric type.", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT};
}
return getLeastSupertype({type_x, type_arr_nested});
return getLeastSupertype(DataTypes{type_x, type_arr_nested});
}
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t) const override

View File

@ -474,7 +474,7 @@ private:
auto arg_decayed = removeNullable(removeLowCardinality(arg));
return ((isNativeNumber(inner_type_decayed) || isEnum(inner_type_decayed)) && isNativeNumber(arg_decayed))
|| getLeastSupertype({inner_type_decayed, arg_decayed});
|| getLeastSupertype(DataTypes{inner_type_decayed, arg_decayed});
}
/**
@ -1045,7 +1045,7 @@ private:
DataTypePtr array_elements_type = assert_cast<const DataTypeArray &>(*arguments[0].type).getNestedType();
const DataTypePtr & index_type = arguments[1].type;
DataTypePtr common_type = getLeastSupertype({array_elements_type, index_type});
DataTypePtr common_type = getLeastSupertype(DataTypes{array_elements_type, index_type});
ColumnPtr col_nested = castColumn({ col->getDataPtr(), array_elements_type, "" }, common_type);

View File

@ -62,7 +62,7 @@ public:
if (number_of_arguments == 2)
return arguments[0];
else /* if (number_of_arguments == 3) */
return std::make_shared<DataTypeArray>(getLeastSupertype({array_type->getNestedType(), arguments[2]}));
return std::make_shared<DataTypeArray>(getLeastSupertype(DataTypes{array_type->getNestedType(), arguments[2]}));
}
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & return_type, size_t input_rows_count) const override

View File

@ -0,0 +1,118 @@
#include "config_functions.h"
#if USE_H3
#include <Columns/ColumnArray.h>
#include <Columns/ColumnsNumber.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypesNumber.h>
#include <Functions/FunctionFactory.h>
#include <Functions/IFunction.h>
#include <Common/typeid_cast.h>
#include <constants.h>
#include <h3api.h>
namespace DB
{
namespace ErrorCodes
{
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int ILLEGAL_COLUMN;
extern const int ARGUMENT_OUT_OF_BOUND;
}
namespace
{
class FunctionH3GetPentagonIndexes : public IFunction
{
public:
static constexpr auto name = "h3GetPentagonIndexes";
static FunctionPtr create(ContextPtr) { return std::make_shared<FunctionH3GetPentagonIndexes>(); }
std::string getName() const override { return name; }
size_t getNumberOfArguments() const override { return 1; }
bool useDefaultImplementationForConstants() const override { return true; }
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; }
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
{
const auto * arg = arguments[0].get();
if (!WhichDataType(arg).isUInt8())
throw Exception(
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Illegal type {} of argument {} of function {}. Must be UInt8",
arg->getName(), 1, getName());
return std::make_shared<DataTypeArray>(std::make_shared<DataTypeUInt64>());
}
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override
{
auto non_const_arguments = arguments;
for (auto & argument : non_const_arguments)
argument.column = argument.column->convertToFullColumnIfConst();
const auto * column = checkAndGetColumn<ColumnUInt8>(non_const_arguments[0].column.get());
if (!column)
throw Exception(
ErrorCodes::ILLEGAL_COLUMN,
"Illegal type {} of argument {} of function {}. Must be UInt8.",
arguments[0].type->getName(),
1,
getName());
const auto & data = column->getData();
auto result_column_data = ColumnUInt64::create();
auto & result_data = result_column_data->getData();
auto result_column_offsets = ColumnArray::ColumnOffsets::create();
auto & result_offsets = result_column_offsets->getData();
result_offsets.resize(input_rows_count);
auto current_offset = 0;
std::vector<H3Index> hindex_vec;
result_data.reserve(input_rows_count);
for (size_t row = 0; row < input_rows_count; ++row)
{
if (data[row] > MAX_H3_RES)
throw Exception(
ErrorCodes::ARGUMENT_OUT_OF_BOUND,
"The argument 'resolution' ({}) of function {} is out of bounds because the maximum resolution in H3 library is ",
toString(data[row]),
getName(),
MAX_H3_RES);
const auto vec_size = pentagonCount();
hindex_vec.resize(vec_size);
getPentagons(data[row], hindex_vec.data());
for (auto & i : hindex_vec)
{
++current_offset;
result_data.emplace_back(i);
}
result_offsets[row] = current_offset;
hindex_vec.clear();
}
return ColumnArray::create(std::move(result_column_data), std::move(result_column_offsets));
}
};
}
void registerFunctionH3GetPentagonIndexes(FunctionFactory & factory)
{
factory.registerFunction<FunctionH3GetPentagonIndexes>();
}
}
#endif

View File

@ -0,0 +1,71 @@
#include "config_functions.h"
#if USE_H3
#include <Columns/ColumnArray.h>
#include <Columns/ColumnsNumber.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypeTuple.h>
#include <Functions/FunctionFactory.h>
#include <Functions/IFunction.h>
#include <IO/WriteHelpers.h>
#include <Common/typeid_cast.h>
#include <base/range.h>
#include <h3api.h>
namespace DB
{
namespace
{
class FunctionH3GetRes0Indexes final : public IFunction
{
public:
static constexpr auto name = "h3GetRes0Indexes";
static FunctionPtr create(ContextPtr) { return std::make_shared<FunctionH3GetRes0Indexes>(); }
std::string getName() const override { return name; }
size_t getNumberOfArguments() const override { return 0; }
bool useDefaultImplementationForConstants() const override { return true; }
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; }
DataTypePtr getReturnTypeImpl(const DataTypes & /*arguments*/) const override
{
return std::make_shared<DataTypeArray>(std::make_shared<DataTypeUInt64>());
}
ColumnPtr executeImpl(const ColumnsWithTypeAndName &, const DataTypePtr & result_type, size_t input_rows_count) const override
{
if (input_rows_count == 0)
return result_type->createColumn();
std::vector<H3Index> res0_indexes;
const auto cell_count = res0CellCount();
res0_indexes.resize(cell_count);
getRes0Cells(res0_indexes.data());
auto res = ColumnArray::create(ColumnUInt64::create());
Array res_indexes;
res_indexes.insert(res_indexes.end(), res0_indexes.begin(), res0_indexes.end());
res->insert(res_indexes);
return res;
}
};
}
void registerFunctionH3GetRes0Indexes(FunctionFactory & factory)
{
factory.registerFunction<FunctionH3GetRes0Indexes>();
}
}
#endif

View File

@ -0,0 +1,153 @@
#include "config_functions.h"
#if USE_H3
#include <Columns/ColumnsNumber.h>
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypeTuple.h>
#include <Functions/FunctionFactory.h>
#include <Functions/IFunction.h>
#include <IO/WriteHelpers.h>
#include <Common/typeid_cast.h>
#include <base/range.h>
#include <constants.h>
#include <h3api.h>
namespace DB
{
namespace ErrorCodes
{
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int ILLEGAL_COLUMN;
}
namespace
{
template <class Impl>
class FunctionH3PointDist final : public IFunction
{
public:
static constexpr auto name = Impl::name;
static constexpr auto function = Impl::function;
static FunctionPtr create(ContextPtr) { return std::make_shared<FunctionH3PointDist>(); }
std::string getName() const override { return name; }
size_t getNumberOfArguments() const override { return 4; }
bool useDefaultImplementationForConstants() const override { return true; }
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; }
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
{
for (size_t i = 0; i < getNumberOfArguments(); ++i)
{
const auto * arg = arguments[i].get();
if (!WhichDataType(arg).isFloat64())
throw Exception(
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Illegal type {} of argument {} of function {}. Must be Float64",
arg->getName(), i, getName());
}
return std::make_shared<DataTypeFloat64>();
}
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override
{
auto non_const_arguments = arguments;
for (auto & argument : non_const_arguments)
argument.column = argument.column->convertToFullColumnIfConst();
const auto * col_lat1 = checkAndGetColumn<ColumnFloat64>(non_const_arguments[0].column.get());
if (!col_lat1)
throw Exception(
ErrorCodes::ILLEGAL_COLUMN,
"Illegal type {} of argument {} of function {}. Must be Float64",
arguments[0].type->getName(),
1,
getName());
const auto & data_lat1 = col_lat1->getData();
const auto * col_lon1 = checkAndGetColumn<ColumnFloat64>(non_const_arguments[1].column.get());
if (!col_lon1)
throw Exception(
ErrorCodes::ILLEGAL_COLUMN,
"Illegal type {} of argument {} of function {}. Must be Float64",
arguments[1].type->getName(),
2,
getName());
const auto & data_lon1 = col_lon1->getData();
const auto * col_lat2 = checkAndGetColumn<ColumnFloat64>(non_const_arguments[2].column.get());
if (!col_lat2)
throw Exception(
ErrorCodes::ILLEGAL_COLUMN,
"Illegal type {} of argument {} of function {}. Must be Float64",
arguments[2].type->getName(),
3,
getName());
const auto & data_lat2 = col_lat2->getData();
const auto * col_lon2 = checkAndGetColumn<ColumnFloat64>(non_const_arguments[3].column.get());
if (!col_lon2)
throw Exception(
ErrorCodes::ILLEGAL_COLUMN,
"Illegal type {} of argument {} of function {}. Must be Float64",
arguments[3].type->getName(),
4,
getName());
const auto & data_lon2 = col_lon2->getData();
auto dst = ColumnVector<Float64>::create();
auto & dst_data = dst->getData();
dst_data.resize(input_rows_count);
for (size_t row = 0; row < input_rows_count; ++row)
{
const double lat1 = data_lat1[row];
const double lon1 = data_lon1[row];
const auto lat2 = data_lat2[row];
const auto lon2 = data_lon2[row];
LatLng point1 = {degsToRads(lat1), degsToRads(lon1)};
LatLng point2 = {degsToRads(lat2), degsToRads(lon2)};
// function will be equivalent to distanceM or distanceKm or distanceRads
Float64 res = function(&point1, &point2);
dst_data[row] = res;
}
return dst;
}
};
}
struct H3PointDistM
{
static constexpr auto name = "h3PointDistM";
static constexpr auto function = distanceM;
};
struct H3PointDistKm
{
static constexpr auto name = "h3PointDistKm";
static constexpr auto function = distanceKm;
};
struct H3PointDistRads
{
static constexpr auto name = "h3PointDistRads";
static constexpr auto function = distanceRads;
};
void registerFunctionH3PointDistM(FunctionFactory & factory) { factory.registerFunction<FunctionH3PointDist<H3PointDistM>>(); }
void registerFunctionH3PointDistKm(FunctionFactory & factory) { factory.registerFunction<FunctionH3PointDist<H3PointDistKm>>(); }
void registerFunctionH3PointDistRads(FunctionFactory & factory) { factory.registerFunction<FunctionH3PointDist<H3PointDistRads>>(); }
}
#endif

View File

@ -632,7 +632,7 @@ private:
const ColumnWithTypeAndName & arg1 = arguments[1];
const ColumnWithTypeAndName & arg2 = arguments[2];
DataTypePtr common_type = getLeastSupertype({arg1.type, arg2.type});
DataTypePtr common_type = getLeastSupertype(DataTypes{arg1.type, arg2.type});
ColumnPtr col_then = castColumn(arg1, common_type);
ColumnPtr col_else = castColumn(arg2, common_type);
@ -1022,7 +1022,7 @@ public:
throw Exception("Illegal type " + arguments[0]->getName() + " of first argument (condition) of function if. Must be UInt8.",
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
return getLeastSupertype({arguments[1], arguments[2]});
return getLeastSupertype(DataTypes{arguments[1], arguments[2]});
}
ColumnPtr executeImpl(const ColumnsWithTypeAndName & args, const DataTypePtr & result_type, size_t input_rows_count) const override

Some files were not shown because too many files have changed in this diff Show More