Merge branch 'master' into parallel-multipart-upload-for-s3storage

mergify[bot] 2022-03-20 18:25:28 +00:00 committed by GitHub
commit 7ac606fa65
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
514 changed files with 12853 additions and 3456 deletions

.gitattributes

@ -1,2 +1,3 @@
contrib/* linguist-vendored
*.h linguist-language=C++
tests/queries/0_stateless/data_json/* binary


@ -32,7 +32,7 @@ jobs:
uses: svenstaro/upload-release-action@v2
with:
repo_token: ${{ secrets.GITHUB_TOKEN }}
file: ${{runner.temp}}/release_packages/*
file: ${{runner.temp}}/push_to_artifactory/*
overwrite: true
tag: ${{ github.ref }}
file_glob: true


@ -1,4 +1,138 @@
### ClickHouse release v22.2, 2022-02-17
### Table of Contents
**[ClickHouse release v22.3-lts, 2022-03-17](#223)**<br>
**[ClickHouse release v22.2, 2022-02-17](#222)**<br>
**[ClickHouse release v22.1, 2022-01-18](#221)**<br>
**[Changelog for 2021](https://github.com/ClickHouse/ClickHouse/blob/master/docs/en/whats-new/changelog/2021.md)**<br>
## <a id="223"></a> ClickHouse release v22.3-lts, 2022-03-17
#### Backward Incompatible Change
* Make the `arrayCompact` function behave like other higher-order functions: perform compaction not on the lambda function's results but on the original array. If you use nontrivial lambda functions in `arrayCompact`, you can restore the old behavior by wrapping the `arrayCompact` arguments in `arrayMap` (see the migration sketch after this list). Closes [#34010](https://github.com/ClickHouse/ClickHouse/issues/34010) [#18535](https://github.com/ClickHouse/ClickHouse/issues/18535) [#14778](https://github.com/ClickHouse/ClickHouse/issues/14778). [#34795](https://github.com/ClickHouse/ClickHouse/pull/34795) ([Alexandre Snarskii](https://github.com/snar)).
* Change the implementation-specific behavior on overflow of the function `toDateTime`: the result is now saturated to the nearest min/max supported datetime instant instead of wrapping around. This change is highlighted as "backward incompatible" because someone may unintentionally rely on the old behavior. [#32898](https://github.com/ClickHouse/ClickHouse/pull/32898) ([HaiBo Li](https://github.com/marising)).
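A minimal migration sketch, assuming an illustrative lambda (integer division); the exact lambda depends on your query:
``` sql
-- Restore the old behavior (compaction of lambda results) by applying
-- the lambda with arrayMap first:
SELECT arrayCompact(arrayMap(x -> intDiv(x, 10), [11, 12, 23])) AS old_style; -- [1, 2]
-- New behavior: the lambda only drives the comparison; elements of the
-- original array are kept:
SELECT arrayCompact(x -> intDiv(x, 10), [11, 12, 23]) AS new_style; -- [11, 23]
```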
#### New Feature
* Support for caching data locally for remote filesystems. It can be enabled for `s3` disks. Closes [#28961](https://github.com/ClickHouse/ClickHouse/issues/28961). [#33717](https://github.com/ClickHouse/ClickHouse/pull/33717) ([Kseniia Sumarokova](https://github.com/kssenii)). In the meantime, we enabled the test suite on the s3 filesystem and no known issues remain, so the feature is approaching production readiness.
* Add new table function `hive`. It can be used as follows: `hive('<hive metastore url>', '<hive database>', '<hive table name>', '<columns definition>', '<partition columns>')`, for example `SELECT * FROM hive('thrift://hivetest:9083', 'test', 'demo', 'id Nullable(String), score Nullable(Int32), day Nullable(String)', 'day')`. [#34946](https://github.com/ClickHouse/ClickHouse/pull/34946) ([lgbo](https://github.com/lgbo-ustc)).
* Support authentication of users connected via SSL by their X.509 certificate. [#31484](https://github.com/ClickHouse/ClickHouse/pull/31484) ([eungenue](https://github.com/eungenue)).
* Support schema inference for inserting into table functions `file`/`hdfs`/`s3`/`url`. [#34732](https://github.com/ClickHouse/ClickHouse/pull/34732) ([Kruglov Pavel](https://github.com/Avogar)).
* Now you can read the `system.zookeeper` table without restrictions on the path, or using a `LIKE` expression. Such reads can generate quite a heavy load on ZooKeeper, so to enable this ability you have to enable the setting `allow_unrestricted_reads_from_keeper` (see the sketch after this list). [#34609](https://github.com/ClickHouse/ClickHouse/pull/34609) ([Sergei Trifonov](https://github.com/serxa)).
* Display CPU and memory metrics in clickhouse-local. Close [#34545](https://github.com/ClickHouse/ClickHouse/issues/34545). [#34605](https://github.com/ClickHouse/ClickHouse/pull/34605) ([李扬](https://github.com/taiyang-li)).
* Implement the `startsWith` and `endsWith` functions for arrays, closes [#33982](https://github.com/ClickHouse/ClickHouse/issues/33982). [#34368](https://github.com/ClickHouse/ClickHouse/pull/34368) ([usurai](https://github.com/usurai)).
* Add three functions for the Map data type: 1. `mapReplace(map1, map2)` — replaces values for keys in map1 with the values of the corresponding keys in map2, and adds keys from map2 that don't exist in map1. 2. `mapFilter`. 3. `mapMap`. `mapFilter` and `mapMap` are higher-order functions that accept two arguments: the first is a lambda function with a (k, v) pair as arguments, the second is a column of type Map (see the sketch after this list). [#33698](https://github.com/ClickHouse/ClickHouse/pull/33698) ([hexiaoting](https://github.com/hexiaoting)).
* Allow getting the default user and password for clickhouse-client from the `CLICKHOUSE_USER` and `CLICKHOUSE_PASSWORD` environment variables. Close [#34538](https://github.com/ClickHouse/ClickHouse/issues/34538). [#34947](https://github.com/ClickHouse/ClickHouse/pull/34947) ([DR](https://github.com/freedomDR)).
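A sketch of the unrestricted `system.zookeeper` read described above; the query shape is illustrative, the setting name is from the entry:
``` sql
-- Scanning the whole tree is heavy for ZooKeeper, hence the opt-in setting:
SELECT name, path
FROM system.zookeeper
WHERE path LIKE '/clickhouse%'
SETTINGS allow_unrestricted_reads_from_keeper = 1;
```
And a sketch of the new Map functions, following the signatures described in the entry above (the literal values are illustrative):
``` sql
SELECT
    mapFilter((k, v) -> v > 1, map('a', 1, 'b', 2)) AS filtered, -- {'b': 2}
    mapReplace(map('a', 1), map('a', 10, 'c', 3))   AS replaced; -- {'a': 10, 'c': 3}
```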
#### Experimental Feature
* New data type `Object(<schema_format>)`, which supports storing semi-structured data (for now, JSON only). Data is written to such a type as a string; then all paths are extracted according to the format of the semi-structured data and written as separate columns in the most optimal types that can store all their values. Those columns can be queried by names that match paths in the source data, e.g. `data.key1.key2`, or with the cast operator: `data.key1.key2::Int64` (see the sketch after this list).
* Add the `database_replicated_allow_only_replicated_engine` setting. When enabled, only `Replicated` tables or tables with stateless engines can be created in `Replicated` databases. [#35214](https://github.com/ClickHouse/ClickHouse/pull/35214) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). Note that the `Replicated` database is still an experimental feature.
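A minimal sketch of the `Object` data type described above, assuming the experimental opt-in setting is named `allow_experimental_object_type` (table and values are illustrative):
``` sql
SET allow_experimental_object_type = 1;

CREATE TABLE json_demo (data Object('json')) ENGINE = Memory;

-- Data is written as a string; paths become typed subcolumns.
INSERT INTO json_demo VALUES ('{"key1": {"key2": 42}}');

SELECT data.key1.key2, data.key1.key2::Int64 FROM json_demo;
```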
#### Performance Improvement
* Improve performance of insertion into `MergeTree` tables by optimizing sorting. Up to 2x improvement is observed on realistic benchmarks. [#34750](https://github.com/ClickHouse/ClickHouse/pull/34750) ([Maksim Kita](https://github.com/kitaisreal)).
* Columns pruning when reading Parquet, ORC and Arrow files from URL and S3. Closes [#34163](https://github.com/ClickHouse/ClickHouse/issues/34163). [#34849](https://github.com/ClickHouse/ClickHouse/pull/34849) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Columns pruning when reading Parquet, ORC and Arrow files from Hive. [#34954](https://github.com/ClickHouse/ClickHouse/pull/34954) ([lgbo](https://github.com/lgbo-ustc)).
* A bunch of performance optimizations from a performance superhero. Improve performance of processing queries with a large `IN` section. Improve performance of the `direct` dictionary if its source is `ClickHouse`. Improve performance of the `detectCharset` and `detectLanguageUnknown` functions. [#34888](https://github.com/ClickHouse/ClickHouse/pull/34888) ([Maksim Kita](https://github.com/kitaisreal)).
* Improve performance of `any` aggregate function by using more batching. [#34760](https://github.com/ClickHouse/ClickHouse/pull/34760) ([Raúl Marín](https://github.com/Algunenano)).
* Multiple performance improvements for `clickhouse-keeper`: less locking [#35010](https://github.com/ClickHouse/ClickHouse/pull/35010) ([zhanglistar](https://github.com/zhanglistar)), lower memory usage via streaming reads and writes of snapshots instead of a full copy [#34584](https://github.com/ClickHouse/ClickHouse/pull/34584) ([zhanglistar](https://github.com/zhanglistar)), optimized compaction of the log store in the RAFT implementation [#34534](https://github.com/ClickHouse/ClickHouse/pull/34534) ([zhanglistar](https://github.com/zhanglistar)), and versioning of the internal data structure [#34486](https://github.com/ClickHouse/ClickHouse/pull/34486) ([zhanglistar](https://github.com/zhanglistar)).
#### Improvement
* Allow asynchronous inserts to table functions. Fixes [#34864](https://github.com/ClickHouse/ClickHouse/issues/34864). [#34866](https://github.com/ClickHouse/ClickHouse/pull/34866) ([Anton Popov](https://github.com/CurtizJ)).
* Implicit type casting of the key argument for functions `dictGetHierarchy`, `dictIsIn`, `dictGetChildren`, `dictGetDescendants`. Closes [#34970](https://github.com/ClickHouse/ClickHouse/issues/34970). [#35027](https://github.com/ClickHouse/ClickHouse/pull/35027) ([Maksim Kita](https://github.com/kitaisreal)).
* `EXPLAIN AST` query can output AST in form of a graph in Graphviz format: `EXPLAIN AST graph = 1 SELECT * FROM system.parts`. [#35173](https://github.com/ClickHouse/ClickHouse/pull/35173) ([李扬](https://github.com/taiyang-li)).
* When large files were written with `s3` table function or table engine, the content type on the files was mistakenly set to `application/xml` due to a bug in the AWS SDK. This closes [#33964](https://github.com/ClickHouse/ClickHouse/issues/33964). [#34433](https://github.com/ClickHouse/ClickHouse/pull/34433) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Change restrictive row policies a bit to make them an easier alternative to permissive policies in simple cases. If only restrictive policies exist for a particular table (without permissive policies), users will be able to see some rows. Also, `SHOW CREATE ROW POLICY` will always show `AS permissive` or `AS restrictive` in the row policy's definition. [#34596](https://github.com/ClickHouse/ClickHouse/pull/34596) ([Vitaly Baranov](https://github.com/vitlibar)).
* Improve schema inference with globs in File/S3/HDFS/URL engines. Try to use the next path for schema inference in case of error. [#34465](https://github.com/ClickHouse/ClickHouse/pull/34465) ([Kruglov Pavel](https://github.com/Avogar)).
* Play UI now correctly detects the preferred light/dark theme from the OS. [#35068](https://github.com/ClickHouse/ClickHouse/pull/35068) ([peledni](https://github.com/peledni)).
* Added `date_time_input_format = 'best_effort_us'`. Closes [#34799](https://github.com/ClickHouse/ClickHouse/issues/34799). [#34982](https://github.com/ClickHouse/ClickHouse/pull/34982) ([WenYao](https://github.com/Cai-Yao)).
* New settings `allow_plaintext_password` and `allow_no_password` were added to the server configuration; they turn on/off authentication types that can be potentially insecure in some environments. Both are allowed by default. [#34738](https://github.com/ClickHouse/ClickHouse/pull/34738) ([Heena Bansal](https://github.com/HeenaBansal2009)).
* Support for `DateTime64` data type in `Arrow` format, closes [#8280](https://github.com/ClickHouse/ClickHouse/issues/8280) and closes [#28574](https://github.com/ClickHouse/ClickHouse/issues/28574). [#34561](https://github.com/ClickHouse/ClickHouse/pull/34561) ([李扬](https://github.com/taiyang-li)).
* Reload `remote_url_allow_hosts` (filtering of outgoing connections) on config update. [#35294](https://github.com/ClickHouse/ClickHouse/pull/35294) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Support `--testmode` parameter for `clickhouse-local`. This parameter enables interpretation of test hints that we use in functional tests. [#35264](https://github.com/ClickHouse/ClickHouse/pull/35264) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Add `distributed_depth` to the query log. It is like a more detailed variant of `is_initial_query`. [#35207](https://github.com/ClickHouse/ClickHouse/pull/35207) ([李扬](https://github.com/taiyang-li)).
* Respect `remote_url_allow_hosts` for `MySQL` and `PostgreSQL` table functions. [#35191](https://github.com/ClickHouse/ClickHouse/pull/35191) ([Heena Bansal](https://github.com/HeenaBansal2009)).
* Added `disk_name` field to `system.part_log`. [#35178](https://github.com/ClickHouse/ClickHouse/pull/35178) ([Artyom Yurkov](https://github.com/Varinara)).
* Do not retry non-retriable errors when querying remote URLs. Closes [#35161](https://github.com/ClickHouse/ClickHouse/issues/35161). [#35172](https://github.com/ClickHouse/ClickHouse/pull/35172) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Support distributed INSERT SELECT queries (the setting `parallel_distributed_insert_select`) with the table function `view()`. [#35132](https://github.com/ClickHouse/ClickHouse/pull/35132) ([Azat Khuzhin](https://github.com/azat)).
* More precise memory tracking during `INSERT` into `Buffer` with `AggregateFunction`. [#35072](https://github.com/ClickHouse/ClickHouse/pull/35072) ([Azat Khuzhin](https://github.com/azat)).
* Avoid division by zero in Query Profiler if Linux kernel has a bug. Closes [#34787](https://github.com/ClickHouse/ClickHouse/issues/34787). [#35032](https://github.com/ClickHouse/ClickHouse/pull/35032) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add more sanity checks for keeper configuration: now mixing of localhost and non-local servers is not allowed, also add checks for same value of internal raft port and keeper client port. [#35004](https://github.com/ClickHouse/ClickHouse/pull/35004) ([alesapin](https://github.com/alesapin)).
* Previously, if the user changed the settings of the system tables, there would be tons of logs and ClickHouse would rename the tables every minute. This fixes [#34929](https://github.com/ClickHouse/ClickHouse/issues/34929). [#34949](https://github.com/ClickHouse/ClickHouse/pull/34949) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Use connection pool for Hive metastore client. [#34940](https://github.com/ClickHouse/ClickHouse/pull/34940) ([lgbo](https://github.com/lgbo-ustc)).
* Ignore per-column `TTL` in `CREATE TABLE AS` if new table engine does not support it (i.e. if the engine is not of `MergeTree` family). [#34938](https://github.com/ClickHouse/ClickHouse/pull/34938) ([Azat Khuzhin](https://github.com/azat)).
* Allow `LowCardinality` strings for `ngrambf_v1`/`tokenbf_v1` indexes. Closes [#21865](https://github.com/ClickHouse/ClickHouse/issues/21865). [#34911](https://github.com/ClickHouse/ClickHouse/pull/34911) ([Lars Hiller Eidnes](https://github.com/larspars)).
* Allow opening empty sqlite db if the file doesn't exist. Closes [#33367](https://github.com/ClickHouse/ClickHouse/issues/33367). [#34907](https://github.com/ClickHouse/ClickHouse/pull/34907) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Implement memory statistics for FreeBSD - this is required for `max_server_memory_usage` to work correctly. [#34902](https://github.com/ClickHouse/ClickHouse/pull/34902) ([Alexandre Snarskii](https://github.com/snar)).
* In previous versions, the progress bar in clickhouse-client could jump forward to near 50% for no reason. This closes [#34324](https://github.com/ClickHouse/ClickHouse/issues/34324). [#34801](https://github.com/ClickHouse/ClickHouse/pull/34801) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Now `ALTER TABLE DROP COLUMN columnX` queries for `MergeTree` table engines will work instantly when `columnX` is an `ALIAS` column. Fixes [#34660](https://github.com/ClickHouse/ClickHouse/issues/34660). [#34786](https://github.com/ClickHouse/ClickHouse/pull/34786) ([alesapin](https://github.com/alesapin)).
* Show hints when user mistyped the name of a data skipping index. Closes [#29698](https://github.com/ClickHouse/ClickHouse/issues/29698). [#34764](https://github.com/ClickHouse/ClickHouse/pull/34764) ([flynn](https://github.com/ucasfl)).
* Support `remote()`/`cluster()` table functions for `parallel_distributed_insert_select`. [#34728](https://github.com/ClickHouse/ClickHouse/pull/34728) ([Azat Khuzhin](https://github.com/azat)).
* Do not reset logging that is configured via the `--log-file`/`--errorlog-file` command line options when the config file contains an empty logging configuration. [#34718](https://github.com/ClickHouse/ClickHouse/pull/34718) ([Amos Bird](https://github.com/amosbird)).
* Extract schema only once on table creation and prevent reading from local files/external sources to extract schema on each server startup. [#34684](https://github.com/ClickHouse/ClickHouse/pull/34684) ([Kruglov Pavel](https://github.com/Avogar)).
* Allow specifying argument names for executable UDFs. This is necessary for formats where argument name is part of serialization, like `Native`, `JSONEachRow`. Closes [#34604](https://github.com/ClickHouse/ClickHouse/issues/34604). [#34653](https://github.com/ClickHouse/ClickHouse/pull/34653) ([Maksim Kita](https://github.com/kitaisreal)).
* `MaterializedMySQL` (experimental feature) now supports `materialized_mysql_tables_list`, a comma-separated list of MySQL database tables to be replicated by the MaterializedMySQL database engine (default value: empty list, meaning all tables will be replicated), mentioned at [#32977](https://github.com/ClickHouse/ClickHouse/issues/32977). [#34487](https://github.com/ClickHouse/ClickHouse/pull/34487) ([zzsmdfj](https://github.com/zzsmdfj)).
* Improve OpenTelemetry span logs for INSERT operation on distributed table. [#34480](https://github.com/ClickHouse/ClickHouse/pull/34480) ([Frank Chen](https://github.com/FrankChen021)).
* Make the znode `ctime` and `mtime` consistent between servers in ClickHouse Keeper. [#33441](https://github.com/ClickHouse/ClickHouse/pull/33441) ([小路](https://github.com/nicelulu)).
#### Build/Testing/Packaging Improvement
* Package repository is migrated to JFrog Artifactory (**Mikhail f. Shiryaev**).
* Randomize some settings in functional tests, so more possible combinations of settings will be tested. This is yet another fuzzing method to ensure better test coverage. This closes [#32268](https://github.com/ClickHouse/ClickHouse/issues/32268). [#34092](https://github.com/ClickHouse/ClickHouse/pull/34092) ([Kruglov Pavel](https://github.com/Avogar)).
* Drop PVS-Studio from our CI. [#34680](https://github.com/ClickHouse/ClickHouse/pull/34680) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Add an ability to build stripped binaries with CMake. In previous versions it was performed by dh-tools. [#35196](https://github.com/ClickHouse/ClickHouse/pull/35196) ([alesapin](https://github.com/alesapin)).
* Smaller "fat-free" `clickhouse-keeper` build. [#35031](https://github.com/ClickHouse/ClickHouse/pull/35031) ([alesapin](https://github.com/alesapin)).
* Use @robot-clickhouse as an author and committer for PRs like https://github.com/ClickHouse/ClickHouse/pull/34685. [#34793](https://github.com/ClickHouse/ClickHouse/pull/34793) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Limit the DWARF version for debug info to a maximum of 4, because our internal stack symbolizer cannot parse DWARF version 5. This makes sense if you compile ClickHouse with clang-15. [#34777](https://github.com/ClickHouse/ClickHouse/pull/34777) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove the `clickhouse-test` debian package as an unneeded complication. CI uses tests from the repository, and standalone testing via the deb package is no longer supported. [#34606](https://github.com/ClickHouse/ClickHouse/pull/34606) ([Ilya Yatsishin](https://github.com/qoega)).
#### Bug Fix (user-visible misbehaviour in official stable or prestable release)
* A fix for the HDFS integration: when the inner buffer size was too small, `NEED_MORE_INPUT` in `HadoopSnappyDecoder` would run multiple times (>= 3) for one compressed block, causing the input data to be copied into the wrong place in `HadoopSnappyDecoder::buffer`. [#35116](https://github.com/ClickHouse/ClickHouse/pull/35116) ([lgbo](https://github.com/lgbo-ustc)).
* Ignore obsolete grants in ATTACH GRANT statements. This PR fixes [#34815](https://github.com/ClickHouse/ClickHouse/issues/34815). [#34855](https://github.com/ClickHouse/ClickHouse/pull/34855) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix segfault in Postgres database when getting create table query if database was created using named collections. Closes [#35312](https://github.com/ClickHouse/ClickHouse/issues/35312). [#35313](https://github.com/ClickHouse/ClickHouse/pull/35313) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix partial merge join duplicate rows bug, close [#31009](https://github.com/ClickHouse/ClickHouse/issues/31009). [#35311](https://github.com/ClickHouse/ClickHouse/pull/35311) ([Vladimir C](https://github.com/vdimir)).
* Fix possible `Assertion 'position() != working_buffer.end()' failed` while using bzip2 [#35300](https://github.com/ClickHouse/ClickHouse/pull/35300), lz4 [#35296](https://github.com/ClickHouse/ClickHouse/pull/35296), lzma [#35295](https://github.com/ClickHouse/ClickHouse/pull/35295), or `brotli` [#35281](https://github.com/ClickHouse/ClickHouse/pull/35281) compression with a small `max_read_buffer_size` setting value. The bug was found in https://github.com/ClickHouse/ClickHouse/pull/35047. ([Kruglov Pavel](https://github.com/Avogar)).
* Fix possible segfault in `JSONEachRow` schema inference. [#35291](https://github.com/ClickHouse/ClickHouse/pull/35291) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix `CHECK TABLE` query in case when sparse columns are enabled in table. [#35274](https://github.com/ClickHouse/ClickHouse/pull/35274) ([Anton Popov](https://github.com/CurtizJ)).
* Avoid std::terminate in case of exception in reading from remote VFS. [#35257](https://github.com/ClickHouse/ClickHouse/pull/35257) ([Azat Khuzhin](https://github.com/azat)).
* Fix reading port from config, close [#34776](https://github.com/ClickHouse/ClickHouse/issues/34776). [#35193](https://github.com/ClickHouse/ClickHouse/pull/35193) ([Vladimir C](https://github.com/vdimir)).
* Fix error in query with `WITH TOTALS` in case if `HAVING` returned empty result. This fixes [#33711](https://github.com/ClickHouse/ClickHouse/issues/33711). [#35186](https://github.com/ClickHouse/ClickHouse/pull/35186) ([Amos Bird](https://github.com/amosbird)).
* Fix a corner case of `replaceRegexpAll`, close [#35117](https://github.com/ClickHouse/ClickHouse/issues/35117). [#35182](https://github.com/ClickHouse/ClickHouse/pull/35182) ([Vladimir C](https://github.com/vdimir)).
* Schema inference didn't work properly in the case of `INSERT INTO FUNCTION s3(...) FROM ...`: it tried to read the schema from the s3 file instead of from the select query. [#35176](https://github.com/ClickHouse/ClickHouse/pull/35176) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix MaterializedPostgreSQL (experimental feature) `table overrides` for partition by, etc. Closes [#35048](https://github.com/ClickHouse/ClickHouse/issues/35048). [#35162](https://github.com/ClickHouse/ClickHouse/pull/35162) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix MaterializedPostgreSQL (experimental feature) adding new table to replication (ATTACH TABLE) after manually removing (DETACH TABLE). Closes [#33800](https://github.com/ClickHouse/ClickHouse/issues/33800). Closes [#34922](https://github.com/ClickHouse/ClickHouse/issues/34922). Closes [#34315](https://github.com/ClickHouse/ClickHouse/issues/34315). [#35158](https://github.com/ClickHouse/ClickHouse/pull/35158) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix partition pruning error when non-monotonic function is used with IN operator. This fixes [#35136](https://github.com/ClickHouse/ClickHouse/issues/35136). [#35146](https://github.com/ClickHouse/ClickHouse/pull/35146) ([Amos Bird](https://github.com/amosbird)).
* Fixed slightly incorrect translation of YAML configs to XML. [#35135](https://github.com/ClickHouse/ClickHouse/pull/35135) ([Miel Donkers](https://github.com/mdonkers)).
* Fix `optimize_skip_unused_shards_rewrite_in` for signed columns and negative values. [#35134](https://github.com/ClickHouse/ClickHouse/pull/35134) ([Azat Khuzhin](https://github.com/azat)).
* The `update_lag` external dictionary configuration option was unusable showing the error message ``Unexpected key `update_lag` in dictionary source configuration``. [#35089](https://github.com/ClickHouse/ClickHouse/pull/35089) ([Jason Chu](https://github.com/1lann)).
* Avoid possible deadlock on server shutdown. [#35081](https://github.com/ClickHouse/ClickHouse/pull/35081) ([Azat Khuzhin](https://github.com/azat)).
* Fix missing alias after function is optimized to a subcolumn when setting `optimize_functions_to_subcolumns` is enabled. Closes [#33798](https://github.com/ClickHouse/ClickHouse/issues/33798). [#35079](https://github.com/ClickHouse/ClickHouse/pull/35079) ([qieqieplus](https://github.com/qieqieplus)).
* Fix reading from `system.asynchronous_inserts` table if there exists asynchronous insert into table function. [#35050](https://github.com/ClickHouse/ClickHouse/pull/35050) ([Anton Popov](https://github.com/CurtizJ)).
* Fix possible exception `Reading for MergeTree family tables must be done with last position boundary` (relevant to operation on remote VFS). Closes [#34979](https://github.com/ClickHouse/ClickHouse/issues/34979). [#35001](https://github.com/ClickHouse/ClickHouse/pull/35001) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix unexpected result when using a `-State` type aggregate function in a window frame. [#34999](https://github.com/ClickHouse/ClickHouse/pull/34999) ([metahys](https://github.com/metahys)).
* Fix possible segfault in FileLog (experimental feature). Closes [#30749](https://github.com/ClickHouse/ClickHouse/issues/30749). [#34996](https://github.com/ClickHouse/ClickHouse/pull/34996) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix possible rare error `Cannot push block to port which already has data`. [#34993](https://github.com/ClickHouse/ClickHouse/pull/34993) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix wrong schema inference for unquoted dates in CSV. Closes [#34768](https://github.com/ClickHouse/ClickHouse/issues/34768). [#34961](https://github.com/ClickHouse/ClickHouse/pull/34961) ([Kruglov Pavel](https://github.com/Avogar)).
* Integration with Hive: fix unexpected result when using `IN` in `WHERE` in a Hive query. [#34945](https://github.com/ClickHouse/ClickHouse/pull/34945) ([lgbo](https://github.com/lgbo-ustc)).
* Avoid busy polling in ClickHouse Keeper while searching for changelog files to delete. [#34931](https://github.com/ClickHouse/ClickHouse/pull/34931) ([Azat Khuzhin](https://github.com/azat)).
* Fix DateTime64 conversion from PostgreSQL. Closes [#33364](https://github.com/ClickHouse/ClickHouse/issues/33364). [#34910](https://github.com/ClickHouse/ClickHouse/pull/34910) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix possible "Part directory doesn't exist" during `INSERT` into MergeTree table backed by VFS over s3. [#34876](https://github.com/ClickHouse/ClickHouse/pull/34876) ([Azat Khuzhin](https://github.com/azat)).
* Support DDL statements like CREATE USER executed on a cross-replicated cluster. [#34860](https://github.com/ClickHouse/ClickHouse/pull/34860) ([Jianmei Zhang](https://github.com/zhangjmruc)).
* Fix bugs for multiple columns group by in `WindowView` (experimental feature). [#34859](https://github.com/ClickHouse/ClickHouse/pull/34859) ([vxider](https://github.com/Vxider)).
* Fix possible failures in S2 functions when queries contain const columns. [#34745](https://github.com/ClickHouse/ClickHouse/pull/34745) ([Bharat Nallan](https://github.com/bharatnc)).
* Fix bug for H3 funcs containing const columns which cause queries to fail. [#34743](https://github.com/ClickHouse/ClickHouse/pull/34743) ([Bharat Nallan](https://github.com/bharatnc)).
* Fix `No such file or directory` with enabled `fsync_part_directory` and vertical merge. [#34739](https://github.com/ClickHouse/ClickHouse/pull/34739) ([Azat Khuzhin](https://github.com/azat)).
* Fix serialization/printing for system queries `RELOAD MODEL`, `RELOAD FUNCTION`, `RESTART DISK` when used `ON CLUSTER`. Closes [#34514](https://github.com/ClickHouse/ClickHouse/issues/34514). [#34696](https://github.com/ClickHouse/ClickHouse/pull/34696) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix `allow_experimental_projection_optimization` with `enable_global_with_statement` (before, it could lead to a `Stack size too large` error in case of multiple expressions in the `WITH` clause, and it also executed scalar subqueries again and again, so now it will be more optimal). [#34650](https://github.com/ClickHouse/ClickHouse/pull/34650) ([Azat Khuzhin](https://github.com/azat)).
* Stop selecting a part for mutation when the other replica has already updated the transaction log for the `ReplicatedMergeTree` engine. [#34633](https://github.com/ClickHouse/ClickHouse/pull/34633) ([Jianmei Zhang](https://github.com/zhangjmruc)).
* Fix incorrect result of trivial count query when part movement feature is used [#34089](https://github.com/ClickHouse/ClickHouse/issues/34089). [#34385](https://github.com/ClickHouse/ClickHouse/pull/34385) ([nvartolomei](https://github.com/nvartolomei)).
* Fix inconsistency of `max_query_size` limitation in distributed subqueries. [#34078](https://github.com/ClickHouse/ClickHouse/pull/34078) ([Chao Ma](https://github.com/godliness)).
### <a id="222"></a> ClickHouse release v22.2, 2022-02-17
#### Upgrade Notes
@ -174,7 +308,7 @@
* This PR allows using multiple LDAP storages in the same list of user directories. It worked earlier but was broken because LDAP tests are disabled (they are part of the testflows tests). [#33574](https://github.com/ClickHouse/ClickHouse/pull/33574) ([Vitaly Baranov](https://github.com/vitlibar)).
### ClickHouse release v22.1, 2022-01-18
### <a id="221"></a> ClickHouse release v22.1, 2022-01-18
#### Upgrade Notes


@ -15,7 +15,7 @@ The following versions of ClickHouse server are currently being supported with s
| 20.x | :x: |
| 21.1 | :x: |
| 21.2 | :x: |
| 21.3 | ✅ |
| 21.3 | :x: |
| 21.4 | :x: |
| 21.5 | :x: |
| 21.6 | :x: |
@ -23,9 +23,11 @@ The following versions of ClickHouse server are currently being supported with s
| 21.8 | ✅ |
| 21.9 | :x: |
| 21.10 | :x: |
| 21.11 | ✅ |
| 21.12 | ✅ |
| 21.11 | :x: |
| 21.12 | :x: |
| 22.1 | ✅ |
| 22.2 | ✅ |
| 22.3 | ✅ |
## Reporting a Vulnerability


@ -2,11 +2,11 @@
# NOTE: has nothing common with DBMS_TCP_PROTOCOL_VERSION,
# only DBMS_TCP_PROTOCOL_VERSION should be incremented on protocol changes.
SET(VERSION_REVISION 54460)
SET(VERSION_REVISION 54461)
SET(VERSION_MAJOR 22)
SET(VERSION_MINOR 3)
SET(VERSION_MINOR 4)
SET(VERSION_PATCH 1)
SET(VERSION_GITHASH 75366fc95e510b7ac76759ef670702ae5f488a51)
SET(VERSION_DESCRIBE v22.3.1.1-testing)
SET(VERSION_STRING 22.3.1.1)
SET(VERSION_GITHASH 92ab33f560e638d1989c5ca543021ab53d110f5c)
SET(VERSION_DESCRIBE v22.4.1.1-testing)
SET(VERSION_STRING 22.4.1.1)
# end of autochange


@ -51,6 +51,7 @@ The supported formats are:
| [PrettySpace](#prettyspace) | ✗ | ✔ |
| [Protobuf](#protobuf) | ✔ | ✔ |
| [ProtobufSingle](#protobufsingle) | ✔ | ✔ |
| [ProtobufList](#protobuflist) | ✔ | ✔ |
| [Avro](#data-format-avro) | ✔ | ✔ |
| [AvroConfluent](#data-format-avro-confluent) | ✔ | ✗ |
| [Parquet](#data-format-parquet) | ✔ | ✔ |
@ -1230,7 +1231,38 @@ See also [how to read/write length-delimited protobuf messages in popular langua
## ProtobufSingle {#protobufsingle}
Same as [Protobuf](#protobuf) but for storing/parsing single Protobuf message without length delimiters.
Same as [Protobuf](#protobuf) but for storing/parsing a single Protobuf message without length delimiter.
As a result, only a single table row can be written/read.
## ProtobufList {#protobuflist}
Similar to Protobuf but rows are represented as a sequence of sub-messages contained in a message with the fixed name "Envelope".
Usage example:
``` sql
SELECT * FROM test.table FORMAT ProtobufList SETTINGS format_schema = 'schemafile:MessageType'
```
``` bash
cat protobuflist_messages.bin | clickhouse-client --query "INSERT INTO test.table FORMAT ProtobufList SETTINGS format_schema='schemafile:MessageType'"
```
where the file `schemafile.proto` looks like this:
``` protobuf
syntax = "proto3";
message Envelope {
  message MessageType {
    string name = 1;
    string surname = 2;
    uint32 birthDate = 3;
    repeated string phoneNumbers = 4;
  };
  MessageType row = 1;
};
```
## Avro {#data-format-avro}


@ -55,7 +55,7 @@ Internal coordination settings are located in `<keeper_server>.<coordination_set
- `auto_forwarding` — Allow forwarding write requests from followers to the leader (default: true).
- `shutdown_timeout` — Wait to finish internal connections and shutdown (ms) (default: 5000).
- `startup_timeout` — If the server doesn't connect to other quorum participants in the specified timeout it will terminate (ms) (default: 30000).
- `four_letter_word_white_list` — White list of 4lw commands (default: "conf,cons,crst,envi,ruok,srst,srvr,stat,wchc,wchs,dirs,mntr,isro").
- `four_letter_word_allow_list` — Allow list of 4lw commands (default: "conf,cons,crst,envi,ruok,srst,srvr,stat,wchs,dirs,mntr,isro").
Quorum configuration is located in `<keeper_server>.<raft_configuration>` section and contain servers description.
@ -121,7 +121,7 @@ clickhouse keeper --config /etc/your_path_to_config/config.xml
ClickHouse Keeper also provides 4lw commands that are almost the same as in ZooKeeper. Each command is composed of four letters, such as `mntr`, `stat` etc. There are some more interesting commands: `stat` gives some general information about the server and connected clients, while `srvr` and `cons` give extended details on the server and connections respectively.
The 4lw commands have a white list configuration `four_letter_word_white_list`, which has the default value "conf,cons,crst,envi,ruok,srst,srvr,stat,wchc,wchs,dirs,mntr,isro".
The 4lw commands have an allow list configuration `four_letter_word_allow_list`, which has the default value "conf,cons,crst,envi,ruok,srst,srvr,stat,wchs,dirs,mntr,isro".
You can issue the commands to ClickHouse Keeper via telnet or nc, at the client port.
@ -201,7 +201,7 @@ Server stats reset.
```
server_id=1
tcp_port=2181
four_letter_word_white_list=*
four_letter_word_allow_list=*
log_storage_path=./coordination/logs
snapshot_storage_path=./coordination/snapshots
max_requests_batch_size=100


@ -3290,6 +3290,19 @@ Possible values:
Default value: `16`.
## max_insert_delayed_streams_for_parallel_write {#max-insert-delayed-streams-for-parallel-write}
The maximum number of streams (columns) to delay the final part flush for.
It makes a difference only if the underlying storage supports parallel writes (e.g. S3); otherwise it gives no benefit (see the example below).
Possible values:
- Positive integer.
- 0 or 1 — Disabled.
Default value: `1000` for S3 and `0` otherwise.
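A hedged usage sketch (the table names are illustrative); like any setting, it can also be applied per query:
``` sql
-- Raise the limit for an INSERT into a table whose storage supports
-- parallel writes (e.g. an S3-backed MergeTree table):
INSERT INTO wide_s3_table
SETTINGS max_insert_delayed_streams_for_parallel_write = 2000
SELECT * FROM source_table;
```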
## opentelemetry_start_trace_probability {#opentelemetry-start-trace-probability}
Sets the probability that ClickHouse can start a trace for executed queries (if no parent [trace context](https://www.w3.org/TR/trace-context/) is supplied).


@ -225,15 +225,15 @@ This storage method works the same way as hashed and allows using date/time (arb
Example: The table contains discounts for each advertiser in the format:
``` text
+---------|-------------|-------------|------+
+---------------|---------------------|-------------------|--------+
| advertiser id | discount start date | discount end date | amount |
+===============+=====================+===================+========+
| 123 | 2015-01-01 | 2015-01-15 | 0.15 |
+---------|-------------|-------------|------+
+---------------|---------------------|-------------------|--------+
| 123 | 2015-01-16 | 2015-01-31 | 0.25 |
+---------|-------------|-------------|------+
+---------------|---------------------|-------------------|--------+
| 456 | 2015-01-01 | 2015-01-15 | 0.05 |
+---------|-------------|-------------|------+
+---------------|---------------------|-------------------|--------+
```
To use a sample for date ranges, define the `range_min` and `range_max` elements in the [structure](../../../sql-reference/dictionaries/external-dictionaries/external-dicts-dict-structure.md). These elements must contain elements `name` and `type` (if `type` is not specified, the default type will be used - Date). `type` can be any numeric type (Date / DateTime / UInt64 / Int32 / others).
@ -272,10 +272,10 @@ LAYOUT(RANGE_HASHED())
RANGE(MIN first MAX last)
```
To work with these dictionaries, you need to pass an additional argument to the `dictGetT` function, for which a range is selected:
To work with these dictionaries, you need to pass an additional argument to the `dictGet*` function, for which a range is selected:
``` sql
dictGetT('dict_name', 'attr_name', id, date)
dictGet*('dict_name', 'attr_name', id, date)
```
This function returns the value for the specified `id`s and the date range that includes the passed date.
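For instance, a sketch against the discounts table above (the dictionary name `discounts_dict` is illustrative):
``` sql
-- 2015-01-10 falls into the 2015-01-01..2015-01-15 range for id 123,
-- so this returns 0.15:
SELECT dictGet('discounts_dict', 'amount', toUInt64(123), toDate('2015-01-10')) AS amount;
```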
@ -479,17 +479,17 @@ This type of storage is for mapping network prefixes (IP addresses) to metadata
Example: The table contains network prefixes and their corresponding AS number and country code:
``` text
+-----------|-----|------+
+-----------------|-------|--------+
| prefix | asn | cca2 |
+=================+=======+========+
| 202.79.32.0/20 | 17501 | NP |
+-----------|-----|------+
+-----------------|-------|--------+
| 2620:0:870::/48 | 3856 | US |
+-----------|-----|------+
+-----------------|-------|--------+
| 2a02:6b8:1::/48 | 13238 | RU |
+-----------|-----|------+
+-----------------|-------|--------+
| 2001:db8::/32 | 65536 | ZZ |
+-----------|-----|------+
+-----------------|-------|--------+
```
When using this type of layout, the structure must have a composite key.
@ -538,10 +538,10 @@ PRIMARY KEY prefix
The key must have only one String type attribute that contains an allowed IP prefix. Other types are not supported yet.
For queries, you must use the same functions (`dictGetT` with a tuple) as for dictionaries with composite keys:
For queries, you must use the same functions (`dictGet*` with a tuple) as for dictionaries with composite keys:
``` sql
dictGetT('dict_name', 'attr_name', tuple(ip))
dictGet*('dict_name', 'attr_name', tuple(ip))
```
The function takes either `UInt32` for IPv4, or `FixedString(16)` for IPv6:
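For example, a sketch with an illustrative dictionary name, converting the string form of an address into the expected key type:
``` sql
-- IPv4 keys are passed as UInt32, IPv6 keys as FixedString(16):
SELECT dictGet('ip_trie_dict', 'asn', tuple(IPv4StringToNum('202.79.32.10'))) AS asn_v4;
SELECT dictGet('ip_trie_dict', 'asn', tuple(IPv6StringToNum('2620:0:870::1'))) AS asn_v6;
```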


@ -1392,12 +1392,24 @@ Returns the first element in the `arr1` array for which `func` returns something
Note that `arrayFirst` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You must pass a lambda function to it as the first argument, and it can't be omitted.
## arrayFirstOrNull(func, arr1, …) {#array-first-or-null}
Returns the first element in the `arr1` array for which `func` returns something other than 0. If there is no such element, it returns null.
Note that `arrayFirstOrNull` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You must pass a lambda function to it as the first argument, and it can't be omitted.
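For instance (a small self-contained example of the behavior described above):
``` sql
SELECT
    arrayFirstOrNull(x -> x > 10, [1, 5, 20, 30]) AS first_match, -- 20
    arrayFirstOrNull(x -> x > 99, [1, 5, 20, 30]) AS no_match;    -- NULL
```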
## arrayLast(func, arr1, …) {#array-last}
Returns the last element in the `arr1` array for which `func` returns something other than 0.
Note that `arrayLast` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You must pass a lambda function to it as the first argument, and it can't be omitted.
## arrayLastOrNull(func, arr1, …) {#array-last-or-null}
Returns the last element in the `arr1` array for which `func` returns something other than 0. If there is no such element, it returns null.
Note that `arrayLastOrNull` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You must pass a lambda function to it as the first argument, and it can't be omitted.
## arrayFirstIndex(func, arr1, …) {#array-first-index}
Returns the index of the first element in the `arr1` array for which `func` returns something other than 0.


@ -1026,4 +1026,185 @@ Result:
│ 41162 │
└─────────────┘
```
## h3PointDistM {#h3pointdistm}
Returns the "great circle" or "haversine" distance between pairs of GeoCoord points (latitude/longitude) pairs in meters.
**Syntax**
``` sql
h3PointDistM(lat1, lon1, lat2, lon2)
```
**Arguments**
- `lat1`, `lon1` — Latitude and Longitude of point1 in degrees. Type: [Float64](../../../sql-reference/data-types/float.md).
- `lat2`, `lon2` — Latitude and Longitude of point2 in degrees. Type: [Float64](../../../sql-reference/data-types/float.md).
**Returned values**
- Haversine or great circle distance in meters.
Type: [Float64](../../../sql-reference/data-types/float.md).
**Example**
Query:
``` sql
select h3PointDistM(-10.0, 0.0, 10.0, 0.0) as h3PointDistM;
```
Result:
``` text
┌──────h3PointDistM─┐
│ 2223901.039504589 │
└───────────────────┘
```
## h3PointDistKm {#h3pointdistkm}
Returns the "great circle" or "haversine" distance between pairs of GeoCoord points (latitude/longitude) pairs in kilometers.
**Syntax**
``` sql
h3PointDistKm(lat1, lon1, lat2, lon2)
```
**Arguments**
- `lat1`, `lon1` — Latitude and Longitude of point1 in degrees. Type: [Float64](../../../sql-reference/data-types/float.md).
- `lat2`, `lon2` — Latitude and Longitude of point2 in degrees. Type: [Float64](../../../sql-reference/data-types/float.md).
**Returned values**
- Haversine or great circle distance in kilometers.
Type: [Float64](../../../sql-reference/data-types/float.md).
**Example**
Query:
``` sql
select h3PointDistKm(-10.0, 0.0, 10.0, 0.0) as h3PointDistKm;
```
Result:
``` text
┌─────h3PointDistKm─┐
│ 2223.901039504589 │
└───────────────────┘
```
## h3PointDistRads {#h3pointdistrads}
Returns the "great circle" or "haversine" distance between pairs of GeoCoord points (latitude/longitude) pairs in radians.
**Syntax**
``` sql
h3PointDistRads(lat1, lon1, lat2, lon2)
```
**Arguments**
- `lat1`, `lon1` — Latitude and Longitude of point1 in degrees. Type: [Float64](../../../sql-reference/data-types/float.md).
- `lat2`, `lon2` — Latitude and Longitude of point2 in degrees. Type: [Float64](../../../sql-reference/data-types/float.md).
**Returned values**
- Haversine or great circle distance in radians.
Type: [Float64](../../../sql-reference/data-types/float.md).
**Example**
Query:
``` sql
select h3PointDistRads(-10.0, 0.0, 10.0, 0.0) as h3PointDistRads;
```
Result:
``` text
┌────h3PointDistRads─┐
│ 0.3490658503988659 │
└────────────────────┘
```
## h3GetRes0Indexes {#h3getres0indexes}
Returns an array of all the resolution 0 H3 indexes.
**Syntax**
``` sql
h3GetRes0Indexes()
```
**Returned values**
- Array of all the resolution 0 H3 indexes.
Type: [Array](../../../sql-reference/data-types/array.md)([UInt64](../../../sql-reference/data-types/int-uint.md)).
**Example**
Query:
``` sql
select h3GetRes0Indexes() as indexes;
```
Result:
``` text
┌─indexes─────────────────────────────────────┐
│ [576495936675512319,576531121047601151,....]│
└─────────────────────────────────────────────┘
```
## h3GetPentagonIndexes {#h3getpentagonindexes}
Returns all the pentagon H3 indexes at the specified resolution.
**Syntax**
``` sql
h3GetPentagonIndexes(resolution)
```
**Parameter**
- `resolution` — Index resolution. Range: `[0, 15]`. Type: [UInt8](../../../sql-reference/data-types/int-uint.md).
**Returned value**
- Array of all pentagon H3 indexes.
Type: [Array](../../../sql-reference/data-types/array.md)([UInt64](../../../sql-reference/data-types/int-uint.md)).
**Example**
Query:
``` sql
SELECT h3GetPentagonIndexes(3) AS indexes;
```
Result:
``` text
┌─indexes────────────────────────────────────────────────────────┐
│ [590112357393367039,590464201114255359,590816044835143679,...] │
└────────────────────────────────────────────────────────────────┘
```
[Original article](https://clickhouse.com/docs/en/sql-reference/functions/geo/h3) <!--hide-->


@ -13,10 +13,18 @@ Alias: `INET_NTOA`.
## IPv4StringToNum(s) {#ipv4stringtonums}
The reverse function of IPv4NumToString. If the IPv4 address has an invalid format, it returns 0.
The reverse function of IPv4NumToString. If the IPv4 address has an invalid format, it throws an exception.
Alias: `INET_ATON`.
## IPv4StringToNumOrDefault(s) {#ipv4stringtonums}
Same as `IPv4StringToNum`, but if the IPv4 address has an invalid format, it returns 0.
## IPv4StringToNumOrNull(s) {#ipv4stringtonums}
Same as `IPv4StringToNum`, but if the IPv4 address has an invalid format, it returns null.
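For instance (illustrative values):
``` sql
SELECT
    IPv4StringToNumOrNull('192.168.0.1') AS valid,   -- 3232235521
    IPv4StringToNumOrNull('not an ip')   AS invalid; -- NULL
```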
## IPv4NumToStringClassC(num) {#ipv4numtostringclasscnum}
Similar to IPv4NumToString, but using xxx instead of the last octet.
@ -123,7 +131,7 @@ LIMIT 10
## IPv6StringToNum {#ipv6stringtonums}
The reverse function of [IPv6NumToString](#ipv6numtostringx). If the IPv6 address has an invalid format, it returns a string of null bytes.
The reverse function of [IPv6NumToString](#ipv6numtostringx). If the IPv6 address has an invalid format, it throws an exception.
If the input string contains a valid IPv4 address, returns its IPv6 equivalent.
HEX can be uppercase or lowercase.
@ -168,6 +176,14 @@ Result:
- [cutIPv6](#cutipv6x-bytestocutforipv6-bytestocutforipv4).
## IPv6StringToNumOrDefault(s) {#ipv6stringtonums}
Same as `IPv6StringToNum`, but if the IPv6 address has an invalid format, it returns 0.
## IPv6StringToNumOrNull(s) {#ipv6stringtonums}
Same as `IPv6StringToNum`, but if the IPv6 address has an invalid format, it returns null.
## IPv4ToIPv6(x) {#ipv4toipv6x}
Takes a `UInt32` number. Interprets it as an IPv4 address in [big endian](https://en.wikipedia.org/wiki/Endianness). Returns a `FixedString(16)` value containing the IPv6 address in binary format. Examples:
@ -261,6 +277,14 @@ SELECT
└───────────────────────────────────┴──────────────────────────┘
```
## toIPv4OrDefault(string) {#toipv4ordefaultstring}
Same as `toIPv4`, but if the IPv4 address has an invalid format, it returns 0.
## toIPv4OrNull(string) {#toipv4ornullstring}
Same as `toIPv4`, but if the IPv4 address has an invalid format, it returns null.
## toIPv6 {#toipv6string}
Converts a string form of an IPv6 address to [IPv6](../../sql-reference/data-types/domains/ipv6.md) type. If the IPv6 address has an invalid format, returns an empty value.
@ -317,6 +341,14 @@ Result:
└─────────────────────┘
```
## toIPv6OrDefault(string) {#toipv6ordefaultstring}
Same as `toIPv6`, but if the IPv6 address has an invalid format, it returns 0.
## toIPv6OrNull(string) {#toipv6ornullstring}
Same as `toIPv6`, but if the IPv6 address has an invalid format, it returns null.
## isIPv4String {#isipv4string}
Determines whether the input string is an IPv4 address or not. If `string` is an IPv6 address, returns `0`.


@ -2,6 +2,49 @@
toc_priority: 76
toc_title: Security Changelog
---
## Fixed in ClickHouse 21.10.2.15, 2021-10-18 {#fixed-in-clickhouse-release-21-10-2-215-2021-10-18}
### CVE-2021-43304 {#cve-2021-43304}
Heap buffer overflow in ClickHouse's LZ4 compression codec when parsing a malicious query. There is no verification that the copy operations in the LZ4::decompressImpl loop, and especially the arbitrary copy operation wildCopy<copy_amount>(op, ip, copy_end), don't exceed the destination buffer's limits.
Credits: JFrog Security Research Team
### CVE-2021-43305 {#cve-2021-43305}
Heap buffer overflow in ClickHouse's LZ4 compression codec when parsing a malicious query. There is no verification that the copy operations in the LZ4::decompressImpl loop, and especially the arbitrary copy operation wildCopy<copy_amount>(op, ip, copy_end), don't exceed the destination buffer's limits. This issue is very similar to CVE-2021-43304, but the vulnerable copy operation is in a different wildCopy call.
Credits: JFrog Security Research Team
### CVE-2021-42387 {#cve-2021-42387}
Heap out-of-bounds read in ClickHouse's LZ4 compression codec when parsing a malicious query. As part of the LZ4::decompressImpl() loop, a 16-bit unsigned user-supplied value ('offset') is read from the compressed data. The offset is later used in the length of a copy operation, without checking the upper bounds of the source of the copy operation.
Credits: JFrog Security Research Team
### CVE-2021-42388 {#cve-2021-42388}
Heap out-of-bounds read in ClickHouse's LZ4 compression codec when parsing a malicious query. As part of the LZ4::decompressImpl() loop, a 16-bit unsigned user-supplied value ('offset') is read from the compressed data. The offset is later used in the length of a copy operation, without checking the lower bounds of the source of the copy operation.
Credits: JFrog Security Research Team
### CVE-2021-42389 {#cve-2021-42389}
Divide-by-zero in ClickHouse's Delta compression codec when parsing a malicious query. The first byte of the compressed buffer is used in a modulo operation without being checked for 0.
Credits: JFrog Security Research Team
### CVE-2021-42390 {#cve-2021-42390}
Divide-by-zero in ClickHouse's DeltaDouble compression codec when parsing a malicious query. The first byte of the compressed buffer is used in a modulo operation without being checked for 0.
Credits: JFrog Security Research Team
### CVE-2021-42391 {#cve-2021-42391}
Divide-by-zero in ClickHouse's Gorilla compression codec when parsing a malicious query. The first byte of the compressed buffer is used in a modulo operation without being checked for 0.
Credits: JFrog Security Research Team
## Fixed in ClickHouse 21.4.3.21, 2021-04-12 {#fixed-in-clickhouse-release-21-4-3-21-2021-04-12}


@ -54,7 +54,7 @@ ClickHouse Keeper can be used as an equivalent
- `auto_forwarding` — allow forwarding write requests from followers to the leader (default: true).
- `shutdown_timeout` — time to wait for internal connections to finish and for shutdown, in milliseconds (default: 5000).
- `startup_timeout` — if the server does not connect to the other quorum participants within the specified timeout, it terminates, in milliseconds (default: 30000).
- `four_letter_word_white_list` — the list of allowed 4-letter commands (default: "conf,cons,crst,envi,ruok,srst,srvr,stat,wchc,wchs,dirs,mntr,isro").
- `four_letter_word_allow_list` — the list of allowed 4-letter commands (default: "conf,cons,crst,envi,ruok,srst,srvr,stat,wchs,dirs,mntr,isro").
The quorum configuration is located in `<keeper_server>.<raft_configuration>` and contains a description of the servers.
@ -114,7 +114,7 @@ clickhouse-keeper --config /etc/your_path_to_config/config.xml --daemon
ClickHouse Keeper also supports 4-letter commands, almost the same as in ZooKeeper. Each command consists of four characters, for example `mntr`, `stat`, etc. A few interesting commands: `stat` provides general information about the server and connected clients, while `srvr` and `cons` provide extended details about the server and connections respectively.
The 4-letter commands have an allow-list setting `four_letter_word_white_list`, which has the default value "conf,cons,crst,envi,ruok,srst,srvr,stat, wchc,wchs,dirs,mntr,isro".
The 4-letter commands have an allow-list setting `four_letter_word_allow_list`, which has the default value "conf,cons,crst,envi,ruok,srst,srvr,stat,wchs,dirs,mntr,isro".
You can send commands to ClickHouse Keeper via telnet or nc on the client port.
@ -194,7 +194,7 @@ Server stats reset.
```
server_id=1
tcp_port=2181
four_letter_word_white_list=*
four_letter_word_allow_list=*
log_storage_path=./coordination/logs
snapshot_storage_path=./coordination/snapshots
max_requests_batch_size=100


@ -1,4 +1,4 @@
Babel==2.8.0
Babel==2.9.1
backports-abc==0.5
backports.functools-lru-cache==1.6.1
beautifulsoup4==4.9.1
@ -10,22 +10,22 @@ cssmin==0.2.0
future==0.18.2
htmlmin==0.1.12
idna==2.10
Jinja2>=2.11.3
Jinja2>=3.0.3
jinja2-highlight==0.6.1
jsmin==3.0.0
livereload==2.6.2
livereload==2.6.3
Markdown==3.3.2
MarkupSafe==1.1.1
MarkupSafe==2.1.0
mkdocs==1.1.2
mkdocs-htmlproofer-plugin==0.0.3
mkdocs-macros-plugin==0.4.20
nltk==3.5
nltk==3.7
nose==1.3.7
protobuf==3.14.0
numpy==1.21.2
pymdown-extensions==8.0
python-slugify==4.0.1
PyYAML==5.4.1
PyYAML==6.0
repackage==0.7.3
requests==2.25.1
singledispatch==3.4.0.3
@ -34,5 +34,6 @@ soupsieve==2.0.1
termcolor==1.1.0
tornado==6.1
Unidecode==1.1.1
urllib3>=1.26.5
Pygments>=2.7.4
urllib3>=1.26.8
Pygments>=2.11.2


@ -1,10 +1,5 @@
---
machine_translated: true
machine_translated_rev: 5decc73b5dc60054f19087d3690c4eb99446a6c3
---
# system.numbers_mt {#system-numbers-mt}
# System. numbers_mt {#system-numbers-mt}
The same as [System. Numbers](../../operations/system-tables/numbers.md), but reads are parallel. The numbers can be returned in any order.
Similar to [system.numbers](../../operations/system-tables/numbers.md), but reads are parallel. The numbers can be returned in any order.
Used for tests.


@ -31,7 +31,7 @@
- For the dict_name hierarchical dictionary, finds whether the child_id key is located inside ancestor_id or matches ancestor_id. Returns UInt8.
## Dictatorship {#dictgethierarchy}
## dictGetHierarchy {#dictgethierarchy}
`dictGetHierarchy('dict_name', id)`


@ -304,8 +304,8 @@ void LocalServer::setupUsers()
ConfigurationPtr users_config;
auto & access_control = global_context->getAccessControl();
access_control.setPlaintextPasswordSetting(config().getBool("allow_plaintext_password", true));
access_control.setNoPasswordSetting(config().getBool("allow_no_password", true));
access_control.setNoPasswordAllowed(config().getBool("allow_no_password", true));
access_control.setPlaintextPasswordAllowed(config().getBool("allow_plaintext_password", true));
if (config().has("users_config") || config().has("config-file") || fs::exists("config.xml"))
{
const auto users_config_path = config().getString("users_config", config().getString("config-file", "config.xml"));


@ -1069,9 +1069,10 @@ if (ThreadFuzzer::instance().isEffective())
auto & access_control = global_context->getAccessControl();
if (config().has("custom_settings_prefixes"))
access_control.setCustomSettingsPrefixes(config().getString("custom_settings_prefixes"));
///set the allow_plaintext_and_no_password setting in context.
access_control.setPlaintextPasswordSetting(config().getBool("allow_plaintext_password", true));
access_control.setNoPasswordSetting(config().getBool("allow_no_password", true));
access_control.setNoPasswordAllowed(config().getBool("allow_no_password", true));
access_control.setPlaintextPasswordAllowed(config().getBool("allow_plaintext_password", true));
/// Initialize access storages.
try
{


@ -368,7 +368,7 @@
<!-- Path to temporary data for processing hard queries. -->
<tmp_path>/var/lib/clickhouse/tmp/</tmp_path>
<!-- Disable AuthType Plaintext_password and No_password for ACL. -->
<!-- Disable AuthType plaintext_password and no_password for ACL. -->
<!-- <allow_plaintext_password>0</allow_plaintext_password> -->
<!-- <allow_no_password>0</allow_no_password> -->

View File

@ -173,7 +173,8 @@ void AccessControl::addUsersConfigStorage(const String & storage_name_, const Po
auto check_setting_name_function = [this](const std::string_view & setting_name) { checkSettingNameIsAllowed(setting_name); };
auto is_no_password_allowed_function = [this]() -> bool { return isNoPasswordAllowed(); };
auto is_plaintext_password_allowed_function = [this]() -> bool { return isPlaintextPasswordAllowed(); };
auto new_storage = std::make_shared<UsersConfigAccessStorage>(storage_name_, check_setting_name_function,is_no_password_allowed_function,is_plaintext_password_allowed_function);
auto new_storage = std::make_shared<UsersConfigAccessStorage>(storage_name_, check_setting_name_function,
is_no_password_allowed_function, is_plaintext_password_allowed_function);
new_storage->setConfig(users_config_);
addStorage(new_storage);
LOG_DEBUG(getLogger(), "Added {} access storage '{}', path: {}",
@ -209,7 +210,8 @@ void AccessControl::addUsersConfigStorage(
auto check_setting_name_function = [this](const std::string_view & setting_name) { checkSettingNameIsAllowed(setting_name); };
auto is_no_password_allowed_function = [this]() -> bool { return isNoPasswordAllowed(); };
auto is_plaintext_password_allowed_function = [this]() -> bool { return isPlaintextPasswordAllowed(); };
auto new_storage = std::make_shared<UsersConfigAccessStorage>(storage_name_, check_setting_name_function,is_no_password_allowed_function,is_plaintext_password_allowed_function);
auto new_storage = std::make_shared<UsersConfigAccessStorage>(storage_name_, check_setting_name_function,
is_no_password_allowed_function, is_plaintext_password_allowed_function);
new_storage->load(users_config_path_, include_from_path_, preprocessed_dir_, get_zookeeper_function_);
addStorage(new_storage);
LOG_DEBUG(getLogger(), "Added {} access storage '{}', path: {}", String(new_storage->getStorageType()), new_storage->getStorageName(), new_storage->getPath());
@ -411,7 +413,8 @@ UUID AccessControl::authenticate(const Credentials & credentials, const Poco::Ne
{
try
{
return MultipleAccessStorage::authenticate(credentials, address, *external_authenticators,allow_no_password, allow_plaintext_password);
return MultipleAccessStorage::authenticate(credentials, address, *external_authenticators, allow_no_password,
allow_plaintext_password);
}
catch (...)
{
@ -447,26 +450,38 @@ void AccessControl::setCustomSettingsPrefixes(const String & comma_separated_pre
setCustomSettingsPrefixes(prefixes);
}
void AccessControl::setPlaintextPasswordSetting(bool allow_plaintext_password_)
{
allow_plaintext_password = allow_plaintext_password_;
}
void AccessControl::setNoPasswordSetting(bool allow_no_password_)
{
allow_no_password = allow_no_password_;
}
bool AccessControl::isSettingNameAllowed(const std::string_view & setting_name) const
bool AccessControl::isSettingNameAllowed(const std::string_view setting_name) const
{
return custom_settings_prefixes->isSettingNameAllowed(setting_name);
}
void AccessControl::checkSettingNameIsAllowed(const std::string_view & setting_name) const
void AccessControl::checkSettingNameIsAllowed(const std::string_view setting_name) const
{
custom_settings_prefixes->checkSettingNameIsAllowed(setting_name);
}
void AccessControl::setNoPasswordAllowed(bool allow_no_password_)
{
allow_no_password = allow_no_password_;
}
bool AccessControl::isNoPasswordAllowed() const
{
return allow_no_password;
}
void AccessControl::setPlaintextPasswordAllowed(bool allow_plaintext_password_)
{
allow_plaintext_password = allow_plaintext_password_;
}
bool AccessControl::isPlaintextPasswordAllowed() const
{
return allow_plaintext_password;
}
std::shared_ptr<const ContextAccess> AccessControl::getContextAccess(
const UUID & user_id,
const std::vector<UUID> & current_roles,
@ -550,15 +565,6 @@ std::vector<QuotaUsage> AccessControl::getAllQuotasUsage() const
return quota_cache->getAllQuotasUsage();
}
bool AccessControl::isPlaintextPasswordAllowed() const
{
return allow_plaintext_password;
}
bool AccessControl::isNoPasswordAllowed() const
{
return allow_no_password;
}
std::shared_ptr<const EnabledSettings> AccessControl::getEnabledSettings(
const UUID & user_id,


@ -49,8 +49,6 @@ class AccessControl : public MultipleAccessStorage
public:
AccessControl();
~AccessControl() override;
std::atomic_bool allow_plaintext_password;
std::atomic_bool allow_no_password;
/// Parses access entities from a configuration loaded from users.xml.
/// This function add UsersConfigAccessStorage if it wasn't added before.
@ -113,12 +111,16 @@ public:
/// This function also enables custom prefixes to be used.
void setCustomSettingsPrefixes(const Strings & prefixes);
void setCustomSettingsPrefixes(const String & comma_separated_prefixes);
bool isSettingNameAllowed(const std::string_view & name) const;
void checkSettingNameIsAllowed(const std::string_view & name) const;
bool isSettingNameAllowed(const std::string_view name) const;
void checkSettingNameIsAllowed(const std::string_view name) const;
//sets allow_plaintext_password and allow_no_password setting
void setPlaintextPasswordSetting(const bool allow_plaintext_password_);
void setNoPasswordSetting(const bool allow_no_password_);
/// Allows users without password (by default it's allowed).
void setNoPasswordAllowed(const bool allow_no_password_);
bool isNoPasswordAllowed() const;
/// Allows users with plaintext password (by default it's allowed).
void setPlaintextPasswordAllowed(const bool allow_plaintext_password_);
bool isPlaintextPasswordAllowed() const;
UUID authenticate(const Credentials & credentials, const Poco::Net::IPAddress & address) const;
void setExternalAuthenticatorsConfig(const Poco::Util::AbstractConfiguration & config);
@ -153,9 +155,6 @@ public:
std::vector<QuotaUsage> getAllQuotasUsage() const;
bool isPlaintextPasswordAllowed() const;
bool isNoPasswordAllowed() const;
std::shared_ptr<const EnabledSettings> getEnabledSettings(
const UUID & user_id,
const SettingsProfileElements & settings_from_user,
@ -177,6 +176,8 @@ private:
std::unique_ptr<SettingsProfilesCache> settings_profiles_cache;
std::unique_ptr<ExternalAuthenticators> external_authenticators;
std::unique_ptr<CustomSettingsPrefixes> custom_settings_prefixes;
std::atomic_bool allow_plaintext_password = true;
std::atomic_bool allow_no_password = true;
};
}
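Note on the header change above: allow_plaintext_password and allow_no_password move from public members to private std::atomic_bool fields defaulting to true, reached only through setters and getters. They stay atomic because authenticate() may run concurrently with a configuration reload. A minimal sketch of the same pattern (the AuthFlags name is hypothetical, not from this commit):

    #include <atomic>

    class AuthFlags
    {
    public:
        /// Called from the config-reload path.
        void setNoPasswordAllowed(bool value) { allow_no_password = value; }
        /// Called concurrently from every authentication attempt.
        bool isNoPasswordAllowed() const { return allow_no_password; }

    private:
        /// Atomic: written by one thread, read by many, no lock needed.
        std::atomic_bool allow_no_password = true;
    };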


@ -120,7 +120,7 @@ AccessEntityPtr deserializeAccessEntityImpl(const String & definition)
if (res)
throw Exception("Two access entities attached in the same file", ErrorCodes::INCORRECT_ACCESS_ENTITY_DEFINITION);
res = user = std::make_unique<User>();
InterpreterCreateUserQuery::updateUserFromQuery(*user, *create_user_query);
InterpreterCreateUserQuery::updateUserFromQuery(*user, *create_user_query, /* allow_no_password = */ true, /* allow_plaintext_password = */ true);
}
else if (auto * create_role_query = query->as<ASTCreateRoleQuery>())
{


@ -441,7 +441,9 @@ void IAccessStorage::notify(const Notifications & notifications)
UUID IAccessStorage::authenticate(
const Credentials & credentials,
const Poco::Net::IPAddress & address,
const ExternalAuthenticators & external_authenticators, bool allow_no_password, bool allow_plaintext_password) const
const ExternalAuthenticators & external_authenticators,
bool allow_no_password,
bool allow_plaintext_password) const
{
return *authenticateImpl(credentials, address, external_authenticators, /* throw_if_user_not_exists = */ true, allow_no_password, allow_plaintext_password);
}
@ -451,7 +453,9 @@ std::optional<UUID> IAccessStorage::authenticate(
const Credentials & credentials,
const Poco::Net::IPAddress & address,
const ExternalAuthenticators & external_authenticators,
bool throw_if_user_not_exists, bool allow_no_password, bool allow_plaintext_password) const
bool throw_if_user_not_exists,
bool allow_no_password,
bool allow_plaintext_password) const
{
return authenticateImpl(credentials, address, external_authenticators, throw_if_user_not_exists, allow_no_password, allow_plaintext_password);
}
@ -461,7 +465,9 @@ std::optional<UUID> IAccessStorage::authenticateImpl(
const Credentials & credentials,
const Poco::Net::IPAddress & address,
const ExternalAuthenticators & external_authenticators,
bool throw_if_user_not_exists, bool allow_no_password, bool allow_plaintext_password) const
bool throw_if_user_not_exists,
bool allow_no_password,
bool allow_plaintext_password) const
{
if (auto id = find<User>(credentials.getUserName()))
{
@ -469,8 +475,11 @@ std::optional<UUID> IAccessStorage::authenticateImpl(
{
if (!isAddressAllowed(*user, address))
throwAddressNotAllowed(address);
if (isNoPasswordAllowed(*user, allow_no_password) || isPlaintextPasswordAllowed(*user, allow_plaintext_password))
throwPasswordTypeNotAllowed();
auto auth_type = user->auth_data.getType();
if (((auth_type == AuthenticationType::NO_PASSWORD) && !allow_no_password) ||
((auth_type == AuthenticationType::PLAINTEXT_PASSWORD) && !allow_plaintext_password))
throwAuthenticationTypeNotAllowed(auth_type);
if (!areCredentialsValid(*user, credentials, external_authenticators))
throwInvalidCredentials();
@ -506,15 +515,6 @@ bool IAccessStorage::isAddressAllowed(const User & user, const Poco::Net::IPAddr
return user.allowed_client_hosts.contains(address);
}
bool IAccessStorage::isPlaintextPasswordAllowed(const User & user, bool allow_plaintext_password)
{
return !allow_plaintext_password && user.auth_data.getType() == AuthenticationType::PLAINTEXT_PASSWORD;
}
bool IAccessStorage::isNoPasswordAllowed(const User & user, bool allow_no_password)
{
return !allow_no_password && user.auth_data.getType() == AuthenticationType::NO_PASSWORD;
}
UUID IAccessStorage::generateRandomID()
{
@ -610,11 +610,12 @@ void IAccessStorage::throwAddressNotAllowed(const Poco::Net::IPAddress & address
throw Exception("Connections from " + address.toString() + " are not allowed", ErrorCodes::IP_ADDRESS_NOT_ALLOWED);
}
void IAccessStorage::throwPasswordTypeNotAllowed()
void IAccessStorage::throwAuthenticationTypeNotAllowed(AuthenticationType auth_type)
{
throw Exception(
"Authentication denied for users configured with AuthType PLAINTEXT_PASSWORD and NO_PASSWORD. Please check with Clickhouse admin to allow allow PLAINTEXT_PASSWORD and NO_PASSWORD through server configuration ",
ErrorCodes::AUTHENTICATION_FAILED);
ErrorCodes::AUTHENTICATION_FAILED,
"Authentication type {} is not allowed, check the setting allow_{} in the server configuration",
toString(auth_type), AuthenticationTypeInfo::get(auth_type).name);
}
void IAccessStorage::throwInvalidCredentials()
{
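The rewritten check above collapses the two old helper predicates into a direct comparison against the user's authentication type. A standalone sketch of the gate, with a simplified enum and std::runtime_error standing in for ClickHouse's types:

    #include <stdexcept>

    enum class AuthenticationType { NO_PASSWORD, PLAINTEXT_PASSWORD, SHA256_PASSWORD };

    /// Rejects authentication types that are disabled in the server configuration.
    void checkAuthenticationTypeAllowed(AuthenticationType auth_type,
                                        bool allow_no_password,
                                        bool allow_plaintext_password)
    {
        if (((auth_type == AuthenticationType::NO_PASSWORD) && !allow_no_password)
            || ((auth_type == AuthenticationType::PLAINTEXT_PASSWORD) && !allow_plaintext_password))
            throw std::runtime_error("Authentication type is not allowed, check the allow_* server settings");
    }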


@ -18,6 +18,7 @@ namespace DB
struct User;
class Credentials;
class ExternalAuthenticators;
enum class AuthenticationType;
/// Contains entities, i.e. instances of classes derived from IAccessEntity.
/// The implementations of this class MUST be thread-safe.
@ -148,7 +149,7 @@ public:
/// Finds a user, check the provided credentials and returns the ID of the user if they are valid.
/// Throws an exception if no such user or credentials are invalid.
UUID authenticate(const Credentials & credentials, const Poco::Net::IPAddress & address, const ExternalAuthenticators & external_authenticators, bool allow_no_password=true, bool allow_plaintext_password=true) const;
UUID authenticate(const Credentials & credentials, const Poco::Net::IPAddress & address, const ExternalAuthenticators & external_authenticators, bool allow_no_password, bool allow_plaintext_password) const;
std::optional<UUID> authenticate(const Credentials & credentials, const Poco::Net::IPAddress & address, const ExternalAuthenticators & external_authenticators, bool throw_if_user_not_exists, bool allow_no_password, bool allow_plaintext_password) const;
protected:
@ -164,8 +165,6 @@ protected:
virtual std::optional<UUID> authenticateImpl(const Credentials & credentials, const Poco::Net::IPAddress & address, const ExternalAuthenticators & external_authenticators, bool throw_if_user_not_exists, bool allow_no_password, bool allow_plaintext_password) const;
virtual bool areCredentialsValid(const User & user, const Credentials & credentials, const ExternalAuthenticators & external_authenticators) const;
virtual bool isAddressAllowed(const User & user, const Poco::Net::IPAddress & address) const;
static bool isPlaintextPasswordAllowed(const User & user, bool allow_plaintext_password) ;
static bool isNoPasswordAllowed(const User & user, bool allow_no_password);
static UUID generateRandomID();
Poco::Logger * getLogger() const;
static String formatEntityTypeWithName(AccessEntityType type, const String & name) { return AccessEntityTypeInfo::get(type).formatEntityNameWithType(name); }
@ -181,7 +180,7 @@ protected:
[[noreturn]] void throwReadonlyCannotRemove(AccessEntityType type, const String & name) const;
[[noreturn]] static void throwAddressNotAllowed(const Poco::Net::IPAddress & address);
[[noreturn]] static void throwInvalidCredentials();
[[noreturn]] static void throwPasswordTypeNotAllowed();
[[noreturn]] static void throwAuthenticationTypeNotAllowed(AuthenticationType auth_type);
using Notification = std::tuple<OnChangedHandler, UUID, AccessEntityPtr>;
using Notifications = std::vector<Notification>;
static void notify(const Notifications & notifications);


@ -481,7 +481,9 @@ std::optional<UUID> LDAPAccessStorage::authenticateImpl(
const Credentials & credentials,
const Poco::Net::IPAddress & address,
const ExternalAuthenticators & external_authenticators,
bool throw_if_user_not_exists,bool allow_no_password __attribute__((unused)), bool allow_plaintext_password __attribute__((unused))) const
bool throw_if_user_not_exists,
bool /* allow_no_password */,
bool /* allow_plaintext_password */) const
{
std::scoped_lock lock(mutex);
auto id = memory_storage.find<User>(credentials.getUserName());


@ -449,14 +449,20 @@ void MultipleAccessStorage::updateSubscriptionsToNestedStorages(std::unique_lock
}
std::optional<UUID> MultipleAccessStorage::authenticateImpl(const Credentials & credentials, const Poco::Net::IPAddress & address, const ExternalAuthenticators & external_authenticators, bool throw_if_user_not_exists,bool allow_no_password, bool allow_plaintext_password) const
std::optional<UUID>
MultipleAccessStorage::authenticateImpl(const Credentials & credentials, const Poco::Net::IPAddress & address,
const ExternalAuthenticators & external_authenticators,
bool throw_if_user_not_exists,
bool allow_no_password, bool allow_plaintext_password) const
{
auto storages = getStoragesInternal();
for (size_t i = 0; i != storages->size(); ++i)
{
const auto & storage = (*storages)[i];
bool is_last_storage = (i == storages->size() - 1);
auto id = storage->authenticate(credentials, address, external_authenticators, (throw_if_user_not_exists && is_last_storage), allow_no_password, allow_plaintext_password);
auto id = storage->authenticate(credentials, address, external_authenticators,
(throw_if_user_not_exists && is_last_storage),
allow_no_password, allow_plaintext_password);
if (id)
{
std::lock_guard lock{mutex};
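The loop above asks each nested storage in turn and lets only the last one throw on a missing user, so lookup failures fall through to later storages (for example from users.xml to LDAP). The same fallthrough pattern in isolation, over a hypothetical Storage interface:

    #include <optional>
    #include <vector>

    struct Storage
    {
        virtual ~Storage() = default;
        /// Returns an id on success; throws only if asked to and the user is unknown.
        virtual std::optional<int> authenticate(bool throw_if_user_not_exists) const = 0;
    };

    std::optional<int> authenticateOverStorages(const std::vector<const Storage *> & storages)
    {
        for (size_t i = 0; i != storages.size(); ++i)
        {
            /// Only the last storage may report "no such user";
            /// earlier ones fall through to the next candidate.
            const bool is_last_storage = (i == storages.size() - 1);
            if (auto id = storages[i]->authenticate(is_last_storage))
                return id;
        }
        return {};
    }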


@ -28,8 +28,6 @@ namespace ErrorCodes
extern const int BAD_ARGUMENTS;
extern const int UNKNOWN_ADDRESS_PATTERN_TYPE;
extern const int NOT_IMPLEMENTED;
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
}
namespace
@ -50,7 +48,7 @@ namespace
UUID generateID(const IAccessEntity & entity) { return generateID(entity.getType(), entity.getName()); }
UserPtr parseUser(const Poco::Util::AbstractConfiguration & config, const String & user_name)
UserPtr parseUser(const Poco::Util::AbstractConfiguration & config, const String & user_name, bool allow_no_password, bool allow_plaintext_password)
{
auto user = std::make_shared<User>();
user->setName(user_name);
@ -130,6 +128,15 @@ namespace
user->auth_data.setSSLCertificateCommonNames(std::move(common_names));
}
auto auth_type = user->auth_data.getType();
if (((auth_type == AuthenticationType::NO_PASSWORD) && !allow_no_password) ||
((auth_type == AuthenticationType::PLAINTEXT_PASSWORD) && !allow_plaintext_password))
{
throw Exception(ErrorCodes::BAD_ARGUMENTS,
"Authentication type {} is not allowed, check the setting allow_{} in the server configuration",
toString(auth_type), AuthenticationTypeInfo::get(auth_type).name);
}
const auto profile_name_config = user_config + ".profile";
if (config.has(profile_name_config))
{
@ -225,24 +232,18 @@ namespace
}
std::vector<AccessEntityPtr> parseUsers(const Poco::Util::AbstractConfiguration & config, Fn<bool()> auto && is_no_password_allowed_function, Fn<bool()> auto && is_plaintext_password_allowed_function)
std::vector<AccessEntityPtr> parseUsers(const Poco::Util::AbstractConfiguration & config, bool allow_no_password, bool allow_plaintext_password)
{
Poco::Util::AbstractConfiguration::Keys user_names;
config.keys("users", user_names);
std::vector<AccessEntityPtr> users;
users.reserve(user_names.size());
bool allow_plaintext_password = is_plaintext_password_allowed_function();
bool allow_no_password = is_no_password_allowed_function();
for (const auto & user_name : user_names)
{
try
{
String user_config = "users." + user_name;
if ((config.has(user_config + ".password") && !allow_plaintext_password) || (config.has(user_config + ".no_password") && !allow_no_password))
throw Exception("Incorrect User configuration. User is not allowed to configure PLAINTEXT_PASSWORD or NO_PASSWORD. Please configure User with authtype SHA256_PASSWORD_HASH, SHA256_PASSWORD, DOUBLE_SHA1_PASSWORD OR enable setting allow_plaintext_and_no_password in server configuration to configure user with plaintext and no password Auth_Type"
" Though it is not recommended to use plaintext_password and No_password for user authentication.", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
users.push_back(parseUser(config, user_name));
users.push_back(parseUser(config, user_name, allow_no_password, allow_plaintext_password));
}
catch (Exception & e)
{
@ -562,8 +563,10 @@ void UsersConfigAccessStorage::parseFromConfig(const Poco::Util::AbstractConfigu
{
try
{
bool no_password_allowed = is_no_password_allowed_function();
bool plaintext_password_allowed = is_plaintext_password_allowed_function();
std::vector<std::pair<UUID, AccessEntityPtr>> all_entities;
for (const auto & entity : parseUsers(config,is_no_password_allowed_function, is_plaintext_password_allowed_function))
for (const auto & entity : parseUsers(config, no_password_allowed, plaintext_password_allowed))
all_entities.emplace_back(generateID(*entity), entity);
for (const auto & entity : parseQuotas(config))
all_entities.emplace_back(generateID(*entity), entity);


@ -38,7 +38,8 @@ struct AggregateFunctionWithProperties
AggregateFunctionWithProperties(const AggregateFunctionWithProperties &) = default;
AggregateFunctionWithProperties & operator = (const AggregateFunctionWithProperties &) = default;
template <typename Creator, std::enable_if_t<!std::is_same_v<Creator, AggregateFunctionWithProperties>> * = nullptr>
template <typename Creator>
requires (!std::is_same_v<Creator, AggregateFunctionWithProperties>)
AggregateFunctionWithProperties(Creator creator_, AggregateFunctionProperties properties_ = {}) /// NOLINT
: creator(std::forward<Creator>(creator_)), properties(std::move(properties_))
{
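This hunk is one instance of a mechanical change applied throughout the commit: SFINAE via a defaulted std::enable_if_t template parameter is replaced by a C++20 requires-clause, which expresses the same constraint without the dummy parameter. A minimal sketch of the before/after (the Wrapper name is hypothetical):

    #include <type_traits>

    struct Wrapper
    {
        /// Before: template <typename T, std::enable_if_t<!std::is_same_v<T, Wrapper>> * = nullptr>
        /// After, as in this commit:
        template <typename T>
        requires (!std::is_same_v<T, Wrapper>)
        explicit Wrapper(T &&) {}
    };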


@ -569,6 +569,14 @@ if (ENABLE_TESTS)
clickhouse_common_zookeeper
string_utils)
if (TARGET ch_contrib::simdjson)
target_link_libraries(unit_tests_dbms PRIVATE ch_contrib::simdjson)
endif()
if(TARGET ch_contrib::rapidjson)
target_include_directories(unit_tests_dbms PRIVATE ch_contrib::rapidjson)
endif()
if (TARGET ch_contrib::yaml_cpp)
target_link_libraries(unit_tests_dbms PRIVATE ch_contrib::yaml_cpp)
endif()


@ -1092,10 +1092,11 @@ void ClientBase::sendData(Block & sample, const ColumnsDescription & columns_des
try
{
auto metadata = storage->getInMemoryMetadataPtr();
sendDataFromPipe(
storage->read(
sample.getNames(),
storage->getInMemoryMetadataPtr(),
storage->getStorageSnapshot(metadata),
query_info,
global_context,
{},


@ -297,7 +297,7 @@ ColumnPtr ColumnAggregateFunction::filter(const Filter & filter, ssize_t result_
{
size_t size = data.size();
if (size != filter.size())
throw Exception("Size of filter doesn't match size of column.", ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH);
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH, "Size of filter ({}) doesn't match size of column ({})", filter.size(), size);
if (size == 0)
return cloneEmpty();
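Another pattern repeated across the Columns hunks below: Exception calls that used a fixed preformatted string now use the code-first, format-string constructor so the two mismatching sizes appear in the message. A standalone approximation of the pattern, assuming the fmt library (ClickHouse's Exception does the formatting internally):

    #include <fmt/format.h>
    #include <stdexcept>

    void checkFilterSize(size_t filter_size, size_t column_size)
    {
        if (column_size != filter_size)
            throw std::runtime_error(fmt::format(
                "Size of filter ({}) doesn't match size of column ({})",
                filter_size, column_size));
    }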


@ -608,7 +608,7 @@ ColumnPtr ColumnArray::filterString(const Filter & filt, ssize_t result_size_hin
{
size_t col_size = getOffsets().size();
if (col_size != filt.size())
throw Exception("Size of filter doesn't match size of column.", ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH);
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH, "Size of filter ({}) doesn't match size of column ({})", filt.size(), col_size);
if (0 == col_size)
return ColumnArray::create(data);
@ -676,7 +676,7 @@ ColumnPtr ColumnArray::filterGeneric(const Filter & filt, ssize_t result_size_hi
{
size_t size = getOffsets().size();
if (size != filt.size())
throw Exception("Size of filter doesn't match size of column.", ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH);
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH, "Size of filter ({}) doesn't match size of column ({})", filt.size(), size);
if (size == 0)
return ColumnArray::create(data);
@ -1189,4 +1189,12 @@ void ColumnArray::gather(ColumnGathererStream & gatherer)
gatherer.gather(*this);
}
size_t ColumnArray::getNumberOfDimensions() const
{
const auto * nested_array = checkAndGetColumn<ColumnArray>(*data);
if (!nested_array)
return 1;
return 1 + nested_array->getNumberOfDimensions(); /// Every modern C++ compiler optimizes tail recursion.
}
}


@ -169,6 +169,8 @@ public:
bool isCollationSupported() const override { return getData().isCollationSupported(); }
size_t getNumberOfDimensions() const;
private:
WrappedPtr data;
WrappedPtr offsets;


@ -266,7 +266,7 @@ ColumnPtr ColumnDecimal<T>::filter(const IColumn::Filter & filt, ssize_t result_
{
size_t size = data.size();
if (size != filt.size())
throw Exception("Size of filter doesn't match size of column.", ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH);
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH, "Size of filter ({}) doesn't match size of column ({})", filt.size(), size);
auto res = this->create(0, scale);
Container & res_data = res->getData();


@ -207,7 +207,7 @@ ColumnPtr ColumnFixedString::filter(const IColumn::Filter & filt, ssize_t result
{
size_t col_size = size();
if (col_size != filt.size())
throw Exception("Size of filter doesn't match size of column.", ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH);
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH, "Size of filter ({}) doesn't match size of column ({})", filt.size(), col_size);
auto res = ColumnFixedString::create(n);


@ -144,15 +144,15 @@ public:
double getRatioOfDefaultRows(double sample_ratio) const override
{
return null_map->getRatioOfDefaultRows(sample_ratio);
return getRatioOfDefaultRowsImpl<ColumnNullable>(sample_ratio);
}
void getIndicesOfNonDefaultRows(Offsets & indices, size_t from, size_t limit) const override
{
null_map->getIndicesOfNonDefaultRows(indices, from, limit);
getIndicesOfNonDefaultRowsImpl<ColumnNullable>(indices, from, limit);
}
ColumnPtr createWithOffsets(const IColumn::Offsets & offsets, const Field & default_field, size_t total_rows, size_t shift) const override;
ColumnPtr createWithOffsets(const Offsets & offsets, const Field & default_field, size_t total_rows, size_t shift) const override;
bool isNullable() const override { return true; }
bool isFixedAndContiguous() const override { return false; }

src/Columns/ColumnObject.cpp (new file)

@ -0,0 +1,819 @@
#include <Core/Field.h>
#include <Columns/ColumnObject.h>
#include <Columns/ColumnsNumber.h>
#include <Columns/ColumnArray.h>
#include <Columns/ColumnSparse.h>
#include <DataTypes/ObjectUtils.h>
#include <DataTypes/getLeastSupertype.h>
#include <DataTypes/DataTypeNothing.h>
#include <DataTypes/DataTypeNullable.h>
#include <DataTypes/DataTypeFactory.h>
#include <DataTypes/NestedUtils.h>
#include <Interpreters/castColumn.h>
#include <Interpreters/convertFieldToType.h>
#include <Common/HashTable/HashSet.h>
namespace DB
{
namespace ErrorCodes
{
extern const int LOGICAL_ERROR;
extern const int ILLEGAL_COLUMN;
extern const int DUPLICATE_COLUMN;
extern const int NUMBER_OF_DIMENSIONS_MISMATHED;
extern const int NOT_IMPLEMENTED;
extern const int SIZES_OF_COLUMNS_DOESNT_MATCH;
}
namespace
{
/// Recreates column with default scalar values and keeps sizes of arrays.
ColumnPtr recreateColumnWithDefaultValues(
const ColumnPtr & column, const DataTypePtr & scalar_type, size_t num_dimensions)
{
const auto * column_array = checkAndGetColumn<ColumnArray>(column.get());
if (column_array && num_dimensions)
{
return ColumnArray::create(
recreateColumnWithDefaultValues(
column_array->getDataPtr(), scalar_type, num_dimensions - 1),
IColumn::mutate(column_array->getOffsetsPtr()));
}
return createArrayOfType(scalar_type, num_dimensions)->createColumn()->cloneResized(column->size());
}
/// Replaces NULL fields with the given field or an empty array.
class FieldVisitorReplaceNull : public StaticVisitor<Field>
{
public:
explicit FieldVisitorReplaceNull(
const Field & replacement_, size_t num_dimensions_)
: replacement(replacement_)
, num_dimensions(num_dimensions_)
{
}
Field operator()(const Null &) const
{
return num_dimensions
? createEmptyArrayField(num_dimensions)
: replacement;
}
Field operator()(const Array & x) const
{
assert(num_dimensions > 0);
const size_t size = x.size();
Array res(size);
for (size_t i = 0; i < size; ++i)
res[i] = applyVisitor(FieldVisitorReplaceNull(replacement, num_dimensions - 1), x[i]);
return res;
}
template <typename T>
Field operator()(const T & x) const { return x; }
private:
const Field & replacement;
size_t num_dimensions;
};
/// Calculates number of dimensions in array field.
/// Returns 0 for scalar fields.
class FieldVisitorToNumberOfDimensions : public StaticVisitor<size_t>
{
public:
size_t operator()(const Array & x) const
{
const size_t size = x.size();
std::optional<size_t> dimensions;
for (size_t i = 0; i < size; ++i)
{
/// Do not count Nulls, because they will be replaced by default
/// values with proper number of dimensions.
if (x[i].isNull())
continue;
size_t current_dimensions = applyVisitor(*this, x[i]);
if (!dimensions)
dimensions = current_dimensions;
else if (current_dimensions != *dimensions)
throw Exception(ErrorCodes::NUMBER_OF_DIMENSIONS_MISMATHED,
"Number of dimensions mismatched among array elements");
}
return 1 + dimensions.value_or(0);
}
template <typename T>
size_t operator()(const T &) const { return 0; }
};
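/// Illustrative example (not part of the original source): for the field
/// [[1, 2], NULL, [3]] the NULL element is skipped, the remaining elements
/// both have depth 1, so the visitor returns 1 + 1 = 2; for a scalar field
/// it returns 0.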
/// Visitor that allows getting the type of a scalar field
/// or the least common type of scalars in an array.
/// A more optimized version of FieldToDataType.
class FieldVisitorToScalarType : public StaticVisitor<>
{
public:
using FieldType = Field::Types::Which;
void operator()(const Array & x)
{
size_t size = x.size();
for (size_t i = 0; i < size; ++i)
applyVisitor(*this, x[i]);
}
void operator()(const UInt64 & x)
{
field_types.insert(FieldType::UInt64);
if (x <= std::numeric_limits<UInt8>::max())
type_indexes.insert(TypeIndex::UInt8);
else if (x <= std::numeric_limits<UInt16>::max())
type_indexes.insert(TypeIndex::UInt16);
else if (x <= std::numeric_limits<UInt32>::max())
type_indexes.insert(TypeIndex::UInt32);
else
type_indexes.insert(TypeIndex::UInt64);
}
void operator()(const Int64 & x)
{
field_types.insert(FieldType::Int64);
if (x <= std::numeric_limits<Int8>::max() && x >= std::numeric_limits<Int8>::min())
type_indexes.insert(TypeIndex::Int8);
else if (x <= std::numeric_limits<Int16>::max() && x >= std::numeric_limits<Int16>::min())
type_indexes.insert(TypeIndex::Int16);
else if (x <= std::numeric_limits<Int32>::max() && x >= std::numeric_limits<Int32>::min())
type_indexes.insert(TypeIndex::Int32);
else
type_indexes.insert(TypeIndex::Int64);
}
void operator()(const Null &)
{
have_nulls = true;
}
template <typename T>
void operator()(const T &)
{
field_types.insert(Field::TypeToEnum<NearestFieldType<T>>::value);
type_indexes.insert(TypeToTypeIndex<NearestFieldType<T>>);
}
DataTypePtr getScalarType() const { return getLeastSupertype(type_indexes, true); }
bool haveNulls() const { return have_nulls; }
bool needConvertField() const { return field_types.size() > 1; }
private:
TypeIndexSet type_indexes;
std::unordered_set<FieldType> field_types;
bool have_nulls = false;
};
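/// Illustrative example (not part of the original source): visiting the array
/// [1, 300] inserts TypeIndex::UInt8 and TypeIndex::UInt16, so getScalarType()
/// returns UInt16, while field_types holds only UInt64 and needConvertField()
/// stays false.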
}
FieldInfo getFieldInfo(const Field & field)
{
FieldVisitorToScalarType to_scalar_type_visitor;
applyVisitor(to_scalar_type_visitor, field);
return
{
to_scalar_type_visitor.getScalarType(),
to_scalar_type_visitor.haveNulls(),
to_scalar_type_visitor.needConvertField(),
applyVisitor(FieldVisitorToNumberOfDimensions(), field),
};
}
ColumnObject::Subcolumn::Subcolumn(MutableColumnPtr && data_, bool is_nullable_)
: least_common_type(getDataTypeByColumn(*data_))
, is_nullable(is_nullable_)
{
data.push_back(std::move(data_));
}
ColumnObject::Subcolumn::Subcolumn(
size_t size_, bool is_nullable_)
: least_common_type(std::make_shared<DataTypeNothing>())
, is_nullable(is_nullable_)
, num_of_defaults_in_prefix(size_)
{
}
size_t ColumnObject::Subcolumn::Subcolumn::size() const
{
size_t res = num_of_defaults_in_prefix;
for (const auto & part : data)
res += part->size();
return res;
}
size_t ColumnObject::Subcolumn::Subcolumn::byteSize() const
{
size_t res = 0;
for (const auto & part : data)
res += part->byteSize();
return res;
}
size_t ColumnObject::Subcolumn::Subcolumn::allocatedBytes() const
{
size_t res = 0;
for (const auto & part : data)
res += part->allocatedBytes();
return res;
}
void ColumnObject::Subcolumn::checkTypes() const
{
DataTypes prefix_types;
prefix_types.reserve(data.size());
for (size_t i = 0; i < data.size(); ++i)
{
auto current_type = getDataTypeByColumn(*data[i]);
prefix_types.push_back(current_type);
auto prefix_common_type = getLeastSupertype(prefix_types);
if (!prefix_common_type->equals(*current_type))
throw Exception(ErrorCodes::LOGICAL_ERROR,
"Data type {} of column at position {} cannot represent all columns from i-th prefix",
current_type->getName(), i);
}
}
void ColumnObject::Subcolumn::insert(Field field)
{
auto info = getFieldInfo(field);
insert(std::move(field), std::move(info));
}
void ColumnObject::Subcolumn::addNewColumnPart(DataTypePtr type)
{
auto serialization = type->getSerialization(ISerialization::Kind::SPARSE);
data.push_back(type->createColumn(*serialization));
least_common_type = LeastCommonType{std::move(type)};
}
static bool isConversionRequiredBetweenIntegers(const IDataType & lhs, const IDataType & rhs)
{
/// If both types are signed or unsigned integers and the size of the left type
/// is not greater than the size of the right type, we don't need to convert the field,
/// because all integer fields are stored as Int64/UInt64 anyway.
WhichDataType which_lhs(lhs);
WhichDataType which_rhs(rhs);
bool is_native_int = which_lhs.isNativeInt() && which_rhs.isNativeInt();
bool is_native_uint = which_lhs.isNativeUInt() && which_rhs.isNativeUInt();
return (is_native_int || is_native_uint)
&& lhs.getSizeOfValueInMemory() <= rhs.getSizeOfValueInMemory();
}
void ColumnObject::Subcolumn::insert(Field field, FieldInfo info)
{
auto base_type = std::move(info.scalar_type);
if (isNothing(base_type) && info.num_dimensions == 0)
{
insertDefault();
return;
}
auto column_dim = least_common_type.getNumberOfDimensions();
auto value_dim = info.num_dimensions;
if (isNothing(least_common_type.get()))
column_dim = value_dim;
if (field.isNull())
value_dim = column_dim;
if (value_dim != column_dim)
throw Exception(ErrorCodes::NUMBER_OF_DIMENSIONS_MISMATHED,
"Dimension of types mismatched between inserted value and column. "
"Dimension of value: {}. Dimension of column: {}",
value_dim, column_dim);
if (is_nullable)
base_type = makeNullable(base_type);
if (!is_nullable && info.have_nulls)
field = applyVisitor(FieldVisitorReplaceNull(base_type->getDefault(), value_dim), std::move(field));
bool type_changed = false;
const auto & least_common_base_type = least_common_type.getBase();
if (data.empty())
{
addNewColumnPart(createArrayOfType(std::move(base_type), value_dim));
}
else if (!least_common_base_type->equals(*base_type) && !isNothing(base_type))
{
if (!isConversionRequiredBetweenIntegers(*base_type, *least_common_base_type))
{
base_type = getLeastSupertype(DataTypes{std::move(base_type), least_common_base_type}, true);
type_changed = true;
if (!least_common_base_type->equals(*base_type))
addNewColumnPart(createArrayOfType(std::move(base_type), value_dim));
}
}
if (type_changed || info.need_convert)
field = convertFieldToTypeOrThrow(field, *least_common_type.get());
data.back()->insert(field);
}
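/// Illustrative example (not part of the original source): inserting the value 1
/// creates a column part of type UInt8; a later insert of 300 widens the least
/// common type to UInt16, so a new UInt16 part is added and the field is
/// converted to it before insertion.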
void ColumnObject::Subcolumn::insertRangeFrom(const Subcolumn & src, size_t start, size_t length)
{
assert(src.isFinalized());
const auto & src_column = src.data.back();
const auto & src_type = src.least_common_type.get();
if (data.empty())
{
addNewColumnPart(src.least_common_type.get());
data.back()->insertRangeFrom(*src_column, start, length);
}
else if (least_common_type.get()->equals(*src_type))
{
data.back()->insertRangeFrom(*src_column, start, length);
}
else
{
auto new_least_common_type = getLeastSupertype(DataTypes{least_common_type.get(), src_type}, true);
auto casted_column = castColumn({src_column, src_type, ""}, new_least_common_type);
if (!least_common_type.get()->equals(*new_least_common_type))
addNewColumnPart(std::move(new_least_common_type));
data.back()->insertRangeFrom(*casted_column, start, length);
}
}
bool ColumnObject::Subcolumn::isFinalized() const
{
return data.empty() ||
(data.size() == 1 && !data[0]->isSparse() && num_of_defaults_in_prefix == 0);
}
void ColumnObject::Subcolumn::finalize()
{
if (isFinalized())
return;
if (data.size() == 1 && num_of_defaults_in_prefix == 0)
{
data[0] = data[0]->convertToFullColumnIfSparse();
return;
}
const auto & to_type = least_common_type.get();
auto result_column = to_type->createColumn();
if (num_of_defaults_in_prefix)
result_column->insertManyDefaults(num_of_defaults_in_prefix);
for (auto & part : data)
{
part = part->convertToFullColumnIfSparse();
auto from_type = getDataTypeByColumn(*part);
size_t part_size = part->size();
if (!from_type->equals(*to_type))
{
auto offsets = ColumnUInt64::create();
auto & offsets_data = offsets->getData();
/// We need to convert only non-default values and then recreate the column
/// with default values of the new type, because default values (which represent
/// missing data) may be inconsistent between types (e.g. 0 in UInt64 and an empty string in String).
part->getIndicesOfNonDefaultRows(offsets_data, 0, part_size);
if (offsets->size() == part_size)
{
part = castColumn({part, from_type, ""}, to_type);
}
else
{
auto values = part->index(*offsets, offsets->size());
values = castColumn({values, from_type, ""}, to_type);
part = values->createWithOffsets(offsets_data, to_type->getDefault(), part_size, /*shift=*/ 0);
}
}
result_column->insertRangeFrom(*part, 0, part_size);
}
data = { std::move(result_column) };
num_of_defaults_in_prefix = 0;
}
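/// Illustrative example (not part of the original source): with two defaults in
/// the prefix, a UInt8 part [1, 2] and a UInt16 part [300], finalize() casts the
/// UInt8 part to the least common type UInt16 and produces the single full
/// column [0, 0, 1, 2, 300].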
void ColumnObject::Subcolumn::insertDefault()
{
if (data.empty())
++num_of_defaults_in_prefix;
else
data.back()->insertDefault();
}
void ColumnObject::Subcolumn::insertManyDefaults(size_t length)
{
if (data.empty())
num_of_defaults_in_prefix += length;
else
data.back()->insertManyDefaults(length);
}
void ColumnObject::Subcolumn::popBack(size_t n)
{
assert(n <= size());
size_t num_removed = 0;
for (auto it = data.rbegin(); it != data.rend(); ++it)
{
if (n == 0)
break;
auto & column = *it;
if (n < column->size())
{
column->popBack(n);
n = 0;
}
else
{
++num_removed;
n -= column->size();
}
}
data.resize(data.size() - num_removed);
num_of_defaults_in_prefix -= n;
}
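/// Illustrative example (not part of the original source): with four defaults in
/// the prefix and parts of sizes [3, 2], popBack(6) drops both parts (5 rows) and
/// takes the remaining row from the defaults prefix, leaving three defaults and
/// no parts.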
Field ColumnObject::Subcolumn::getLastField() const
{
if (data.empty())
return Field();
const auto & last_part = data.back();
assert(!last_part->empty());
return (*last_part)[last_part->size() - 1];
}
ColumnObject::Subcolumn ColumnObject::Subcolumn::recreateWithDefaultValues(const FieldInfo & field_info) const
{
auto scalar_type = field_info.scalar_type;
if (is_nullable)
scalar_type = makeNullable(scalar_type);
Subcolumn new_subcolumn;
new_subcolumn.least_common_type = LeastCommonType{createArrayOfType(scalar_type, field_info.num_dimensions)};
new_subcolumn.is_nullable = is_nullable;
new_subcolumn.num_of_defaults_in_prefix = num_of_defaults_in_prefix;
new_subcolumn.data.reserve(data.size());
for (const auto & part : data)
new_subcolumn.data.push_back(recreateColumnWithDefaultValues(
part, scalar_type, field_info.num_dimensions));
return new_subcolumn;
}
IColumn & ColumnObject::Subcolumn::getFinalizedColumn()
{
assert(isFinalized());
return *data[0];
}
const IColumn & ColumnObject::Subcolumn::getFinalizedColumn() const
{
assert(isFinalized());
return *data[0];
}
const ColumnPtr & ColumnObject::Subcolumn::getFinalizedColumnPtr() const
{
assert(isFinalized());
return data[0];
}
ColumnObject::Subcolumn::LeastCommonType::LeastCommonType(DataTypePtr type_)
: type(std::move(type_))
, base_type(getBaseTypeOfArray(type))
, num_dimensions(DB::getNumberOfDimensions(*type))
{
}
ColumnObject::ColumnObject(bool is_nullable_)
: is_nullable(is_nullable_)
, num_rows(0)
{
}
ColumnObject::ColumnObject(SubcolumnsTree && subcolumns_, bool is_nullable_)
: is_nullable(is_nullable_)
, subcolumns(std::move(subcolumns_))
, num_rows(subcolumns.empty() ? 0 : (*subcolumns.begin())->data.size())
{
checkConsistency();
}
void ColumnObject::checkConsistency() const
{
if (subcolumns.empty())
return;
for (const auto & leaf : subcolumns)
{
if (num_rows != leaf->data.size())
{
throw Exception(ErrorCodes::LOGICAL_ERROR, "Sizes of subcolumns are inconsistent in ColumnObject."
" Subcolumn '{}' has {} rows, but expected size is {}",
leaf->path.getPath(), leaf->data.size(), num_rows);
}
}
}
size_t ColumnObject::size() const
{
#ifndef NDEBUG
checkConsistency();
#endif
return num_rows;
}
MutableColumnPtr ColumnObject::cloneResized(size_t new_size) const
{
/// cloneResized with new_size == 0 is used for cloneEmpty().
if (new_size != 0)
throw Exception(ErrorCodes::NOT_IMPLEMENTED,
"ColumnObject doesn't support resize to non-zero length");
return ColumnObject::create(is_nullable);
}
size_t ColumnObject::byteSize() const
{
size_t res = 0;
for (const auto & entry : subcolumns)
res += entry->data.byteSize();
return res;
}
size_t ColumnObject::allocatedBytes() const
{
size_t res = 0;
for (const auto & entry : subcolumns)
res += entry->data.allocatedBytes();
return res;
}
void ColumnObject::forEachSubcolumn(ColumnCallback callback)
{
if (!isFinalized())
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot iterate over non-finalized ColumnObject");
for (auto & entry : subcolumns)
callback(entry->data.data.back());
}
void ColumnObject::insert(const Field & field)
{
const auto & object = field.get<const Object &>();
HashSet<StringRef, StringRefHash> inserted;
size_t old_size = size();
for (const auto & [key_str, value] : object)
{
PathInData key(key_str);
inserted.insert(key_str);
if (!hasSubcolumn(key))
addSubcolumn(key, old_size);
auto & subcolumn = getSubcolumn(key);
subcolumn.insert(value);
}
for (auto & entry : subcolumns)
if (!inserted.has(entry->path.getPath()))
entry->data.insertDefault();
++num_rows;
}
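/// Illustrative example (not part of the original source): inserting {"a": 1}
/// and then {"b": "x"} leaves subcolumn "a" as [1, <default>], while "b" is
/// created with one default in its prefix (old_size == 1) before "x" is
/// inserted, so both subcolumns end up with two rows.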
void ColumnObject::insertDefault()
{
for (auto & entry : subcolumns)
entry->data.insertDefault();
++num_rows;
}
Field ColumnObject::operator[](size_t n) const
{
if (!isFinalized())
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot get Field from non-finalized ColumnObject");
Object object;
for (const auto & entry : subcolumns)
object[entry->path.getPath()] = (*entry->data.data.back())[n];
return object;
}
void ColumnObject::get(size_t n, Field & res) const
{
if (!isFinalized())
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot get Field from non-finalized ColumnObject");
auto & object = res.get<Object &>();
for (const auto & entry : subcolumns)
{
auto it = object.try_emplace(entry->path.getPath()).first;
entry->data.data.back()->get(n, it->second);
}
}
void ColumnObject::insertRangeFrom(const IColumn & src, size_t start, size_t length)
{
const auto & src_object = assert_cast<const ColumnObject &>(src);
for (auto & entry : subcolumns)
{
if (src_object.hasSubcolumn(entry->path))
entry->data.insertRangeFrom(src_object.getSubcolumn(entry->path), start, length);
else
entry->data.insertManyDefaults(length);
}
num_rows += length;
finalize();
}
ColumnPtr ColumnObject::replicate(const Offsets & offsets) const
{
if (!isFinalized())
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot replicate non-finalized ColumnObject");
auto res_column = ColumnObject::create(is_nullable);
for (const auto & entry : subcolumns)
{
auto replicated_data = entry->data.data.back()->replicate(offsets)->assumeMutable();
res_column->addSubcolumn(entry->path, std::move(replicated_data));
}
return res_column;
}
void ColumnObject::popBack(size_t length)
{
for (auto & entry : subcolumns)
entry->data.popBack(length);
num_rows -= length;
}
const ColumnObject::Subcolumn & ColumnObject::getSubcolumn(const PathInData & key) const
{
if (const auto * node = subcolumns.findLeaf(key))
return node->data;
throw Exception(ErrorCodes::ILLEGAL_COLUMN, "There is no subcolumn {} in ColumnObject", key.getPath());
}
ColumnObject::Subcolumn & ColumnObject::getSubcolumn(const PathInData & key)
{
if (const auto * node = subcolumns.findLeaf(key))
return const_cast<SubcolumnsTree::Node *>(node)->data;
throw Exception(ErrorCodes::ILLEGAL_COLUMN, "There is no subcolumn {} in ColumnObject", key.getPath());
}
bool ColumnObject::hasSubcolumn(const PathInData & key) const
{
return subcolumns.findLeaf(key) != nullptr;
}
void ColumnObject::addSubcolumn(const PathInData & key, MutableColumnPtr && subcolumn)
{
size_t new_size = subcolumn->size();
bool inserted = subcolumns.add(key, Subcolumn(std::move(subcolumn), is_nullable));
if (!inserted)
throw Exception(ErrorCodes::DUPLICATE_COLUMN, "Subcolumn '{}' already exists", key.getPath());
if (num_rows == 0)
num_rows = new_size;
else if (new_size != num_rows)
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH,
"Size of subcolumn {} ({}) is inconsistent with column size ({})",
key.getPath(), new_size, num_rows);
}
void ColumnObject::addSubcolumn(const PathInData & key, size_t new_size)
{
bool inserted = subcolumns.add(key, Subcolumn(new_size, is_nullable));
if (!inserted)
throw Exception(ErrorCodes::DUPLICATE_COLUMN, "Subcolumn '{}' already exists", key.getPath());
if (num_rows == 0)
num_rows = new_size;
else if (new_size != num_rows)
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH,
"Required size of subcolumn {} ({}) is inconsistent with column size ({})",
key.getPath(), new_size, num_rows);
}
void ColumnObject::addNestedSubcolumn(const PathInData & key, const FieldInfo & field_info, size_t new_size)
{
if (!key.hasNested())
throw Exception(ErrorCodes::LOGICAL_ERROR,
"Cannot add Nested subcolumn, because path doesn't contain Nested");
bool inserted = false;
/// We find node that represents the same Nested type as @key.
const auto * nested_node = subcolumns.findBestMatch(key);
if (nested_node)
{
/// Find any leaf of Nested subcolumn.
const auto * leaf = subcolumns.findLeaf(nested_node, [&](const auto &) { return true; });
assert(leaf);
/// Recreate subcolumn with default values and the same sizes of arrays.
auto new_subcolumn = leaf->data.recreateWithDefaultValues(field_info);
/// It's possible that we have already inserted a value from the current row
/// into this subcolumn, so adjust the size to the expected one.
if (new_subcolumn.size() > new_size)
new_subcolumn.popBack(new_subcolumn.size() - new_size);
assert(new_subcolumn.size() == new_size);
inserted = subcolumns.add(key, new_subcolumn);
}
else
{
/// If node was not found just add subcolumn with empty arrays.
inserted = subcolumns.add(key, Subcolumn(new_size, is_nullable));
}
if (!inserted)
throw Exception(ErrorCodes::DUPLICATE_COLUMN, "Subcolumn '{}' already exists", key.getPath());
if (num_rows == 0)
num_rows = new_size;
}
PathsInData ColumnObject::getKeys() const
{
PathsInData keys;
keys.reserve(subcolumns.size());
for (const auto & entry : subcolumns)
keys.emplace_back(entry->path);
return keys;
}
bool ColumnObject::isFinalized() const
{
return std::all_of(subcolumns.begin(), subcolumns.end(),
[](const auto & entry) { return entry->data.isFinalized(); });
}
void ColumnObject::finalize()
{
size_t old_size = size();
SubcolumnsTree new_subcolumns;
for (auto && entry : subcolumns)
{
const auto & least_common_type = entry->data.getLeastCommonType();
/// Do not add subcolumns that consist only of NULLs.
if (isNothing(getBaseTypeOfArray(least_common_type)))
continue;
entry->data.finalize();
new_subcolumns.add(entry->path, entry->data);
}
/// If all subcolumns were skipped, add a dummy subcolumn,
/// because the Tuple type must have at least one element.
if (new_subcolumns.empty())
new_subcolumns.add(PathInData{COLUMN_NAME_DUMMY}, Subcolumn{ColumnUInt8::create(old_size, 0), is_nullable});
std::swap(subcolumns, new_subcolumns);
checkObjectHasNoAmbiguosPaths(getKeys());
}
}

src/Columns/ColumnObject.h (new file, 237 lines)

@ -0,0 +1,237 @@
#pragma once
#include <Core/Field.h>
#include <Core/Names.h>
#include <Columns/IColumn.h>
#include <Common/PODArray.h>
#include <Common/HashTable/HashMap.h>
#include <DataTypes/Serializations/JSONDataParser.h>
#include <DataTypes/Serializations/SubcolumnsTree.h>
#include <DataTypes/IDataType.h>
namespace DB
{
namespace ErrorCodes
{
extern const int LOGICAL_ERROR;
}
/// Info that represents a scalar or array field in a decomposed view.
/// It allows recreating the field with a different number of dimensions
/// or different nullability.
struct FieldInfo
{
/// The common type of all scalars in the field.
DataTypePtr scalar_type;
/// Whether the field contains any NULL scalar.
bool have_nulls;
/// If true then the array contains scalars with different types and
/// we need to convert them to the common type.
bool need_convert;
/// Number of dimensions in the array; 0 if the field is a scalar.
size_t num_dimensions;
};
FieldInfo getFieldInfo(const Field & field);
/** A column that represents an object with a dynamic set of subcolumns.
* Subcolumns are identified by paths in the document and are stored in
* a trie-like structure. ColumnObject is not suitable for writing into tables
* and should be converted to a Tuple with a fixed set of subcolumns before that.
*/
class ColumnObject final : public COWHelper<IColumn, ColumnObject>
{
public:
/** Class that represents one subcolumn.
* It stores values in several column parts
* and keeps the current common type of all parts.
* We add a new column part with a new type when we insert a field
* that can't be converted to the current common type.
* After insertion of all values the subcolumn should be finalized
* for writing and other operations.
*/
class Subcolumn
{
public:
Subcolumn() = default;
Subcolumn(size_t size_, bool is_nullable_);
Subcolumn(MutableColumnPtr && data_, bool is_nullable_);
size_t size() const;
size_t byteSize() const;
size_t allocatedBytes() const;
bool isFinalized() const;
const DataTypePtr & getLeastCommonType() const { return least_common_type.get(); }
/// Checks the consistency of column's parts stored in @data.
void checkTypes() const;
/// Inserts a field whose scalars can be arbitrary, but the number of
/// dimensions must be consistent with the current common type.
void insert(Field field);
void insert(Field field, FieldInfo info);
void insertDefault();
void insertManyDefaults(size_t length);
void insertRangeFrom(const Subcolumn & src, size_t start, size_t length);
void popBack(size_t n);
/// Converts all column's parts to the common type and
/// creates a single column that stores all values.
void finalize();
/// Returns last inserted field.
Field getLastField() const;
/// Recreates subcolumn with default scalar values and keeps sizes of arrays.
/// Used to create columns of type Nested with consistent array sizes.
Subcolumn recreateWithDefaultValues(const FieldInfo & field_info) const;
/// Returns the single column if the subcolumn is finalized.
/// Otherwise the behaviour is undefined.
IColumn & getFinalizedColumn();
const IColumn & getFinalizedColumn() const;
const ColumnPtr & getFinalizedColumnPtr() const;
friend class ColumnObject;
private:
class LeastCommonType
{
public:
LeastCommonType() = default;
explicit LeastCommonType(DataTypePtr type_);
const DataTypePtr & get() const { return type; }
const DataTypePtr & getBase() const { return base_type; }
size_t getNumberOfDimensions() const { return num_dimensions; }
private:
DataTypePtr type;
DataTypePtr base_type;
size_t num_dimensions = 0;
};
void addNewColumnPart(DataTypePtr type);
/// Current least common type of all values inserted into this subcolumn.
LeastCommonType least_common_type;
/// If true then the common type of the subcolumn is Nullable
/// and default values are NULLs.
bool is_nullable = false;
/// Parts of the column. Parts should be in increasing order in terms of subtypes/supertypes,
/// i.e. the least common type for the i-th prefix is the type of the i-th part
/// and it's the supertype for all parts from 0 to i-1.
std::vector<WrappedPtr> data;
/// Until we insert any non-default field we don't know the final least
/// common type, so we just count the number of defaults in the prefix;
/// they will be materialized as default values of the final common type.
size_t num_of_defaults_in_prefix = 0;
};
using SubcolumnsTree = SubcolumnsTree<Subcolumn>;
private:
/// If true then all subcolumns are nullable.
const bool is_nullable;
SubcolumnsTree subcolumns;
size_t num_rows;
public:
static constexpr auto COLUMN_NAME_DUMMY = "_dummy";
explicit ColumnObject(bool is_nullable_);
ColumnObject(SubcolumnsTree && subcolumns_, bool is_nullable_);
/// Checks that all subcolumns have consistent sizes.
void checkConsistency() const;
bool hasSubcolumn(const PathInData & key) const;
const Subcolumn & getSubcolumn(const PathInData & key) const;
Subcolumn & getSubcolumn(const PathInData & key);
void incrementNumRows() { ++num_rows; }
/// Adds a subcolumn from existing IColumn.
void addSubcolumn(const PathInData & key, MutableColumnPtr && subcolumn);
/// Adds a subcolumn of specific size with default values.
void addSubcolumn(const PathInData & key, size_t new_size);
/// Adds a subcolumn of type Nested of specific size with default values.
/// It cares about consistency of sizes of Nested arrays.
void addNestedSubcolumn(const PathInData & key, const FieldInfo & field_info, size_t new_size);
const SubcolumnsTree & getSubcolumns() const { return subcolumns; }
SubcolumnsTree & getSubcolumns() { return subcolumns; }
PathsInData getKeys() const;
/// Finalizes all subcolumns.
void finalize();
bool isFinalized() const;
/// Part of interface
const char * getFamilyName() const override { return "Object"; }
TypeIndex getDataType() const override { return TypeIndex::Object; }
size_t size() const override;
MutableColumnPtr cloneResized(size_t new_size) const override;
size_t byteSize() const override;
size_t allocatedBytes() const override;
void forEachSubcolumn(ColumnCallback callback) override;
void insert(const Field & field) override;
void insertDefault() override;
void insertRangeFrom(const IColumn & src, size_t start, size_t length) override;
ColumnPtr replicate(const Offsets & offsets) const override;
void popBack(size_t length) override;
Field operator[](size_t n) const override;
void get(size_t n, Field & res) const override;
/// All other methods throw exception.
ColumnPtr decompress() const override { throwMustBeConcrete(); }
StringRef getDataAt(size_t) const override { throwMustBeConcrete(); }
bool isDefaultAt(size_t) const override { throwMustBeConcrete(); }
void insertData(const char *, size_t) override { throwMustBeConcrete(); }
StringRef serializeValueIntoArena(size_t, Arena &, char const *&) const override { throwMustBeConcrete(); }
const char * deserializeAndInsertFromArena(const char *) override { throwMustBeConcrete(); }
const char * skipSerializedInArena(const char *) const override { throwMustBeConcrete(); }
void updateHashWithValue(size_t, SipHash &) const override { throwMustBeConcrete(); }
void updateWeakHash32(WeakHash32 &) const override { throwMustBeConcrete(); }
void updateHashFast(SipHash &) const override { throwMustBeConcrete(); }
ColumnPtr filter(const Filter &, ssize_t) const override { throwMustBeConcrete(); }
void expand(const Filter &, bool) override { throwMustBeConcrete(); }
ColumnPtr permute(const Permutation &, size_t) const override { throwMustBeConcrete(); }
ColumnPtr index(const IColumn &, size_t) const override { throwMustBeConcrete(); }
int compareAt(size_t, size_t, const IColumn &, int) const override { throwMustBeConcrete(); }
void compareColumn(const IColumn &, size_t, PaddedPODArray<UInt64> *, PaddedPODArray<Int8> &, int, int) const override { throwMustBeConcrete(); }
bool hasEqualValues() const override { throwMustBeConcrete(); }
void getPermutation(PermutationSortDirection, PermutationSortStability, size_t, int, Permutation &) const override { throwMustBeConcrete(); }
void updatePermutation(PermutationSortDirection, PermutationSortStability, size_t, int, Permutation &, EqualRanges &) const override { throwMustBeConcrete(); }
MutableColumns scatter(ColumnIndex, const Selector &) const override { throwMustBeConcrete(); }
void gather(ColumnGathererStream &) override { throwMustBeConcrete(); }
void getExtremes(Field &, Field &) const override { throwMustBeConcrete(); }
size_t byteSizeAt(size_t) const override { throwMustBeConcrete(); }
double getRatioOfDefaultRows(double) const override { throwMustBeConcrete(); }
void getIndicesOfNonDefaultRows(Offsets &, size_t, size_t) const override { throwMustBeConcrete(); }
private:
[[noreturn]] static void throwMustBeConcrete()
{
throw Exception("ColumnObject must be converted to ColumnTuple before use", ErrorCodes::LOGICAL_ERROR);
}
};
}


@ -288,7 +288,7 @@ void ColumnSparse::popBack(size_t n)
ColumnPtr ColumnSparse::filter(const Filter & filt, ssize_t) const
{
if (_size != filt.size())
throw Exception("Size of filter doesn't match size of column.", ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH);
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH, "Size of filter ({}) doesn't match size of column ({})", filt.size(), _size);
if (offsets->empty())
{


@ -381,7 +381,7 @@ ColumnPtr ColumnVector<T>::filter(const IColumn::Filter & filt, ssize_t result_s
{
size_t size = data.size();
if (size != filt.size())
throw Exception("Size of filter doesn't match size of column.", ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH);
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH, "Size of filter ({}) doesn't match size of column ({})", filt.size(), size);
auto res = this->create();
Container & res_data = res->getData();
@ -450,7 +450,7 @@ void ColumnVector<T>::applyZeroMap(const IColumn::Filter & filt, bool inverted)
{
size_t size = data.size();
if (size != filt.size())
throw Exception("Size of filter doesn't match size of column.", ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH);
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH, "Size of filter ({}) doesn't match size of column ({})", filt.size(), size);
const UInt8 * filt_pos = filt.data();
const UInt8 * filt_end = filt_pos + size;


@ -192,7 +192,7 @@ namespace
{
const size_t size = src_offsets.size();
if (size != filt.size())
throw Exception("Size of filter doesn't match size of column.", ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH);
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH, "Size of filter ({}) doesn't match size of column ({})", filt.size(), size);
ResultOffsetsBuilder result_offsets_builder(res_offsets);


@ -883,8 +883,8 @@ public:
return toDayNum(years_lut[year - DATE_LUT_MIN_YEAR]);
}
template <typename Date,
typename = std::enable_if_t<std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>>>
template <typename Date>
requires std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>
inline auto toStartOfQuarterInterval(Date d, UInt64 quarters) const
{
if (quarters == 1)
@ -892,8 +892,8 @@ public:
return toStartOfMonthInterval(d, quarters * 3);
}
template <typename Date,
typename = std::enable_if_t<std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>>>
template <typename Date>
requires std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>
inline auto toStartOfMonthInterval(Date d, UInt64 months) const
{
if (months == 1)
@ -906,8 +906,8 @@ public:
return toDayNum(years_months_lut[month_total_index / months * months]);
}
template <typename Date,
typename = std::enable_if_t<std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>>>
template <typename Date>
requires std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>
inline auto toStartOfWeekInterval(Date d, UInt64 weeks) const
{
if (weeks == 1)
@ -920,8 +920,8 @@ public:
return ExtendedDayNum(4 + (d - 4) / days * days);
}
template <typename Date,
typename = std::enable_if_t<std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>>>
template <typename Date>
requires std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>
inline Time toStartOfDayInterval(Date d, UInt64 days) const
{
if (days == 1)
@ -1219,10 +1219,8 @@ public:
/// If the resulting month has fewer days than the source month, then saturation can happen.
/// Example: 31 Aug + 1 month = 30 Sep.
template <
typename DateTime,
typename
= std::enable_if_t<std::is_same_v<DateTime, UInt32> || std::is_same_v<DateTime, Int64> || std::is_same_v<DateTime, time_t>>>
template <typename DateTime>
requires std::is_same_v<DateTime, UInt32> || std::is_same_v<DateTime, Int64> || std::is_same_v<DateTime, time_t>
inline Time NO_SANITIZE_UNDEFINED addMonths(DateTime t, Int64 delta) const
{
const auto result_day = addMonthsIndex(t, delta);
@ -1247,8 +1245,8 @@ public:
return res;
}
template <typename Date,
typename = std::enable_if_t<std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>>>
template <typename Date>
requires std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>
inline auto NO_SANITIZE_UNDEFINED addMonths(Date d, Int64 delta) const
{
if constexpr (std::is_same_v<Date, DayNum>)
@ -1280,10 +1278,8 @@ public:
}
/// Saturation can occur if 29 Feb is mapped to non-leap year.
template <
typename DateTime,
typename
= std::enable_if_t<std::is_same_v<DateTime, UInt32> || std::is_same_v<DateTime, Int64> || std::is_same_v<DateTime, time_t>>>
template <typename DateTime>
requires std::is_same_v<DateTime, UInt32> || std::is_same_v<DateTime, Int64> || std::is_same_v<DateTime, time_t>
inline Time addYears(DateTime t, Int64 delta) const
{
auto result_day = addYearsIndex(t, delta);
@ -1308,8 +1304,8 @@ public:
return res;
}
template <typename Date,
typename = std::enable_if_t<std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>>>
template <typename Date>
requires std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>
inline auto addYears(Date d, Int64 delta) const
{
if constexpr (std::is_same_v<Date, DayNum>)


@ -613,6 +613,7 @@
M(642, CANNOT_PACK_ARCHIVE) \
M(643, CANNOT_UNPACK_ARCHIVE) \
M(644, REMOTE_FS_OBJECT_CACHE_ERROR) \
M(645, NUMBER_OF_DIMENSIONS_MISMATHED) \
\
M(999, KEEPER_EXCEPTION) \
M(1000, POCO_EXCEPTION) \

View File

@ -205,7 +205,8 @@ void rethrowFirstException(const Exceptions & exceptions);
template <typename T>
std::enable_if_t<std::is_pointer_v<T>, T> exception_cast(std::exception_ptr e)
requires std::is_pointer_v<T>
T exception_cast(std::exception_ptr e)
{
try
{

View File

@ -46,6 +46,11 @@ public:
throw Exception("Cannot convert Map to " + demangle(typeid(T).name()), ErrorCodes::CANNOT_CONVERT_TYPE);
}
T operator() (const Object &) const
{
throw Exception("Cannot convert Object to " + demangle(typeid(T).name()), ErrorCodes::CANNOT_CONVERT_TYPE);
}
T operator() (const UInt64 & x) const { return T(x); }
T operator() (const Int64 & x) const { return T(x); }
T operator() (const Int128 & x) const { return T(x); }
@ -113,7 +118,8 @@ public:
throw Exception("Cannot convert AggregateFunctionStateData to " + demangle(typeid(T).name()), ErrorCodes::CANNOT_CONVERT_TYPE);
}
template <typename U, typename = std::enable_if_t<is_big_int_v<U>> >
template <typename U>
requires is_big_int_v<U>
T operator() (const U & x) const
{
if constexpr (is_decimal<T>)

View File

@ -95,6 +95,23 @@ String FieldVisitorDump::operator() (const Map & x) const
return wb.str();
}
String FieldVisitorDump::operator() (const Object & x) const
{
WriteBufferFromOwnString wb;
wb << "Object_(";
for (auto it = x.begin(); it != x.end(); ++it)
{
if (it != x.begin())
wb << ", ";
wb << "(" << it->first << ", " << applyVisitor(*this, it->second) << ")";
}
wb << ')';
return wb.str();
}
String FieldVisitorDump::operator() (const AggregateFunctionStateData & x) const
{
WriteBufferFromOwnString wb;

View File

@ -22,6 +22,7 @@ public:
String operator() (const Array & x) const;
String operator() (const Tuple & x) const;
String operator() (const Map & x) const;
String operator() (const Object & x) const;
String operator() (const DecimalField<Decimal32> & x) const;
String operator() (const DecimalField<Decimal64> & x) const;
String operator() (const DecimalField<Decimal128> & x) const;

View File

@ -94,6 +94,19 @@ void FieldVisitorHash::operator() (const Array & x) const
applyVisitor(*this, elem);
}
void FieldVisitorHash::operator() (const Object & x) const
{
UInt8 type = Field::Types::Object;
hash.update(type);
hash.update(x.size());
for (const auto & [key, value]: x)
{
hash.update(key);
applyVisitor(*this, value);
}
}
void FieldVisitorHash::operator() (const DecimalField<Decimal32> & x) const
{
UInt8 type = Field::Types::Decimal32;

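The new Object overload follows the convention of the other visitors in this file: feed the type tag, then the element count, then each key and each value into the hash. A standalone sketch of that layout (the Hasher below is a hypothetical stand-in for the SipHash state used here):

#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

struct Hasher  // hypothetical stand-in; only the update() interface matters
{
    void update(const void * data, size_t size) { (void)data; (void)size; /* mix bytes */ }
    template <typename T> void update(const T & x) { update(&x, sizeof(x)); }
    void update(const std::string & s) { update(s.data(), s.size()); }
};

void hashObject(Hasher & hash, const std::map<std::string, uint64_t> & object)
{
    const uint8_t type = 29;     // Field::Types::Object, as introduced in this commit
    hash.update(type);           // 1. type tag
    hash.update(object.size());  // 2. number of key-value pairs
    for (const auto & [key, value] : object)
    {
        hash.update(key);        // 3. the key
        hash.update(value);      // 4. the value (the real code recurses via applyVisitor)
    }
}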
View File

@ -28,6 +28,7 @@ public:
void operator() (const Array & x) const;
void operator() (const Tuple & x) const;
void operator() (const Map & x) const;
void operator() (const Object & x) const;
void operator() (const DecimalField<Decimal32> & x) const;
void operator() (const DecimalField<Decimal64> & x) const;
void operator() (const DecimalField<Decimal128> & x) const;

View File

@ -26,6 +26,7 @@ bool FieldVisitorSum::operator() (String &) const { throw Exception("Cannot sum
bool FieldVisitorSum::operator() (Array &) const { throw Exception("Cannot sum Arrays", ErrorCodes::LOGICAL_ERROR); }
bool FieldVisitorSum::operator() (Tuple &) const { throw Exception("Cannot sum Tuples", ErrorCodes::LOGICAL_ERROR); }
bool FieldVisitorSum::operator() (Map &) const { throw Exception("Cannot sum Maps", ErrorCodes::LOGICAL_ERROR); }
bool FieldVisitorSum::operator() (Object &) const { throw Exception("Cannot sum Objects", ErrorCodes::LOGICAL_ERROR); }
bool FieldVisitorSum::operator() (UUID &) const { throw Exception("Cannot sum UUIDs", ErrorCodes::LOGICAL_ERROR); }
bool FieldVisitorSum::operator() (AggregateFunctionStateData &) const

View File

@ -25,6 +25,7 @@ public:
bool operator() (Array &) const;
bool operator() (Tuple &) const;
bool operator() (Map &) const;
bool operator() (Object &) const;
bool operator() (UUID &) const;
bool operator() (AggregateFunctionStateData &) const;
bool operator() (bool &) const;
@ -36,7 +37,8 @@ public:
return x.getValue() != T(0);
}
template <typename T, typename = std::enable_if_t<is_big_int_v<T>> >
template <typename T>
requires is_big_int_v<T>
bool operator() (T & x) const
{
x += rhs.reinterpret<T>();

View File

@ -126,5 +126,24 @@ String FieldVisitorToString::operator() (const Map & x) const
return wb.str();
}
String FieldVisitorToString::operator() (const Object & x) const
{
WriteBufferFromOwnString wb;
wb << '{';
for (auto it = x.begin(); it != x.end(); ++it)
{
if (it != x.begin())
wb << ", ";
writeDoubleQuoted(it->first, wb);
wb << ": " << applyVisitor(*this, it->second);
}
wb << '}';
return wb.str();
}
}

View File

@ -22,6 +22,7 @@ public:
String operator() (const Array & x) const;
String operator() (const Tuple & x) const;
String operator() (const Map & x) const;
String operator() (const Object & x) const;
String operator() (const DecimalField<Decimal32> & x) const;
String operator() (const DecimalField<Decimal64> & x) const;
String operator() (const DecimalField<Decimal128> & x) const;

View File

@ -66,6 +66,20 @@ void FieldVisitorWriteBinary::operator() (const Map & x, WriteBuffer & buf) cons
}
}
void FieldVisitorWriteBinary::operator() (const Object & x, WriteBuffer & buf) const
{
const size_t size = x.size();
writeBinary(size, buf);
for (const auto & [key, value] : x)
{
const UInt8 type = value.getType();
writeBinary(type, buf);
writeBinary(key, buf);
Field::dispatch([&buf] (const auto & val) { FieldVisitorWriteBinary()(val, buf); }, value);
}
}
void FieldVisitorWriteBinary::operator()(const bool & x, WriteBuffer & buf) const
{
writeBinary(UInt8(x), buf);

View File

@ -21,6 +21,7 @@ public:
void operator() (const Array & x, WriteBuffer & buf) const;
void operator() (const Tuple & x, WriteBuffer & buf) const;
void operator() (const Map & x, WriteBuffer & buf) const;
void operator() (const Object & x, WriteBuffer & buf) const;
void operator() (const DecimalField<Decimal32> & x, WriteBuffer & buf) const;
void operator() (const DecimalField<Decimal64> & x, WriteBuffer & buf) const;
void operator() (const DecimalField<Decimal128> & x, WriteBuffer & buf) const;

View File

@ -194,7 +194,7 @@ void FileSegment::write(const char * from, size_t size)
{
std::lock_guard segment_lock(mutex);
LOG_ERROR(log, "Failed to write to cache. File segment info: {}", getInfoForLog());
LOG_ERROR(log, "Failed to write to cache. File segment info: {}", getInfoForLogImpl(segment_lock));
download_state = State::PARTIALLY_DOWNLOADED_NO_CONTINUATION;
@ -405,7 +405,11 @@ void FileSegment::completeImpl(bool allow_non_strict_checking)
String FileSegment::getInfoForLog() const
{
std::lock_guard segment_lock(mutex);
return getInfoForLogImpl(segment_lock);
}
String FileSegment::getInfoForLogImpl(std::lock_guard<std::mutex> & segment_lock) const
{
WriteBufferFromOwnString info;
info << "File segment: " << range().toString() << ", ";
info << "state: " << download_state << ", ";

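The split into getInfoForLog/getInfoForLogImpl appears to fix a self-deadlock: the catch block in FileSegment::write already holds the mutex, and the old call re-entered the locking getter. Passing the held std::lock_guard by reference is the usual "lock witness" idiom; a minimal sketch under that assumption:

#include <mutex>
#include <string>

class Segment
{
public:
    std::string getInfo() const
    {
        std::lock_guard lock(mutex);
        return getInfoImpl(lock);
    }

    void onWriteError()
    {
        std::lock_guard lock(mutex);
        // Calling getInfo() here would relock the non-recursive mutex.
        // The Impl overload takes the guard by reference purely as
        // compile-time evidence that the caller already holds the lock.
        last_info = getInfoImpl(lock);
    }

private:
    std::string getInfoImpl(std::lock_guard<std::mutex> &) const { return state; }

    mutable std::mutex mutex;
    std::string state = "DOWNLOADED";
    std::string last_info;
};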
View File

@ -130,6 +130,7 @@ private:
static String getCallerIdImpl(bool allow_non_strict_checking = false);
void resetDownloaderImpl(std::lock_guard<std::mutex> & segment_lock);
size_t getDownloadedSize(std::lock_guard<std::mutex> & segment_lock) const;
String getInfoForLogImpl(std::lock_guard<std::mutex> & segment_lock) const;
const Range segment_range;

View File

@ -1,5 +1,11 @@
#pragma once
#include <base/StringRef.h>
#include <base/logger_useful.h>
#include <string_view>
#include <unordered_map>
#include <Common/Arena.h>
#include <Common/getResource.h>
#include <Common/HashTable/HashMap.h>
@ -10,11 +16,6 @@
#include <IO/readFloatText.h>
#include <IO/ZstdInflatingReadBuffer.h>
#include <base/StringRef.h>
#include <base/logger_useful.h>
#include <string_view>
#include <unordered_map>
namespace DB
{
@ -34,7 +35,6 @@ namespace ErrorCodes
class FrequencyHolder
{
public:
struct Language
{
@ -52,6 +52,7 @@ public:
public:
using Map = HashMap<StringRef, Float64>;
using Container = std::vector<Language>;
using EncodingMap = HashMap<UInt16, Float64>;
using EncodingContainer = std::vector<Encoding>;
@ -61,6 +62,30 @@ public:
return instance;
}
const Map & getEmotionalDict() const
{
return emotional_dict;
}
const EncodingContainer & getEncodingsFrequency() const
{
return encodings_freq;
}
const Container & getProgrammingFrequency() const
{
return programming_freq;
}
private:
FrequencyHolder()
{
loadEmotionalDict();
loadEncodingsFrequency();
loadProgrammingFrequency();
}
void loadEncodingsFrequency()
{
Poco::Logger * log = &Poco::Logger::get("EncodingsFrequency");
@ -119,7 +144,6 @@ public:
LOG_TRACE(log, "Charset frequencies was added, charsets count: {}", encodings_freq.size());
}
void loadEmotionalDict()
{
Poco::Logger * log = &Poco::Logger::get("EmotionalDict");
@ -158,7 +182,6 @@ public:
LOG_TRACE(log, "Emotional dictionary was added. Word count: {}", std::to_string(count));
}
void loadProgrammingFrequency()
{
Poco::Logger * log = &Poco::Logger::get("ProgrammingFrequency");
@ -211,42 +234,10 @@ public:
LOG_TRACE(log, "Programming languages frequencies was added");
}
const Map & getEmotionalDict()
{
std::lock_guard lock(mutex);
if (emotional_dict.empty())
loadEmotionalDict();
return emotional_dict;
}
const EncodingContainer & getEncodingsFrequency()
{
std::lock_guard lock(mutex);
if (encodings_freq.empty())
loadEncodingsFrequency();
return encodings_freq;
}
const Container & getProgrammingFrequency()
{
std::lock_guard lock(mutex);
if (programming_freq.empty())
loadProgrammingFrequency();
return programming_freq;
}
private:
Arena string_pool;
Map emotional_dict;
Container programming_freq;
EncodingContainer encodings_freq;
std::mutex mutex;
};
}

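This refactor turns FrequencyHolder from mutex-guarded lazy getters into a singleton that loads all three dictionaries once in its constructor, so the getters become trivial const accessors. A minimal sketch of the resulting shape, with placeholder data instead of the real resource loading:

#include <vector>

class Holder
{
public:
    static const Holder & instance()
    {
        static const Holder holder;  // construction is thread-safe since C++11
        return holder;
    }

    // No mutex and no emptiness check: everything was loaded up front.
    const std::vector<int> & data() const { return data_; }

private:
    Holder() : data_{1, 2, 3} {}  // the expensive load happens exactly once

    std::vector<int> data_;
};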
View File

@ -130,6 +130,7 @@ public:
IntervalTree() { nodes.resize(1); }
template <typename TValue = Value, std::enable_if_t<std::is_same_v<TValue, IntervalTreeVoidValue>, bool> = true>
requires std::is_same_v<Value, IntervalTreeVoidValue>
ALWAYS_INLINE bool emplace(Interval interval)
{
assert(!tree_is_built);

View File

@ -76,7 +76,8 @@ public:
void add(const char * value) { add(std::make_unique<JSONString>(value)); }
void add(bool value) { add(std::make_unique<JSONBool>(std::move(value))); }
template <typename T, std::enable_if_t<std::is_arithmetic_v<T>, bool> = true>
template <typename T>
requires std::is_arithmetic_v<T>
void add(T value) { add(std::make_unique<JSONNumber<T>>(value)); }
void format(const FormatSettings & settings, FormatContext & context) override;
@ -100,7 +101,8 @@ public:
void add(std::string key, std::string_view value) { add(std::move(key), std::make_unique<JSONString>(value)); }
void add(std::string key, bool value) { add(std::move(key), std::make_unique<JSONBool>(std::move(value))); }
template <typename T, std::enable_if_t<std::is_arithmetic_v<T>, bool> = true>
template <typename T>
requires std::is_arithmetic_v<T>
void add(std::string key, T value) { add(std::move(key), std::make_unique<JSONNumber<T>>(value)); }
void format(const FormatSettings & settings, FormatContext & context) override;

View File

@ -82,7 +82,8 @@ private:
#endif
public:
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
StringSearcher(const CharT * needle_, const size_t needle_size_)
: needle{reinterpret_cast<const uint8_t *>(needle_)}, needle_size{needle_size_}
{
@ -191,7 +192,8 @@ public:
#endif
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
ALWAYS_INLINE bool compareTrivial(const CharT * haystack_pos, const CharT * const haystack_end, const uint8_t * needle_pos) const
{
while (haystack_pos < haystack_end && needle_pos < needle_end)
@ -217,7 +219,8 @@ public:
return needle_pos == needle_end;
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
ALWAYS_INLINE bool compare(const CharT * /*haystack*/, const CharT * haystack_end, const CharT * pos) const
{
@ -262,7 +265,8 @@ public:
/** Returns haystack_end if not found.
*/
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
const CharT * search(const CharT * haystack, const CharT * const haystack_end) const
{
if (0 == needle_size)
@ -338,7 +342,8 @@ public:
return haystack_end;
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
const CharT * search(const CharT * haystack, const size_t haystack_size) const
{
return search(haystack, haystack + haystack_size);
@ -367,7 +372,8 @@ private:
#endif
public:
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
StringSearcher(const CharT * needle_, const size_t needle_size)
: needle{reinterpret_cast<const uint8_t *>(needle_)}, needle_end{needle + needle_size}
{
@ -399,7 +405,8 @@ public:
#endif
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
ALWAYS_INLINE bool compare(const CharT * /*haystack*/, const CharT * /*haystack_end*/, const CharT * pos) const
{
#ifdef __SSE4_1__
@ -453,7 +460,8 @@ public:
return false;
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
const CharT * search(const CharT * haystack, const CharT * const haystack_end) const
{
if (needle == needle_end)
@ -540,7 +548,8 @@ public:
return haystack_end;
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
const CharT * search(const CharT * haystack, const size_t haystack_size) const
{
return search(haystack, haystack + haystack_size);
@ -568,7 +577,8 @@ private:
#endif
public:
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
StringSearcher(const CharT * needle_, const size_t needle_size)
: needle{reinterpret_cast<const uint8_t *>(needle_)}, needle_end{needle + needle_size}
{
@ -596,7 +606,8 @@ public:
#endif
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
ALWAYS_INLINE bool compare(const CharT * /*haystack*/, const CharT * /*haystack_end*/, const CharT * pos) const
{
#ifdef __SSE4_1__
@ -642,7 +653,8 @@ public:
return false;
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
const CharT * search(const CharT * haystack, const CharT * const haystack_end) const
{
if (needle == needle_end)
@ -722,7 +734,8 @@ public:
return haystack_end;
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
const CharT * search(const CharT * haystack, const size_t haystack_size) const
{
return search(haystack, haystack + haystack_size);
@ -740,7 +753,8 @@ class TokenSearcher : public StringSearcherBase
size_t needle_size;
public:
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
TokenSearcher(const CharT * needle_, const size_t needle_size_)
: searcher{needle_, needle_size_},
needle_size(needle_size_)
@ -752,7 +766,8 @@ public:
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
ALWAYS_INLINE bool compare(const CharT * haystack, const CharT * haystack_end, const CharT * pos) const
{
// Use the searcher only if pos is at the beginning of a token and pos + searcher.needle_size is the end of a token.
@ -762,7 +777,8 @@ public:
return false;
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
const CharT * search(const CharT * haystack, const CharT * const haystack_end) const
{
// Use searcher.search(), then verify that the returned value is a token.
@ -781,13 +797,15 @@ public:
return haystack_end;
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
const CharT * search(const CharT * haystack, const size_t haystack_size) const
{
return search(haystack, haystack + haystack_size);
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
ALWAYS_INLINE bool isToken(const CharT * haystack, const CharT * const haystack_end, const CharT* p) const
{
return (p == haystack || isTokenSeparator(*(p - 1)))
@ -819,11 +837,13 @@ struct LibCASCIICaseSensitiveStringSearcher : public StringSearcherBase
{
const char * const needle;
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
LibCASCIICaseSensitiveStringSearcher(const CharT * const needle_, const size_t /* needle_size */)
: needle(reinterpret_cast<const char *>(needle_)) {}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
const CharT * search(const CharT * haystack, const CharT * const haystack_end) const
{
const auto * res = strstr(reinterpret_cast<const char *>(haystack), reinterpret_cast<const char *>(needle));
@ -832,7 +852,8 @@ struct LibCASCIICaseSensitiveStringSearcher : public StringSearcherBase
return reinterpret_cast<const CharT *>(res);
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
const CharT * search(const CharT * haystack, const size_t haystack_size) const
{
return search(haystack, haystack + haystack_size);
@ -843,11 +864,13 @@ struct LibCASCIICaseInsensitiveStringSearcher : public StringSearcherBase
{
const char * const needle;
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
LibCASCIICaseInsensitiveStringSearcher(const CharT * const needle_, const size_t /* needle_size */)
: needle(reinterpret_cast<const char *>(needle_)) {}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
const CharT * search(const CharT * haystack, const CharT * const haystack_end) const
{
const auto * res = strcasestr(reinterpret_cast<const char *>(haystack), reinterpret_cast<const char *>(needle));
@ -856,7 +879,8 @@ struct LibCASCIICaseInsensitiveStringSearcher : public StringSearcherBase
return reinterpret_cast<const CharT *>(res);
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
const CharT * search(const CharT * haystack, const size_t haystack_size) const
{
return search(haystack, haystack + haystack_size);

View File

@ -9,7 +9,6 @@
#include <filesystem>
#include <fstream>
#include <optional>
#include <sstream>
#include <unordered_set>
#include <fcntl.h>
@ -21,6 +20,8 @@
#include <sys/types.h>
#include <dirent.h>
#include <boost/algorithm/string/split.hpp>
#include <base/errnoToString.h>
@ -247,9 +248,9 @@ static_assert(sizeof(raw_events_info) / sizeof(raw_events_info[0]) == NUMBER_OF_
#undef CACHE_EVENT
// A map of event name -> event index, to parse event list in settings.
static std::unordered_map<std::string, size_t> populateEventMap()
static std::unordered_map<std::string_view, size_t> populateEventMap()
{
std::unordered_map<std::string, size_t> name_to_index;
std::unordered_map<std::string_view, size_t> name_to_index;
name_to_index.reserve(NUMBER_OF_RAW_EVENTS);
for (size_t i = 0; i < NUMBER_OF_RAW_EVENTS; ++i)
@ -455,10 +456,10 @@ std::vector<size_t> PerfEventsCounters::eventIndicesFromString(const std::string
return result;
}
std::vector<std::string> event_names;
boost::split(event_names, events_list, [](char c) { return c == ','; });
std::istringstream iss(events_list); // STYLE_CHECK_ALLOW_STD_STRING_STREAM
std::string event_name;
while (std::getline(iss, event_name, ','))
for (auto & event_name : event_names)
{
// Allow spaces at the beginning of the token, so that you can write 'a, b'.
event_name.erase(0, event_name.find_first_not_of(' '));

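The std::istringstream/getline tokenizer is replaced with boost::split and a character predicate, avoiding the stream round-trip (and the style-check exemption it needed). A self-contained sketch of the call as used above:

#include <boost/algorithm/string/split.hpp>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    std::string events_list = "cycles, instructions,cache-misses";
    std::vector<std::string> event_names;
    boost::split(event_names, events_list, [](char c) { return c == ','; });
    for (auto & name : event_names)
    {
        // Mirror the loop above: tolerate a space after the comma ('a, b').
        name.erase(0, name.find_first_not_of(' '));
        std::cout << name << '\n';
    }
}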
View File

@ -75,7 +75,8 @@ inline size_t countCodePoints(const UInt8 * data, size_t size)
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
size_t convertCodePointToUTF8(int code_point, CharT * out_bytes, size_t out_length)
{
static const Poco::UTF8Encoding utf8;
@ -84,7 +85,8 @@ size_t convertCodePointToUTF8(int code_point, CharT * out_bytes, size_t out_leng
return res;
}
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
template <typename CharT>
requires (sizeof(CharT) == 1)
std::optional<uint32_t> convertUTF8ToCodePoint(const CharT * in_bytes, size_t in_length)
{
static const Poco::UTF8Encoding utf8;

View File

@ -13,6 +13,9 @@
#cmakedefine01 USE_CASSANDRA
#cmakedefine01 USE_SENTRY
#cmakedefine01 USE_GRPC
#cmakedefine01 USE_SIMDJSON
#cmakedefine01 USE_RAPIDJSON
#cmakedefine01 USE_DATASKETCHES
#cmakedefine01 USE_YAML_CPP
#cmakedefine01 CLICKHOUSE_SPLIT_BINARY

View File

@ -127,6 +127,7 @@ PoolWithFailover::Entry PoolWithFailover::get()
/// If we cannot connect to some replica due to pool overflow, then we will wait and connect.
PoolPtr * full_pool = nullptr;
std::map<std::string, std::tuple<std::string, int>> error_detail;
for (size_t try_no = 0; try_no < max_tries; ++try_no)
{
@ -160,6 +161,15 @@ PoolWithFailover::Entry PoolWithFailover::get()
}
app.logger().warning("Connection to " + pool->getDescription() + " failed: " + e.displayText());
// Save all errors to error_detail.
if (error_detail.contains(pool->getDescription()))
{
error_detail[pool->getDescription()] = {e.displayText(), e.code()};
}
else
{
error_detail.insert({pool->getDescription(), {e.displayText(), e.code()}});
}
continue;
}
@ -180,7 +190,14 @@ PoolWithFailover::Entry PoolWithFailover::get()
message << "Connections to all replicas failed: ";
for (auto it = replicas_by_priority.begin(); it != replicas_by_priority.end(); ++it)
for (auto jt = it->second.begin(); jt != it->second.end(); ++jt)
{
message << (it == replicas_by_priority.begin() && jt == it->second.begin() ? "" : ", ") << (*jt)->getDescription();
if (error_detail.contains((*jt)->getDescription()))
{
std::tuple<std::string, int> error_and_code = error_detail[(*jt)->getDescription()];
message << ", ERROR " << std::get<1>(error_and_code) << " : " << std::get<0>(error_and_code);
}
}
throw Poco::Exception(message.str());
}

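One note on the error bookkeeping added above: the contains()/insert() branch is equivalent to a plain operator[] assignment, since std::map::operator[] default-constructs a missing entry before assignment. A minimal demonstration using only the standard library:

#include <map>
#include <string>
#include <tuple>

int main()
{
    std::map<std::string, std::tuple<std::string, int>> error_detail;
    // Inserts when the key is absent, overwrites when it is present --
    // the same effect as the two branches in the hunk above.
    error_detail["replica-1:3306"] = {"Connection refused", 2002};
    error_detail["replica-1:3306"] = {"Too many connections", 1040};
    return static_cast<int>(error_detail.size());  // 1
}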
View File

@ -25,7 +25,8 @@ namespace DB
* Otherwise, behaves like a dynamic_cast.
*/
template <typename To, typename From>
std::enable_if_t<std::is_reference_v<To>, To> typeid_cast(From & from)
requires std::is_reference_v<To>
To typeid_cast(From & from)
{
try
{
@ -43,7 +44,8 @@ std::enable_if_t<std::is_reference_v<To>, To> typeid_cast(From & from)
template <typename To, typename From>
std::enable_if_t<std::is_pointer_v<To>, To> typeid_cast(From * from)
requires std::is_pointer_v<To>
To typeid_cast(From * from)
{
try
{
@ -60,7 +62,8 @@ std::enable_if_t<std::is_pointer_v<To>, To> typeid_cast(From * from)
template <typename To, typename From>
std::enable_if_t<is_shared_ptr_v<To>, To> typeid_cast(const std::shared_ptr<From> & from)
requires is_shared_ptr_v<To>
To typeid_cast(const std::shared_ptr<From> & from)
{
try
{

View File

@ -37,7 +37,7 @@ void CoordinationSettings::loadFromConfig(const String & config_elem, const Poco
}
const String KeeperConfigurationAndSettings::DEFAULT_FOUR_LETTER_WORD_CMD = "conf,cons,crst,envi,ruok,srst,srvr,stat,wchc,wchs,dirs,mntr,isro";
const String KeeperConfigurationAndSettings::DEFAULT_FOUR_LETTER_WORD_CMD = "conf,cons,crst,envi,ruok,srst,srvr,stat,wchs,dirs,mntr,isro";
KeeperConfigurationAndSettings::KeeperConfigurationAndSettings()
: server_id(NOT_EXIST)
@ -82,8 +82,8 @@ void KeeperConfigurationAndSettings::dump(WriteBufferFromOwnString & buf) const
write_int(tcp_port_secure);
}
writeText("four_letter_word_white_list=", buf);
writeText(four_letter_word_white_list, buf);
writeText("four_letter_word_allow_list=", buf);
writeText(four_letter_word_allow_list, buf);
buf.write('\n');
writeText("log_storage_path=", buf);
@ -177,7 +177,11 @@ KeeperConfigurationAndSettings::loadFromConfig(const Poco::Util::AbstractConfigu
ret->super_digest = config.getString("keeper_server.superdigest");
}
ret->four_letter_word_white_list = config.getString("keeper_server.four_letter_word_white_list", DEFAULT_FOUR_LETTER_WORD_CMD);
ret->four_letter_word_allow_list = config.getString(
"keeper_server.four_letter_word_allow_list",
config.getString("keeper_server.four_letter_word_white_list",
DEFAULT_FOUR_LETTER_WORD_CMD));
ret->log_storage_path = getLogsPathFromConfig(config, standalone_keeper_);
ret->snapshot_storage_path = getSnapshotsPathFromConfig(config, standalone_keeper_);

View File

@ -68,7 +68,7 @@ struct KeeperConfigurationAndSettings
int tcp_port;
int tcp_port_secure;
String four_letter_word_white_list;
String four_letter_word_allow_list;
String super_digest;

View File

@ -129,7 +129,7 @@ void FourLetterCommandFactory::registerCommands(KeeperDispatcher & keeper_dispat
FourLetterCommandPtr watch_command = std::make_shared<WatchCommand>(keeper_dispatcher);
factory.registerCommand(watch_command);
factory.initializeWhiteList(keeper_dispatcher);
factory.initializeAllowList(keeper_dispatcher);
factory.setInitialize(true);
}
}
@ -137,17 +137,17 @@ void FourLetterCommandFactory::registerCommands(KeeperDispatcher & keeper_dispat
bool FourLetterCommandFactory::isEnabled(int32_t code)
{
checkInitialization();
if (!white_list.empty() && *white_list.cbegin() == WHITE_LIST_ALL)
if (!allow_list.empty() && *allow_list.cbegin() == ALLOW_LIST_ALL)
return true;
return std::find(white_list.begin(), white_list.end(), code) != white_list.end();
return std::find(allow_list.begin(), allow_list.end(), code) != allow_list.end();
}
void FourLetterCommandFactory::initializeWhiteList(KeeperDispatcher & keeper_dispatcher)
void FourLetterCommandFactory::initializeAllowList(KeeperDispatcher & keeper_dispatcher)
{
const auto & keeper_settings = keeper_dispatcher.getKeeperConfigurationAndSettings();
String list_str = keeper_settings->four_letter_word_white_list;
String list_str = keeper_settings->four_letter_word_allow_list;
Strings tokens;
splitInto<','>(tokens, list_str);
@ -157,15 +157,15 @@ void FourLetterCommandFactory::initializeWhiteList(KeeperDispatcher & keeper_dis
if (token == "*")
{
white_list.clear();
white_list.push_back(WHITE_LIST_ALL);
allow_list.clear();
allow_list.push_back(ALLOW_LIST_ALL);
return;
}
else
{
if (commands.contains(IFourLetterCommand::toCode(token)))
{
white_list.push_back(IFourLetterCommand::toCode(token));
allow_list.push_back(IFourLetterCommand::toCode(token));
}
else
{

View File

@ -40,10 +40,10 @@ struct FourLetterCommandFactory : private boost::noncopyable
{
public:
using Commands = std::unordered_map<int32_t, FourLetterCommandPtr>;
using WhiteList = std::vector<int32_t>;
using AllowList = std::vector<int32_t>;
///represent '*' which is used in white list
static constexpr int32_t WHITE_LIST_ALL = 0;
/// Represents '*' which is used in the allow list.
static constexpr int32_t ALLOW_LIST_ALL = 0;
bool isKnown(int32_t code);
bool isEnabled(int32_t code);
@ -52,7 +52,7 @@ public:
/// There is no need to make it thread safe, because registration happens during initialization and lookups happen only after startup.
void registerCommand(FourLetterCommandPtr & command);
void initializeWhiteList(KeeperDispatcher & keeper_dispatcher);
void initializeAllowList(KeeperDispatcher & keeper_dispatcher);
void checkInitialization() const;
bool isInitialized() const { return initialized; }
@ -64,7 +64,7 @@ public:
private:
std::atomic<bool> initialized = false;
Commands commands;
WhiteList white_list;
AllowList allow_list;
};
/**Tests if server is running in a non-error state. The server will respond with imok if it is running.
@ -130,7 +130,7 @@ struct StatResetCommand : public IFourLetterCommand
};
/// A command that does not do anything except reply to client with predefined message.
///It is used to inform clients who execute none white listed four letter word commands.
/// It is used to inform clients who execute four letter word commands that are not in the allow list.
struct NopCommand : public IFourLetterCommand
{
explicit NopCommand(KeeperDispatcher & keeper_dispatcher_)

View File

@ -726,18 +726,6 @@ void convertToFullIfSparse(Block & block)
column.column = recursiveRemoveSparse(column.column);
}
ColumnPtr getColumnFromBlock(const Block & block, const NameAndTypePair & column)
{
auto current_column = block.getByName(column.getNameInStorage()).column;
current_column = current_column->decompress();
if (column.isSubcolumn())
return column.getTypeInStorage()->getSubcolumn(column.getSubcolumnName(), current_column);
return current_column;
}
Block materializeBlock(const Block & block)
{
if (!block)

View File

@ -196,10 +196,6 @@ void getBlocksDifference(const Block & lhs, const Block & rhs, std::string & out
void convertToFullIfSparse(Block & block);
/// Helps in-memory storages to extract columns from block.
/// Properly handles cases, when column is a subcolumn and when it is compressed.
ColumnPtr getColumnFromBlock(const Block & block, const NameAndTypePair & column);
/// Converts columns-constants to full columns ("materializes" them).
Block materializeBlock(const Block & block);
void materializeBlockInplace(Block & block);

View File

@ -115,8 +115,8 @@ private:
}
template <typename T, typename U>
static std::enable_if_t<is_decimal<T> && is_decimal<U>, Shift>
getScales(const DataTypePtr & left_type, const DataTypePtr & right_type)
requires is_decimal<T> && is_decimal<U>
static Shift getScales(const DataTypePtr & left_type, const DataTypePtr & right_type)
{
const DataTypeDecimalBase<T> * decimal0 = checkDecimalBase<T>(*left_type);
const DataTypeDecimalBase<U> * decimal1 = checkDecimalBase<U>(*right_type);
@ -137,8 +137,8 @@ private:
}
template <typename T, typename U>
static std::enable_if_t<is_decimal<T> && !is_decimal<U>, Shift>
getScales(const DataTypePtr & left_type, const DataTypePtr &)
requires is_decimal<T> && (!is_decimal<U>)
static Shift getScales(const DataTypePtr & left_type, const DataTypePtr &)
{
Shift shift;
const DataTypeDecimalBase<T> * decimal0 = checkDecimalBase<T>(*left_type);
@ -148,8 +148,8 @@ private:
}
template <typename T, typename U>
static std::enable_if_t<!is_decimal<T> && is_decimal<U>, Shift>
getScales(const DataTypePtr &, const DataTypePtr & right_type)
requires (!is_decimal<T>) && is_decimal<U>
static Shift getScales(const DataTypePtr &, const DataTypePtr & right_type)
{
Shift shift;
const DataTypeDecimalBase<U> * decimal1 = checkDecimalBase<U>(*right_type);

View File

@ -99,6 +99,12 @@ inline Field getBinaryValue(UInt8 type, ReadBuffer & buf)
readBinary(value, buf);
return value;
}
case Field::Types::Object:
{
Object value;
readBinary(value, buf);
return value;
}
case Field::Types::AggregateFunctionState:
{
AggregateFunctionStateData value;
@ -208,6 +214,40 @@ void writeText(const Map & x, WriteBuffer & buf)
writeFieldText(Field(x), buf);
}
void readBinary(Object & x, ReadBuffer & buf)
{
size_t size;
readBinary(size, buf);
for (size_t index = 0; index < size; ++index)
{
UInt8 type;
String key;
readBinary(type, buf);
readBinary(key, buf);
x[key] = getBinaryValue(type, buf);
}
}
void writeBinary(const Object & x, WriteBuffer & buf)
{
const size_t size = x.size();
writeBinary(size, buf);
for (const auto & [key, value] : x)
{
const UInt8 type = value.getType();
writeBinary(type, buf);
writeBinary(key, buf);
Field::dispatch([&buf] (const auto & val) { FieldVisitorWriteBinary()(val, buf); }, value);
}
}
void writeText(const Object & x, WriteBuffer & buf)
{
writeFieldText(Field(x), buf);
}
template <typename T>
void readQuoted(DecimalField<T> & x, ReadBuffer & buf)
{

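readBinary and writeBinary for Object use a symmetric framing: the element count first, then for each entry a one-byte type tag, the key, and the value serialized by the matching visitor. A standalone sketch of the write side, simplified to UInt64 values and raw length prefixes (the real code length-prefixes strings via writeBinary and recurses through Field::dispatch):

#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using Buffer = std::vector<uint8_t>;

static void writeRaw(Buffer & buf, const void * data, size_t size)
{
    const auto * p = static_cast<const uint8_t *>(data);
    buf.insert(buf.end(), p, p + size);
}

void writeObject(Buffer & buf, const std::map<std::string, uint64_t> & x)
{
    const uint64_t size = x.size();
    writeRaw(buf, &size, sizeof(size));          // 1. element count
    for (const auto & [key, value] : x)
    {
        const uint8_t type = 1;                  // 2. type tag of the value (simplified)
        writeRaw(buf, &type, sizeof(type));
        const uint64_t key_len = key.size();     // 3. length-prefixed key
        writeRaw(buf, &key_len, sizeof(key_len));
        writeRaw(buf, key.data(), key.size());
        writeRaw(buf, &value, sizeof(value));    // 4. the value itself
    }
}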
View File

@ -3,6 +3,7 @@
#include <cassert>
#include <vector>
#include <algorithm>
#include <map>
#include <type_traits>
#include <functional>
@ -49,10 +50,22 @@ DEFINE_FIELD_VECTOR(Array);
DEFINE_FIELD_VECTOR(Tuple);
/// An array with the following structure: [(key1, value1), (key2, value2), ...]
DEFINE_FIELD_VECTOR(Map);
DEFINE_FIELD_VECTOR(Map); /// TODO: use map instead of vector.
#undef DEFINE_FIELD_VECTOR
using FieldMap = std::map<String, Field, std::less<String>, AllocatorWithMemoryTracking<std::pair<const String, Field>>>;
#define DEFINE_FIELD_MAP(X) \
struct X : public FieldMap \
{ \
using FieldMap::FieldMap; \
}
DEFINE_FIELD_MAP(Object);
#undef DEFINE_FIELD_MAP
struct AggregateFunctionStateData
{
String name; /// Name with arguments.
@ -219,6 +232,7 @@ template <> struct NearestFieldTypeImpl<String> { using Type = String; };
template <> struct NearestFieldTypeImpl<Array> { using Type = Array; };
template <> struct NearestFieldTypeImpl<Tuple> { using Type = Tuple; };
template <> struct NearestFieldTypeImpl<Map> { using Type = Map; };
template <> struct NearestFieldTypeImpl<Object> { using Type = Object; };
template <> struct NearestFieldTypeImpl<bool> { using Type = UInt64; };
template <> struct NearestFieldTypeImpl<Null> { using Type = Null; };
@ -283,6 +297,7 @@ public:
Map = 26,
UUID = 27,
Bool = 28,
Object = 29,
};
};
@ -472,6 +487,7 @@ public:
case Types::Array: return get<Array>() < rhs.get<Array>();
case Types::Tuple: return get<Tuple>() < rhs.get<Tuple>();
case Types::Map: return get<Map>() < rhs.get<Map>();
case Types::Object: return get<Object>() < rhs.get<Object>();
case Types::Decimal32: return get<DecimalField<Decimal32>>() < rhs.get<DecimalField<Decimal32>>();
case Types::Decimal64: return get<DecimalField<Decimal64>>() < rhs.get<DecimalField<Decimal64>>();
case Types::Decimal128: return get<DecimalField<Decimal128>>() < rhs.get<DecimalField<Decimal128>>();
@ -510,6 +526,7 @@ public:
case Types::Array: return get<Array>() <= rhs.get<Array>();
case Types::Tuple: return get<Tuple>() <= rhs.get<Tuple>();
case Types::Map: return get<Map>() <= rhs.get<Map>();
case Types::Object: return get<Object>() <= rhs.get<Object>();
case Types::Decimal32: return get<DecimalField<Decimal32>>() <= rhs.get<DecimalField<Decimal32>>();
case Types::Decimal64: return get<DecimalField<Decimal64>>() <= rhs.get<DecimalField<Decimal64>>();
case Types::Decimal128: return get<DecimalField<Decimal128>>() <= rhs.get<DecimalField<Decimal128>>();
@ -548,6 +565,7 @@ public:
case Types::Array: return get<Array>() == rhs.get<Array>();
case Types::Tuple: return get<Tuple>() == rhs.get<Tuple>();
case Types::Map: return get<Map>() == rhs.get<Map>();
case Types::Object: return get<Object>() == rhs.get<Object>();
case Types::UInt128: return get<UInt128>() == rhs.get<UInt128>();
case Types::UInt256: return get<UInt256>() == rhs.get<UInt256>();
case Types::Int128: return get<Int128>() == rhs.get<Int128>();
@ -597,6 +615,7 @@ public:
bool value = bool(field.template get<UInt64>());
return f(value);
}
case Types::Object: return f(field.template get<Object>());
case Types::Decimal32: return f(field.template get<DecimalField<Decimal32>>());
case Types::Decimal64: return f(field.template get<DecimalField<Decimal64>>());
case Types::Decimal128: return f(field.template get<DecimalField<Decimal128>>());
@ -713,6 +732,9 @@ private:
case Types::Map:
destroy<Map>();
break;
case Types::Object:
destroy<Object>();
break;
case Types::AggregateFunctionState:
destroy<AggregateFunctionStateData>();
break;
@ -737,26 +759,27 @@ private:
using Row = std::vector<Field>;
template <> struct Field::TypeToEnum<Null> { static const Types::Which value = Types::Null; };
template <> struct Field::TypeToEnum<UInt64> { static const Types::Which value = Types::UInt64; };
template <> struct Field::TypeToEnum<UInt128> { static const Types::Which value = Types::UInt128; };
template <> struct Field::TypeToEnum<UInt256> { static const Types::Which value = Types::UInt256; };
template <> struct Field::TypeToEnum<Int64> { static const Types::Which value = Types::Int64; };
template <> struct Field::TypeToEnum<Int128> { static const Types::Which value = Types::Int128; };
template <> struct Field::TypeToEnum<Int256> { static const Types::Which value = Types::Int256; };
template <> struct Field::TypeToEnum<UUID> { static const Types::Which value = Types::UUID; };
template <> struct Field::TypeToEnum<Float64> { static const Types::Which value = Types::Float64; };
template <> struct Field::TypeToEnum<String> { static const Types::Which value = Types::String; };
template <> struct Field::TypeToEnum<Array> { static const Types::Which value = Types::Array; };
template <> struct Field::TypeToEnum<Tuple> { static const Types::Which value = Types::Tuple; };
template <> struct Field::TypeToEnum<Map> { static const Types::Which value = Types::Map; };
template <> struct Field::TypeToEnum<DecimalField<Decimal32>>{ static const Types::Which value = Types::Decimal32; };
template <> struct Field::TypeToEnum<DecimalField<Decimal64>>{ static const Types::Which value = Types::Decimal64; };
template <> struct Field::TypeToEnum<DecimalField<Decimal128>>{ static const Types::Which value = Types::Decimal128; };
template <> struct Field::TypeToEnum<DecimalField<Decimal256>>{ static const Types::Which value = Types::Decimal256; };
template <> struct Field::TypeToEnum<DecimalField<DateTime64>>{ static const Types::Which value = Types::Decimal64; };
template <> struct Field::TypeToEnum<AggregateFunctionStateData>{ static const Types::Which value = Types::AggregateFunctionState; };
template <> struct Field::TypeToEnum<bool>{ static const Types::Which value = Types::Bool; };
template <> struct Field::TypeToEnum<Null> { static constexpr Types::Which value = Types::Null; };
template <> struct Field::TypeToEnum<UInt64> { static constexpr Types::Which value = Types::UInt64; };
template <> struct Field::TypeToEnum<UInt128> { static constexpr Types::Which value = Types::UInt128; };
template <> struct Field::TypeToEnum<UInt256> { static constexpr Types::Which value = Types::UInt256; };
template <> struct Field::TypeToEnum<Int64> { static constexpr Types::Which value = Types::Int64; };
template <> struct Field::TypeToEnum<Int128> { static constexpr Types::Which value = Types::Int128; };
template <> struct Field::TypeToEnum<Int256> { static constexpr Types::Which value = Types::Int256; };
template <> struct Field::TypeToEnum<UUID> { static constexpr Types::Which value = Types::UUID; };
template <> struct Field::TypeToEnum<Float64> { static constexpr Types::Which value = Types::Float64; };
template <> struct Field::TypeToEnum<String> { static constexpr Types::Which value = Types::String; };
template <> struct Field::TypeToEnum<Array> { static constexpr Types::Which value = Types::Array; };
template <> struct Field::TypeToEnum<Tuple> { static constexpr Types::Which value = Types::Tuple; };
template <> struct Field::TypeToEnum<Map> { static constexpr Types::Which value = Types::Map; };
template <> struct Field::TypeToEnum<Object> { static constexpr Types::Which value = Types::Object; };
template <> struct Field::TypeToEnum<DecimalField<Decimal32>>{ static constexpr Types::Which value = Types::Decimal32; };
template <> struct Field::TypeToEnum<DecimalField<Decimal64>>{ static constexpr Types::Which value = Types::Decimal64; };
template <> struct Field::TypeToEnum<DecimalField<Decimal128>>{ static constexpr Types::Which value = Types::Decimal128; };
template <> struct Field::TypeToEnum<DecimalField<Decimal256>>{ static constexpr Types::Which value = Types::Decimal256; };
template <> struct Field::TypeToEnum<DecimalField<DateTime64>>{ static constexpr Types::Which value = Types::Decimal64; };
template <> struct Field::TypeToEnum<AggregateFunctionStateData>{ static constexpr Types::Which value = Types::AggregateFunctionState; };
template <> struct Field::TypeToEnum<bool>{ static constexpr Types::Which value = Types::Bool; };
template <> struct Field::EnumToType<Field::Types::Null> { using Type = Null; };
template <> struct Field::EnumToType<Field::Types::UInt64> { using Type = UInt64; };
@ -771,6 +794,7 @@ template <> struct Field::EnumToType<Field::Types::String> { using Type = Strin
template <> struct Field::EnumToType<Field::Types::Array> { using Type = Array; };
template <> struct Field::EnumToType<Field::Types::Tuple> { using Type = Tuple; };
template <> struct Field::EnumToType<Field::Types::Map> { using Type = Map; };
template <> struct Field::EnumToType<Field::Types::Object> { using Type = Object; };
template <> struct Field::EnumToType<Field::Types::Decimal32> { using Type = DecimalField<Decimal32>; };
template <> struct Field::EnumToType<Field::Types::Decimal64> { using Type = DecimalField<Decimal64>; };
template <> struct Field::EnumToType<Field::Types::Decimal128> { using Type = DecimalField<Decimal128>; };
@ -931,34 +955,39 @@ class WriteBuffer;
/// It is assumed that all elements of the array have the same type.
void readBinary(Array & x, ReadBuffer & buf);
[[noreturn]] inline void readText(Array &, ReadBuffer &) { throw Exception("Cannot read Array.", ErrorCodes::NOT_IMPLEMENTED); }
[[noreturn]] inline void readQuoted(Array &, ReadBuffer &) { throw Exception("Cannot read Array.", ErrorCodes::NOT_IMPLEMENTED); }
/// It is assumed that all elements of the array have the same type.
/// Also write size and type into buf. UInt64 and Int64 are written in variadic size form.
void writeBinary(const Array & x, WriteBuffer & buf);
void writeText(const Array & x, WriteBuffer & buf);
[[noreturn]] inline void writeQuoted(const Array &, WriteBuffer &) { throw Exception("Cannot write Array quoted.", ErrorCodes::NOT_IMPLEMENTED); }
void readBinary(Tuple & x, ReadBuffer & buf);
[[noreturn]] inline void readText(Tuple &, ReadBuffer &) { throw Exception("Cannot read Tuple.", ErrorCodes::NOT_IMPLEMENTED); }
[[noreturn]] inline void readQuoted(Tuple &, ReadBuffer &) { throw Exception("Cannot read Tuple.", ErrorCodes::NOT_IMPLEMENTED); }
void writeBinary(const Tuple & x, WriteBuffer & buf);
void writeText(const Tuple & x, WriteBuffer & buf);
[[noreturn]] inline void writeQuoted(const Tuple &, WriteBuffer &) { throw Exception("Cannot write Tuple quoted.", ErrorCodes::NOT_IMPLEMENTED); }
void readBinary(Map & x, ReadBuffer & buf);
[[noreturn]] inline void readText(Map &, ReadBuffer &) { throw Exception("Cannot read Map.", ErrorCodes::NOT_IMPLEMENTED); }
[[noreturn]] inline void readQuoted(Map &, ReadBuffer &) { throw Exception("Cannot read Map.", ErrorCodes::NOT_IMPLEMENTED); }
void writeBinary(const Map & x, WriteBuffer & buf);
void writeText(const Map & x, WriteBuffer & buf);
[[noreturn]] inline void writeQuoted(const Map &, WriteBuffer &) { throw Exception("Cannot write Map quoted.", ErrorCodes::NOT_IMPLEMENTED); }
void readBinary(Object & x, ReadBuffer & buf);
[[noreturn]] inline void readText(Object &, ReadBuffer &) { throw Exception("Cannot read Object.", ErrorCodes::NOT_IMPLEMENTED); }
[[noreturn]] inline void readQuoted(Object &, ReadBuffer &) { throw Exception("Cannot read Object.", ErrorCodes::NOT_IMPLEMENTED); }
void writeBinary(const Object & x, WriteBuffer & buf);
void writeText(const Object & x, WriteBuffer & buf);
[[noreturn]] inline void writeQuoted(const Object &, WriteBuffer &) { throw Exception("Cannot write Object quoted.", ErrorCodes::NOT_IMPLEMENTED); }
__attribute__ ((noreturn)) inline void writeText(const AggregateFunctionStateData &, WriteBuffer &)
{
// This probably doesn't make any sense, but we have to have it for
@ -977,8 +1006,6 @@ void readQuoted(DecimalField<T> & x, ReadBuffer & buf);
void writeFieldText(const Field & x, WriteBuffer & buf);
[[noreturn]] inline void writeQuoted(const Tuple &, WriteBuffer &) { throw Exception("Cannot write Tuple quoted.", ErrorCodes::NOT_IMPLEMENTED); }
String toString(const Field & x);
}

View File

@ -53,7 +53,8 @@ struct MultiEnum
return bitset;
}
template <typename ValueType, typename = std::enable_if_t<std::is_convertible_v<ValueType, StorageType>>>
template <typename ValueType>
requires std::is_convertible_v<ValueType, StorageType>
void setValue(ValueType new_value)
{
// Can't set value from any enum to avoid confusion
@ -66,7 +67,8 @@ struct MultiEnum
return bitset == other.bitset;
}
template <typename ValueType, typename = std::enable_if_t<std::is_convertible_v<ValueType, StorageType>>>
template <typename ValueType>
requires std::is_convertible_v<ValueType, StorageType>
bool operator==(ValueType other) const
{
// Shouldn't be comparable with any enum to avoid confusion
@ -80,13 +82,15 @@ struct MultiEnum
return !(*this == other);
}
template <typename ValueType, typename = std::enable_if_t<std::is_convertible_v<ValueType, StorageType>>>
template <typename ValueType>
requires std::is_convertible_v<ValueType, StorageType>
friend bool operator==(ValueType left, MultiEnum right)
{
return right.operator==(left);
}
template <typename L, typename = typename std::enable_if<!std::is_same_v<L, MultiEnum>>::type>
template <typename L>
requires (!std::is_same_v<L, MultiEnum>)
friend bool operator!=(L left, MultiEnum right)
{
return !(right.operator==(left));

View File

@ -44,6 +44,7 @@ class IColumn;
M(UInt64, min_insert_block_size_bytes_for_materialized_views, 0, "Like min_insert_block_size_bytes, but applied only during pushing to MATERIALIZED VIEW (default: min_insert_block_size_bytes)", 0) \
M(UInt64, max_joined_block_size_rows, DEFAULT_BLOCK_SIZE, "Maximum block size for JOIN result (if join algorithm supports it). 0 means unlimited.", 0) \
M(UInt64, max_insert_threads, 0, "The maximum number of threads to execute the INSERT SELECT query. Values 0 or 1 mean that INSERT SELECT is not run in parallel. Higher values will lead to higher memory usage. Parallel INSERT SELECT has an effect only if the SELECT part is run in parallel, see 'max_threads' setting.", 0) \
M(UInt64, max_insert_delayed_streams_for_parallel_write, 0, "The maximum number of streams (columns) to delay final part flush. Default - auto (1000 if the underlying storage supports parallel write, for example S3; disabled otherwise)", 0) \
M(UInt64, max_final_threads, 16, "The maximum number of threads to read from table with FINAL.", 0) \
M(MaxThreads, max_threads, 0, "The maximum number of threads to execute the request. By default, it is determined automatically.", 0) \
M(UInt64, max_read_buffer_size, DBMS_DEFAULT_BUFFER_SIZE, "The maximum size of the buffer to read from the filesystem.", 0) \
@ -136,7 +137,7 @@ class IColumn;
\
M(Bool, skip_unavailable_shards, false, "If true, ClickHouse silently skips unavailable shards and nodes unresolvable through DNS. Shard is marked as unavailable when none of the replicas can be reached.", 0) \
\
M(UInt64, parallel_distributed_insert_select, 0, "Process distributed INSERT SELECT query in the same cluster on local tables on every shard, if 1 SELECT is executed on each shard, if 2 SELECT and INSERT is executed on each shard", 0) \
M(UInt64, parallel_distributed_insert_select, 0, "Process distributed INSERT SELECT query in the same cluster on local tables on every shard; if set to 1 - SELECT is executed on each shard; if set to 2 - SELECT and INSERT are executed on each shard", 0) \
M(UInt64, distributed_group_by_no_merge, 0, "If 1, do not merge aggregation states from different servers for distributed queries (shards will process the query up to the Complete stage, the initiator just proxies the data from the shards). If 2, the initiator will apply ORDER BY and LIMIT stages (this is not the case when the shard processes the query up to the Complete stage)", 0) \
M(UInt64, distributed_push_down_limit, 1, "If 1, LIMIT will be applied on each shard separately. Usually you don't need to use it, since this will be done automatically if it is possible, i.e. for a simple SELECT ... FROM ... LIMIT query.", 0) \
M(Bool, optimize_distributed_group_by_sharding_key, true, "Optimize GROUP BY sharding_key queries (by avoiding costly aggregation on the initiator server).", 0) \
@ -471,6 +472,7 @@ class IColumn;
M(Bool, allow_experimental_geo_types, false, "Allow geo data types such as Point, Ring, Polygon, MultiPolygon", 0) \
M(Bool, data_type_default_nullable, false, "Data types declared without NULL or NOT NULL will be Nullable", 0) \
M(Bool, cast_keep_nullable, false, "CAST operator keeps Nullable for the result data type", 0) \
M(Bool, cast_ipv4_ipv6_default_on_conversion_error, false, "CAST operator into IPv4, CAST operator into IPv6 type, and the toIPv4 and toIPv6 functions will return a default value instead of throwing an exception on conversion error.", 0) \
M(Bool, alter_partition_verbose_result, false, "Output information about affected parts. Currently works only for FREEZE and ATTACH commands.", 0) \
M(Bool, allow_experimental_database_materialized_mysql, false, "Allow to create database with Engine=MaterializedMySQL(...).", 0) \
M(Bool, allow_experimental_database_materialized_postgresql, false, "Allow to create database with Engine=MaterializedPostgreSQL(...).", 0) \
@ -490,6 +492,7 @@ class IColumn;
M(Bool, force_optimize_projection, false, "If projection optimization is enabled, SELECT queries need to use projection", 0) \
M(Bool, async_socket_for_remote, true, "Asynchronously read from socket executing remote query", 0) \
M(Bool, insert_null_as_default, true, "Insert DEFAULT values instead of NULL in INSERT SELECT (UNION ALL)", 0) \
M(Bool, describe_extend_object_types, false, "Deduce concrete type of columns of type Object in DESCRIBE query", 0) \
M(Bool, describe_include_subcolumns, false, "If true, subcolumns of all table columns will be included into result of DESCRIBE query", 0) \
\
M(Bool, optimize_rewrite_sum_if_to_count_if, true, "Rewrite sumIf() and sum(if()) functions to the countIf() function when logically equivalent", 0) \
@ -551,7 +554,7 @@ class IColumn;
M(UInt64, remote_fs_read_max_backoff_ms, 10000, "Max wait time when trying to read data for remote disk", 0) \
M(UInt64, remote_fs_read_backoff_max_tries, 5, "Max attempts to read with backoff", 0) \
M(Bool, remote_fs_enable_cache, true, "Use cache for remote filesystem. This setting does not turn on/off cache for disks (that must be done via disk config), but allows bypassing the cache for some queries if intended", 0) \
M(UInt64, remote_fs_cache_max_wait_sec, 5, "Allow to wait a most this number of seconds for download of current remote_fs_buffer_size bytes, and skip cache if exceeded", 0) \
M(UInt64, remote_fs_cache_max_wait_sec, 5, "Allow to wait at most this number of seconds for download of current remote_fs_buffer_size bytes, and skip cache if exceeded", 0) \
\
M(UInt64, http_max_tries, 10, "Max attempts to read via http.", 0) \
M(UInt64, http_retry_initial_backoff_ms, 100, "Min milliseconds for backoff, when retrying read via http", 0) \
@ -566,6 +569,7 @@ class IColumn;
/** Experimental functions */ \
M(Bool, allow_experimental_funnel_functions, false, "Enable experimental functions for funnel analysis.", 0) \
M(Bool, allow_experimental_nlp_functions, false, "Enable experimental functions for natural language processing.", 0) \
M(Bool, allow_experimental_object_type, false, "Allow Object and JSON data types", 0) \
M(String, insert_deduplication_token, "", "If not empty, used for duplicate detection instead of data digest", 0) \
// End of COMMON_SETTINGS
// Please add settings related to formats into the FORMAT_FACTORY_SETTINGS and move obsolete settings to OBSOLETE_SETTINGS.

View File

@ -87,6 +87,7 @@ enum class TypeIndex
AggregateFunction,
LowCardinality,
Map,
Object,
};
#if !defined(__clang__)
#pragma GCC diagnostic pop

View File

@ -15,6 +15,8 @@
#cmakedefine01 USE_NURAFT
#cmakedefine01 USE_NLP
#cmakedefine01 USE_KRB5
#cmakedefine01 USE_SIMDJSON
#cmakedefine01 USE_RAPIDJSON
#cmakedefine01 USE_FILELOG
#cmakedefine01 USE_ODBC
#cmakedefine01 USE_REPLXX

View File

@ -7,7 +7,8 @@ namespace DB
// Use template to disable implicit casting for certain overloaded types such as Field, which leads
// to overload resolution ambiguity.
class Field;
template <typename T, typename U = std::enable_if_t<std::is_same_v<T, Field>>>
template <typename T>
requires std::is_same_v<T, Field>
std::ostream & operator<<(std::ostream & stream, const T & what);
struct NameAndTypePair;

View File

@ -1,3 +1,5 @@
add_subdirectory (Serializations)
if (ENABLE_EXAMPLES)
add_subdirectory(examples)
add_subdirectory (examples)
endif ()

View File

@ -213,6 +213,7 @@ DataTypeFactory::DataTypeFactory()
registerDataTypeDomainSimpleAggregateFunction(*this);
registerDataTypeDomainGeo(*this);
registerDataTypeMap(*this);
registerDataTypeObject(*this);
}
DataTypeFactory & DataTypeFactory::instance()

View File

@ -87,5 +87,6 @@ void registerDataTypeDomainIPv4AndIPv6(DataTypeFactory & factory);
void registerDataTypeDomainBool(DataTypeFactory & factory);
void registerDataTypeDomainSimpleAggregateFunction(DataTypeFactory & factory);
void registerDataTypeDomainGeo(DataTypeFactory & factory);
void registerDataTypeObject(DataTypeFactory & factory);
}

View File

@ -0,0 +1,82 @@
#include <DataTypes/DataTypeObject.h>
#include <DataTypes/DataTypeFactory.h>
#include <DataTypes/Serializations/SerializationObject.h>
#include <Parsers/IAST.h>
#include <Parsers/ASTLiteral.h>
#include <Parsers/ASTFunction.h>
#include <IO/Operators.h>
namespace DB
{
namespace ErrorCodes
{
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
extern const int UNEXPECTED_AST_STRUCTURE;
}
DataTypeObject::DataTypeObject(const String & schema_format_, bool is_nullable_)
: schema_format(Poco::toLower(schema_format_))
, is_nullable(is_nullable_)
{
}
bool DataTypeObject::equals(const IDataType & rhs) const
{
if (const auto * object = typeid_cast<const DataTypeObject *>(&rhs))
return schema_format == object->schema_format && is_nullable == object->is_nullable;
return false;
}
SerializationPtr DataTypeObject::doGetDefaultSerialization() const
{
return getObjectSerialization(schema_format);
}
String DataTypeObject::doGetName() const
{
WriteBufferFromOwnString out;
if (is_nullable)
out << "Object(Nullable(" << quote << schema_format << "))";
else
out << "Object(" << quote << schema_format << ")";
return out.str();
}
static DataTypePtr create(const ASTPtr & arguments)
{
if (!arguments || arguments->children.size() != 1)
throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH,
"Object data type family must have one argument - name of schema format");
ASTPtr schema_argument = arguments->children[0];
bool is_nullable = false;
if (const auto * func = schema_argument->as<ASTFunction>())
{
if (func->name != "Nullable" || func->arguments->children.size() != 1)
throw Exception(ErrorCodes::UNEXPECTED_AST_STRUCTURE,
"Expected 'Nullable(<schema_name>)' as parameter for type Object (function: {})", func->name);
schema_argument = func->arguments->children[0];
is_nullable = true;
}
const auto * literal = schema_argument->as<ASTLiteral>();
if (!literal || literal->value.getType() != Field::Types::String)
throw Exception(ErrorCodes::UNEXPECTED_AST_STRUCTURE,
"Object data type family must have a const string as its schema name parameter");
return std::make_shared<DataTypeObject>(literal->value.get<const String &>(), is_nullable);
}
void registerDataTypeObject(DataTypeFactory & factory)
{
factory.registerDataType("Object", create);
factory.registerSimpleDataType("JSON",
[] { return std::make_shared<DataTypeObject>("JSON", false); },
DataTypeFactory::CaseInsensitive);
}
}

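With the registrations above, both spellings should resolve through the factory to the same type. A hypothetical usage sketch (assuming the usual DataTypeFactory::get(String) lookup; not a test from this commit):

#include <DataTypes/DataTypeFactory.h>

void sketch()
{
    auto & factory = DB::DataTypeFactory::instance();
    auto t1 = factory.get("Object('json')");            // explicit schema format
    auto t2 = factory.get("Object(Nullable('json'))");  // enables nullable subcolumns
    auto t3 = factory.get("JSON");                      // case-insensitive alias
    // t1->getName() and t3->getName() both yield "Object('json')",
    // because the constructor lowercases the schema format (see above).
}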
View File

@ -0,0 +1,45 @@
#pragma once
#include <DataTypes/IDataType.h>
#include <Core/Field.h>
#include <Columns/ColumnObject.h>
namespace DB
{
namespace ErrorCodes
{
extern const int NOT_IMPLEMENTED;
}
class DataTypeObject : public IDataType
{
private:
String schema_format;
bool is_nullable;
public:
DataTypeObject(const String & schema_format_, bool is_nullable_);
const char * getFamilyName() const override { return "Object"; }
String doGetName() const override;
TypeIndex getTypeId() const override { return TypeIndex::Object; }
MutableColumnPtr createColumn() const override { return ColumnObject::create(is_nullable); }
Field getDefault() const override
{
throw Exception("Method getDefault() is not implemented for data type " + getName(), ErrorCodes::NOT_IMPLEMENTED);
}
bool haveSubtypes() const override { return false; }
bool equals(const IDataType & rhs) const override;
bool isParametric() const override { return true; }
SerializationPtr doGetDefaultSerialization() const override;
bool hasNullableSubcolumns() const { return is_nullable; }
};
}

View File

@ -1,6 +1,7 @@
#include <DataTypes/FieldToDataType.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypeMap.h>
#include <DataTypes/DataTypeObject.h>
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypesDecimal.h>
#include <DataTypes/DataTypeString.h>
@ -108,12 +109,11 @@ DataTypePtr FieldToDataType::operator() (const Array & x) const
element_types.reserve(x.size());
for (const Field & elem : x)
element_types.emplace_back(applyVisitor(FieldToDataType(), elem));
element_types.emplace_back(applyVisitor(FieldToDataType(allow_conversion_to_string), elem));
return std::make_shared<DataTypeArray>(getLeastSupertype(element_types));
return std::make_shared<DataTypeArray>(getLeastSupertype(element_types, allow_conversion_to_string));
}
DataTypePtr FieldToDataType::operator() (const Tuple & tuple) const
{
if (tuple.empty())
@ -123,7 +123,7 @@ DataTypePtr FieldToDataType::operator() (const Tuple & tuple) const
element_types.reserve(tuple.size());
for (const auto & element : tuple)
element_types.push_back(applyVisitor(FieldToDataType(), element));
element_types.push_back(applyVisitor(FieldToDataType(allow_conversion_to_string), element));
return std::make_shared<DataTypeTuple>(element_types);
}
@ -139,11 +139,19 @@ DataTypePtr FieldToDataType::operator() (const Map & map) const
{
const auto & tuple = elem.safeGet<const Tuple &>();
assert(tuple.size() == 2);
key_types.push_back(applyVisitor(FieldToDataType(), tuple[0]));
value_types.push_back(applyVisitor(FieldToDataType(), tuple[1]));
key_types.push_back(applyVisitor(FieldToDataType(allow_conversion_to_string), tuple[0]));
value_types.push_back(applyVisitor(FieldToDataType(allow_conversion_to_string), tuple[1]));
}
return std::make_shared<DataTypeMap>(getLeastSupertype(key_types), getLeastSupertype(value_types));
return std::make_shared<DataTypeMap>(
getLeastSupertype(key_types, allow_conversion_to_string),
getLeastSupertype(value_types, allow_conversion_to_string));
}
DataTypePtr FieldToDataType::operator() (const Object &) const
{
/// TODO: Do we need different parameters for type Object?
return std::make_shared<DataTypeObject>("json", false);
}
DataTypePtr FieldToDataType::operator() (const AggregateFunctionStateData & x) const

View File

@ -20,26 +20,34 @@ using DataTypePtr = std::shared_ptr<const IDataType>;
class FieldToDataType : public StaticVisitor<DataTypePtr>
{
public:
FieldToDataType(bool allow_conversion_to_string_ = false)
: allow_conversion_to_string(allow_conversion_to_string_)
{
}
DataTypePtr operator() (const Null & x) const;
DataTypePtr operator() (const UInt64 & x) const;
DataTypePtr operator() (const UInt128 & x) const;
DataTypePtr operator() (const UInt256 & x) const;
DataTypePtr operator() (const Int64 & x) const;
DataTypePtr operator() (const Int128 & x) const;
DataTypePtr operator() (const Int256 & x) const;
DataTypePtr operator() (const UUID & x) const;
DataTypePtr operator() (const Float64 & x) const;
DataTypePtr operator() (const String & x) const;
DataTypePtr operator() (const Array & x) const;
DataTypePtr operator() (const Tuple & tuple) const;
DataTypePtr operator() (const Map & map) const;
DataTypePtr operator() (const Object & map) const;
DataTypePtr operator() (const DecimalField<Decimal32> & x) const;
DataTypePtr operator() (const DecimalField<Decimal64> & x) const;
DataTypePtr operator() (const DecimalField<Decimal128> & x) const;
DataTypePtr operator() (const DecimalField<Decimal256> & x) const;
DataTypePtr operator() (const AggregateFunctionStateData & x) const;
DataTypePtr operator() (const UInt256 & x) const;
DataTypePtr operator() (const Int256 & x) const;
DataTypePtr operator() (const bool & x) const;
private:
bool allow_conversion_to_string;
};
}
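A short usage sketch of the visitor (deduceType is an illustrative helper, not part of this patch):

#include <Common/FieldVisitors.h>
#include <DataTypes/FieldToDataType.h>

/// With the flag set, incompatible element types inside arrays and maps fall
/// back to String in getLeastSupertype instead of throwing.
DB::DataTypePtr deduceType(const DB::Field & field)
{
    return DB::applyVisitor(DB::FieldToDataType(/*allow_conversion_to_string=*/ true), field);
}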

View File

@ -126,19 +126,25 @@ DataTypePtr IDataType::tryGetSubcolumnType(const String & subcolumn_name) const
DataTypePtr IDataType::getSubcolumnType(const String & subcolumn_name) const
{
SubstreamData data = { getDefaultSerialization(), getPtr(), nullptr, nullptr };
return getForSubcolumn<DataTypePtr>(subcolumn_name, data, &SubstreamData::type);
return getForSubcolumn<DataTypePtr>(subcolumn_name, data, &SubstreamData::type, true);
}
SerializationPtr IDataType::getSubcolumnSerialization(const String & subcolumn_name, const SerializationPtr & serialization) const
ColumnPtr IDataType::tryGetSubcolumn(const String & subcolumn_name, const ColumnPtr & column) const
{
SubstreamData data = { serialization, nullptr, nullptr, nullptr };
return getForSubcolumn<SerializationPtr>(subcolumn_name, data, &SubstreamData::serialization);
SubstreamData data = { getDefaultSerialization(), nullptr, column, nullptr };
return getForSubcolumn<ColumnPtr>(subcolumn_name, data, &SubstreamData::column, false);
}
ColumnPtr IDataType::getSubcolumn(const String & subcolumn_name, const ColumnPtr & column) const
{
SubstreamData data = { getDefaultSerialization(), nullptr, column, nullptr };
return getForSubcolumn<ColumnPtr>(subcolumn_name, data, &SubstreamData::column);
return getForSubcolumn<ColumnPtr>(subcolumn_name, data, &SubstreamData::column, true);
}
SerializationPtr IDataType::getSubcolumnSerialization(const String & subcolumn_name, const SerializationPtr & serialization) const
{
SubstreamData data = { serialization, nullptr, nullptr, nullptr };
return getForSubcolumn<SerializationPtr>(subcolumn_name, data, &SubstreamData::serialization, true);
}
Names IDataType::getSubcolumnNames() const

View File

@ -82,9 +82,11 @@ public:
DataTypePtr tryGetSubcolumnType(const String & subcolumn_name) const;
DataTypePtr getSubcolumnType(const String & subcolumn_name) const;
SerializationPtr getSubcolumnSerialization(const String & subcolumn_name, const SerializationPtr & serialization) const;
ColumnPtr tryGetSubcolumn(const String & subcolumn_name, const ColumnPtr & column) const;
ColumnPtr getSubcolumn(const String & subcolumn_name, const ColumnPtr & column) const;
SerializationPtr getSubcolumnSerialization(const String & subcolumn_name, const SerializationPtr & serialization) const;
using SubstreamData = ISerialization::SubstreamData;
using SubstreamPath = ISerialization::SubstreamPath;
@ -309,7 +311,7 @@ private:
const String & subcolumn_name,
const SubstreamData & data,
Ptr SubstreamData::*member,
bool throw_if_null = true) const;
bool throw_if_null) const;
};
@ -373,11 +375,13 @@ struct WhichDataType
constexpr bool isMap() const { return idx == TypeIndex::Map; }
constexpr bool isSet() const { return idx == TypeIndex::Set; }
constexpr bool isInterval() const { return idx == TypeIndex::Interval; }
constexpr bool isObject() const { return idx == TypeIndex::Object; }
constexpr bool isNothing() const { return idx == TypeIndex::Nothing; }
constexpr bool isNullable() const { return idx == TypeIndex::Nullable; }
constexpr bool isFunction() const { return idx == TypeIndex::Function; }
constexpr bool isAggregateFunction() const { return idx == TypeIndex::AggregateFunction; }
constexpr bool isSimple() const { return isInt() || isUInt() || isFloat() || isString(); }
constexpr bool isLowCardinality() const { return idx == TypeIndex::LowCardinality; }
};
@ -399,10 +403,16 @@ inline bool isEnum(const DataTypePtr & data_type) { return WhichDataType(data_ty
inline bool isDecimal(const DataTypePtr & data_type) { return WhichDataType(data_type).isDecimal(); }
inline bool isTuple(const DataTypePtr & data_type) { return WhichDataType(data_type).isTuple(); }
inline bool isArray(const DataTypePtr & data_type) { return WhichDataType(data_type).isArray(); }
inline bool isMap(const DataTypePtr & data_type) { return WhichDataType(data_type).isMap(); }
inline bool isNothing(const DataTypePtr & data_type) { return WhichDataType(data_type).isNothing(); }
inline bool isUUID(const DataTypePtr & data_type) { return WhichDataType(data_type).isUUID(); }
template <typename T>
inline bool isObject(const T & data_type)
{
return WhichDataType(data_type).isObject();
}
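/// E.g. isObject(DataTypeFactory::instance().get("JSON")) is true;
/// WhichDataType accepts a DataTypePtr, a TypeIndex or an IDataType reference alike.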
template <typename T>
inline bool isUInt8(const T & data_type)
{

View File

@ -30,6 +30,12 @@ namespace Nested
std::string concatenateName(const std::string & nested_table_name, const std::string & nested_field_name)
{
if (nested_table_name.empty())
return nested_field_name;
if (nested_field_name.empty())
return nested_table_name;
return nested_table_name + "." + nested_field_name;
}
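/// Sketch of the rule above: empty parts are skipped, otherwise a dot joins them:
///     concatenateName("json", "k1.k2") == "json.k1.k2"
///     concatenateName("", "k2") == "k2"
///     concatenateName("k1", "") == "k1"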

View File

@ -0,0 +1,703 @@
#include <DataTypes/ObjectUtils.h>
#include <DataTypes/DataTypeObject.h>
#include <DataTypes/DataTypeNothing.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypeNullable.h>
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypeNested.h>
#include <DataTypes/DataTypeFactory.h>
#include <DataTypes/getLeastSupertype.h>
#include <DataTypes/NestedUtils.h>
#include <Columns/ColumnObject.h>
#include <Columns/ColumnTuple.h>
#include <Columns/ColumnArray.h>
#include <Columns/ColumnNullable.h>
#include <Parsers/ASTSelectQuery.h>
#include <Parsers/ASTExpressionList.h>
#include <Parsers/ASTLiteral.h>
#include <Parsers/ASTFunction.h>
#include <IO/Operators.h>
namespace DB
{
namespace ErrorCodes
{
extern const int TYPE_MISMATCH;
extern const int LOGICAL_ERROR;
extern const int DUPLICATE_COLUMN;
}
size_t getNumberOfDimensions(const IDataType & type)
{
if (const auto * type_array = typeid_cast<const DataTypeArray *>(&type))
return type_array->getNumberOfDimensions();
return 0;
}
size_t getNumberOfDimensions(const IColumn & column)
{
if (const auto * column_array = checkAndGetColumn<ColumnArray>(column))
return column_array->getNumberOfDimensions();
return 0;
}
DataTypePtr getBaseTypeOfArray(const DataTypePtr & type)
{
/// Get raw pointers to avoid extra copying of type pointers.
const DataTypeArray * last_array = nullptr;
const auto * current_type = type.get();
while (const auto * type_array = typeid_cast<const DataTypeArray *>(current_type))
{
current_type = type_array->getNestedType().get();
last_array = type_array;
}
return last_array ? last_array->getNestedType() : type;
}
ColumnPtr getBaseColumnOfArray(const ColumnPtr & column)
{
/// Get raw pointers to avoid extra copying of column pointers.
const ColumnArray * last_array = nullptr;
const auto * current_column = column.get();
while (const auto * column_array = checkAndGetColumn<ColumnArray>(current_column))
{
current_column = &column_array->getData();
last_array = column_array;
}
return last_array ? last_array->getDataPtr() : column;
}
DataTypePtr createArrayOfType(DataTypePtr type, size_t num_dimensions)
{
for (size_t i = 0; i < num_dimensions; ++i)
type = std::make_shared<DataTypeArray>(std::move(type));
return type;
}
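/// E.g. createArrayOfType(UInt64, 2) produces the type Array(Array(UInt64)).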
ColumnPtr createArrayOfColumn(ColumnPtr column, size_t num_dimensions)
{
for (size_t i = 0; i < num_dimensions; ++i)
column = ColumnArray::create(column);
return column;
}
Array createEmptyArrayField(size_t num_dimensions)
{
if (num_dimensions == 0)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot create array field with 0 dimensions");
Array array;
Array * current_array = &array;
for (size_t i = 1; i < num_dimensions; ++i)
{
current_array->push_back(Array());
current_array = &current_array->back().get<Array &>();
}
return array;
}
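/// E.g. createEmptyArrayField(1) is [] and createEmptyArrayField(3) is [[[]]]:
/// every outer level holds a single element and the innermost array is empty.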
DataTypePtr getDataTypeByColumn(const IColumn & column)
{
auto idx = column.getDataType();
if (WhichDataType(idx).isSimple())
return DataTypeFactory::instance().get(String(magic_enum::enum_name(idx)));
if (const auto * column_array = checkAndGetColumn<ColumnArray>(&column))
return std::make_shared<DataTypeArray>(getDataTypeByColumn(column_array->getData()));
if (const auto * column_nullable = checkAndGetColumn<ColumnNullable>(&column))
return makeNullable(getDataTypeByColumn(column_nullable->getNestedColumn()));
/// TODO: add more types.
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot get data type of column {}", column.getFamilyName());
}
template <size_t I, typename Tuple>
static auto extractVector(const std::vector<Tuple> & vec)
{
static_assert(I < std::tuple_size_v<Tuple>);
std::vector<std::tuple_element_t<I, Tuple>> res;
res.reserve(vec.size());
for (const auto & elem : vec)
res.emplace_back(std::get<I>(elem));
return res;
}
void convertObjectsToTuples(NamesAndTypesList & columns_list, Block & block, const NamesAndTypesList & extended_storage_columns)
{
std::unordered_map<String, DataTypePtr> storage_columns_map;
for (const auto & [name, type] : extended_storage_columns)
storage_columns_map[name] = type;
for (auto & name_type : columns_list)
{
if (!isObject(name_type.type))
continue;
auto & column = block.getByName(name_type.name);
if (!isObject(column.type))
throw Exception(ErrorCodes::TYPE_MISMATCH,
"Type mismatch for column '{}' between columns list and block. In list: {}, in block: {}",
name_type.name, name_type.type->getName(), column.type->getName());
const auto & column_object = assert_cast<const ColumnObject &>(*column.column);
const auto & subcolumns = column_object.getSubcolumns();
if (!column_object.isFinalized())
throw Exception(ErrorCodes::LOGICAL_ERROR,
"Cannot convert column '{}' of type {} to Tuple: the column must be finalized first",
name_type.name, name_type.type->getName());
PathsInData tuple_paths;
DataTypes tuple_types;
Columns tuple_columns;
for (const auto & entry : subcolumns)
{
tuple_paths.emplace_back(entry->path);
tuple_types.emplace_back(entry->data.getLeastCommonType());
tuple_columns.emplace_back(entry->data.getFinalizedColumnPtr());
}
auto it = storage_columns_map.find(name_type.name);
if (it == storage_columns_map.end())
throw Exception(ErrorCodes::LOGICAL_ERROR, "Column '{}' not found in storage", name_type.name);
std::tie(column.column, column.type) = unflattenTuple(tuple_paths, tuple_types, tuple_columns);
name_type.type = column.type;
/// Check that constructed Tuple type and type in storage are compatible.
getLeastCommonTypeForObject({column.type, it->second}, true);
}
}
static bool isPrefix(const PathInData::Parts & prefix, const PathInData::Parts & parts)
{
if (prefix.size() > parts.size())
return false;
for (size_t i = 0; i < prefix.size(); ++i)
if (prefix[i].key != parts[i].key)
return false;
return true;
}
void checkObjectHasNoAmbiguousPaths(const PathsInData & paths)
{
size_t size = paths.size();
for (size_t i = 0; i < size; ++i)
{
for (size_t j = 0; j < i; ++j)
{
if (isPrefix(paths[i].getParts(), paths[j].getParts())
|| isPrefix(paths[j].getParts(), paths[i].getParts()))
throw Exception(ErrorCodes::DUPLICATE_COLUMN,
"Data in Object has ambiguous paths: '{}' and '{}'",
paths[i].getPath(), paths[j].getPath());
}
}
}
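/// E.g. paths "a.b" and "a.b.c" are ambiguous, because one is a prefix of the
/// other, while "a.b" and "a.c" can coexist.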
DataTypePtr getLeastCommonTypeForObject(const DataTypes & types, bool check_ambiguous_paths)
{
if (types.empty())
return nullptr;
bool all_equal = true;
for (size_t i = 1; i < types.size(); ++i)
{
if (!types[i]->equals(*types[0]))
{
all_equal = false;
break;
}
}
if (all_equal)
return types[0];
/// Types of subcolumns by path from all tuples.
std::unordered_map<PathInData, DataTypes, PathInData::Hash> subcolumns_types;
/// First we flatten tuples, then get common type for paths
/// and finally unflatten paths and create new tuple type.
for (const auto & type : types)
{
const auto * type_tuple = typeid_cast<const DataTypeTuple *>(type.get());
if (!type_tuple)
throw Exception(ErrorCodes::LOGICAL_ERROR,
"Least common type for object can be deduced only from tuples, but {} given", type->getName());
auto [tuple_paths, tuple_types] = flattenTuple(type);
assert(tuple_paths.size() == tuple_types.size());
for (size_t i = 0; i < tuple_paths.size(); ++i)
subcolumns_types[tuple_paths[i]].push_back(tuple_types[i]);
}
PathsInData tuple_paths;
DataTypes tuple_types;
/// Get the least common type for all paths.
for (const auto & [key, subtypes] : subcolumns_types)
{
assert(!subtypes.empty());
if (key.getPath() == ColumnObject::COLUMN_NAME_DUMMY)
continue;
size_t first_dim = getNumberOfDimensions(*subtypes[0]);
for (size_t i = 1; i < subtypes.size(); ++i)
if (first_dim != getNumberOfDimensions(*subtypes[i]))
throw Exception(ErrorCodes::TYPE_MISMATCH,
"Uncompatible types of subcolumn '{}': {} and {}",
key.getPath(), subtypes[0]->getName(), subtypes[i]->getName());
tuple_paths.emplace_back(key);
tuple_types.emplace_back(getLeastSupertype(subtypes, /*allow_conversion_to_string=*/ true));
}
if (tuple_paths.empty())
{
tuple_paths.emplace_back(ColumnObject::COLUMN_NAME_DUMMY);
tuple_types.emplace_back(std::make_shared<DataTypeUInt8>());
}
if (check_ambiguous_paths)
checkObjectHasNoAmbiguousPaths(tuple_paths);
return unflattenTuple(tuple_paths, tuple_types);
}
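/// E.g. for Tuple(a Int8) and Tuple(a Int64, b String) the result is
/// Tuple(a Int64, b String): the union of paths is taken and each path
/// gets the least supertype of its element types (with String fallback).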
NameSet getNamesOfObjectColumns(const NamesAndTypesList & columns_list)
{
NameSet res;
for (const auto & [name, type] : columns_list)
if (isObject(type))
res.insert(name);
return res;
}
bool hasObjectColumns(const ColumnsDescription & columns)
{
return std::any_of(columns.begin(), columns.end(), [](const auto & column) { return isObject(column.type); });
}
void extendObjectColumns(NamesAndTypesList & columns_list, const ColumnsDescription & object_columns, bool with_subcolumns)
{
NamesAndTypesList subcolumns_list;
for (auto & column : columns_list)
{
auto object_column = object_columns.tryGetColumn(GetColumnsOptions::All, column.name);
if (object_column)
{
column.type = object_column->type;
if (with_subcolumns)
subcolumns_list.splice(subcolumns_list.end(), object_columns.getSubcolumns(column.name));
}
}
columns_list.splice(columns_list.end(), std::move(subcolumns_list));
}
void updateObjectColumns(ColumnsDescription & object_columns, const NamesAndTypesList & new_columns)
{
for (const auto & new_column : new_columns)
{
auto object_column = object_columns.tryGetColumn(GetColumnsOptions::All, new_column.name);
if (object_column && !object_column->type->equals(*new_column.type))
{
object_columns.modify(new_column.name, [&](auto & column)
{
column.type = getLeastCommonTypeForObject({object_column->type, new_column.type});
});
}
}
}
namespace
{
void flattenTupleImpl(
PathInDataBuilder & builder,
DataTypePtr type,
std::vector<PathInData::Parts> & new_paths,
DataTypes & new_types)
{
if (const auto * type_tuple = typeid_cast<const DataTypeTuple *>(type.get()))
{
const auto & tuple_names = type_tuple->getElementNames();
const auto & tuple_types = type_tuple->getElements();
for (size_t i = 0; i < tuple_names.size(); ++i)
{
builder.append(tuple_names[i], false);
flattenTupleImpl(builder, tuple_types[i], new_paths, new_types);
builder.popBack();
}
}
else if (const auto * type_array = typeid_cast<const DataTypeArray *>(type.get()))
{
PathInDataBuilder element_builder;
std::vector<PathInData::Parts> element_paths;
DataTypes element_types;
flattenTupleImpl(element_builder, type_array->getNestedType(), element_paths, element_types);
assert(element_paths.size() == element_types.size());
for (size_t i = 0; i < element_paths.size(); ++i)
{
builder.append(element_paths[i], true);
new_paths.emplace_back(builder.getParts());
new_types.emplace_back(std::make_shared<DataTypeArray>(element_types[i]));
builder.popBack(element_paths[i].size());
}
}
else
{
new_paths.emplace_back(builder.getParts());
new_types.emplace_back(type);
}
}
/// @offsets_columns is used as a stack of array offsets and allows recreating Array columns.
void flattenTupleImpl(const ColumnPtr & column, Columns & new_columns, Columns & offsets_columns)
{
if (const auto * column_tuple = checkAndGetColumn<ColumnTuple>(column.get()))
{
const auto & subcolumns = column_tuple->getColumns();
for (const auto & subcolumn : subcolumns)
flattenTupleImpl(subcolumn, new_columns, offsets_columns);
}
else if (const auto * column_array = checkAndGetColumn<ColumnArray>(column.get()))
{
offsets_columns.push_back(column_array->getOffsetsPtr());
flattenTupleImpl(column_array->getDataPtr(), new_columns, offsets_columns);
offsets_columns.pop_back();
}
else
{
if (!offsets_columns.empty())
{
auto new_column = ColumnArray::create(column, offsets_columns.back());
for (auto it = offsets_columns.rbegin() + 1; it != offsets_columns.rend(); ++it)
new_column = ColumnArray::create(new_column, *it);
new_columns.push_back(std::move(new_column));
}
else
{
new_columns.push_back(column);
}
}
}
DataTypePtr reduceNumberOfDimensions(DataTypePtr type, size_t dimensions_to_reduce)
{
while (dimensions_to_reduce--)
{
const auto * type_array = typeid_cast<const DataTypeArray *>(type.get());
if (!type_array)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Not enough dimensions to reduce");
type = type_array->getNestedType();
}
return type;
}
ColumnPtr reduceNumberOfDimensions(ColumnPtr column, size_t dimensions_to_reduce)
{
while (dimensions_to_reduce--)
{
const auto * column_array = typeid_cast<const ColumnArray *>(column.get());
if (!column_array)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Not enough dimensions to reduce");
column = column_array->getDataPtr();
}
return column;
}
/// We save the intermediate column, type and number of array
/// dimensions for each intermediate node of a path in the subcolumns tree.
struct ColumnWithTypeAndDimensions
{
ColumnPtr column;
DataTypePtr type;
size_t array_dimensions;
};
using SubcolumnsTreeWithColumns = SubcolumnsTree<ColumnWithTypeAndDimensions>;
using Node = SubcolumnsTreeWithColumns::Node;
/// Creates data type and column from tree of subcolumns.
ColumnWithTypeAndDimensions createTypeFromNode(const Node * node)
{
auto collect_tuple_elements = [](const auto & children)
{
std::vector<std::tuple<String, ColumnWithTypeAndDimensions>> tuple_elements;
tuple_elements.reserve(children.size());
for (const auto & [name, child] : children)
{
auto column = createTypeFromNode(child.get());
tuple_elements.emplace_back(name, std::move(column));
}
/// Sort to always create the same type for the same set of subcolumns.
std::sort(tuple_elements.begin(), tuple_elements.end(),
[](const auto & lhs, const auto & rhs) { return std::get<0>(lhs) < std::get<0>(rhs); });
auto tuple_names = extractVector<0>(tuple_elements);
auto tuple_columns = extractVector<1>(tuple_elements);
return std::make_tuple(std::move(tuple_names), std::move(tuple_columns));
};
if (node->kind == Node::SCALAR)
{
return node->data;
}
else if (node->kind == Node::NESTED)
{
auto [tuple_names, tuple_columns] = collect_tuple_elements(node->children);
Columns offsets_columns;
offsets_columns.reserve(tuple_columns[0].array_dimensions + 1);
/// If we have a Nested node and a child node with anonymous array levels,
/// we need to push the Nested type through all array levels.
/// Example: { "k1": [[{"k2": 1, "k3": 2}]] } should be parsed as
/// `k1 Array(Nested(k2 Int, k3 Int))`, where k1 is marked as Nested
/// and `k2` and `k3` have anonymous_array_level = 1.
const auto & current_array = assert_cast<const ColumnArray &>(*node->data.column);
offsets_columns.push_back(current_array.getOffsetsPtr());
auto first_column = tuple_columns[0].column;
for (size_t i = 0; i < tuple_columns[0].array_dimensions; ++i)
{
const auto & column_array = assert_cast<const ColumnArray &>(*first_column);
offsets_columns.push_back(column_array.getOffsetsPtr());
first_column = column_array.getDataPtr();
}
size_t num_elements = tuple_columns.size();
Columns tuple_elements_columns(num_elements);
DataTypes tuple_elements_types(num_elements);
/// Reduce extra array dimensions to get columns and types of Nested elements.
for (size_t i = 0; i < num_elements; ++i)
{
assert(tuple_columns[i].array_dimensions == tuple_columns[0].array_dimensions);
tuple_elements_columns[i] = reduceNumberOfDimensions(tuple_columns[i].column, tuple_columns[i].array_dimensions);
tuple_elements_types[i] = reduceNumberOfDimensions(tuple_columns[i].type, tuple_columns[i].array_dimensions);
}
auto result_column = ColumnArray::create(ColumnTuple::create(tuple_elements_columns), offsets_columns.back());
auto result_type = createNested(tuple_elements_types, tuple_names);
/// Recreate result Array type and Array column.
for (auto it = offsets_columns.rbegin() + 1; it != offsets_columns.rend(); ++it)
{
result_column = ColumnArray::create(result_column, *it);
result_type = std::make_shared<DataTypeArray>(result_type);
}
return {result_column, result_type, tuple_columns[0].array_dimensions};
}
else
{
auto [tuple_names, tuple_columns] = collect_tuple_elements(node->children);
size_t num_elements = tuple_columns.size();
Columns tuple_elements_columns(num_elements);
DataTypes tuple_elements_types(num_elements);
for (size_t i = 0; i < tuple_columns.size(); ++i)
{
assert(tuple_columns[i].array_dimensions == tuple_columns[0].array_dimensions);
tuple_elements_columns[i] = tuple_columns[i].column;
tuple_elements_types[i] = tuple_columns[i].type;
}
auto result_column = ColumnTuple::create(tuple_elements_columns);
auto result_type = std::make_shared<DataTypeTuple>(tuple_elements_types, tuple_names);
return {result_column, result_type, tuple_columns[0].array_dimensions};
}
}
}
std::pair<PathsInData, DataTypes> flattenTuple(const DataTypePtr & type)
{
std::vector<PathInData::Parts> new_path_parts;
DataTypes new_types;
PathInDataBuilder builder;
flattenTupleImpl(builder, type, new_path_parts, new_types);
PathsInData new_paths(new_path_parts.begin(), new_path_parts.end());
return {new_paths, new_types};
}
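/// E.g. Tuple(a Array(Tuple(b UInt32)), c String) flattens to paths
/// ["a.b", "c"] with types [Array(UInt32), String].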
ColumnPtr flattenTuple(const ColumnPtr & column)
{
Columns new_columns;
Columns offsets_columns;
flattenTupleImpl(column, new_columns, offsets_columns);
return ColumnTuple::create(new_columns);
}
DataTypePtr unflattenTuple(const PathsInData & paths, const DataTypes & tuple_types)
{
assert(paths.size() == tuple_types.size());
Columns tuple_columns;
tuple_columns.reserve(tuple_types.size());
for (const auto & type : tuple_types)
tuple_columns.emplace_back(type->createColumn());
return unflattenTuple(paths, tuple_types, tuple_columns).second;
}
std::pair<ColumnPtr, DataTypePtr> unflattenTuple(
const PathsInData & paths,
const DataTypes & tuple_types,
const Columns & tuple_columns)
{
assert(paths.size() == tuple_types.size());
assert(paths.size() == tuple_columns.size());
/// We add all paths to the subcolumn tree and then create a type from it.
/// The tree stores column, type and number of array dimensions
/// for each intermediate node.
SubcolumnsTreeWithColumns tree;
for (size_t i = 0; i < paths.size(); ++i)
{
auto column = tuple_columns[i];
auto type = tuple_types[i];
const auto & parts = paths[i].getParts();
size_t num_parts = parts.size();
size_t pos = 0;
tree.add(paths[i], [&](Node::Kind kind, bool exists) -> std::shared_ptr<Node>
{
if (pos >= num_parts)
throw Exception(ErrorCodes::LOGICAL_ERROR,
"Not enough name parts for path {}. Expected at least {}, got {}",
paths[i].getPath(), pos + 1, num_parts);
size_t array_dimensions = kind == Node::NESTED ? 1 : parts[pos].anonymous_array_level;
ColumnWithTypeAndDimensions current_column{column, type, array_dimensions};
/// Get type and column for next node.
if (array_dimensions)
{
type = reduceNumberOfDimensions(type, array_dimensions);
column = reduceNumberOfDimensions(column, array_dimensions);
}
++pos;
if (exists)
return nullptr;
return kind == Node::SCALAR
? std::make_shared<Node>(kind, current_column, paths[i])
: std::make_shared<Node>(kind, current_column);
});
}
auto [column, type, _] = createTypeFromNode(tree.getRoot());
return std::make_pair(std::move(column), std::move(type));
}
static void addConstantToWithClause(const ASTPtr & query, const String & column_name, const DataTypePtr & data_type)
{
auto & select = query->as<ASTSelectQuery &>();
if (!select.with())
select.setExpression(ASTSelectQuery::Expression::WITH, std::make_shared<ASTExpressionList>());
/// TODO: avoid materialize
auto node = makeASTFunction("materialize",
makeASTFunction("CAST",
std::make_shared<ASTLiteral>(data_type->getDefault()),
std::make_shared<ASTLiteral>(data_type->getName())));
node->alias = column_name;
node->prefer_alias_to_column_name = true;
select.with()->children.push_back(std::move(node));
}
/// @expected_columns and @available_columns contain descriptions
/// of extended Object columns.
void replaceMissedSubcolumnsByConstants(
const ColumnsDescription & expected_columns,
const ColumnsDescription & available_columns,
ASTPtr query)
{
NamesAndTypes missed_names_types;
/// Find all subcolumns that are in @expected_columns, but not in @available_columns.
for (const auto & column : available_columns)
{
auto expected_column = expected_columns.getColumn(GetColumnsOptions::All, column.name);
/// Extract all paths from both descriptions to easily check existence of subcolumns.
auto [available_paths, available_types] = flattenTuple(column.type);
auto [expected_paths, expected_types] = flattenTuple(expected_column.type);
auto extract_names_and_types = [&column](const auto & paths, const auto & types)
{
NamesAndTypes res;
res.reserve(paths.size());
for (size_t i = 0; i < paths.size(); ++i)
{
auto full_name = Nested::concatenateName(column.name, paths[i].getPath());
res.emplace_back(full_name, types[i]);
}
std::sort(res.begin(), res.end());
return res;
};
auto available_names_types = extract_names_and_types(available_paths, available_types);
auto expected_names_types = extract_names_and_types(expected_paths, expected_types);
std::set_difference(
expected_names_types.begin(), expected_names_types.end(),
available_names_types.begin(), available_names_types.end(),
std::back_inserter(missed_names_types),
[](const auto & lhs, const auto & rhs) { return lhs.name < rhs.name; });
}
if (missed_names_types.empty())
return;
IdentifierNameSet identifiers;
query->collectIdentifierNames(identifiers);
/// Replace missed subcolumns with default literals of their types.
for (const auto & [name, type] : missed_names_types)
if (identifiers.count(name))
addConstantToWithClause(query, name, type);
}
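/// E.g. if the query references subcolumn "json.k2" of type String that is
/// missing in @available_columns, the clause
///     WITH materialize(CAST('', 'String')) AS `json.k2`
/// is added, so the query still sees a default-valued column.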
void finalizeObjectColumns(MutableColumns & columns)
{
for (auto & column : columns)
if (auto * column_object = typeid_cast<ColumnObject *>(column.get()))
column_object->finalize();
}
}

src/DataTypes/ObjectUtils.h
View File

@ -0,0 +1,140 @@
#pragma once
#include <Core/Block.h>
#include <Core/NamesAndTypes.h>
#include <Common/FieldVisitors.h>
#include <Storages/ColumnsDescription.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/Serializations/JSONDataParser.h>
#include <DataTypes/DataTypesNumber.h>
#include <Columns/ColumnObject.h>
namespace DB
{
/// Returns number of dimensions in Array type. 0 if type is not array.
size_t getNumberOfDimensions(const IDataType & type);
/// Returns number of dimensions in Array column. 0 if column is not array.
size_t getNumberOfDimensions(const IColumn & column);
/// Returns type of scalars of Array of arbitrary dimensions.
DataTypePtr getBaseTypeOfArray(const DataTypePtr & type);
/// Returns Array type with requested scalar type and number of dimensions.
DataTypePtr createArrayOfType(DataTypePtr type, size_t num_dimensions);
/// Returns column of scalars of Array of arbitrary dimensions.
ColumnPtr getBaseColumnOfArray(const ColumnPtr & column);
/// Returns empty Array column with requested scalar column and number of dimensions.
ColumnPtr createArrayOfColumn(ColumnPtr column, size_t num_dimensions);
/// Returns Array with requested number of dimensions and no scalars.
Array createEmptyArrayField(size_t num_dimensions);
/// Tries to get data type by column. Only a limited subset of types is supported.
DataTypePtr getDataTypeByColumn(const IColumn & column);
/// Converts Object types and columns to Tuples in @columns_list and @block
/// and checks that types are consistent with types in @extended_storage_columns.
void convertObjectsToTuples(NamesAndTypesList & columns_list, Block & block, const NamesAndTypesList & extended_storage_columns);
/// Checks that each path is not the prefix of any other path.
void checkObjectHasNoAmbiguousPaths(const PathsInData & paths);
/// Receives several Tuple types and deduces the least common type among them.
DataTypePtr getLeastCommonTypeForObject(const DataTypes & types, bool check_ambiguous_paths = false);
/// Converts types of object columns to tuples in @columns_list
/// according to @object_columns and adds all tuple's subcolumns if needed.
void extendObjectColumns(NamesAndTypesList & columns_list, const ColumnsDescription & object_columns, bool with_subcolumns);
NameSet getNamesOfObjectColumns(const NamesAndTypesList & columns_list);
bool hasObjectColumns(const ColumnsDescription & columns);
void finalizeObjectColumns(MutableColumns & columns);
/// Updates types of Object columns in @object_columns in-place
/// according to the types in @new_columns.
void updateObjectColumns(ColumnsDescription & object_columns, const NamesAndTypesList & new_columns);
using DataTypeTuplePtr = std::shared_ptr<DataTypeTuple>;
/// Flattens a nested Tuple to a plain Tuple, i.e. extracts all paths and types from the tuple.
/// E.g. Tuple(t Tuple(c1 UInt32, c2 String), c3 UInt64) -> Tuple(t.c1 UInt32, t.c2 String, c3 UInt64)
std::pair<PathsInData, DataTypes> flattenTuple(const DataTypePtr & type);
/// Flattens nested Tuple column to plain Tuple column.
ColumnPtr flattenTuple(const ColumnPtr & column);
/// The reverse operation to 'flattenTuple'.
/// Creates nested Tuple from all paths and types.
/// E.g. Tuple(t.c1 UInt32, t.c2 String, c3 UInt64) -> Tuple(t Tuple(c1 UInt32, c2 String), c3 UInt64)
DataTypePtr unflattenTuple(
const PathsInData & paths,
const DataTypes & tuple_types);
std::pair<ColumnPtr, DataTypePtr> unflattenTuple(
const PathsInData & paths,
const DataTypes & tuple_types,
const Columns & tuple_columns);
/// For every column that exists in @expected_columns but
/// not in @available_columns, adds to the WITH clause
/// an alias binding the column name to a literal with the default value of the column's type.
void replaceMissedSubcolumnsByConstants(
const ColumnsDescription & expected_columns,
const ColumnsDescription & available_columns,
ASTPtr query);
/// Receives a range of entries, each containing a collection
/// of column-like objects (e.g. ColumnsDescription or NamesAndTypesList),
/// and deduces the common types of Object columns across all entries.
/// @entry_columns_getter should extract a reference to the collection of
/// column-like objects from the entry the Iterator points to.
/// A column-like object must have the fields "name" and "type".
template <typename Iterator, typename EntryColumnsGetter>
ColumnsDescription getObjectColumns(
Iterator begin, Iterator end,
const ColumnsDescription & storage_columns,
EntryColumnsGetter && entry_columns_getter)
{
ColumnsDescription res;
if (begin == end)
{
for (const auto & column : storage_columns)
{
if (isObject(column.type))
{
auto tuple_type = std::make_shared<DataTypeTuple>(
DataTypes{std::make_shared<DataTypeUInt8>()},
Names{ColumnObject::COLUMN_NAME_DUMMY});
res.add({column.name, std::move(tuple_type)});
}
}
return res;
}
std::unordered_map<String, DataTypes> types_in_entries;
for (auto it = begin; it != end; ++it)
{
const auto & entry_columns = entry_columns_getter(*it);
for (const auto & column : entry_columns)
{
auto storage_column = storage_columns.tryGetPhysical(column.name);
if (storage_column && isObject(storage_column->type))
types_in_entries[column.name].push_back(column.type);
}
}
for (const auto & [name, types] : types_in_entries)
res.add({name, getLeastCommonTypeForObject(types)});
return res;
}
}
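A sketch of how the helper above might be called (hypothetical: per_part_columns stands for column lists collected from data parts elsewhere; the identity getter matches the NamesAndTypesList case):

#include <DataTypes/ObjectUtils.h>
#include <vector>

DB::ColumnsDescription deduceObjectColumns(
    const std::vector<DB::NamesAndTypesList> & per_part_columns,
    const DB::ColumnsDescription & storage_columns)
{
    /// Each entry already is a collection of column-like objects,
    /// so the getter simply returns it by reference.
    return DB::getObjectColumns(
        per_part_columns.begin(), per_part_columns.end(),
        storage_columns,
        [](const auto & entry) -> const auto & { return entry; });
}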

View File

@ -0,0 +1,3 @@
if (ENABLE_TESTS)
add_subdirectory (tests)
endif ()

View File

@ -172,6 +172,10 @@ String getNameForSubstreamPath(
else
stream_name += "." + it->tuple_element_name;
}
else if (it->type == Substream::ObjectElement)
{
stream_name += escapeForFileName(".") + escapeForFileName(it->object_key_name);
}
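/// E.g. for column "json" and Object key "a" this appends "%2E" and "a",
/// giving the stream name "json%2Ea" (the dot is escaped for file names).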
}
return stream_name;

View File

@ -125,6 +125,9 @@ public:
SparseElements,
SparseOffsets,
ObjectStructure,
ObjectElement,
Regular,
};
@ -133,6 +136,9 @@ public:
/// Index of the tuple element (starting at 1), or its name.
String tuple_element_name;
/// Name of subcolumn of object column.
String object_key_name;
/// Do we need to escape a dot in filenames for tuple elements.
bool escape_tuple_delimiter = true;

View File

@ -0,0 +1,183 @@
#pragma once
#include <IO/ReadHelpers.h>
#include <Common/HashTable/HashMap.h>
#include <Common/checkStackSize.h>
#include <DataTypes/Serializations/PathInData.h>
namespace DB
{
namespace ErrorCodes
{
extern const int LOGICAL_ERROR;
}
class ReadBuffer;
class WriteBuffer;
template <typename Element>
static Field getValueAsField(const Element & element)
{
if (element.isBool()) return element.getBool();
if (element.isInt64()) return element.getInt64();
if (element.isUInt64()) return element.getUInt64();
if (element.isDouble()) return element.getDouble();
if (element.isString()) return element.getString();
if (element.isNull()) return Field();
throw Exception(ErrorCodes::LOGICAL_ERROR, "Unsupported type of JSON field");
}
template <typename ParserImpl>
class JSONDataParser
{
public:
using Element = typename ParserImpl::Element;
static void readJSON(String & s, ReadBuffer & buf)
{
readJSONObjectPossiblyInvalid(s, buf);
}
std::optional<ParseResult> parse(const char * begin, size_t length)
{
std::string_view json{begin, length};
Element document;
if (!parser.parse(json, document))
return {};
ParseResult result;
PathInDataBuilder builder;
std::vector<PathInData::Parts> paths;
traverse(document, builder, paths, result.values);
result.paths.reserve(paths.size());
for (auto && path : paths)
result.paths.emplace_back(std::move(path));
return result;
}
private:
void traverse(
const Element & element,
PathInDataBuilder & builder,
std::vector<PathInData::Parts> & paths,
std::vector<Field> & values)
{
checkStackSize();
if (element.isObject())
{
auto object = element.getObject();
paths.reserve(paths.size() + object.size());
values.reserve(values.size() + object.size());
for (auto it = object.begin(); it != object.end(); ++it)
{
const auto & [key, value] = *it;
traverse(value, builder.append(key, false), paths, values);
builder.popBack();
}
}
else if (element.isArray())
{
auto array = element.getArray();
using PathPartsWithArray = std::pair<PathInData::Parts, Array>;
using PathToArray = HashMapWithStackMemory<UInt128, PathPartsWithArray, UInt128TrivialHash, 5>;
/// Traverse the elements of the array and collect an array
/// of fields for each path.
PathToArray arrays_by_path;
Arena strings_pool;
size_t current_size = 0;
for (auto it = array.begin(); it != array.end(); ++it)
{
std::vector<PathInData::Parts> element_paths;
std::vector<Field> element_values;
PathInDataBuilder element_builder;
traverse(*it, element_builder, element_paths, element_values);
size_t size = element_paths.size();
size_t keys_to_update = arrays_by_path.size();
for (size_t i = 0; i < size; ++i)
{
UInt128 hash = PathInData::getPartsHash(element_paths[i]);
if (auto * found = arrays_by_path.find(hash))
{
auto & path_array = found->getMapped().second;
assert(path_array.size() == current_size);
path_array.push_back(std::move(element_values[i]));
--keys_to_update;
}
else
{
/// We found a new key. Add an empty array with the current size.
Array path_array;
path_array.reserve(array.size());
path_array.resize(current_size);
path_array.push_back(std::move(element_values[i]));
auto & elem = arrays_by_path[hash];
elem.first = std::move(element_paths[i]);
elem.second = std::move(path_array);
}
}
/// If some of the keys are missing in the current element,
/// add default values for them.
if (keys_to_update)
{
for (auto & [_, value] : arrays_by_path)
{
auto & path_array = value.second;
assert(path_array.size() == current_size || path_array.size() == current_size + 1);
if (path_array.size() == current_size)
path_array.push_back(Field());
}
}
++current_size;
}
if (arrays_by_path.empty())
{
paths.push_back(builder.getParts());
values.push_back(Array());
}
else
{
paths.reserve(paths.size() + arrays_by_path.size());
values.reserve(values.size() + arrays_by_path.size());
for (auto && [_, value] : arrays_by_path)
{
auto && [path, path_array] = value;
/// Merge prefix path and path of array element.
paths.push_back(builder.append(path, true).getParts());
values.push_back(std::move(path_array));
builder.popBack(path.size());
}
}
}
else
{
paths.push_back(builder.getParts());
values.push_back(getValueAsField(element));
}
}
ParserImpl parser;
};
}
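A usage sketch of the parser (assumptions: the SimdJSON-based parser header path below may differ between builds, and the paths/values noted in comments follow from the traversal above; verify against the actual tree):

#include <Common/JSONParsers/SimdJSONParser.h>
#include <DataTypes/Serializations/JSONDataParser.h>
#include <string>

void example()
{
    DB::JSONDataParser<DB::SimdJSONParser> parser;
    std::string json = R"({"k1": 1, "k2": {"k3": "v"}, "k4": [1, 2]})";
    auto result = parser.parse(json.data(), json.size());
    if (result)
    {
        /// result->paths holds "k1", "k2.k3" and "k4";
        /// result->values holds 1, "v" and the array [1, 2].
    }
}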

Some files were not shown because too many files have changed in this diff.