Merge branch 'master' into distributedmultiplejoin

Commit e5a5ab2a40

.gitattributes (vendored): 1 change
@@ -1,2 +1,3 @@
contrib/* linguist-vendored
*.h linguist-language=C++
tests/queries/0_stateless/data_json/* binary
.github/workflows/release.yml (vendored): 2 changes

@@ -32,7 +32,7 @@ jobs:
      uses: svenstaro/upload-release-action@v2
      with:
        repo_token: ${{ secrets.GITHUB_TOKEN }}
        file: ${{runner.temp}}/release_packages/*
        file: ${{runner.temp}}/push_to_artifactory/*
        overwrite: true
        tag: ${{ github.ref }}
        file_glob: true
CHANGELOG.md: 138 changes

@@ -1,4 +1,138 @@
### ClickHouse release v22.2, 2022-02-17
### Table of Contents
**[ClickHouse release v22.3-lts, 2022-03-17](#223)**<br>
**[ClickHouse release v22.2, 2022-02-17](#222)**<br>
**[ClickHouse release v22.1, 2022-01-18](#221)**<br>
**[Changelog for 2021](https://github.com/ClickHouse/ClickHouse/blob/master/docs/en/whats-new/changelog/2021.md)**<br>

## <a id="223"></a> ClickHouse release v22.3-lts, 2022-03-17

#### Backward Incompatible Change
* Make `arrayCompact` function behave as other higher-order functions: perform compaction not of lambda function results but on the original array. If you're using nontrivial lambda functions in `arrayCompact` you may restore the old behaviour by wrapping the `arrayCompact` arguments into `arrayMap`. Closes [#34010](https://github.com/ClickHouse/ClickHouse/issues/34010) [#18535](https://github.com/ClickHouse/ClickHouse/issues/18535) [#14778](https://github.com/ClickHouse/ClickHouse/issues/14778). [#34795](https://github.com/ClickHouse/ClickHouse/pull/34795) ([Alexandre Snarskii](https://github.com/snar)).
* Change implementation-specific behavior on overflow of the function `toDateTime`. It will be saturated to the nearest min/max supported instant of datetime instead of wrapping around. This change is highlighted as "backward incompatible" because someone may unintentionally rely on the old behavior. [#32898](https://github.com/ClickHouse/ClickHouse/pull/32898) ([HaiBo Li](https://github.com/marising)).
#### New Feature

* Support for caching data locally for remote filesystems. It can be enabled for `s3` disks. Closes [#28961](https://github.com/ClickHouse/ClickHouse/issues/28961). [#33717](https://github.com/ClickHouse/ClickHouse/pull/33717) ([Kseniia Sumarokova](https://github.com/kssenii)). In the meantime, we have enabled the test suite on the s3 filesystem and no more known issues exist, so it is starting to become production ready.
* Add new table function `hive`. It can be used as follows: `hive('<hive metastore url>', '<hive database>', '<hive table name>', '<columns definition>', '<partition columns>')`, for example `SELECT * FROM hive('thrift://hivetest:9083', 'test', 'demo', 'id Nullable(String), score Nullable(Int32), day Nullable(String)', 'day')`. [#34946](https://github.com/ClickHouse/ClickHouse/pull/34946) ([lgbo](https://github.com/lgbo-ustc)).
* Support authentication of users connected via SSL by their X.509 certificate. [#31484](https://github.com/ClickHouse/ClickHouse/pull/31484) ([eungenue](https://github.com/eungenue)).
* Support schema inference for inserting into table functions `file`/`hdfs`/`s3`/`url`. [#34732](https://github.com/ClickHouse/ClickHouse/pull/34732) ([Kruglov Pavel](https://github.com/Avogar)).
* Now you can read the `system.zookeeper` table without restrictions on path, or using a `like` expression. These reads can generate quite a heavy load for ZooKeeper, so to enable this ability you have to enable the setting `allow_unrestricted_reads_from_keeper`. [#34609](https://github.com/ClickHouse/ClickHouse/pull/34609) ([Sergei Trifonov](https://github.com/serxa)).
* Display CPU and memory metrics in clickhouse-local. Closes [#34545](https://github.com/ClickHouse/ClickHouse/issues/34545). [#34605](https://github.com/ClickHouse/ClickHouse/pull/34605) ([李扬](https://github.com/taiyang-li)).
* Implement `startsWith` and `endsWith` functions for arrays, closes [#33982](https://github.com/ClickHouse/ClickHouse/issues/33982). [#34368](https://github.com/ClickHouse/ClickHouse/pull/34368) ([usurai](https://github.com/usurai)).
* Add three functions for the Map data type: 1. `mapReplace(map1, map2)` - replaces values for keys in map1 with the values of the corresponding keys in map2; adds keys from map2 that don't exist in map1. 2. `mapFilter` 3. `mapMap`. `mapFilter` and `mapMap` are higher-order functions accepting two arguments: the first argument is a lambda function with a (k, v) pair as arguments, the second argument is a column of type Map. [#33698](https://github.com/ClickHouse/ClickHouse/pull/33698) ([hexiaoting](https://github.com/hexiaoting)).
* Allow getting the default user and password for clickhouse-client from the `CLICKHOUSE_USER` and `CLICKHOUSE_PASSWORD` environment variables. Closes [#34538](https://github.com/ClickHouse/ClickHouse/issues/34538). [#34947](https://github.com/ClickHouse/ClickHouse/pull/34947) ([DR](https://github.com/freedomDR)).
#### Experimental Feature

* New data type `Object(<schema_format>)`, which supports storing semi-structured data (for now, JSON only). Data is written to such a type as a string. Then all paths are extracted according to the format of the semi-structured data and written as separate columns in the most optimal types that can store all their values. Those columns can be queried by names that match paths in the source data, e.g. `data.key1.key2`, or with the cast operator `data.key1.key2::Int64`.
* Add `database_replicated_allow_only_replicated_engine` setting. When enabled, only `Replicated` tables or tables with stateless engines can be created in `Replicated` databases. [#35214](https://github.com/ClickHouse/ClickHouse/pull/35214) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). Note that the `Replicated` database is still an experimental feature.
#### Performance Improvement

* Improve performance of insertion into `MergeTree` tables by optimizing sorting. Up to 2x improvement is observed on realistic benchmarks. [#34750](https://github.com/ClickHouse/ClickHouse/pull/34750) ([Maksim Kita](https://github.com/kitaisreal)).
* Column pruning when reading Parquet, ORC and Arrow files from URL and S3. Closes [#34163](https://github.com/ClickHouse/ClickHouse/issues/34163). [#34849](https://github.com/ClickHouse/ClickHouse/pull/34849) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Column pruning when reading Parquet, ORC and Arrow files from Hive. [#34954](https://github.com/ClickHouse/ClickHouse/pull/34954) ([lgbo](https://github.com/lgbo-ustc)).
* A bunch of performance optimizations from a performance superhero. Improve performance of processing queries with a large `IN` section. Improve performance of the `direct` dictionary if its source is `ClickHouse`. Improve performance of the `detectCharset` and `detectLanguageUnknown` functions. [#34888](https://github.com/ClickHouse/ClickHouse/pull/34888) ([Maksim Kita](https://github.com/kitaisreal)).
* Improve performance of the `any` aggregate function by using more batching. [#34760](https://github.com/ClickHouse/ClickHouse/pull/34760) ([Raúl Marín](https://github.com/Algunenano)).
* Multiple improvements for performance of `clickhouse-keeper`: less locking [#35010](https://github.com/ClickHouse/ClickHouse/pull/35010) ([zhanglistar](https://github.com/zhanglistar)), lower memory usage by streaming reading and writing of snapshots instead of a full copy [#34584](https://github.com/ClickHouse/ClickHouse/pull/34584) ([zhanglistar](https://github.com/zhanglistar)), optimizing compaction of the log store in the RAFT implementation [#34534](https://github.com/ClickHouse/ClickHouse/pull/34534) ([zhanglistar](https://github.com/zhanglistar)), versioning of the internal data structure [#34486](https://github.com/ClickHouse/ClickHouse/pull/34486) ([zhanglistar](https://github.com/zhanglistar)).
#### Improvement

* Allow asynchronous inserts to table functions. Fixes [#34864](https://github.com/ClickHouse/ClickHouse/issues/34864). [#34866](https://github.com/ClickHouse/ClickHouse/pull/34866) ([Anton Popov](https://github.com/CurtizJ)).
* Implicit type casting of the key argument for functions `dictGetHierarchy`, `dictIsIn`, `dictGetChildren`, `dictGetDescendants`. Closes [#34970](https://github.com/ClickHouse/ClickHouse/issues/34970). [#35027](https://github.com/ClickHouse/ClickHouse/pull/35027) ([Maksim Kita](https://github.com/kitaisreal)).
* `EXPLAIN AST` query can output the AST in the form of a graph in Graphviz format: `EXPLAIN AST graph = 1 SELECT * FROM system.parts`. [#35173](https://github.com/ClickHouse/ClickHouse/pull/35173) ([李扬](https://github.com/taiyang-li)).
* When large files were written with the `s3` table function or table engine, the content type on the files was mistakenly set to `application/xml` due to a bug in the AWS SDK. This closes [#33964](https://github.com/ClickHouse/ClickHouse/issues/33964). [#34433](https://github.com/ClickHouse/ClickHouse/pull/34433) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Change restrictive row policies a bit to make them an easier alternative to permissive policies in easy cases. If for a particular table only restrictive policies exist (without permissive policies), users will be able to see some rows. Also `SHOW CREATE ROW POLICY` will always show `AS permissive` or `AS restrictive` in the row policy's definition. [#34596](https://github.com/ClickHouse/ClickHouse/pull/34596) ([Vitaly Baranov](https://github.com/vitlibar)).
* Improve schema inference with globs in File/S3/HDFS/URL engines. Try to use the next path for schema inference in case of error. [#34465](https://github.com/ClickHouse/ClickHouse/pull/34465) ([Kruglov Pavel](https://github.com/Avogar)).
* Play UI now correctly detects the preferred light/dark theme from the OS. [#35068](https://github.com/ClickHouse/ClickHouse/pull/35068) ([peledni](https://github.com/peledni)).
* Added `date_time_input_format = 'best_effort_us'`. Closes [#34799](https://github.com/ClickHouse/ClickHouse/issues/34799). [#34982](https://github.com/ClickHouse/ClickHouse/pull/34982) ([WenYao](https://github.com/Cai-Yao)).
* New settings called `allow_plaintext_password` and `allow_no_password` are added in the server configuration which turn on/off authentication types that can be potentially insecure in some environments. They are allowed by default. [#34738](https://github.com/ClickHouse/ClickHouse/pull/34738) ([Heena Bansal](https://github.com/HeenaBansal2009)).
* Support for the `DateTime64` data type in the `Arrow` format, closes [#8280](https://github.com/ClickHouse/ClickHouse/issues/8280) and closes [#28574](https://github.com/ClickHouse/ClickHouse/issues/28574). [#34561](https://github.com/ClickHouse/ClickHouse/pull/34561) ([李扬](https://github.com/taiyang-li)).
* Reload `remote_url_allow_hosts` (filtering of outgoing connections) on config update. [#35294](https://github.com/ClickHouse/ClickHouse/pull/35294) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Support the `--testmode` parameter for `clickhouse-local`. This parameter enables interpretation of test hints that we use in functional tests. [#35264](https://github.com/ClickHouse/ClickHouse/pull/35264) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Add `distributed_depth` to the query log. It is like a more detailed variant of `is_initial_query`. [#35207](https://github.com/ClickHouse/ClickHouse/pull/35207) ([李扬](https://github.com/taiyang-li)).
* Respect `remote_url_allow_hosts` for the `MySQL` and `PostgreSQL` table functions. [#35191](https://github.com/ClickHouse/ClickHouse/pull/35191) ([Heena Bansal](https://github.com/HeenaBansal2009)).
* Added a `disk_name` field to `system.part_log`. [#35178](https://github.com/ClickHouse/ClickHouse/pull/35178) ([Artyom Yurkov](https://github.com/Varinara)).
* Do not retry non-retriable errors when querying remote URLs. Closes [#35161](https://github.com/ClickHouse/ClickHouse/issues/35161). [#35172](https://github.com/ClickHouse/ClickHouse/pull/35172) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Support distributed INSERT SELECT queries (the setting `parallel_distributed_insert_select`) with the table function `view()`. [#35132](https://github.com/ClickHouse/ClickHouse/pull/35132) ([Azat Khuzhin](https://github.com/azat)).
* More precise memory tracking during `INSERT` into `Buffer` with `AggregateFunction`. [#35072](https://github.com/ClickHouse/ClickHouse/pull/35072) ([Azat Khuzhin](https://github.com/azat)).
* Avoid division by zero in Query Profiler if the Linux kernel has a bug. Closes [#34787](https://github.com/ClickHouse/ClickHouse/issues/34787). [#35032](https://github.com/ClickHouse/ClickHouse/pull/35032) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add more sanity checks for keeper configuration: now mixing of localhost and non-local servers is not allowed; also add checks for the same value of the internal raft port and the keeper client port. [#35004](https://github.com/ClickHouse/ClickHouse/pull/35004) ([alesapin](https://github.com/alesapin)).
* Previously, if the user changed the settings of the system tables, there would be tons of logs and ClickHouse would rename the tables every minute. This fixes [#34929](https://github.com/ClickHouse/ClickHouse/issues/34929). [#34949](https://github.com/ClickHouse/ClickHouse/pull/34949) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Use a connection pool for the Hive metastore client. [#34940](https://github.com/ClickHouse/ClickHouse/pull/34940) ([lgbo](https://github.com/lgbo-ustc)).
* Ignore per-column `TTL` in `CREATE TABLE AS` if the new table engine does not support it (i.e. if the engine is not of the `MergeTree` family). [#34938](https://github.com/ClickHouse/ClickHouse/pull/34938) ([Azat Khuzhin](https://github.com/azat)).
* Allow `LowCardinality` strings for `ngrambf_v1`/`tokenbf_v1` indexes. Closes [#21865](https://github.com/ClickHouse/ClickHouse/issues/21865). [#34911](https://github.com/ClickHouse/ClickHouse/pull/34911) ([Lars Hiller Eidnes](https://github.com/larspars)).
* Allow opening an empty sqlite db if the file doesn't exist. Closes [#33367](https://github.com/ClickHouse/ClickHouse/issues/33367). [#34907](https://github.com/ClickHouse/ClickHouse/pull/34907) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Implement memory statistics for FreeBSD - this is required for `max_server_memory_usage` to work correctly. [#34902](https://github.com/ClickHouse/ClickHouse/pull/34902) ([Alexandre Snarskii](https://github.com/snar)).
* In previous versions the progress bar in clickhouse-client could jump forward near 50% for no reason. This closes [#34324](https://github.com/ClickHouse/ClickHouse/issues/34324). [#34801](https://github.com/ClickHouse/ClickHouse/pull/34801) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Now `ALTER TABLE DROP COLUMN columnX` queries for `MergeTree` table engines will work instantly when `columnX` is an `ALIAS` column. Fixes [#34660](https://github.com/ClickHouse/ClickHouse/issues/34660). [#34786](https://github.com/ClickHouse/ClickHouse/pull/34786) ([alesapin](https://github.com/alesapin)).
* Show hints when the user mistypes the name of a data skipping index. Closes [#29698](https://github.com/ClickHouse/ClickHouse/issues/29698). [#34764](https://github.com/ClickHouse/ClickHouse/pull/34764) ([flynn](https://github.com/ucasfl)).
* Support the `remote()`/`cluster()` table functions for `parallel_distributed_insert_select`. [#34728](https://github.com/ClickHouse/ClickHouse/pull/34728) ([Azat Khuzhin](https://github.com/azat)).
* Do not reset logging that is configured via the `--log-file`/`--errorlog-file` command line options in case of an empty configuration in the config file. [#34718](https://github.com/ClickHouse/ClickHouse/pull/34718) ([Amos Bird](https://github.com/amosbird)).
* Extract the schema only once on table creation and prevent reading from local files/external sources to extract the schema on each server startup. [#34684](https://github.com/ClickHouse/ClickHouse/pull/34684) ([Kruglov Pavel](https://github.com/Avogar)).
* Allow specifying argument names for executable UDFs. This is necessary for formats where the argument name is part of the serialization, like `Native`, `JSONEachRow`. Closes [#34604](https://github.com/ClickHouse/ClickHouse/issues/34604). [#34653](https://github.com/ClickHouse/ClickHouse/pull/34653) ([Maksim Kita](https://github.com/kitaisreal)).
* `MaterializedMySQL` (experimental feature) now supports `materialized_mysql_tables_list` (a comma-separated list of MySQL database tables which will be replicated by the MaterializedMySQL database engine; default value: empty list, which means all the tables will be replicated), mentioned at [#32977](https://github.com/ClickHouse/ClickHouse/issues/32977). [#34487](https://github.com/ClickHouse/ClickHouse/pull/34487) ([zzsmdfj](https://github.com/zzsmdfj)).
* Improve OpenTelemetry span logs for INSERT operations on distributed tables. [#34480](https://github.com/ClickHouse/ClickHouse/pull/34480) ([Frank Chen](https://github.com/FrankChen021)).
* Make the znode `ctime` and `mtime` consistent between servers in ClickHouse Keeper. [#33441](https://github.com/ClickHouse/ClickHouse/pull/33441) ([小路](https://github.com/nicelulu)).
#### Build/Testing/Packaging Improvement

* The package repository is migrated to JFrog Artifactory (**Mikhail f. Shiryaev**).
* Randomize some settings in functional tests, so more possible combinations of settings will be tested. This is yet another fuzzing method to ensure better test coverage. This closes [#32268](https://github.com/ClickHouse/ClickHouse/issues/32268). [#34092](https://github.com/ClickHouse/ClickHouse/pull/34092) ([Kruglov Pavel](https://github.com/Avogar)).
* Drop PVS-Studio from our CI. [#34680](https://github.com/ClickHouse/ClickHouse/pull/34680) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Add an ability to build stripped binaries with CMake. In previous versions it was performed by dh-tools. [#35196](https://github.com/ClickHouse/ClickHouse/pull/35196) ([alesapin](https://github.com/alesapin)).
* Smaller "fat-free" `clickhouse-keeper` build. [#35031](https://github.com/ClickHouse/ClickHouse/pull/35031) ([alesapin](https://github.com/alesapin)).
* Use @robot-clickhouse as an author and committer for PRs like https://github.com/ClickHouse/ClickHouse/pull/34685. [#34793](https://github.com/ClickHouse/ClickHouse/pull/34793) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Limit the DWARF version for debug info to 4 at most, because our internal stack symbolizer cannot parse DWARF version 5. This makes sense if you compile ClickHouse with clang-15. [#34777](https://github.com/ClickHouse/ClickHouse/pull/34777) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove the `clickhouse-test` debian package as an unneeded complication. CI uses tests from the repository and standalone testing via the deb package is no longer supported. [#34606](https://github.com/ClickHouse/ClickHouse/pull/34606) ([Ilya Yatsishin](https://github.com/qoega)).
#### Bug Fix (user-visible misbehaviour in official stable or prestable release)

* A fix for the HDFS integration: when the inner buffer size is too small, NEED_MORE_INPUT in `HadoopSnappyDecoder` will occur multiple times (>=3) for one compressed block, which makes the input data be copied into the wrong place in `HadoopSnappyDecoder::buffer`. [#35116](https://github.com/ClickHouse/ClickHouse/pull/35116) ([lgbo](https://github.com/lgbo-ustc)).
* Ignore obsolete grants in ATTACH GRANT statements. This PR fixes [#34815](https://github.com/ClickHouse/ClickHouse/issues/34815). [#34855](https://github.com/ClickHouse/ClickHouse/pull/34855) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix segfault in the Postgres database when getting the create table query if the database was created using named collections. Closes [#35312](https://github.com/ClickHouse/ClickHouse/issues/35312). [#35313](https://github.com/ClickHouse/ClickHouse/pull/35313) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix partial merge join duplicate rows bug, closes [#31009](https://github.com/ClickHouse/ClickHouse/issues/31009). [#35311](https://github.com/ClickHouse/ClickHouse/pull/35311) ([Vladimir C](https://github.com/vdimir)).
* Fix possible `Assertion 'position() != working_buffer.end()' failed` while using bzip2 compression with a small `max_read_buffer_size` setting value. The bug was found in https://github.com/ClickHouse/ClickHouse/pull/35047. [#35300](https://github.com/ClickHouse/ClickHouse/pull/35300) ([Kruglov Pavel](https://github.com/Avogar)). The same issue is fixed for lz4 compression [#35296](https://github.com/ClickHouse/ClickHouse/pull/35296), lzma compression [#35295](https://github.com/ClickHouse/ClickHouse/pull/35295) and `brotli` compression [#35281](https://github.com/ClickHouse/ClickHouse/pull/35281) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix possible segfault in `JSONEachRow` schema inference. [#35291](https://github.com/ClickHouse/ClickHouse/pull/35291) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix `CHECK TABLE` query in the case when sparse columns are enabled in the table. [#35274](https://github.com/ClickHouse/ClickHouse/pull/35274) ([Anton Popov](https://github.com/CurtizJ)).
* Avoid std::terminate in case of an exception while reading from remote VFS. [#35257](https://github.com/ClickHouse/ClickHouse/pull/35257) ([Azat Khuzhin](https://github.com/azat)).
* Fix reading the port from config, closes [#34776](https://github.com/ClickHouse/ClickHouse/issues/34776). [#35193](https://github.com/ClickHouse/ClickHouse/pull/35193) ([Vladimir C](https://github.com/vdimir)).
* Fix an error in queries with `WITH TOTALS` in the case when `HAVING` returned an empty result. This fixes [#33711](https://github.com/ClickHouse/ClickHouse/issues/33711). [#35186](https://github.com/ClickHouse/ClickHouse/pull/35186) ([Amos Bird](https://github.com/amosbird)).
* Fix a corner case of `replaceRegexpAll`, closes [#35117](https://github.com/ClickHouse/ClickHouse/issues/35117). [#35182](https://github.com/ClickHouse/ClickHouse/pull/35182) ([Vladimir C](https://github.com/vdimir)).
* Schema inference didn't work properly in the case of `INSERT INTO FUNCTION s3(...) FROM ...`: it tried to read the schema from the s3 file instead of from the select query. [#35176](https://github.com/ClickHouse/ClickHouse/pull/35176) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix MaterializedPostgreSQL (experimental feature) `table overrides` for partition by, etc. Closes [#35048](https://github.com/ClickHouse/ClickHouse/issues/35048). [#35162](https://github.com/ClickHouse/ClickHouse/pull/35162) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix MaterializedPostgreSQL (experimental feature) adding a new table to replication (ATTACH TABLE) after manually removing it (DETACH TABLE). Closes [#33800](https://github.com/ClickHouse/ClickHouse/issues/33800). Closes [#34922](https://github.com/ClickHouse/ClickHouse/issues/34922). Closes [#34315](https://github.com/ClickHouse/ClickHouse/issues/34315). [#35158](https://github.com/ClickHouse/ClickHouse/pull/35158) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix a partition pruning error when a non-monotonic function is used with the IN operator. This fixes [#35136](https://github.com/ClickHouse/ClickHouse/issues/35136). [#35146](https://github.com/ClickHouse/ClickHouse/pull/35146) ([Amos Bird](https://github.com/amosbird)).
* Fixed slightly incorrect translation of YAML configs to XML. [#35135](https://github.com/ClickHouse/ClickHouse/pull/35135) ([Miel Donkers](https://github.com/mdonkers)).
* Fix `optimize_skip_unused_shards_rewrite_in` for signed columns and negative values. [#35134](https://github.com/ClickHouse/ClickHouse/pull/35134) ([Azat Khuzhin](https://github.com/azat)).
* The `update_lag` external dictionary configuration option was unusable, showing the error message ``Unexpected key `update_lag` in dictionary source configuration``. [#35089](https://github.com/ClickHouse/ClickHouse/pull/35089) ([Jason Chu](https://github.com/1lann)).
* Avoid a possible deadlock on server shutdown. [#35081](https://github.com/ClickHouse/ClickHouse/pull/35081) ([Azat Khuzhin](https://github.com/azat)).
* Fix a missing alias after a function is optimized to a subcolumn when the setting `optimize_functions_to_subcolumns` is enabled. Closes [#33798](https://github.com/ClickHouse/ClickHouse/issues/33798). [#35079](https://github.com/ClickHouse/ClickHouse/pull/35079) ([qieqieplus](https://github.com/qieqieplus)).
* Fix reading from the `system.asynchronous_inserts` table if there exists an asynchronous insert into a table function. [#35050](https://github.com/ClickHouse/ClickHouse/pull/35050) ([Anton Popov](https://github.com/CurtizJ)).
* Fix possible exception `Reading for MergeTree family tables must be done with last position boundary` (relevant to operation on remote VFS). Closes [#34979](https://github.com/ClickHouse/ClickHouse/issues/34979). [#35001](https://github.com/ClickHouse/ClickHouse/pull/35001) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix an unexpected result when using a -State type aggregate function in a window frame. [#34999](https://github.com/ClickHouse/ClickHouse/pull/34999) ([metahys](https://github.com/metahys)).
* Fix a possible segfault in FileLog (experimental feature). Closes [#30749](https://github.com/ClickHouse/ClickHouse/issues/30749). [#34996](https://github.com/ClickHouse/ClickHouse/pull/34996) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix a possible rare error `Cannot push block to port which already has data`. [#34993](https://github.com/ClickHouse/ClickHouse/pull/34993) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix wrong schema inference for unquoted dates in CSV. Closes [#34768](https://github.com/ClickHouse/ClickHouse/issues/34768). [#34961](https://github.com/ClickHouse/ClickHouse/pull/34961) ([Kruglov Pavel](https://github.com/Avogar)).
* Integration with Hive: fix an unexpected result when using `in` in `where` in a Hive query. [#34945](https://github.com/ClickHouse/ClickHouse/pull/34945) ([lgbo](https://github.com/lgbo-ustc)).
* Avoid busy polling in ClickHouse Keeper while searching for changelog files to delete. [#34931](https://github.com/ClickHouse/ClickHouse/pull/34931) ([Azat Khuzhin](https://github.com/azat)).
* Fix DateTime64 conversion from PostgreSQL. Closes [#33364](https://github.com/ClickHouse/ClickHouse/issues/33364). [#34910](https://github.com/ClickHouse/ClickHouse/pull/34910) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix possible "Part directory doesn't exist" during `INSERT` into a MergeTree table backed by VFS over s3. [#34876](https://github.com/ClickHouse/ClickHouse/pull/34876) ([Azat Khuzhin](https://github.com/azat)).
* Support DDLs like CREATE USER to be executed on a cross replicated cluster. [#34860](https://github.com/ClickHouse/ClickHouse/pull/34860) ([Jianmei Zhang](https://github.com/zhangjmruc)).
* Fix bugs for multiple-column GROUP BY in `WindowView` (experimental feature). [#34859](https://github.com/ClickHouse/ClickHouse/pull/34859) ([vxider](https://github.com/Vxider)).
* Fix possible failures in S2 functions when queries contain const columns. [#34745](https://github.com/ClickHouse/ClickHouse/pull/34745) ([Bharat Nallan](https://github.com/bharatnc)).
* Fix a bug for H3 functions containing const columns which caused queries to fail. [#34743](https://github.com/ClickHouse/ClickHouse/pull/34743) ([Bharat Nallan](https://github.com/bharatnc)).
* Fix `No such file or directory` with enabled `fsync_part_directory` and vertical merge. [#34739](https://github.com/ClickHouse/ClickHouse/pull/34739) ([Azat Khuzhin](https://github.com/azat)).
* Fix serialization/printing for system queries `RELOAD MODEL`, `RELOAD FUNCTION`, `RESTART DISK` when used with `ON CLUSTER`. Closes [#34514](https://github.com/ClickHouse/ClickHouse/issues/34514). [#34696](https://github.com/ClickHouse/ClickHouse/pull/34696) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix `allow_experimental_projection_optimization` with `enable_global_with_statement` (before, it could lead to a `Stack size too large` error in case of multiple expressions in the `WITH` clause, and also it executed scalar subqueries again and again, so now it will be more optimal). [#34650](https://github.com/ClickHouse/ClickHouse/pull/34650) ([Azat Khuzhin](https://github.com/azat)).
* Stop selecting a part for mutation when the other replica has already updated the transaction log for the `ReplicatedMergeTree` engine. [#34633](https://github.com/ClickHouse/ClickHouse/pull/34633) ([Jianmei Zhang](https://github.com/zhangjmruc)).
* Fix an incorrect result of a trivial count query when the part movement feature is used [#34089](https://github.com/ClickHouse/ClickHouse/issues/34089). [#34385](https://github.com/ClickHouse/ClickHouse/pull/34385) ([nvartolomei](https://github.com/nvartolomei)).
* Fix inconsistency of the `max_query_size` limitation in distributed subqueries. [#34078](https://github.com/ClickHouse/ClickHouse/pull/34078) ([Chao Ma](https://github.com/godliness)).
### <a id="222"></a> ClickHouse release v22.2, 2022-02-17
|
||||
|
||||
#### Upgrade Notes
|
||||
|
||||
@ -174,7 +308,7 @@
|
||||
* This PR allows using multiple LDAP storages in the same list of user directories. It worked earlier but was broken because LDAP tests are disabled (they are part of the testflows tests). [#33574](https://github.com/ClickHouse/ClickHouse/pull/33574) ([Vitaly Baranov](https://github.com/vitlibar)).
|
||||
|
||||
|
||||
### ClickHouse release v22.1, 2022-01-18
|
||||
### <a id="221"></a> ClickHouse release v22.1, 2022-01-18
|
||||
|
||||
#### Upgrade Notes
|
||||
|
||||
|
@@ -15,7 +15,7 @@ The following versions of ClickHouse server are currently being supported with s
| 20.x | :x: |
| 21.1 | :x: |
| 21.2 | :x: |
| 21.3 | ✅ |
| 21.3 | :x: |
| 21.4 | :x: |
| 21.5 | :x: |
| 21.6 | :x: |
@@ -23,9 +23,11 @@ The following versions of ClickHouse server are currently being supported with s
| 21.8 | ✅ |
| 21.9 | :x: |
| 21.10 | :x: |
| 21.11 | ✅ |
| 21.12 | ✅ |
| 21.11 | :x: |
| 21.12 | :x: |
| 22.1 | ✅ |
| 22.2 | ✅ |
| 22.3 | ✅ |

## Reporting a Vulnerability
@@ -2,11 +2,11 @@

# NOTE: has nothing common with DBMS_TCP_PROTOCOL_VERSION,
# only DBMS_TCP_PROTOCOL_VERSION should be incremented on protocol changes.
SET(VERSION_REVISION 54460)
SET(VERSION_REVISION 54461)
SET(VERSION_MAJOR 22)
SET(VERSION_MINOR 3)
SET(VERSION_MINOR 4)
SET(VERSION_PATCH 1)
SET(VERSION_GITHASH 75366fc95e510b7ac76759ef670702ae5f488a51)
SET(VERSION_DESCRIBE v22.3.1.1-testing)
SET(VERSION_STRING 22.3.1.1)
SET(VERSION_GITHASH 92ab33f560e638d1989c5ca543021ab53d110f5c)
SET(VERSION_DESCRIBE v22.4.1.1-testing)
SET(VERSION_STRING 22.4.1.1)
# end of autochange
@@ -229,6 +229,25 @@ As simple code editors, you can use Sublime Text or Visual Studio Code, or Kate

Just in case, it is worth mentioning that CLion creates a `build` path on its own, it also selects `debug` for the build type on its own, for configuration it uses a version of CMake that is defined in CLion and not the one installed by you, and finally, CLion will use `make` to run build tasks instead of `ninja`. This is normal behaviour; just keep that in mind to avoid confusion.

## Debugging

Many graphical IDEs offer an integrated debugger but you can also use a standalone debugger.

### GDB
### LLDB

``` bash
# tell LLDB where to find the source code
settings set target.source-map /path/to/build/dir /path/to/source/dir

# configure LLDB to display code before/after currently executing line
settings set stop-line-count-before 10
settings set stop-line-count-after 10

target create ./clickhouse-client
# <set breakpoints here>
process launch -- --query="SELECT * FROM TAB"
```
## Writing Code {#writing-code}

The description of ClickHouse architecture can be found here: https://clickhouse.com/docs/en/development/architecture/
@@ -5,30 +5,19 @@ toc_title: Playground

# ClickHouse Playground {#clickhouse-playground}

!!! warning "Warning"
    This service is deprecated and will be replaced in the foreseeable future.

[ClickHouse Playground](https://play.clickhouse.com) allows people to experiment with ClickHouse by running queries instantly, without setting up their server or cluster.
Several example datasets are available in Playground as well as sample queries that show ClickHouse features. There’s also a selection of ClickHouse LTS releases to experiment with.
[ClickHouse Playground](https://play.clickhouse.com/play?user=play) allows people to experiment with ClickHouse by running queries instantly, without setting up their server or cluster.
Several example datasets are available in Playground.

You can make queries to Playground using any HTTP client, for example [curl](https://curl.haxx.se) or [wget](https://www.gnu.org/software/wget/), or set up a connection using [JDBC](../interfaces/jdbc.md) or [ODBC](../interfaces/odbc.md) drivers. More information about software products that support ClickHouse is available [here](../interfaces/index.md).
## Credentials {#credentials}

| Parameter | Value |
|:--------------------|:----------------------------------------|
| HTTPS endpoint | `https://play-api.clickhouse.com:8443` |
| Native TCP endpoint | `play-api.clickhouse.com:9440` |
| User | `playground` |
| Password | `clickhouse` |

There are additional endpoints with specific ClickHouse releases to experiment with their differences (ports and user/password are the same as above):

- 20.3 LTS: `play-api-v20-3.clickhouse.com`
- 19.14 LTS: `play-api-v19-14.clickhouse.com`

!!! note "Note"
    All these endpoints require a secure TLS connection.

| Parameter | Value |
|:--------------------|:-----------------------------------|
| HTTPS endpoint | `https://play.clickhouse.com:443/` |
| Native TCP endpoint | `play.clickhouse.com:9440` |
| User | `explorer` or `play` |
| Password | (empty) |
## Limitations {#limitations}

@@ -37,23 +26,18 @@ The queries are executed as a read-only user. It implies some limitations:
- DDL queries are not allowed
- INSERT queries are not allowed

The following settings are also enforced:

- [max_result_bytes=10485760](../operations/settings/query-complexity/#max-result-bytes)
- [max_result_rows=2000](../operations/settings/query-complexity/#setting-max_result_rows)
- [result_overflow_mode=break](../operations/settings/query-complexity/#result-overflow-mode)
- [max_execution_time=60000](../operations/settings/query-complexity/#max-execution-time)
The service also has quotas on its usage.
## Examples {#examples}

HTTPS endpoint example with `curl`:

``` bash
curl "https://play-api.clickhouse.com:8443/?query=SELECT+'Play+ClickHouse\!';&user=playground&password=clickhouse&database=datasets"
curl "https://play.clickhouse.com/?user=explorer" --data-binary "SELECT 'Play ClickHouse'"
```

TCP endpoint example with [CLI](../interfaces/cli.md):

``` bash
clickhouse client --secure -h play-api.clickhouse.com --port 9440 -u playground --password clickhouse -q "SELECT 'Play ClickHouse\!'"
clickhouse client --secure --host play.clickhouse.com --user explorer
```
@@ -51,6 +51,7 @@ The supported formats are:
| [PrettySpace](#prettyspace) | ✗ | ✔ |
| [Protobuf](#protobuf) | ✔ | ✔ |
| [ProtobufSingle](#protobufsingle) | ✔ | ✔ |
| [ProtobufList](#protobuflist) | ✔ | ✔ |
| [Avro](#data-format-avro) | ✔ | ✔ |
| [AvroConfluent](#data-format-avro-confluent) | ✔ | ✗ |
| [Parquet](#data-format-parquet) | ✔ | ✔ |
@@ -1230,7 +1231,38 @@ See also [how to read/write length-delimited protobuf messages in popular langua

## ProtobufSingle {#protobufsingle}

Same as [Protobuf](#protobuf) but for storing/parsing single Protobuf message without length delimiters.
Same as [Protobuf](#protobuf) but for storing/parsing a single Protobuf message without length delimiter.
As a result, only a single table row can be written/read.
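For illustration, a query of the same shape as the `Protobuf` examples elsewhere on this page should apply; the table name and the format schema below are placeholders:

``` sql
SELECT * FROM test.table FORMAT ProtobufSingle SETTINGS format_schema = 'schemafile:MessageType'
```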
## ProtobufList {#protobuflist}

Similar to Protobuf but rows are represented as a sequence of sub-messages contained in a message with fixed name "Envelope".

Usage example:

``` sql
SELECT * FROM test.table FORMAT ProtobufList SETTINGS format_schema = 'schemafile:MessageType'
```

``` bash
cat protobuflist_messages.bin | clickhouse-client --query "INSERT INTO test.table FORMAT ProtobufList SETTINGS format_schema='schemafile:MessageType'"
```

where the file `schemafile.proto` looks like this:
``` capnp
syntax = "proto3";

message Envelope {
  message MessageType {
    string name = 1;
    string surname = 2;
    uint32 birthDate = 3;
    repeated string phoneNumbers = 4;
  };
  MessageType row = 1;
};
```

## Avro {#data-format-avro}
@@ -1364,7 +1396,8 @@ The table below shows supported data types and how they match ClickHouse [data t
| `FLOAT`, `HALF_FLOAT` | [Float32](../sql-reference/data-types/float.md) | `FLOAT` |
| `DOUBLE` | [Float64](../sql-reference/data-types/float.md) | `DOUBLE` |
| `DATE32` | [Date](../sql-reference/data-types/date.md) | `UINT16` |
| `DATE64`, `TIMESTAMP` | [DateTime](../sql-reference/data-types/datetime.md) | `UINT32` |
| `DATE64` | [DateTime](../sql-reference/data-types/datetime.md) | `UINT32` |
| `TIMESTAMP` | [DateTime64](../sql-reference/data-types/datetime64.md) | `TIMESTAMP` |
| `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `BINARY` |
| — | [FixedString](../sql-reference/data-types/fixedstring.md) | `BINARY` |
| `DECIMAL` | [Decimal](../sql-reference/data-types/decimal.md) | `DECIMAL` |
@@ -1421,7 +1454,8 @@ The table below shows supported data types and how they match ClickHouse [data t
| `FLOAT`, `HALF_FLOAT` | [Float32](../sql-reference/data-types/float.md) | `FLOAT32` |
| `DOUBLE` | [Float64](../sql-reference/data-types/float.md) | `FLOAT64` |
| `DATE32` | [Date](../sql-reference/data-types/date.md) | `UINT16` |
| `DATE64`, `TIMESTAMP` | [DateTime](../sql-reference/data-types/datetime.md) | `UINT32` |
| `DATE64` | [DateTime](../sql-reference/data-types/datetime.md) | `UINT32` |
| `TIMESTAMP` | [DateTime64](../sql-reference/data-types/datetime64.md) | `TIMESTAMP` |
| `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `BINARY` |
| `STRING`, `BINARY` | [FixedString](../sql-reference/data-types/fixedstring.md) | `BINARY` |
| `DECIMAL` | [Decimal](../sql-reference/data-types/decimal.md) | `DECIMAL` |
@@ -1483,7 +1517,8 @@ The table below shows supported data types and how they match ClickHouse [data t
| `FLOAT`, `HALF_FLOAT` | [Float32](../sql-reference/data-types/float.md) | `FLOAT` |
| `DOUBLE` | [Float64](../sql-reference/data-types/float.md) | `DOUBLE` |
| `DATE32` | [Date](../sql-reference/data-types/date.md) | `DATE32` |
| `DATE64`, `TIMESTAMP` | [DateTime](../sql-reference/data-types/datetime.md) | `TIMESTAMP` |
| `DATE64` | [DateTime](../sql-reference/data-types/datetime.md) | `UINT32` |
| `TIMESTAMP` | [DateTime64](../sql-reference/data-types/datetime64.md) | `TIMESTAMP` |
| `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `BINARY` |
| `DECIMAL` | [Decimal](../sql-reference/data-types/decimal.md) | `DECIMAL` |
| `LIST` | [Array](../sql-reference/data-types/array.md) | `LIST` |
@@ -55,7 +55,7 @@ Internal coordination settings are located in `<keeper_server>.<coordination_set
- `auto_forwarding` — Allow to forward write requests from followers to the leader (default: true).
- `shutdown_timeout` — Wait to finish internal connections and shutdown (ms) (default: 5000).
- `startup_timeout` — If the server doesn't connect to other quorum participants in the specified timeout it will terminate (ms) (default: 30000).
- `four_letter_word_white_list` — White list of 4lw commands (default: "conf,cons,crst,envi,ruok,srst,srvr,stat,wchc,wchs,dirs,mntr,isro").
- `four_letter_word_allow_list` — Allow list of 4lw commands (default: "conf,cons,crst,envi,ruok,srst,srvr,stat,wchs,dirs,mntr,isro").

Quorum configuration is located in the `<keeper_server>.<raft_configuration>` section and contains a description of the servers.
@@ -121,7 +121,7 @@ clickhouse keeper --config /etc/your_path_to_config/config.xml

ClickHouse Keeper also provides 4lw commands which are almost the same as in ZooKeeper. Each command is composed of four letters such as `mntr`, `stat` etc. There are some more interesting commands: `stat` gives some general information about the server and connected clients, while `srvr` and `cons` give extended details on server and connections respectively.

The 4lw commands have a white list configuration `four_letter_word_white_list` which has default value "conf,cons,crst,envi,ruok,srst,srvr,stat,wchc,wchs,dirs,mntr,isro".
The 4lw commands have an allow list configuration `four_letter_word_allow_list` which has default value "conf,cons,crst,envi,ruok,srst,srvr,stat,wchs,dirs,mntr,isro".

You can issue the commands to ClickHouse Keeper via telnet or nc, at the client port.
@@ -201,7 +201,7 @@ Server stats reset.
```
server_id=1
tcp_port=2181
four_letter_word_white_list=*
four_letter_word_allow_list=*
log_storage_path=./coordination/logs
snapshot_storage_path=./coordination/snapshots
max_requests_batch_size=100
@@ -3290,6 +3290,19 @@ Possible values:

Default value: `16`.

## max_insert_delayed_streams_for_parallel_write {#max-insert-delayed-streams-for-parallel-write}

The maximum number of streams (columns) to delay the final part flush.

It makes a difference only if the underlying storage supports parallel write (i.e. S3); otherwise it will not give any benefit.

Possible values:

- Positive integer.
- 0 or 1 — Disabled.

Default value: `1000` for S3 and `0` otherwise.
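As an illustration only (the table names below are placeholders, not from the original page), the setting can be overridden for a single query with a `SETTINGS` clause:

``` sql
INSERT INTO wide_table_on_s3
SELECT * FROM source_table
SETTINGS max_insert_delayed_streams_for_parallel_write = 100
```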
## opentelemetry_start_trace_probability {#opentelemetry-start-trace-probability}

Sets the probability that ClickHouse can start a trace for executed queries (if no parent [trace context](https://www.w3.org/TR/trace-context/) is supplied).
@@ -225,15 +225,15 @@ This storage method works the same way as hashed and allows using date/time (arb

Example: The table contains discounts for each advertiser in the format:

``` text
+---------|-------------|-------------|------+
+---------------|---------------------|-------------------|--------+
| advertiser id | discount start date | discount end date | amount |
+===============+=====================+===================+========+
| 123 | 2015-01-01 | 2015-01-15 | 0.15 |
+---------|-------------|-------------|------+
+---------------|---------------------|-------------------|--------+
| 123 | 2015-01-16 | 2015-01-31 | 0.25 |
+---------|-------------|-------------|------+
+---------------|---------------------|-------------------|--------+
| 456 | 2015-01-01 | 2015-01-15 | 0.05 |
+---------|-------------|-------------|------+
+---------------|---------------------|-------------------|--------+
```

To use a sample for date ranges, define the `range_min` and `range_max` elements in the [structure](../../../sql-reference/dictionaries/external-dictionaries/external-dicts-dict-structure.md). These elements must contain elements `name` and `type` (if `type` is not specified, the default type will be used - Date). `type` can be any numeric type (Date / DateTime / UInt64 / Int32 / others).
@@ -272,10 +272,10 @@ LAYOUT(RANGE_HASHED())
RANGE(MIN first MAX last)
```

To work with these dictionaries, you need to pass an additional argument to the `dictGetT` function, for which a range is selected:
To work with these dictionaries, you need to pass an additional argument to the `dictGet*` function, for which a range is selected:

``` sql
dictGetT('dict_name', 'attr_name', id, date)
dictGet*('dict_name', 'attr_name', id, date)
```

This function returns the value for the specified `id`s and the date range that includes the passed date.
@@ -479,17 +479,17 @@ This type of storage is for mapping network prefixes (IP addresses) to metadata

Example: The table contains network prefixes and their corresponding AS number and country code:

``` text
+-----------|-----|------+
+-----------------|-------|--------+
| prefix | asn | cca2 |
+=================+=======+========+
| 202.79.32.0/20 | 17501 | NP |
+-----------|-----|------+
+-----------------|-------|--------+
| 2620:0:870::/48 | 3856 | US |
+-----------|-----|------+
+-----------------|-------|--------+
| 2a02:6b8:1::/48 | 13238 | RU |
+-----------|-----|------+
+-----------------|-------|--------+
| 2001:db8::/32 | 65536 | ZZ |
+-----------|-----|------+
+-----------------|-------|--------+
```

When using this type of layout, the structure must have a composite key.
@@ -538,10 +538,10 @@ PRIMARY KEY prefix

The key must have only one String type attribute that contains an allowed IP prefix. Other types are not supported yet.

For queries, you must use the same functions (`dictGetT` with a tuple) as for dictionaries with composite keys:
For queries, you must use the same functions (`dictGet*` with a tuple) as for dictionaries with composite keys:

``` sql
dictGetT('dict_name', 'attr_name', tuple(ip))
dictGet*('dict_name', 'attr_name', tuple(ip))
```

The function takes either `UInt32` for IPv4, or `FixedString(16)` for IPv6:
@@ -1392,12 +1392,24 @@ Returns the first element in the `arr1` array for which `func` returns something

Note that the `arrayFirst` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You must pass a lambda function to it as the first argument, and it can’t be omitted.

## arrayFirstOrNull(func, arr1, …) {#array-first-or-null}

Returns the first element in the `arr1` array for which `func` returns something other than 0. If there is no such element, it returns null.

Note that the `arrayFirstOrNull` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You must pass a lambda function to it as the first argument, and it can’t be omitted.
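An illustrative query (not part of the original page); the first column is expected to be `2` and the second `NULL`, because no element matches the second condition:

``` sql
SELECT arrayFirstOrNull(x -> x > 1, [1, 2, 3]) AS first_match,
       arrayFirstOrNull(x -> x > 10, [1, 2, 3]) AS no_match
```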
## arrayLast(func, arr1, …) {#array-last}

Returns the last element in the `arr1` array for which `func` returns something other than 0.

Note that the `arrayLast` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You must pass a lambda function to it as the first argument, and it can’t be omitted.

## arrayLastOrNull(func, arr1, …) {#array-last-or-null}

Returns the last element in the `arr1` array for which `func` returns something other than 0. If there is no such element, it returns null.

Note that the `arrayLastOrNull` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You must pass a lambda function to it as the first argument, and it can’t be omitted.
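An illustrative query (not part of the original page); `arrayLast` picks the last matching element, while `arrayLastOrNull` returns `NULL` when nothing matches:

``` sql
SELECT arrayLast(x -> x % 2 = 0, [1, 2, 4, 5]) AS last_even,
       arrayLastOrNull(x -> x > 10, [1, 2, 4, 5]) AS no_match
```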
## arrayFirstIndex(func, arr1, …) {#array-first-index}

Returns the index of the first element in the `arr1` array for which `func` returns something other than 0.

@@ -1026,4 +1026,185 @@ Result:
│ 41162 │
└─────────────┘
```
## h3PointDistM {#h3pointdistm}

Returns the "great circle" or "haversine" distance between a pair of GeoCoord points (latitude/longitude) in meters.

**Syntax**

``` sql
h3PointDistM(lat1, lon1, lat2, lon2)
```

**Arguments**

- `lat1`, `lon1` — Latitude and Longitude of point1 in degrees. Type: [Float64](../../../sql-reference/data-types/float.md).
- `lat2`, `lon2` — Latitude and Longitude of point2 in degrees. Type: [Float64](../../../sql-reference/data-types/float.md).

**Returned values**

- Haversine or great circle distance in meters.

Type: [Float64](../../../sql-reference/data-types/float.md).

**Example**

Query:

``` sql
SELECT h3PointDistM(-10.0, 0.0, 10.0, 0.0) AS h3PointDistM;
```

Result:

``` text
┌──────h3PointDistM─┐
│ 2223901.039504589 │
└───────────────────┘
```
## h3PointDistKm {#h3pointdistkm}

Returns the "great circle" or "haversine" distance between a pair of GeoCoord points (latitude/longitude) in kilometers.

**Syntax**

``` sql
h3PointDistKm(lat1, lon1, lat2, lon2)
```

**Arguments**

- `lat1`, `lon1` — Latitude and Longitude of point1 in degrees. Type: [Float64](../../../sql-reference/data-types/float.md).
- `lat2`, `lon2` — Latitude and Longitude of point2 in degrees. Type: [Float64](../../../sql-reference/data-types/float.md).

**Returned values**

- Haversine or great circle distance in kilometers.

Type: [Float64](../../../sql-reference/data-types/float.md).

**Example**

Query:

``` sql
SELECT h3PointDistKm(-10.0, 0.0, 10.0, 0.0) AS h3PointDistKm;
```

Result:

``` text
┌─────h3PointDistKm─┐
│ 2223.901039504589 │
└───────────────────┘
```
## h3PointDistRads {#h3pointdistrads}

Returns the "great circle" or "haversine" distance between a pair of GeoCoord points (latitude/longitude) in radians.

**Syntax**

``` sql
h3PointDistRads(lat1, lon1, lat2, lon2)
```

**Arguments**

- `lat1`, `lon1` — Latitude and Longitude of point1 in degrees. Type: [Float64](../../../sql-reference/data-types/float.md).
- `lat2`, `lon2` — Latitude and Longitude of point2 in degrees. Type: [Float64](../../../sql-reference/data-types/float.md).

**Returned values**

- Haversine or great circle distance in radians.

Type: [Float64](../../../sql-reference/data-types/float.md).

**Example**

Query:

``` sql
SELECT h3PointDistRads(-10.0, 0.0, 10.0, 0.0) AS h3PointDistRads;
```

Result:

``` text
┌────h3PointDistRads─┐
│ 0.3490658503988659 │
└────────────────────┘
```
## h3GetRes0Indexes {#h3getres0indexes}

Returns an array of all the resolution 0 H3 indexes.

**Syntax**

``` sql
h3GetRes0Indexes()
```

**Returned values**

- Array of all the resolution 0 H3 indexes.

Type: [Array](../../../sql-reference/data-types/array.md)([UInt64](../../../sql-reference/data-types/int-uint.md)).

**Example**

Query:

``` sql
SELECT h3GetRes0Indexes() AS indexes;
```

Result:

``` text
┌─indexes─────────────────────────────────────┐
│ [576495936675512319,576531121047601151,....]│
└─────────────────────────────────────────────┘
```
## h3GetPentagonIndexes {#h3getpentagonindexes}

Returns all the pentagon H3 indexes at the specified resolution.

**Syntax**

``` sql
h3GetPentagonIndexes(resolution)
```

**Parameter**

- `resolution` — Index resolution. Range: `[0, 15]`. Type: [UInt8](../../../sql-reference/data-types/int-uint.md).

**Returned value**

- Array of all pentagon H3 indexes.

Type: [Array](../../../sql-reference/data-types/array.md)([UInt64](../../../sql-reference/data-types/int-uint.md)).

**Example**

Query:

``` sql
SELECT h3GetPentagonIndexes(3) AS indexes;
```

Result:

``` text
┌─indexes────────────────────────────────────────────────────────┐
│ [590112357393367039,590464201114255359,590816044835143679,...] │
└────────────────────────────────────────────────────────────────┘
```

[Original article](https://clickhouse.com/docs/en/sql-reference/functions/geo/h3) <!--hide-->
@@ -13,10 +13,18 @@ Alias: `INET_NTOA`.

## IPv4StringToNum(s) {#ipv4stringtonums}

The reverse function of IPv4NumToString. If the IPv4 address has an invalid format, it returns 0.
The reverse function of IPv4NumToString. If the IPv4 address has an invalid format, it throws an exception.

Alias: `INET_ATON`.

## IPv4StringToNumOrDefault(s) {#ipv4stringtonums}

Same as `IPv4StringToNum`, but if the IPv4 address has an invalid format, it returns 0.

## IPv4StringToNumOrNull(s) {#ipv4stringtonums}

Same as `IPv4StringToNum`, but if the IPv4 address has an invalid format, it returns null.
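As an illustration (this query is not part of the original page), the three variants differ only in how they handle invalid input; the second column is expected to be `0` and the third `NULL`:

``` sql
SELECT IPv4StringToNum('192.168.0.1') AS valid,
       IPv4StringToNumOrDefault('not an address') AS or_default,
       IPv4StringToNumOrNull('not an address') AS or_null
```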
## IPv4NumToStringClassC(num) {#ipv4numtostringclasscnum}

Similar to IPv4NumToString, but using xxx instead of the last octet.

@@ -123,7 +131,7 @@ LIMIT 10

## IPv6StringToNum {#ipv6stringtonums}

The reverse function of [IPv6NumToString](#ipv6numtostringx). If the IPv6 address has an invalid format, it returns a string of null bytes.
The reverse function of [IPv6NumToString](#ipv6numtostringx). If the IPv6 address has an invalid format, it throws an exception.

If the input string contains a valid IPv4 address, it returns its IPv6 equivalent.
HEX can be uppercase or lowercase.
@ -168,6 +176,14 @@ Result:
|
||||
|
||||
- [cutIPv6](#cutipv6x-bytestocutforipv6-bytestocutforipv4).
|
||||
|
||||
## IPv6StringToNumOrDefault(s) {#ipv6stringtonums}
|
||||
|
||||
Same as `IPv6StringToNum`, but if the IPv6 address has an invalid format, it returns 0.
|
||||
|
||||
## IPv6StringToNumOrNull(s) {#ipv6stringtonums}
|
||||
|
||||
Same as `IPv6StringToNum`, but if the IPv6 address has an invalid format, it returns null.
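A minimal sketch of the tolerant IPv6 variants, using a deliberately malformed sample input:

``` sql
SELECT
    IPv6StringToNumOrDefault('not an ipv6') AS bad_default,  -- 16 zero bytes
    IPv6StringToNumOrNull('not an ipv6')    AS bad_null;     -- NULL
```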
|
||||
|
||||
## IPv4ToIPv6(x) {#ipv4toipv6x}
|
||||
|
||||
Takes a `UInt32` number. Interprets it as an IPv4 address in [big endian](https://en.wikipedia.org/wiki/Endianness). Returns a `FixedString(16)` value containing the IPv6 address in binary format. Examples:
|
||||
@ -261,6 +277,14 @@ SELECT
|
||||
└───────────────────────────────────┴──────────────────────────┘
|
||||
```
|
||||
|
||||
## toIPv4OrDefault(string) {#toipv4ordefaultstring}
|
||||
|
||||
Same as `toIPv4`, but if the IPv4 address has an invalid format, it returns 0.
|
||||
|
||||
## toIPv4OrNull(string) {#toipv4ornullstring}
|
||||
|
||||
Same as `toIPv4`, but if the IPv4 address has an invalid format, it returns null.
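A short example contrasting the strict and tolerant conversions (sample addresses are illustrative):

``` sql
SELECT
    toIPv4('10.20.30.40')        AS valid,        -- 10.20.30.40
    toIPv4OrDefault('not an ip') AS bad_default,  -- 0.0.0.0
    toIPv4OrNull('not an ip')    AS bad_null;     -- NULL
```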
|
||||
|
||||
## toIPv6 {#toipv6string}
|
||||
|
||||
Converts a string form of IPv6 address to [IPv6](../../sql-reference/data-types/domains/ipv6.md) type. If the IPv6 address has an invalid format, returns an empty value.
|
||||
@ -317,6 +341,14 @@ Result:
|
||||
└─────────────────────┘
|
||||
```
|
||||
|
||||
## toIPv6OrDefault(string) {#toipv6ordefaultstring}
|
||||
|
||||
Same as `toIPv6`, but if the IPv6 address has an invalid format, it returns 0.
|
||||
|
||||
## toIPv6OrNull(string) {#toipv6ornullstring}
|
||||
|
||||
Same as `toIPv6`, but if the IPv6 address has an invalid format, it returns null.
|
||||
|
||||
## isIPv4String {#isipv4string}
|
||||
|
||||
Determines whether the input string is an IPv4 address or not. If `string` is an IPv6 address, it returns `0`.
|
||||
|
@ -2,6 +2,49 @@
|
||||
toc_priority: 76
|
||||
toc_title: Security Changelog
|
||||
---
|
||||
## Fixed in ClickHouse 21.10.2.15, 2021-10-18 {#fixed-in-clickhouse-release-21-10-2-215-2021-10-18}
|
||||
|
||||
### CVE-2021-43304 {#cve-2021-43304}
|
||||
|
||||
Heap buffer overflow in ClickHouse's LZ4 compression codec when parsing a malicious query. There is no verification that the copy operations in the LZ4::decompressImpl loop and especially the arbitrary copy operation wildCopy<copy_amount>(op, ip, copy_end), don't exceed the destination buffer's limits.
|
||||
|
||||
Credits: JFrog Security Research Team
|
||||
|
||||
### CVE-2021-43305 {#cve-2021-43305}
|
||||
|
||||
Heap buffer overflow in ClickHouse's LZ4 compression codec when parsing a malicious query. There is no verification that the copy operations in the LZ4::decompressImpl loop and especially the arbitrary copy operation wildCopy<copy_amount>(op, ip, copy_end), don't exceed the destination buffer's limits. This issue is very similar to CVE-2021-43304, but the vulnerable copy operation is in a different wildCopy call.
|
||||
|
||||
Credits: JFrog Security Research Team
|
||||
|
||||
### CVE-2021-42387 {#cve-2021-42387}
|
||||
|
||||
Heap out-of-bounds read in ClickHouse's LZ4 compression codec when parsing a malicious query. As part of the LZ4::decompressImpl() loop, a 16-bit unsigned user-supplied value ('offset') is read from the compressed data. The offset is later used in the length of a copy operation, without checking the upper bounds of the source of the copy operation.
|
||||
|
||||
Credits: JFrog Security Research Team
|
||||
|
||||
### CVE-2021-42388 {#cve-2021-42388}
|
||||
|
||||
Heap out-of-bounds read in ClickHouse's LZ4 compression codec when parsing a malicious query. As part of the LZ4::decompressImpl() loop, a 16-bit unsigned user-supplied value ('offset') is read from the compressed data. The offset is later used in the length of a copy operation, without checking the lower bounds of the source of the copy operation.
|
||||
|
||||
Credits: JFrog Security Research Team
|
||||
|
||||
### CVE-2021-42389 {#cve-2021-42389}
|
||||
|
||||
Divide-by-zero in ClickHouse's Delta compression codec when parsing a malicious query. The first byte of the compressed buffer is used in a modulo operation without being checked for 0.
|
||||
|
||||
Credits: JFrog Security Research Team
|
||||
|
||||
### CVE-2021-42390 {#cve-2021-42390}
|
||||
|
||||
Divide-by-zero in ClickHouse's DeltaDouble compression codec when parsing a malicious query. The first byte of the compressed buffer is used in a modulo operation without being checked for 0.
|
||||
|
||||
Credits: JFrog Security Research Team
|
||||
|
||||
### CVE-2021-42391 {#cve-2021-42391}
|
||||
|
||||
Divide-by-zero in ClickHouse's Gorilla compression codec when parsing a malicious query. The first byte of the compressed buffer is used in a modulo operation without being checked for 0.
|
||||
|
||||
Credits: JFrog Security Research Team
|
||||
|
||||
## Fixed in ClickHouse 21.4.3.21, 2021-04-12 {#fixed-in-clickhouse-release-21-4-3-21-2021-04-12}
|
||||
|
||||
|
@ -5,58 +5,39 @@ toc_title: Playground
|
||||
|
||||
# ClickHouse Playground {#clickhouse-playground}
|
||||
|
||||
!!! warning "Warning"
|
||||
This service is deprecated and will be replaced in the foreseeable future.
|
||||
[ClickHouse Playground](https://play.clickhouse.com/play?user=play) allows people to experiment with ClickHouse by running queries instantly, without setting up their server or cluster.
|
||||
Several example datasets are available in Playground.
|
||||
|
||||
[ClickHouse Playground](https://play.clickhouse.com) では、サーバーやクラスタを設定することなく、即座にクエリを実行して ClickHouse を試すことができます。
|
||||
いくつかの例のデータセットは、Playground だけでなく、ClickHouse の機能を示すサンプルクエリとして利用可能です. また、 ClickHouse の LTS リリースで試すこともできます。
|
||||
You can make queries to Playground using any HTTP client, for example [curl](https://curl.haxx.se) or [wget](https://www.gnu.org/software/wget/), or set up a connection using [JDBC](../interfaces/jdbc.md) or [ODBC](../interfaces/odbc.md) drivers. More information about software products that support ClickHouse is available [here](../interfaces/index.md).
|
||||
|
||||
任意の HTTP クライアントを使用してプレイグラウンドへのクエリを作成することができます。例えば[curl](https://curl.haxx.se)、[wget](https://www.gnu.org/software/wget/)、[JDBC](../interfaces/jdbc.md)または[ODBC](../interfaces/odbc.md)ドライバを使用して接続を設定します。
|
||||
ClickHouse をサポートするソフトウェア製品の詳細情報は[こちら](../interfaces/index.md)をご覧ください。
|
||||
## Credentials {#credentials}
|
||||
|
||||
## 資格情報 {#credentials}
|
||||
| Parameter | Value |
|
||||
|:--------------------|:-----------------------------------|
|
||||
| HTTPS endpoint | `https://play.clickhouse.com:443/` |
|
||||
| Native TCP endpoint | `play.clickhouse.com:9440` |
|
||||
| User | `explorer` or `play` |
|
||||
| Password | (empty) |
|
||||
|
||||
| パラメータ | 値 |
|
||||
| :---------------------------- | :-------------------------------------- |
|
||||
| HTTPS エンドポイント | `https://play-api.clickhouse.com:8443` |
|
||||
| ネイティブ TCP エンドポイント | `play-api.clickhouse.com:9440` |
|
||||
| ユーザ名                       | `playground`                            |
|
||||
| パスワード | `clickhouse` |
|
||||
## Limitations {#limitations}
|
||||
|
||||
The queries are executed as a read-only user. It implies some limitations:
|
||||
|
||||
特定のClickHouseのリリースで試すために、追加のエンドポイントがあります。(ポートとユーザー/パスワードは上記と同じです)。
|
||||
- DDL queries are not allowed
|
||||
- INSERT queries are not allowed
|
||||
|
||||
- 20.3 LTS: `play-api-v20-3.clickhouse.com`
|
||||
- 19.14 LTS: `play-api-v19-14.clickhouse.com`
|
||||
The service also has quotas on its usage.
|
||||
|
||||
!!! note "備考"
|
||||
これらのエンドポイントはすべて、安全なTLS接続が必要です。
|
||||
## Examples {#examples}
|
||||
|
||||
|
||||
## 制限事項 {#limitations}
|
||||
|
||||
クエリは読み取り専用のユーザとして実行されます。これにはいくつかの制限があります。
|
||||
|
||||
- DDL クエリは許可されていません。
|
||||
- INSERT クエリは許可されていません。
|
||||
|
||||
また、以下の設定がなされています。
|
||||
|
||||
- [max_result_bytes=10485760](../operations/settings/query_complexity/#max-result-bytes)
|
||||
- [max_result_rows=2000](../operations/settings/query_complexity/#setting-max_result_rows)
|
||||
- [result_overflow_mode=break](../operations/settings/query_complexity/#result-overflow-mode)
|
||||
- [max_execution_time=60000](../operations/settings/query_complexity/#max-execution-time)
|
||||
|
||||
## 例 {#examples}
|
||||
|
||||
`curl` を用いて HTTPSエンドポイントへ接続する例:
|
||||
HTTPS endpoint example with `curl`:
|
||||
|
||||
``` bash
|
||||
curl "https://play-api.clickhouse.com:8443/?query=SELECT+'Play+ClickHouse\!';&user=playground&password=clickhouse&database=datasets"
|
||||
curl "https://play.clickhouse.com/?user=explorer" --data-binary "SELECT 'Play ClickHouse'"
|
||||
```
|
||||
|
||||
[CLI](../interfaces/cli.md) で TCP エンドポイントへ接続する例:
|
||||
TCP endpoint example with [CLI](../interfaces/cli.md):
|
||||
|
||||
``` bash
|
||||
clickhouse client --secure -h play-api.clickhouse.com --port 9440 -u playground --password clickhouse -q "SELECT 'Play ClickHouse\!'"
|
||||
clickhouse client --secure --host play.clickhouse.com --user explorer
|
||||
```
|
||||
|
@ -5,53 +5,39 @@ toc_title: Playground
|
||||
|
||||
# ClickHouse Playground {#clickhouse-playground}
|
||||
|
||||
!!! warning "Warning"
|
||||
This service is deprecated and will be replaced in the foreseeable future.
|
||||
[ClickHouse Playground](https://play.clickhouse.com/play?user=play) allows people to experiment with ClickHouse by running queries instantly, without setting up their server or cluster.
|
||||
Several example datasets are available in Playground.
|
||||
|
||||
[ClickHouse Playground](https://play.clickhouse.com) позволяет пользователям экспериментировать с ClickHouse, мгновенно выполняя запросы без настройки своего сервера или кластера.
|
||||
В Playground доступны несколько тестовых массивов данных, а также примеры запросов, которые показывают возможности ClickHouse. Кроме того, вы можете выбрать LTS релиз ClickHouse, который хотите протестировать.
|
||||
You can make queries to Playground using any HTTP client, for example [curl](https://curl.haxx.se) or [wget](https://www.gnu.org/software/wget/), or set up a connection using [JDBC](../interfaces/jdbc.md) or [ODBC](../interfaces/odbc.md) drivers. More information about software products that support ClickHouse is available [here](../interfaces/index.md).
|
||||
|
||||
Вы можете отправлять запросы к Playground с помощью любого HTTP-клиента, например [curl](https://curl.haxx.se) или [wget](https://www.gnu.org/software/wget/), также можно установить соединение с помощью драйверов [JDBC](../interfaces/jdbc.md) или [ODBC](../interfaces/odbc.md). Более подробная информация о программных продуктах, поддерживающих ClickHouse, доступна [здесь](../interfaces/index.md).
|
||||
## Credentials {#credentials}
|
||||
|
||||
## Параметры доступа {#credentials}
|
||||
| Parameter | Value |
|
||||
|:--------------------|:-----------------------------------|
|
||||
| HTTPS endpoint | `https://play.clickhouse.com:443/` |
|
||||
| Native TCP endpoint | `play.clickhouse.com:9440` |
|
||||
| User | `explorer` or `play` |
|
||||
| Password | (empty) |
|
||||
|
||||
| Параметр | Значение |
|
||||
|:--------------------|:----------------------------------------|
|
||||
| Конечная точка HTTPS| `https://play-api.clickhouse.com:8443` |
|
||||
| Конечная точка TCP | `play-api.clickhouse.com:9440` |
|
||||
| Пользователь | `playground` |
|
||||
| Пароль | `clickhouse` |
|
||||
## Limitations {#limitations}
|
||||
|
||||
Также можно подключаться к ClickHouse определённых релизов, чтобы протестировать их различия (порты и пользователь / пароль остаются неизменными):
|
||||
The queries are executed as a read-only user. It implies some limitations:
|
||||
|
||||
- 20.3 LTS: `play-api-v20-3.clickhouse.com`
|
||||
- 19.14 LTS: `play-api-v19-14.clickhouse.com`
|
||||
- DDL queries are not allowed
|
||||
- INSERT queries are not allowed
|
||||
|
||||
!!! note "Примечание"
|
||||
Для всех этих конечных точек требуется безопасное соединение TLS.
|
||||
The service also has quotas on its usage.
|
||||
|
||||
## Ограничения {#limitations}
|
||||
## Examples {#examples}
|
||||
|
||||
Запросы выполняются под пользователем с правами `readonly`, для которого есть следующие ограничения:
|
||||
- запрещены DDL запросы
|
||||
- запрещены INSERT запросы
|
||||
|
||||
Также установлены следующие опции:
|
||||
- [max_result_bytes=10485760](../operations/settings/query-complexity.md#max-result-bytes)
|
||||
- [max_result_rows=2000](../operations/settings/query-complexity.md#setting-max_result_rows)
|
||||
- [result_overflow_mode=break](../operations/settings/query-complexity.md#result-overflow-mode)
|
||||
- [max_execution_time=60000](../operations/settings/query-complexity.md#max-execution-time)
|
||||
|
||||
## Примеры {#examples}
|
||||
|
||||
Пример конечной точки HTTPS с `curl`:
|
||||
HTTPS endpoint example with `curl`:
|
||||
|
||||
``` bash
|
||||
curl "https://play-api.clickhouse.com:8443/?query=SELECT+'Play+ClickHouse\!';&user=playground&password=clickhouse&database=datasets"
|
||||
curl "https://play.clickhouse.com/?user=explorer" --data-binary "SELECT 'Play ClickHouse'"
|
||||
```
|
||||
|
||||
Пример конечной точки TCP с [CLI](../interfaces/cli.md):
|
||||
TCP endpoint example with [CLI](../interfaces/cli.md):
|
||||
|
||||
``` bash
|
||||
clickhouse client --secure -h play-api.clickhouse.com --port 9440 -u playground --password clickhouse -q "SELECT 'Play ClickHouse\!'"
|
||||
clickhouse client --secure --host play.clickhouse.com --user explorer
|
||||
```
|
||||
|
@ -54,7 +54,7 @@ ClickHouse Keeper может использоваться как равноце
|
||||
- `auto_forwarding` — разрешить пересылку запросов на запись от последователей лидеру (по умолчанию: true).
|
||||
- `shutdown_timeout` — время ожидания завершения внутренних подключений и выключения, в миллисекундах (по умолчанию: 5000).
|
||||
- `startup_timeout` — время отключения сервера, если он не подключается к другим участникам кворума, в миллисекундах (по умолчанию: 30000).
|
||||
- `four_letter_word_white_list` — список разрешенных 4-х буквенных команд (по умолчанию: "conf,cons,crst,envi,ruok,srst,srvr,stat,wchc,wchs,dirs,mntr,isro").
|
||||
- `four_letter_word_allow_list` — список разрешенных 4-х буквенных команд (по умолчанию: "conf,cons,crst,envi,ruok,srst,srvr,stat,wchs,dirs,mntr,isro").
|
||||
|
||||
Конфигурация кворума находится в `<keeper_server>.<raft_configuration>` и содержит описание серверов.
|
||||
|
||||
@ -114,7 +114,7 @@ clickhouse-keeper --config /etc/your_path_to_config/config.xml --daemon
|
||||
|
||||
ClickHouse Keeper также поддерживает 4-х буквенные команды, почти такие же, как у Zookeeper. Каждая команда состоит из 4-х символов, например, `mntr`, `stat` и т. д. Несколько интересных команд: `stat` предоставляет общую информацию о сервере и подключенных клиентах, а `srvr` и `cons` предоставляют расширенные сведения о сервере и подключениях соответственно.
|
||||
|
||||
У 4-х буквенных команд есть параметр для настройки разрешенного списка `four_letter_word_white_list`, который имеет значение по умолчанию "conf,cons,crst,envi,ruok,srst,srvr,stat, wchc,wchs,dirs,mntr,isro".
|
||||
У 4-х буквенных команд есть параметр для настройки разрешенного списка `four_letter_word_allow_list`, который имеет значение по умолчанию "conf,cons,crst,envi,ruok,srst,srvr,stat,wchs,dirs,mntr,isro".
|
||||
|
||||
Вы можете отправлять команды в ClickHouse Keeper через telnet или nc на порт для клиента.
|
||||
|
||||
@ -194,7 +194,7 @@ Server stats reset.
|
||||
```
|
||||
server_id=1
|
||||
tcp_port=2181
|
||||
four_letter_word_white_list=*
|
||||
four_letter_word_allow_list=*
|
||||
log_storage_path=./coordination/logs
|
||||
snapshot_storage_path=./coordination/snapshots
|
||||
max_requests_batch_size=100
|
||||
|
@ -3,62 +3,41 @@ toc_priority: 14
|
||||
toc_title: 体验平台
|
||||
---
|
||||
|
||||
# ClickHouse体验平台 {#clickhouse-playground}
|
||||
# ClickHouse Playground {#clickhouse-playground}
|
||||
|
||||
!!! warning "Warning"
|
||||
This service is deprecated and will be replaced in foreseeable future.
|
||||
[ClickHouse Playground](https://play.clickhouse.com/play?user=play) allows people to experiment with ClickHouse by running queries instantly, without setting up their server or cluster.
|
||||
Several example datasets are available in Playground.
|
||||
|
||||
[ClickHouse体验平台](https://play.clickhouse.com?file=welcome) 允许人们通过即时运行查询来尝试ClickHouse,而无需设置他们的服务器或集群。
|
||||
|
||||
体验平台中提供几个示例数据集以及显示ClickHouse特性的示例查询。还有一些ClickHouse LTS版本可供尝试。
|
||||
|
||||
您可以使用任何HTTP客户端对ClickHouse体验平台进行查询,例如[curl](https://curl.haxx.se)或者[wget](https://www.gnu.org/software/wget/),或使用[JDBC](../interfaces/jdbc.md)或者[ODBC](../interfaces/odbc.md)驱动连接。关于支持ClickHouse的软件产品的更多信息详见[here](../interfaces/index.md).
|
||||
You can make queries to Playground using any HTTP client, for example [curl](https://curl.haxx.se) or [wget](https://www.gnu.org/software/wget/), or set up a connection using [JDBC](../interfaces/jdbc.md) or [ODBC](../interfaces/odbc.md) drivers. More information about software products that support ClickHouse is available [here](../interfaces/index.md).
|
||||
|
||||
## Credentials {#credentials}
|
||||
|
||||
| 参数 | 值 |
|
||||
|:--------------------|:----------------------------------------|
|
||||
| HTTPS端点 | `https://play-api.clickhouse.com:8443` |
|
||||
| TCP端点 | `play-api.clickhouse.com:9440` |
|
||||
| 用户 | `playground` |
|
||||
| 密码 | `clickhouse` |
|
||||
| Parameter | Value |
|
||||
|:--------------------|:-----------------------------------|
|
||||
| HTTPS endpoint | `https://play.clickhouse.com:443/` |
|
||||
| Native TCP endpoint | `play.clickhouse.com:9440` |
|
||||
| User | `explorer` or `play` |
|
||||
| Password | (empty) |
|
||||
|
||||
还有一些带有特定ClickHouse版本的附加信息来试验它们之间的差异(端口和用户/密码与上面相同):
|
||||
## Limitations {#limitations}
|
||||
|
||||
- 20.3 LTS: `play-api-v20-3.clickhouse.com`
|
||||
- 19.14 LTS: `play-api-v19-14.clickhouse.com`
|
||||
The queries are executed as a read-only user. It implies some limitations:
|
||||
|
||||
!!! note "注意"
|
||||
所有这些端点都需要安全的TLS连接。
|
||||
- DDL queries are not allowed
|
||||
- INSERT queries are not allowed
|
||||
|
||||
## 查询限制 {#limitations}
|
||||
The service also has quotas on its usage.
|
||||
|
||||
查询以只读用户身份执行。 这意味着一些局限性:
|
||||
## Examples {#examples}
|
||||
|
||||
- 不允许DDL查询
|
||||
- 不允许插入查询
|
||||
|
||||
还强制执行以下设置:
|
||||
- [max_result_bytes=10485760](../operations/settings/query-complexity/#max-result-bytes)
|
||||
- [max_result_rows=2000](../operations/settings/query-complexity/#setting-max_result_rows)
|
||||
- [result_overflow_mode=break](../operations/settings/query-complexity/#result-overflow-mode)
|
||||
- [max_execution_time=60000](../operations/settings/query-complexity/#max-execution-time)
|
||||
|
||||
ClickHouse体验还有如下:
|
||||
[ClickHouse管理服务](https://cloud.yandex.com/services/managed-clickhouse)
|
||||
实例托管 [Yandex云](https://cloud.yandex.com/)。
|
||||
更多信息 [云提供商](../commercial/cloud.md)。
|
||||
|
||||
## 示例 {#examples}
|
||||
|
||||
使用`curl`连接Https服务:
|
||||
HTTPS endpoint example with `curl`:
|
||||
|
||||
``` bash
|
||||
curl "https://play-api.clickhouse.com:8443/?query=SELECT+'Play+ClickHouse\!';&user=playground&password=clickhouse&database=datasets"
|
||||
curl "https://play.clickhouse.com/?user=explorer" --data-binary "SELECT 'Play ClickHouse'"
|
||||
```
|
||||
|
||||
TCP连接示例[CLI](../interfaces/cli.md):
|
||||
TCP endpoint example with [CLI](../interfaces/cli.md):
|
||||
|
||||
``` bash
|
||||
clickhouse client --secure -h play-api.clickhouse.com --port 9440 -u playground --password clickhouse -q "SELECT 'Play ClickHouse\!'"
|
||||
clickhouse client --secure --host play.clickhouse.com --user explorer
|
||||
```
|
||||
|
@ -1240,7 +1240,8 @@ SELECT * FROM topic1_stream;
|
||||
| `FLOAT`, `HALF_FLOAT` | [Float32](../sql-reference/data-types/float.md) | `FLOAT` |
|
||||
| `DOUBLE` | [Float64](../sql-reference/data-types/float.md) | `DOUBLE` |
|
||||
| `DATE32` | [Date](../sql-reference/data-types/date.md) | `UINT16` |
|
||||
| `DATE64`, `TIMESTAMP` | [DateTime](../sql-reference/data-types/datetime.md) | `UINT32` |
|
||||
| `DATE64` | [DateTime](../sql-reference/data-types/datetime.md) | `UINT32` |
|
||||
| `TIMESTAMP` | [DateTime64](../sql-reference/data-types/datetime64.md) | `TIMESTAMP` |
|
||||
| `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `STRING` |
|
||||
| — | [FixedString](../sql-reference/data-types/fixedstring.md) | `STRING` |
|
||||
| `DECIMAL` | [Decimal](../sql-reference/data-types/decimal.md) | `DECIMAL` |
|
||||
@ -1295,7 +1296,8 @@ $ clickhouse-client --query="SELECT * FROM {some_table} FORMAT Parquet" > {some_
|
||||
| `FLOAT`, `HALF_FLOAT` | [Float32](../sql-reference/data-types/float.md) | `FLOAT` |
|
||||
| `DOUBLE` | [Float64](../sql-reference/data-types/float.md) | `DOUBLE` |
|
||||
| `DATE32` | [Date](../sql-reference/data-types/date.md) | `DATE32` |
|
||||
| `DATE64`, `TIMESTAMP` | [DateTime](../sql-reference/data-types/datetime.md) | `TIMESTAMP` |
|
||||
| `DATE64` | [DateTime](../sql-reference/data-types/datetime.md) | `UINT32` |
|
||||
| `TIMESTAMP` | [DateTime64](../sql-reference/data-types/datetime64.md) | `TIMESTAMP` |
|
||||
| `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `BINARY` |
|
||||
| `DECIMAL` | [Decimal](../sql-reference/data-types/decimal.md) | `DECIMAL` |
|
||||
| `-` | [Array](../sql-reference/data-types/array.md) | `LIST` |
|
||||
|
@ -1,10 +1,5 @@
|
||||
---
|
||||
machine_translated: true
|
||||
machine_translated_rev: 5decc73b5dc60054f19087d3690c4eb99446a6c3
|
||||
---
|
||||
# system.numbers_mt {#system-numbers-mt}
|
||||
|
||||
# 系统。numbers_mt {#system-numbers-mt}
|
||||
|
||||
一样的 [系统。数字](../../operations/system-tables/numbers.md) 但读取是并行的。 这些数字可以以任何顺序返回。
|
||||
与[system.numbers](../../operations/system-tables/numbers.md)相似,但读取是并行的。 这些数字可以以任何顺序返回。
|
||||
|
||||
用于测试。
|
||||
|
@ -31,7 +31,7 @@
|
||||
|
||||
- 对于’dict_name’分层字典,查找’child_id’键是否位于’ancestor_id’内(或匹配’ancestor_id’)。返回UInt8。
|
||||
|
||||
## 独裁主义 {#dictgethierarchy}
|
||||
## dictGetHierarchy {#dictgethierarchy}
|
||||
|
||||
`dictGetHierarchy('dict_name', id)`
|
||||
|
||||
|
@ -38,7 +38,8 @@ struct AggregateFunctionWithProperties
|
||||
AggregateFunctionWithProperties(const AggregateFunctionWithProperties &) = default;
|
||||
AggregateFunctionWithProperties & operator = (const AggregateFunctionWithProperties &) = default;
|
||||
|
||||
template <typename Creator, std::enable_if_t<!std::is_same_v<Creator, AggregateFunctionWithProperties>> * = nullptr>
|
||||
template <typename Creator>
|
||||
requires (!std::is_same_v<Creator, AggregateFunctionWithProperties>)
|
||||
AggregateFunctionWithProperties(Creator creator_, AggregateFunctionProperties properties_ = {}) /// NOLINT
|
||||
: creator(std::forward<Creator>(creator_)), properties(std::move(properties_))
|
||||
{
|
||||
|
@ -569,6 +569,14 @@ if (ENABLE_TESTS)
|
||||
clickhouse_common_zookeeper
|
||||
string_utils)
|
||||
|
||||
if (TARGET ch_contrib::simdjson)
|
||||
target_link_libraries(unit_tests_dbms PRIVATE ch_contrib::simdjson)
|
||||
endif()
|
||||
|
||||
if(TARGET ch_contrib::rapidjson)
|
||||
target_include_directories(unit_tests_dbms PRIVATE ch_contrib::rapidjson)
|
||||
endif()
|
||||
|
||||
if (TARGET ch_contrib::yaml_cpp)
|
||||
target_link_libraries(unit_tests_dbms PRIVATE ch_contrib::yaml_cpp)
|
||||
endif()
|
||||
|
@ -1092,10 +1092,11 @@ void ClientBase::sendData(Block & sample, const ColumnsDescription & columns_des
|
||||
|
||||
try
|
||||
{
|
||||
auto metadata = storage->getInMemoryMetadataPtr();
|
||||
sendDataFromPipe(
|
||||
storage->read(
|
||||
sample.getNames(),
|
||||
storage->getInMemoryMetadataPtr(),
|
||||
storage->getStorageSnapshot(metadata),
|
||||
query_info,
|
||||
global_context,
|
||||
{},
|
||||
|
@ -297,7 +297,7 @@ ColumnPtr ColumnAggregateFunction::filter(const Filter & filter, ssize_t result_
|
||||
{
|
||||
size_t size = data.size();
|
||||
if (size != filter.size())
|
||||
throw Exception("Size of filter doesn't match size of column.", ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH);
|
||||
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH, "Size of filter ({}) doesn't match size of column ({})", filter.size(), size);
|
||||
|
||||
if (size == 0)
|
||||
return cloneEmpty();
|
||||
|
@ -608,7 +608,7 @@ ColumnPtr ColumnArray::filterString(const Filter & filt, ssize_t result_size_hin
|
||||
{
|
||||
size_t col_size = getOffsets().size();
|
||||
if (col_size != filt.size())
|
||||
throw Exception("Size of filter doesn't match size of column.", ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH);
|
||||
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH, "Size of filter ({}) doesn't match size of column ({})", filt.size(), col_size);
|
||||
|
||||
if (0 == col_size)
|
||||
return ColumnArray::create(data);
|
||||
@ -676,7 +676,7 @@ ColumnPtr ColumnArray::filterGeneric(const Filter & filt, ssize_t result_size_hi
|
||||
{
|
||||
size_t size = getOffsets().size();
|
||||
if (size != filt.size())
|
||||
throw Exception("Size of filter doesn't match size of column.", ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH);
|
||||
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH, "Size of filter ({}) doesn't match size of column ({})", filt.size(), size);
|
||||
|
||||
if (size == 0)
|
||||
return ColumnArray::create(data);
|
||||
@ -1189,4 +1189,12 @@ void ColumnArray::gather(ColumnGathererStream & gatherer)
|
||||
gatherer.gather(*this);
|
||||
}
|
||||
|
||||
size_t ColumnArray::getNumberOfDimensions() const
|
||||
{
|
||||
const auto * nested_array = checkAndGetColumn<ColumnArray>(*data);
|
||||
if (!nested_array)
|
||||
return 1;
|
||||
return 1 + nested_array->getNumberOfDimensions(); /// Every modern C++ compiler optimizes tail recursion.
|
||||
}
|
||||
|
||||
}
|
||||
|
@ -60,7 +60,8 @@ public:
|
||||
return ColumnArray::create(nested_column->assumeMutable());
|
||||
}
|
||||
|
||||
template <typename ... Args, typename = typename std::enable_if<IsMutableColumns<Args ...>::value>::type>
|
||||
template <typename ... Args>
|
||||
requires (IsMutableColumns<Args ...>::value)
|
||||
static MutablePtr create(Args &&... args) { return Base::create(std::forward<Args>(args)...); }
|
||||
|
||||
/** On the index i there is an offset to the beginning of the i + 1 -th element. */
|
||||
@ -169,6 +170,8 @@ public:
|
||||
|
||||
bool isCollationSupported() const override { return getData().isCollationSupported(); }
|
||||
|
||||
size_t getNumberOfDimensions() const;
|
||||
|
||||
private:
|
||||
WrappedPtr data;
|
||||
WrappedPtr offsets;
|
||||
|
@ -266,7 +266,7 @@ ColumnPtr ColumnDecimal<T>::filter(const IColumn::Filter & filt, ssize_t result_
|
||||
{
|
||||
size_t size = data.size();
|
||||
if (size != filt.size())
|
||||
throw Exception("Size of filter doesn't match size of column.", ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH);
|
||||
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH, "Size of filter ({}) doesn't match size of column ({})", filt.size(), size);
|
||||
|
||||
auto res = this->create(0, scale);
|
||||
Container & res_data = res->getData();
|
||||
|
@ -207,7 +207,7 @@ ColumnPtr ColumnFixedString::filter(const IColumn::Filter & filt, ssize_t result
|
||||
{
|
||||
size_t col_size = size();
|
||||
if (col_size != filt.size())
|
||||
throw Exception("Size of filter doesn't match size of column.", ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH);
|
||||
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH, "Size of filter ({}) doesn't match size of column ({})", filt.size(), col_size);
|
||||
|
||||
auto res = ColumnFixedString::create(n);
|
||||
|
||||
|
@ -36,7 +36,8 @@ public:
|
||||
static Ptr create(const ColumnPtr & column) { return ColumnMap::create(column->assumeMutable()); }
|
||||
static Ptr create(ColumnPtr && arg) { return create(arg); }
|
||||
|
||||
template <typename ... Args, typename = typename std::enable_if<IsMutableColumns<Args ...>::value>::type>
|
||||
template <typename ... Args>
|
||||
requires (IsMutableColumns<Args ...>::value)
|
||||
static MutablePtr create(Args &&... args) { return Base::create(std::forward<Args>(args)...); }
|
||||
|
||||
std::string getName() const override;
|
||||
|
@ -41,7 +41,8 @@ public:
|
||||
return ColumnNullable::create(nested_column_->assumeMutable(), null_map_->assumeMutable());
|
||||
}
|
||||
|
||||
template <typename ... Args, typename = typename std::enable_if<IsMutableColumns<Args ...>::value>::type>
|
||||
template <typename ... Args>
|
||||
requires (IsMutableColumns<Args ...>::value)
|
||||
static MutablePtr create(Args &&... args) { return Base::create(std::forward<Args>(args)...); }
|
||||
|
||||
const char * getFamilyName() const override { return "Nullable"; }
|
||||
@ -144,15 +145,15 @@ public:
|
||||
|
||||
double getRatioOfDefaultRows(double sample_ratio) const override
|
||||
{
|
||||
return null_map->getRatioOfDefaultRows(sample_ratio);
|
||||
return getRatioOfDefaultRowsImpl<ColumnNullable>(sample_ratio);
|
||||
}
|
||||
|
||||
void getIndicesOfNonDefaultRows(Offsets & indices, size_t from, size_t limit) const override
|
||||
{
|
||||
null_map->getIndicesOfNonDefaultRows(indices, from, limit);
|
||||
getIndicesOfNonDefaultRowsImpl<ColumnNullable>(indices, from, limit);
|
||||
}
|
||||
|
||||
ColumnPtr createWithOffsets(const IColumn::Offsets & offsets, const Field & default_field, size_t total_rows, size_t shift) const override;
|
||||
ColumnPtr createWithOffsets(const Offsets & offsets, const Field & default_field, size_t total_rows, size_t shift) const override;
|
||||
|
||||
bool isNullable() const override { return true; }
|
||||
bool isFixedAndContiguous() const override { return false; }
|
||||
|
819
src/Columns/ColumnObject.cpp
Normal file
819
src/Columns/ColumnObject.cpp
Normal file
@ -0,0 +1,819 @@
|
||||
#include <Core/Field.h>
|
||||
#include <Columns/ColumnObject.h>
|
||||
#include <Columns/ColumnsNumber.h>
|
||||
#include <Columns/ColumnArray.h>
|
||||
#include <Columns/ColumnSparse.h>
|
||||
#include <DataTypes/ObjectUtils.h>
|
||||
#include <DataTypes/getLeastSupertype.h>
|
||||
#include <DataTypes/DataTypeNothing.h>
|
||||
#include <DataTypes/DataTypeNullable.h>
|
||||
#include <DataTypes/DataTypeFactory.h>
|
||||
#include <DataTypes/NestedUtils.h>
|
||||
#include <Interpreters/castColumn.h>
|
||||
#include <Interpreters/convertFieldToType.h>
|
||||
#include <Common/HashTable/HashSet.h>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
namespace ErrorCodes
|
||||
{
|
||||
extern const int LOGICAL_ERROR;
|
||||
extern const int ILLEGAL_COLUMN;
|
||||
extern const int DUPLICATE_COLUMN;
|
||||
extern const int NUMBER_OF_DIMENSIONS_MISMATHED;
|
||||
extern const int NOT_IMPLEMENTED;
|
||||
extern const int SIZES_OF_COLUMNS_DOESNT_MATCH;
|
||||
}
|
||||
|
||||
namespace
|
||||
{
|
||||
|
||||
/// Recreates column with default scalar values and keeps sizes of arrays.
|
||||
ColumnPtr recreateColumnWithDefaultValues(
|
||||
const ColumnPtr & column, const DataTypePtr & scalar_type, size_t num_dimensions)
|
||||
{
|
||||
const auto * column_array = checkAndGetColumn<ColumnArray>(column.get());
|
||||
if (column_array && num_dimensions)
|
||||
{
|
||||
return ColumnArray::create(
|
||||
recreateColumnWithDefaultValues(
|
||||
column_array->getDataPtr(), scalar_type, num_dimensions - 1),
|
||||
IColumn::mutate(column_array->getOffsetsPtr()));
|
||||
}
|
||||
|
||||
return createArrayOfType(scalar_type, num_dimensions)->createColumn()->cloneResized(column->size());
|
||||
}
|
||||
|
||||
/// Replaces NULL fields to given field or empty array.
|
||||
class FieldVisitorReplaceNull : public StaticVisitor<Field>
|
||||
{
|
||||
public:
|
||||
explicit FieldVisitorReplaceNull(
|
||||
const Field & replacement_, size_t num_dimensions_)
|
||||
: replacement(replacement_)
|
||||
, num_dimensions(num_dimensions_)
|
||||
{
|
||||
}
|
||||
|
||||
Field operator()(const Null &) const
|
||||
{
|
||||
return num_dimensions
|
||||
? createEmptyArrayField(num_dimensions)
|
||||
: replacement;
|
||||
}
|
||||
|
||||
Field operator()(const Array & x) const
|
||||
{
|
||||
assert(num_dimensions > 0);
|
||||
const size_t size = x.size();
|
||||
Array res(size);
|
||||
for (size_t i = 0; i < size; ++i)
|
||||
res[i] = applyVisitor(FieldVisitorReplaceNull(replacement, num_dimensions - 1), x[i]);
|
||||
return res;
|
||||
}
|
||||
|
||||
template <typename T>
|
||||
Field operator()(const T & x) const { return x; }
|
||||
|
||||
private:
|
||||
const Field & replacement;
|
||||
size_t num_dimensions;
|
||||
};
|
||||
|
||||
/// Calculates number of dimensions in array field.
|
||||
/// Returns 0 for scalar fields.
|
||||
class FieldVisitorToNumberOfDimensions : public StaticVisitor<size_t>
|
||||
{
|
||||
public:
|
||||
size_t operator()(const Array & x) const
|
||||
{
|
||||
const size_t size = x.size();
|
||||
std::optional<size_t> dimensions;
|
||||
|
||||
for (size_t i = 0; i < size; ++i)
|
||||
{
|
||||
/// Do not count Nulls, because they will be replaced by default
|
||||
/// values with proper number of dimensions.
|
||||
if (x[i].isNull())
|
||||
continue;
|
||||
|
||||
size_t current_dimensions = applyVisitor(*this, x[i]);
|
||||
if (!dimensions)
|
||||
dimensions = current_dimensions;
|
||||
else if (current_dimensions != *dimensions)
|
||||
throw Exception(ErrorCodes::NUMBER_OF_DIMENSIONS_MISMATHED,
|
||||
"Number of dimensions mismatched among array elements");
|
||||
}
|
||||
|
||||
return 1 + dimensions.value_or(0);
|
||||
}
|
||||
|
||||
template <typename T>
|
||||
size_t operator()(const T &) const { return 0; }
|
||||
};
|
||||
|
||||
/// Visitor that allows to get type of scalar field
|
||||
/// or least common type of scalars in array.
|
||||
/// More optimized version of FieldToDataType.
|
||||
class FieldVisitorToScalarType : public StaticVisitor<>
|
||||
{
|
||||
public:
|
||||
using FieldType = Field::Types::Which;
|
||||
|
||||
void operator()(const Array & x)
|
||||
{
|
||||
size_t size = x.size();
|
||||
for (size_t i = 0; i < size; ++i)
|
||||
applyVisitor(*this, x[i]);
|
||||
}
|
||||
|
||||
void operator()(const UInt64 & x)
|
||||
{
|
||||
field_types.insert(FieldType::UInt64);
|
||||
if (x <= std::numeric_limits<UInt8>::max())
|
||||
type_indexes.insert(TypeIndex::UInt8);
|
||||
else if (x <= std::numeric_limits<UInt16>::max())
|
||||
type_indexes.insert(TypeIndex::UInt16);
|
||||
else if (x <= std::numeric_limits<UInt32>::max())
|
||||
type_indexes.insert(TypeIndex::UInt32);
|
||||
else
|
||||
type_indexes.insert(TypeIndex::UInt64);
|
||||
}
|
||||
|
||||
void operator()(const Int64 & x)
|
||||
{
|
||||
field_types.insert(FieldType::Int64);
|
||||
if (x <= std::numeric_limits<Int8>::max() && x >= std::numeric_limits<Int8>::min())
|
||||
type_indexes.insert(TypeIndex::Int8);
|
||||
else if (x <= std::numeric_limits<Int16>::max() && x >= std::numeric_limits<Int16>::min())
|
||||
type_indexes.insert(TypeIndex::Int16);
|
||||
else if (x <= std::numeric_limits<Int32>::max() && x >= std::numeric_limits<Int32>::min())
|
||||
type_indexes.insert(TypeIndex::Int32);
|
||||
else
|
||||
type_indexes.insert(TypeIndex::Int64);
|
||||
}
|
||||
|
||||
void operator()(const Null &)
|
||||
{
|
||||
have_nulls = true;
|
||||
}
|
||||
|
||||
template <typename T>
|
||||
void operator()(const T &)
|
||||
{
|
||||
field_types.insert(Field::TypeToEnum<NearestFieldType<T>>::value);
|
||||
type_indexes.insert(TypeToTypeIndex<NearestFieldType<T>>);
|
||||
}
|
||||
|
||||
DataTypePtr getScalarType() const { return getLeastSupertype(type_indexes, true); }
|
||||
bool haveNulls() const { return have_nulls; }
|
||||
bool needConvertField() const { return field_types.size() > 1; }
|
||||
|
||||
private:
|
||||
TypeIndexSet type_indexes;
|
||||
std::unordered_set<FieldType> field_types;
|
||||
bool have_nulls = false;
|
||||
};
|
||||
|
||||
}
|
||||
|
||||
FieldInfo getFieldInfo(const Field & field)
|
||||
{
|
||||
FieldVisitorToScalarType to_scalar_type_visitor;
|
||||
applyVisitor(to_scalar_type_visitor, field);
|
||||
|
||||
return
|
||||
{
|
||||
to_scalar_type_visitor.getScalarType(),
|
||||
to_scalar_type_visitor.haveNulls(),
|
||||
to_scalar_type_visitor.needConvertField(),
|
||||
applyVisitor(FieldVisitorToNumberOfDimensions(), field),
|
||||
};
|
||||
}
|
||||
|
||||
ColumnObject::Subcolumn::Subcolumn(MutableColumnPtr && data_, bool is_nullable_)
|
||||
: least_common_type(getDataTypeByColumn(*data_))
|
||||
, is_nullable(is_nullable_)
|
||||
{
|
||||
data.push_back(std::move(data_));
|
||||
}
|
||||
|
||||
ColumnObject::Subcolumn::Subcolumn(
|
||||
size_t size_, bool is_nullable_)
|
||||
: least_common_type(std::make_shared<DataTypeNothing>())
|
||||
, is_nullable(is_nullable_)
|
||||
, num_of_defaults_in_prefix(size_)
|
||||
{
|
||||
}
|
||||
|
||||
size_t ColumnObject::Subcolumn::Subcolumn::size() const
|
||||
{
|
||||
size_t res = num_of_defaults_in_prefix;
|
||||
for (const auto & part : data)
|
||||
res += part->size();
|
||||
return res;
|
||||
}
|
||||
|
||||
size_t ColumnObject::Subcolumn::Subcolumn::byteSize() const
|
||||
{
|
||||
size_t res = 0;
|
||||
for (const auto & part : data)
|
||||
res += part->byteSize();
|
||||
return res;
|
||||
}
|
||||
|
||||
size_t ColumnObject::Subcolumn::Subcolumn::allocatedBytes() const
|
||||
{
|
||||
size_t res = 0;
|
||||
for (const auto & part : data)
|
||||
res += part->allocatedBytes();
|
||||
return res;
|
||||
}
|
||||
|
||||
void ColumnObject::Subcolumn::checkTypes() const
|
||||
{
|
||||
DataTypes prefix_types;
|
||||
prefix_types.reserve(data.size());
|
||||
for (size_t i = 0; i < data.size(); ++i)
|
||||
{
|
||||
auto current_type = getDataTypeByColumn(*data[i]);
|
||||
prefix_types.push_back(current_type);
|
||||
auto prefix_common_type = getLeastSupertype(prefix_types);
|
||||
if (!prefix_common_type->equals(*current_type))
|
||||
throw Exception(ErrorCodes::LOGICAL_ERROR,
|
||||
"Data type {} of column at position {} cannot represent all columns from i-th prefix",
|
||||
current_type->getName(), i);
|
||||
}
|
||||
}
|
||||
|
||||
void ColumnObject::Subcolumn::insert(Field field)
|
||||
{
|
||||
auto info = getFieldInfo(field);
|
||||
insert(std::move(field), std::move(info));
|
||||
}
|
||||
|
||||
void ColumnObject::Subcolumn::addNewColumnPart(DataTypePtr type)
|
||||
{
|
||||
auto serialization = type->getSerialization(ISerialization::Kind::SPARSE);
|
||||
data.push_back(type->createColumn(*serialization));
|
||||
least_common_type = LeastCommonType{std::move(type)};
|
||||
}
|
||||
|
||||
static bool isConversionRequiredBetweenIntegers(const IDataType & lhs, const IDataType & rhs)
|
||||
{
|
||||
/// If both types are signed/unsigned integers and the size of the left field type
|
||||
/// is not greater than the right type's, we don't need to convert the field,
|
||||
/// because all integer fields are stored in Int64/UInt64.
|
||||
|
||||
WhichDataType which_lhs(lhs);
|
||||
WhichDataType which_rhs(rhs);
|
||||
|
||||
bool is_native_int = which_lhs.isNativeInt() && which_rhs.isNativeInt();
|
||||
bool is_native_uint = which_lhs.isNativeUInt() && which_rhs.isNativeUInt();
|
||||
|
||||
return (is_native_int || is_native_uint)
|
||||
&& lhs.getSizeOfValueInMemory() <= rhs.getSizeOfValueInMemory();
|
||||
}
|
||||
|
||||
void ColumnObject::Subcolumn::insert(Field field, FieldInfo info)
|
||||
{
|
||||
auto base_type = std::move(info.scalar_type);
|
||||
|
||||
if (isNothing(base_type) && info.num_dimensions == 0)
|
||||
{
|
||||
insertDefault();
|
||||
return;
|
||||
}
|
||||
|
||||
auto column_dim = least_common_type.getNumberOfDimensions();
|
||||
auto value_dim = info.num_dimensions;
|
||||
|
||||
if (isNothing(least_common_type.get()))
|
||||
column_dim = value_dim;
|
||||
|
||||
if (field.isNull())
|
||||
value_dim = column_dim;
|
||||
|
||||
if (value_dim != column_dim)
|
||||
throw Exception(ErrorCodes::NUMBER_OF_DIMENSIONS_MISMATHED,
|
||||
"Dimension of types mismatched between inserted value and column. "
|
||||
"Dimension of value: {}. Dimension of column: {}",
|
||||
value_dim, column_dim);
|
||||
|
||||
if (is_nullable)
|
||||
base_type = makeNullable(base_type);
|
||||
|
||||
if (!is_nullable && info.have_nulls)
|
||||
field = applyVisitor(FieldVisitorReplaceNull(base_type->getDefault(), value_dim), std::move(field));
|
||||
|
||||
bool type_changed = false;
|
||||
const auto & least_common_base_type = least_common_type.getBase();
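/// Start a new column part if this is the first value, or if the incoming scalar type
/// cannot be represented by the current least common type. Compatible native integer
/// widths are left as-is; otherwise the common type is widened to the least supertype.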
|
||||
|
||||
if (data.empty())
|
||||
{
|
||||
addNewColumnPart(createArrayOfType(std::move(base_type), value_dim));
|
||||
}
|
||||
else if (!least_common_base_type->equals(*base_type) && !isNothing(base_type))
|
||||
{
|
||||
if (!isConversionRequiredBetweenIntegers(*base_type, *least_common_base_type))
|
||||
{
|
||||
base_type = getLeastSupertype(DataTypes{std::move(base_type), least_common_base_type}, true);
|
||||
type_changed = true;
|
||||
if (!least_common_base_type->equals(*base_type))
|
||||
addNewColumnPart(createArrayOfType(std::move(base_type), value_dim));
|
||||
}
|
||||
}
|
||||
|
||||
if (type_changed || info.need_convert)
|
||||
field = convertFieldToTypeOrThrow(field, *least_common_type.get());
|
||||
|
||||
data.back()->insert(field);
|
||||
}
|
||||
|
||||
void ColumnObject::Subcolumn::insertRangeFrom(const Subcolumn & src, size_t start, size_t length)
|
||||
{
|
||||
assert(src.isFinalized());
|
||||
|
||||
const auto & src_column = src.data.back();
|
||||
const auto & src_type = src.least_common_type.get();
|
||||
|
||||
if (data.empty())
|
||||
{
|
||||
addNewColumnPart(src.least_common_type.get());
|
||||
data.back()->insertRangeFrom(*src_column, start, length);
|
||||
}
|
||||
else if (least_common_type.get()->equals(*src_type))
|
||||
{
|
||||
data.back()->insertRangeFrom(*src_column, start, length);
|
||||
}
|
||||
else
|
||||
{
|
||||
auto new_least_common_type = getLeastSupertype(DataTypes{least_common_type.get(), src_type}, true);
|
||||
auto casted_column = castColumn({src_column, src_type, ""}, new_least_common_type);
|
||||
|
||||
if (!least_common_type.get()->equals(*new_least_common_type))
|
||||
addNewColumnPart(std::move(new_least_common_type));
|
||||
|
||||
data.back()->insertRangeFrom(*casted_column, start, length);
|
||||
}
|
||||
}
|
||||
|
||||
bool ColumnObject::Subcolumn::isFinalized() const
|
||||
{
|
||||
return data.empty() ||
|
||||
(data.size() == 1 && !data[0]->isSparse() && num_of_defaults_in_prefix == 0);
|
||||
}
|
||||
|
||||
void ColumnObject::Subcolumn::finalize()
|
||||
{
|
||||
if (isFinalized())
|
||||
return;
|
||||
|
||||
if (data.size() == 1 && num_of_defaults_in_prefix == 0)
|
||||
{
|
||||
data[0] = data[0]->convertToFullColumnIfSparse();
|
||||
return;
|
||||
}
|
||||
|
||||
const auto & to_type = least_common_type.get();
|
||||
auto result_column = to_type->createColumn();
|
||||
|
||||
if (num_of_defaults_in_prefix)
|
||||
result_column->insertManyDefaults(num_of_defaults_in_prefix);
|
||||
|
||||
for (auto & part : data)
|
||||
{
|
||||
part = part->convertToFullColumnIfSparse();
|
||||
auto from_type = getDataTypeByColumn(*part);
|
||||
size_t part_size = part->size();
|
||||
|
||||
if (!from_type->equals(*to_type))
|
||||
{
|
||||
auto offsets = ColumnUInt64::create();
|
||||
auto & offsets_data = offsets->getData();
|
||||
|
||||
/// We need to convert only non-default values and then recreate column
|
||||
/// with default value of new type, because default values (which represent misses in data)
|
||||
/// may be inconsistent between types (e.g "0" in UInt64 and empty string in String).
|
||||
|
||||
part->getIndicesOfNonDefaultRows(offsets_data, 0, part_size);
|
||||
|
||||
if (offsets->size() == part_size)
|
||||
{
|
||||
part = castColumn({part, from_type, ""}, to_type);
|
||||
}
|
||||
else
|
||||
{
|
||||
auto values = part->index(*offsets, offsets->size());
|
||||
values = castColumn({values, from_type, ""}, to_type);
|
||||
part = values->createWithOffsets(offsets_data, to_type->getDefault(), part_size, /*shift=*/ 0);
|
||||
}
|
||||
}
|
||||
|
||||
result_column->insertRangeFrom(*part, 0, part_size);
|
||||
}
|
||||
|
||||
data = { std::move(result_column) };
|
||||
num_of_defaults_in_prefix = 0;
|
||||
}
|
||||
|
||||
void ColumnObject::Subcolumn::insertDefault()
|
||||
{
|
||||
if (data.empty())
|
||||
++num_of_defaults_in_prefix;
|
||||
else
|
||||
data.back()->insertDefault();
|
||||
}
|
||||
|
||||
void ColumnObject::Subcolumn::insertManyDefaults(size_t length)
|
||||
{
|
||||
if (data.empty())
|
||||
num_of_defaults_in_prefix += length;
|
||||
else
|
||||
data.back()->insertManyDefaults(length);
|
||||
}
|
||||
|
||||
void ColumnObject::Subcolumn::popBack(size_t n)
|
||||
{
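/// Removes the last n values: values are popped from the trailing column parts first,
/// and any remainder is trimmed from the prefix of default values.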
|
||||
assert(n <= size());
|
||||
|
||||
size_t num_removed = 0;
|
||||
for (auto it = data.rbegin(); it != data.rend(); ++it)
|
||||
{
|
||||
if (n == 0)
|
||||
break;
|
||||
|
||||
auto & column = *it;
|
||||
if (n < column->size())
|
||||
{
|
||||
column->popBack(n);
|
||||
n = 0;
|
||||
}
|
||||
else
|
||||
{
|
||||
++num_removed;
|
||||
n -= column->size();
|
||||
}
|
||||
}
|
||||
|
||||
data.resize(data.size() - num_removed);
|
||||
num_of_defaults_in_prefix -= n;
|
||||
}
|
||||
|
||||
Field ColumnObject::Subcolumn::getLastField() const
|
||||
{
|
||||
if (data.empty())
|
||||
return Field();
|
||||
|
||||
const auto & last_part = data.back();
|
||||
assert(!last_part->empty());
|
||||
return (*last_part)[last_part->size() - 1];
|
||||
}
|
||||
|
||||
ColumnObject::Subcolumn ColumnObject::Subcolumn::recreateWithDefaultValues(const FieldInfo & field_info) const
|
||||
{
|
||||
auto scalar_type = field_info.scalar_type;
|
||||
if (is_nullable)
|
||||
scalar_type = makeNullable(scalar_type);
|
||||
|
||||
Subcolumn new_subcolumn;
|
||||
new_subcolumn.least_common_type = LeastCommonType{createArrayOfType(scalar_type, field_info.num_dimensions)};
|
||||
new_subcolumn.is_nullable = is_nullable;
|
||||
new_subcolumn.num_of_defaults_in_prefix = num_of_defaults_in_prefix;
|
||||
new_subcolumn.data.reserve(data.size());
|
||||
|
||||
for (const auto & part : data)
|
||||
new_subcolumn.data.push_back(recreateColumnWithDefaultValues(
|
||||
part, scalar_type, field_info.num_dimensions));
|
||||
|
||||
return new_subcolumn;
|
||||
}
|
||||
|
||||
IColumn & ColumnObject::Subcolumn::getFinalizedColumn()
|
||||
{
|
||||
assert(isFinalized());
|
||||
return *data[0];
|
||||
}
|
||||
|
||||
const IColumn & ColumnObject::Subcolumn::getFinalizedColumn() const
|
||||
{
|
||||
assert(isFinalized());
|
||||
return *data[0];
|
||||
}
|
||||
|
||||
const ColumnPtr & ColumnObject::Subcolumn::getFinalizedColumnPtr() const
|
||||
{
|
||||
assert(isFinalized());
|
||||
return data[0];
|
||||
}
|
||||
|
||||
ColumnObject::Subcolumn::LeastCommonType::LeastCommonType(DataTypePtr type_)
|
||||
: type(std::move(type_))
|
||||
, base_type(getBaseTypeOfArray(type))
|
||||
, num_dimensions(DB::getNumberOfDimensions(*type))
|
||||
{
|
||||
}
|
||||
|
||||
ColumnObject::ColumnObject(bool is_nullable_)
|
||||
: is_nullable(is_nullable_)
|
||||
, num_rows(0)
|
||||
{
|
||||
}
|
||||
|
||||
ColumnObject::ColumnObject(SubcolumnsTree && subcolumns_, bool is_nullable_)
|
||||
: is_nullable(is_nullable_)
|
||||
, subcolumns(std::move(subcolumns_))
|
||||
, num_rows(subcolumns.empty() ? 0 : (*subcolumns.begin())->data.size())
|
||||
|
||||
{
|
||||
checkConsistency();
|
||||
}
|
||||
|
||||
void ColumnObject::checkConsistency() const
|
||||
{
|
||||
if (subcolumns.empty())
|
||||
return;
|
||||
|
||||
for (const auto & leaf : subcolumns)
|
||||
{
|
||||
if (num_rows != leaf->data.size())
|
||||
{
|
||||
throw Exception(ErrorCodes::LOGICAL_ERROR, "Sizes of subcolumns are inconsistent in ColumnObject."
|
||||
" Subcolumn '{}' has {} rows, but expected size is {}",
|
||||
leaf->path.getPath(), leaf->data.size(), num_rows);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
size_t ColumnObject::size() const
|
||||
{
|
||||
#ifndef NDEBUG
|
||||
checkConsistency();
|
||||
#endif
|
||||
return num_rows;
|
||||
}
|
||||
|
||||
MutableColumnPtr ColumnObject::cloneResized(size_t new_size) const
|
||||
{
|
||||
/// cloneResized with new_size == 0 is used for cloneEmpty().
|
||||
if (new_size != 0)
|
||||
throw Exception(ErrorCodes::NOT_IMPLEMENTED,
|
||||
"ColumnObject doesn't support resize to non-zero length");
|
||||
|
||||
return ColumnObject::create(is_nullable);
|
||||
}
|
||||
|
||||
size_t ColumnObject::byteSize() const
|
||||
{
|
||||
size_t res = 0;
|
||||
for (const auto & entry : subcolumns)
|
||||
res += entry->data.byteSize();
|
||||
return res;
|
||||
}
|
||||
|
||||
size_t ColumnObject::allocatedBytes() const
|
||||
{
|
||||
size_t res = 0;
|
||||
for (const auto & entry : subcolumns)
|
||||
res += entry->data.allocatedBytes();
|
||||
return res;
|
||||
}
|
||||
|
||||
void ColumnObject::forEachSubcolumn(ColumnCallback callback)
|
||||
{
|
||||
if (!isFinalized())
|
||||
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot iterate over non-finalized ColumnObject");
|
||||
|
||||
for (auto & entry : subcolumns)
|
||||
callback(entry->data.data.back());
|
||||
}
|
||||
|
||||
void ColumnObject::insert(const Field & field)
|
||||
{
|
||||
const auto & object = field.get<const Object &>();
|
||||
|
||||
HashSet<StringRef, StringRefHash> inserted;
|
||||
size_t old_size = size();
|
||||
for (const auto & [key_str, value] : object)
|
||||
{
|
||||
PathInData key(key_str);
|
||||
inserted.insert(key_str);
|
||||
if (!hasSubcolumn(key))
|
||||
addSubcolumn(key, old_size);
|
||||
|
||||
auto & subcolumn = getSubcolumn(key);
|
||||
subcolumn.insert(value);
|
||||
}
|
||||
|
||||
for (auto & entry : subcolumns)
|
||||
if (!inserted.has(entry->path.getPath()))
|
||||
entry->data.insertDefault();
|
||||
|
||||
++num_rows;
|
||||
}
|
||||
|
||||
void ColumnObject::insertDefault()
|
||||
{
|
||||
for (auto & entry : subcolumns)
|
||||
entry->data.insertDefault();
|
||||
|
||||
++num_rows;
|
||||
}
|
||||
|
||||
Field ColumnObject::operator[](size_t n) const
|
||||
{
|
||||
if (!isFinalized())
|
||||
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot get Field from non-finalized ColumnObject");
|
||||
|
||||
Object object;
|
||||
for (const auto & entry : subcolumns)
|
||||
object[entry->path.getPath()] = (*entry->data.data.back())[n];
|
||||
|
||||
return object;
|
||||
}
|
||||
|
||||
void ColumnObject::get(size_t n, Field & res) const
|
||||
{
|
||||
if (!isFinalized())
|
||||
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot get Field from non-finalized ColumnObject");
|
||||
|
||||
auto & object = res.get<Object &>();
|
||||
for (const auto & entry : subcolumns)
|
||||
{
|
||||
auto it = object.try_emplace(entry->path.getPath()).first;
|
||||
entry->data.data.back()->get(n, it->second);
|
||||
}
|
||||
}
|
||||
|
||||
void ColumnObject::insertRangeFrom(const IColumn & src, size_t start, size_t length)
|
||||
{
|
||||
const auto & src_object = assert_cast<const ColumnObject &>(src);
|
||||
|
||||
for (auto & entry : subcolumns)
|
||||
{
|
||||
if (src_object.hasSubcolumn(entry->path))
|
||||
entry->data.insertRangeFrom(src_object.getSubcolumn(entry->path), start, length);
|
||||
else
|
||||
entry->data.insertManyDefaults(length);
|
||||
}
|
||||
|
||||
num_rows += length;
|
||||
finalize();
|
||||
}
|
||||
|
||||
ColumnPtr ColumnObject::replicate(const Offsets & offsets) const
|
||||
{
|
||||
if (!isFinalized())
|
||||
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot replicate non-finalized ColumnObject");
|
||||
|
||||
auto res_column = ColumnObject::create(is_nullable);
|
||||
for (const auto & entry : subcolumns)
|
||||
{
|
||||
auto replicated_data = entry->data.data.back()->replicate(offsets)->assumeMutable();
|
||||
res_column->addSubcolumn(entry->path, std::move(replicated_data));
|
||||
}
|
||||
|
||||
return res_column;
|
||||
}
|
||||
|
||||
void ColumnObject::popBack(size_t length)
|
||||
{
|
||||
for (auto & entry : subcolumns)
|
||||
entry->data.popBack(length);
|
||||
|
||||
num_rows -= length;
|
||||
}
|
||||
|
||||
const ColumnObject::Subcolumn & ColumnObject::getSubcolumn(const PathInData & key) const
|
||||
{
|
||||
if (const auto * node = subcolumns.findLeaf(key))
|
||||
return node->data;
|
||||
|
||||
throw Exception(ErrorCodes::ILLEGAL_COLUMN, "There is no subcolumn {} in ColumnObject", key.getPath());
|
||||
}
|
||||
|
||||
ColumnObject::Subcolumn & ColumnObject::getSubcolumn(const PathInData & key)
|
||||
{
|
||||
if (const auto * node = subcolumns.findLeaf(key))
|
||||
return const_cast<SubcolumnsTree::Node *>(node)->data;
|
||||
|
||||
throw Exception(ErrorCodes::ILLEGAL_COLUMN, "There is no subcolumn {} in ColumnObject", key.getPath());
|
||||
}
|
||||
|
||||
bool ColumnObject::hasSubcolumn(const PathInData & key) const
|
||||
{
|
||||
return subcolumns.findLeaf(key) != nullptr;
|
||||
}
|
||||
|
||||
void ColumnObject::addSubcolumn(const PathInData & key, MutableColumnPtr && subcolumn)
|
||||
{
|
||||
size_t new_size = subcolumn->size();
|
||||
bool inserted = subcolumns.add(key, Subcolumn(std::move(subcolumn), is_nullable));
|
||||
|
||||
if (!inserted)
|
||||
throw Exception(ErrorCodes::DUPLICATE_COLUMN, "Subcolumn '{}' already exists", key.getPath());
|
||||
|
||||
if (num_rows == 0)
|
||||
num_rows = new_size;
|
||||
else if (new_size != num_rows)
|
||||
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH,
|
||||
"Size of subcolumn {} ({}) is inconsistent with column size ({})",
|
||||
key.getPath(), new_size, num_rows);
|
||||
}
|
||||
|
||||
void ColumnObject::addSubcolumn(const PathInData & key, size_t new_size)
|
||||
{
|
||||
bool inserted = subcolumns.add(key, Subcolumn(new_size, is_nullable));
|
||||
if (!inserted)
|
||||
throw Exception(ErrorCodes::DUPLICATE_COLUMN, "Subcolumn '{}' already exists", key.getPath());
|
||||
|
||||
if (num_rows == 0)
|
||||
num_rows = new_size;
|
||||
else if (new_size != num_rows)
|
||||
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH,
|
||||
"Required size of subcolumn {} ({}) is inconsistent with column size ({})",
|
||||
key.getPath(), new_size, num_rows);
|
||||
}
|
||||
|
||||
void ColumnObject::addNestedSubcolumn(const PathInData & key, const FieldInfo & field_info, size_t new_size)
|
||||
{
|
||||
if (!key.hasNested())
|
||||
throw Exception(ErrorCodes::LOGICAL_ERROR,
|
||||
"Cannot add Nested subcolumn, because path doesn't contain Nested");
|
||||
|
||||
bool inserted = false;
|
||||
/// We find node that represents the same Nested type as @key.
|
||||
const auto * nested_node = subcolumns.findBestMatch(key);
|
||||
|
||||
if (nested_node)
|
||||
{
|
||||
/// Find any leaf of Nested subcolumn.
|
||||
const auto * leaf = subcolumns.findLeaf(nested_node, [&](const auto &) { return true; });
|
||||
assert(leaf);
|
||||
|
||||
/// Recreate subcolumn with default values and the same sizes of arrays.
|
||||
auto new_subcolumn = leaf->data.recreateWithDefaultValues(field_info);
|
||||
|
||||
/// It's possible that we have already inserted value from current row
|
||||
/// to this subcolumn. So, adjust size to expected.
|
||||
if (new_subcolumn.size() > new_size)
|
||||
new_subcolumn.popBack(new_subcolumn.size() - new_size);
|
||||
|
||||
assert(new_subcolumn.size() == new_size);
|
||||
inserted = subcolumns.add(key, new_subcolumn);
|
||||
}
|
||||
else
|
||||
{
|
||||
/// If node was not found just add subcolumn with empty arrays.
|
||||
inserted = subcolumns.add(key, Subcolumn(new_size, is_nullable));
|
||||
}
|
||||
|
||||
if (!inserted)
|
||||
throw Exception(ErrorCodes::DUPLICATE_COLUMN, "Subcolumn '{}' already exists", key.getPath());
|
||||
|
||||
if (num_rows == 0)
|
||||
num_rows = new_size;
|
||||
}
|
||||
|
||||
PathsInData ColumnObject::getKeys() const
|
||||
{
|
||||
PathsInData keys;
|
||||
keys.reserve(subcolumns.size());
|
||||
for (const auto & entry : subcolumns)
|
||||
keys.emplace_back(entry->path);
|
||||
return keys;
|
||||
}
|
||||
|
||||
bool ColumnObject::isFinalized() const
|
||||
{
|
||||
return std::all_of(subcolumns.begin(), subcolumns.end(),
|
||||
[](const auto & entry) { return entry->data.isFinalized(); });
|
||||
}
|
||||
|
||||
void ColumnObject::finalize()
|
||||
{
|
||||
size_t old_size = size();
|
||||
SubcolumnsTree new_subcolumns;
|
||||
for (auto && entry : subcolumns)
|
||||
{
|
||||
const auto & least_common_type = entry->data.getLeastCommonType();
|
||||
|
||||
/// Do not add subcolumns, which consist only of NULLs.
|
||||
if (isNothing(getBaseTypeOfArray(least_common_type)))
|
||||
continue;
|
||||
|
||||
entry->data.finalize();
|
||||
new_subcolumns.add(entry->path, entry->data);
|
||||
}
|
||||
|
||||
/// If all subcolumns were skipped add a dummy subcolumn,
|
||||
/// because Tuple type must have at least one element.
|
||||
if (new_subcolumns.empty())
|
||||
new_subcolumns.add(PathInData{COLUMN_NAME_DUMMY}, Subcolumn{ColumnUInt8::create(old_size, 0), is_nullable});
|
||||
|
||||
std::swap(subcolumns, new_subcolumns);
|
||||
checkObjectHasNoAmbiguosPaths(getKeys());
|
||||
}
|
||||
|
||||
}
237 src/Columns/ColumnObject.h Normal file
@ -0,0 +1,237 @@
|
||||
#pragma once
|
||||
|
||||
#include <Core/Field.h>
|
||||
#include <Core/Names.h>
|
||||
#include <Columns/IColumn.h>
|
||||
#include <Common/PODArray.h>
|
||||
#include <Common/HashTable/HashMap.h>
|
||||
#include <DataTypes/Serializations/JSONDataParser.h>
|
||||
#include <DataTypes/Serializations/SubcolumnsTree.h>
|
||||
|
||||
#include <DataTypes/IDataType.h>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
namespace ErrorCodes
|
||||
{
|
||||
extern const int LOGICAL_ERROR;
|
||||
}
|
||||
|
||||
/// Info that represents a scalar or array field in a decomposed view.
|
||||
/// It allows recreating the field with a different number
/// of dimensions or nullability.
|
||||
struct FieldInfo
|
||||
{
|
||||
/// The common type of all scalars in the field.
|
||||
DataTypePtr scalar_type;
|
||||
|
||||
/// Do we have NULL scalar in field.
|
||||
bool have_nulls;
|
||||
|
||||
/// If true then we have scalars with different types in array and
|
||||
/// we need to convert scalars to the common type.
|
||||
bool need_convert;
|
||||
|
||||
/// Number of dimension in array. 0 if field is scalar.
|
||||
size_t num_dimensions;
|
||||
};
|
||||
|
||||
FieldInfo getFieldInfo(const Field & field);
|
||||
|
||||
/** A column that represents object with dynamic set of subcolumns.
|
||||
* Subcolumns are identified by paths in document and are stored in
|
||||
* a trie-like structure. ColumnObject is not suitable for writing into tables
|
||||
* and it should be converted to Tuple with fixed set of subcolumns before that.
|
||||
*/
|
||||
class ColumnObject final : public COWHelper<IColumn, ColumnObject>
|
||||
{
|
||||
public:
|
||||
/** Class that represents one subcolumn.
|
||||
* It stores values in several parts of column
|
||||
* and keeps current common type of all parts.
|
||||
* We add a new column part with a new type when we insert a field
* that can't be converted to the current common type.
|
||||
* After insertion of all values subcolumn should be finalized
|
||||
* for writing and other operations.
|
||||
*/
|
||||
class Subcolumn
|
||||
{
|
||||
public:
|
||||
Subcolumn() = default;
|
||||
Subcolumn(size_t size_, bool is_nullable_);
|
||||
Subcolumn(MutableColumnPtr && data_, bool is_nullable_);
|
||||
|
||||
size_t size() const;
|
||||
size_t byteSize() const;
|
||||
size_t allocatedBytes() const;
|
||||
|
||||
bool isFinalized() const;
|
||||
const DataTypePtr & getLeastCommonType() const { return least_common_type.get(); }
|
||||
|
||||
/// Checks the consistency of column's parts stored in @data.
|
||||
void checkTypes() const;
|
||||
|
||||
/// Inserts a field whose scalars can be arbitrary, but the number of
/// dimensions should be consistent with the current common type.
|
||||
void insert(Field field);
|
||||
void insert(Field field, FieldInfo info);
|
||||
|
||||
void insertDefault();
|
||||
void insertManyDefaults(size_t length);
|
||||
void insertRangeFrom(const Subcolumn & src, size_t start, size_t length);
|
||||
void popBack(size_t n);
|
||||
|
||||
/// Converts all column's parts to the common type and
|
||||
/// creates a single column that stores all values.
|
||||
void finalize();
|
||||
|
||||
/// Returns last inserted field.
|
||||
Field getLastField() const;
|
||||
|
||||
/// Recreates subcolumn with default scalar values and keeps sizes of arrays.
|
||||
/// Used to create columns of type Nested with consistent array sizes.
|
||||
Subcolumn recreateWithDefaultValues(const FieldInfo & field_info) const;
|
||||
|
||||
/// Returns the single column if the subcolumn is finalized.
|
||||
/// Otherwise -- undefined behaviour.
|
||||
IColumn & getFinalizedColumn();
|
||||
const IColumn & getFinalizedColumn() const;
|
||||
const ColumnPtr & getFinalizedColumnPtr() const;
|
||||
|
||||
friend class ColumnObject;
|
||||
|
||||
private:
|
||||
class LeastCommonType
|
||||
{
|
||||
public:
|
||||
LeastCommonType() = default;
|
||||
explicit LeastCommonType(DataTypePtr type_);
|
||||
|
||||
const DataTypePtr & get() const { return type; }
|
||||
const DataTypePtr & getBase() const { return base_type; }
|
||||
size_t getNumberOfDimensions() const { return num_dimensions; }
|
||||
|
||||
private:
|
||||
DataTypePtr type;
|
||||
DataTypePtr base_type;
|
||||
size_t num_dimensions = 0;
|
||||
};
|
||||
|
||||
void addNewColumnPart(DataTypePtr type);
|
||||
|
||||
/// Current least common type of all values inserted to this subcolumn.
|
||||
LeastCommonType least_common_type;
|
||||
|
||||
/// If true then the common type of the subcolumn is Nullable
|
||||
/// and default values are NULLs.
|
||||
bool is_nullable = false;
|
||||
|
||||
/// Parts of column. Parts should be in increasing order in terms of subtypes/supertypes.
|
||||
/// That means that the least common type for i-th prefix is the type of i-th part
|
||||
/// and it's the supertype for the types of all parts from 0 to i-1.
|
||||
std::vector<WrappedPtr> data;
|
||||
|
||||
/// Until we insert any non-default field we don't know the final
/// least common type, so we count the number of defaults in the prefix;
/// they will be converted to default values of the final common type.
|
||||
size_t num_of_defaults_in_prefix = 0;
|
||||
};
|
||||
|
||||
using SubcolumnsTree = SubcolumnsTree<Subcolumn>;
|
||||
|
||||
private:
|
||||
/// If true then all subcolumns are nullable.
|
||||
const bool is_nullable;
|
||||
|
||||
SubcolumnsTree subcolumns;
|
||||
size_t num_rows;
|
||||
|
||||
public:
|
||||
static constexpr auto COLUMN_NAME_DUMMY = "_dummy";
|
||||
|
||||
explicit ColumnObject(bool is_nullable_);
|
||||
ColumnObject(SubcolumnsTree && subcolumns_, bool is_nullable_);
|
||||
|
||||
/// Checks that all subcolumns have consistent sizes.
|
||||
void checkConsistency() const;
|
||||
|
||||
bool hasSubcolumn(const PathInData & key) const;
|
||||
|
||||
const Subcolumn & getSubcolumn(const PathInData & key) const;
|
||||
Subcolumn & getSubcolumn(const PathInData & key);
|
||||
|
||||
void incrementNumRows() { ++num_rows; }
|
||||
|
||||
/// Adds a subcolumn from existing IColumn.
|
||||
void addSubcolumn(const PathInData & key, MutableColumnPtr && subcolumn);
|
||||
|
||||
/// Adds a subcolumn of specific size with default values.
|
||||
void addSubcolumn(const PathInData & key, size_t new_size);
|
||||
|
||||
/// Adds a subcolumn of type Nested of specific size with default values.
|
||||
/// It cares about consistency of sizes of Nested arrays.
|
||||
void addNestedSubcolumn(const PathInData & key, const FieldInfo & field_info, size_t new_size);
|
||||
|
||||
const SubcolumnsTree & getSubcolumns() const { return subcolumns; }
|
||||
SubcolumnsTree & getSubcolumns() { return subcolumns; }
|
||||
PathsInData getKeys() const;
|
||||
|
||||
/// Finalizes all subcolumns.
|
||||
void finalize();
|
||||
bool isFinalized() const;
|
||||
|
||||
/// Part of interface
|
||||
|
||||
const char * getFamilyName() const override { return "Object"; }
|
||||
TypeIndex getDataType() const override { return TypeIndex::Object; }
|
||||
|
||||
size_t size() const override;
|
||||
MutableColumnPtr cloneResized(size_t new_size) const override;
|
||||
size_t byteSize() const override;
|
||||
size_t allocatedBytes() const override;
|
||||
void forEachSubcolumn(ColumnCallback callback) override;
|
||||
void insert(const Field & field) override;
|
||||
void insertDefault() override;
|
||||
void insertRangeFrom(const IColumn & src, size_t start, size_t length) override;
|
||||
ColumnPtr replicate(const Offsets & offsets) const override;
|
||||
void popBack(size_t length) override;
|
||||
Field operator[](size_t n) const override;
|
||||
void get(size_t n, Field & res) const override;
|
||||
|
||||
/// All other methods throw exception.
|
||||
|
||||
ColumnPtr decompress() const override { throwMustBeConcrete(); }
|
||||
StringRef getDataAt(size_t) const override { throwMustBeConcrete(); }
|
||||
bool isDefaultAt(size_t) const override { throwMustBeConcrete(); }
|
||||
void insertData(const char *, size_t) override { throwMustBeConcrete(); }
|
||||
StringRef serializeValueIntoArena(size_t, Arena &, char const *&) const override { throwMustBeConcrete(); }
|
||||
const char * deserializeAndInsertFromArena(const char *) override { throwMustBeConcrete(); }
|
||||
const char * skipSerializedInArena(const char *) const override { throwMustBeConcrete(); }
|
||||
void updateHashWithValue(size_t, SipHash &) const override { throwMustBeConcrete(); }
|
||||
void updateWeakHash32(WeakHash32 &) const override { throwMustBeConcrete(); }
|
||||
void updateHashFast(SipHash &) const override { throwMustBeConcrete(); }
|
||||
ColumnPtr filter(const Filter &, ssize_t) const override { throwMustBeConcrete(); }
|
||||
void expand(const Filter &, bool) override { throwMustBeConcrete(); }
|
||||
ColumnPtr permute(const Permutation &, size_t) const override { throwMustBeConcrete(); }
|
||||
ColumnPtr index(const IColumn &, size_t) const override { throwMustBeConcrete(); }
|
||||
int compareAt(size_t, size_t, const IColumn &, int) const override { throwMustBeConcrete(); }
|
||||
void compareColumn(const IColumn &, size_t, PaddedPODArray<UInt64> *, PaddedPODArray<Int8> &, int, int) const override { throwMustBeConcrete(); }
|
||||
bool hasEqualValues() const override { throwMustBeConcrete(); }
|
||||
void getPermutation(PermutationSortDirection, PermutationSortStability, size_t, int, Permutation &) const override { throwMustBeConcrete(); }
|
||||
void updatePermutation(PermutationSortDirection, PermutationSortStability, size_t, int, Permutation &, EqualRanges &) const override { throwMustBeConcrete(); }
|
||||
MutableColumns scatter(ColumnIndex, const Selector &) const override { throwMustBeConcrete(); }
|
||||
void gather(ColumnGathererStream &) override { throwMustBeConcrete(); }
|
||||
void getExtremes(Field &, Field &) const override { throwMustBeConcrete(); }
|
||||
size_t byteSizeAt(size_t) const override { throwMustBeConcrete(); }
|
||||
double getRatioOfDefaultRows(double) const override { throwMustBeConcrete(); }
|
||||
void getIndicesOfNonDefaultRows(Offsets &, size_t, size_t) const override { throwMustBeConcrete(); }
|
||||
|
||||
private:
|
||||
[[noreturn]] static void throwMustBeConcrete()
|
||||
{
|
||||
throw Exception("ColumnObject must be converted to ColumnTuple before use", ErrorCodes::LOGICAL_ERROR);
|
||||
}
|
||||
};
|
||||
|
||||
}
|
@ -288,7 +288,7 @@ void ColumnSparse::popBack(size_t n)
|
||||
ColumnPtr ColumnSparse::filter(const Filter & filt, ssize_t) const
|
||||
{
|
||||
if (_size != filt.size())
|
||||
throw Exception("Size of filter doesn't match size of column.", ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH);
|
||||
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH, "Size of filter ({}) doesn't match size of column ({})", filt.size(), _size);
|
||||
|
||||
if (offsets->empty())
|
||||
{
|
||||
|
@ -37,7 +37,8 @@ public:
|
||||
return Base::create(values_->assumeMutable(), offsets_->assumeMutable(), size_);
|
||||
}
|
||||
|
||||
template <typename TColumnPtr, typename = typename std::enable_if<IsMutableColumns<TColumnPtr>::value>::type>
|
||||
template <typename TColumnPtr>
|
||||
requires IsMutableColumns<TColumnPtr>::value
|
||||
static MutablePtr create(TColumnPtr && values_, TColumnPtr && offsets_, size_t size_)
|
||||
{
|
||||
return Base::create(std::forward<TColumnPtr>(values_), std::forward<TColumnPtr>(offsets_), size_);
|
||||
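This hunk, like many that follow, replaces SFINAE through a defaulted std::enable_if_t template parameter with a C++20 requires-clause. A minimal standalone sketch of the same transformation, using a hypothetical hash helper rather than the ClickHouse factory methods:

#include <cstdint>
#include <iostream>
#include <type_traits>

/// Old style: the constraint hides in a defaulted template parameter.
template <typename T, typename = std::enable_if_t<std::is_integral_v<T>>>
uint64_t legacyHash(T key)
{
    return static_cast<uint64_t>(key) * 0x9E3779B97F4A7C15ULL;
}

/// New style: the same constraint spelled as a C++20 requires-clause.
template <typename T>
requires std::is_integral_v<T>
uint64_t modernHash(T key)
{
    return static_cast<uint64_t>(key) * 0x9E3779B97F4A7C15ULL;
}

int main()
{
    std::cout << legacyHash(42) << ' ' << modernHash(42) << '\n';
    /// modernHash(1.5); // would not compile: the constraint is not satisfied
}

The constraint is identical in both spellings; the requires form keeps the signature readable and produces clearer diagnostics when it is violated.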
@ -48,7 +49,8 @@ public:
|
||||
return Base::create(values_->assumeMutable());
|
||||
}
|
||||
|
||||
template <typename TColumnPtr, typename = typename std::enable_if<IsMutableColumns<TColumnPtr>::value>::type>
|
||||
template <typename TColumnPtr>
|
||||
requires IsMutableColumns<TColumnPtr>::value
|
||||
static MutablePtr create(TColumnPtr && values_)
|
||||
{
|
||||
return Base::create(std::forward<TColumnPtr>(values_));
|
||||
|
@ -35,7 +35,8 @@ public:
|
||||
static Ptr create(const TupleColumns & columns);
|
||||
static Ptr create(Columns && arg) { return create(arg); }
|
||||
|
||||
template <typename Arg, typename = typename std::enable_if<std::is_rvalue_reference<Arg &&>::value>::type>
|
||||
template <typename Arg>
|
||||
requires std::is_rvalue_reference_v<Arg &&>
|
||||
static MutablePtr create(Arg && arg) { return Base::create(std::forward<Arg>(arg)); }
|
||||
|
||||
std::string getName() const override;
|
||||
|
@ -381,7 +381,7 @@ ColumnPtr ColumnVector<T>::filter(const IColumn::Filter & filt, ssize_t result_s
|
||||
{
|
||||
size_t size = data.size();
|
||||
if (size != filt.size())
|
||||
throw Exception("Size of filter doesn't match size of column.", ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH);
|
||||
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH, "Size of filter ({}) doesn't match size of column ({})", filt.size(), size);
|
||||
|
||||
auto res = this->create();
|
||||
Container & res_data = res->getData();
|
||||
@ -450,7 +450,7 @@ void ColumnVector<T>::applyZeroMap(const IColumn::Filter & filt, bool inverted)
|
||||
{
|
||||
size_t size = data.size();
|
||||
if (size != filt.size())
|
||||
throw Exception("Size of filter doesn't match size of column.", ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH);
|
||||
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH, "Size of filter ({}) doesn't match size of column ({})", filt.size(), size);
|
||||
|
||||
const UInt8 * filt_pos = filt.data();
|
||||
const UInt8 * filt_end = filt_pos + size;
|
||||
|
@ -192,7 +192,7 @@ namespace
|
||||
{
|
||||
const size_t size = src_offsets.size();
|
||||
if (size != filt.size())
|
||||
throw Exception("Size of filter doesn't match size of column.", ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH);
|
||||
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH, "Size of filter ({}) doesn't match size of column ({})", filt.size(), size);
|
||||
|
||||
ResultOffsetsBuilder result_offsets_builder(res_offsets);
|
||||
|
||||
|
@ -883,8 +883,8 @@ public:
|
||||
return toDayNum(years_lut[year - DATE_LUT_MIN_YEAR]);
|
||||
}
|
||||
|
||||
template <typename Date,
|
||||
typename = std::enable_if_t<std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>>>
|
||||
template <typename Date>
|
||||
requires std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>
|
||||
inline auto toStartOfQuarterInterval(Date d, UInt64 quarters) const
|
||||
{
|
||||
if (quarters == 1)
|
||||
@ -892,8 +892,8 @@ public:
|
||||
return toStartOfMonthInterval(d, quarters * 3);
|
||||
}
|
||||
|
||||
template <typename Date,
|
||||
typename = std::enable_if_t<std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>>>
|
||||
template <typename Date>
|
||||
requires std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>
|
||||
inline auto toStartOfMonthInterval(Date d, UInt64 months) const
|
||||
{
|
||||
if (months == 1)
|
||||
@ -906,8 +906,8 @@ public:
|
||||
return toDayNum(years_months_lut[month_total_index / months * months]);
|
||||
}
|
||||
|
||||
template <typename Date,
|
||||
typename = std::enable_if_t<std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>>>
|
||||
template <typename Date>
|
||||
requires std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>
|
||||
inline auto toStartOfWeekInterval(Date d, UInt64 weeks) const
|
||||
{
|
||||
if (weeks == 1)
|
||||
@ -920,8 +920,8 @@ public:
|
||||
return ExtendedDayNum(4 + (d - 4) / days * days);
|
||||
}
|
||||
|
||||
template <typename Date,
|
||||
typename = std::enable_if_t<std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>>>
|
||||
template <typename Date>
|
||||
requires std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>
|
||||
inline Time toStartOfDayInterval(Date d, UInt64 days) const
|
||||
{
|
||||
if (days == 1)
|
||||
@ -1219,10 +1219,8 @@ public:
|
||||
|
||||
/// If the resulting month has fewer days than the source month, then saturation can happen.
|
||||
/// Example: 31 Aug + 1 month = 30 Sep.
|
||||
template <
|
||||
typename DateTime,
|
||||
typename
|
||||
= std::enable_if_t<std::is_same_v<DateTime, UInt32> || std::is_same_v<DateTime, Int64> || std::is_same_v<DateTime, time_t>>>
|
||||
template <typename DateTime>
|
||||
requires std::is_same_v<DateTime, UInt32> || std::is_same_v<DateTime, Int64> || std::is_same_v<DateTime, time_t>
|
||||
inline Time NO_SANITIZE_UNDEFINED addMonths(DateTime t, Int64 delta) const
|
||||
{
|
||||
const auto result_day = addMonthsIndex(t, delta);
|
||||
@ -1247,8 +1245,8 @@ public:
|
||||
return res;
|
||||
}
|
||||
|
||||
template <typename Date,
|
||||
typename = std::enable_if_t<std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>>>
|
||||
template <typename Date>
|
||||
requires std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>
|
||||
inline auto NO_SANITIZE_UNDEFINED addMonths(Date d, Int64 delta) const
|
||||
{
|
||||
if constexpr (std::is_same_v<Date, DayNum>)
|
||||
@ -1280,10 +1278,8 @@ public:
|
||||
}
|
||||
|
||||
/// Saturation can occur if 29 Feb is mapped to non-leap year.
|
||||
template <
|
||||
typename DateTime,
|
||||
typename
|
||||
= std::enable_if_t<std::is_same_v<DateTime, UInt32> || std::is_same_v<DateTime, Int64> || std::is_same_v<DateTime, time_t>>>
|
||||
template <typename DateTime>
|
||||
requires std::is_same_v<DateTime, UInt32> || std::is_same_v<DateTime, Int64> || std::is_same_v<DateTime, time_t>
|
||||
inline Time addYears(DateTime t, Int64 delta) const
|
||||
{
|
||||
auto result_day = addYearsIndex(t, delta);
|
||||
@ -1308,8 +1304,8 @@ public:
|
||||
return res;
|
||||
}
|
||||
|
||||
template <typename Date,
|
||||
typename = std::enable_if_t<std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>>>
|
||||
template <typename Date>
|
||||
requires std::is_same_v<Date, DayNum> || std::is_same_v<Date, ExtendedDayNum>
|
||||
inline auto addYears(Date d, Int64 delta) const
|
||||
{
|
||||
if constexpr (std::is_same_v<Date, DayNum>)
|
||||
|
@ -122,7 +122,8 @@ const uint32_t kMaxAbbreviationEntries = 1000;
|
||||
|
||||
// Read (bitwise) one object of type T
|
||||
template <typename T>
|
||||
std::enable_if_t<std::is_trivial_v<T> && std::is_standard_layout_v<T>, T> read(std::string_view & sp)
|
||||
requires std::is_trivial_v<T> && std::is_standard_layout_v<T>
|
||||
T read(std::string_view & sp)
|
||||
{
|
||||
SAFE_CHECK(sp.size() >= sizeof(T), "underflow");
|
||||
T x;
|
||||
|
@ -613,6 +613,7 @@
|
||||
M(642, CANNOT_PACK_ARCHIVE) \
|
||||
M(643, CANNOT_UNPACK_ARCHIVE) \
|
||||
M(644, REMOTE_FS_OBJECT_CACHE_ERROR) \
|
||||
M(645, NUMBER_OF_DIMENSIONS_MISMATHED) \
|
||||
\
|
||||
M(999, KEEPER_EXCEPTION) \
|
||||
M(1000, POCO_EXCEPTION) \
|
||||
|
@ -205,7 +205,8 @@ void rethrowFirstException(const Exceptions & exceptions);
|
||||
|
||||
|
||||
template <typename T>
|
||||
std::enable_if_t<std::is_pointer_v<T>, T> exception_cast(std::exception_ptr e)
|
||||
requires std::is_pointer_v<T>
|
||||
T exception_cast(std::exception_ptr e)
|
||||
{
|
||||
try
|
||||
{
|
||||
|
@ -46,6 +46,11 @@ public:
|
||||
throw Exception("Cannot convert Map to " + demangle(typeid(T).name()), ErrorCodes::CANNOT_CONVERT_TYPE);
|
||||
}
|
||||
|
||||
T operator() (const Object &) const
|
||||
{
|
||||
throw Exception("Cannot convert Object to " + demangle(typeid(T).name()), ErrorCodes::CANNOT_CONVERT_TYPE);
|
||||
}
|
||||
|
||||
T operator() (const UInt64 & x) const { return T(x); }
|
||||
T operator() (const Int64 & x) const { return T(x); }
|
||||
T operator() (const Int128 & x) const { return T(x); }
|
||||
@ -113,7 +118,8 @@ public:
|
||||
throw Exception("Cannot convert AggregateFunctionStateData to " + demangle(typeid(T).name()), ErrorCodes::CANNOT_CONVERT_TYPE);
|
||||
}
|
||||
|
||||
template <typename U, typename = std::enable_if_t<is_big_int_v<U>> >
|
||||
template <typename U>
|
||||
requires is_big_int_v<U>
|
||||
T operator() (const U & x) const
|
||||
{
|
||||
if constexpr (is_decimal<T>)
|
||||
|
@ -95,6 +95,23 @@ String FieldVisitorDump::operator() (const Map & x) const
|
||||
return wb.str();
|
||||
}
|
||||
|
||||
String FieldVisitorDump::operator() (const Object & x) const
|
||||
{
|
||||
WriteBufferFromOwnString wb;
|
||||
|
||||
wb << "Object_(";
|
||||
for (auto it = x.begin(); it != x.end(); ++it)
|
||||
{
|
||||
if (it != x.begin())
|
||||
wb << ", ";
|
||||
wb << "(" << it->first << ", " << applyVisitor(*this, it->second) << ")";
|
||||
}
|
||||
wb << ')';
|
||||
|
||||
return wb.str();
|
||||
|
||||
}
|
||||
|
||||
String FieldVisitorDump::operator() (const AggregateFunctionStateData & x) const
|
||||
{
|
||||
WriteBufferFromOwnString wb;
|
||||
|
@ -22,6 +22,7 @@ public:
|
||||
String operator() (const Array & x) const;
|
||||
String operator() (const Tuple & x) const;
|
||||
String operator() (const Map & x) const;
|
||||
String operator() (const Object & x) const;
|
||||
String operator() (const DecimalField<Decimal32> & x) const;
|
||||
String operator() (const DecimalField<Decimal64> & x) const;
|
||||
String operator() (const DecimalField<Decimal128> & x) const;
|
||||
|
@ -94,6 +94,19 @@ void FieldVisitorHash::operator() (const Array & x) const
|
||||
applyVisitor(*this, elem);
|
||||
}
|
||||
|
||||
void FieldVisitorHash::operator() (const Object & x) const
|
||||
{
|
||||
UInt8 type = Field::Types::Object;
|
||||
hash.update(type);
|
||||
hash.update(x.size());
|
||||
|
||||
for (const auto & [key, value]: x)
|
||||
{
|
||||
hash.update(key);
|
||||
applyVisitor(*this, value);
|
||||
}
|
||||
}
|
||||
|
||||
void FieldVisitorHash::operator() (const DecimalField<Decimal32> & x) const
|
||||
{
|
||||
UInt8 type = Field::Types::Decimal32;
|
||||
|
@ -28,6 +28,7 @@ public:
|
||||
void operator() (const Array & x) const;
|
||||
void operator() (const Tuple & x) const;
|
||||
void operator() (const Map & x) const;
|
||||
void operator() (const Object & x) const;
|
||||
void operator() (const DecimalField<Decimal32> & x) const;
|
||||
void operator() (const DecimalField<Decimal64> & x) const;
|
||||
void operator() (const DecimalField<Decimal128> & x) const;
|
||||
|
@ -26,6 +26,7 @@ bool FieldVisitorSum::operator() (String &) const { throw Exception("Cannot sum
|
||||
bool FieldVisitorSum::operator() (Array &) const { throw Exception("Cannot sum Arrays", ErrorCodes::LOGICAL_ERROR); }
|
||||
bool FieldVisitorSum::operator() (Tuple &) const { throw Exception("Cannot sum Tuples", ErrorCodes::LOGICAL_ERROR); }
|
||||
bool FieldVisitorSum::operator() (Map &) const { throw Exception("Cannot sum Maps", ErrorCodes::LOGICAL_ERROR); }
|
||||
bool FieldVisitorSum::operator() (Object &) const { throw Exception("Cannot sum Objects", ErrorCodes::LOGICAL_ERROR); }
|
||||
bool FieldVisitorSum::operator() (UUID &) const { throw Exception("Cannot sum UUIDs", ErrorCodes::LOGICAL_ERROR); }
|
||||
|
||||
bool FieldVisitorSum::operator() (AggregateFunctionStateData &) const
|
||||
|
@ -25,6 +25,7 @@ public:
|
||||
bool operator() (Array &) const;
|
||||
bool operator() (Tuple &) const;
|
||||
bool operator() (Map &) const;
|
||||
bool operator() (Object &) const;
|
||||
bool operator() (UUID &) const;
|
||||
bool operator() (AggregateFunctionStateData &) const;
|
||||
bool operator() (bool &) const;
|
||||
@ -36,7 +37,8 @@ public:
|
||||
return x.getValue() != T(0);
|
||||
}
|
||||
|
||||
template <typename T, typename = std::enable_if_t<is_big_int_v<T>> >
|
||||
template <typename T>
|
||||
requires is_big_int_v<T>
|
||||
bool operator() (T & x) const
|
||||
{
|
||||
x += rhs.reinterpret<T>();
|
||||
|
@ -126,5 +126,24 @@ String FieldVisitorToString::operator() (const Map & x) const
|
||||
return wb.str();
|
||||
}
|
||||
|
||||
String FieldVisitorToString::operator() (const Object & x) const
|
||||
{
|
||||
WriteBufferFromOwnString wb;
|
||||
|
||||
wb << '{';
|
||||
for (auto it = x.begin(); it != x.end(); ++it)
|
||||
{
|
||||
if (it != x.begin())
|
||||
wb << ", ";
|
||||
|
||||
writeDoubleQuoted(it->first, wb);
|
||||
wb << ": " << applyVisitor(*this, it->second);
|
||||
}
|
||||
wb << '}';
|
||||
|
||||
return wb.str();
|
||||
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
|
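FieldVisitorToString renders an Object as a JSON-like map with double-quoted keys, as added above. A rough standalone approximation of that output format, with std::map<std::string, std::string> standing in for the Field-based Object:

#include <iostream>
#include <map>
#include <sstream>
#include <string>

/// Render {"key": value, ...} with double-quoted keys, similar in spirit to
/// FieldVisitorToString::operator()(const Object &).
std::string objectToString(const std::map<std::string, std::string> & object)
{
    std::ostringstream out;
    out << '{';
    for (auto it = object.begin(); it != object.end(); ++it)
    {
        if (it != object.begin())
            out << ", ";
        out << '"' << it->first << "\": " << it->second;
    }
    out << '}';
    return out.str();
}

int main()
{
    std::cout << objectToString({{"id", "1"}, {"name", "\"abc\""}}) << '\n';
}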
@ -22,6 +22,7 @@ public:
|
||||
String operator() (const Array & x) const;
|
||||
String operator() (const Tuple & x) const;
|
||||
String operator() (const Map & x) const;
|
||||
String operator() (const Object & x) const;
|
||||
String operator() (const DecimalField<Decimal32> & x) const;
|
||||
String operator() (const DecimalField<Decimal64> & x) const;
|
||||
String operator() (const DecimalField<Decimal128> & x) const;
|
||||
|
@ -66,6 +66,20 @@ void FieldVisitorWriteBinary::operator() (const Map & x, WriteBuffer & buf) cons
|
||||
}
|
||||
}
|
||||
|
||||
void FieldVisitorWriteBinary::operator() (const Object & x, WriteBuffer & buf) const
|
||||
{
|
||||
const size_t size = x.size();
|
||||
writeBinary(size, buf);
|
||||
|
||||
for (const auto & [key, value] : x)
|
||||
{
|
||||
const UInt8 type = value.getType();
|
||||
writeBinary(type, buf);
|
||||
writeBinary(key, buf);
|
||||
Field::dispatch([&buf] (const auto & val) { FieldVisitorWriteBinary()(val, buf); }, value);
|
||||
}
|
||||
}
|
||||
|
||||
void FieldVisitorWriteBinary::operator()(const bool & x, WriteBuffer & buf) const
|
||||
{
|
||||
writeBinary(UInt8(x), buf);
|
||||
|
@ -21,6 +21,7 @@ public:
|
||||
void operator() (const Array & x, WriteBuffer & buf) const;
|
||||
void operator() (const Tuple & x, WriteBuffer & buf) const;
|
||||
void operator() (const Map & x, WriteBuffer & buf) const;
|
||||
void operator() (const Object & x, WriteBuffer & buf) const;
|
||||
void operator() (const DecimalField<Decimal32> & x, WriteBuffer & buf) const;
|
||||
void operator() (const DecimalField<Decimal64> & x, WriteBuffer & buf) const;
|
||||
void operator() (const DecimalField<Decimal128> & x, WriteBuffer & buf) const;
|
||||
|
@ -194,7 +194,7 @@ void FileSegment::write(const char * from, size_t size)
|
||||
{
|
||||
std::lock_guard segment_lock(mutex);
|
||||
|
||||
LOG_ERROR(log, "Failed to write to cache. File segment info: {}", getInfoForLog());
|
||||
LOG_ERROR(log, "Failed to write to cache. File segment info: {}", getInfoForLogImpl(segment_lock));
|
||||
|
||||
download_state = State::PARTIALLY_DOWNLOADED_NO_CONTINUATION;
|
||||
|
||||
@ -405,7 +405,11 @@ void FileSegment::completeImpl(bool allow_non_strict_checking)
|
||||
String FileSegment::getInfoForLog() const
|
||||
{
|
||||
std::lock_guard segment_lock(mutex);
|
||||
return getInfoForLogImpl(segment_lock);
|
||||
}
|
||||
|
||||
String FileSegment::getInfoForLogImpl(std::lock_guard<std::mutex> & segment_lock) const
|
||||
{
|
||||
WriteBufferFromOwnString info;
|
||||
info << "File segment: " << range().toString() << ", ";
|
||||
info << "state: " << download_state << ", ";
|
||||
|
@ -130,6 +130,7 @@ private:
|
||||
static String getCallerIdImpl(bool allow_non_strict_checking = false);
|
||||
void resetDownloaderImpl(std::lock_guard<std::mutex> & segment_lock);
|
||||
size_t getDownloadedSize(std::lock_guard<std::mutex> & segment_lock) const;
|
||||
String getInfoForLogImpl(std::lock_guard<std::mutex> & segment_lock) const;
|
||||
|
||||
const Range segment_range;
|
||||
|
||||
|
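The FileSegment change above splits getInfoForLog into a locking wrapper and a getInfoForLogImpl overload that takes the already-held lock, so error paths that own the mutex can log without re-locking. A minimal standalone sketch of that pattern; the Segment class and its members are illustrative:

#include <iostream>
#include <mutex>
#include <string>

class Segment
{
public:
    std::string getInfoForLog() const
    {
        std::lock_guard lock(mutex);
        return getInfoForLogImpl(lock);
    }

    void write()
    {
        std::lock_guard lock(mutex);
        /// On error we can log while already holding the lock, without deadlocking:
        std::cout << "failed: " << getInfoForLogImpl(lock) << '\n';
    }

private:
    /// Requires the caller to hold the mutex; the lock_guard parameter documents that.
    std::string getInfoForLogImpl(std::lock_guard<std::mutex> &) const
    {
        return "state: " + state;
    }

    mutable std::mutex mutex;
    std::string state = "DOWNLOADING";
};

int main()
{
    Segment s;
    s.write();
    std::cout << s.getInfoForLog() << '\n';
}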
@ -73,8 +73,8 @@ inline DB::UInt64 intHashCRC32(DB::UInt64 x, DB::UInt64 updated_value)
|
||||
}
|
||||
|
||||
template <typename T>
|
||||
inline typename std::enable_if<(sizeof(T) > sizeof(DB::UInt64)), DB::UInt64>::type
|
||||
intHashCRC32(const T & x, DB::UInt64 updated_value)
|
||||
requires (sizeof(T) > sizeof(DB::UInt64))
|
||||
inline DB::UInt64 intHashCRC32(const T & x, DB::UInt64 updated_value)
|
||||
{
|
||||
const auto * begin = reinterpret_cast<const char *>(&x);
|
||||
for (size_t i = 0; i < sizeof(T); i += sizeof(UInt64))
|
||||
@ -155,7 +155,8 @@ inline UInt32 updateWeakHash32(const DB::UInt8 * pos, size_t size, DB::UInt32 up
|
||||
}
|
||||
|
||||
template <typename T>
|
||||
inline size_t DefaultHash64(std::enable_if_t<(sizeof(T) <= sizeof(UInt64)), T> key)
|
||||
requires (sizeof(T) <= sizeof(UInt64))
|
||||
inline size_t DefaultHash64(T key)
|
||||
{
|
||||
union
|
||||
{
|
||||
@ -169,7 +170,8 @@ inline size_t DefaultHash64(std::enable_if_t<(sizeof(T) <= sizeof(UInt64)), T> k
|
||||
|
||||
|
||||
template <typename T>
|
||||
inline size_t DefaultHash64(std::enable_if_t<(sizeof(T) > sizeof(UInt64)), T> key)
|
||||
requires (sizeof(T) > sizeof(UInt64))
|
||||
inline size_t DefaultHash64(T key)
|
||||
{
|
||||
if constexpr (is_big_int_v<T> && sizeof(T) == 16)
|
||||
{
|
||||
@ -217,7 +219,8 @@ struct DefaultHash<T>
|
||||
template <typename T> struct HashCRC32;
|
||||
|
||||
template <typename T>
|
||||
inline size_t hashCRC32(std::enable_if_t<(sizeof(T) <= sizeof(UInt64)), T> key)
|
||||
requires (sizeof(T) <= sizeof(UInt64))
|
||||
inline size_t hashCRC32(T key)
|
||||
{
|
||||
union
|
||||
{
|
||||
@ -230,7 +233,8 @@ inline size_t hashCRC32(std::enable_if_t<(sizeof(T) <= sizeof(UInt64)), T> key)
|
||||
}
|
||||
|
||||
template <typename T>
|
||||
inline size_t hashCRC32(std::enable_if_t<(sizeof(T) > sizeof(UInt64)), T> key)
|
||||
requires (sizeof(T) > sizeof(UInt64))
|
||||
inline size_t hashCRC32(T key)
|
||||
{
|
||||
return intHashCRC32(key, -1);
|
||||
}
|
||||
|
@ -78,8 +78,7 @@ template <UInt64 MaxValue> struct MinCounterType
|
||||
};
|
||||
|
||||
/// Denominator of expression for HyperLogLog algorithm.
|
||||
template <UInt8 precision, int max_rank, typename HashValueType, typename DenominatorType,
|
||||
DenominatorMode denominator_mode, typename Enable = void>
|
||||
template <UInt8 precision, int max_rank, typename HashValueType, typename DenominatorType, DenominatorMode denominator_mode>
|
||||
class Denominator;
|
||||
|
||||
/// Returns true if rank storage is big.
|
||||
@ -89,11 +88,12 @@ constexpr bool isBigRankStore(UInt8 precision)
|
||||
}
|
||||
|
||||
/// Used to deduce denominator type depending on options provided.
|
||||
template <typename HashValueType, typename DenominatorType, DenominatorMode denominator_mode, typename Enable = void>
|
||||
template <typename HashValueType, typename DenominatorType, DenominatorMode denominator_mode>
|
||||
struct IntermediateDenominator;
|
||||
|
||||
template <typename DenominatorType, DenominatorMode denominator_mode>
|
||||
struct IntermediateDenominator<UInt32, DenominatorType, denominator_mode, std::enable_if_t<denominator_mode != DenominatorMode::ExactType>>
|
||||
requires (denominator_mode != DenominatorMode::ExactType)
|
||||
struct IntermediateDenominator<UInt32, DenominatorType, denominator_mode>
|
||||
{
|
||||
using Type = double;
|
||||
};
|
||||
@ -113,11 +113,9 @@ struct IntermediateDenominator<HashValueType, DenominatorType, DenominatorMode::
|
||||
/// "Lightweight" implementation of expression's denominator for HyperLogLog algorithm.
|
||||
/// Uses minimum amount of memory, but estimates may be unstable.
|
||||
/// Satisfiable when rank storage is small enough.
|
||||
template <UInt8 precision, int max_rank, typename HashValueType, typename DenominatorType,
|
||||
DenominatorMode denominator_mode>
|
||||
class __attribute__((__packed__)) Denominator<precision, max_rank, HashValueType, DenominatorType,
|
||||
denominator_mode,
|
||||
std::enable_if_t<!details::isBigRankStore(precision) || !(denominator_mode == DenominatorMode::StableIfBig)>>
|
||||
template <UInt8 precision, int max_rank, typename HashValueType, typename DenominatorType, DenominatorMode denominator_mode>
|
||||
requires (!details::isBigRankStore(precision)) || (!(denominator_mode == DenominatorMode::StableIfBig))
|
||||
class __attribute__((__packed__)) Denominator<precision, max_rank, HashValueType, DenominatorType, denominator_mode>
|
||||
{
|
||||
private:
|
||||
using T = typename IntermediateDenominator<HashValueType, DenominatorType, denominator_mode>::Type;
|
||||
@ -156,11 +154,9 @@ private:
|
||||
/// Fully-functional version of expression's denominator for HyperLogLog algorithm.
|
||||
/// Spends more space than the lightweight version. Estimates will always be stable.
|
||||
/// Used when rank storage is big.
|
||||
template <UInt8 precision, int max_rank, typename HashValueType, typename DenominatorType,
|
||||
DenominatorMode denominator_mode>
|
||||
class __attribute__((__packed__)) Denominator<precision, max_rank, HashValueType, DenominatorType,
|
||||
denominator_mode,
|
||||
std::enable_if_t<details::isBigRankStore(precision) && denominator_mode == DenominatorMode::StableIfBig>>
|
||||
template <UInt8 precision, int max_rank, typename HashValueType, typename DenominatorType, DenominatorMode denominator_mode>
|
||||
requires (details::isBigRankStore(precision)) && (denominator_mode == DenominatorMode::StableIfBig)
|
||||
class __attribute__((__packed__)) Denominator<precision, max_rank, HashValueType, DenominatorType, denominator_mode>
|
||||
{
|
||||
public:
|
||||
Denominator(DenominatorType initial_value) /// NOLINT
|
||||
|
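The Denominator hunks above move the choice between the lightweight and stable variants from an extra std::enable_if_t template parameter to requires-clauses on the partial specializations. A standalone sketch of constrained partial specializations in the same spirit; the precision threshold and member layout are illustrative, not the real HyperLogLog parameters:

#include <iostream>

/// Primary template: which partial specialization below is chosen is decided
/// by their requires-clauses.
template <int precision, typename T>
struct Denom;

/// "Lightweight" variant: a single accumulator, for small rank stores.
template <int precision, typename T>
requires (precision <= 12)
struct Denom<precision, T>
{
    T sum = 0;
    void update(T x) { sum += x; }
};

/// "Stable" variant: wider accumulator, a simplified stand-in for the extra state
/// kept when the rank store is big.
template <int precision, typename T>
requires (precision > 12)
struct Denom<precision, T>
{
    long double sum = 0;
    void update(T x) { sum += x; }
};

int main()
{
    Denom<10, double> light;
    Denom<14, double> stable;
    light.update(0.5);
    stable.update(0.5);
    std::cout << light.sum << ' ' << static_cast<double>(stable.sum) << '\n';
}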
@ -129,7 +129,8 @@ public:
|
||||
|
||||
IntervalTree() { nodes.resize(1); }
|
||||
|
||||
template <typename TValue = Value, std::enable_if_t<std::is_same_v<TValue, IntervalTreeVoidValue>, bool> = true>
|
||||
template <typename TValue = Value>
|
||||
requires std::is_same_v<Value, IntervalTreeVoidValue>
|
||||
ALWAYS_INLINE bool emplace(Interval interval)
|
||||
{
|
||||
assert(!tree_is_built);
|
||||
@ -156,19 +157,22 @@ public:
|
||||
return true;
|
||||
}
|
||||
|
||||
template <typename TValue = Value, std::enable_if_t<std::is_same_v<TValue, IntervalTreeVoidValue>, bool> = true>
|
||||
template <typename TValue = Value>
|
||||
requires std::is_same_v<TValue, IntervalTreeVoidValue>
|
||||
bool insert(Interval interval)
|
||||
{
|
||||
return emplace(interval);
|
||||
}
|
||||
|
||||
template <typename TValue = Value, std::enable_if_t<!std::is_same_v<TValue, IntervalTreeVoidValue>, bool> = true>
|
||||
template <typename TValue = Value>
|
||||
requires (!std::is_same_v<TValue, IntervalTreeVoidValue>)
|
||||
bool insert(Interval interval, const Value & value)
|
||||
{
|
||||
return emplace(interval, value);
|
||||
}
|
||||
|
||||
template <typename TValue = Value, std::enable_if_t<!std::is_same_v<TValue, IntervalTreeVoidValue>, bool> = true>
|
||||
template <typename TValue = Value>
|
||||
requires (!std::is_same_v<TValue, IntervalTreeVoidValue>)
|
||||
bool insert(Interval interval, Value && value)
|
||||
{
|
||||
return emplace(interval, std::move(value));
|
||||
|
@ -76,7 +76,8 @@ public:
|
||||
void add(const char * value) { add(std::make_unique<JSONString>(value)); }
|
||||
void add(bool value) { add(std::make_unique<JSONBool>(std::move(value))); }
|
||||
|
||||
template <typename T, std::enable_if_t<std::is_arithmetic_v<T>, bool> = true>
|
||||
template <typename T>
|
||||
requires std::is_arithmetic_v<T>
|
||||
void add(T value) { add(std::make_unique<JSONNumber<T>>(value)); }
|
||||
|
||||
void format(const FormatSettings & settings, FormatContext & context) override;
|
||||
@ -100,7 +101,8 @@ public:
|
||||
void add(std::string key, std::string_view value) { add(std::move(key), std::make_unique<JSONString>(value)); }
|
||||
void add(std::string key, bool value) { add(std::move(key), std::make_unique<JSONBool>(std::move(value))); }
|
||||
|
||||
template <typename T, std::enable_if_t<std::is_arithmetic_v<T>, bool> = true>
|
||||
template <typename T>
|
||||
requires std::is_arithmetic_v<T>
|
||||
void add(std::string key, T value) { add(std::move(key), std::make_unique<JSONNumber<T>>(value)); }
|
||||
|
||||
void format(const FormatSettings & settings, FormatContext & context) override;
|
||||
|
@ -82,7 +82,8 @@ private:
|
||||
#endif
|
||||
|
||||
public:
|
||||
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
|
||||
template <typename CharT>
|
||||
requires (sizeof(CharT) == 1)
|
||||
StringSearcher(const CharT * needle_, const size_t needle_size_)
|
||||
: needle{reinterpret_cast<const uint8_t *>(needle_)}, needle_size{needle_size_}
|
||||
{
|
||||
@ -191,7 +192,8 @@ public:
|
||||
#endif
|
||||
}
|
||||
|
||||
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
|
||||
template <typename CharT>
|
||||
requires (sizeof(CharT) == 1)
|
||||
ALWAYS_INLINE bool compareTrivial(const CharT * haystack_pos, const CharT * const haystack_end, const uint8_t * needle_pos) const
|
||||
{
|
||||
while (haystack_pos < haystack_end && needle_pos < needle_end)
|
||||
@ -217,7 +219,8 @@ public:
|
||||
return needle_pos == needle_end;
|
||||
}
|
||||
|
||||
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
|
||||
template <typename CharT>
|
||||
requires (sizeof(CharT) == 1)
|
||||
ALWAYS_INLINE bool compare(const CharT * /*haystack*/, const CharT * haystack_end, const CharT * pos) const
|
||||
{
|
||||
|
||||
@ -262,7 +265,8 @@ public:
|
||||
|
||||
/** Returns haystack_end if not found.
|
||||
*/
|
||||
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
|
||||
template <typename CharT>
|
||||
requires (sizeof(CharT) == 1)
|
||||
const CharT * search(const CharT * haystack, const CharT * const haystack_end) const
|
||||
{
|
||||
if (0 == needle_size)
|
||||
@ -338,7 +342,8 @@ public:
|
||||
return haystack_end;
|
||||
}
|
||||
|
||||
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
|
||||
template <typename CharT>
|
||||
requires (sizeof(CharT) == 1)
|
||||
const CharT * search(const CharT * haystack, const size_t haystack_size) const
|
||||
{
|
||||
return search(haystack, haystack + haystack_size);
|
||||
@ -367,7 +372,8 @@ private:
|
||||
#endif
|
||||
|
||||
public:
|
||||
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
|
||||
template <typename CharT>
|
||||
requires (sizeof(CharT) == 1)
|
||||
StringSearcher(const CharT * needle_, const size_t needle_size)
|
||||
: needle{reinterpret_cast<const uint8_t *>(needle_)}, needle_end{needle + needle_size}
|
||||
{
|
||||
@ -399,7 +405,8 @@ public:
|
||||
#endif
|
||||
}
|
||||
|
||||
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
|
||||
template <typename CharT>
|
||||
requires (sizeof(CharT) == 1)
|
||||
ALWAYS_INLINE bool compare(const CharT * /*haystack*/, const CharT * /*haystack_end*/, const CharT * pos) const
|
||||
{
|
||||
#ifdef __SSE4_1__
|
||||
@ -453,7 +460,8 @@ public:
|
||||
return false;
|
||||
}
|
||||
|
||||
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
|
||||
template <typename CharT>
|
||||
requires (sizeof(CharT) == 1)
|
||||
const CharT * search(const CharT * haystack, const CharT * const haystack_end) const
|
||||
{
|
||||
if (needle == needle_end)
|
||||
@ -540,7 +548,8 @@ public:
|
||||
return haystack_end;
|
||||
}
|
||||
|
||||
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
|
||||
template <typename CharT>
|
||||
requires (sizeof(CharT) == 1)
|
||||
const CharT * search(const CharT * haystack, const size_t haystack_size) const
|
||||
{
|
||||
return search(haystack, haystack + haystack_size);
|
||||
@ -568,7 +577,8 @@ private:
|
||||
#endif
|
||||
|
||||
public:
|
||||
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
|
||||
template <typename CharT>
|
||||
requires (sizeof(CharT) == 1)
|
||||
StringSearcher(const CharT * needle_, const size_t needle_size)
|
||||
: needle{reinterpret_cast<const uint8_t *>(needle_)}, needle_end{needle + needle_size}
|
||||
{
|
||||
@ -596,7 +606,8 @@ public:
|
||||
#endif
|
||||
}
|
||||
|
||||
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
|
||||
template <typename CharT>
|
||||
requires (sizeof(CharT) == 1)
|
||||
ALWAYS_INLINE bool compare(const CharT * /*haystack*/, const CharT * /*haystack_end*/, const CharT * pos) const
|
||||
{
|
||||
#ifdef __SSE4_1__
|
||||
@ -642,7 +653,8 @@ public:
|
||||
return false;
|
||||
}
|
||||
|
||||
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
|
||||
template <typename CharT>
|
||||
requires (sizeof(CharT) == 1)
|
||||
const CharT * search(const CharT * haystack, const CharT * const haystack_end) const
|
||||
{
|
||||
if (needle == needle_end)
|
||||
@ -722,7 +734,8 @@ public:
|
||||
return haystack_end;
|
||||
}
|
||||
|
||||
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
|
||||
template <typename CharT>
|
||||
requires (sizeof(CharT) == 1)
|
||||
const CharT * search(const CharT * haystack, const size_t haystack_size) const
|
||||
{
|
||||
return search(haystack, haystack + haystack_size);
|
||||
@ -740,7 +753,8 @@ class TokenSearcher : public StringSearcherBase
|
||||
size_t needle_size;
|
||||
|
||||
public:
|
||||
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
|
||||
template <typename CharT>
|
||||
requires (sizeof(CharT) == 1)
|
||||
TokenSearcher(const CharT * needle_, const size_t needle_size_)
|
||||
: searcher{needle_, needle_size_},
|
||||
needle_size(needle_size_)
|
||||
@ -752,7 +766,8 @@ public:
|
||||
|
||||
}
|
||||
|
||||
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
|
||||
template <typename CharT>
|
||||
requires (sizeof(CharT) == 1)
|
||||
ALWAYS_INLINE bool compare(const CharT * haystack, const CharT * haystack_end, const CharT * pos) const
|
||||
{
|
||||
// use searcher only if pos is in the beginning of token and pos + searcher.needle_size is end of token.
|
||||
@ -762,7 +777,8 @@ public:
|
||||
return false;
|
||||
}
|
||||
|
||||
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
|
||||
template <typename CharT>
|
||||
requires (sizeof(CharT) == 1)
|
||||
const CharT * search(const CharT * haystack, const CharT * const haystack_end) const
|
||||
{
|
||||
// use searcher.search(), then verify that returned value is a token
|
||||
@ -781,13 +797,15 @@ public:
|
||||
return haystack_end;
|
||||
}
|
||||
|
||||
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
|
||||
template <typename CharT>
|
||||
requires (sizeof(CharT) == 1)
|
||||
const CharT * search(const CharT * haystack, const size_t haystack_size) const
|
||||
{
|
||||
return search(haystack, haystack + haystack_size);
|
||||
}
|
||||
|
||||
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
|
||||
template <typename CharT>
|
||||
requires (sizeof(CharT) == 1)
|
||||
ALWAYS_INLINE bool isToken(const CharT * haystack, const CharT * const haystack_end, const CharT* p) const
|
||||
{
|
||||
return (p == haystack || isTokenSeparator(*(p - 1)))
|
||||
@ -819,11 +837,13 @@ struct LibCASCIICaseSensitiveStringSearcher : public StringSearcherBase
|
||||
{
|
||||
const char * const needle;
|
||||
|
||||
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
|
||||
template <typename CharT>
|
||||
requires (sizeof(CharT) == 1)
|
||||
LibCASCIICaseSensitiveStringSearcher(const CharT * const needle_, const size_t /* needle_size */)
|
||||
: needle(reinterpret_cast<const char *>(needle_)) {}
|
||||
|
||||
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
|
||||
template <typename CharT>
|
||||
requires (sizeof(CharT) == 1)
|
||||
const CharT * search(const CharT * haystack, const CharT * const haystack_end) const
|
||||
{
|
||||
const auto * res = strstr(reinterpret_cast<const char *>(haystack), reinterpret_cast<const char *>(needle));
|
||||
@ -832,7 +852,8 @@ struct LibCASCIICaseSensitiveStringSearcher : public StringSearcherBase
|
||||
return reinterpret_cast<const CharT *>(res);
|
||||
}
|
||||
|
||||
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
|
||||
template <typename CharT>
|
||||
requires (sizeof(CharT) == 1)
|
||||
const CharT * search(const CharT * haystack, const size_t haystack_size) const
|
||||
{
|
||||
return search(haystack, haystack + haystack_size);
|
||||
@ -843,11 +864,13 @@ struct LibCASCIICaseInsensitiveStringSearcher : public StringSearcherBase
|
||||
{
|
||||
const char * const needle;
|
||||
|
||||
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
|
||||
template <typename CharT>
|
||||
requires (sizeof(CharT) == 1)
|
||||
LibCASCIICaseInsensitiveStringSearcher(const CharT * const needle_, const size_t /* needle_size */)
|
||||
: needle(reinterpret_cast<const char *>(needle_)) {}
|
||||
|
||||
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
|
||||
template <typename CharT>
|
||||
requires (sizeof(CharT) == 1)
|
||||
const CharT * search(const CharT * haystack, const CharT * const haystack_end) const
|
||||
{
|
||||
const auto * res = strcasestr(reinterpret_cast<const char *>(haystack), reinterpret_cast<const char *>(needle));
|
||||
@ -856,7 +879,8 @@ struct LibCASCIICaseInsensitiveStringSearcher : public StringSearcherBase
|
||||
return reinterpret_cast<const CharT *>(res);
|
||||
}
|
||||
|
||||
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
|
||||
template <typename CharT>
|
||||
requires (sizeof(CharT) == 1)
|
||||
const CharT * search(const CharT * haystack, const size_t haystack_size) const
|
||||
{
|
||||
return search(haystack, haystack + haystack_size);
|
||||
|
@ -75,7 +75,8 @@ inline size_t countCodePoints(const UInt8 * data, size_t size)
|
||||
}
|
||||
|
||||
|
||||
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
|
||||
template <typename CharT>
|
||||
requires (sizeof(CharT) == 1)
|
||||
size_t convertCodePointToUTF8(int code_point, CharT * out_bytes, size_t out_length)
|
||||
{
|
||||
static const Poco::UTF8Encoding utf8;
|
||||
@ -84,7 +85,8 @@ size_t convertCodePointToUTF8(int code_point, CharT * out_bytes, size_t out_leng
|
||||
return res;
|
||||
}
|
||||
|
||||
template <typename CharT, typename = std::enable_if_t<sizeof(CharT) == 1>>
|
||||
template <typename CharT>
|
||||
requires (sizeof(CharT) == 1)
|
||||
std::optional<uint32_t> convertUTF8ToCodePoint(const CharT * in_bytes, size_t in_length)
|
||||
{
|
||||
static const Poco::UTF8Encoding utf8;
|
||||
|
@ -13,6 +13,9 @@
|
||||
#cmakedefine01 USE_CASSANDRA
|
||||
#cmakedefine01 USE_SENTRY
|
||||
#cmakedefine01 USE_GRPC
|
||||
#cmakedefine01 USE_SIMDJSON
|
||||
#cmakedefine01 USE_RAPIDJSON
|
||||
|
||||
#cmakedefine01 USE_DATASKETCHES
|
||||
#cmakedefine01 USE_YAML_CPP
|
||||
#cmakedefine01 CLICKHOUSE_SPLIT_BINARY
|
||||
|
@ -127,6 +127,7 @@ PoolWithFailover::Entry PoolWithFailover::get()
|
||||
|
||||
/// If we cannot connect to some replica due to pool overflow, then we will wait and connect.
|
||||
PoolPtr * full_pool = nullptr;
|
||||
std::map<std::string, std::tuple<std::string, int>> error_detail;
|
||||
|
||||
for (size_t try_no = 0; try_no < max_tries; ++try_no)
|
||||
{
|
||||
@ -160,6 +161,15 @@ PoolWithFailover::Entry PoolWithFailover::get()
|
||||
}
|
||||
|
||||
app.logger().warning("Connection to " + pool->getDescription() + " failed: " + e.displayText());
|
||||
// Save all errors to error_detail.
|
||||
if (error_detail.contains(pool->getDescription()))
|
||||
{
|
||||
error_detail[pool->getDescription()] = {e.displayText(), e.code()};
|
||||
}
|
||||
else
|
||||
{
|
||||
error_detail.insert({pool->getDescription(), {e.displayText(), e.code()}});
|
||||
}
|
||||
continue;
|
||||
}
|
||||
|
||||
@ -180,7 +190,14 @@ PoolWithFailover::Entry PoolWithFailover::get()
|
||||
message << "Connections to all replicas failed: ";
|
||||
for (auto it = replicas_by_priority.begin(); it != replicas_by_priority.end(); ++it)
|
||||
for (auto jt = it->second.begin(); jt != it->second.end(); ++jt)
|
||||
{
|
||||
message << (it == replicas_by_priority.begin() && jt == it->second.begin() ? "" : ", ") << (*jt)->getDescription();
|
||||
if (error_detail.contains((*jt)->getDescription()))
|
||||
{
|
||||
std::tuple<std::string, int> error_and_code = error_detail[(*jt)->getDescription()];
|
||||
message << ", ERROR " << std::get<1>(error_and_code) << " : " << std::get<0>(error_and_code);
|
||||
}
|
||||
}
|
||||
|
||||
throw Poco::Exception(message.str());
|
||||
}
|
||||
|
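The PoolWithFailover change above remembers the last error per replica in error_detail and appends it to the summary exception message. A standalone sketch of how such a message is assembled; replica names and error codes are made up for illustration:

#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <tuple>
#include <vector>

int main()
{
    std::vector<std::string> replicas = {"db1:3306", "db2:3306"};
    std::map<std::string, std::tuple<std::string, int>> error_detail
        = {{"db2:3306", {"Too many connections", 1040}}};

    std::ostringstream message;
    message << "Connections to all replicas failed: ";
    for (size_t i = 0; i < replicas.size(); ++i)
    {
        message << (i == 0 ? "" : ", ") << replicas[i];
        /// Append the remembered error text and code for this replica, if any.
        if (auto it = error_detail.find(replicas[i]); it != error_detail.end())
            message << ", ERROR " << std::get<1>(it->second) << " : " << std::get<0>(it->second);
    }
    std::cout << message.str() << '\n';
}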
@ -259,7 +259,8 @@ namespace details
|
||||
{
|
||||
// To avoid stack overflow when converting to type with no appropriate c-tor,
|
||||
// resulting in endless recursive calls from `Value::get<T>()` to `Value::operator T()` to `Value::get<T>()` to ...
|
||||
template <typename T, typename std::enable_if_t<std::is_constructible_v<T, Value>>>
|
||||
template <typename T>
|
||||
requires std::is_constructible_v<T, Value>
|
||||
inline T contructFromValue(const Value & val)
|
||||
{
|
||||
return T(val);
|
||||
|
@ -25,7 +25,8 @@ namespace DB
|
||||
* In the rest, behaves like a dynamic_cast.
|
||||
*/
|
||||
template <typename To, typename From>
|
||||
std::enable_if_t<std::is_reference_v<To>, To> typeid_cast(From & from)
|
||||
requires std::is_reference_v<To>
|
||||
To typeid_cast(From & from)
|
||||
{
|
||||
try
|
||||
{
|
||||
@ -43,7 +44,8 @@ std::enable_if_t<std::is_reference_v<To>, To> typeid_cast(From & from)
|
||||
|
||||
|
||||
template <typename To, typename From>
|
||||
std::enable_if_t<std::is_pointer_v<To>, To> typeid_cast(From * from)
|
||||
requires std::is_pointer_v<To>
|
||||
To typeid_cast(From * from)
|
||||
{
|
||||
try
|
||||
{
|
||||
@ -60,7 +62,8 @@ std::enable_if_t<std::is_pointer_v<To>, To> typeid_cast(From * from)
|
||||
|
||||
|
||||
template <typename To, typename From>
|
||||
std::enable_if_t<is_shared_ptr_v<To>, To> typeid_cast(const std::shared_ptr<From> & from)
|
||||
requires is_shared_ptr_v<To>
|
||||
To typeid_cast(const std::shared_ptr<From> & from)
|
||||
{
|
||||
try
|
||||
{
|
||||
|
@ -37,7 +37,7 @@ void CoordinationSettings::loadFromConfig(const String & config_elem, const Poco
|
||||
}
|
||||
|
||||
|
||||
const String KeeperConfigurationAndSettings::DEFAULT_FOUR_LETTER_WORD_CMD = "conf,cons,crst,envi,ruok,srst,srvr,stat,wchc,wchs,dirs,mntr,isro";
|
||||
const String KeeperConfigurationAndSettings::DEFAULT_FOUR_LETTER_WORD_CMD = "conf,cons,crst,envi,ruok,srst,srvr,stat,wchs,dirs,mntr,isro";
|
||||
|
||||
KeeperConfigurationAndSettings::KeeperConfigurationAndSettings()
|
||||
: server_id(NOT_EXIST)
|
||||
@ -82,8 +82,8 @@ void KeeperConfigurationAndSettings::dump(WriteBufferFromOwnString & buf) const
|
||||
write_int(tcp_port_secure);
|
||||
}
|
||||
|
||||
writeText("four_letter_word_white_list=", buf);
|
||||
writeText(four_letter_word_white_list, buf);
|
||||
writeText("four_letter_word_allow_list=", buf);
|
||||
writeText(four_letter_word_allow_list, buf);
|
||||
buf.write('\n');
|
||||
|
||||
writeText("log_storage_path=", buf);
|
||||
@ -177,7 +177,11 @@ KeeperConfigurationAndSettings::loadFromConfig(const Poco::Util::AbstractConfigu
|
||||
ret->super_digest = config.getString("keeper_server.superdigest");
|
||||
}
|
||||
|
||||
ret->four_letter_word_white_list = config.getString("keeper_server.four_letter_word_white_list", DEFAULT_FOUR_LETTER_WORD_CMD);
|
||||
ret->four_letter_word_allow_list = config.getString(
|
||||
"keeper_server.four_letter_word_allow_list",
|
||||
config.getString("keeper_server.four_letter_word_white_list",
|
||||
DEFAULT_FOUR_LETTER_WORD_CMD));
|
||||
|
||||
|
||||
ret->log_storage_path = getLogsPathFromConfig(config, standalone_keeper_);
|
||||
ret->snapshot_storage_path = getSnapshotsPathFromConfig(config, standalone_keeper_);
|
||||
|
@ -68,7 +68,7 @@ struct KeeperConfigurationAndSettings
|
||||
int tcp_port;
|
||||
int tcp_port_secure;
|
||||
|
||||
String four_letter_word_white_list;
|
||||
String four_letter_word_allow_list;
|
||||
|
||||
String super_digest;
|
||||
|
||||
|
@ -129,7 +129,7 @@ void FourLetterCommandFactory::registerCommands(KeeperDispatcher & keeper_dispat
|
||||
FourLetterCommandPtr watch_command = std::make_shared<WatchCommand>(keeper_dispatcher);
|
||||
factory.registerCommand(watch_command);
|
||||
|
||||
factory.initializeWhiteList(keeper_dispatcher);
|
||||
factory.initializeAllowList(keeper_dispatcher);
|
||||
factory.setInitialize(true);
|
||||
}
|
||||
}
|
||||
@ -137,17 +137,17 @@ void FourLetterCommandFactory::registerCommands(KeeperDispatcher & keeper_dispat
|
||||
bool FourLetterCommandFactory::isEnabled(int32_t code)
|
||||
{
|
||||
checkInitialization();
|
||||
if (!white_list.empty() && *white_list.cbegin() == WHITE_LIST_ALL)
|
||||
if (!allow_list.empty() && *allow_list.cbegin() == ALLOW_LIST_ALL)
|
||||
return true;
|
||||
|
||||
return std::find(white_list.begin(), white_list.end(), code) != white_list.end();
|
||||
return std::find(allow_list.begin(), allow_list.end(), code) != allow_list.end();
|
||||
}
|
||||
|
||||
void FourLetterCommandFactory::initializeWhiteList(KeeperDispatcher & keeper_dispatcher)
|
||||
void FourLetterCommandFactory::initializeAllowList(KeeperDispatcher & keeper_dispatcher)
|
||||
{
|
||||
const auto & keeper_settings = keeper_dispatcher.getKeeperConfigurationAndSettings();
|
||||
|
||||
String list_str = keeper_settings->four_letter_word_white_list;
|
||||
String list_str = keeper_settings->four_letter_word_allow_list;
|
||||
Strings tokens;
|
||||
splitInto<','>(tokens, list_str);
|
||||
|
||||
@ -157,15 +157,15 @@ void FourLetterCommandFactory::initializeWhiteList(KeeperDispatcher & keeper_dis
|
||||
|
||||
if (token == "*")
|
||||
{
|
||||
white_list.clear();
|
||||
white_list.push_back(WHITE_LIST_ALL);
|
||||
allow_list.clear();
|
||||
allow_list.push_back(ALLOW_LIST_ALL);
|
||||
return;
|
||||
}
|
||||
else
|
||||
{
|
||||
if (commands.contains(IFourLetterCommand::toCode(token)))
|
||||
{
|
||||
white_list.push_back(IFourLetterCommand::toCode(token));
|
||||
allow_list.push_back(IFourLetterCommand::toCode(token));
|
||||
}
|
||||
else
|
||||
{
|
||||
|
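initializeAllowList above splits the configured comma-separated list of four-letter words and treats '*' as allow-all. A standalone sketch of that parsing; the toCode packing here is only an illustration, not the real IFourLetterCommand::toCode:

#include <cstdint>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

static constexpr int32_t ALLOW_LIST_ALL = 0;

/// Pack the four characters of a command word into one int32 (illustrative).
int32_t toCode(const std::string & word)
{
    int32_t code = 0;
    for (char c : word)
        code = (code << 8) | static_cast<unsigned char>(c);
    return code;
}

int main()
{
    std::string list_str = "conf,ruok,mntr";
    std::vector<int32_t> allow_list;

    std::stringstream ss(list_str);
    std::string token;
    while (std::getline(ss, token, ','))
    {
        if (token == "*")
        {
            /// '*' enables every registered command.
            allow_list.assign(1, ALLOW_LIST_ALL);
            break;
        }
        allow_list.push_back(toCode(token));
    }

    std::cout << "enabled commands: " << allow_list.size() << '\n';
}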
@ -40,10 +40,10 @@ struct FourLetterCommandFactory : private boost::noncopyable
|
||||
{
|
||||
public:
|
||||
using Commands = std::unordered_map<int32_t, FourLetterCommandPtr>;
|
||||
using WhiteList = std::vector<int32_t>;
|
||||
using AllowList = std::vector<int32_t>;
|
||||
|
||||
///represent '*' which is used in white list
|
||||
static constexpr int32_t WHITE_LIST_ALL = 0;
|
||||
///represent '*' which is used in allow list
|
||||
static constexpr int32_t ALLOW_LIST_ALL = 0;
|
||||
|
||||
bool isKnown(int32_t code);
|
||||
bool isEnabled(int32_t code);
|
||||
@ -52,7 +52,7 @@ public:
|
||||
|
||||
/// There is no need to make it thread safe, because registration happens only during initialization and lookups happen after startup.
|
||||
void registerCommand(FourLetterCommandPtr & command);
|
||||
void initializeWhiteList(KeeperDispatcher & keeper_dispatcher);
|
||||
void initializeAllowList(KeeperDispatcher & keeper_dispatcher);
|
||||
|
||||
void checkInitialization() const;
|
||||
bool isInitialized() const { return initialized; }
|
||||
@ -64,7 +64,7 @@ public:
|
||||
private:
|
||||
std::atomic<bool> initialized = false;
|
||||
Commands commands;
|
||||
WhiteList white_list;
|
||||
AllowList allow_list;
|
||||
};
|
||||
|
||||
/**Tests if server is running in a non-error state. The server will respond with imok if it is running.
|
||||
@ -130,7 +130,7 @@ struct StatResetCommand : public IFourLetterCommand
|
||||
};
|
||||
|
||||
/// A command that does not do anything except reply to client with predefined message.
|
||||
///It is used to inform clients who execute none white listed four letter word commands.
|
||||
///It is used to inform clients who execute non-allow-listed four letter word commands.
|
||||
struct NopCommand : public IFourLetterCommand
|
||||
{
|
||||
explicit NopCommand(KeeperDispatcher & keeper_dispatcher_)
|
||||
|
@ -726,18 +726,6 @@ void convertToFullIfSparse(Block & block)
|
||||
column.column = recursiveRemoveSparse(column.column);
|
||||
}
|
||||
|
||||
ColumnPtr getColumnFromBlock(const Block & block, const NameAndTypePair & column)
|
||||
{
|
||||
auto current_column = block.getByName(column.getNameInStorage()).column;
|
||||
current_column = current_column->decompress();
|
||||
|
||||
if (column.isSubcolumn())
|
||||
return column.getTypeInStorage()->getSubcolumn(column.getSubcolumnName(), current_column);
|
||||
|
||||
return current_column;
|
||||
}
|
||||
|
||||
|
||||
Block materializeBlock(const Block & block)
|
||||
{
|
||||
if (!block)
|
||||
|
@ -196,10 +196,6 @@ void getBlocksDifference(const Block & lhs, const Block & rhs, std::string & out
|
||||
|
||||
void convertToFullIfSparse(Block & block);
|
||||
|
||||
/// Helps in-memory storages to extract columns from block.
|
||||
/// Properly handles cases, when column is a subcolumn and when it is compressed.
|
||||
ColumnPtr getColumnFromBlock(const Block & block, const NameAndTypePair & column);
|
||||
|
||||
/// Converts columns-constants to full columns ("materializes" them).
|
||||
Block materializeBlock(const Block & block);
|
||||
void materializeBlockInplace(Block & block);
|
||||
|
@ -115,8 +115,8 @@ private:
    }

    template <typename T, typename U>
    static std::enable_if_t<is_decimal<T> && is_decimal<U>, Shift>
    getScales(const DataTypePtr & left_type, const DataTypePtr & right_type)
    requires is_decimal<T> && is_decimal<U>
    static Shift getScales(const DataTypePtr & left_type, const DataTypePtr & right_type)
    {
        const DataTypeDecimalBase<T> * decimal0 = checkDecimalBase<T>(*left_type);
        const DataTypeDecimalBase<U> * decimal1 = checkDecimalBase<U>(*right_type);
@ -137,8 +137,8 @@ private:
    }

    template <typename T, typename U>
    static std::enable_if_t<is_decimal<T> && !is_decimal<U>, Shift>
    getScales(const DataTypePtr & left_type, const DataTypePtr &)
    requires is_decimal<T> && (!is_decimal<U>)
    static Shift getScales(const DataTypePtr & left_type, const DataTypePtr &)
    {
        Shift shift;
        const DataTypeDecimalBase<T> * decimal0 = checkDecimalBase<T>(*left_type);
@ -148,8 +148,8 @@ private:
    }

    template <typename T, typename U>
    static std::enable_if_t<!is_decimal<T> && is_decimal<U>, Shift>
    getScales(const DataTypePtr &, const DataTypePtr & right_type)
    requires (!is_decimal<T>) && is_decimal<U>
    static Shift getScales(const DataTypePtr &, const DataTypePtr & right_type)
    {
        Shift shift;
        const DataTypeDecimalBase<U> * decimal1 = checkDecimalBase<U>(*right_type);
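The three hunks above all apply the same C++20 modernization: a constraint expressed through std::enable_if_t in the return type is rewritten as a requires-clause, so the actual return type stays readable. A minimal self-contained sketch of that pattern, with generic types rather than the ClickHouse ones:

#include <type_traits>

struct Doubler
{
    /// Before: SFINAE via enable_if baked into the return type.
    template <typename T>
    static std::enable_if_t<std::is_integral_v<T>, T> twiceOld(T value) { return value + value; }

    /// After: the same constraint as a C++20 requires-clause.
    template <typename T>
    requires std::is_integral_v<T>
    static T twiceNew(T value) { return value + value; }
};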
@ -99,6 +99,12 @@ inline Field getBinaryValue(UInt8 type, ReadBuffer & buf)
            readBinary(value, buf);
            return value;
        }
        case Field::Types::Object:
        {
            Object value;
            readBinary(value, buf);
            return value;
        }
        case Field::Types::AggregateFunctionState:
        {
            AggregateFunctionStateData value;
@ -208,6 +214,40 @@ void writeText(const Map & x, WriteBuffer & buf)
    writeFieldText(Field(x), buf);
}

void readBinary(Object & x, ReadBuffer & buf)
{
    size_t size;
    readBinary(size, buf);

    for (size_t index = 0; index < size; ++index)
    {
        UInt8 type;
        String key;
        readBinary(type, buf);
        readBinary(key, buf);
        x[key] = getBinaryValue(type, buf);
    }
}

void writeBinary(const Object & x, WriteBuffer & buf)
{
    const size_t size = x.size();
    writeBinary(size, buf);

    for (const auto & [key, value] : x)
    {
        const UInt8 type = value.getType();
        writeBinary(type, buf);
        writeBinary(key, buf);
        Field::dispatch([&buf] (const auto & val) { FieldVisitorWriteBinary()(val, buf); }, value);
    }
}

void writeText(const Object & x, WriteBuffer & buf)
{
    writeFieldText(Field(x), buf);
}

template <typename T>
void readQuoted(DecimalField<T> & x, ReadBuffer & buf)
{
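As the two functions above show, an Object field is serialized as its entry count followed by one (type byte, key, value) triple per entry. A toy stand-in using only standard-library types (an assumption for illustration, not ClickHouse's ReadBuffer/WriteBuffer API) makes that layout explicit:

#include <cstdint>
#include <map>
#include <string>
#include <vector>

/// Toy encoder mirroring the layout of writeBinary(const Object &, WriteBuffer &):
/// <size>, then per entry: <type byte> <key> <value>. Every value is a uint64_t here
/// for simplicity; the real code dispatches on Field::Types.
std::vector<uint8_t> encodeToyObject(const std::map<std::string, uint64_t> & object)
{
    std::vector<uint8_t> out;
    auto put = [&](const void * data, size_t n)
    {
        const auto * p = static_cast<const uint8_t *>(data);
        out.insert(out.end(), p, p + n);
    };

    uint64_t size = object.size();
    put(&size, sizeof(size));
    for (const auto & [key, value] : object)
    {
        uint8_t type = 1;                    /// stand-in for Field::Types::UInt64
        put(&type, sizeof(type));
        uint64_t key_size = key.size();      /// strings are length-prefixed
        put(&key_size, sizeof(key_size));
        put(key.data(), key.size());
        put(&value, sizeof(value));
    }
    return out;
}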
@ -3,6 +3,7 @@
#include <cassert>
#include <vector>
#include <algorithm>
#include <map>
#include <type_traits>
#include <functional>

@ -49,10 +50,22 @@ DEFINE_FIELD_VECTOR(Array);
DEFINE_FIELD_VECTOR(Tuple);

/// An array with the following structure: [(key1, value1), (key2, value2), ...]
DEFINE_FIELD_VECTOR(Map);
DEFINE_FIELD_VECTOR(Map); /// TODO: use map instead of vector.

#undef DEFINE_FIELD_VECTOR

using FieldMap = std::map<String, Field, std::less<String>, AllocatorWithMemoryTracking<std::pair<const String, Field>>>;

#define DEFINE_FIELD_MAP(X) \
struct X : public FieldMap \
{ \
    using FieldMap::FieldMap; \
}

DEFINE_FIELD_MAP(Object);

#undef DEFINE_FIELD_MAP
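Expanding DEFINE_FIELD_MAP(Object) by hand shows what the new Object field actually is: a std::map from String to Field with the memory-tracking allocator, so subfields can be filled with ordinary map syntax. The snippet below is illustrative only, not part of the commit:

/// DEFINE_FIELD_MAP(Object) expands to:
///
///     struct Object : public FieldMap
///     {
///         using FieldMap::FieldMap;
///     };
///
/// ...so an Object behaves like a map of named subfields:
///     Object obj;
///     obj["id"]   = Field(UInt64(42));
///     obj["name"] = Field(String("value"));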
||||
struct AggregateFunctionStateData
|
||||
{
|
||||
String name; /// Name with arguments.
|
||||
@ -219,6 +232,7 @@ template <> struct NearestFieldTypeImpl<String> { using Type = String; };
|
||||
template <> struct NearestFieldTypeImpl<Array> { using Type = Array; };
|
||||
template <> struct NearestFieldTypeImpl<Tuple> { using Type = Tuple; };
|
||||
template <> struct NearestFieldTypeImpl<Map> { using Type = Map; };
|
||||
template <> struct NearestFieldTypeImpl<Object> { using Type = Object; };
|
||||
template <> struct NearestFieldTypeImpl<bool> { using Type = UInt64; };
|
||||
template <> struct NearestFieldTypeImpl<Null> { using Type = Null; };
|
||||
|
||||
@ -226,7 +240,8 @@ template <> struct NearestFieldTypeImpl<AggregateFunctionStateData> { using Type
|
||||
|
||||
// For enum types, use the field type that corresponds to their underlying type.
|
||||
template <typename T>
|
||||
struct NearestFieldTypeImpl<T, std::enable_if_t<std::is_enum_v<T>>>
|
||||
requires std::is_enum_v<T>
|
||||
struct NearestFieldTypeImpl<T>
|
||||
{
|
||||
using Type = NearestFieldType<std::underlying_type_t<T>>;
|
||||
};
|
||||
@ -283,6 +298,7 @@ public:
|
||||
Map = 26,
|
||||
UUID = 27,
|
||||
Bool = 28,
|
||||
Object = 29,
|
||||
};
|
||||
};
|
||||
|
||||
@ -472,6 +488,7 @@ public:
|
||||
case Types::Array: return get<Array>() < rhs.get<Array>();
|
||||
case Types::Tuple: return get<Tuple>() < rhs.get<Tuple>();
|
||||
case Types::Map: return get<Map>() < rhs.get<Map>();
|
||||
case Types::Object: return get<Object>() < rhs.get<Object>();
|
||||
case Types::Decimal32: return get<DecimalField<Decimal32>>() < rhs.get<DecimalField<Decimal32>>();
|
||||
case Types::Decimal64: return get<DecimalField<Decimal64>>() < rhs.get<DecimalField<Decimal64>>();
|
||||
case Types::Decimal128: return get<DecimalField<Decimal128>>() < rhs.get<DecimalField<Decimal128>>();
|
||||
@ -510,6 +527,7 @@ public:
|
||||
case Types::Array: return get<Array>() <= rhs.get<Array>();
|
||||
case Types::Tuple: return get<Tuple>() <= rhs.get<Tuple>();
|
||||
case Types::Map: return get<Map>() <= rhs.get<Map>();
|
||||
case Types::Object: return get<Object>() <= rhs.get<Object>();
|
||||
case Types::Decimal32: return get<DecimalField<Decimal32>>() <= rhs.get<DecimalField<Decimal32>>();
|
||||
case Types::Decimal64: return get<DecimalField<Decimal64>>() <= rhs.get<DecimalField<Decimal64>>();
|
||||
case Types::Decimal128: return get<DecimalField<Decimal128>>() <= rhs.get<DecimalField<Decimal128>>();
|
||||
@ -548,6 +566,7 @@ public:
|
||||
case Types::Array: return get<Array>() == rhs.get<Array>();
|
||||
case Types::Tuple: return get<Tuple>() == rhs.get<Tuple>();
|
||||
case Types::Map: return get<Map>() == rhs.get<Map>();
|
||||
case Types::Object: return get<Object>() == rhs.get<Object>();
|
||||
case Types::UInt128: return get<UInt128>() == rhs.get<UInt128>();
|
||||
case Types::UInt256: return get<UInt256>() == rhs.get<UInt256>();
|
||||
case Types::Int128: return get<Int128>() == rhs.get<Int128>();
|
||||
@ -597,6 +616,7 @@ public:
|
||||
bool value = bool(field.template get<UInt64>());
|
||||
return f(value);
|
||||
}
|
||||
case Types::Object: return f(field.template get<Object>());
|
||||
case Types::Decimal32: return f(field.template get<DecimalField<Decimal32>>());
|
||||
case Types::Decimal64: return f(field.template get<DecimalField<Decimal64>>());
|
||||
case Types::Decimal128: return f(field.template get<DecimalField<Decimal128>>());
|
||||
@ -650,7 +670,8 @@ private:
|
||||
}
|
||||
|
||||
template <typename CharT>
|
||||
std::enable_if_t<sizeof(CharT) == 1> assignString(const CharT * data, size_t size)
|
||||
requires (sizeof(CharT) == 1)
|
||||
void assignString(const CharT * data, size_t size)
|
||||
{
|
||||
assert(which == Types::String);
|
||||
String * ptr = reinterpret_cast<String *>(&storage);
|
||||
@ -685,7 +706,8 @@ private:
|
||||
}
|
||||
|
||||
template <typename CharT>
|
||||
std::enable_if_t<sizeof(CharT) == 1> create(const CharT * data, size_t size)
|
||||
requires (sizeof(CharT) == 1)
|
||||
void create(const CharT * data, size_t size)
|
||||
{
|
||||
new (&storage) String(reinterpret_cast<const char *>(data), size);
|
||||
which = Types::String;
|
||||
@ -713,6 +735,9 @@ private:
|
||||
case Types::Map:
|
||||
destroy<Map>();
|
||||
break;
|
||||
case Types::Object:
|
||||
destroy<Object>();
|
||||
break;
|
||||
case Types::AggregateFunctionState:
|
||||
destroy<AggregateFunctionStateData>();
|
||||
break;
|
||||
@ -737,26 +762,27 @@ private:
|
||||
using Row = std::vector<Field>;
|
||||
|
||||
|
||||
template <> struct Field::TypeToEnum<Null> { static const Types::Which value = Types::Null; };
|
||||
template <> struct Field::TypeToEnum<UInt64> { static const Types::Which value = Types::UInt64; };
|
||||
template <> struct Field::TypeToEnum<UInt128> { static const Types::Which value = Types::UInt128; };
|
||||
template <> struct Field::TypeToEnum<UInt256> { static const Types::Which value = Types::UInt256; };
|
||||
template <> struct Field::TypeToEnum<Int64> { static const Types::Which value = Types::Int64; };
|
||||
template <> struct Field::TypeToEnum<Int128> { static const Types::Which value = Types::Int128; };
|
||||
template <> struct Field::TypeToEnum<Int256> { static const Types::Which value = Types::Int256; };
|
||||
template <> struct Field::TypeToEnum<UUID> { static const Types::Which value = Types::UUID; };
|
||||
template <> struct Field::TypeToEnum<Float64> { static const Types::Which value = Types::Float64; };
|
||||
template <> struct Field::TypeToEnum<String> { static const Types::Which value = Types::String; };
|
||||
template <> struct Field::TypeToEnum<Array> { static const Types::Which value = Types::Array; };
|
||||
template <> struct Field::TypeToEnum<Tuple> { static const Types::Which value = Types::Tuple; };
|
||||
template <> struct Field::TypeToEnum<Map> { static const Types::Which value = Types::Map; };
|
||||
template <> struct Field::TypeToEnum<DecimalField<Decimal32>>{ static const Types::Which value = Types::Decimal32; };
|
||||
template <> struct Field::TypeToEnum<DecimalField<Decimal64>>{ static const Types::Which value = Types::Decimal64; };
|
||||
template <> struct Field::TypeToEnum<DecimalField<Decimal128>>{ static const Types::Which value = Types::Decimal128; };
|
||||
template <> struct Field::TypeToEnum<DecimalField<Decimal256>>{ static const Types::Which value = Types::Decimal256; };
|
||||
template <> struct Field::TypeToEnum<DecimalField<DateTime64>>{ static const Types::Which value = Types::Decimal64; };
|
||||
template <> struct Field::TypeToEnum<AggregateFunctionStateData>{ static const Types::Which value = Types::AggregateFunctionState; };
|
||||
template <> struct Field::TypeToEnum<bool>{ static const Types::Which value = Types::Bool; };
|
||||
template <> struct Field::TypeToEnum<Null> { static constexpr Types::Which value = Types::Null; };
|
||||
template <> struct Field::TypeToEnum<UInt64> { static constexpr Types::Which value = Types::UInt64; };
|
||||
template <> struct Field::TypeToEnum<UInt128> { static constexpr Types::Which value = Types::UInt128; };
|
||||
template <> struct Field::TypeToEnum<UInt256> { static constexpr Types::Which value = Types::UInt256; };
|
||||
template <> struct Field::TypeToEnum<Int64> { static constexpr Types::Which value = Types::Int64; };
|
||||
template <> struct Field::TypeToEnum<Int128> { static constexpr Types::Which value = Types::Int128; };
|
||||
template <> struct Field::TypeToEnum<Int256> { static constexpr Types::Which value = Types::Int256; };
|
||||
template <> struct Field::TypeToEnum<UUID> { static constexpr Types::Which value = Types::UUID; };
|
||||
template <> struct Field::TypeToEnum<Float64> { static constexpr Types::Which value = Types::Float64; };
|
||||
template <> struct Field::TypeToEnum<String> { static constexpr Types::Which value = Types::String; };
|
||||
template <> struct Field::TypeToEnum<Array> { static constexpr Types::Which value = Types::Array; };
|
||||
template <> struct Field::TypeToEnum<Tuple> { static constexpr Types::Which value = Types::Tuple; };
|
||||
template <> struct Field::TypeToEnum<Map> { static constexpr Types::Which value = Types::Map; };
|
||||
template <> struct Field::TypeToEnum<Object> { static constexpr Types::Which value = Types::Object; };
|
||||
template <> struct Field::TypeToEnum<DecimalField<Decimal32>>{ static constexpr Types::Which value = Types::Decimal32; };
|
||||
template <> struct Field::TypeToEnum<DecimalField<Decimal64>>{ static constexpr Types::Which value = Types::Decimal64; };
|
||||
template <> struct Field::TypeToEnum<DecimalField<Decimal128>>{ static constexpr Types::Which value = Types::Decimal128; };
|
||||
template <> struct Field::TypeToEnum<DecimalField<Decimal256>>{ static constexpr Types::Which value = Types::Decimal256; };
|
||||
template <> struct Field::TypeToEnum<DecimalField<DateTime64>>{ static constexpr Types::Which value = Types::Decimal64; };
|
||||
template <> struct Field::TypeToEnum<AggregateFunctionStateData>{ static constexpr Types::Which value = Types::AggregateFunctionState; };
|
||||
template <> struct Field::TypeToEnum<bool>{ static constexpr Types::Which value = Types::Bool; };
|
||||
|
||||
template <> struct Field::EnumToType<Field::Types::Null> { using Type = Null; };
|
||||
template <> struct Field::EnumToType<Field::Types::UInt64> { using Type = UInt64; };
|
||||
@ -771,6 +797,7 @@ template <> struct Field::EnumToType<Field::Types::String> { using Type = Strin
|
||||
template <> struct Field::EnumToType<Field::Types::Array> { using Type = Array; };
|
||||
template <> struct Field::EnumToType<Field::Types::Tuple> { using Type = Tuple; };
|
||||
template <> struct Field::EnumToType<Field::Types::Map> { using Type = Map; };
|
||||
template <> struct Field::EnumToType<Field::Types::Object> { using Type = Object; };
|
||||
template <> struct Field::EnumToType<Field::Types::Decimal32> { using Type = DecimalField<Decimal32>; };
|
||||
template <> struct Field::EnumToType<Field::Types::Decimal64> { using Type = DecimalField<Decimal64>; };
|
||||
template <> struct Field::EnumToType<Field::Types::Decimal128> { using Type = DecimalField<Decimal128>; };
|
||||
@ -931,34 +958,39 @@ class WriteBuffer;
|
||||
|
||||
/// It is assumed that all elements of the array have the same type.
|
||||
void readBinary(Array & x, ReadBuffer & buf);
|
||||
|
||||
[[noreturn]] inline void readText(Array &, ReadBuffer &) { throw Exception("Cannot read Array.", ErrorCodes::NOT_IMPLEMENTED); }
|
||||
[[noreturn]] inline void readQuoted(Array &, ReadBuffer &) { throw Exception("Cannot read Array.", ErrorCodes::NOT_IMPLEMENTED); }
|
||||
|
||||
/// It is assumed that all elements of the array have the same type.
|
||||
/// Also write size and type into buf. UInt64 and Int64 is written in variadic size form
|
||||
void writeBinary(const Array & x, WriteBuffer & buf);
|
||||
|
||||
void writeText(const Array & x, WriteBuffer & buf);
|
||||
|
||||
[[noreturn]] inline void writeQuoted(const Array &, WriteBuffer &) { throw Exception("Cannot write Array quoted.", ErrorCodes::NOT_IMPLEMENTED); }
|
||||
|
||||
void readBinary(Tuple & x, ReadBuffer & buf);
|
||||
|
||||
[[noreturn]] inline void readText(Tuple &, ReadBuffer &) { throw Exception("Cannot read Tuple.", ErrorCodes::NOT_IMPLEMENTED); }
|
||||
[[noreturn]] inline void readQuoted(Tuple &, ReadBuffer &) { throw Exception("Cannot read Tuple.", ErrorCodes::NOT_IMPLEMENTED); }
|
||||
|
||||
void writeBinary(const Tuple & x, WriteBuffer & buf);
|
||||
|
||||
void writeText(const Tuple & x, WriteBuffer & buf);
|
||||
[[noreturn]] inline void writeQuoted(const Tuple &, WriteBuffer &) { throw Exception("Cannot write Tuple quoted.", ErrorCodes::NOT_IMPLEMENTED); }
|
||||
|
||||
void readBinary(Map & x, ReadBuffer & buf);
|
||||
[[noreturn]] inline void readText(Map &, ReadBuffer &) { throw Exception("Cannot read Map.", ErrorCodes::NOT_IMPLEMENTED); }
|
||||
[[noreturn]] inline void readQuoted(Map &, ReadBuffer &) { throw Exception("Cannot read Map.", ErrorCodes::NOT_IMPLEMENTED); }
|
||||
|
||||
void writeBinary(const Map & x, WriteBuffer & buf);
|
||||
void writeText(const Map & x, WriteBuffer & buf);
|
||||
[[noreturn]] inline void writeQuoted(const Map &, WriteBuffer &) { throw Exception("Cannot write Map quoted.", ErrorCodes::NOT_IMPLEMENTED); }
|
||||
|
||||
void readBinary(Object & x, ReadBuffer & buf);
|
||||
[[noreturn]] inline void readText(Object &, ReadBuffer &) { throw Exception("Cannot read Object.", ErrorCodes::NOT_IMPLEMENTED); }
|
||||
[[noreturn]] inline void readQuoted(Object &, ReadBuffer &) { throw Exception("Cannot read Object.", ErrorCodes::NOT_IMPLEMENTED); }
|
||||
|
||||
void writeBinary(const Object & x, WriteBuffer & buf);
|
||||
void writeText(const Object & x, WriteBuffer & buf);
|
||||
[[noreturn]] inline void writeQuoted(const Object &, WriteBuffer &) { throw Exception("Cannot write Object quoted.", ErrorCodes::NOT_IMPLEMENTED); }
|
||||
|
||||
__attribute__ ((noreturn)) inline void writeText(const AggregateFunctionStateData &, WriteBuffer &)
|
||||
{
|
||||
// This probably doesn't make any sense, but we have to have it for
|
||||
@ -977,8 +1009,6 @@ void readQuoted(DecimalField<T> & x, ReadBuffer & buf);
|
||||
|
||||
void writeFieldText(const Field & x, WriteBuffer & buf);
|
||||
|
||||
[[noreturn]] inline void writeQuoted(const Tuple &, WriteBuffer &) { throw Exception("Cannot write Tuple quoted.", ErrorCodes::NOT_IMPLEMENTED); }
|
||||
|
||||
String toString(const Field & x);
|
||||
|
||||
}
|
||||
|
@ -17,7 +17,8 @@ struct MultiEnum
|
||||
: MultiEnum((toBitFlag(v) | ... | 0u))
|
||||
{}
|
||||
|
||||
template <typename ValueType, typename = std::enable_if_t<std::is_convertible_v<ValueType, StorageType>>>
|
||||
template <typename ValueType>
|
||||
requires std::is_convertible_v<ValueType, StorageType>
|
||||
constexpr explicit MultiEnum(ValueType v)
|
||||
: bitset(v)
|
||||
{
|
||||
@ -53,7 +54,8 @@ struct MultiEnum
|
||||
return bitset;
|
||||
}
|
||||
|
||||
template <typename ValueType, typename = std::enable_if_t<std::is_convertible_v<ValueType, StorageType>>>
|
||||
template <typename ValueType>
|
||||
requires std::is_convertible_v<ValueType, StorageType>
|
||||
void setValue(ValueType new_value)
|
||||
{
|
||||
// Can't set value from any enum avoid confusion
|
||||
@ -66,7 +68,8 @@ struct MultiEnum
|
||||
return bitset == other.bitset;
|
||||
}
|
||||
|
||||
template <typename ValueType, typename = std::enable_if_t<std::is_convertible_v<ValueType, StorageType>>>
|
||||
template <typename ValueType>
|
||||
requires std::is_convertible_v<ValueType, StorageType>
|
||||
bool operator==(ValueType other) const
|
||||
{
|
||||
// Shouldn't be comparable with any enum to avoid confusion
|
||||
@ -80,13 +83,15 @@ struct MultiEnum
|
||||
return !(*this == other);
|
||||
}
|
||||
|
||||
template <typename ValueType, typename = std::enable_if_t<std::is_convertible_v<ValueType, StorageType>>>
|
||||
template <typename ValueType>
|
||||
requires std::is_convertible_v<ValueType, StorageType>
|
||||
friend bool operator==(ValueType left, MultiEnum right)
|
||||
{
|
||||
return right.operator==(left);
|
||||
}
|
||||
|
||||
template <typename L, typename = typename std::enable_if<!std::is_same_v<L, MultiEnum>>::type>
|
||||
template <typename L>
|
||||
requires (!std::is_same_v<L, MultiEnum>)
|
||||
friend bool operator!=(L left, MultiEnum right)
|
||||
{
|
||||
return !(right.operator==(left));
|
||||
|
@ -44,6 +44,7 @@ class IColumn;
|
||||
M(UInt64, min_insert_block_size_bytes_for_materialized_views, 0, "Like min_insert_block_size_bytes, but applied only during pushing to MATERIALIZED VIEW (default: min_insert_block_size_bytes)", 0) \
|
||||
M(UInt64, max_joined_block_size_rows, DEFAULT_BLOCK_SIZE, "Maximum block size for JOIN result (if join algorithm supports it). 0 means unlimited.", 0) \
|
||||
M(UInt64, max_insert_threads, 0, "The maximum number of threads to execute the INSERT SELECT query. Values 0 or 1 means that INSERT SELECT is not run in parallel. Higher values will lead to higher memory usage. Parallel INSERT SELECT has effect only if the SELECT part is run on parallel, see 'max_threads' setting.", 0) \
|
||||
M(UInt64, max_insert_delayed_streams_for_parallel_write, 0, "The maximum number of streams (columns) to delay final part flush. Default - auto (1000 in case of underlying storage supports parallel write, for example S3 and disabled otherwise)", 0) \
|
||||
M(UInt64, max_final_threads, 16, "The maximum number of threads to read from table with FINAL.", 0) \
|
||||
M(MaxThreads, max_threads, 0, "The maximum number of threads to execute the request. By default, it is determined automatically.", 0) \
|
||||
M(UInt64, max_read_buffer_size, DBMS_DEFAULT_BUFFER_SIZE, "The maximum size of the buffer to read from the filesystem.", 0) \
|
||||
@ -136,7 +137,7 @@ class IColumn;
|
||||
\
|
||||
M(Bool, skip_unavailable_shards, false, "If true, ClickHouse silently skips unavailable shards and nodes unresolvable through DNS. Shard is marked as unavailable when none of the replicas can be reached.", 0) \
|
||||
\
|
||||
M(UInt64, parallel_distributed_insert_select, 0, "Process distributed INSERT SELECT query in the same cluster on local tables on every shard, if 1 SELECT is executed on each shard, if 2 SELECT and INSERT is executed on each shard", 0) \
|
||||
M(UInt64, parallel_distributed_insert_select, 0, "Process distributed INSERT SELECT query in the same cluster on local tables on every shard; if set to 1 - SELECT is executed on each shard; if set to 2 - SELECT and INSERT are executed on each shard", 0) \
|
||||
M(UInt64, distributed_group_by_no_merge, 0, "If 1, Do not merge aggregation states from different servers for distributed queries (shards will process query up to the Complete stage, initiator just proxies the data from the shards). If 2 the initiator will apply ORDER BY and LIMIT stages (it is not in case when shard process query up to the Complete stage)", 0) \
|
||||
M(UInt64, distributed_push_down_limit, 1, "If 1, LIMIT will be applied on each shard separatelly. Usually you don't need to use it, since this will be done automatically if it is possible, i.e. for simple query SELECT FROM LIMIT.", 0) \
|
||||
M(Bool, optimize_distributed_group_by_sharding_key, true, "Optimize GROUP BY sharding_key queries (by avoiding costly aggregation on the initiator server).", 0) \
|
||||
@ -471,6 +472,7 @@ class IColumn;
|
||||
M(Bool, allow_experimental_geo_types, false, "Allow geo data types such as Point, Ring, Polygon, MultiPolygon", 0) \
|
||||
M(Bool, data_type_default_nullable, false, "Data types without NULL or NOT NULL will make Nullable", 0) \
|
||||
M(Bool, cast_keep_nullable, false, "CAST operator keep Nullable for result data type", 0) \
|
||||
M(Bool, cast_ipv4_ipv6_default_on_conversion_error, false, "CAST operator into IPv4, CAST operator into IPV6 type, toIPv4, toIPv6 functions will return default value instead of throwing exception on conversion error.", 0) \
|
||||
M(Bool, alter_partition_verbose_result, false, "Output information about affected parts. Currently works only for FREEZE and ATTACH commands.", 0) \
|
||||
M(Bool, allow_experimental_database_materialized_mysql, false, "Allow to create database with Engine=MaterializedMySQL(...).", 0) \
|
||||
M(Bool, allow_experimental_database_materialized_postgresql, false, "Allow to create database with Engine=MaterializedPostgreSQL(...).", 0) \
|
||||
@ -490,6 +492,7 @@ class IColumn;
|
||||
M(Bool, force_optimize_projection, false, "If projection optimization is enabled, SELECT queries need to use projection", 0) \
|
||||
M(Bool, async_socket_for_remote, true, "Asynchronously read from socket executing remote query", 0) \
|
||||
M(Bool, insert_null_as_default, true, "Insert DEFAULT values instead of NULL in INSERT SELECT (UNION ALL)", 0) \
|
||||
M(Bool, describe_extend_object_types, false, "Deduce concrete type of columns of type Object in DESCRIBE query", 0) \
|
||||
M(Bool, describe_include_subcolumns, false, "If true, subcolumns of all table columns will be included into result of DESCRIBE query", 0) \
|
||||
\
|
||||
M(Bool, optimize_rewrite_sum_if_to_count_if, true, "Rewrite sumIf() and sum(if()) function countIf() function when logically equivalent", 0) \
|
||||
@ -551,7 +554,7 @@ class IColumn;
|
||||
M(UInt64, remote_fs_read_max_backoff_ms, 10000, "Max wait time when trying to read data for remote disk", 0) \
|
||||
M(UInt64, remote_fs_read_backoff_max_tries, 5, "Max attempts to read with backoff", 0) \
|
||||
M(Bool, remote_fs_enable_cache, true, "Use cache for remote filesystem. This setting does not turn on/off cache for disks (must me done via disk config), but allows to bypass cache for some queries if intended", 0) \
|
||||
M(UInt64, remote_fs_cache_max_wait_sec, 5, "Allow to wait a most this number of seconds for download of current remote_fs_buffer_size bytes, and skip cache if exceeded", 0) \
|
||||
M(UInt64, remote_fs_cache_max_wait_sec, 5, "Allow to wait at most this number of seconds for download of current remote_fs_buffer_size bytes, and skip cache if exceeded", 0) \
|
||||
\
|
||||
M(UInt64, http_max_tries, 10, "Max attempts to read via http.", 0) \
|
||||
M(UInt64, http_retry_initial_backoff_ms, 100, "Min milliseconds for backoff, when retrying read via http", 0) \
|
||||
@ -566,6 +569,7 @@ class IColumn;
|
||||
/** Experimental functions */ \
|
||||
M(Bool, allow_experimental_funnel_functions, false, "Enable experimental functions for funnel analysis.", 0) \
|
||||
M(Bool, allow_experimental_nlp_functions, false, "Enable experimental functions for natural language processing.", 0) \
|
||||
M(Bool, allow_experimental_object_type, false, "Allow Object and JSON data types", 0) \
|
||||
M(String, insert_deduplication_token, "", "If not empty, used for duplicate detection instead of data digest", 0) \
|
||||
// End of COMMON_SETTINGS
|
||||
// Please add settings related to formats into the FORMAT_FACTORY_SETTINGS and move obsolete settings to OBSOLETE_SETTINGS.
|
||||
|
@ -87,6 +87,7 @@ enum class TypeIndex
|
||||
AggregateFunction,
|
||||
LowCardinality,
|
||||
Map,
|
||||
Object,
|
||||
};
|
||||
#if !defined(__clang__)
|
||||
#pragma GCC diagnostic pop
|
||||
|
@ -15,6 +15,8 @@
|
||||
#cmakedefine01 USE_NURAFT
|
||||
#cmakedefine01 USE_NLP
|
||||
#cmakedefine01 USE_KRB5
|
||||
#cmakedefine01 USE_SIMDJSON
|
||||
#cmakedefine01 USE_RAPIDJSON
|
||||
#cmakedefine01 USE_FILELOG
|
||||
#cmakedefine01 USE_ODBC
|
||||
#cmakedefine01 USE_REPLXX
|
||||
|
@ -7,7 +7,8 @@ namespace DB
|
||||
// Use template to disable implicit casting for certain overloaded types such as Field, which leads
|
||||
// to overload resolution ambiguity.
|
||||
class Field;
|
||||
template <typename T, typename U = std::enable_if_t<std::is_same_v<T, Field>>>
|
||||
template <typename T>
|
||||
requires std::is_same_v<T, Field>
|
||||
std::ostream & operator<<(std::ostream & stream, const T & what);
|
||||
|
||||
struct NameAndTypePair;
|
||||
|
@ -1,3 +1,5 @@
|
||||
add_subdirectory (Serializations)
|
||||
|
||||
if (ENABLE_EXAMPLES)
|
||||
add_subdirectory(examples)
|
||||
add_subdirectory (examples)
|
||||
endif ()
|
||||
|
@ -213,6 +213,7 @@ DataTypeFactory::DataTypeFactory()
|
||||
registerDataTypeDomainSimpleAggregateFunction(*this);
|
||||
registerDataTypeDomainGeo(*this);
|
||||
registerDataTypeMap(*this);
|
||||
registerDataTypeObject(*this);
|
||||
}
|
||||
|
||||
DataTypeFactory & DataTypeFactory::instance()
|
||||
|
@ -87,5 +87,6 @@ void registerDataTypeDomainIPv4AndIPv6(DataTypeFactory & factory);
|
||||
void registerDataTypeDomainBool(DataTypeFactory & factory);
|
||||
void registerDataTypeDomainSimpleAggregateFunction(DataTypeFactory & factory);
|
||||
void registerDataTypeDomainGeo(DataTypeFactory & factory);
|
||||
void registerDataTypeObject(DataTypeFactory & factory);
|
||||
|
||||
}
|
||||
|
82 src/DataTypes/DataTypeObject.cpp (new file)
@ -0,0 +1,82 @@
#include <DataTypes/DataTypeObject.h>
#include <DataTypes/DataTypeFactory.h>
#include <DataTypes/Serializations/SerializationObject.h>

#include <Parsers/IAST.h>
#include <Parsers/ASTLiteral.h>
#include <Parsers/ASTFunction.h>
#include <IO/Operators.h>

namespace DB
{

namespace ErrorCodes
{
    extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
    extern const int UNEXPECTED_AST_STRUCTURE;
}

DataTypeObject::DataTypeObject(const String & schema_format_, bool is_nullable_)
    : schema_format(Poco::toLower(schema_format_))
    , is_nullable(is_nullable_)
{
}

bool DataTypeObject::equals(const IDataType & rhs) const
{
    if (const auto * object = typeid_cast<const DataTypeObject *>(&rhs))
        return schema_format == object->schema_format && is_nullable == object->is_nullable;
    return false;
}

SerializationPtr DataTypeObject::doGetDefaultSerialization() const
{
    return getObjectSerialization(schema_format);
}

String DataTypeObject::doGetName() const
{
    WriteBufferFromOwnString out;
    if (is_nullable)
        out << "Object(Nullable(" << quote << schema_format << "))";
    else
        out << "Object(" << quote << schema_format << ")";
    return out.str();
}

static DataTypePtr create(const ASTPtr & arguments)
{
    if (!arguments || arguments->children.size() != 1)
        throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH,
            "Object data type family must have one argument - name of schema format");

    ASTPtr schema_argument = arguments->children[0];
    bool is_nullable = false;

    if (const auto * func = schema_argument->as<ASTFunction>())
    {
        if (func->name != "Nullable" || func->arguments->children.size() != 1)
            throw Exception(ErrorCodes::UNEXPECTED_AST_STRUCTURE,
                "Expected 'Nullable(<schema_name>)' as parameter for type Object", func->name);

        schema_argument = func->arguments->children[0];
        is_nullable = true;
    }

    const auto * literal = schema_argument->as<ASTLiteral>();
    if (!literal || literal->value.getType() != Field::Types::String)
        throw Exception(ErrorCodes::UNEXPECTED_AST_STRUCTURE,
            "Object data type family must have a const string as its schema name parameter");

    return std::make_shared<DataTypeObject>(literal->value.get<const String &>(), is_nullable);
}

void registerDataTypeObject(DataTypeFactory & factory)
{
    factory.registerDataType("Object", create);
    factory.registerSimpleDataType("JSON",
        [] { return std::make_shared<DataTypeObject>("JSON", false); },
        DataTypeFactory::CaseInsensitive);
}

}
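A hedged usage sketch, not part of the diff: since the constructor lower-cases the schema format and "JSON" is registered above as a case-insensitive simple type, both spellings below are expected to resolve to equal DataTypeObject instances.

auto json_type   = DB::DataTypeFactory::instance().get("JSON");
auto object_type = DB::DataTypeFactory::instance().get("Object('json')");
bool same = json_type->equals(*object_type);   /// expected to be true given equals() above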
45 src/DataTypes/DataTypeObject.h (new file)
@ -0,0 +1,45 @@
|
||||
#pragma once
|
||||
|
||||
#include <DataTypes/IDataType.h>
|
||||
#include <Core/Field.h>
|
||||
#include <Columns/ColumnObject.h>
|
||||
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
namespace ErrorCodes
|
||||
{
|
||||
extern const int NOT_IMPLEMENTED;
|
||||
}
|
||||
|
||||
class DataTypeObject : public IDataType
|
||||
{
|
||||
private:
|
||||
String schema_format;
|
||||
bool is_nullable;
|
||||
|
||||
public:
|
||||
DataTypeObject(const String & schema_format_, bool is_nullable_);
|
||||
|
||||
const char * getFamilyName() const override { return "Object"; }
|
||||
String doGetName() const override;
|
||||
TypeIndex getTypeId() const override { return TypeIndex::Object; }
|
||||
|
||||
MutableColumnPtr createColumn() const override { return ColumnObject::create(is_nullable); }
|
||||
|
||||
Field getDefault() const override
|
||||
{
|
||||
throw Exception("Method getDefault() is not implemented for data type " + getName(), ErrorCodes::NOT_IMPLEMENTED);
|
||||
}
|
||||
|
||||
bool haveSubtypes() const override { return false; }
|
||||
bool equals(const IDataType & rhs) const override;
|
||||
bool isParametric() const override { return true; }
|
||||
|
||||
SerializationPtr doGetDefaultSerialization() const override;
|
||||
|
||||
bool hasNullableSubcolumns() const { return is_nullable; }
|
||||
};
|
||||
|
||||
}
|
@ -1,6 +1,7 @@
|
||||
#include <DataTypes/FieldToDataType.h>
|
||||
#include <DataTypes/DataTypeTuple.h>
|
||||
#include <DataTypes/DataTypeMap.h>
|
||||
#include <DataTypes/DataTypeObject.h>
|
||||
#include <DataTypes/DataTypesNumber.h>
|
||||
#include <DataTypes/DataTypesDecimal.h>
|
||||
#include <DataTypes/DataTypeString.h>
|
||||
@ -108,12 +109,11 @@ DataTypePtr FieldToDataType::operator() (const Array & x) const
|
||||
element_types.reserve(x.size());
|
||||
|
||||
for (const Field & elem : x)
|
||||
element_types.emplace_back(applyVisitor(FieldToDataType(), elem));
|
||||
element_types.emplace_back(applyVisitor(FieldToDataType(allow_convertion_to_string), elem));
|
||||
|
||||
return std::make_shared<DataTypeArray>(getLeastSupertype(element_types));
|
||||
return std::make_shared<DataTypeArray>(getLeastSupertype(element_types, allow_convertion_to_string));
|
||||
}
|
||||
|
||||
|
||||
DataTypePtr FieldToDataType::operator() (const Tuple & tuple) const
|
||||
{
|
||||
if (tuple.empty())
|
||||
@ -123,7 +123,7 @@ DataTypePtr FieldToDataType::operator() (const Tuple & tuple) const
|
||||
element_types.reserve(tuple.size());
|
||||
|
||||
for (const auto & element : tuple)
|
||||
element_types.push_back(applyVisitor(FieldToDataType(), element));
|
||||
element_types.push_back(applyVisitor(FieldToDataType(allow_convertion_to_string), element));
|
||||
|
||||
return std::make_shared<DataTypeTuple>(element_types);
|
||||
}
|
||||
@ -139,11 +139,19 @@ DataTypePtr FieldToDataType::operator() (const Map & map) const
|
||||
{
|
||||
const auto & tuple = elem.safeGet<const Tuple &>();
|
||||
assert(tuple.size() == 2);
|
||||
key_types.push_back(applyVisitor(FieldToDataType(), tuple[0]));
|
||||
value_types.push_back(applyVisitor(FieldToDataType(), tuple[1]));
|
||||
key_types.push_back(applyVisitor(FieldToDataType(allow_convertion_to_string), tuple[0]));
|
||||
value_types.push_back(applyVisitor(FieldToDataType(allow_convertion_to_string), tuple[1]));
|
||||
}
|
||||
|
||||
return std::make_shared<DataTypeMap>(getLeastSupertype(key_types), getLeastSupertype(value_types));
|
||||
return std::make_shared<DataTypeMap>(
|
||||
getLeastSupertype(key_types, allow_convertion_to_string),
|
||||
getLeastSupertype(value_types, allow_convertion_to_string));
|
||||
}
|
||||
|
||||
DataTypePtr FieldToDataType::operator() (const Object &) const
|
||||
{
|
||||
/// TODO: Do we need different parameters for type Object?
|
||||
return std::make_shared<DataTypeObject>("json", false);
|
||||
}
|
||||
|
||||
DataTypePtr FieldToDataType::operator() (const AggregateFunctionStateData & x) const
|
||||
|
@ -20,26 +20,34 @@ using DataTypePtr = std::shared_ptr<const IDataType>;
|
||||
class FieldToDataType : public StaticVisitor<DataTypePtr>
|
||||
{
|
||||
public:
|
||||
FieldToDataType(bool allow_convertion_to_string_ = false)
|
||||
: allow_convertion_to_string(allow_convertion_to_string_)
|
||||
{
|
||||
}
|
||||
|
||||
DataTypePtr operator() (const Null & x) const;
|
||||
DataTypePtr operator() (const UInt64 & x) const;
|
||||
DataTypePtr operator() (const UInt128 & x) const;
|
||||
DataTypePtr operator() (const UInt256 & x) const;
|
||||
DataTypePtr operator() (const Int64 & x) const;
|
||||
DataTypePtr operator() (const Int128 & x) const;
|
||||
DataTypePtr operator() (const Int256 & x) const;
|
||||
DataTypePtr operator() (const UUID & x) const;
|
||||
DataTypePtr operator() (const Float64 & x) const;
|
||||
DataTypePtr operator() (const String & x) const;
|
||||
DataTypePtr operator() (const Array & x) const;
|
||||
DataTypePtr operator() (const Tuple & tuple) const;
|
||||
DataTypePtr operator() (const Map & map) const;
|
||||
DataTypePtr operator() (const Object & map) const;
|
||||
DataTypePtr operator() (const DecimalField<Decimal32> & x) const;
|
||||
DataTypePtr operator() (const DecimalField<Decimal64> & x) const;
|
||||
DataTypePtr operator() (const DecimalField<Decimal128> & x) const;
|
||||
DataTypePtr operator() (const DecimalField<Decimal256> & x) const;
|
||||
DataTypePtr operator() (const AggregateFunctionStateData & x) const;
|
||||
DataTypePtr operator() (const UInt256 & x) const;
|
||||
DataTypePtr operator() (const Int256 & x) const;
|
||||
DataTypePtr operator() (const bool & x) const;
|
||||
|
||||
private:
|
||||
bool allow_convertion_to_string;
|
||||
};
|
||||
|
||||
}
|
||||
|
||||
|
@ -126,19 +126,25 @@ DataTypePtr IDataType::tryGetSubcolumnType(const String & subcolumn_name) const
|
||||
DataTypePtr IDataType::getSubcolumnType(const String & subcolumn_name) const
|
||||
{
|
||||
SubstreamData data = { getDefaultSerialization(), getPtr(), nullptr, nullptr };
|
||||
return getForSubcolumn<DataTypePtr>(subcolumn_name, data, &SubstreamData::type);
|
||||
return getForSubcolumn<DataTypePtr>(subcolumn_name, data, &SubstreamData::type, true);
|
||||
}
|
||||
|
||||
SerializationPtr IDataType::getSubcolumnSerialization(const String & subcolumn_name, const SerializationPtr & serialization) const
|
||||
ColumnPtr IDataType::tryGetSubcolumn(const String & subcolumn_name, const ColumnPtr & column) const
|
||||
{
|
||||
SubstreamData data = { serialization, nullptr, nullptr, nullptr };
|
||||
return getForSubcolumn<SerializationPtr>(subcolumn_name, data, &SubstreamData::serialization);
|
||||
SubstreamData data = { getDefaultSerialization(), nullptr, column, nullptr };
|
||||
return getForSubcolumn<ColumnPtr>(subcolumn_name, data, &SubstreamData::column, false);
|
||||
}
|
||||
|
||||
ColumnPtr IDataType::getSubcolumn(const String & subcolumn_name, const ColumnPtr & column) const
|
||||
{
|
||||
SubstreamData data = { getDefaultSerialization(), nullptr, column, nullptr };
|
||||
return getForSubcolumn<ColumnPtr>(subcolumn_name, data, &SubstreamData::column);
|
||||
return getForSubcolumn<ColumnPtr>(subcolumn_name, data, &SubstreamData::column, true);
|
||||
}
|
||||
|
||||
SerializationPtr IDataType::getSubcolumnSerialization(const String & subcolumn_name, const SerializationPtr & serialization) const
|
||||
{
|
||||
SubstreamData data = { serialization, nullptr, nullptr, nullptr };
|
||||
return getForSubcolumn<SerializationPtr>(subcolumn_name, data, &SubstreamData::serialization, true);
|
||||
}
|
||||
|
||||
Names IDataType::getSubcolumnNames() const
|
||||
|
@ -82,9 +82,11 @@ public:
|
||||
DataTypePtr tryGetSubcolumnType(const String & subcolumn_name) const;
|
||||
DataTypePtr getSubcolumnType(const String & subcolumn_name) const;
|
||||
|
||||
SerializationPtr getSubcolumnSerialization(const String & subcolumn_name, const SerializationPtr & serialization) const;
|
||||
ColumnPtr tryGetSubcolumn(const String & subcolumn_name, const ColumnPtr & column) const;
|
||||
ColumnPtr getSubcolumn(const String & subcolumn_name, const ColumnPtr & column) const;
|
||||
|
||||
SerializationPtr getSubcolumnSerialization(const String & subcolumn_name, const SerializationPtr & serialization) const;
|
||||
|
||||
using SubstreamData = ISerialization::SubstreamData;
|
||||
using SubstreamPath = ISerialization::SubstreamPath;
|
||||
|
||||
@ -309,7 +311,7 @@ private:
|
||||
const String & subcolumn_name,
|
||||
const SubstreamData & data,
|
||||
Ptr SubstreamData::*member,
|
||||
bool throw_if_null = true) const;
|
||||
bool throw_if_null) const;
|
||||
};
|
||||
|
||||
|
||||
@ -373,11 +375,13 @@ struct WhichDataType
|
||||
constexpr bool isMap() const {return idx == TypeIndex::Map; }
|
||||
constexpr bool isSet() const { return idx == TypeIndex::Set; }
|
||||
constexpr bool isInterval() const { return idx == TypeIndex::Interval; }
|
||||
constexpr bool isObject() const { return idx == TypeIndex::Object; }
|
||||
|
||||
constexpr bool isNothing() const { return idx == TypeIndex::Nothing; }
|
||||
constexpr bool isNullable() const { return idx == TypeIndex::Nullable; }
|
||||
constexpr bool isFunction() const { return idx == TypeIndex::Function; }
|
||||
constexpr bool isAggregateFunction() const { return idx == TypeIndex::AggregateFunction; }
|
||||
constexpr bool isSimple() const { return isInt() || isUInt() || isFloat() || isString(); }
|
||||
|
||||
constexpr bool isLowCarnality() const { return idx == TypeIndex::LowCardinality; }
|
||||
};
|
||||
@ -399,10 +403,16 @@ inline bool isEnum(const DataTypePtr & data_type) { return WhichDataType(data_ty
|
||||
inline bool isDecimal(const DataTypePtr & data_type) { return WhichDataType(data_type).isDecimal(); }
|
||||
inline bool isTuple(const DataTypePtr & data_type) { return WhichDataType(data_type).isTuple(); }
|
||||
inline bool isArray(const DataTypePtr & data_type) { return WhichDataType(data_type).isArray(); }
|
||||
inline bool isMap(const DataTypePtr & data_type) { return WhichDataType(data_type).isMap(); }
|
||||
inline bool isMap(const DataTypePtr & data_type) {return WhichDataType(data_type).isMap(); }
|
||||
inline bool isNothing(const DataTypePtr & data_type) { return WhichDataType(data_type).isNothing(); }
|
||||
inline bool isUUID(const DataTypePtr & data_type) { return WhichDataType(data_type).isUUID(); }
|
||||
|
||||
template <typename T>
|
||||
inline bool isObject(const T & data_type)
|
||||
{
|
||||
return WhichDataType(data_type).isObject();
|
||||
}
|
||||
|
||||
template <typename T>
|
||||
inline bool isUInt8(const T & data_type)
|
||||
{
|
||||
|
@ -30,6 +30,12 @@ namespace Nested
|
||||
|
||||
std::string concatenateName(const std::string & nested_table_name, const std::string & nested_field_name)
|
||||
{
|
||||
if (nested_table_name.empty())
|
||||
return nested_field_name;
|
||||
|
||||
if (nested_field_name.empty())
|
||||
return nested_table_name;
|
||||
|
||||
return nested_table_name + "." + nested_field_name;
|
||||
}
|
||||
|
||||
|
703 src/DataTypes/ObjectUtils.cpp (new file)
@ -0,0 +1,703 @@
|
||||
#include <DataTypes/ObjectUtils.h>
|
||||
#include <DataTypes/DataTypeObject.h>
|
||||
#include <DataTypes/DataTypeNothing.h>
|
||||
#include <DataTypes/DataTypeArray.h>
|
||||
#include <DataTypes/DataTypeNullable.h>
|
||||
#include <DataTypes/DataTypesNumber.h>
|
||||
#include <DataTypes/DataTypeNested.h>
|
||||
#include <DataTypes/DataTypeFactory.h>
|
||||
#include <DataTypes/getLeastSupertype.h>
|
||||
#include <DataTypes/NestedUtils.h>
|
||||
#include <Columns/ColumnObject.h>
|
||||
#include <Columns/ColumnTuple.h>
|
||||
#include <Columns/ColumnArray.h>
|
||||
#include <Columns/ColumnNullable.h>
|
||||
#include <Parsers/ASTSelectQuery.h>
|
||||
#include <Parsers/ASTExpressionList.h>
|
||||
#include <Parsers/ASTLiteral.h>
|
||||
#include <Parsers/ASTFunction.h>
|
||||
#include <IO/Operators.h>
|
||||
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
namespace ErrorCodes
|
||||
{
|
||||
extern const int TYPE_MISMATCH;
|
||||
extern const int LOGICAL_ERROR;
|
||||
extern const int DUPLICATE_COLUMN;
|
||||
}
|
||||
|
||||
size_t getNumberOfDimensions(const IDataType & type)
|
||||
{
|
||||
if (const auto * type_array = typeid_cast<const DataTypeArray *>(&type))
|
||||
return type_array->getNumberOfDimensions();
|
||||
return 0;
|
||||
}
|
||||
|
||||
size_t getNumberOfDimensions(const IColumn & column)
|
||||
{
|
||||
if (const auto * column_array = checkAndGetColumn<ColumnArray>(column))
|
||||
return column_array->getNumberOfDimensions();
|
||||
return 0;
|
||||
}
|
||||
|
||||
DataTypePtr getBaseTypeOfArray(const DataTypePtr & type)
|
||||
{
|
||||
/// Get raw pointers to avoid extra copying of type pointers.
|
||||
const DataTypeArray * last_array = nullptr;
|
||||
const auto * current_type = type.get();
|
||||
while (const auto * type_array = typeid_cast<const DataTypeArray *>(current_type))
|
||||
{
|
||||
current_type = type_array->getNestedType().get();
|
||||
last_array = type_array;
|
||||
}
|
||||
|
||||
return last_array ? last_array->getNestedType() : type;
|
||||
}
|
||||
|
||||
ColumnPtr getBaseColumnOfArray(const ColumnPtr & column)
|
||||
{
|
||||
/// Get raw pointers to avoid extra copying of column pointers.
|
||||
const ColumnArray * last_array = nullptr;
|
||||
const auto * current_column = column.get();
|
||||
while (const auto * column_array = checkAndGetColumn<ColumnArray>(current_column))
|
||||
{
|
||||
current_column = &column_array->getData();
|
||||
last_array = column_array;
|
||||
}
|
||||
|
||||
return last_array ? last_array->getDataPtr() : column;
|
||||
}
|
||||
|
||||
DataTypePtr createArrayOfType(DataTypePtr type, size_t num_dimensions)
|
||||
{
|
||||
for (size_t i = 0; i < num_dimensions; ++i)
|
||||
type = std::make_shared<DataTypeArray>(std::move(type));
|
||||
return type;
|
||||
}
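The array helpers above are inverses of each other for plain array nesting; a small hedged sketch of the expected round-trip (standard DataTypes headers and the DB namespace assumed):

auto base   = std::make_shared<DB::DataTypeUInt64>();
auto nested = DB::createArrayOfType(base, 2);              /// Array(Array(UInt64))
/// getNumberOfDimensions should report the nesting depth that was just added,
/// and getBaseTypeOfArray should peel it back off to the original scalar type.
bool ok = DB::getNumberOfDimensions(*nested) == 2
    && DB::getBaseTypeOfArray(nested)->equals(*base);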
|
||||
|
||||
ColumnPtr createArrayOfColumn(ColumnPtr column, size_t num_dimensions)
|
||||
{
|
||||
for (size_t i = 0; i < num_dimensions; ++i)
|
||||
column = ColumnArray::create(column);
|
||||
return column;
|
||||
}
|
||||
|
||||
Array createEmptyArrayField(size_t num_dimensions)
|
||||
{
|
||||
if (num_dimensions == 0)
|
||||
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot create array field with 0 dimensions");
|
||||
|
||||
Array array;
|
||||
Array * current_array = &array;
|
||||
for (size_t i = 1; i < num_dimensions; ++i)
|
||||
{
|
||||
current_array->push_back(Array());
|
||||
current_array = ¤t_array->back().get<Array &>();
|
||||
}
|
||||
|
||||
return array;
|
||||
}
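Under the logic above, createEmptyArrayField(1) is just an empty Array, while larger dimension counts add outer levels so that only the innermost array is empty:

/// createEmptyArrayField(1)  ->  []
/// createEmptyArrayField(3)  ->  [[[]]]   (only the innermost level is empty)
/// createEmptyArrayField(0)  ->  throws LOGICAL_ERROR, as checked above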
|
||||
|
||||
DataTypePtr getDataTypeByColumn(const IColumn & column)
|
||||
{
|
||||
auto idx = column.getDataType();
|
||||
if (WhichDataType(idx).isSimple())
|
||||
return DataTypeFactory::instance().get(String(magic_enum::enum_name(idx)));
|
||||
|
||||
if (const auto * column_array = checkAndGetColumn<ColumnArray>(&column))
|
||||
return std::make_shared<DataTypeArray>(getDataTypeByColumn(column_array->getData()));
|
||||
|
||||
if (const auto * column_nullable = checkAndGetColumn<ColumnNullable>(&column))
|
||||
return makeNullable(getDataTypeByColumn(column_nullable->getNestedColumn()));
|
||||
|
||||
/// TODO: add more types.
|
||||
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot get data type of column {}", column.getFamilyName());
|
||||
}
|
||||
|
||||
template <size_t I, typename Tuple>
|
||||
static auto extractVector(const std::vector<Tuple> & vec)
|
||||
{
|
||||
static_assert(I < std::tuple_size_v<Tuple>);
|
||||
std::vector<std::tuple_element_t<I, Tuple>> res;
|
||||
res.reserve(vec.size());
|
||||
for (const auto & elem : vec)
|
||||
res.emplace_back(std::get<I>(elem));
|
||||
return res;
|
||||
}
|
||||
|
||||
void convertObjectsToTuples(NamesAndTypesList & columns_list, Block & block, const NamesAndTypesList & extended_storage_columns)
|
||||
{
|
||||
std::unordered_map<String, DataTypePtr> storage_columns_map;
|
||||
for (const auto & [name, type] : extended_storage_columns)
|
||||
storage_columns_map[name] = type;
|
||||
|
||||
for (auto & name_type : columns_list)
|
||||
{
|
||||
if (!isObject(name_type.type))
|
||||
continue;
|
||||
|
||||
auto & column = block.getByName(name_type.name);
|
||||
if (!isObject(column.type))
|
||||
throw Exception(ErrorCodes::TYPE_MISMATCH,
|
||||
"Type for column '{}' mismatch in columns list and in block. In list: {}, in block: {}",
|
||||
name_type.name, name_type.type->getName(), column.type->getName());
|
||||
|
||||
const auto & column_object = assert_cast<const ColumnObject &>(*column.column);
|
||||
const auto & subcolumns = column_object.getSubcolumns();
|
||||
|
||||
if (!column_object.isFinalized())
|
||||
throw Exception(ErrorCodes::LOGICAL_ERROR,
|
||||
"Cannot convert to tuple column '{}' from type {}. Column should be finalized first",
|
||||
name_type.name, name_type.type->getName());
|
||||
|
||||
PathsInData tuple_paths;
|
||||
DataTypes tuple_types;
|
||||
Columns tuple_columns;
|
||||
|
||||
for (const auto & entry : subcolumns)
|
||||
{
|
||||
tuple_paths.emplace_back(entry->path);
|
||||
tuple_types.emplace_back(entry->data.getLeastCommonType());
|
||||
tuple_columns.emplace_back(entry->data.getFinalizedColumnPtr());
|
||||
}
|
||||
|
||||
auto it = storage_columns_map.find(name_type.name);
|
||||
if (it == storage_columns_map.end())
|
||||
throw Exception(ErrorCodes::LOGICAL_ERROR, "Column '{}' not found in storage", name_type.name);
|
||||
|
||||
std::tie(column.column, column.type) = unflattenTuple(tuple_paths, tuple_types, tuple_columns);
|
||||
name_type.type = column.type;
|
||||
|
||||
/// Check that constructed Tuple type and type in storage are compatible.
|
||||
getLeastCommonTypeForObject({column.type, it->second}, true);
|
||||
}
|
||||
}
|
||||
|
||||
static bool isPrefix(const PathInData::Parts & prefix, const PathInData::Parts & parts)
|
||||
{
|
||||
if (prefix.size() > parts.size())
|
||||
return false;
|
||||
|
||||
for (size_t i = 0; i < prefix.size(); ++i)
|
||||
if (prefix[i].key != parts[i].key)
|
||||
return false;
|
||||
return true;
|
||||
}
|
||||
|
||||
void checkObjectHasNoAmbiguosPaths(const PathsInData & paths)
|
||||
{
|
||||
size_t size = paths.size();
|
||||
for (size_t i = 0; i < size; ++i)
|
||||
{
|
||||
for (size_t j = 0; j < i; ++j)
|
||||
{
|
||||
if (isPrefix(paths[i].getParts(), paths[j].getParts())
|
||||
|| isPrefix(paths[j].getParts(), paths[i].getParts()))
|
||||
throw Exception(ErrorCodes::DUPLICATE_COLUMN,
|
||||
"Data in Object has ambiguous paths: '{}' and '{}'",
|
||||
paths[i].getPath(), paths[j].getPath());
|
||||
}
|
||||
}
|
||||
}
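An example of what checkObjectHasNoAmbiguosPaths rejects: if one flattened path is a prefix of another, the same JSON key would have to be both a scalar and a nested object. The values below are illustrative, not taken from the diff:

/// Rows like {"k1": 1} and {"k1": {"k2": 2}} flatten to the paths
///     "k1"      (scalar)
///     "k1.k2"   (nested)
/// "k1" is a prefix of "k1.k2", so isPrefix() matches and the function throws
/// DUPLICATE_COLUMN: "Data in Object has ambiguous paths: 'k1' and 'k1.k2'".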
|
||||
|
||||
DataTypePtr getLeastCommonTypeForObject(const DataTypes & types, bool check_ambiguos_paths)
|
||||
{
|
||||
if (types.empty())
|
||||
return nullptr;
|
||||
|
||||
bool all_equal = true;
|
||||
for (size_t i = 1; i < types.size(); ++i)
|
||||
{
|
||||
if (!types[i]->equals(*types[0]))
|
||||
{
|
||||
all_equal = false;
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
if (all_equal)
|
||||
return types[0];
|
||||
|
||||
/// Types of subcolumns by path from all tuples.
|
||||
std::unordered_map<PathInData, DataTypes, PathInData::Hash> subcolumns_types;
|
||||
|
||||
/// First we flatten tuples, then get common type for paths
|
||||
/// and finally unflatten paths and create new tuple type.
|
||||
for (const auto & type : types)
|
||||
{
|
||||
const auto * type_tuple = typeid_cast<const DataTypeTuple *>(type.get());
|
||||
if (!type_tuple)
|
||||
throw Exception(ErrorCodes::LOGICAL_ERROR,
|
||||
"Least common type for object can be deduced only from tuples, but {} given", type->getName());
|
||||
|
||||
auto [tuple_paths, tuple_types] = flattenTuple(type);
|
||||
assert(tuple_paths.size() == tuple_types.size());
|
||||
|
||||
for (size_t i = 0; i < tuple_paths.size(); ++i)
|
||||
subcolumns_types[tuple_paths[i]].push_back(tuple_types[i]);
|
||||
}
|
||||
|
||||
PathsInData tuple_paths;
|
||||
DataTypes tuple_types;
|
||||
|
||||
/// Get the least common type for all paths.
|
||||
for (const auto & [key, subtypes] : subcolumns_types)
|
||||
{
|
||||
assert(!subtypes.empty());
|
||||
if (key.getPath() == ColumnObject::COLUMN_NAME_DUMMY)
|
||||
continue;
|
||||
|
||||
size_t first_dim = getNumberOfDimensions(*subtypes[0]);
|
||||
for (size_t i = 1; i < subtypes.size(); ++i)
|
||||
if (first_dim != getNumberOfDimensions(*subtypes[i]))
|
||||
throw Exception(ErrorCodes::TYPE_MISMATCH,
|
||||
"Uncompatible types of subcolumn '{}': {} and {}",
|
||||
key.getPath(), subtypes[0]->getName(), subtypes[i]->getName());
|
||||
|
||||
tuple_paths.emplace_back(key);
|
||||
tuple_types.emplace_back(getLeastSupertype(subtypes, /*allow_conversion_to_string=*/ true));
|
||||
}
|
||||
|
||||
if (tuple_paths.empty())
|
||||
{
|
||||
tuple_paths.emplace_back(ColumnObject::COLUMN_NAME_DUMMY);
|
||||
tuple_types.emplace_back(std::make_shared<DataTypeUInt8>());
|
||||
}
|
||||
|
||||
if (check_ambiguos_paths)
|
||||
checkObjectHasNoAmbiguosPaths(tuple_paths);
|
||||
|
||||
return unflattenTuple(tuple_paths, tuple_types);
|
||||
}
|
||||
|
||||
NameSet getNamesOfObjectColumns(const NamesAndTypesList & columns_list)
|
||||
{
|
||||
NameSet res;
|
||||
for (const auto & [name, type] : columns_list)
|
||||
if (isObject(type))
|
||||
res.insert(name);
|
||||
|
||||
return res;
|
||||
}
|
||||
|
||||
bool hasObjectColumns(const ColumnsDescription & columns)
|
||||
{
|
||||
return std::any_of(columns.begin(), columns.end(), [](const auto & column) { return isObject(column.type); });
|
||||
}
|
||||
|
||||
void extendObjectColumns(NamesAndTypesList & columns_list, const ColumnsDescription & object_columns, bool with_subcolumns)
|
||||
{
|
||||
NamesAndTypesList subcolumns_list;
|
||||
for (auto & column : columns_list)
|
||||
{
|
||||
auto object_column = object_columns.tryGetColumn(GetColumnsOptions::All, column.name);
|
||||
if (object_column)
|
||||
{
|
||||
column.type = object_column->type;
|
||||
|
||||
if (with_subcolumns)
|
||||
subcolumns_list.splice(subcolumns_list.end(), object_columns.getSubcolumns(column.name));
|
||||
}
|
||||
}
|
||||
|
||||
columns_list.splice(columns_list.end(), std::move(subcolumns_list));
|
||||
}
|
||||
|
||||
void updateObjectColumns(ColumnsDescription & object_columns, const NamesAndTypesList & new_columns)
|
||||
{
|
||||
for (const auto & new_column : new_columns)
|
||||
{
|
||||
auto object_column = object_columns.tryGetColumn(GetColumnsOptions::All, new_column.name);
|
||||
if (object_column && !object_column->type->equals(*new_column.type))
|
||||
{
|
||||
object_columns.modify(new_column.name, [&](auto & column)
|
||||
{
|
||||
column.type = getLeastCommonTypeForObject({object_column->type, new_column.type});
|
||||
});
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
namespace
|
||||
{
|
||||
|
||||
void flattenTupleImpl(
|
||||
PathInDataBuilder & builder,
|
||||
DataTypePtr type,
|
||||
std::vector<PathInData::Parts> & new_paths,
|
||||
DataTypes & new_types)
|
||||
{
|
||||
if (const auto * type_tuple = typeid_cast<const DataTypeTuple *>(type.get()))
|
||||
{
|
||||
const auto & tuple_names = type_tuple->getElementNames();
|
||||
const auto & tuple_types = type_tuple->getElements();
|
||||
|
||||
for (size_t i = 0; i < tuple_names.size(); ++i)
|
||||
{
|
||||
builder.append(tuple_names[i], false);
|
||||
flattenTupleImpl(builder, tuple_types[i], new_paths, new_types);
|
||||
builder.popBack();
|
||||
}
|
||||
}
|
||||
else if (const auto * type_array = typeid_cast<const DataTypeArray *>(type.get()))
|
||||
{
|
||||
PathInDataBuilder element_builder;
|
||||
std::vector<PathInData::Parts> element_paths;
|
||||
DataTypes element_types;
|
||||
|
||||
flattenTupleImpl(element_builder, type_array->getNestedType(), element_paths, element_types);
|
||||
assert(element_paths.size() == element_types.size());
|
||||
|
||||
for (size_t i = 0; i < element_paths.size(); ++i)
|
||||
{
|
||||
builder.append(element_paths[i], true);
|
||||
new_paths.emplace_back(builder.getParts());
|
||||
new_types.emplace_back(std::make_shared<DataTypeArray>(element_types[i]));
|
||||
builder.popBack(element_paths[i].size());
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
new_paths.emplace_back(builder.getParts());
|
||||
new_types.emplace_back(type);
|
||||
}
|
||||
}
|
||||
|
||||
/// @offsets_columns are used as stack of array offsets and allows to recreate Array columns.
|
||||
void flattenTupleImpl(const ColumnPtr & column, Columns & new_columns, Columns & offsets_columns)
|
||||
{
|
||||
if (const auto * column_tuple = checkAndGetColumn<ColumnTuple>(column.get()))
|
||||
{
|
||||
const auto & subcolumns = column_tuple->getColumns();
|
||||
for (const auto & subcolumn : subcolumns)
|
||||
flattenTupleImpl(subcolumn, new_columns, offsets_columns);
|
||||
}
|
||||
else if (const auto * column_array = checkAndGetColumn<ColumnArray>(column.get()))
|
||||
{
|
||||
offsets_columns.push_back(column_array->getOffsetsPtr());
|
||||
flattenTupleImpl(column_array->getDataPtr(), new_columns, offsets_columns);
|
||||
offsets_columns.pop_back();
|
||||
}
|
||||
else
|
||||
{
|
||||
if (!offsets_columns.empty())
|
||||
{
|
||||
auto new_column = ColumnArray::create(column, offsets_columns.back());
|
||||
for (auto it = offsets_columns.rbegin() + 1; it != offsets_columns.rend(); ++it)
|
||||
new_column = ColumnArray::create(new_column, *it);
|
||||
|
||||
new_columns.push_back(std::move(new_column));
|
||||
}
|
||||
else
|
||||
{
|
||||
new_columns.push_back(column);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
DataTypePtr reduceNumberOfDimensions(DataTypePtr type, size_t dimensions_to_reduce)
|
||||
{
|
||||
while (dimensions_to_reduce--)
|
||||
{
|
||||
const auto * type_array = typeid_cast<const DataTypeArray *>(type.get());
|
||||
if (!type_array)
|
||||
throw Exception(ErrorCodes::LOGICAL_ERROR, "Not enough dimensions to reduce");
|
||||
|
||||
type = type_array->getNestedType();
|
||||
}
|
||||
|
||||
return type;
|
||||
}
|
||||
|
||||
ColumnPtr reduceNumberOfDimensions(ColumnPtr column, size_t dimensions_to_reduce)
|
||||
{
|
||||
while (dimensions_to_reduce--)
|
||||
{
|
||||
const auto * column_array = typeid_cast<const ColumnArray *>(column.get());
|
||||
if (!column_array)
|
||||
throw Exception(ErrorCodes::LOGICAL_ERROR, "Not enough dimensions to reduce");
|
||||
|
||||
column = column_array->getDataPtr();
|
||||
}
|
||||
|
||||
return column;
|
||||
}
|
||||
|
||||
/// We save intermediate column, type and number of array
|
||||
/// dimensions for each intermediate node of a path in the subcolumns tree.
|
||||
struct ColumnWithTypeAndDimensions
|
||||
{
|
||||
ColumnPtr column;
|
||||
DataTypePtr type;
|
||||
size_t array_dimensions;
|
||||
};
|
||||
|
||||
using SubcolumnsTreeWithColumns = SubcolumnsTree<ColumnWithTypeAndDimensions>;
|
||||
using Node = SubcolumnsTreeWithColumns::Node;
|
||||
|
||||
/// Creates data type and column from tree of subcolumns.
|
||||
ColumnWithTypeAndDimensions createTypeFromNode(const Node * node)
|
||||
{
|
||||
auto collect_tuple_elements = [](const auto & children)
|
||||
{
|
||||
std::vector<std::tuple<String, ColumnWithTypeAndDimensions>> tuple_elements;
|
||||
tuple_elements.reserve(children.size());
|
||||
for (const auto & [name, child] : children)
|
||||
{
|
||||
auto column = createTypeFromNode(child.get());
|
||||
tuple_elements.emplace_back(name, std::move(column));
|
||||
}
|
||||
|
||||
/// Sort to always create the same type for the same set of subcolumns.
|
||||
std::sort(tuple_elements.begin(), tuple_elements.end(),
|
||||
[](const auto & lhs, const auto & rhs) { return std::get<0>(lhs) < std::get<0>(rhs); });
|
||||
|
||||
auto tuple_names = extractVector<0>(tuple_elements);
|
||||
auto tuple_columns = extractVector<1>(tuple_elements);
|
||||
|
||||
return std::make_tuple(std::move(tuple_names), std::move(tuple_columns));
|
||||
};
|
||||
|
||||
if (node->kind == Node::SCALAR)
|
||||
{
|
||||
return node->data;
|
||||
}
|
||||
else if (node->kind == Node::NESTED)
|
||||
{
|
||||
auto [tuple_names, tuple_columns] = collect_tuple_elements(node->children);
|
||||
|
||||
Columns offsets_columns;
|
||||
offsets_columns.reserve(tuple_columns[0].array_dimensions + 1);
|
||||
|
||||
/// If we have a Nested node and a child node with anonymous array levels,
|
||||
/// we need to push a Nested type through all array levels.
|
||||
/// Example: { "k1": [[{"k2": 1, "k3": 2}]] } should be parsed as
|
||||
/// `k1 Array(Nested(k2 Int, k3 Int))`, where `k1` is marked as Nested
|
||||
/// and `k2` and `k3` have anonymous_array_level = 1 in that case.
|
||||
|
||||
const auto & current_array = assert_cast<const ColumnArray &>(*node->data.column);
|
||||
offsets_columns.push_back(current_array.getOffsetsPtr());
|
||||
|
||||
auto first_column = tuple_columns[0].column;
|
||||
for (size_t i = 0; i < tuple_columns[0].array_dimensions; ++i)
|
||||
{
|
||||
const auto & column_array = assert_cast<const ColumnArray &>(*first_column);
|
||||
offsets_columns.push_back(column_array.getOffsetsPtr());
|
||||
first_column = column_array.getDataPtr();
|
||||
}
|
||||
|
||||
size_t num_elements = tuple_columns.size();
|
||||
Columns tuple_elements_columns(num_elements);
|
||||
DataTypes tuple_elements_types(num_elements);
|
||||
|
||||
/// Reduce extra array dimensions to get columns and types of Nested elements.
|
||||
for (size_t i = 0; i < num_elements; ++i)
|
||||
{
|
||||
assert(tuple_columns[i].array_dimensions == tuple_columns[0].array_dimensions);
|
||||
tuple_elements_columns[i] = reduceNumberOfDimensions(tuple_columns[i].column, tuple_columns[i].array_dimensions);
|
||||
tuple_elements_types[i] = reduceNumberOfDimensions(tuple_columns[i].type, tuple_columns[i].array_dimensions);
|
||||
}
|
||||
|
||||
auto result_column = ColumnArray::create(ColumnTuple::create(tuple_elements_columns), offsets_columns.back());
|
||||
auto result_type = createNested(tuple_elements_types, tuple_names);
|
||||
|
||||
/// Recreate result Array type and Array column.
|
||||
for (auto it = offsets_columns.rbegin() + 1; it != offsets_columns.rend(); ++it)
|
||||
{
|
||||
result_column = ColumnArray::create(result_column, *it);
|
||||
result_type = std::make_shared<DataTypeArray>(result_type);
|
||||
}
|
||||
|
||||
return {result_column, result_type, tuple_columns[0].array_dimensions};
|
||||
}
|
||||
else
|
||||
{
|
||||
auto [tuple_names, tuple_columns] = collect_tuple_elements(node->children);
|
||||
|
||||
size_t num_elements = tuple_columns.size();
|
||||
Columns tuple_elements_columns(num_elements);
|
||||
DataTypes tuple_elements_types(num_elements);
|
||||
|
||||
for (size_t i = 0; i < tuple_columns.size(); ++i)
|
||||
{
|
||||
assert(tuple_columns[i].array_dimensions == tuple_columns[0].array_dimensions);
|
||||
tuple_elements_columns[i] = tuple_columns[i].column;
|
||||
tuple_elements_types[i] = tuple_columns[i].type;
|
||||
}
|
||||
|
||||
auto result_column = ColumnTuple::create(tuple_elements_columns);
|
||||
auto result_type = std::make_shared<DataTypeTuple>(tuple_elements_types, tuple_names);
|
||||
|
||||
return {result_column, result_type, tuple_columns[0].array_dimensions};
|
||||
}
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
std::pair<PathsInData, DataTypes> flattenTuple(const DataTypePtr & type)
|
||||
{
|
||||
std::vector<PathInData::Parts> new_path_parts;
|
||||
DataTypes new_types;
|
||||
PathInDataBuilder builder;
|
||||
|
||||
flattenTupleImpl(builder, type, new_path_parts, new_types);
|
||||
|
||||
PathsInData new_paths(new_path_parts.begin(), new_path_parts.end());
|
||||
return {new_paths, new_types};
|
||||
}
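For readers unfamiliar with the recursion above, here is a minimal standalone sketch of the same flattening idea on a toy type representation. ToyType and collectPaths are hypothetical names used only for illustration; the real code works on DataTypeTuple, DataTypeArray and PathInDataBuilder.

// Standalone sketch (not ClickHouse code): flatten a nested named tuple into dotted paths.
#include <iostream>
#include <string>
#include <vector>

struct ToyType
{
    std::string scalar;                                      // non-empty for leaf types, e.g. "UInt32"
    std::vector<std::pair<std::string, ToyType>> elements;   // children for tuple types
};

void collectPaths(const ToyType & type, const std::string & prefix,
                  std::vector<std::pair<std::string, std::string>> & out)
{
    if (type.elements.empty())
    {
        out.emplace_back(prefix, type.scalar);               // leaf: emit the accumulated path
        return;
    }
    for (const auto & [name, child] : type.elements)
        collectPaths(child, prefix.empty() ? name : prefix + "." + name, out);
}

int main()
{
    // Tuple(t Tuple(c1 UInt32, c2 String), c3 UInt64)
    ToyType u32{"UInt32", {}};
    ToyType str{"String", {}};
    ToyType u64{"UInt64", {}};
    ToyType inner{"", {{"c1", u32}, {"c2", str}}};
    ToyType type{"", {{"t", inner}, {"c3", u64}}};

    std::vector<std::pair<std::string, std::string>> paths;
    collectPaths(type, "", paths);
    for (const auto & [path, scalar] : paths)
        std::cout << path << " : " << scalar << '\n';        // t.c1, t.c2, c3
}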
|
||||
|
||||
ColumnPtr flattenTuple(const ColumnPtr & column)
|
||||
{
|
||||
Columns new_columns;
|
||||
Columns offsets_columns;
|
||||
|
||||
flattenTupleImpl(column, new_columns, offsets_columns);
|
||||
return ColumnTuple::create(new_columns);
|
||||
}
|
||||
|
||||
DataTypePtr unflattenTuple(const PathsInData & paths, const DataTypes & tuple_types)
|
||||
{
|
||||
assert(paths.size() == tuple_types.size());
|
||||
Columns tuple_columns;
|
||||
tuple_columns.reserve(tuple_types.size());
|
||||
for (const auto & type : tuple_types)
|
||||
tuple_columns.emplace_back(type->createColumn());
|
||||
|
||||
return unflattenTuple(paths, tuple_types, tuple_columns).second;
|
||||
}
|
||||
|
||||
std::pair<ColumnPtr, DataTypePtr> unflattenTuple(
|
||||
const PathsInData & paths,
|
||||
const DataTypes & tuple_types,
|
||||
const Columns & tuple_columns)
|
||||
{
|
||||
assert(paths.size() == tuple_types.size());
|
||||
assert(paths.size() == tuple_columns.size());
|
||||
|
||||
/// We add all paths to the subcolumn tree and then create a type from it.
|
||||
/// The tree stores column, type and number of array dimensions
|
||||
/// for each intermediate node.
|
||||
SubcolumnsTreeWithColumns tree;
|
||||
|
||||
for (size_t i = 0; i < paths.size(); ++i)
|
||||
{
|
||||
auto column = tuple_columns[i];
|
||||
auto type = tuple_types[i];
|
||||
|
||||
const auto & parts = paths[i].getParts();
|
||||
size_t num_parts = parts.size();
|
||||
|
||||
size_t pos = 0;
|
||||
tree.add(paths[i], [&](Node::Kind kind, bool exists) -> std::shared_ptr<Node>
|
||||
{
|
||||
if (pos >= num_parts)
|
||||
throw Exception(ErrorCodes::LOGICAL_ERROR,
|
||||
"Not enough name parts for path {}. Expected at least {}, got {}",
|
||||
paths[i].getPath(), pos + 1, num_parts);
|
||||
|
||||
size_t array_dimensions = kind == Node::NESTED ? 1 : parts[pos].anonymous_array_level;
|
||||
ColumnWithTypeAndDimensions current_column{column, type, array_dimensions};
|
||||
|
||||
/// Get type and column for next node.
|
||||
if (array_dimensions)
|
||||
{
|
||||
type = reduceNumberOfDimensions(type, array_dimensions);
|
||||
column = reduceNumberOfDimensions(column, array_dimensions);
|
||||
}
|
||||
|
||||
++pos;
|
||||
if (exists)
|
||||
return nullptr;
|
||||
|
||||
return kind == Node::SCALAR
|
||||
? std::make_shared<Node>(kind, current_column, paths[i])
|
||||
: std::make_shared<Node>(kind, current_column);
|
||||
});
|
||||
}
|
||||
|
||||
auto [column, type, _] = createTypeFromNode(tree.getRoot());
|
||||
return std::make_pair(std::move(column), std::move(type));
|
||||
}
|
||||
|
||||
static void addConstantToWithClause(const ASTPtr & query, const String & column_name, const DataTypePtr & data_type)
|
||||
{
|
||||
auto & select = query->as<ASTSelectQuery &>();
|
||||
if (!select.with())
|
||||
select.setExpression(ASTSelectQuery::Expression::WITH, std::make_shared<ASTExpressionList>());
|
||||
|
||||
/// TODO: avoid materialize
|
||||
auto node = makeASTFunction("materialize",
|
||||
makeASTFunction("CAST",
|
||||
std::make_shared<ASTLiteral>(data_type->getDefault()),
|
||||
std::make_shared<ASTLiteral>(data_type->getName())));
|
||||
|
||||
node->alias = column_name;
|
||||
node->prefer_alias_to_column_name = true;
|
||||
select.with()->children.push_back(std::move(node));
|
||||
}
|
||||
|
||||
/// @expected_columns and @available_columns contain descriptions
|
||||
/// of extended Object columns.
|
||||
void replaceMissedSubcolumnsByConstants(
|
||||
const ColumnsDescription & expected_columns,
|
||||
const ColumnsDescription & available_columns,
|
||||
ASTPtr query)
|
||||
{
|
||||
NamesAndTypes missed_names_types;
|
||||
|
||||
/// Find all subcolumns that are in @expected_columns, but not in @available_columns.
|
||||
for (const auto & column : available_columns)
|
||||
{
|
||||
auto expected_column = expected_columns.getColumn(GetColumnsOptions::All, column.name);
|
||||
|
||||
/// Extract all paths from both descriptions to easily check existence of subcolumns.
|
||||
auto [available_paths, available_types] = flattenTuple(column.type);
|
||||
auto [expected_paths, expected_types] = flattenTuple(expected_column.type);
|
||||
|
||||
auto extract_names_and_types = [&column](const auto & paths, const auto & types)
|
||||
{
|
||||
NamesAndTypes res;
|
||||
res.reserve(paths.size());
|
||||
for (size_t i = 0; i < paths.size(); ++i)
|
||||
{
|
||||
auto full_name = Nested::concatenateName(column.name, paths[i].getPath());
|
||||
res.emplace_back(full_name, types[i]);
|
||||
}
|
||||
|
||||
std::sort(res.begin(), res.end());
|
||||
return res;
|
||||
};
|
||||
|
||||
auto available_names_types = extract_names_and_types(available_paths, available_types);
|
||||
auto expected_names_types = extract_names_and_types(expected_paths, expected_types);
|
||||
|
||||
std::set_difference(
|
||||
expected_names_types.begin(), expected_names_types.end(),
|
||||
available_names_types.begin(), available_names_types.end(),
|
||||
std::back_inserter(missed_names_types),
|
||||
[](const auto & lhs, const auto & rhs) { return lhs.name < rhs.name; });
|
||||
}
|
||||
|
||||
if (missed_names_types.empty())
|
||||
return;
|
||||
|
||||
IdentifierNameSet identifiers;
|
||||
query->collectIdentifierNames(identifiers);
|
||||
|
||||
/// Replace missed subcolumns with default literals of their types.
|
||||
for (const auto & [name, type] : missed_names_types)
|
||||
if (identifiers.count(name))
|
||||
addConstantToWithClause(query, name, type);
|
||||
}
|
||||
|
||||
void finalizeObjectColumns(MutableColumns & columns)
|
||||
{
|
||||
for (auto & column : columns)
|
||||
if (auto * column_object = typeid_cast<ColumnObject *>(column.get()))
|
||||
column_object->finalize();
|
||||
}
|
||||
|
||||
}
|
140
src/DataTypes/ObjectUtils.h
Normal file
@ -0,0 +1,140 @@
|
||||
#pragma once
|
||||
|
||||
#include <Core/Block.h>
|
||||
#include <Core/NamesAndTypes.h>
|
||||
#include <Common/FieldVisitors.h>
|
||||
#include <Storages/ColumnsDescription.h>
|
||||
#include <DataTypes/DataTypeTuple.h>
|
||||
#include <DataTypes/Serializations/JSONDataParser.h>
|
||||
#include <DataTypes/DataTypesNumber.h>
|
||||
#include <Columns/ColumnObject.h>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
/// Returns number of dimensions in Array type. 0 if type is not array.
|
||||
size_t getNumberOfDimensions(const IDataType & type);
|
||||
|
||||
/// Returns number of dimensions in Array column. 0 if column is not array.
|
||||
size_t getNumberOfDimensions(const IColumn & column);
|
||||
|
||||
/// Returns type of scalars of Array of arbitrary dimensions.
|
||||
DataTypePtr getBaseTypeOfArray(const DataTypePtr & type);
|
||||
|
||||
/// Returns Array type with requested scalar type and number of dimensions.
|
||||
DataTypePtr createArrayOfType(DataTypePtr type, size_t num_dimensions);
|
||||
|
||||
/// Returns column of scalars of Array of arbitrary dimensions.
|
||||
ColumnPtr getBaseColumnOfArray(const ColumnPtr & column);
|
||||
|
||||
/// Returns empty Array column with requested scalar column and number of dimensions.
|
||||
ColumnPtr createArrayOfColumn(const ColumnPtr & column, size_t num_dimensions);
|
||||
|
||||
/// Returns Array with requested number of dimensions and no scalars.
|
||||
Array createEmptyArrayField(size_t num_dimensions);
|
||||
|
||||
/// Tries to get the data type by column. Only a limited subset of types is supported.
|
||||
DataTypePtr getDataTypeByColumn(const IColumn & column);
|
||||
|
||||
/// Converts Object types and columns to Tuples in @columns_list and @block
|
||||
/// and checks that types are consistent with types in @extended_storage_columns.
|
||||
void convertObjectsToTuples(NamesAndTypesList & columns_list, Block & block, const NamesAndTypesList & extended_storage_columns);
|
||||
|
||||
/// Checks that each path is not a prefix of any other path.
|
||||
void checkObjectHasNoAmbiguosPaths(const PathsInData & paths);
|
||||
|
||||
/// Receives several Tuple types and deduces the least common type among them.
|
||||
DataTypePtr getLeastCommonTypeForObject(const DataTypes & types, bool check_ambiguos_paths = false);
|
||||
|
||||
/// Converts types of object columns to tuples in @columns_list
|
||||
/// according to @object_columns and adds all tuple's subcolumns if needed.
|
||||
void extendObjectColumns(NamesAndTypesList & columns_list, const ColumnsDescription & object_columns, bool with_subcolumns);
|
||||
|
||||
NameSet getNamesOfObjectColumns(const NamesAndTypesList & columns_list);
|
||||
bool hasObjectColumns(const ColumnsDescription & columns);
|
||||
void finalizeObjectColumns(MutableColumns & columns);
|
||||
|
||||
/// Updates types of objects in @object_columns in place
|
||||
/// according to the types in @new_columns.
|
||||
void updateObjectColumns(ColumnsDescription & object_columns, const NamesAndTypesList & new_columns);
|
||||
|
||||
using DataTypeTuplePtr = std::shared_ptr<DataTypeTuple>;
|
||||
|
||||
/// Flattens a nested Tuple to a plain Tuple, i.e. extracts all paths and types from the tuple.
|
||||
/// E.g. Tuple(t Tuple(c1 UInt32, c2 String), c3 UInt64) -> Tuple(t.c1 UInt32, t.c2 String, c3 UInt64)
|
||||
std::pair<PathsInData, DataTypes> flattenTuple(const DataTypePtr & type);
|
||||
|
||||
/// Flattens nested Tuple column to plain Tuple column.
|
||||
ColumnPtr flattenTuple(const ColumnPtr & column);
|
||||
|
||||
/// The reverse operation to 'flattenTuple'.
|
||||
/// Creates nested Tuple from all paths and types.
|
||||
/// E.g. Tuple(t.c1 UInt32, t.c2 String, c3 UInt64) -> Tuple(t Tuple(c1 UInt32, c2 String), c3 UInt64)
|
||||
DataTypePtr unflattenTuple(
|
||||
const PathsInData & paths,
|
||||
const DataTypes & tuple_types);
|
||||
|
||||
std::pair<ColumnPtr, DataTypePtr> unflattenTuple(
|
||||
const PathsInData & paths,
|
||||
const DataTypes & tuple_types,
|
||||
const Columns & tuple_columns);
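As a rough illustration of the reverse direction, the following standalone sketch rebuilds a nested tree from dotted paths. ToyNode and insertPath are hypothetical names used only for illustration; the real unflattenTuple additionally tracks columns, Nested parts and array dimensions.

// Standalone sketch (not ClickHouse code): rebuild a nested tree from dotted paths.
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

struct ToyNode
{
    std::string name;
    std::string scalar;                 // set for leaf nodes, e.g. "UInt32"
    std::vector<ToyNode> children;      // nested tuple elements
};

ToyNode & getOrAddChild(ToyNode & node, const std::string & name)
{
    for (auto & child : node.children)
        if (child.name == name)
            return child;
    node.children.push_back(ToyNode{name, "", {}});
    return node.children.back();
}

void insertPath(ToyNode & root, const std::string & path, const std::string & scalar)
{
    ToyNode * node = &root;
    std::istringstream stream(path);
    std::string part;
    while (std::getline(stream, part, '.'))      // walk or create intermediate nodes
        node = &getOrAddChild(*node, part);
    node->scalar = scalar;
}

void print(const ToyNode & node, int indent)
{
    for (const auto & child : node.children)
    {
        std::cout << std::string(indent, ' ') << child.name
                  << (child.children.empty() ? " : " + child.scalar : "") << '\n';
        print(child, indent + 2);
    }
}

int main()
{
    ToyNode root{"", "", {}};
    insertPath(root, "t.c1", "UInt32");
    insertPath(root, "t.c2", "String");
    insertPath(root, "c3", "UInt64");
    print(root, 0);   // reconstructs t { c1, c2 } and c3, like Tuple(t Tuple(c1, c2), c3)
}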
|
||||
|
||||
/// For all columns that exist in @expected_columns and
|
||||
/// don't exist in @available_columns, adds to the WITH clause
|
||||
/// an alias that maps the column name to a literal of the default value of the column type.
|
||||
void replaceMissedSubcolumnsByConstants(
|
||||
const ColumnsDescription & expected_columns,
|
||||
const ColumnsDescription & available_columns,
|
||||
ASTPtr query);
|
||||
|
||||
/// Receives a range of objects, each of which contains a collection
|
||||
/// of column-like objects (e.g. ColumnsDescription or NamesAndTypesList)
|
||||
/// and deduces the common types of Object columns across all entries.
|
||||
/// @entry_columns_getter should extract a reference to the collection of
|
||||
/// column-like objects from the entry to which the Iterator points.
|
||||
/// A column-like object should have the fields "name" and "type".
|
||||
template <typename Iterator, typename EntryColumnsGetter>
|
||||
ColumnsDescription getObjectColumns(
|
||||
Iterator begin, Iterator end,
|
||||
const ColumnsDescription & storage_columns,
|
||||
EntryColumnsGetter && entry_columns_getter)
|
||||
{
|
||||
ColumnsDescription res;
|
||||
|
||||
if (begin == end)
|
||||
{
|
||||
for (const auto & column : storage_columns)
|
||||
{
|
||||
if (isObject(column.type))
|
||||
{
|
||||
auto tuple_type = std::make_shared<DataTypeTuple>(
|
||||
DataTypes{std::make_shared<DataTypeUInt8>()},
|
||||
Names{ColumnObject::COLUMN_NAME_DUMMY});
|
||||
|
||||
res.add({column.name, std::move(tuple_type)});
|
||||
}
|
||||
}
|
||||
|
||||
return res;
|
||||
}
|
||||
|
||||
std::unordered_map<String, DataTypes> types_in_entries;
|
||||
|
||||
for (auto it = begin; it != end; ++it)
|
||||
{
|
||||
const auto & entry_columns = entry_columns_getter(*it);
|
||||
for (const auto & column : entry_columns)
|
||||
{
|
||||
auto storage_column = storage_columns.tryGetPhysical(column.name);
|
||||
if (storage_column && isObject(storage_column->type))
|
||||
types_in_entries[column.name].push_back(column.type);
|
||||
}
|
||||
}
|
||||
|
||||
for (const auto & [name, types] : types_in_entries)
|
||||
res.add({name, getLeastCommonTypeForObject(types)});
|
||||
|
||||
return res;
|
||||
}
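A hypothetical usage sketch (not part of the original patch), assuming each entry is itself a NamesAndTypesList so the getter can simply return the entry unchanged:

// Hypothetical helper for illustration only; relies on the template above
// and on the ClickHouse types already included by this header.
ColumnsDescription deduceObjectColumns(
    const std::vector<NamesAndTypesList> & parts_columns,
    const ColumnsDescription & storage_columns)
{
    return getObjectColumns(
        parts_columns.begin(), parts_columns.end(),
        storage_columns,
        [](const auto & entry) -> const auto & { return entry; });
}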
|
||||
|
||||
}
|
3
src/DataTypes/Serializations/CMakeLists.txt
Normal file
@ -0,0 +1,3 @@
|
||||
if (ENABLE_TESTS)
|
||||
add_subdirectory (tests)
|
||||
endif ()
|
@ -172,6 +172,10 @@ String getNameForSubstreamPath(
|
||||
else
|
||||
stream_name += "." + it->tuple_element_name;
|
||||
}
|
||||
else if (it->type == Substream::ObjectElement)
|
||||
{
|
||||
stream_name += escapeForFileName(".") + escapeForFileName(it->object_key_name);
|
||||
}
|
||||
}
|
||||
|
||||
return stream_name;
|
||||
|
@ -125,6 +125,9 @@ public:
|
||||
SparseElements,
|
||||
SparseOffsets,
|
||||
|
||||
ObjectStructure,
|
||||
ObjectElement,
|
||||
|
||||
Regular,
|
||||
};
|
||||
|
||||
@ -133,6 +136,9 @@ public:
|
||||
/// Index of tuple element, starting at 1 or name.
|
||||
String tuple_element_name;
|
||||
|
||||
/// Name of subcolumn of object column.
|
||||
String object_key_name;
|
||||
|
||||
/// Do we need to escape a dot in filenames for tuple elements.
|
||||
bool escape_tuple_delimiter = true;
|
||||
|
||||
|
183
src/DataTypes/Serializations/JSONDataParser.h
Normal file
@ -0,0 +1,183 @@
|
||||
#pragma once
|
||||
|
||||
#include <IO/ReadHelpers.h>
|
||||
#include <Common/HashTable/HashMap.h>
|
||||
#include <Common/checkStackSize.h>
|
||||
#include <DataTypes/Serializations/PathInData.h>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
namespace ErrorCodes
|
||||
{
|
||||
extern const int LOGICAL_ERROR;
|
||||
}
|
||||
|
||||
class ReadBuffer;
|
||||
class WriteBuffer;
|
||||
|
||||
template <typename Element>
|
||||
static Field getValueAsField(const Element & element)
|
||||
{
|
||||
if (element.isBool()) return element.getBool();
|
||||
if (element.isInt64()) return element.getInt64();
|
||||
if (element.isUInt64()) return element.getUInt64();
|
||||
if (element.isDouble()) return element.getDouble();
|
||||
if (element.isString()) return element.getString();
|
||||
if (element.isNull()) return Field();
|
||||
|
||||
throw Exception(ErrorCodes::LOGICAL_ERROR, "Unsupported type of JSON field");
|
||||
}
|
||||
|
||||
template <typename ParserImpl>
|
||||
class JSONDataParser
|
||||
{
|
||||
public:
|
||||
using Element = typename ParserImpl::Element;
|
||||
|
||||
static void readJSON(String & s, ReadBuffer & buf)
|
||||
{
|
||||
readJSONObjectPossiblyInvalid(s, buf);
|
||||
}
|
||||
|
||||
std::optional<ParseResult> parse(const char * begin, size_t length)
|
||||
{
|
||||
std::string_view json{begin, length};
|
||||
Element document;
|
||||
if (!parser.parse(json, document))
|
||||
return {};
|
||||
|
||||
ParseResult result;
|
||||
PathInDataBuilder builder;
|
||||
std::vector<PathInData::Parts> paths;
|
||||
|
||||
traverse(document, builder, paths, result.values);
|
||||
|
||||
result.paths.reserve(paths.size());
|
||||
for (auto && path : paths)
|
||||
result.paths.emplace_back(std::move(path));
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
private:
|
||||
void traverse(
|
||||
const Element & element,
|
||||
PathInDataBuilder & builder,
|
||||
std::vector<PathInData::Parts> & paths,
|
||||
std::vector<Field> & values)
|
||||
{
|
||||
checkStackSize();
|
||||
|
||||
if (element.isObject())
|
||||
{
|
||||
auto object = element.getObject();
|
||||
|
||||
paths.reserve(paths.size() + object.size());
|
||||
values.reserve(values.size() + object.size());
|
||||
|
||||
for (auto it = object.begin(); it != object.end(); ++it)
|
||||
{
|
||||
const auto & [key, value] = *it;
|
||||
traverse(value, builder.append(key, false), paths, values);
|
||||
builder.popBack();
|
||||
}
|
||||
}
|
||||
else if (element.isArray())
|
||||
{
|
||||
auto array = element.getArray();
|
||||
|
||||
using PathPartsWithArray = std::pair<PathInData::Parts, Array>;
|
||||
using PathToArray = HashMapWithStackMemory<UInt128, PathPartsWithArray, UInt128TrivialHash, 5>;
|
||||
|
||||
/// Traverse the elements of the array and collect an array
|
||||
/// of fields for each path.
|
||||
|
||||
PathToArray arrays_by_path;
|
||||
Arena strings_pool;
|
||||
|
||||
size_t current_size = 0;
|
||||
for (auto it = array.begin(); it != array.end(); ++it)
|
||||
{
|
||||
std::vector<PathInData::Parts> element_paths;
|
||||
std::vector<Field> element_values;
|
||||
PathInDataBuilder element_builder;
|
||||
|
||||
traverse(*it, element_builder, element_paths, element_values);
|
||||
size_t size = element_paths.size();
|
||||
size_t keys_to_update = arrays_by_path.size();
|
||||
|
||||
for (size_t i = 0; i < size; ++i)
|
||||
{
|
||||
UInt128 hash = PathInData::getPartsHash(element_paths[i]);
|
||||
if (auto * found = arrays_by_path.find(hash))
|
||||
{
|
||||
auto & path_array = found->getMapped().second;
|
||||
|
||||
assert(path_array.size() == current_size);
|
||||
path_array.push_back(std::move(element_values[i]));
|
||||
--keys_to_update;
|
||||
}
|
||||
else
|
||||
{
|
||||
/// We found a new key. Add an empty array of the current size.
|
||||
Array path_array;
|
||||
path_array.reserve(array.size());
|
||||
path_array.resize(current_size);
|
||||
path_array.push_back(std::move(element_values[i]));
|
||||
|
||||
auto & elem = arrays_by_path[hash];
|
||||
elem.first = std::move(element_paths[i]);
|
||||
elem.second = std::move(path_array);
|
||||
}
|
||||
}
|
||||
|
||||
/// If some of the keys are missing in the current element,
|
||||
/// add default values for them.
|
||||
if (keys_to_update)
|
||||
{
|
||||
for (auto & [_, value] : arrays_by_path)
|
||||
{
|
||||
auto & path_array = value.second;
|
||||
assert(path_array.size() == current_size || path_array.size() == current_size + 1);
|
||||
if (path_array.size() == current_size)
|
||||
path_array.push_back(Field());
|
||||
}
|
||||
}
|
||||
|
||||
++current_size;
|
||||
}
|
||||
|
||||
if (arrays_by_path.empty())
|
||||
{
|
||||
paths.push_back(builder.getParts());
|
||||
values.push_back(Array());
|
||||
}
|
||||
else
|
||||
{
|
||||
paths.reserve(paths.size() + arrays_by_path.size());
|
||||
values.reserve(values.size() + arrays_by_path.size());
|
||||
|
||||
for (auto && [_, value] : arrays_by_path)
|
||||
{
|
||||
auto && [path, path_array] = value;
|
||||
|
||||
/// Merge prefix path and path of array element.
|
||||
paths.push_back(builder.append(path, true).getParts());
|
||||
values.push_back(std::move(path_array));
|
||||
|
||||
builder.popBack(path.size());
|
||||
}
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
paths.push_back(builder.getParts());
|
||||
values.push_back(getValueAsField(element));
|
||||
}
|
||||
}
|
||||
|
||||
ParserImpl parser;
|
||||
};
|
||||
|
||||
}
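The trickiest part of traverse is how it pads per-path arrays when an object inside a JSON array misses some keys. A minimal standalone sketch of that padding logic, using plain standard containers and hypothetical names (groupByPath), might look like this:

// Standalone sketch (not ClickHouse code): group values of an array of
// objects by key, padding missing keys with nulls.
#include <iostream>
#include <map>
#include <optional>
#include <string>
#include <vector>

using Row = std::map<std::string, int>;                        // one "object" of the array
using Grouped = std::map<std::string, std::vector<std::optional<int>>>;

Grouped groupByPath(const std::vector<Row> & rows)
{
    Grouped result;
    size_t current_size = 0;
    for (const auto & row : rows)
    {
        for (const auto & [key, value] : row)
        {
            auto & arr = result[key];
            arr.resize(current_size);                          // pad a newly seen key with nulls
            arr.push_back(value);
        }
        for (auto & [key, arr] : result)                       // pad keys missing in this row
            if (arr.size() == current_size)
                arr.push_back(std::nullopt);
        ++current_size;
    }
    return result;
}

int main()
{
    // [{"k2": 1}, {"k3": 2}] -> k2: [1, null], k3: [null, 2]
    auto grouped = groupByPath({{{"k2", 1}}, {{"k3", 2}}});
    for (const auto & [key, arr] : grouped)
    {
        std::cout << key << ":";
        for (const auto & v : arr)
            std::cout << ' ' << (v ? std::to_string(*v) : "null");
        std::cout << '\n';
    }
}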
|
198
src/DataTypes/Serializations/PathInData.cpp
Normal file
@ -0,0 +1,198 @@
|
||||
#include <DataTypes/Serializations/PathInData.h>
|
||||
#include <DataTypes/NestedUtils.h>
|
||||
#include <DataTypes/DataTypeTuple.h>
|
||||
#include <DataTypes/DataTypeArray.h>
|
||||
#include <Columns/ColumnTuple.h>
|
||||
#include <Columns/ColumnArray.h>
|
||||
#include <Common/SipHash.h>
|
||||
|
||||
#include <IO/ReadHelpers.h>
|
||||
#include <IO/WriteHelpers.h>
|
||||
|
||||
#include <boost/algorithm/string/split.hpp>
|
||||
#include <boost/algorithm/string.hpp>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
PathInData::PathInData(std::string_view path_)
|
||||
: path(path_)
|
||||
{
|
||||
const char * begin = path.data();
|
||||
const char * end = path.data() + path.size();
|
||||
|
||||
for (const char * it = path.data(); it != end; ++it)
|
||||
{
|
||||
if (*it == '.')
|
||||
{
|
||||
size_t size = static_cast<size_t>(it - begin);
|
||||
parts.emplace_back(std::string_view{begin, size}, false, 0);
|
||||
begin = it + 1;
|
||||
}
|
||||
}
|
||||
|
||||
size_t size = static_cast<size_t>(end - begin);
|
||||
parts.emplace_back(std::string_view{begin, size}, false, 0);
|
||||
}
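The constructor above keeps each part as a std::string_view into the stored path. A standalone sketch of the same splitting, with hypothetical variable names, is shown below.

// Standalone sketch (not ClickHouse code): split a dotted path into
// string_views that reference the original string, no copies.
#include <iostream>
#include <string>
#include <string_view>
#include <vector>

int main()
{
    std::string path = "key1.key2.key3";
    std::vector<std::string_view> parts;

    const char * begin = path.data();
    const char * end = path.data() + path.size();
    for (const char * it = begin; it != end; ++it)
    {
        if (*it == '.')
        {
            parts.emplace_back(begin, static_cast<size_t>(it - begin));
            begin = it + 1;
        }
    }
    parts.emplace_back(begin, static_cast<size_t>(end - begin));

    for (auto part : parts)
        std::cout << part << '\n';   // key1, key2, key3 (views into `path`)
}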
|
||||
|
||||
PathInData::PathInData(const Parts & parts_)
|
||||
{
|
||||
buildPath(parts_);
|
||||
buildParts(parts_);
|
||||
}
|
||||
|
||||
PathInData::PathInData(const PathInData & other)
|
||||
: path(other.path)
|
||||
{
|
||||
buildParts(other.getParts());
|
||||
}
|
||||
|
||||
PathInData & PathInData::operator=(const PathInData & other)
|
||||
{
|
||||
if (this != &other)
|
||||
{
|
||||
path = other.path;
|
||||
buildParts(other.parts);
|
||||
}
|
||||
return *this;
|
||||
}
|
||||
|
||||
UInt128 PathInData::getPartsHash(const Parts & parts_)
|
||||
{
|
||||
SipHash hash;
|
||||
hash.update(parts_.size());
|
||||
for (const auto & part : parts_)
|
||||
{
|
||||
hash.update(part.key.data(), part.key.length());
|
||||
hash.update(part.is_nested);
|
||||
hash.update(part.anonymous_array_level);
|
||||
}
|
||||
|
||||
UInt128 res;
|
||||
hash.get128(res);
|
||||
return res;
|
||||
}
|
||||
|
||||
void PathInData::writeBinary(WriteBuffer & out) const
|
||||
{
|
||||
writeVarUInt(parts.size(), out);
|
||||
for (const auto & part : parts)
|
||||
{
|
||||
writeStringBinary(part.key, out);
|
||||
writeIntBinary(part.is_nested, out);
|
||||
writeIntBinary(part.anonymous_array_level, out);
|
||||
}
|
||||
}
|
||||
|
||||
void PathInData::readBinary(ReadBuffer & in)
|
||||
{
|
||||
size_t num_parts;
|
||||
readVarUInt(num_parts, in);
|
||||
|
||||
Arena arena;
|
||||
Parts temp_parts;
|
||||
temp_parts.reserve(num_parts);
|
||||
|
||||
for (size_t i = 0; i < num_parts; ++i)
|
||||
{
|
||||
bool is_nested;
|
||||
UInt8 anonymous_array_level;
|
||||
|
||||
auto ref = readStringBinaryInto(arena, in);
|
||||
readIntBinary(is_nested, in);
|
||||
readIntBinary(anonymous_array_level, in);
|
||||
|
||||
temp_parts.emplace_back(static_cast<std::string_view>(ref), is_nested, anonymous_array_level);
|
||||
}
|
||||
|
||||
/// Recreate path and parts.
|
||||
buildPath(temp_parts);
|
||||
buildParts(temp_parts);
|
||||
}
|
||||
|
||||
void PathInData::buildPath(const Parts & other_parts)
|
||||
{
|
||||
if (other_parts.empty())
|
||||
return;
|
||||
|
||||
path.clear();
|
||||
auto it = other_parts.begin();
|
||||
path += it->key;
|
||||
++it;
|
||||
for (; it != other_parts.end(); ++it)
|
||||
{
|
||||
path += ".";
|
||||
path += it->key;
|
||||
}
|
||||
}
|
||||
|
||||
void PathInData::buildParts(const Parts & other_parts)
|
||||
{
|
||||
if (other_parts.empty())
|
||||
return;
|
||||
|
||||
parts.clear();
|
||||
parts.reserve(other_parts.size());
|
||||
const char * begin = path.data();
|
||||
for (const auto & part : other_parts)
|
||||
{
|
||||
has_nested |= part.is_nested;
|
||||
parts.emplace_back(std::string_view{begin, part.key.length()}, part.is_nested, part.anonymous_array_level);
|
||||
begin += part.key.length() + 1;
|
||||
}
|
||||
}
|
||||
|
||||
size_t PathInData::Hash::operator()(const PathInData & value) const
|
||||
{
|
||||
auto hash = getPartsHash(value.parts);
|
||||
return hash.items[0] ^ hash.items[1];
|
||||
}
|
||||
|
||||
PathInDataBuilder & PathInDataBuilder::append(std::string_view key, bool is_array)
|
||||
{
|
||||
if (parts.empty())
|
||||
current_anonymous_array_level += is_array;
|
||||
|
||||
if (!key.empty())
|
||||
{
|
||||
if (!parts.empty())
|
||||
parts.back().is_nested = is_array;
|
||||
|
||||
parts.emplace_back(key, false, current_anonymous_array_level);
|
||||
current_anonymous_array_level = 0;
|
||||
}
|
||||
|
||||
return *this;
|
||||
}
|
||||
|
||||
PathInDataBuilder & PathInDataBuilder::append(const PathInData::Parts & path, bool is_array)
|
||||
{
|
||||
if (parts.empty())
|
||||
current_anonymous_array_level += is_array;
|
||||
|
||||
if (!path.empty())
|
||||
{
|
||||
if (!parts.empty())
|
||||
parts.back().is_nested = is_array;
|
||||
|
||||
auto it = parts.insert(parts.end(), path.begin(), path.end());
|
||||
for (; it != parts.end(); ++it)
|
||||
it->anonymous_array_level += current_anonymous_array_level;
|
||||
current_anonymous_array_level = 0;
|
||||
}
|
||||
|
||||
return *this;
|
||||
}
|
||||
|
||||
void PathInDataBuilder::popBack()
|
||||
{
|
||||
parts.pop_back();
|
||||
}
|
||||
|
||||
void PathInDataBuilder::popBack(size_t n)
|
||||
{
|
||||
assert(n <= parts.size());
|
||||
parts.resize(parts.size() - n);
|
||||
}
|
||||
|
||||
}
|
Some files were not shown because too many files have changed in this diff.