Merge branch 'master' into parallel-downloading-url-engine

This commit is contained in:
Antonio Andelic 2022-03-21 09:11:01 +00:00
commit b8c43ff2f2
208 changed files with 5212 additions and 2094 deletions

View File

@ -32,7 +32,7 @@ jobs:
uses: svenstaro/upload-release-action@v2
with:
repo_token: ${{ secrets.GITHUB_TOKEN }}
file: ${{runner.temp}}/release_packages/*
file: ${{runner.temp}}/push_to_artifactory/*
overwrite: true
tag: ${{ github.ref }}
file_glob: true

View File

@ -1,4 +1,138 @@
### ClickHouse release v22.2, 2022-02-17
### Table of Contents
**[ClickHouse release v22.3-lts, 2022-03-17](#223)**<br>
**[ClickHouse release v22.2, 2022-02-17](#222)**<br>
**[ClickHouse release v22.1, 2022-01-18](#221)**<br>
**[Changelog for 2021](https://github.com/ClickHouse/ClickHouse/blob/master/docs/en/whats-new/changelog/2021.md)**<br>
## <a id="223"></a> ClickHouse release v22.3-lts, 2022-03-17
#### Backward Incompatible Change
* Make `arrayCompact` function behave as other higher-order functions: perform compaction not of lambda function results but on the original array. If you're using nontrivial lambda functions in arrayCompact you may restore old behaviour by wrapping `arrayCompact` arguments into `arrayMap`. Closes [#34010](https://github.com/ClickHouse/ClickHouse/issues/34010) [#18535](https://github.com/ClickHouse/ClickHouse/issues/18535) [#14778](https://github.com/ClickHouse/ClickHouse/issues/14778). [#34795](https://github.com/ClickHouse/ClickHouse/pull/34795) ([Alexandre Snarskii](https://github.com/snar)).
* Change implementation-specific behavior on overflow of the function `toDateTime`. It is now saturated to the nearest min/max supported instant of DateTime instead of wrapping around (see the sketch below). This change is highlighted as "backward incompatible" because someone may unintentionally rely on the old behavior. [#32898](https://github.com/ClickHouse/ClickHouse/pull/32898) ([HaiBo Li](https://github.com/marising)).
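For illustration, a minimal sketch of the new saturation behaviour (the exact boundary values depend on the supported `DateTime` range):
``` sql
-- A value above the DateTime range is now expected to saturate to the maximum
-- supported instant ('2106-02-07 06:28:15' UTC) instead of wrapping around.
SELECT toDateTime(9999999999);
```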
#### New Feature
* Support for caching data locally for remote filesystems. It can be enabled for `s3` disks. Closes [#28961](https://github.com/ClickHouse/ClickHouse/issues/28961). [#33717](https://github.com/ClickHouse/ClickHouse/pull/33717) ([Kseniia Sumarokova](https://github.com/kssenii)). In the meantime, we have enabled the test suite on the s3 filesystem and no known issues remain, so the feature is on its way to being production ready.
* Add new table function `hive`. It can be used as follows `hive('<hive metastore url>', '<hive database>', '<hive table name>', '<columns definition>', '<partition columns>')` for example `SELECT * FROM hive('thrift://hivetest:9083', 'test', 'demo', 'id Nullable(String), score Nullable(Int32), day Nullable(String)', 'day')`. [#34946](https://github.com/ClickHouse/ClickHouse/pull/34946) ([lgbo](https://github.com/lgbo-ustc)).
* Support authentication of users connected via SSL by their X.509 certificate. [#31484](https://github.com/ClickHouse/ClickHouse/pull/31484) ([eungenue](https://github.com/eungenue)).
* Support schema inference for inserting into table functions `file`/`hdfs`/`s3`/`url`. [#34732](https://github.com/ClickHouse/ClickHouse/pull/34732) ([Kruglov Pavel](https://github.com/Avogar)).
* Now you can read the `system.zookeeper` table without restrictions on the path or by using a `LIKE` expression. These reads can generate quite a heavy load on ZooKeeper, so to enable this ability you have to enable the setting `allow_unrestricted_reads_from_keeper`. [#34609](https://github.com/ClickHouse/ClickHouse/pull/34609) ([Sergei Trifonov](https://github.com/serxa)).
* Display CPU and memory metrics in clickhouse-local. Close [#34545](https://github.com/ClickHouse/ClickHouse/issues/34545). [#34605](https://github.com/ClickHouse/ClickHouse/pull/34605) ([李扬](https://github.com/taiyang-li)).
* Implement `startsWith` and `endsWith` functions for arrays (see the example after this list), closes [#33982](https://github.com/ClickHouse/ClickHouse/issues/33982). [#34368](https://github.com/ClickHouse/ClickHouse/pull/34368) ([usurai](https://github.com/usurai)).
* Add three functions for the Map data type: 1. `mapReplace(map1, map2)` - replaces values for keys in map1 with the values of the corresponding keys in map2 and adds keys from map2 that don't exist in map1. 2. `mapFilter` 3. `mapMap`. `mapFilter` and `mapMap` are higher-order functions that accept two arguments: the first argument is a lambda function with a (k, v) pair as arguments, and the second argument is a column of type Map. [#33698](https://github.com/ClickHouse/ClickHouse/pull/33698) ([hexiaoting](https://github.com/hexiaoting)).
* Allow getting default user and password for clickhouse-client from the `CLICKHOUSE_USER` and `CLICKHOUSE_PASSWORD` environment variables. Close [#34538](https://github.com/ClickHouse/ClickHouse/issues/34538). [#34947](https://github.com/ClickHouse/ClickHouse/pull/34947) ([DR](https://github.com/freedomDR)).
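For illustration, a minimal sketch of the array form of `startsWith`/`endsWith` mentioned above:
``` sql
-- Both calls are expected to return 1 (prefix / suffix match).
SELECT
    startsWith([1, 2, 3, 4], [1, 2]) AS is_prefix,
    endsWith([1, 2, 3, 4], [3, 4]) AS is_suffix;
```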
#### Experimental Feature
* New data type `Object(<schema_format>)`, which supports storing semi-structured data (for now, JSON only). Data is written to such a column as a string. Then all paths are extracted according to the format of the semi-structured data and written as separate columns in the most optimal types that can store all their values. Those columns can be queried by names that match paths in the source data, e.g. `data.key1.key2`, or with the cast operator `data.key1.key2::Int64` (see the sketch after this list).
* Add `database_replicated_allow_only_replicated_engine` setting. When enabled, it is only allowed to create `Replicated` tables or tables with stateless engines in `Replicated` databases. [#35214](https://github.com/ClickHouse/ClickHouse/pull/35214) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). Note that the `Replicated` database engine is still an experimental feature.
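For illustration, a hedged sketch of the `Object('json')` type described above (the table name is hypothetical and the feature must be enabled explicitly):
``` sql
SET allow_experimental_object_type = 1;

CREATE TABLE json_events (data Object('json')) ENGINE = Memory;

INSERT INTO json_events VALUES ('{"key1": {"key2": 42}}');

-- Extracted paths are queryable as subcolumns, optionally with a cast.
SELECT data.key1.key2, data.key1.key2::Int64 FROM json_events;
```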
#### Performance Improvement
* Improve performance of insertion into `MergeTree` tables by optimizing sorting. Up to 2x improvement is observed on realistic benchmarks. [#34750](https://github.com/ClickHouse/ClickHouse/pull/34750) ([Maksim Kita](https://github.com/kitaisreal)).
* Columns pruning when reading Parquet, ORC and Arrow files from URL and S3. Closes [#34163](https://github.com/ClickHouse/ClickHouse/issues/34163). [#34849](https://github.com/ClickHouse/ClickHouse/pull/34849) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Columns pruning when reading Parquet, ORC and Arrow files from Hive. [#34954](https://github.com/ClickHouse/ClickHouse/pull/34954) ([lgbo](https://github.com/lgbo-ustc)).
* A bunch of performance optimizations from a performance superhero. Improve performance of processing queries with a large `IN` section. Improve performance of the `direct` dictionary if its source is `ClickHouse`. Improve performance of the `detectCharset` and `detectLanguageUnknown` functions. [#34888](https://github.com/ClickHouse/ClickHouse/pull/34888) ([Maksim Kita](https://github.com/kitaisreal)).
* Improve performance of `any` aggregate function by using more batching. [#34760](https://github.com/ClickHouse/ClickHouse/pull/34760) ([Raúl Marín](https://github.com/Algunenano)).
* Multiple performance improvements for `clickhouse-keeper`: less locking [#35010](https://github.com/ClickHouse/ClickHouse/pull/35010) ([zhanglistar](https://github.com/zhanglistar)), lower memory usage via streaming reading and writing of snapshots instead of a full copy [#34584](https://github.com/ClickHouse/ClickHouse/pull/34584) ([zhanglistar](https://github.com/zhanglistar)), optimized compaction of the log store in the RAFT implementation [#34534](https://github.com/ClickHouse/ClickHouse/pull/34534) ([zhanglistar](https://github.com/zhanglistar)), and versioning of the internal data structure [#34486](https://github.com/ClickHouse/ClickHouse/pull/34486) ([zhanglistar](https://github.com/zhanglistar)).
#### Improvement
* Allow asynchronous inserts to table functions. Fixes [#34864](https://github.com/ClickHouse/ClickHouse/issues/34864). [#34866](https://github.com/ClickHouse/ClickHouse/pull/34866) ([Anton Popov](https://github.com/CurtizJ)).
* Implicit type casting of the key argument for functions `dictGetHierarchy`, `dictIsIn`, `dictGetChildren`, `dictGetDescendants`. Closes [#34970](https://github.com/ClickHouse/ClickHouse/issues/34970). [#35027](https://github.com/ClickHouse/ClickHouse/pull/35027) ([Maksim Kita](https://github.com/kitaisreal)).
* `EXPLAIN AST` query can output AST in form of a graph in Graphviz format: `EXPLAIN AST graph = 1 SELECT * FROM system.parts`. [#35173](https://github.com/ClickHouse/ClickHouse/pull/35173) ([李扬](https://github.com/taiyang-li)).
* When large files were written with `s3` table function or table engine, the content type on the files was mistakenly set to `application/xml` due to a bug in the AWS SDK. This closes [#33964](https://github.com/ClickHouse/ClickHouse/issues/33964). [#34433](https://github.com/ClickHouse/ClickHouse/pull/34433) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Change restrictive row policies a bit to make them an easier alternative to permissive policies in simple cases. If only restrictive policies exist for a particular table (without permissive policies), users will be able to see some rows. Also, `SHOW CREATE ROW POLICY` will always show `AS permissive` or `AS restrictive` in a row policy's definition (see the sketch after this list). [#34596](https://github.com/ClickHouse/ClickHouse/pull/34596) ([Vitaly Baranov](https://github.com/vitlibar)).
* Improve schema inference with globs in File/S3/HDFS/URL engines. Try to use the next path for schema inference in case of error. [#34465](https://github.com/ClickHouse/ClickHouse/pull/34465) ([Kruglov Pavel](https://github.com/Avogar)).
* Play UI now correctly detects the preferred light/dark theme from the OS. [#35068](https://github.com/ClickHouse/ClickHouse/pull/35068) ([peledni](https://github.com/peledni)).
* Added `date_time_input_format = 'best_effort_us'`. Closes [#34799](https://github.com/ClickHouse/ClickHouse/issues/34799). [#34982](https://github.com/ClickHouse/ClickHouse/pull/34982) ([WenYao](https://github.com/Cai-Yao)).
* New settings called `allow_plaintext_password` and `allow_no_password` have been added to the server configuration which turn on/off authentication types that can be potentially insecure in some environments. They are allowed by default. [#34738](https://github.com/ClickHouse/ClickHouse/pull/34738) ([Heena Bansal](https://github.com/HeenaBansal2009)).
* Support for `DateTime64` data type in `Arrow` format, closes [#8280](https://github.com/ClickHouse/ClickHouse/issues/8280) and closes [#28574](https://github.com/ClickHouse/ClickHouse/issues/28574). [#34561](https://github.com/ClickHouse/ClickHouse/pull/34561) ([李扬](https://github.com/taiyang-li)).
* Reload `remote_url_allow_hosts` (filtering of outgoing connections) on config update. [#35294](https://github.com/ClickHouse/ClickHouse/pull/35294) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Support `--testmode` parameter for `clickhouse-local`. This parameter enables interpretation of test hints that we use in functional tests. [#35264](https://github.com/ClickHouse/ClickHouse/pull/35264) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Add `distributed_depth` to query log. It is like a more detailed variant of `is_initial_query` [#35207](https://github.com/ClickHouse/ClickHouse/pull/35207) ([李扬](https://github.com/taiyang-li)).
* Respect `remote_url_allow_hosts` for `MySQL` and `PostgreSQL` table functions. [#35191](https://github.com/ClickHouse/ClickHouse/pull/35191) ([Heena Bansal](https://github.com/HeenaBansal2009)).
* Added `disk_name` field to `system.part_log`. [#35178](https://github.com/ClickHouse/ClickHouse/pull/35178) ([Artyom Yurkov](https://github.com/Varinara)).
* Do not retry non-retriable errors when querying remote URLs. Closes [#35161](https://github.com/ClickHouse/ClickHouse/issues/35161). [#35172](https://github.com/ClickHouse/ClickHouse/pull/35172) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Support distributed INSERT SELECT queries (the setting `parallel_distributed_insert_select`) with the table function `view()`. [#35132](https://github.com/ClickHouse/ClickHouse/pull/35132) ([Azat Khuzhin](https://github.com/azat)).
* More precise memory tracking during `INSERT` into `Buffer` with `AggregateFunction`. [#35072](https://github.com/ClickHouse/ClickHouse/pull/35072) ([Azat Khuzhin](https://github.com/azat)).
* Avoid division by zero in Query Profiler if Linux kernel has a bug. Closes [#34787](https://github.com/ClickHouse/ClickHouse/issues/34787). [#35032](https://github.com/ClickHouse/ClickHouse/pull/35032) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add more sanity checks for the Keeper configuration: mixing localhost and non-local servers is now not allowed, and there is also a check that the internal Raft port and the Keeper client port do not have the same value. [#35004](https://github.com/ClickHouse/ClickHouse/pull/35004) ([alesapin](https://github.com/alesapin)).
* Previously, if the user changed the settings of the system tables, there would be tons of logs and ClickHouse would rename the tables every minute. This fixes [#34929](https://github.com/ClickHouse/ClickHouse/issues/34929). [#34949](https://github.com/ClickHouse/ClickHouse/pull/34949) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Use connection pool for Hive metastore client. [#34940](https://github.com/ClickHouse/ClickHouse/pull/34940) ([lgbo](https://github.com/lgbo-ustc)).
* Ignore per-column `TTL` in `CREATE TABLE AS` if new table engine does not support it (i.e. if the engine is not of `MergeTree` family). [#34938](https://github.com/ClickHouse/ClickHouse/pull/34938) ([Azat Khuzhin](https://github.com/azat)).
* Allow `LowCardinality` strings for `ngrambf_v1`/`tokenbf_v1` indexes. Closes [#21865](https://github.com/ClickHouse/ClickHouse/issues/21865). [#34911](https://github.com/ClickHouse/ClickHouse/pull/34911) ([Lars Hiller Eidnes](https://github.com/larspars)).
* Allow opening empty sqlite db if the file doesn't exist. Closes [#33367](https://github.com/ClickHouse/ClickHouse/issues/33367). [#34907](https://github.com/ClickHouse/ClickHouse/pull/34907) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Implement memory statistics for FreeBSD - this is required for `max_server_memory_usage` to work correctly. [#34902](https://github.com/ClickHouse/ClickHouse/pull/34902) ([Alexandre Snarskii](https://github.com/snar)).
* In previous versions the progress bar in clickhouse-client can jump forward near 50% for no reason. This closes [#34324](https://github.com/ClickHouse/ClickHouse/issues/34324). [#34801](https://github.com/ClickHouse/ClickHouse/pull/34801) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Now `ALTER TABLE DROP COLUMN columnX` queries for `MergeTree` table engines will work instantly when `columnX` is an `ALIAS` column. Fixes [#34660](https://github.com/ClickHouse/ClickHouse/issues/34660). [#34786](https://github.com/ClickHouse/ClickHouse/pull/34786) ([alesapin](https://github.com/alesapin)).
* Show hints when user mistyped the name of a data skipping index. Closes [#29698](https://github.com/ClickHouse/ClickHouse/issues/29698). [#34764](https://github.com/ClickHouse/ClickHouse/pull/34764) ([flynn](https://github.com/ucasfl)).
* Support `remote()`/`cluster()` table functions for `parallel_distributed_insert_select`. [#34728](https://github.com/ClickHouse/ClickHouse/pull/34728) ([Azat Khuzhin](https://github.com/azat)).
* Do not reset logging that is configured via the `--log-file`/`--errorlog-file` command line options in case of an empty configuration in the config file. [#34718](https://github.com/ClickHouse/ClickHouse/pull/34718) ([Amos Bird](https://github.com/amosbird)).
* Extract schema only once on table creation and prevent reading from local files/external sources to extract schema on each server startup. [#34684](https://github.com/ClickHouse/ClickHouse/pull/34684) ([Kruglov Pavel](https://github.com/Avogar)).
* Allow specifying argument names for executable UDFs. This is necessary for formats where argument name is part of serialization, like `Native`, `JSONEachRow`. Closes [#34604](https://github.com/ClickHouse/ClickHouse/issues/34604). [#34653](https://github.com/ClickHouse/ClickHouse/pull/34653) ([Maksim Kita](https://github.com/kitaisreal)).
* `MaterializedMySQL` (experimental feature) now supports `materialized_mysql_tables_list` (a comma-separated list of MySQL database tables which will be replicated by the MaterializedMySQL database engine; the default value is an empty list, which means all tables will be replicated), mentioned at [#32977](https://github.com/ClickHouse/ClickHouse/issues/32977). [#34487](https://github.com/ClickHouse/ClickHouse/pull/34487) ([zzsmdfj](https://github.com/zzsmdfj)).
* Improve OpenTelemetry span logs for INSERT operation on distributed table. [#34480](https://github.com/ClickHouse/ClickHouse/pull/34480) ([Frank Chen](https://github.com/FrankChen021)).
* Make the znode `ctime` and `mtime` consistent between servers in ClickHouse Keeper. [#33441](https://github.com/ClickHouse/ClickHouse/pull/33441) ([小路](https://github.com/nicelulu)).
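For illustration, a hedged sketch of the restrictive row policy behaviour described above (database, table, policy, and role names are hypothetical):
``` sql
-- Only restrictive policies exist for this table, so rows matching the condition
-- remain visible to the target role instead of everything being hidden.
CREATE ROW POLICY region_filter ON mydb.sales
    AS restrictive
    FOR SELECT
    USING region = 'EU'
    TO analyst_role;

-- The definition now always shows AS permissive or AS restrictive.
SHOW CREATE ROW POLICY region_filter ON mydb.sales;
```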
#### Build/Testing/Packaging Improvement
* Package repository is migrated to JFrog Artifactory (**Mikhail f. Shiryaev**).
* Randomize some settings in functional tests, so more possible combinations of settings will be tested. This is yet another fuzzing method to ensure better test coverage. This closes [#32268](https://github.com/ClickHouse/ClickHouse/issues/32268). [#34092](https://github.com/ClickHouse/ClickHouse/pull/34092) ([Kruglov Pavel](https://github.com/Avogar)).
* Drop PVS-Studio from our CI. [#34680](https://github.com/ClickHouse/ClickHouse/pull/34680) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Add an ability to build stripped binaries with CMake. In previous versions it was performed by dh-tools. [#35196](https://github.com/ClickHouse/ClickHouse/pull/35196) ([alesapin](https://github.com/alesapin)).
* Smaller "fat-free" `clickhouse-keeper` build. [#35031](https://github.com/ClickHouse/ClickHouse/pull/35031) ([alesapin](https://github.com/alesapin)).
* Use @robot-clickhouse as an author and committer for PRs like https://github.com/ClickHouse/ClickHouse/pull/34685. [#34793](https://github.com/ClickHouse/ClickHouse/pull/34793) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Limit the DWARF version for debug info to 4 at most, because our internal stack symbolizer cannot parse DWARF version 5. This makes sense if you compile ClickHouse with clang-15. [#34777](https://github.com/ClickHouse/ClickHouse/pull/34777) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove the `clickhouse-test` debian package as an unneeded complication. CI uses tests from the repository, and standalone testing via the deb package is no longer supported. [#34606](https://github.com/ClickHouse/ClickHouse/pull/34606) ([Ilya Yatsishin](https://github.com/qoega)).
#### Bug Fix (user-visible misbehaviour in official stable or prestable release)
* A fix for HDFS integration: when the inner buffer size is too small, `NEED_MORE_INPUT` in `HadoopSnappyDecoder` will run multiple times (>= 3) for one compressed block, which causes the input data to be copied to the wrong place in `HadoopSnappyDecoder::buffer`. [#35116](https://github.com/ClickHouse/ClickHouse/pull/35116) ([lgbo](https://github.com/lgbo-ustc)).
* Ignore obsolete grants in ATTACH GRANT statements. This PR fixes [#34815](https://github.com/ClickHouse/ClickHouse/issues/34815). [#34855](https://github.com/ClickHouse/ClickHouse/pull/34855) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix segfault in Postgres database when getting create table query if database was created using named collections. Closes [#35312](https://github.com/ClickHouse/ClickHouse/issues/35312). [#35313](https://github.com/ClickHouse/ClickHouse/pull/35313) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix partial merge join duplicate rows bug, close [#31009](https://github.com/ClickHouse/ClickHouse/issues/31009). [#35311](https://github.com/ClickHouse/ClickHouse/pull/35311) ([Vladimir C](https://github.com/vdimir)).
* Fix possible `Assertion 'position() != working_buffer.end()' failed` while using bzip2 compression with a small `max_read_buffer_size` setting value; the bug was found in https://github.com/ClickHouse/ClickHouse/pull/35047. [#35300](https://github.com/ClickHouse/ClickHouse/pull/35300) ([Kruglov Pavel](https://github.com/Avogar)). The same issue was fixed for lz4 compression [#35296](https://github.com/ClickHouse/ClickHouse/pull/35296) ([Kruglov Pavel](https://github.com/Avogar)), lzma compression [#35295](https://github.com/ClickHouse/ClickHouse/pull/35295) ([Kruglov Pavel](https://github.com/Avogar)), and `brotli` compression [#35281](https://github.com/ClickHouse/ClickHouse/pull/35281) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix possible segfault in `JSONEachRow` schema inference. [#35291](https://github.com/ClickHouse/ClickHouse/pull/35291) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix `CHECK TABLE` query in case when sparse columns are enabled in table. [#35274](https://github.com/ClickHouse/ClickHouse/pull/35274) ([Anton Popov](https://github.com/CurtizJ)).
* Avoid std::terminate in case of exception in reading from remote VFS. [#35257](https://github.com/ClickHouse/ClickHouse/pull/35257) ([Azat Khuzhin](https://github.com/azat)).
* Fix reading port from config, close [#34776](https://github.com/ClickHouse/ClickHouse/issues/34776). [#35193](https://github.com/ClickHouse/ClickHouse/pull/35193) ([Vladimir C](https://github.com/vdimir)).
* Fix error in query with `WITH TOTALS` in case if `HAVING` returned empty result. This fixes [#33711](https://github.com/ClickHouse/ClickHouse/issues/33711). [#35186](https://github.com/ClickHouse/ClickHouse/pull/35186) ([Amos Bird](https://github.com/amosbird)).
* Fix a corner case of `replaceRegexpAll`, close [#35117](https://github.com/ClickHouse/ClickHouse/issues/35117). [#35182](https://github.com/ClickHouse/ClickHouse/pull/35182) ([Vladimir C](https://github.com/vdimir)).
* Schema inference didn't work properly in the case of `INSERT INTO FUNCTION s3(...) FROM ...`: it tried to read the schema from the s3 file instead of from the SELECT query. [#35176](https://github.com/ClickHouse/ClickHouse/pull/35176) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix MaterializedPostgreSQL (experimental feature) `table overrides` for partition by, etc. Closes [#35048](https://github.com/ClickHouse/ClickHouse/issues/35048). [#35162](https://github.com/ClickHouse/ClickHouse/pull/35162) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix MaterializedPostgreSQL (experimental feature) adding new table to replication (ATTACH TABLE) after manually removing (DETACH TABLE). Closes [#33800](https://github.com/ClickHouse/ClickHouse/issues/33800). Closes [#34922](https://github.com/ClickHouse/ClickHouse/issues/34922). Closes [#34315](https://github.com/ClickHouse/ClickHouse/issues/34315). [#35158](https://github.com/ClickHouse/ClickHouse/pull/35158) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix partition pruning error when non-monotonic function is used with IN operator. This fixes [#35136](https://github.com/ClickHouse/ClickHouse/issues/35136). [#35146](https://github.com/ClickHouse/ClickHouse/pull/35146) ([Amos Bird](https://github.com/amosbird)).
* Fixed slightly incorrect translation of YAML configs to XML. [#35135](https://github.com/ClickHouse/ClickHouse/pull/35135) ([Miel Donkers](https://github.com/mdonkers)).
* Fix `optimize_skip_unused_shards_rewrite_in` for signed columns and negative values. [#35134](https://github.com/ClickHouse/ClickHouse/pull/35134) ([Azat Khuzhin](https://github.com/azat)).
* The `update_lag` external dictionary configuration option was unusable, showing the error message ``Unexpected key `update_lag` in dictionary source configuration``. [#35089](https://github.com/ClickHouse/ClickHouse/pull/35089) ([Jason Chu](https://github.com/1lann)).
* Avoid possible deadlock on server shutdown. [#35081](https://github.com/ClickHouse/ClickHouse/pull/35081) ([Azat Khuzhin](https://github.com/azat)).
* Fix missing alias after function is optimized to a subcolumn when setting `optimize_functions_to_subcolumns` is enabled. Closes [#33798](https://github.com/ClickHouse/ClickHouse/issues/33798). [#35079](https://github.com/ClickHouse/ClickHouse/pull/35079) ([qieqieplus](https://github.com/qieqieplus)).
* Fix reading from the `system.asynchronous_inserts` table when there is an asynchronous insert into a table function. [#35050](https://github.com/ClickHouse/ClickHouse/pull/35050) ([Anton Popov](https://github.com/CurtizJ)).
* Fix possible exception `Reading for MergeTree family tables must be done with last position boundary` (relevant to operation on remote VFS). Closes [#34979](https://github.com/ClickHouse/ClickHouse/issues/34979). [#35001](https://github.com/ClickHouse/ClickHouse/pull/35001) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix unexpected result when using a `-State` type aggregate function in a window frame. [#34999](https://github.com/ClickHouse/ClickHouse/pull/34999) ([metahys](https://github.com/metahys)).
* Fix possible segfault in FileLog (experimental feature). Closes [#30749](https://github.com/ClickHouse/ClickHouse/issues/30749). [#34996](https://github.com/ClickHouse/ClickHouse/pull/34996) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix possible rare error `Cannot push block to port which already has data`. [#34993](https://github.com/ClickHouse/ClickHouse/pull/34993) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix wrong schema inference for unquoted dates in CSV. Closes [#34768](https://github.com/ClickHouse/ClickHouse/issues/34768). [#34961](https://github.com/ClickHouse/ClickHouse/pull/34961) ([Kruglov Pavel](https://github.com/Avogar)).
* Integration with Hive: fix unexpected result when using `IN` in `WHERE` in a Hive query. [#34945](https://github.com/ClickHouse/ClickHouse/pull/34945) ([lgbo](https://github.com/lgbo-ustc)).
* Avoid busy polling in ClickHouse Keeper while searching for changelog files to delete. [#34931](https://github.com/ClickHouse/ClickHouse/pull/34931) ([Azat Khuzhin](https://github.com/azat)).
* Fix DateTime64 conversion from PostgreSQL. Closes [#33364](https://github.com/ClickHouse/ClickHouse/issues/33364). [#34910](https://github.com/ClickHouse/ClickHouse/pull/34910) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix possible "Part directory doesn't exist" during `INSERT` into MergeTree table backed by VFS over s3. [#34876](https://github.com/ClickHouse/ClickHouse/pull/34876) ([Azat Khuzhin](https://github.com/azat)).
* Support executing DDL queries like `CREATE USER` on a cross-replicated cluster (see the sketch after this list). [#34860](https://github.com/ClickHouse/ClickHouse/pull/34860) ([Jianmei Zhang](https://github.com/zhangjmruc)).
* Fix bugs for multiple columns group by in `WindowView` (experimental feature). [#34859](https://github.com/ClickHouse/ClickHouse/pull/34859) ([vxider](https://github.com/Vxider)).
* Fix possible failures in S2 functions when queries contain const columns. [#34745](https://github.com/ClickHouse/ClickHouse/pull/34745) ([Bharat Nallan](https://github.com/bharatnc)).
* Fix a bug in H3 functions with const columns which caused queries to fail. [#34743](https://github.com/ClickHouse/ClickHouse/pull/34743) ([Bharat Nallan](https://github.com/bharatnc)).
* Fix `No such file or directory` with enabled `fsync_part_directory` and vertical merge. [#34739](https://github.com/ClickHouse/ClickHouse/pull/34739) ([Azat Khuzhin](https://github.com/azat)).
* Fix serialization/printing for system queries `RELOAD MODEL`, `RELOAD FUNCTION`, `RESTART DISK` when used `ON CLUSTER`. Closes [#34514](https://github.com/ClickHouse/ClickHouse/issues/34514). [#34696](https://github.com/ClickHouse/ClickHouse/pull/34696) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix `allow_experimental_projection_optimization` with `enable_global_with_statement` (before, it could lead to a `Stack size too large` error in case of multiple expressions in the `WITH` clause, and it also executed scalar subqueries again and again, so now it will be more optimal). [#34650](https://github.com/ClickHouse/ClickHouse/pull/34650) ([Azat Khuzhin](https://github.com/azat)).
* Stop selecting a part for mutation when the other replica has already updated the transaction log for the `ReplicatedMergeTree` engine. [#34633](https://github.com/ClickHouse/ClickHouse/pull/34633) ([Jianmei Zhang](https://github.com/zhangjmruc)).
* Fix incorrect result of trivial count query when part movement feature is used [#34089](https://github.com/ClickHouse/ClickHouse/issues/34089). [#34385](https://github.com/ClickHouse/ClickHouse/pull/34385) ([nvartolomei](https://github.com/nvartolomei)).
* Fix inconsistency of `max_query_size` limitation in distributed subqueries. [#34078](https://github.com/ClickHouse/ClickHouse/pull/34078) ([Chao Ma](https://github.com/godliness)).
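For illustration, a hedged sketch of the cross-replicated-cluster DDL case mentioned above (cluster and user names are hypothetical):
``` sql
-- Creates the same user on every host of the cluster in a single statement.
CREATE USER report_reader ON CLUSTER my_cluster IDENTIFIED WITH sha256_password BY 'secret';
```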
### <a id="222"></a> ClickHouse release v22.2, 2022-02-17
#### Upgrade Notes
@ -174,7 +308,7 @@
* This PR allows using multiple LDAP storages in the same list of user directories. It worked earlier but was broken because LDAP tests are disabled (they are part of the testflows tests). [#33574](https://github.com/ClickHouse/ClickHouse/pull/33574) ([Vitaly Baranov](https://github.com/vitlibar)).
### ClickHouse release v22.1, 2022-01-18
### <a id="221"></a> ClickHouse release v22.1, 2022-01-18
#### Upgrade Notes

View File

@ -15,7 +15,7 @@ The following versions of ClickHouse server are currently being supported with s
| 20.x | :x: |
| 21.1 | :x: |
| 21.2 | :x: |
| 21.3 | |
| 21.3 | :x: |
| 21.4 | :x: |
| 21.5 | :x: |
| 21.6 | :x: |
@ -23,9 +23,11 @@ The following versions of ClickHouse server are currently being supported with s
| 21.8 | ✅ |
| 21.9 | :x: |
| 21.10 | :x: |
| 21.11 | |
| 21.12 | |
| 21.11 | :x: |
| 21.12 | :x: |
| 22.1 | ✅ |
| 22.2 | ✅ |
| 22.3 | ✅ |
## Reporting a Vulnerability

View File

@ -2,11 +2,11 @@
# NOTE: has nothing common with DBMS_TCP_PROTOCOL_VERSION,
# only DBMS_TCP_PROTOCOL_VERSION should be incremented on protocol changes.
SET(VERSION_REVISION 54460)
SET(VERSION_REVISION 54461)
SET(VERSION_MAJOR 22)
SET(VERSION_MINOR 3)
SET(VERSION_MINOR 4)
SET(VERSION_PATCH 1)
SET(VERSION_GITHASH 75366fc95e510b7ac76759ef670702ae5f488a51)
SET(VERSION_DESCRIBE v22.3.1.1-testing)
SET(VERSION_STRING 22.3.1.1)
SET(VERSION_GITHASH 92ab33f560e638d1989c5ca543021ab53d110f5c)
SET(VERSION_DESCRIBE v22.4.1.1-testing)
SET(VERSION_STRING 22.4.1.1)
# end of autochange

View File

@ -229,6 +229,25 @@ As simple code editors, you can use Sublime Text or Visual Studio Code, or Kate
Just in case, it is worth mentioning that CLion creates the `build` path on its own, selects `debug` as the build type on its own, uses the version of CMake that is defined in CLion rather than the one installed by you, and finally uses `make` to run build tasks instead of `ninja`. This is normal behaviour; just keep it in mind to avoid confusion.
## Debugging
Many graphical IDEs come with an integrated debugger, but you can also use a standalone debugger.
### GDB
### LLDB
# tell LLDB where to find the source code
settings set target.source-map /path/to/build/dir /path/to/source/dir
# configure LLDB to display code before/after currently executing line
settings set stop-line-count-before 10
settings set stop-line-count-after 10
target create ./clickhouse-client
# <set breakpoints here>
process launch -- --query="SELECT * FROM TAB"
## Writing Code {#writing-code}
The description of ClickHouse architecture can be found here: https://clickhouse.com/docs/en/development/architecture/

View File

@ -5,30 +5,19 @@ toc_title: Playground
# ClickHouse Playground {#clickhouse-playground}
!!! warning "Warning"
This service is deprecated and will be replaced in the foreseeable future.
[ClickHouse Playground](https://play.clickhouse.com) allows people to experiment with ClickHouse by running queries instantly, without setting up their own server or cluster.
Several example datasets are available in the Playground, as well as sample queries that show ClickHouse features. There's also a selection of ClickHouse LTS releases to experiment with.
[ClickHouse Playground](https://play.clickhouse.com/play?user=play) allows people to experiment with ClickHouse by running queries instantly, without setting up their own server or cluster.
Several example datasets are available in the Playground.
You can make queries to Playground using any HTTP client, for example [curl](https://curl.haxx.se) or [wget](https://www.gnu.org/software/wget/), or set up a connection using [JDBC](../interfaces/jdbc.md) or [ODBC](../interfaces/odbc.md) drivers. More information about software products that support ClickHouse is available [here](../interfaces/index.md).
## Credentials {#credentials}
| Parameter | Value |
|:--------------------|:----------------------------------------|
| HTTPS endpoint | `https://play-api.clickhouse.com:8443` |
| Native TCP endpoint | `play-api.clickhouse.com:9440` |
| User | `playground` |
| Password | `clickhouse` |
There are additional endpoints with specific ClickHouse releases to experiment with their differences (ports and user/password are the same as above):
- 20.3 LTS: `play-api-v20-3.clickhouse.com`
- 19.14 LTS: `play-api-v19-14.clickhouse.com`
!!! note "Note"
All these endpoints require a secure TLS connection.
| Parameter | Value |
|:--------------------|:-----------------------------------|
| HTTPS endpoint | `https://play.clickhouse.com:443/` |
| Native TCP endpoint | `play.clickhouse.com:9440` |
| User | `explorer` or `play` |
| Password | (empty) |
## Limitations {#limitations}
@ -37,23 +26,18 @@ The queries are executed as a read-only user. It implies some limitations:
- DDL queries are not allowed
- INSERT queries are not allowed
The following settings are also enforced:
- [max_result_bytes=10485760](../operations/settings/query-complexity/#max-result-bytes)
- [max_result_rows=2000](../operations/settings/query-complexity/#setting-max_result_rows)
- [result_overflow_mode=break](../operations/settings/query-complexity/#result-overflow-mode)
- [max_execution_time=60000](../operations/settings/query-complexity/#max-execution-time)
The service also has quotas on its usage.
## Examples {#examples}
HTTPS endpoint example with `curl`:
``` bash
curl "https://play-api.clickhouse.com:8443/?query=SELECT+'Play+ClickHouse\!';&user=playground&password=clickhouse&database=datasets"
curl "https://play.clickhouse.com/?user=explorer" --data-binary "SELECT 'Play ClickHouse'"
```
TCP endpoint example with [CLI](../interfaces/cli.md):
``` bash
clickhouse client --secure -h play-api.clickhouse.com --port 9440 -u playground --password clickhouse -q "SELECT 'Play ClickHouse\!'"
clickhouse client --secure --host play.clickhouse.com --user explorer
```

View File

@ -51,6 +51,7 @@ The supported formats are:
| [PrettySpace](#prettyspace) | ✗ | ✔ |
| [Protobuf](#protobuf) | ✔ | ✔ |
| [ProtobufSingle](#protobufsingle) | ✔ | ✔ |
| [ProtobufList](#protobuflist) | ✔ | ✔ |
| [Avro](#data-format-avro) | ✔ | ✔ |
| [AvroConfluent](#data-format-avro-confluent) | ✔ | ✗ |
| [Parquet](#data-format-parquet) | ✔ | ✔ |
@ -1230,7 +1231,38 @@ See also [how to read/write length-delimited protobuf messages in popular langua
## ProtobufSingle {#protobufsingle}
Same as [Protobuf](#protobuf) but for storing/parsing single Protobuf message without length delimiters.
Same as [Protobuf](#protobuf) but for storing/parsing a single Protobuf message without a length delimiter.
As a result, only a single table row can be written/read.
## ProtobufList {#protobuflist}
Similar to Protobuf, but rows are represented as a sequence of sub-messages contained in a message with the fixed name "Envelope".
Usage example:
``` sql
SELECT * FROM test.table FORMAT ProtobufList SETTINGS format_schema = 'schemafile:MessageType'
```
``` bash
cat protobuflist_messages.bin | clickhouse-client --query "INSERT INTO test.table FORMAT ProtobufList SETTINGS format_schema='schemafile:MessageType'"
```
where the file `schemafile.proto` looks like this:
``` protobuf
syntax = "proto3";
message Envelope {
message MessageType {
string name = 1;
string surname = 2;
uint32 birthDate = 3;
repeated string phoneNumbers = 4;
};
MessageType row = 1;
};
```
## Avro {#data-format-avro}
@ -1364,7 +1396,8 @@ The table below shows supported data types and how they match ClickHouse [data t
| `FLOAT`, `HALF_FLOAT` | [Float32](../sql-reference/data-types/float.md) | `FLOAT` |
| `DOUBLE` | [Float64](../sql-reference/data-types/float.md) | `DOUBLE` |
| `DATE32` | [Date](../sql-reference/data-types/date.md) | `UINT16` |
| `DATE64`, `TIMESTAMP` | [DateTime](../sql-reference/data-types/datetime.md) | `UINT32` |
| `DATE64` | [DateTime](../sql-reference/data-types/datetime.md) | `UINT32` |
| `TIMESTAMP` | [DateTime64](../sql-reference/data-types/datetime64.md) | `TIMESTAMP` |
| `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `BINARY` |
| — | [FixedString](../sql-reference/data-types/fixedstring.md) | `BINARY` |
| `DECIMAL` | [Decimal](../sql-reference/data-types/decimal.md) | `DECIMAL` |
@ -1421,7 +1454,8 @@ The table below shows supported data types and how they match ClickHouse [data t
| `FLOAT`, `HALF_FLOAT` | [Float32](../sql-reference/data-types/float.md) | `FLOAT32` |
| `DOUBLE` | [Float64](../sql-reference/data-types/float.md) | `FLOAT64` |
| `DATE32` | [Date](../sql-reference/data-types/date.md) | `UINT16` |
| `DATE64`, `TIMESTAMP` | [DateTime](../sql-reference/data-types/datetime.md) | `UINT32` |
| `DATE64` | [DateTime](../sql-reference/data-types/datetime.md) | `UINT32` |
| `TIMESTAMP` | [DateTime64](../sql-reference/data-types/datetime64.md) | `TIMESTAMP` |
| `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `BINARY` |
| `STRING`, `BINARY` | [FixedString](../sql-reference/data-types/fixedstring.md) | `BINARY` |
| `DECIMAL` | [Decimal](../sql-reference/data-types/decimal.md) | `DECIMAL` |
@ -1483,7 +1517,8 @@ The table below shows supported data types and how they match ClickHouse [data t
| `FLOAT`, `HALF_FLOAT` | [Float32](../sql-reference/data-types/float.md) | `FLOAT` |
| `DOUBLE` | [Float64](../sql-reference/data-types/float.md) | `DOUBLE` |
| `DATE32` | [Date](../sql-reference/data-types/date.md) | `DATE32` |
| `DATE64`, `TIMESTAMP` | [DateTime](../sql-reference/data-types/datetime.md) | `TIMESTAMP` |
| `DATE64` | [DateTime](../sql-reference/data-types/datetime.md) | `UINT32` |
| `TIMESTAMP` | [DateTime64](../sql-reference/data-types/datetime64.md) | `TIMESTAMP` |
| `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `BINARY` |
| `DECIMAL` | [Decimal](../sql-reference/data-types/decimal.md) | `DECIMAL` |
| `LIST` | [Array](../sql-reference/data-types/array.md) | `LIST` |

View File

@ -55,7 +55,7 @@ Internal coordination settings are located in `<keeper_server>.<coordination_set
- `auto_forwarding` — Allow to forward write requests from followers to the leader (default: true).
- `shutdown_timeout` — Wait to finish internal connections and shutdown (ms) (default: 5000).
- `startup_timeout` — If the server doesn't connect to other quorum participants in the specified timeout it will terminate (ms) (default: 30000).
- `four_letter_word_white_list` — White list of 4lw commands (default: "conf,cons,crst,envi,ruok,srst,srvr,stat,wchc,wchs,dirs,mntr,isro").
- `four_letter_word_allow_list` — Allow list of 4lw commands (default: "conf,cons,crst,envi,ruok,srst,srvr,stat,wchs,dirs,mntr,isro").
Quorum configuration is located in the `<keeper_server>.<raft_configuration>` section and contains a description of the servers.
@ -121,7 +121,7 @@ clickhouse keeper --config /etc/your_path_to_config/config.xml
ClickHouse Keeper also provides 4lw commands which are almost the same as in ZooKeeper. Each command is composed of four letters, such as `mntr`, `stat`, etc. There are some more interesting commands: `stat` gives some general information about the server and connected clients, while `srvr` and `cons` give extended details on the server and connections respectively.
The 4lw commands has a white list configuration `four_letter_word_white_list` which has default value "conf,cons,crst,envi,ruok,srst,srvr,stat,wchc,wchs,dirs,mntr,isro".
The 4lw commands have an allow list configuration `four_letter_word_allow_list` which has the default value "conf,cons,crst,envi,ruok,srst,srvr,stat,wchs,dirs,mntr,isro".
You can issue the commands to ClickHouse Keeper via telnet or nc, at the client port.
@ -201,7 +201,7 @@ Server stats reset.
```
server_id=1
tcp_port=2181
four_letter_word_white_list=*
four_letter_word_allow_list=*
log_storage_path=./coordination/logs
snapshot_storage_path=./coordination/snapshots
max_requests_batch_size=100

View File

@ -3290,6 +3290,19 @@ Possible values:
Default value: `16`.
## max_insert_delayed_streams_for_parallel_write {#max-insert-delayed-streams-for-parallel-write}
The maximum number of streams (columns) for which the final part flush is delayed.
It makes a difference only if the underlying storage supports parallel writes (i.e. S3); otherwise it will not give any benefit.
Possible values:
- Positive integer.
- 0 or 1 — Disabled.
Default value: `1000` for S3 and `0` otherwise.
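For example, a hedged sketch of overriding the setting for a session before a wide insert into an S3-backed table (table names are hypothetical):
``` sql
SET max_insert_delayed_streams_for_parallel_write = 5000;

INSERT INTO s3_backed_table SELECT * FROM source_table;
```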
## opentelemetry_start_trace_probability {#opentelemetry-start-trace-probability}
Sets the probability that ClickHouse can start a trace for executed queries (if no parent [trace context](https://www.w3.org/TR/trace-context/) is supplied).

View File

@ -225,15 +225,15 @@ This storage method works the same way as hashed and allows using date/time (arb
Example: The table contains discounts for each advertiser in the format:
``` text
+---------|-------------|-------------|------+
+---------------|---------------------|-------------------|--------+
| advertiser id | discount start date | discount end date | amount |
+===============+=====================+===================+========+
| 123 | 2015-01-01 | 2015-01-15 | 0.15 |
+---------|-------------|-------------|------+
+---------------|---------------------|-------------------|--------+
| 123 | 2015-01-16 | 2015-01-31 | 0.25 |
+---------|-------------|-------------|------+
+---------------|---------------------|-------------------|--------+
| 456 | 2015-01-01 | 2015-01-15 | 0.05 |
+---------|-------------|-------------|------+
+---------------|---------------------|-------------------|--------+
```
To use a sample for date ranges, define the `range_min` and `range_max` elements in the [structure](../../../sql-reference/dictionaries/external-dictionaries/external-dicts-dict-structure.md). These elements must contain elements `name` and `type` (if `type` is not specified, the default type will be used - Date). `type` can be any numeric type (Date / DateTime / UInt64 / Int32 / others).
@ -272,10 +272,10 @@ LAYOUT(RANGE_HASHED())
RANGE(MIN first MAX last)
```
To work with these dictionaries, you need to pass an additional argument to the `dictGetT` function, for which a range is selected:
To work with these dictionaries, you need to pass an additional argument to the `dictGet*` function, for which a range is selected:
``` sql
dictGetT('dict_name', 'attr_name', id, date)
dictGet*('dict_name', 'attr_name', id, date)
```
This function returns the value for the specified `id`s and the date range that includes the passed date.
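For example, a hedged sketch of such a call for the discounts table above (the dictionary name is hypothetical):
``` sql
-- Returns the discount amount for advertiser 123 on a date that falls inside one of its ranges.
SELECT dictGet('discounts_dict', 'amount', toUInt64(123), toDate('2015-01-10')) AS discount;
```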
@ -479,17 +479,17 @@ This type of storage is for mapping network prefixes (IP addresses) to metadata
Example: The table contains network prefixes and their corresponding AS number and country code:
``` text
+-----------|-----|------+
+-----------------|-------|--------+
| prefix | asn | cca2 |
+=================+=======+========+
| 202.79.32.0/20 | 17501 | NP |
+-----------|-----|------+
+-----------------|-------|--------+
| 2620:0:870::/48 | 3856 | US |
+-----------|-----|------+
+-----------------|-------|--------+
| 2a02:6b8:1::/48 | 13238 | RU |
+-----------|-----|------+
+-----------------|-------|--------+
| 2001:db8::/32 | 65536 | ZZ |
+-----------|-----|------+
+-----------------|-------|--------+
```
When using this type of layout, the structure must have a composite key.
@ -538,10 +538,10 @@ PRIMARY KEY prefix
The key must have only one String type attribute that contains an allowed IP prefix. Other types are not supported yet.
For queries, you must use the same functions (`dictGetT` with a tuple) as for dictionaries with composite keys:
For queries, you must use the same functions (`dictGet*` with a tuple) as for dictionaries with composite keys:
``` sql
dictGetT('dict_name', 'attr_name', tuple(ip))
dictGet*('dict_name', 'attr_name', tuple(ip))
```
The function takes either `UInt32` for IPv4, or `FixedString(16)` for IPv6.
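For example, a hedged sketch of an IPv4 lookup (the dictionary name is hypothetical):
``` sql
-- IPv4 addresses are passed as UInt32 wrapped in a tuple.
SELECT dictGet('ip_prefix_dict', 'asn', tuple(IPv4StringToNum('202.79.32.10'))) AS asn;
```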

View File

@ -1392,12 +1392,24 @@ Returns the first element in the `arr1` array for which `func` returns something
Note that `arrayFirst` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You must pass a lambda function to it as the first argument, and it can't be omitted.
## arrayFirstOrNull(func, arr1, …) {#array-first-or-null}
Returns the first element in the `arr1` array for which `func` returns something other than 0. If there is no such element, it returns NULL.
Note that `arrayFirstOrNull` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You must pass a lambda function to it as the first argument, and it can't be omitted.
## arrayLast(func, arr1, …) {#array-last}
Returns the last element in the `arr1` array for which `func` returns something other than 0.
Note that `arrayLast` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You must pass a lambda function to it as the first argument, and it can't be omitted.
## arrayLastOrNull(func, arr1, …) {#array-last-or-null}
Returns the last element in the `arr1` array for which `func` returns something other than 0. If there is no such element, it returns NULL.
Note that `arrayLastOrNull` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You must pass a lambda function to it as the first argument, and it can't be omitted.
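A small illustrative example of these functions, assuming the behaviour described above (expected results in comments):
``` sql
SELECT
    arrayFirstOrNull(x -> x > 2, [1, 2, 3, 4]) AS first_gt_2, -- 3
    arrayLast(x -> x > 2, [1, 2, 3, 4]) AS last_gt_2,         -- 4
    arrayLastOrNull(x -> x > 9, [1, 2, 3, 4]) AS no_match;    -- NULL
```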
## arrayFirstIndex(func, arr1, …) {#array-first-index}
Returns the index of the first element in the `arr1` array for which `func` returns something other than 0.

View File

@ -1026,4 +1026,185 @@ Result:
│ 41162 │
└─────────────┘
```
## h3PointDistM {#h3pointdistm}
Returns the "great circle" or "haversine" distance between pairs of GeoCoord points (latitude/longitude) pairs in meters.
**Syntax**
``` sql
h3PointDistM(lat1, lon1, lat2, lon2)
```
**Arguments**
- `lat1`, `lon1` — Latitude and Longitude of point1 in degrees. Type: [Float64](../../../sql-reference/data-types/float.md).
- `lat2`, `lon2` — Latitude and Longitude of point2 in degrees. Type: [Float64](../../../sql-reference/data-types/float.md).
**Returned values**
- Haversine or great circle distance in meters.
Type: [Float64](../../../sql-reference/data-types/float.md).
**Example**
Query:
``` sql
select h3PointDistM(-10.0 ,0.0, 10.0, 0.0) as h3PointDistM;
```
Result:
``` text
┌──────h3PointDistM─┐
│ 2223901.039504589 │
└───────────────────┘
```
## h3PointDistKm {#h3pointdistkm}
Returns the "great circle" or "haversine" distance between pairs of GeoCoord points (latitude/longitude) pairs in kilometers.
**Syntax**
``` sql
h3PointDistKm(lat1, lon1, lat2, lon2)
```
**Arguments**
- `lat1`, `lon1` — Latitude and Longitude of point1 in degrees. Type: [Float64](../../../sql-reference/data-types/float.md).
- `lat2`, `lon2` — Latitude and Longitude of point2 in degrees. Type: [Float64](../../../sql-reference/data-types/float.md).
**Returned values**
- Haversine or great circle distance in kilometers.
Type: [Float64](../../../sql-reference/data-types/float.md).
**Example**
Query:
``` sql
select h3PointDistKm(-10.0 ,0.0, 10.0, 0.0) as h3PointDistKm;
```
Result:
``` text
┌─────h3PointDistKm─┐
│ 2223.901039504589 │
└───────────────────┘
```
## h3PointDistRads {#h3pointdistrads}
Returns the "great circle" or "haversine" distance between pairs of GeoCoord points (latitude/longitude) pairs in radians.
**Syntax**
``` sql
h3PointDistRads(lat1, lon1, lat2, lon2)
```
**Arguments**
- `lat1`, `lon1` — Latitude and Longitude of point1 in degrees. Type: [Float64](../../../sql-reference/data-types/float.md).
- `lat2`, `lon2` — Latitude and Longitude of point2 in degrees. Type: [Float64](../../../sql-reference/data-types/float.md).
**Returned values**
- Haversine or great circle distance in radians.
Type: [Float64](../../../sql-reference/data-types/float.md).
**Example**
Query:
``` sql
select h3PointDistRads(-10.0 ,0.0, 10.0, 0.0) as h3PointDistRads;
```
Result:
``` text
┌────h3PointDistRads─┐
│ 0.3490658503988659 │
└────────────────────┘
```
## h3GetRes0Indexes {#h3getres0indexes}
Returns an array of all the resolution 0 H3 indexes.
**Syntax**
``` sql
h3GetRes0Indexes()
```
**Returned values**
- Array of all the resolution 0 H3 indexes.
Type: [Array](../../../sql-reference/data-types/array.md)([UInt64](../../../sql-reference/data-types/int-uint.md)).
**Example**
Query:
``` sql
SELECT h3GetRes0Indexes() AS indexes;
```
Result:
``` text
┌─indexes─────────────────────────────────────┐
│ [576495936675512319,576531121047601151,....]│
└─────────────────────────────────────────────┘
```
## h3GetPentagonIndexes {#h3getpentagonindexes}
Returns all the pentagon H3 indexes at the specified resolution.
**Syntax**
``` sql
h3GetPentagonIndexes(resolution)
```
**Parameter**
- `resolution` — Index resolution. Range: `[0, 15]`. Type: [UInt8](../../../sql-reference/data-types/int-uint.md).
**Returned value**
- Array of all pentagon H3 indexes.
Type: [Array](../../../sql-reference/data-types/array.md)([UInt64](../../../sql-reference/data-types/int-uint.md)).
**Example**
Query:
``` sql
SELECT h3GetPentagonIndexes(3) AS indexes;
```
Result:
``` text
┌─indexes────────────────────────────────────────────────────────┐
│ [590112357393367039,590464201114255359,590816044835143679,...] │
└────────────────────────────────────────────────────────────────┘
```
[Original article](https://clickhouse.com/docs/en/sql-reference/functions/geo/h3) <!--hide-->

View File

@ -2,6 +2,49 @@
toc_priority: 76
toc_title: Security Changelog
---
## Fixed in ClickHouse 21.10.2.15, 2021-10-18 {#fixed-in-clickhouse-release-21-10-2-215-2021-10-18}
### CVE-2021-43304 {#cve-2021-43304}
Heap buffer overflow in ClickHouse's LZ4 compression codec when parsing a malicious query. There is no verification that the copy operations in the `LZ4::decompressImpl` loop, and especially the arbitrary copy operation `wildCopy<copy_amount>(op, ip, copy_end)`, don't exceed the destination buffer's limits.
Credits: JFrog Security Research Team
### CVE-2021-43305 {#cve-2021-43305}
Heap buffer overflow in ClickHouse's LZ4 compression codec when parsing a malicious query. There is no verification that the copy operations in the `LZ4::decompressImpl` loop, and especially the arbitrary copy operation `wildCopy<copy_amount>(op, ip, copy_end)`, don't exceed the destination buffer's limits. This issue is very similar to CVE-2021-43304, but the vulnerable copy operation is in a different `wildCopy` call.
Credits: JFrog Security Research Team
### CVE-2021-42387 {#cve-2021-42387}
Heap out-of-bounds read in ClickHouse's LZ4 compression codec when parsing a malicious query. As part of the `LZ4::decompressImpl()` loop, a 16-bit unsigned user-supplied value ('offset') is read from the compressed data. The offset is later used in the length of a copy operation, without checking the upper bounds of the source of the copy operation.
Credits: JFrog Security Research Team
### CVE-2021-42388 {#cve-2021-42388}
Heap out-of-bounds read in ClickHouse's LZ4 compression codec when parsing a malicious query. As part of the `LZ4::decompressImpl()` loop, a 16-bit unsigned user-supplied value ('offset') is read from the compressed data. The offset is later used in the length of a copy operation, without checking the lower bounds of the source of the copy operation.
Credits: JFrog Security Research Team
### CVE-2021-42389 {#cve-2021-42389}
Divide-by-zero in ClickHouse's Delta compression codec when parsing a malicious query. The first byte of the compressed buffer is used in a modulo operation without being checked for 0.
Credits: JFrog Security Research Team
### CVE-2021-42390 {#cve-2021-42390}
Divide-by-zero in ClickHouse's DeltaDouble compression codec when parsing a malicious query. The first byte of the compressed buffer is used in a modulo operation without being checked for 0.
Credits: JFrog Security Research Team
### CVE-2021-42391 {#cve-2021-42391}
Divide-by-zero in ClickHouse's Gorilla compression codec when parsing a malicious query. The first byte of the compressed buffer is used in a modulo operation without being checked for 0.
Credits: JFrog Security Research Team
## Fixed in ClickHouse 21.4.3.21, 2021-04-12 {#fixed-in-clickhouse-release-21-4-3-21-2021-04-12}

View File

@ -5,58 +5,39 @@ toc_title: Playground
# ClickHouse Playground {#clickhouse-playground}
!!! warning "Warning"
This service is deprecated and will be replaced in the foreseeable future.
[ClickHouse Playground](https://play.clickhouse.com/play?user=play) allows people to experiment with ClickHouse by running queries instantly, without setting up their server or cluster.
Several example datasets are available in Playground.
[ClickHouse Playground](https://play.clickhouse.com) allows you to try ClickHouse by running queries instantly, without setting up a server or cluster.
Several example datasets are available in the Playground, as well as sample queries that demonstrate ClickHouse features. You can also experiment with ClickHouse LTS releases.
You can make queries to Playground using any HTTP client, for example [curl](https://curl.haxx.se) or [wget](https://www.gnu.org/software/wget/), or set up a connection using [JDBC](../interfaces/jdbc.md) or [ODBC](../interfaces/odbc.md) drivers. More information about software products that support ClickHouse is available [here](../interfaces/index.md).
You can make queries to the Playground using any HTTP client, for example [curl](https://curl.haxx.se) or [wget](https://www.gnu.org/software/wget/), or set up a connection using the [JDBC](../interfaces/jdbc.md) or [ODBC](../interfaces/odbc.md) drivers.
More information about software products that support ClickHouse is available [here](../interfaces/index.md).
## Credentials {#credentials}
## Credentials {#credentials}
| Parameter | Value |
|:--------------------|:-----------------------------------|
| HTTPS endpoint | `https://play.clickhouse.com:443/` |
| Native TCP endpoint | `play.clickhouse.com:9440` |
| User | `explorer` or `play` |
| Password | (empty) |
| Parameter | Value |
|:--------------------|:----------------------------------------|
| HTTPS endpoint | `https://play-api.clickhouse.com:8443` |
| Native TCP endpoint | `play-api.clickhouse.com:9440` |
| User | `playground` |
| Password | `clickhouse` |
## Limitations {#limitations}
The queries are executed as a read-only user. It implies some limitations:
There are additional endpoints with specific ClickHouse releases to experiment with their differences (ports and user/password are the same as above):
- DDL queries are not allowed
- INSERT queries are not allowed
- 20.3 LTS: `play-api-v20-3.clickhouse.com`
- 19.14 LTS: `play-api-v19-14.clickhouse.com`
The service also has quotas on its usage.
!!! note "Note"
    All these endpoints require a secure TLS connection.
## Examples {#examples}
## Limitations {#limitations}
The queries are executed as a read-only user. This implies some limitations:
- DDL queries are not allowed
- INSERT queries are not allowed
The following settings are also enforced:
- [max_result_bytes=10485760](../operations/settings/query_complexity/#max-result-bytes)
- [max_result_rows=2000](../operations/settings/query_complexity/#setting-max_result_rows)
- [result_overflow_mode=break](../operations/settings/query_complexity/#result-overflow-mode)
- [max_execution_time=60000](../operations/settings/query_complexity/#max-execution-time)
## Examples {#examples}
HTTPS endpoint example with `curl`:
HTTPS endpoint example with `curl`:
``` bash
curl "https://play-api.clickhouse.com:8443/?query=SELECT+'Play+ClickHouse\!';&user=playground&password=clickhouse&database=datasets"
curl "https://play.clickhouse.com/?user=explorer" --data-binary "SELECT 'Play ClickHouse'"
```
TCP endpoint example with [CLI](../interfaces/cli.md):
TCP endpoint example with [CLI](../interfaces/cli.md):
``` bash
clickhouse client --secure -h play-api.clickhouse.com --port 9440 -u playground --password clickhouse -q "SELECT 'Play ClickHouse\!'"
clickhouse client --secure --host play.clickhouse.com --user explorer
```

View File

@ -5,53 +5,39 @@ toc_title: Playground
# ClickHouse Playground {#clickhouse-playground}
!!! warning "Warning"
This service is deprecated and will be replaced in the foreseeable future.
[ClickHouse Playground](https://play.clickhouse.com/play?user=play) allows people to experiment with ClickHouse by running queries instantly, without setting up their server or cluster.
Several example datasets are available in Playground.
[ClickHouse Playground](https://play.clickhouse.com) lets users experiment with ClickHouse by running queries instantly, without setting up their own server or cluster.
Several test datasets are available in Playground, as well as sample queries that show ClickHouse's capabilities. You can also choose the ClickHouse LTS release you want to test.
You can make queries to Playground using any HTTP client, for example [curl](https://curl.haxx.se) or [wget](https://www.gnu.org/software/wget/), or set up a connection using [JDBC](../interfaces/jdbc.md) or [ODBC](../interfaces/odbc.md) drivers. More information about software products that support ClickHouse is available [here](../interfaces/index.md).
You can send queries to Playground using any HTTP client, for example [curl](https://curl.haxx.se) or [wget](https://www.gnu.org/software/wget/), or set up a connection using the [JDBC](../interfaces/jdbc.md) or [ODBC](../interfaces/odbc.md) drivers. More information about software products that support ClickHouse is available [here](../interfaces/index.md).
## Credentials {#credentials}
## Access credentials {#credentials}
| Parameter | Value |
|:--------------------|:-----------------------------------|
| HTTPS endpoint | `https://play.clickhouse.com:443/` |
| Native TCP endpoint | `play.clickhouse.com:9440` |
| User | `explorer` or `play` |
| Password | (empty) |
| Parameter           | Value                                    |
|:--------------------|:-----------------------------------------|
| HTTPS endpoint      | `https://play-api.clickhouse.com:8443`   |
| TCP endpoint        | `play-api.clickhouse.com:9440`           |
| User                | `playground`                             |
| Password            | `clickhouse`                             |
## Limitations {#limitations}
You can also connect to ClickHouse of specific releases to test the differences between them (ports and user/password remain the same):
The queries are executed as a read-only user. It implies some limitations:
- 20.3 LTS: `play-api-v20-3.clickhouse.com`
- 19.14 LTS: `play-api-v19-14.clickhouse.com`
- DDL queries are not allowed
- INSERT queries are not allowed
!!! note "Note"
    All these endpoints require a secure TLS connection.
The service also has quotas on its usage.
## Limitations {#limitations}
## Examples {#examples}
Queries are executed as a user with `readonly` rights, which implies the following limitations:
- DDL queries are not allowed
- INSERT queries are not allowed
The following options are also set:
- [max_result_bytes=10485760](../operations/settings/query-complexity.md#max-result-bytes)
- [max_result_rows=2000](../operations/settings/query-complexity.md#setting-max_result_rows)
- [result_overflow_mode=break](../operations/settings/query-complexity.md#result-overflow-mode)
- [max_execution_time=60000](../operations/settings/query-complexity.md#max-execution-time)
## Examples {#examples}
Example of the HTTPS endpoint with `curl`:
HTTPS endpoint example with `curl`:
``` bash
curl "https://play-api.clickhouse.com:8443/?query=SELECT+'Play+ClickHouse\!';&user=playground&password=clickhouse&database=datasets"
curl "https://play.clickhouse.com/?user=explorer" --data-binary "SELECT 'Play ClickHouse'"
```
Example of the TCP endpoint with the [CLI](../interfaces/cli.md):
TCP endpoint example with [CLI](../interfaces/cli.md):
``` bash
clickhouse client --secure -h play-api.clickhouse.com --port 9440 -u playground --password clickhouse -q "SELECT 'Play ClickHouse\!'"
clickhouse client --secure --host play.clickhouse.com --user explorer
```

View File

@ -54,7 +54,7 @@ ClickHouse Keeper может использоваться как равноце
- `auto_forwarding` — allow forwarding write requests from followers to the leader (default: true).
- `shutdown_timeout` — wait time for finishing internal connections and shutting down, in milliseconds (default: 5000).
- `startup_timeout` — time after which the server shuts down if it does not connect to the other quorum participants, in milliseconds (default: 30000).
- `four_letter_word_white_list` — list of allowed 4-letter commands (default: "conf,cons,crst,envi,ruok,srst,srvr,stat,wchc,wchs,dirs,mntr,isro").
- `four_letter_word_allow_list` — list of allowed 4-letter commands (default: "conf,cons,crst,envi,ruok,srst,srvr,stat,wchs,dirs,mntr,isro").
The quorum configuration is located in `<keeper_server>.<raft_configuration>` and contains a description of the servers.
@ -114,7 +114,7 @@ clickhouse-keeper --config /etc/your_path_to_config/config.xml --daemon
ClickHouse Keeper also supports 4-letter commands, almost the same as in Zookeeper. Each command consists of 4 characters, for example `mntr`, `stat`, etc. A few interesting commands: `stat` gives general information about the server and connected clients, while `srvr` and `cons` give extended details about the server and the connections respectively.
The 4-letter commands have a parameter for configuring the allowed list, `four_letter_word_white_list`, whose default value is "conf,cons,crst,envi,ruok,srst,srvr,stat,wchc,wchs,dirs,mntr,isro".
The 4-letter commands have a parameter for configuring the allowed list, `four_letter_word_allow_list`, whose default value is "conf,cons,crst,envi,ruok,srst,srvr,stat,wchs,dirs,mntr,isro".
You can send the commands to ClickHouse Keeper via telnet or nc, on the client port.
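For example, a minimal sketch of pulling the monitoring stats with `nc` (this assumes Keeper listens on client port 2181, as in the sample dump further below; adjust it to your `tcp_port` setting):
``` bash
# Send the `mntr` four-letter command to a locally running ClickHouse Keeper.
echo mntr | nc localhost 2181
```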
@ -194,7 +194,7 @@ Server stats reset.
```
server_id=1
tcp_port=2181
four_letter_word_white_list=*
four_letter_word_allow_list=*
log_storage_path=./coordination/logs
snapshot_storage_path=./coordination/snapshots
max_requests_batch_size=100

View File

@ -3,62 +3,41 @@ toc_priority: 14
toc_title: Experience Platform
---
# ClickHouse Experience Platform {#clickhouse-playground}
# ClickHouse Playground {#clickhouse-playground}
!!! warning "Warning"
    This service is deprecated and will be replaced in the foreseeable future.
[ClickHouse Playground](https://play.clickhouse.com/play?user=play) allows people to experiment with ClickHouse by running queries instantly, without setting up their server or cluster.
Several example datasets are available in Playground.
[ClickHouse Experience Platform](https://play.clickhouse.com?file=welcome) lets people try ClickHouse by running queries instantly, without setting up a server or cluster.
The Experience Platform provides several example datasets, as well as sample queries that show off ClickHouse features. Several ClickHouse LTS releases are also available to try.
You can query the ClickHouse Experience Platform with any HTTP client, for example [curl](https://curl.haxx.se) or [wget](https://www.gnu.org/software/wget/), or connect using the [JDBC](../interfaces/jdbc.md) or [ODBC](../interfaces/odbc.md) drivers. For more information about software products that support ClickHouse, see [here](../interfaces/index.md).
You can make queries to Playground using any HTTP client, for example [curl](https://curl.haxx.se) or [wget](https://www.gnu.org/software/wget/), or set up a connection using [JDBC](../interfaces/jdbc.md) or [ODBC](../interfaces/odbc.md) drivers. More information about software products that support ClickHouse is available [here](../interfaces/index.md).
## Credentials {#credentials}
| Parameter           | Value                                    |
|:--------------------|:-----------------------------------------|
| HTTPS endpoint      | `https://play-api.clickhouse.com:8443`   |
| TCP endpoint        | `play-api.clickhouse.com:9440`           |
| User                | `playground`                             |
| Password            | `clickhouse`                             |
| Parameter | Value |
|:--------------------|:-----------------------------------|
| HTTPS endpoint | `https://play.clickhouse.com:443/` |
| Native TCP endpoint | `play.clickhouse.com:9440` |
| User | `explorer` or `play` |
| Password | (empty) |
There are also additional endpoints with specific ClickHouse versions, so you can experiment with the differences between them (ports and user/password are the same as above):
## Limitations {#limitations}
- 20.3 LTS: `play-api-v20-3.clickhouse.com`
- 19.14 LTS: `play-api-v19-14.clickhouse.com`
The queries are executed as a read-only user. It implies some limitations:
!!! note "Note"
    All these endpoints require a secure TLS connection.
- DDL queries are not allowed
- INSERT queries are not allowed
## Query limitations {#limitations}
The service also has quotas on its usage.
Queries are executed as a read-only user. This implies some limitations:
## Examples {#examples}
- DDL queries are not allowed
- INSERT queries are not allowed
The following settings are also enforced:
- [max_result_bytes=10485760](../operations/settings/query-complexity/#max-result-bytes)
- [max_result_rows=2000](../operations/settings/query-complexity/#setting-max_result_rows)
- [result_overflow_mode=break](../operations/settings/query-complexity/#result-overflow-mode)
- [max_execution_time=60000](../operations/settings/query-complexity/#max-execution-time)
The ClickHouse experience is also available as a
[Managed Service for ClickHouse](https://cloud.yandex.com/services/managed-clickhouse)
instance hosted in [Yandex Cloud](https://cloud.yandex.com/).
More information about [cloud providers](../commercial/cloud.md).
## Examples {#examples}
Example of connecting to the HTTPS service with `curl`:
HTTPS endpoint example with `curl`:
``` bash
curl "https://play-api.clickhouse.com:8443/?query=SELECT+'Play+ClickHouse\!';&user=playground&password=clickhouse&database=datasets"
curl "https://play.clickhouse.com/?user=explorer" --data-binary "SELECT 'Play ClickHouse'"
```
Example of a TCP connection with the [CLI](../interfaces/cli.md):
TCP endpoint example with [CLI](../interfaces/cli.md):
``` bash
clickhouse client --secure -h play-api.clickhouse.com --port 9440 -u playground --password clickhouse -q "SELECT 'Play ClickHouse\!'"
clickhouse client --secure --host play.clickhouse.com --user explorer
```

View File

@ -1240,7 +1240,8 @@ SELECT * FROM topic1_stream;
| `FLOAT`, `HALF_FLOAT` | [Float32](../sql-reference/data-types/float.md) | `FLOAT` |
| `DOUBLE` | [Float64](../sql-reference/data-types/float.md) | `DOUBLE` |
| `DATE32` | [Date](../sql-reference/data-types/date.md) | `UINT16` |
| `DATE64`, `TIMESTAMP` | [DateTime](../sql-reference/data-types/datetime.md) | `UINT32` |
| `DATE64` | [DateTime](../sql-reference/data-types/datetime.md) | `UINT32` |
| `TIMESTAMP` | [DateTime64](../sql-reference/data-types/datetime64.md) | `TIMESTAMP` |
| `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `STRING` |
| — | [FixedString](../sql-reference/data-types/fixedstring.md) | `STRING` |
| `DECIMAL` | [Decimal](../sql-reference/data-types/decimal.md) | `DECIMAL` |
@ -1295,7 +1296,8 @@ $ clickhouse-client --query="SELECT * FROM {some_table} FORMAT Parquet" > {some_
| `FLOAT`, `HALF_FLOAT` | [Float32](../sql-reference/data-types/float.md) | `FLOAT` |
| `DOUBLE` | [Float64](../sql-reference/data-types/float.md) | `DOUBLE` |
| `DATE32` | [Date](../sql-reference/data-types/date.md) | `DATE32` |
| `DATE64`, `TIMESTAMP` | [DateTime](../sql-reference/data-types/datetime.md) | `TIMESTAMP` |
| `DATE64` | [DateTime](../sql-reference/data-types/datetime.md) | `UINT32` |
| `TIMESTAMP` | [DateTime64](../sql-reference/data-types/datetime64.md) | `TIMESTAMP` |
| `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `BINARY` |
| `DECIMAL` | [Decimal](../sql-reference/data-types/decimal.md) | `DECIMAL` |
| `-` | [Array](../sql-reference/data-types/array.md) | `LIST` |
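As a quick illustration of the remapped output types above, a sketch that exports a `DateTime64` column to Parquet; on a server that includes this change, the column should be written as Parquet `TIMESTAMP` rather than `UINT32`:
``` bash
clickhouse-client --query "SELECT toDateTime64(now(), 3) AS ts FORMAT Parquet" > ts.parquet
```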

View File

@ -1,10 +1,5 @@
---
machine_translated: true
machine_translated_rev: 5decc73b5dc60054f19087d3690c4eb99446a6c3
---
# system.numbers_mt {#system-numbers-mt}
# System. numbers_mt {#system-numbers-mt}
The same as [System. Numbers](../../operations/system-tables/numbers.md), but reads are parallel. The numbers can be returned in any order.
Similar to [system.numbers](../../operations/system-tables/numbers.md), but reads are parallel. The numbers can be returned in any order.
Used for testing.

View File

@ -31,7 +31,7 @@
- For the dict_name hierarchical dictionary, finds out whether the child_id key is located inside ancestor_id or matches ancestor_id. Returns UInt8.
## Dictatorship {#dictgethierarchy}
## dictGetHierarchy {#dictgethierarchy}
`dictGetHierarchy('dict_name', id)`

View File

@ -2,6 +2,7 @@
#include <Columns/ColumnObject.h>
#include <Columns/ColumnsNumber.h>
#include <Columns/ColumnArray.h>
#include <Columns/ColumnSparse.h>
#include <DataTypes/ObjectUtils.h>
#include <DataTypes/getLeastSupertype.h>
#include <DataTypes/DataTypeNothing.h>
@ -252,9 +253,32 @@ void ColumnObject::Subcolumn::insert(Field field)
insert(std::move(field), std::move(info));
}
void ColumnObject::Subcolumn::addNewColumnPart(DataTypePtr type)
{
auto serialization = type->getSerialization(ISerialization::Kind::SPARSE);
data.push_back(type->createColumn(*serialization));
least_common_type = LeastCommonType{std::move(type)};
}
static bool isConversionRequiredBetweenIntegers(const IDataType & lhs, const IDataType & rhs)
{
/// If both types are signed/unsigned integers and the size of the left field type
/// is not greater than the size of the right type, we don't need to convert the field,
/// because all integer fields are stored in Int64/UInt64.
WhichDataType which_lhs(lhs);
WhichDataType which_rhs(rhs);
bool is_native_int = which_lhs.isNativeInt() && which_rhs.isNativeInt();
bool is_native_uint = which_lhs.isNativeUInt() && which_rhs.isNativeUInt();
return (is_native_int || is_native_uint)
&& lhs.getSizeOfValueInMemory() <= rhs.getSizeOfValueInMemory();
}
void ColumnObject::Subcolumn::insert(Field field, FieldInfo info)
{
auto base_type = info.scalar_type;
auto base_type = std::move(info.scalar_type);
if (isNothing(base_type) && info.num_dimensions == 0)
{
@ -262,10 +286,10 @@ void ColumnObject::Subcolumn::insert(Field field, FieldInfo info)
return;
}
auto column_dim = getNumberOfDimensions(*least_common_type);
auto column_dim = least_common_type.getNumberOfDimensions();
auto value_dim = info.num_dimensions;
if (isNothing(least_common_type))
if (isNothing(least_common_type.get()))
column_dim = value_dim;
if (field.isNull())
@ -283,27 +307,26 @@ void ColumnObject::Subcolumn::insert(Field field, FieldInfo info)
if (!is_nullable && info.have_nulls)
field = applyVisitor(FieldVisitorReplaceNull(base_type->getDefault(), value_dim), std::move(field));
auto value_type = createArrayOfType(base_type, value_dim);
bool type_changed = false;
const auto & least_common_base_type = least_common_type.getBase();
if (data.empty())
{
data.push_back(value_type->createColumn());
least_common_type = value_type;
addNewColumnPart(createArrayOfType(std::move(base_type), value_dim));
}
else if (!least_common_type->equals(*value_type))
else if (!least_common_base_type->equals(*base_type) && !isNothing(base_type))
{
value_type = getLeastSupertype(DataTypes{value_type, least_common_type}, true);
type_changed = true;
if (!least_common_type->equals(*value_type))
if (!isConversionRequiredBetweenIntegers(*base_type, *least_common_base_type))
{
data.push_back(value_type->createColumn());
least_common_type = value_type;
base_type = getLeastSupertype(DataTypes{std::move(base_type), least_common_base_type}, true);
type_changed = true;
if (!least_common_base_type->equals(*base_type))
addNewColumnPart(createArrayOfType(std::move(base_type), value_dim));
}
}
if (type_changed || info.need_convert)
field = convertFieldToTypeOrThrow(field, *value_type);
field = convertFieldToTypeOrThrow(field, *least_common_type.get());
data.back()->insert(field);
}
@ -313,39 +336,47 @@ void ColumnObject::Subcolumn::insertRangeFrom(const Subcolumn & src, size_t star
assert(src.isFinalized());
const auto & src_column = src.data.back();
const auto & src_type = src.least_common_type;
const auto & src_type = src.least_common_type.get();
if (data.empty())
{
least_common_type = src_type;
data.push_back(src_type->createColumn());
addNewColumnPart(src.least_common_type.get());
data.back()->insertRangeFrom(*src_column, start, length);
}
else if (least_common_type->equals(*src_type))
else if (least_common_type.get()->equals(*src_type))
{
data.back()->insertRangeFrom(*src_column, start, length);
}
else
{
auto new_least_common_type = getLeastSupertype(DataTypes{least_common_type, src_type}, true);
auto new_least_common_type = getLeastSupertype(DataTypes{least_common_type.get(), src_type}, true);
auto casted_column = castColumn({src_column, src_type, ""}, new_least_common_type);
if (!least_common_type->equals(*new_least_common_type))
{
least_common_type = new_least_common_type;
data.push_back(least_common_type->createColumn());
}
if (!least_common_type.get()->equals(*new_least_common_type))
addNewColumnPart(std::move(new_least_common_type));
data.back()->insertRangeFrom(*casted_column, start, length);
}
}
bool ColumnObject::Subcolumn::isFinalized() const
{
return data.empty() ||
(data.size() == 1 && !data[0]->isSparse() && num_of_defaults_in_prefix == 0);
}
void ColumnObject::Subcolumn::finalize()
{
if (isFinalized() || data.empty())
if (isFinalized())
return;
const auto & to_type = least_common_type;
if (data.size() == 1 && num_of_defaults_in_prefix == 0)
{
data[0] = data[0]->convertToFullColumnIfSparse();
return;
}
const auto & to_type = least_common_type.get();
auto result_column = to_type->createColumn();
if (num_of_defaults_in_prefix)
@ -353,6 +384,7 @@ void ColumnObject::Subcolumn::finalize()
for (auto & part : data)
{
part = part->convertToFullColumnIfSparse();
auto from_type = getDataTypeByColumn(*part);
size_t part_size = part->size();
@ -446,7 +478,7 @@ ColumnObject::Subcolumn ColumnObject::Subcolumn::recreateWithDefaultValues(const
scalar_type = makeNullable(scalar_type);
Subcolumn new_subcolumn;
new_subcolumn.least_common_type = createArrayOfType(scalar_type, field_info.num_dimensions);
new_subcolumn.least_common_type = LeastCommonType{createArrayOfType(scalar_type, field_info.num_dimensions)};
new_subcolumn.is_nullable = is_nullable;
new_subcolumn.num_of_defaults_in_prefix = num_of_defaults_in_prefix;
new_subcolumn.data.reserve(data.size());
@ -476,6 +508,13 @@ const ColumnPtr & ColumnObject::Subcolumn::getFinalizedColumnPtr() const
return data[0];
}
ColumnObject::Subcolumn::LeastCommonType::LeastCommonType(DataTypePtr type_)
: type(std::move(type_))
, base_type(getBaseTypeOfArray(type))
, num_dimensions(DB::getNumberOfDimensions(*type))
{
}
ColumnObject::ColumnObject(bool is_nullable_)
: is_nullable(is_nullable_)
, num_rows(0)

View File

@ -66,8 +66,8 @@ public:
size_t byteSize() const;
size_t allocatedBytes() const;
bool isFinalized() const { return data.size() == 1 && num_of_defaults_in_prefix == 0; }
const DataTypePtr & getLeastCommonType() const { return least_common_type; }
bool isFinalized() const;
const DataTypePtr & getLeastCommonType() const { return least_common_type.get(); }
/// Checks the consistency of column's parts stored in @data.
void checkTypes() const;
@ -102,8 +102,26 @@ public:
friend class ColumnObject;
private:
class LeastCommonType
{
public:
LeastCommonType() = default;
explicit LeastCommonType(DataTypePtr type_);
const DataTypePtr & get() const { return type; }
const DataTypePtr & getBase() const { return base_type; }
size_t getNumberOfDimensions() const { return num_dimensions; }
private:
DataTypePtr type;
DataTypePtr base_type;
size_t num_dimensions = 0;
};
void addNewColumnPart(DataTypePtr type);
/// Current least common type of all values inserted to this subcolumn.
DataTypePtr least_common_type;
LeastCommonType least_common_type;
/// If true then the common type of the subcolumn is Nullable
/// and default values are NULLs.

View File

@ -194,7 +194,7 @@ void FileSegment::write(const char * from, size_t size)
{
std::lock_guard segment_lock(mutex);
LOG_ERROR(log, "Failed to write to cache. File segment info: {}", getInfoForLog());
LOG_ERROR(log, "Failed to write to cache. File segment info: {}", getInfoForLogImpl(segment_lock));
download_state = State::PARTIALLY_DOWNLOADED_NO_CONTINUATION;
@ -405,7 +405,11 @@ void FileSegment::completeImpl(bool allow_non_strict_checking)
String FileSegment::getInfoForLog() const
{
std::lock_guard segment_lock(mutex);
return getInfoForLogImpl(segment_lock);
}
String FileSegment::getInfoForLogImpl(std::lock_guard<std::mutex> & segment_lock) const
{
WriteBufferFromOwnString info;
info << "File segment: " << range().toString() << ", ";
info << "state: " << download_state << ", ";

View File

@ -130,6 +130,7 @@ private:
static String getCallerIdImpl(bool allow_non_strict_checking = false);
void resetDownloaderImpl(std::lock_guard<std::mutex> & segment_lock);
size_t getDownloadedSize(std::lock_guard<std::mutex> & segment_lock) const;
String getInfoForLogImpl(std::lock_guard<std::mutex> & segment_lock) const;
const Range segment_range;

View File

@ -127,6 +127,7 @@ PoolWithFailover::Entry PoolWithFailover::get()
/// If we cannot connect to some replica due to pool overflow, then we will wait and connect.
PoolPtr * full_pool = nullptr;
std::map<std::string, std::tuple<std::string, int>> error_detail;
for (size_t try_no = 0; try_no < max_tries; ++try_no)
{
@ -160,6 +161,15 @@ PoolWithFailover::Entry PoolWithFailover::get()
}
app.logger().warning("Connection to " + pool->getDescription() + " failed: " + e.displayText());
//save all errors to error_detail
if (error_detail.contains(pool->getDescription()))
{
error_detail[pool->getDescription()] = {e.displayText(), e.code()};
}
else
{
error_detail.insert({pool->getDescription(), {e.displayText(), e.code()}});
}
continue;
}
@ -180,7 +190,14 @@ PoolWithFailover::Entry PoolWithFailover::get()
message << "Connections to all replicas failed: ";
for (auto it = replicas_by_priority.begin(); it != replicas_by_priority.end(); ++it)
for (auto jt = it->second.begin(); jt != it->second.end(); ++jt)
{
message << (it == replicas_by_priority.begin() && jt == it->second.begin() ? "" : ", ") << (*jt)->getDescription();
if (error_detail.contains((*jt)->getDescription()))
{
std::tuple<std::string, int> error_and_code = error_detail[(*jt)->getDescription()];
message << ", ERROR " << std::get<1>(error_and_code) << " : " << std::get<0>(error_and_code);
}
}
throw Poco::Exception(message.str());
}

View File

@ -37,7 +37,7 @@ void CoordinationSettings::loadFromConfig(const String & config_elem, const Poco
}
const String KeeperConfigurationAndSettings::DEFAULT_FOUR_LETTER_WORD_CMD = "conf,cons,crst,envi,ruok,srst,srvr,stat,wchc,wchs,dirs,mntr,isro";
const String KeeperConfigurationAndSettings::DEFAULT_FOUR_LETTER_WORD_CMD = "conf,cons,crst,envi,ruok,srst,srvr,stat,wchs,dirs,mntr,isro";
KeeperConfigurationAndSettings::KeeperConfigurationAndSettings()
: server_id(NOT_EXIST)
@ -82,8 +82,8 @@ void KeeperConfigurationAndSettings::dump(WriteBufferFromOwnString & buf) const
write_int(tcp_port_secure);
}
writeText("four_letter_word_white_list=", buf);
writeText(four_letter_word_white_list, buf);
writeText("four_letter_word_allow_list=", buf);
writeText(four_letter_word_allow_list, buf);
buf.write('\n');
writeText("log_storage_path=", buf);
@ -177,7 +177,11 @@ KeeperConfigurationAndSettings::loadFromConfig(const Poco::Util::AbstractConfigu
ret->super_digest = config.getString("keeper_server.superdigest");
}
ret->four_letter_word_white_list = config.getString("keeper_server.four_letter_word_white_list", DEFAULT_FOUR_LETTER_WORD_CMD);
ret->four_letter_word_allow_list = config.getString(
"keeper_server.four_letter_word_allow_list",
config.getString("keeper_server.four_letter_word_white_list",
DEFAULT_FOUR_LETTER_WORD_CMD));
ret->log_storage_path = getLogsPathFromConfig(config, standalone_keeper_);
ret->snapshot_storage_path = getSnapshotsPathFromConfig(config, standalone_keeper_);

View File

@ -68,7 +68,7 @@ struct KeeperConfigurationAndSettings
int tcp_port;
int tcp_port_secure;
String four_letter_word_white_list;
String four_letter_word_allow_list;
String super_digest;

View File

@ -129,7 +129,7 @@ void FourLetterCommandFactory::registerCommands(KeeperDispatcher & keeper_dispat
FourLetterCommandPtr watch_command = std::make_shared<WatchCommand>(keeper_dispatcher);
factory.registerCommand(watch_command);
factory.initializeWhiteList(keeper_dispatcher);
factory.initializeAllowList(keeper_dispatcher);
factory.setInitialize(true);
}
}
@ -137,17 +137,17 @@ void FourLetterCommandFactory::registerCommands(KeeperDispatcher & keeper_dispat
bool FourLetterCommandFactory::isEnabled(int32_t code)
{
checkInitialization();
if (!white_list.empty() && *white_list.cbegin() == WHITE_LIST_ALL)
if (!allow_list.empty() && *allow_list.cbegin() == ALLOW_LIST_ALL)
return true;
return std::find(white_list.begin(), white_list.end(), code) != white_list.end();
return std::find(allow_list.begin(), allow_list.end(), code) != allow_list.end();
}
void FourLetterCommandFactory::initializeWhiteList(KeeperDispatcher & keeper_dispatcher)
void FourLetterCommandFactory::initializeAllowList(KeeperDispatcher & keeper_dispatcher)
{
const auto & keeper_settings = keeper_dispatcher.getKeeperConfigurationAndSettings();
String list_str = keeper_settings->four_letter_word_white_list;
String list_str = keeper_settings->four_letter_word_allow_list;
Strings tokens;
splitInto<','>(tokens, list_str);
@ -157,15 +157,15 @@ void FourLetterCommandFactory::initializeWhiteList(KeeperDispatcher & keeper_dis
if (token == "*")
{
white_list.clear();
white_list.push_back(WHITE_LIST_ALL);
allow_list.clear();
allow_list.push_back(ALLOW_LIST_ALL);
return;
}
else
{
if (commands.contains(IFourLetterCommand::toCode(token)))
{
white_list.push_back(IFourLetterCommand::toCode(token));
allow_list.push_back(IFourLetterCommand::toCode(token));
}
else
{

View File

@ -40,10 +40,10 @@ struct FourLetterCommandFactory : private boost::noncopyable
{
public:
using Commands = std::unordered_map<int32_t, FourLetterCommandPtr>;
using WhiteList = std::vector<int32_t>;
using AllowList = std::vector<int32_t>;
///represent '*' which is used in white list
static constexpr int32_t WHITE_LIST_ALL = 0;
///represent '*' which is used in allow list
static constexpr int32_t ALLOW_LIST_ALL = 0;
bool isKnown(int32_t code);
bool isEnabled(int32_t code);
@ -52,7 +52,7 @@ public:
/// There is no need to make it thread safe, because registration happens during initialization and lookups happen only after startup.
void registerCommand(FourLetterCommandPtr & command);
void initializeWhiteList(KeeperDispatcher & keeper_dispatcher);
void initializeAllowList(KeeperDispatcher & keeper_dispatcher);
void checkInitialization() const;
bool isInitialized() const { return initialized; }
@ -64,7 +64,7 @@ public:
private:
std::atomic<bool> initialized = false;
Commands commands;
WhiteList white_list;
AllowList allow_list;
};
/**Tests if server is running in a non-error state. The server will respond with imok if it is running.
@ -130,7 +130,7 @@ struct StatResetCommand : public IFourLetterCommand
};
/// A command that does not do anything except reply to client with predefined message.
///It is used to inform clients who execute four letter word commands that are not in the white list.
///It is used to inform clients who execute four letter word commands that are not in the allow list.
struct NopCommand : public IFourLetterCommand
{
explicit NopCommand(KeeperDispatcher & keeper_dispatcher_)

View File

@ -44,6 +44,7 @@ class IColumn;
M(UInt64, min_insert_block_size_bytes_for_materialized_views, 0, "Like min_insert_block_size_bytes, but applied only during pushing to MATERIALIZED VIEW (default: min_insert_block_size_bytes)", 0) \
M(UInt64, max_joined_block_size_rows, DEFAULT_BLOCK_SIZE, "Maximum block size for JOIN result (if join algorithm supports it). 0 means unlimited.", 0) \
M(UInt64, max_insert_threads, 0, "The maximum number of threads to execute the INSERT SELECT query. Values 0 or 1 mean that INSERT SELECT is not run in parallel. Higher values will lead to higher memory usage. Parallel INSERT SELECT has an effect only if the SELECT part is run in parallel, see the 'max_threads' setting.", 0) \
M(UInt64, max_insert_delayed_streams_for_parallel_write, 0, "The maximum number of streams (columns) to delay the final part flush. Default - auto (1000 if the underlying storage supports parallel write, for example S3; disabled otherwise)", 0) \
M(UInt64, max_final_threads, 16, "The maximum number of threads to read from table with FINAL.", 0) \
M(MaxThreads, max_threads, 0, "The maximum number of threads to execute the request. By default, it is determined automatically.", 0) \
M(MaxThreads, max_download_threads, 4, "The maximum number of threads to download data (e.g. for URL engine).", 0) \
@ -138,7 +139,7 @@ class IColumn;
\
M(Bool, skip_unavailable_shards, false, "If true, ClickHouse silently skips unavailable shards and nodes unresolvable through DNS. Shard is marked as unavailable when none of the replicas can be reached.", 0) \
\
M(UInt64, parallel_distributed_insert_select, 0, "Process distributed INSERT SELECT query in the same cluster on local tables on every shard, if 1 SELECT is executed on each shard, if 2 SELECT and INSERT is executed on each shard", 0) \
M(UInt64, parallel_distributed_insert_select, 0, "Process distributed INSERT SELECT query in the same cluster on local tables on every shard; if set to 1 - SELECT is executed on each shard; if set to 2 - SELECT and INSERT are executed on each shard", 0) \
M(UInt64, distributed_group_by_no_merge, 0, "If 1, do not merge aggregation states from different servers for distributed queries (shards will process the query up to the Complete stage, the initiator just proxies the data from the shards). If 2, the initiator will apply ORDER BY and LIMIT stages (this is not the case when shards process the query up to the Complete stage)", 0) \
M(UInt64, distributed_push_down_limit, 1, "If 1, LIMIT will be applied on each shard separately. Usually you don't need to use it, since this will be done automatically if it is possible, i.e. for a simple query SELECT FROM LIMIT.", 0) \
M(Bool, optimize_distributed_group_by_sharding_key, true, "Optimize GROUP BY sharding_key queries (by avoiding costly aggregation on the initiator server).", 0) \
@ -555,7 +556,7 @@ class IColumn;
M(UInt64, remote_fs_read_max_backoff_ms, 10000, "Max wait time when trying to read data for remote disk", 0) \
M(UInt64, remote_fs_read_backoff_max_tries, 5, "Max attempts to read with backoff", 0) \
M(Bool, remote_fs_enable_cache, true, "Use cache for remote filesystem. This setting does not turn on/off cache for disks (must be done via disk config), but allows bypassing cache for some queries if intended", 0) \
M(UInt64, remote_fs_cache_max_wait_sec, 5, "Allow to wait a most this number of seconds for download of current remote_fs_buffer_size bytes, and skip cache if exceeded", 0) \
M(UInt64, remote_fs_cache_max_wait_sec, 5, "Allow to wait at most this number of seconds for download of current remote_fs_buffer_size bytes, and skip cache if exceeded", 0) \
\
M(UInt64, http_max_tries, 10, "Max attempts to read via http.", 0) \
M(UInt64, http_retry_initial_backoff_ms, 100, "Min milliseconds for backoff, when retrying read via http", 0) \

View File

@ -19,7 +19,6 @@ namespace ErrorCodes
DataTypeObject::DataTypeObject(const String & schema_format_, bool is_nullable_)
: schema_format(Poco::toLower(schema_format_))
, is_nullable(is_nullable_)
, default_serialization(getObjectSerialization(schema_format))
{
}
@ -32,7 +31,7 @@ bool DataTypeObject::equals(const IDataType & rhs) const
SerializationPtr DataTypeObject::doGetDefaultSerialization() const
{
return default_serialization;
return getObjectSerialization(schema_format);
}
String DataTypeObject::doGetName() const

View File

@ -18,7 +18,6 @@ class DataTypeObject : public IDataType
private:
String schema_format;
bool is_nullable;
SerializationPtr default_serialization;
public:
DataTypeObject(const String & schema_format_, bool is_nullable_);

View File

@ -35,7 +35,7 @@ class JSONDataParser
public:
using Element = typename ParserImpl::Element;
void readJSON(String & s, ReadBuffer & buf)
static void readJSON(String & s, ReadBuffer & buf)
{
readJSONObjectPossiblyInvalid(s, buf);
}

View File

@ -36,15 +36,15 @@ PathInData::PathInData(std::string_view path_)
}
PathInData::PathInData(const Parts & parts_)
: path(buildPath(parts_))
, parts(buildParts(path, parts_))
{
buildPath(parts_);
buildParts(parts_);
}
PathInData::PathInData(const PathInData & other)
: path(other.path)
, parts(buildParts(path, other.getParts()))
{
buildParts(other.getParts());
}
PathInData & PathInData::operator=(const PathInData & other)
@ -52,7 +52,7 @@ PathInData & PathInData::operator=(const PathInData & other)
if (this != &other)
{
path = other.path;
parts = buildParts(path, other.parts);
buildParts(other.parts);
}
return *this;
}
@ -79,8 +79,8 @@ void PathInData::writeBinary(WriteBuffer & out) const
for (const auto & part : parts)
{
writeStringBinary(part.key, out);
writeVarUInt(part.is_nested, out);
writeVarUInt(part.anonymous_array_level, out);
writeIntBinary(part.is_nested, out);
writeIntBinary(part.anonymous_array_level, out);
}
}
@ -99,48 +99,47 @@ void PathInData::readBinary(ReadBuffer & in)
UInt8 anonymous_array_level;
auto ref = readStringBinaryInto(arena, in);
readVarUInt(is_nested, in);
readVarUInt(anonymous_array_level, in);
readIntBinary(is_nested, in);
readIntBinary(anonymous_array_level, in);
temp_parts.emplace_back(static_cast<std::string_view>(ref), is_nested, anonymous_array_level);
}
/// Recreate path and parts.
path = buildPath(temp_parts);
parts = buildParts(path, temp_parts);
buildPath(temp_parts);
buildParts(temp_parts);
}
String PathInData::buildPath(const Parts & other_parts)
void PathInData::buildPath(const Parts & other_parts)
{
if (other_parts.empty())
return "";
return;
String res;
path.clear();
auto it = other_parts.begin();
res += it->key;
path += it->key;
++it;
for (; it != other_parts.end(); ++it)
{
res += ".";
res += it->key;
path += ".";
path += it->key;
}
return res;
}
PathInData::Parts PathInData::buildParts(const String & other_path, const Parts & other_parts)
void PathInData::buildParts(const Parts & other_parts)
{
if (other_parts.empty())
return {};
return;
Parts res;
const char * begin = other_path.data();
parts.clear();
parts.reserve(other_parts.size());
const char * begin = path.data();
for (const auto & part : other_parts)
{
res.emplace_back(std::string_view{begin, part.key.length()}, part.is_nested, part.anonymous_array_level);
has_nested |= part.is_nested;
parts.emplace_back(std::string_view{begin, part.key.length()}, part.is_nested, part.anonymous_array_level);
begin += part.key.length() + 1;
}
return res;
}
size_t PathInData::Hash::operator()(const PathInData & value) const

View File

@ -55,7 +55,7 @@ public:
const Parts & getParts() const { return parts; }
bool isNested(size_t i) const { return parts[i].is_nested; }
bool hasNested() const { return std::any_of(parts.begin(), parts.end(), [](const auto & part) { return part.is_nested; }); }
bool hasNested() const { return has_nested; }
void writeBinary(WriteBuffer & out) const;
void readBinary(ReadBuffer & in);
@ -65,16 +65,20 @@ public:
private:
/// Creates full path from parts.
static String buildPath(const Parts & other_parts);
void buildPath(const Parts & other_parts);
/// Creates new parts from the full path with correct string pointers.
static Parts buildParts(const String & other_path, const Parts & other_parts);
void buildParts(const Parts & other_parts);
/// The full path. Parts are separated by dots.
String path;
/// Parts of the path. All string_view-s in parts must point to the @path.
Parts parts;
/// True if at least one part is nested.
/// Cached to avoid linear complexity at 'hasNested'.
bool has_nested = false;
};
class PathInDataBuilder

View File

@ -68,7 +68,7 @@ using Node = typename ColumnObject::SubcolumnsTree::Node;
/// Finds a subcolumn from the same Nested type as @entry and inserts
/// an array with default values with consistent sizes as in Nested type.
bool tryInsertDefaultFromNested(
std::shared_ptr<Node> entry, const ColumnObject::SubcolumnsTree & subcolumns)
const std::shared_ptr<Node> & entry, const ColumnObject::SubcolumnsTree & subcolumns)
{
if (!entry->path.hasNested())
return false;
@ -134,8 +134,13 @@ void SerializationObject<Parser>::deserializeTextImpl(IColumn & column, Reader &
String buf;
reader(buf);
std::optional<ParseResult> result;
{
auto parser = parsers_pool.get([] { return new Parser; });
result = parser->parse(buf.data(), buf.size());
}
auto result = parser.parse(buf.data(), buf.size());
if (!result)
throw Exception(ErrorCodes::INCORRECT_DATA, "Cannot parse object");
@ -205,7 +210,7 @@ void SerializationObject<Parser>::deserializeTextQuoted(IColumn & column, ReadBu
template <typename Parser>
void SerializationObject<Parser>::deserializeTextJSON(IColumn & column, ReadBuffer & istr, const FormatSettings &) const
{
deserializeTextImpl(column, [&](String & s) { parser.readJSON(s, istr); });
deserializeTextImpl(column, [&](String & s) { Parser::readJSON(s, istr); });
}
template <typename Parser>

View File

@ -1,6 +1,7 @@
#pragma once
#include <DataTypes/Serializations/SimpleTextSerialization.h>
#include <Common/ObjectPool.h>
namespace DB
{
@ -65,7 +66,8 @@ private:
void serializeTextImpl(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const;
mutable Parser parser;
/// Pool of parser objects to make SerializationObject thread safe.
mutable SimpleObjectPool<Parser> parsers_pool;
};
SerializationPtr getObjectSerialization(const String & schema_format);

View File

@ -3,7 +3,7 @@
#include <DataTypes/Serializations/PathInData.h>
#include <DataTypes/IDataType.h>
#include <Columns/IColumn.h>
#include <unordered_map>
#include <Common/HashTable/HashMap.h>
namespace DB
{
@ -31,7 +31,8 @@ public:
Kind kind = TUPLE;
const Node * parent = nullptr;
std::map<String, std::shared_ptr<Node>, std::less<>> children;
Arena strings_pool;
HashMapWithStackMemory<StringRef, std::shared_ptr<Node>, StringRefHash, 4> children;
NodeData data;
PathInData path;
@ -39,10 +40,11 @@ public:
bool isNested() const { return kind == NESTED; }
bool isScalar() const { return kind == SCALAR; }
void addChild(const String & key, std::shared_ptr<Node> next_node)
void addChild(std::string_view key, std::shared_ptr<Node> next_node)
{
next_node->parent = this;
children[key] = std::move(next_node);
StringRef key_ref{strings_pool.insert(key.data(), key.length()), key.length()};
children[key_ref] = std::move(next_node);
}
};
@ -83,10 +85,10 @@ public:
{
assert(current_node->kind != Node::SCALAR);
auto it = current_node->children.find(parts[i].key);
auto it = current_node->children.find(StringRef{parts[i].key});
if (it != current_node->children.end())
{
current_node = it->second.get();
current_node = it->getMapped().get();
node_creator(current_node->kind, true);
if (current_node->isNested() != parts[i].is_nested)
@ -101,7 +103,7 @@ public:
}
}
auto it = current_node->children.find(parts.back().key);
auto it = current_node->children.find(StringRef{parts.back().key});
if (it != current_node->children.end())
return false;
@ -192,11 +194,11 @@ private:
for (const auto & part : parts)
{
auto it = current_node->children.find(part.key);
auto it = current_node->children.find(StringRef{part.key});
if (it == current_node->children.end())
return find_exact ? nullptr : current_node;
current_node = it->second.get();
current_node = it->getMapped().get();
}
return current_node;

View File

@ -8,6 +8,7 @@
#include <DataTypes/DataTypeNullable.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypesDecimal.h>
#include <DataTypes/DataTypeUUID.h>
#include <DataTypes/DataTypeDate.h>
#include <DataTypes/DataTypeDateTime64.h>
#include <boost/algorithm/string/split.hpp>
@ -97,6 +98,8 @@ static DataTypePtr convertPostgreSQLDataType(String & type, Fn<void()> auto && r
res = std::make_shared<DataTypeDateTime64>(6);
else if (type == "date")
res = std::make_shared<DataTypeDate>();
else if (type == "uuid")
res = std::make_shared<DataTypeUUID>();
else if (type.starts_with("numeric"))
{
/// Numeric and decimal will both end up here as numeric. If it has type and precision,

View File

@ -1,6 +1,7 @@
#include "CassandraDictionarySource.h"
#include "DictionarySourceFactory.h"
#include "DictionaryStructure.h"
#include <Interpreters/Context.h>
namespace DB
{
@ -17,13 +18,17 @@ void registerDictionarySourceCassandra(DictionarySourceFactory & factory)
[[maybe_unused]] const Poco::Util::AbstractConfiguration & config,
[[maybe_unused]] const std::string & config_prefix,
[[maybe_unused]] Block & sample_block,
ContextPtr /* global_context */,
[[maybe_unused]] ContextPtr global_context,
const std::string & /* default_database */,
bool /*created_from_ddl*/) -> DictionarySourcePtr
{
#if USE_CASSANDRA
setupCassandraDriverLibraryLogging(CASS_LOG_INFO);
return std::make_unique<CassandraDictionarySource>(dict_struct, config, config_prefix + ".cassandra", sample_block);
auto source_config_prefix = config_prefix + ".cassandra";
global_context->getRemoteHostFilter().checkHostAndPort(config.getString(source_config_prefix + ".host"), toString(config.getUInt(source_config_prefix + ".port", 0)));
return std::make_unique<CassandraDictionarySource>(dict_struct, config, source_config_prefix, sample_block);
#else
throw Exception(ErrorCodes::SUPPORT_IS_DISABLED,
"Dictionary source of type `cassandra` is disabled because ClickHouse was built without cassandra support.");

View File

@ -8,6 +8,7 @@
#include <Poco/Redis/Command.h>
#include <Poco/Redis/Type.h>
#include <Poco/Util/AbstractConfiguration.h>
#include <Interpreters/Context.h>
#include <IO/WriteHelpers.h>
@ -40,15 +41,20 @@ namespace DB
const Poco::Util::AbstractConfiguration & config,
const String & config_prefix,
Block & sample_block,
ContextPtr /* global_context */,
ContextPtr global_context,
const std::string & /* default_database */,
bool /* created_from_ddl */) -> DictionarySourcePtr {
auto redis_config_prefix = config_prefix + ".redis";
auto host = config.getString(redis_config_prefix + ".host");
auto port = config.getUInt(redis_config_prefix + ".port");
global_context->getRemoteHostFilter().checkHostAndPort(host, toString(port));
RedisDictionarySource::Configuration configuration =
{
.host = config.getString(redis_config_prefix + ".host"),
.port = static_cast<UInt16>(config.getUInt(redis_config_prefix + ".port")),
.host = host,
.port = static_cast<UInt16>(port),
.db_index = config.getUInt(redis_config_prefix + ".db_index", 0),
.password = config.getString(redis_config_prefix + ".password", ""),
.storage_type = parseStorageType(config.getString(redis_config_prefix + ".storage_type", "")),

View File

@ -248,6 +248,10 @@ public:
/// Overridden in remote fs disks.
virtual bool supportZeroCopyReplication() const = 0;
/// Whether this disk supports parallel writes.
/// Overridden in remote fs disks.
virtual bool supportParallelWrite() const { return false; }
virtual bool isReadOnly() const { return false; }
/// Check if disk is broken. Broken disks will have 0 space and not be used.

View File

@ -4,7 +4,6 @@
#include <IO/ReadBufferFromFile.h>
#include <IO/ReadHelpers.h>
#include <IO/WriteBufferFromFile.h>
#include <IO/WriteBufferFromS3.h>
#include <IO/WriteHelpers.h>
#include <Common/createHardLink.h>
#include <Common/quoteString.h>

View File

@ -105,6 +105,8 @@ public:
bool supportZeroCopyReplication() const override { return true; }
bool supportParallelWrite() const override { return true; }
void shutdown() override;
void startup() override;

View File

@ -24,7 +24,9 @@ ProtobufSchemas & ProtobufSchemas::instance()
class ProtobufSchemas::ImporterWithSourceTree : public google::protobuf::compiler::MultiFileErrorCollector
{
public:
explicit ImporterWithSourceTree(const String & schema_directory) : importer(&disk_source_tree, this)
explicit ImporterWithSourceTree(const String & schema_directory, WithEnvelope with_envelope_)
: importer(&disk_source_tree, this)
, with_envelope(with_envelope_)
{
disk_source_tree.MapPath("", schema_directory);
}
@ -39,16 +41,33 @@ public:
return descriptor;
const auto * file_descriptor = importer.Import(schema_path);
// If there are parsing errors AddError() throws an exception and in this case the following line
// If there are parsing errors, AddError() throws an exception and in this case the following line
// isn't executed.
assert(file_descriptor);
descriptor = file_descriptor->FindMessageTypeByName(message_name);
if (!descriptor)
throw Exception(
"Not found a message named '" + message_name + "' in the schema file '" + schema_path + "'", ErrorCodes::BAD_ARGUMENTS);
if (with_envelope == WithEnvelope::No)
{
const auto * message_descriptor = file_descriptor->FindMessageTypeByName(message_name);
if (!message_descriptor)
throw Exception(
"Could not find a message named '" + message_name + "' in the schema file '" + schema_path + "'", ErrorCodes::BAD_ARGUMENTS);
return descriptor;
return message_descriptor;
}
else
{
const auto * envelope_descriptor = file_descriptor->FindMessageTypeByName("Envelope");
if (!envelope_descriptor)
throw Exception(
"Could not find a message named 'Envelope' in the schema file '" + schema_path + "'", ErrorCodes::BAD_ARGUMENTS);
const auto * message_descriptor = envelope_descriptor->FindNestedTypeByName(message_name); // the protobuf API does not allow restricting the field type to messages
if (!message_descriptor)
throw Exception(
"Could not find a message named '" + message_name + "' in the schema file '" + schema_path + "'", ErrorCodes::BAD_ARGUMENTS);
return message_descriptor;
}
}
private:
@ -63,18 +82,16 @@ private:
google::protobuf::compiler::DiskSourceTree disk_source_tree;
google::protobuf::compiler::Importer importer;
const WithEnvelope with_envelope;
};
ProtobufSchemas::ProtobufSchemas() = default;
ProtobufSchemas::~ProtobufSchemas() = default;
const google::protobuf::Descriptor * ProtobufSchemas::getMessageTypeForFormatSchema(const FormatSchemaInfo & info)
const google::protobuf::Descriptor * ProtobufSchemas::getMessageTypeForFormatSchema(const FormatSchemaInfo & info, WithEnvelope with_envelope)
{
std::lock_guard lock(mutex);
auto it = importers.find(info.schemaDirectory());
if (it == importers.end())
it = importers.emplace(info.schemaDirectory(), std::make_unique<ImporterWithSourceTree>(info.schemaDirectory())).first;
it = importers.emplace(info.schemaDirectory(), std::make_unique<ImporterWithSourceTree>(info.schemaDirectory(), with_envelope)).first;
auto * importer = it->second.get();
return importer->import(info.schemaPath(), info.messageName());
}

View File

@ -28,14 +28,36 @@ class FormatSchemaInfo;
class ProtobufSchemas : private boost::noncopyable
{
public:
static ProtobufSchemas & instance();
enum class WithEnvelope
{
// Return descriptor for a top-level message with a user-provided name.
// Example: In protobuf schema
// message MessageType {
// string colA = 1;
// int32 colB = 2;
// }
// message_name = "MessageType" returns a descriptor. Used by IO
// formats Protobuf and ProtobufSingle.
No,
// Return descriptor for a message with a user-provided name one level
// below a top-level message with the hardcoded name "Envelope".
// Example: In protobuf schema
// message Envelope {
// message MessageType {
// string colA = 1;
// int32 colB = 2;
// }
// }
// message_name = "MessageType" returns a descriptor. Used by IO format
// ProtobufList.
Yes
};
ProtobufSchemas();
~ProtobufSchemas();
static ProtobufSchemas & instance();
/// Parses the format schema, then parses the corresponding proto file, and returns the descriptor of the message type.
/// The function never returns nullptr, it throws an exception if it cannot load or parse the file.
const google::protobuf::Descriptor * getMessageTypeForFormatSchema(const FormatSchemaInfo & info);
const google::protobuf::Descriptor * getMessageTypeForFormatSchema(const FormatSchemaInfo & info, WithEnvelope with_envelope);
private:
class ImporterWithSourceTree;
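To make the `WithEnvelope::Yes` case above concrete: the new `ProtobufList` format registered later in this commit looks up the user-provided message name nested inside a top-level `Envelope` message. A hedged usage sketch, where the schema file `list.proto` and the table `some_table` are hypothetical and the schema follows the layout from the comment above:
``` bash
# list.proto (hypothetical):
#   message Envelope {
#     message MessageType {
#       string colA = 1;
#       int32 colB = 2;
#     }
#   }
clickhouse-client --query "SELECT * FROM some_table FORMAT ProtobufList SETTINGS format_schema = 'list:MessageType'"
```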

View File

@ -2171,6 +2171,11 @@ namespace
field_index_by_field_tag.emplace(field_infos[i].field_tag, i);
}
void setHasEnvelopeAsParent()
{
has_envelope_as_parent = true;
}
void setColumns(const ColumnPtr * columns_, size_t num_columns_) override
{
if (!num_columns_)
@ -2217,7 +2222,7 @@ namespace
void writeRow(size_t row_num) override
{
if (parent_field_descriptor)
if (parent_field_descriptor || has_envelope_as_parent)
writer->startNestedMessage();
else
writer->startMessage();
@ -2236,13 +2241,17 @@ namespace
bool is_group = (parent_field_descriptor->type() == FieldTypeId::TYPE_GROUP);
writer->endNestedMessage(parent_field_descriptor->number(), is_group, should_skip_if_empty);
}
else if (has_envelope_as_parent)
{
writer->endNestedMessage(1, false, should_skip_if_empty);
}
else
writer->endMessage(with_length_delimiter);
}
void readRow(size_t row_num) override
{
if (parent_field_descriptor)
if (parent_field_descriptor || has_envelope_as_parent)
reader->startNestedMessage();
else
reader->startMessage(with_length_delimiter);
@ -2285,7 +2294,7 @@ namespace
}
}
if (parent_field_descriptor)
if (parent_field_descriptor || has_envelope_as_parent)
reader->endNestedMessage();
else
reader->endMessage(false);
@ -2375,6 +2384,7 @@ namespace
};
const FieldDescriptor * const parent_field_descriptor;
bool has_envelope_as_parent = false;
const bool with_length_delimiter;
const std::unique_ptr<RowInputMissingColumnsFiller> missing_columns_filler;
const bool should_skip_if_empty;
@ -2388,6 +2398,86 @@ namespace
size_t last_field_index = static_cast<size_t>(-1);
};
/// Serializes a top-level envelope message in the protobuf schema.
/// "Envelope" means that the contained subtree of serializers is enclosed in a message just once,
/// i.e. only when the first and the last row read/write trigger a read/write of the msg header.
class ProtobufSerializerEnvelope : public ProtobufSerializer
{
public:
ProtobufSerializerEnvelope(
std::unique_ptr<ProtobufSerializerMessage>&& serializer_,
const ProtobufReaderOrWriter & reader_or_writer_)
: serializer(std::move(serializer_))
, reader(reader_or_writer_.reader)
, writer(reader_or_writer_.writer)
{
// The inner serializer has a backreference of type protobuf::FieldDescriptor * to its parent
// serializer. If it is unset, it considers itself the top-level message, otherwise a nested
// message and accordingly it makes start/endMessage() vs. start/endNestedMessage() calls into
// Protobuf(Writer|Reader). There is no field descriptor because Envelopes merely forward calls
// but don't contain data to be serialized. We must still force the inner serializer to act
// as nested message.
serializer->setHasEnvelopeAsParent();
}
void setColumns(const ColumnPtr * columns_, size_t num_columns_) override
{
serializer->setColumns(columns_, num_columns_);
}
void setColumns(const MutableColumnPtr * columns_, size_t num_columns_) override
{
serializer->setColumns(columns_, num_columns_);
}
void writeRow(size_t row_num) override
{
if (first_call_of_write_row)
{
writer->startMessage();
first_call_of_write_row = false;
}
serializer->writeRow(row_num);
}
void finalizeWrite() override
{
writer->endMessage(/*with_length_delimiter = */ true);
}
void readRow(size_t row_num) override
{
if (first_call_of_read_row)
{
reader->startMessage(/*with_length_delimiter = */ true);
first_call_of_read_row = false;
}
int field_tag;
[[maybe_unused]] bool ret = reader->readFieldNumber(field_tag);
assert(ret);
serializer->readRow(row_num);
}
void insertDefaults(size_t row_num) override
{
serializer->insertDefaults(row_num);
}
void describeTree(WriteBuffer & out, size_t indent) const override
{
writeIndent(out, indent) << "ProtobufSerializerEnvelope ->\n";
serializer->describeTree(out, indent + 1);
}
std::unique_ptr<ProtobufSerializerMessage> serializer;
ProtobufReader * const reader;
ProtobufWriter * const writer;
bool first_call_of_write_row = true;
bool first_call_of_read_row = true;
};
/// Serializes a tuple with explicit names as a nested message.
class ProtobufSerializerTupleAsNestedMessage : public ProtobufSerializer
@ -2610,7 +2700,8 @@ namespace
const DataTypes & data_types,
std::vector<size_t> & missing_column_indices,
const MessageDescriptor & message_descriptor,
bool with_length_delimiter)
bool with_length_delimiter,
bool with_envelope)
{
root_serializer_ptr = std::make_shared<ProtobufSerializer *>();
get_root_desc_function = [root_serializer_ptr = root_serializer_ptr](size_t indent) -> String
@ -2648,13 +2739,23 @@ namespace
boost::range::set_difference(collections::range(column_names.size()), used_column_indices_sorted,
std::back_inserter(missing_column_indices));
*root_serializer_ptr = message_serializer.get();
if (!with_envelope)
{
*root_serializer_ptr = message_serializer.get();
#if 0
LOG_INFO(&Poco::Logger::get("ProtobufSerializer"), "Serialization tree:\n{}", get_root_desc_function(0));
LOG_INFO(&Poco::Logger::get("ProtobufSerializer"), "Serialization tree:\n{}", get_root_desc_function(0));
#endif
return message_serializer;
return message_serializer;
}
else
{
auto envelope_serializer = std::make_unique<ProtobufSerializerEnvelope>(std::move(message_serializer), reader_or_writer);
*root_serializer_ptr = envelope_serializer.get();
#if 0
LOG_INFO(&Poco::Logger::get("ProtobufSerializer"), "Serialization tree:\n{}", get_root_desc_function(0));
#endif
return envelope_serializer;
}
}
private:
@ -3337,9 +3438,10 @@ std::unique_ptr<ProtobufSerializer> ProtobufSerializer::create(
std::vector<size_t> & missing_column_indices,
const google::protobuf::Descriptor & message_descriptor,
bool with_length_delimiter,
bool with_envelope,
ProtobufReader & reader)
{
return ProtobufSerializerBuilder(reader).buildMessageSerializer(column_names, data_types, missing_column_indices, message_descriptor, with_length_delimiter);
return ProtobufSerializerBuilder(reader).buildMessageSerializer(column_names, data_types, missing_column_indices, message_descriptor, with_length_delimiter, with_envelope);
}
std::unique_ptr<ProtobufSerializer> ProtobufSerializer::create(
@ -3347,10 +3449,11 @@ std::unique_ptr<ProtobufSerializer> ProtobufSerializer::create(
const DataTypes & data_types,
const google::protobuf::Descriptor & message_descriptor,
bool with_length_delimiter,
bool with_envelope,
ProtobufWriter & writer)
{
std::vector<size_t> missing_column_indices;
return ProtobufSerializerBuilder(writer).buildMessageSerializer(column_names, data_types, missing_column_indices, message_descriptor, with_length_delimiter);
return ProtobufSerializerBuilder(writer).buildMessageSerializer(column_names, data_types, missing_column_indices, message_descriptor, with_length_delimiter, with_envelope);
}
NamesAndTypesList protobufSchemaToCHSchema(const google::protobuf::Descriptor * message_descriptor)

View File

@ -26,6 +26,7 @@ public:
virtual void setColumns(const ColumnPtr * columns, size_t num_columns) = 0;
virtual void writeRow(size_t row_num) = 0;
virtual void finalizeWrite() {}
virtual void setColumns(const MutableColumnPtr * columns, size_t num_columns) = 0;
virtual void readRow(size_t row_num) = 0;
@ -39,6 +40,7 @@ public:
std::vector<size_t> & missing_column_indices,
const google::protobuf::Descriptor & message_descriptor,
bool with_length_delimiter,
bool with_envelope,
ProtobufReader & reader);
static std::unique_ptr<ProtobufSerializer> create(
@ -46,6 +48,7 @@ public:
const DataTypes & data_types,
const google::protobuf::Descriptor & message_descriptor,
bool with_length_delimiter,
bool with_envelope,
ProtobufWriter & writer);
};

View File

@ -36,6 +36,8 @@ void registerInputFormatJSONCompactEachRow(FormatFactory & factory);
void registerOutputFormatJSONCompactEachRow(FormatFactory & factory);
void registerInputFormatProtobuf(FormatFactory & factory);
void registerOutputFormatProtobuf(FormatFactory & factory);
void registerInputFormatProtobufList(FormatFactory & factory);
void registerOutputFormatProtobufList(FormatFactory & factory);
void registerInputFormatTemplate(FormatFactory & factory);
void registerOutputFormatTemplate(FormatFactory & factory);
void registerInputFormatMsgPack(FormatFactory & factory);
@ -98,6 +100,7 @@ void registerNativeSchemaReader(FormatFactory & factory);
void registerRowBinaryWithNamesAndTypesSchemaReader(FormatFactory & factory);
void registerAvroSchemaReader(FormatFactory & factory);
void registerProtobufSchemaReader(FormatFactory & factory);
void registerProtobufListSchemaReader(FormatFactory & factory);
void registerLineAsStringSchemaReader(FormatFactory & factory);
void registerJSONAsStringSchemaReader(FormatFactory & factory);
void registerRawBLOBSchemaReader(FormatFactory & factory);
@ -140,6 +143,8 @@ void registerFormats()
registerInputFormatJSONCompactEachRow(factory);
registerOutputFormatJSONCompactEachRow(factory);
registerInputFormatProtobuf(factory);
registerOutputFormatProtobufList(factory);
registerInputFormatProtobufList(factory);
registerOutputFormatProtobuf(factory);
registerInputFormatTemplate(factory);
registerOutputFormatTemplate(factory);
@ -199,6 +204,7 @@ void registerFormats()
registerRowBinaryWithNamesAndTypesSchemaReader(factory);
registerAvroSchemaReader(factory);
registerProtobufSchemaReader(factory);
registerProtobufListSchemaReader(factory);
registerLineAsStringSchemaReader(factory);
registerJSONAsStringSchemaReader(factory);
registerRawBLOBSchemaReader(factory);

View File

@ -1,4 +1,5 @@
#include <Columns/ColumnsNumber.h>
#include <Columns/ColumnNullable.h>
#include <DataTypes/DataTypesNumber.h>
#include <Functions/FunctionFactory.h>
@ -12,13 +13,19 @@ namespace ErrorCodes
extern const int ILLEGAL_COLUMN;
}
enum class ArrayFirstLastStrategy
enum class ArrayFirstLastStrategy : uint8_t
{
First,
Last
};
template <ArrayFirstLastStrategy strategy>
enum class ArrayFirstLastElementNotExistsStrategy : uint8_t
{
Default,
Null
};
template <ArrayFirstLastStrategy strategy, ArrayFirstLastElementNotExistsStrategy element_not_exists_strategy>
struct ArrayFirstLastImpl
{
using column_type = ColumnArray;
@ -30,6 +37,9 @@ struct ArrayFirstLastImpl
static DataTypePtr getReturnType(const DataTypePtr & /*expression_return*/, const DataTypePtr & array_element)
{
if constexpr (element_not_exists_strategy == ArrayFirstLastElementNotExistsStrategy::Null)
return makeNullable(array_element);
return array_element;
}
@ -52,6 +62,16 @@ struct ArrayFirstLastImpl
out->reserve(data.size());
size_t offsets_size = offsets.size();
ColumnUInt8::MutablePtr col_null_map_to;
ColumnUInt8::Container * vec_null_map_to = nullptr;
if constexpr (element_not_exists_strategy == ArrayFirstLastElementNotExistsStrategy::Null)
{
col_null_map_to = ColumnUInt8::create(offsets_size, false);
vec_null_map_to = &col_null_map_to->getData();
}
for (size_t offset_index = 0; offset_index < offsets_size; ++offset_index)
{
size_t start_offset = offsets[offset_index - 1];
@ -67,16 +87,29 @@ struct ArrayFirstLastImpl
else
{
out->insertDefault();
if constexpr (element_not_exists_strategy == ArrayFirstLastElementNotExistsStrategy::Null)
(*vec_null_map_to)[offset_index] = true;
}
}
if constexpr (element_not_exists_strategy == ArrayFirstLastElementNotExistsStrategy::Null)
return ColumnNullable::create(std::move(out), std::move(col_null_map_to));
return out;
}
else
{
auto out = array.getData().cloneEmpty();
out->insertDefault();
return out->replicate(IColumn::Offsets(1, array.size()));
out->insertManyDefaults(array.size());
if constexpr (element_not_exists_strategy == ArrayFirstLastElementNotExistsStrategy::Null)
{
auto col_null_map_to = ColumnUInt8::create(out->size(), true);
return ColumnNullable::create(std::move(out), std::move(col_null_map_to));
}
return out;
}
}
@ -87,6 +120,16 @@ struct ArrayFirstLastImpl
out->reserve(data.size());
size_t offsets_size = offsets.size();
ColumnUInt8::MutablePtr col_null_map_to;
ColumnUInt8::Container * vec_null_map_to = nullptr;
if constexpr (element_not_exists_strategy == ArrayFirstLastElementNotExistsStrategy::Null)
{
col_null_map_to = ColumnUInt8::create(offsets_size, false);
vec_null_map_to = &col_null_map_to->getData();
}
for (size_t offset_index = 0; offset_index < offsets_size; ++offset_index)
{
size_t start_offset = offsets[offset_index - 1];
@ -120,25 +163,43 @@ struct ArrayFirstLastImpl
}
if (!exists)
{
out->insertDefault();
if constexpr (element_not_exists_strategy == ArrayFirstLastElementNotExistsStrategy::Null)
(*vec_null_map_to)[offset_index] = true;
}
}
if constexpr (element_not_exists_strategy == ArrayFirstLastElementNotExistsStrategy::Null)
return ColumnNullable::create(std::move(out), std::move(col_null_map_to));
return out;
}
};
struct NameArrayFirst { static constexpr auto name = "arrayFirst"; };
using ArrayFirstImpl = ArrayFirstLastImpl<ArrayFirstLastStrategy::First>;
using ArrayFirstImpl = ArrayFirstLastImpl<ArrayFirstLastStrategy::First, ArrayFirstLastElementNotExistsStrategy::Default>;
using FunctionArrayFirst = FunctionArrayMapped<ArrayFirstImpl, NameArrayFirst>;
struct NameArrayFirstOrNull { static constexpr auto name = "arrayFirstOrNull"; };
using ArrayFirstOrNullImpl = ArrayFirstLastImpl<ArrayFirstLastStrategy::First, ArrayFirstLastElementNotExistsStrategy::Null>;
using FunctionArrayFirstOrNull = FunctionArrayMapped<ArrayFirstOrNullImpl, NameArrayFirstOrNull>;
struct NameArrayLast { static constexpr auto name = "arrayLast"; };
using ArrayLastImpl = ArrayFirstLastImpl<ArrayFirstLastStrategy::Last>;
using ArrayLastImpl = ArrayFirstLastImpl<ArrayFirstLastStrategy::Last, ArrayFirstLastElementNotExistsStrategy::Default>;
using FunctionArrayLast = FunctionArrayMapped<ArrayLastImpl, NameArrayLast>;
struct NameArrayLastOrNull { static constexpr auto name = "arrayLastOrNull"; };
using ArrayLastOrNullImpl = ArrayFirstLastImpl<ArrayFirstLastStrategy::Last, ArrayFirstLastElementNotExistsStrategy::Null>;
using FunctionArrayLastOrNull = FunctionArrayMapped<ArrayLastOrNullImpl, NameArrayLastOrNull>;
void registerFunctionArrayFirst(FunctionFactory & factory)
{
factory.registerFunction<FunctionArrayFirst>();
factory.registerFunction<FunctionArrayFirstOrNull>();
factory.registerFunction<FunctionArrayLast>();
factory.registerFunction<FunctionArrayLastOrNull>();
}
}
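A minimal sketch of how the new *OrNull variants assemble their result: the plain nested column plus a UInt8 null map in which 1 marks "no matching element", wrapped into ColumnNullable. The helper name makeExampleResult is purely illustrative; it assumes only the ClickHouse column headers already included above.
#include <Columns/ColumnsNumber.h>
#include <Columns/ColumnNullable.h>
using namespace DB;
/// Row 0: a matching element was found (value 42, not NULL).
/// Row 1: nothing matched, so a default value is stored and the null map marks the row as NULL.
ColumnPtr makeExampleResult()
{
auto data = ColumnUInt64::create();
auto null_map = ColumnUInt8::create();
data->getData().push_back(42);
null_map->getData().push_back(0);
data->insertDefault();
null_map->getData().push_back(1);
return ColumnNullable::create(std::move(data), std::move(null_map));
}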

View File

@ -0,0 +1,118 @@
#include "config_functions.h"
#if USE_H3
#include <Columns/ColumnArray.h>
#include <Columns/ColumnsNumber.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypesNumber.h>
#include <Functions/FunctionFactory.h>
#include <Functions/IFunction.h>
#include <Common/typeid_cast.h>
#include <constants.h>
#include <h3api.h>
namespace DB
{
namespace ErrorCodes
{
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int ILLEGAL_COLUMN;
extern const int ARGUMENT_OUT_OF_BOUND;
}
namespace
{
class FunctionH3GetPentagonIndexes : public IFunction
{
public:
static constexpr auto name = "h3GetPentagonIndexes";
static FunctionPtr create(ContextPtr) { return std::make_shared<FunctionH3GetPentagonIndexes>(); }
std::string getName() const override { return name; }
size_t getNumberOfArguments() const override { return 1; }
bool useDefaultImplementationForConstants() const override { return true; }
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; }
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
{
const auto * arg = arguments[0].get();
if (!WhichDataType(arg).isUInt8())
throw Exception(
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Illegal type {} of argument {} of function {}. Must be UInt8",
arg->getName(), 1, getName());
return std::make_shared<DataTypeArray>(std::make_shared<DataTypeUInt64>());
}
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override
{
auto non_const_arguments = arguments;
for (auto & argument : non_const_arguments)
argument.column = argument.column->convertToFullColumnIfConst();
const auto * column = checkAndGetColumn<ColumnUInt8>(non_const_arguments[0].column.get());
if (!column)
throw Exception(
ErrorCodes::ILLEGAL_COLUMN,
"Illegal type {} of argument {} of function {}. Must be UInt8.",
arguments[0].type->getName(),
1,
getName());
const auto & data = column->getData();
auto result_column_data = ColumnUInt64::create();
auto & result_data = result_column_data->getData();
auto result_column_offsets = ColumnArray::ColumnOffsets::create();
auto & result_offsets = result_column_offsets->getData();
result_offsets.resize(input_rows_count);
auto current_offset = 0;
std::vector<H3Index> hindex_vec;
result_data.reserve(input_rows_count);
for (size_t row = 0; row < input_rows_count; ++row)
{
if (data[row] > MAX_H3_RES)
throw Exception(
ErrorCodes::ARGUMENT_OUT_OF_BOUND,
"The argument 'resolution' ({}) of function {} is out of bounds because the maximum resolution in H3 library is ",
toString(data[row]),
getName(),
MAX_H3_RES);
const auto vec_size = pentagonCount();
hindex_vec.resize(vec_size);
getPentagons(data[row], hindex_vec.data());
for (auto & i : hindex_vec)
{
++current_offset;
result_data.emplace_back(i);
}
result_offsets[row] = current_offset;
hindex_vec.clear();
}
return ColumnArray::create(std::move(result_column_data), std::move(result_column_offsets));
}
};
}
void registerFunctionH3GetPentagonIndexes(FunctionFactory & factory)
{
factory.registerFunction<FunctionH3GetPentagonIndexes>();
}
}
#endif
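A standalone sketch of the underlying H3 calls wrapped above, assuming the H3 library is linked; the resolution value is a hypothetical example. pentagonCount() returns a fixed count (12) regardless of resolution, which is why each result row has the same number of indexes.
#include <h3api.h>
#include <cstdio>
#include <vector>
int main()
{
int res = 3; /// hypothetical resolution, must not exceed MAX_H3_RES
std::vector<H3Index> pentagons(pentagonCount()); /// always 12 pentagons per resolution
getPentagons(res, pentagons.data());
for (H3Index index : pentagons)
std::printf("%llx\n", static_cast<unsigned long long>(index));
}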

View File

@ -0,0 +1,72 @@
#include "config_functions.h"
#if USE_H3
#include <Columns/ColumnArray.h>
#include <Columns/ColumnsNumber.h>
#include <Columns/ColumnConst.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypeTuple.h>
#include <Functions/FunctionFactory.h>
#include <Functions/IFunction.h>
#include <IO/WriteHelpers.h>
#include <Common/typeid_cast.h>
#include <base/range.h>
#include <h3api.h>
namespace DB
{
namespace
{
class FunctionH3GetRes0Indexes final : public IFunction
{
public:
static constexpr auto name = "h3GetRes0Indexes";
static FunctionPtr create(ContextPtr) { return std::make_shared<FunctionH3GetRes0Indexes>(); }
std::string getName() const override { return name; }
size_t getNumberOfArguments() const override { return 0; }
bool useDefaultImplementationForConstants() const override { return true; }
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; }
DataTypePtr getReturnTypeImpl(const DataTypes & /*arguments*/) const override
{
return std::make_shared<DataTypeArray>(std::make_shared<DataTypeUInt64>());
}
ColumnPtr executeImpl(const ColumnsWithTypeAndName &, const DataTypePtr & result_type, size_t input_rows_count) const override
{
if (input_rows_count == 0)
return result_type->createColumn();
std::vector<H3Index> res0_indexes;
const auto cell_count = res0CellCount();
res0_indexes.resize(cell_count);
getRes0Cells(res0_indexes.data());
auto res = ColumnArray::create(ColumnUInt64::create());
Array res_indexes;
res_indexes.insert(res_indexes.end(), res0_indexes.begin(), res0_indexes.end());
res->insert(res_indexes);
return result_type->createColumnConst(input_rows_count, res_indexes);
}
};
}
void registerFunctionH3GetRes0Indexes(FunctionFactory & factory)
{
factory.registerFunction<FunctionH3GetRes0Indexes>();
}
}
#endif

View File

@ -0,0 +1,153 @@
#include "config_functions.h"
#if USE_H3
#include <Columns/ColumnsNumber.h>
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypeTuple.h>
#include <Functions/FunctionFactory.h>
#include <Functions/IFunction.h>
#include <IO/WriteHelpers.h>
#include <Common/typeid_cast.h>
#include <base/range.h>
#include <constants.h>
#include <h3api.h>
namespace DB
{
namespace ErrorCodes
{
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int ILLEGAL_COLUMN;
}
namespace
{
template <class Impl>
class FunctionH3PointDist final : public IFunction
{
public:
static constexpr auto name = Impl::name;
static constexpr auto function = Impl::function;
static FunctionPtr create(ContextPtr) { return std::make_shared<FunctionH3PointDist>(); }
std::string getName() const override { return name; }
size_t getNumberOfArguments() const override { return 4; }
bool useDefaultImplementationForConstants() const override { return true; }
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; }
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
{
for (size_t i = 0; i < getNumberOfArguments(); ++i)
{
const auto * arg = arguments[i].get();
if (!WhichDataType(arg).isFloat64())
throw Exception(
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Illegal type {} of argument {} of function {}. Must be Float64",
arg->getName(), i, getName());
}
return std::make_shared<DataTypeFloat64>();
}
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override
{
auto non_const_arguments = arguments;
for (auto & argument : non_const_arguments)
argument.column = argument.column->convertToFullColumnIfConst();
const auto * col_lat1 = checkAndGetColumn<ColumnFloat64>(non_const_arguments[0].column.get());
if (!col_lat1)
throw Exception(
ErrorCodes::ILLEGAL_COLUMN,
"Illegal type {} of argument {} of function {}. Must be Float64",
arguments[0].type->getName(),
1,
getName());
const auto & data_lat1 = col_lat1->getData();
const auto * col_lon1 = checkAndGetColumn<ColumnFloat64>(non_const_arguments[1].column.get());
if (!col_lon1)
throw Exception(
ErrorCodes::ILLEGAL_COLUMN,
"Illegal type {} of argument {} of function {}. Must be Float64",
arguments[1].type->getName(),
2,
getName());
const auto & data_lon1 = col_lon1->getData();
const auto * col_lat2 = checkAndGetColumn<ColumnFloat64>(non_const_arguments[2].column.get());
if (!col_lat2)
throw Exception(
ErrorCodes::ILLEGAL_COLUMN,
"Illegal type {} of argument {} of function {}. Must be Float64",
arguments[2].type->getName(),
3,
getName());
const auto & data_lat2 = col_lat2->getData();
const auto * col_lon2 = checkAndGetColumn<ColumnFloat64>(non_const_arguments[3].column.get());
if (!col_lon2)
throw Exception(
ErrorCodes::ILLEGAL_COLUMN,
"Illegal type {} of argument {} of function {}. Must be Float64",
arguments[3].type->getName(),
4,
getName());
const auto & data_lon2 = col_lon2->getData();
auto dst = ColumnVector<Float64>::create();
auto & dst_data = dst->getData();
dst_data.resize(input_rows_count);
for (size_t row = 0; row < input_rows_count; ++row)
{
const double lat1 = data_lat1[row];
const double lon1 = data_lon1[row];
const auto lat2 = data_lat2[row];
const auto lon2 = data_lon2[row];
LatLng point1 = {degsToRads(lat1), degsToRads(lon1)};
LatLng point2 = {degsToRads(lat2), degsToRads(lon2)};
// 'function' resolves to distanceM, distanceKm or distanceRads depending on the Impl parameter
Float64 res = function(&point1, &point2);
dst_data[row] = res;
}
return dst;
}
};
}
struct H3PointDistM
{
static constexpr auto name = "h3PointDistM";
static constexpr auto function = distanceM;
};
struct H3PointDistKm
{
static constexpr auto name = "h3PointDistKm";
static constexpr auto function = distanceKm;
};
struct H3PointDistRads
{
static constexpr auto name = "h3PointDistRads";
static constexpr auto function = distanceRads;
};
void registerFunctionH3PointDistM(FunctionFactory & factory) { factory.registerFunction<FunctionH3PointDist<H3PointDistM>>(); }
void registerFunctionH3PointDistKm(FunctionFactory & factory) { factory.registerFunction<FunctionH3PointDist<H3PointDistKm>>(); }
void registerFunctionH3PointDistRads(FunctionFactory & factory) { factory.registerFunction<FunctionH3PointDist<H3PointDistRads>>(); }
}
#endif
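A standalone sketch of the H3 distance API wrapped above, assuming the H3 library is linked; the coordinates are hypothetical examples. Inputs are degrees and are converted to radians before the call, exactly as executeImpl does.
#include <h3api.h>
#include <cstdio>
int main()
{
/// LatLng holds radians, so degrees are converted first.
LatLng moscow = {degsToRads(55.7558), degsToRads(37.6173)};
LatLng paris = {degsToRads(48.8566), degsToRads(2.3522)};
/// distanceM / distanceKm / distanceRads back h3PointDistM / h3PointDistKm / h3PointDistRads respectively.
std::printf("m: %f, km: %f, rad: %f\n",
distanceM(&moscow, &paris),
distanceKm(&moscow, &paris),
distanceRads(&moscow, &paris));
}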

View File

@ -0,0 +1,291 @@
#include <cfloat>
#include <cmath>
#include <boost/math/distributions/normal.hpp>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypesDecimal.h>
#include <DataTypes/DataTypesNumber.h>
#include <Columns/ColumnTuple.h>
#include <Columns/ColumnsNumber.h>
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionHelpers.h>
#include <Functions/IFunction.h>
#include <Functions/castTypeToEither.h>
#include <Interpreters/castColumn.h>
namespace DB
{
namespace ErrorCodes
{
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
}
template <typename Impl>
class FunctionMinSampleSize : public IFunction
{
public:
static constexpr auto name = Impl::name;
static FunctionPtr create(ContextPtr) { return std::make_shared<FunctionMinSampleSize<Impl>>(); }
String getName() const override { return name; }
size_t getNumberOfArguments() const override { return Impl::num_args; }
ColumnNumbers getArgumentsThatAreAlwaysConstant() const override
{
return ColumnNumbers(std::begin(Impl::const_args), std::end(Impl::const_args));
}
bool useDefaultImplementationForNulls() const override { return false; }
bool useDefaultImplementationForConstants() const override { return true; }
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; }
static DataTypePtr getReturnType()
{
auto float_64_type = std::make_shared<DataTypeNumber<Float64>>();
DataTypes types{
float_64_type,
float_64_type,
float_64_type,
};
Strings names{
"minimum_sample_size",
"detect_range_lower",
"detect_range_upper",
};
return std::make_shared<DataTypeTuple>(std::move(types), std::move(names));
}
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
{
Impl::validateArguments(arguments);
return getReturnType();
}
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override
{
return Impl::execute(arguments, input_rows_count);
}
};
static bool isBetweenZeroAndOne(Float64 v)
{
return v >= 0.0 && v <= 1.0 && fabs(v - 0.0) >= DBL_EPSILON && fabs(v - 1.0) >= DBL_EPSILON;
}
struct ContinousImpl
{
static constexpr auto name = "minSampleSizeContinous";
static constexpr size_t num_args = 5;
static constexpr size_t const_args[] = {2, 3, 4};
static void validateArguments(const DataTypes & arguments)
{
for (size_t i = 0; i < arguments.size(); ++i)
{
if (!isNativeNumber(arguments[i]))
{
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "The {}th Argument of function {} must be a number.", i + 1, name);
}
}
}
static ColumnPtr execute(const ColumnsWithTypeAndName & arguments, size_t input_rows_count)
{
auto float_64_type = std::make_shared<DataTypeFloat64>();
auto baseline_argument = arguments[0];
baseline_argument.column = baseline_argument.column->convertToFullColumnIfConst();
auto baseline_column_untyped = castColumnAccurate(baseline_argument, float_64_type);
const auto * baseline_column = checkAndGetColumn<ColumnVector<Float64>>(*baseline_column_untyped);
const auto & baseline_column_data = baseline_column->getData();
auto sigma_argument = arguments[1];
sigma_argument.column = sigma_argument.column->convertToFullColumnIfConst();
auto sigma_column_untyped = castColumnAccurate(sigma_argument, float_64_type);
const auto * sigma_column = checkAndGetColumn<ColumnVector<Float64>>(*sigma_column_untyped);
const auto & sigma_column_data = sigma_column->getData();
const IColumn & col_mde = *arguments[2].column;
const IColumn & col_power = *arguments[3].column;
const IColumn & col_alpha = *arguments[4].column;
auto res_min_sample_size = ColumnFloat64::create();
auto & data_min_sample_size = res_min_sample_size->getData();
data_min_sample_size.reserve(input_rows_count);
auto res_detect_lower = ColumnFloat64::create();
auto & data_detect_lower = res_detect_lower->getData();
data_detect_lower.reserve(input_rows_count);
auto res_detect_upper = ColumnFloat64::create();
auto & data_detect_upper = res_detect_upper->getData();
data_detect_upper.reserve(input_rows_count);
/// Minimal Detectable Effect
const Float64 mde = col_mde.getFloat64(0);
/// Sufficient statistical power to detect a treatment effect
const Float64 power = col_power.getFloat64(0);
/// Significance level
const Float64 alpha = col_alpha.getFloat64(0);
boost::math::normal_distribution<> nd(0.0, 1.0);
for (size_t row_num = 0; row_num < input_rows_count; ++row_num)
{
/// Mean of control-metric
Float64 baseline = baseline_column_data[row_num];
/// Standard deviation of control-metric
Float64 sigma = sigma_column_data[row_num];
if (!std::isfinite(baseline) || !std::isfinite(sigma) || !isBetweenZeroAndOne(mde) || !isBetweenZeroAndOne(power)
|| !isBetweenZeroAndOne(alpha))
{
data_min_sample_size.emplace_back(std::numeric_limits<Float64>::quiet_NaN());
data_detect_lower.emplace_back(std::numeric_limits<Float64>::quiet_NaN());
data_detect_upper.emplace_back(std::numeric_limits<Float64>::quiet_NaN());
continue;
}
Float64 delta = baseline * mde;
using namespace boost::math;
/// https://towardsdatascience.com/required-sample-size-for-a-b-testing-6f6608dd330a
/// \frac{2\sigma^{2} * (Z_{1 - alpha /2} + Z_{power})^{2}}{\Delta^{2}}
Float64 min_sample_size
= 2 * std::pow(sigma, 2) * std::pow(quantile(nd, 1.0 - alpha / 2) + quantile(nd, power), 2) / std::pow(delta, 2);
data_min_sample_size.emplace_back(min_sample_size);
data_detect_lower.emplace_back(baseline - delta);
data_detect_upper.emplace_back(baseline + delta);
}
return ColumnTuple::create(Columns{std::move(res_min_sample_size), std::move(res_detect_lower), std::move(res_detect_upper)});
}
};
struct ConversionImpl
{
static constexpr auto name = "minSampleSizeConversion";
static constexpr size_t num_args = 4;
static constexpr size_t const_args[] = {1, 2, 3};
static void validateArguments(const DataTypes & arguments)
{
size_t arguments_size = arguments.size();
for (size_t i = 0; i < arguments_size; ++i)
{
if (!isFloat(arguments[i]))
{
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "The {}th argument of function {} must be a float.", i + 1, name);
}
}
}
static ColumnPtr execute(const ColumnsWithTypeAndName & arguments, size_t input_rows_count)
{
auto first_argument_column = castColumnAccurate(arguments[0], std::make_shared<DataTypeFloat64>());
if (const ColumnConst * const col_p1_const = checkAndGetColumnConst<ColumnVector<Float64>>(first_argument_column.get()))
{
const Float64 left_value = col_p1_const->template getValue<Float64>();
return process<true>(arguments, &left_value, input_rows_count);
}
else if (const ColumnVector<Float64> * const col_p1 = checkAndGetColumn<ColumnVector<Float64>>(first_argument_column.get()))
{
return process<false>(arguments, col_p1->getData().data(), input_rows_count);
}
else
{
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "The first argument of function {} must be a float.", name);
}
}
template <bool const_p1>
static ColumnPtr process(const ColumnsWithTypeAndName & arguments, const Float64 * col_p1, const size_t input_rows_count)
{
const IColumn & col_mde = *arguments[1].column;
const IColumn & col_power = *arguments[2].column;
const IColumn & col_alpha = *arguments[3].column;
auto res_min_sample_size = ColumnFloat64::create();
auto & data_min_sample_size = res_min_sample_size->getData();
data_min_sample_size.reserve(input_rows_count);
auto res_detect_lower = ColumnFloat64::create();
auto & data_detect_lower = res_detect_lower->getData();
data_detect_lower.reserve(input_rows_count);
auto res_detect_upper = ColumnFloat64::create();
auto & data_detect_upper = res_detect_upper->getData();
data_detect_upper.reserve(input_rows_count);
/// Minimal Detectable Effect
const Float64 mde = col_mde.getFloat64(0);
/// Sufficient statistical power to detect a treatment effect
const Float64 power = col_power.getFloat64(0);
/// Significance level
const Float64 alpha = col_alpha.getFloat64(0);
boost::math::normal_distribution<> nd(0.0, 1.0);
for (size_t row_num = 0; row_num < input_rows_count; ++row_num)
{
/// Proportion of control-metric
Float64 p1;
if constexpr (const_p1)
{
p1 = col_p1[0];
}
else if constexpr (!const_p1)
{
p1 = col_p1[row_num];
}
if (!std::isfinite(p1) || !isBetweenZeroAndOne(mde) || !isBetweenZeroAndOne(power) || !isBetweenZeroAndOne(alpha))
{
data_min_sample_size.emplace_back(std::numeric_limits<Float64>::quiet_NaN());
data_detect_lower.emplace_back(std::numeric_limits<Float64>::quiet_NaN());
data_detect_upper.emplace_back(std::numeric_limits<Float64>::quiet_NaN());
continue;
}
Float64 q1 = 1.0 - p1;
Float64 p2 = p1 + mde;
Float64 q2 = 1.0 - p2;
Float64 p_bar = (p1 + p2) / 2.0;
Float64 q_bar = 1.0 - p_bar;
using namespace boost::math;
/// https://towardsdatascience.com/required-sample-size-for-a-b-testing-6f6608dd330a
/// \frac{(Z_{1-alpha/2} * \sqrt{2*\bar{p}*\bar{q}} + Z_{power} * \sqrt{p1*q1+p2*q2})^{2}}{\Delta^{2}}
Float64 min_sample_size
= std::pow(
quantile(nd, 1.0 - alpha / 2.0) * std::sqrt(2.0 * p_bar * q_bar) + quantile(nd, power) * std::sqrt(p1 * q1 + p2 * q2),
2)
/ std::pow(mde, 2);
data_min_sample_size.emplace_back(min_sample_size);
data_detect_lower.emplace_back(p1 - mde);
data_detect_upper.emplace_back(p1 + mde);
}
return ColumnTuple::create(Columns{std::move(res_min_sample_size), std::move(res_detect_lower), std::move(res_detect_upper)});
}
};
void registerFunctionMinSampleSize(FunctionFactory & factory)
{
factory.registerFunction<FunctionMinSampleSize<ContinousImpl>>();
factory.registerFunction<FunctionMinSampleSize<ConversionImpl>>();
}
}
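For readability, the sample-size formulas from the comments above written out; Delta is baseline * mde for minSampleSizeContinous and mde itself for minSampleSizeConversion, and Z_x denotes the standard normal quantile:
n_{\text{continuous}} \ge \frac{2\sigma^{2}\,\bigl(Z_{1-\alpha/2} + Z_{\text{power}}\bigr)^{2}}{\Delta^{2}}, \qquad \Delta = \text{baseline} \cdot \text{mde}
n_{\text{conversion}} \ge \frac{\bigl(Z_{1-\alpha/2}\sqrt{2\bar{p}\bar{q}} + Z_{\text{power}}\sqrt{p_{1}q_{1} + p_{2}q_{2}}\bigr)^{2}}{\text{mde}^{2}}, \qquad p_{2} = p_{1} + \text{mde},\; q_{i} = 1 - p_{i},\; \bar{p} = \tfrac{p_{1}+p_{2}}{2},\; \bar{q} = 1 - \bar{p}
With the common choice alpha = 0.05 and power = 0.8, (Z_{0.975} + Z_{0.8})^2 ≈ (1.96 + 0.84)^2 ≈ 7.85, so the continuous case reduces to roughly 15.7 * sigma^2 / Delta^2.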

View File

@ -56,6 +56,7 @@ void registerFunctionTid(FunctionFactory & factory);
void registerFunctionLogTrace(FunctionFactory & factory);
void registerFunctionsTimeWindow(FunctionFactory &);
void registerFunctionToBool(FunctionFactory &);
void registerFunctionMinSampleSize(FunctionFactory &);
#if USE_SSL
void registerFunctionEncrypt(FunctionFactory & factory);
@ -118,6 +119,7 @@ void registerFunctions()
registerFunctionsSnowflake(factory);
registerFunctionsTimeWindow(factory);
registerFunctionToBool(factory);
registerFunctionMinSampleSize(factory);
#if USE_SSL
registerFunctionEncrypt(factory);

View File

@ -52,6 +52,11 @@ void registerFunctionH3HexAreaKm2(FunctionFactory &);
void registerFunctionH3CellAreaM2(FunctionFactory &);
void registerFunctionH3CellAreaRads2(FunctionFactory &);
void registerFunctionH3NumHexagons(FunctionFactory &);
void registerFunctionH3PointDistM(FunctionFactory &);
void registerFunctionH3PointDistKm(FunctionFactory &);
void registerFunctionH3PointDistRads(FunctionFactory &);
void registerFunctionH3GetRes0Indexes(FunctionFactory &);
void registerFunctionH3GetPentagonIndexes(FunctionFactory &);
#endif
@ -118,6 +123,11 @@ void registerFunctionsGeo(FunctionFactory & factory)
registerFunctionH3CellAreaM2(factory);
registerFunctionH3CellAreaRads2(factory);
registerFunctionH3NumHexagons(factory);
registerFunctionH3PointDistM(factory);
registerFunctionH3PointDistKm(factory);
registerFunctionH3PointDistRads(factory);
registerFunctionH3GetRes0Indexes(factory);
registerFunctionH3GetPentagonIndexes(factory);
#endif
#if USE_S2_GEOMETRY

View File

@ -1,231 +1,225 @@
#include <Common/typeid_cast.h>
#include <Columns/IColumn.h>
#include <Columns/ColumnNullable.h>
#include <Columns/ColumnTuple.h>
#include <Columns/ColumnString.h>
#include <Columns/ColumnTuple.h>
#include <Columns/ColumnsNumber.h>
#include <Columns/IColumn.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypesNumber.h>
#include <Functions/castTypeToEither.h>
#include <Functions/IFunction.h>
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionHelpers.h>
#include <Functions/IFunction.h>
#include <Functions/castTypeToEither.h>
#include <Interpreters/castColumn.h>
#include <boost/math/distributions/normal.hpp>
#include <Common/typeid_cast.h>
namespace DB
{
namespace ErrorCodes
namespace ErrorCodes
{
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int BAD_ARGUMENTS;
}
class FunctionTwoSampleProportionsZTest : public IFunction
{
public:
static constexpr auto POOLED = "pooled";
static constexpr auto UNPOOLED = "unpooled";
static constexpr auto name = "proportionsZTest";
static FunctionPtr create(ContextPtr) { return std::make_shared<FunctionTwoSampleProportionsZTest>(); }
String getName() const override { return name; }
size_t getNumberOfArguments() const override { return 6; }
ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {5}; }
bool useDefaultImplementationForNulls() const override { return false; }
bool useDefaultImplementationForConstants() const override { return true; }
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; }
static DataTypePtr getReturnType()
{
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int BAD_ARGUMENTS;
auto float_data_type = std::make_shared<DataTypeNumber<Float64>>();
DataTypes types(4, float_data_type);
Strings names{"z_statistic", "p_value", "confidence_interval_low", "confidence_interval_high"};
return std::make_shared<DataTypeTuple>(std::move(types), std::move(names));
}
DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override
{
for (size_t i = 0; i < 4; ++i)
{
if (!isUnsignedInteger(arguments[i].type))
{
throw Exception(
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"The {}th Argument of function {} must be an unsigned integer.",
i + 1,
getName());
}
}
if (!isFloat(arguments[4].type))
{
throw Exception{ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"The fifth argument {} of function {} should be a float,",
arguments[4].type->getName(),
getName()};
}
/// There is an additional check for constancy in ExecuteImpl
if (!isString(arguments[5].type) || !arguments[5].column)
{
throw Exception{ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"The sixth argument {} of function {} should be a constant string",
arguments[5].type->getName(),
getName()};
}
return getReturnType();
}
class FunctionTwoSampleProportionsZTest : public IFunction
ColumnPtr executeImpl(const ColumnsWithTypeAndName & const_arguments, const DataTypePtr &, size_t input_rows_count) const override
{
public:
static constexpr auto POOLED = "pooled";
static constexpr auto UNPOOLED = "unpooled";
auto arguments = const_arguments;
/// Only the last argument has to be constant
for (size_t i = 0; i < 5; ++i)
arguments[i].column = arguments[i].column->convertToFullColumnIfConst();
static constexpr auto name = "proportionsZTest";
static const auto uint64_data_type = std::make_shared<DataTypeNumber<UInt64>>();
static FunctionPtr create(ContextPtr)
auto column_successes_x = castColumnAccurate(arguments[0], uint64_data_type);
const auto & data_successes_x = checkAndGetColumn<ColumnVector<UInt64>>(column_successes_x.get())->getData();
auto column_successes_y = castColumnAccurate(arguments[1], uint64_data_type);
const auto & data_successes_y = checkAndGetColumn<ColumnVector<UInt64>>(column_successes_y.get())->getData();
auto column_trials_x = castColumnAccurate(arguments[2], uint64_data_type);
const auto & data_trials_x = checkAndGetColumn<ColumnVector<UInt64>>(column_trials_x.get())->getData();
auto column_trials_y = castColumnAccurate(arguments[3], uint64_data_type);
const auto & data_trials_y = checkAndGetColumn<ColumnVector<UInt64>>(column_trials_y.get())->getData();
static const auto float64_data_type = std::make_shared<DataTypeNumber<Float64>>();
auto column_confidence_level = castColumnAccurate(arguments[4], float64_data_type);
const auto & data_confidence_level = checkAndGetColumn<ColumnVector<Float64>>(column_confidence_level.get())->getData();
String usevar = checkAndGetColumnConst<ColumnString>(arguments[5].column.get())->getValue<String>();
if (usevar != UNPOOLED && usevar != POOLED)
throw Exception{ErrorCodes::BAD_ARGUMENTS,
"The sixth argument {} of function {} must be equal to `pooled` or `unpooled`",
arguments[5].type->getName(),
getName()};
const bool is_unpooled = (usevar == UNPOOLED);
auto res_z_statistic = ColumnFloat64::create();
auto & data_z_statistic = res_z_statistic->getData();
data_z_statistic.reserve(input_rows_count);
auto res_p_value = ColumnFloat64::create();
auto & data_p_value = res_p_value->getData();
data_p_value.reserve(input_rows_count);
auto res_ci_lower = ColumnFloat64::create();
auto & data_ci_lower = res_ci_lower->getData();
data_ci_lower.reserve(input_rows_count);
auto res_ci_upper = ColumnFloat64::create();
auto & data_ci_upper = res_ci_upper->getData();
data_ci_upper.reserve(input_rows_count);
auto insert_values_into_result = [&data_z_statistic, &data_p_value, &data_ci_lower, &data_ci_upper](
Float64 z_stat, Float64 p_value, Float64 lower, Float64 upper)
{
return std::make_shared<FunctionTwoSampleProportionsZTest>();
}
data_z_statistic.emplace_back(z_stat);
data_p_value.emplace_back(p_value);
data_ci_lower.emplace_back(lower);
data_ci_upper.emplace_back(upper);
};
String getName() const override
static constexpr Float64 nan = std::numeric_limits<Float64>::quiet_NaN();
boost::math::normal_distribution<> nd(0.0, 1.0);
for (size_t row_num = 0; row_num < input_rows_count; ++row_num)
{
return name;
}
const UInt64 successes_x = data_successes_x[row_num];
const UInt64 successes_y = data_successes_y[row_num];
const UInt64 trials_x = data_trials_x[row_num];
const UInt64 trials_y = data_trials_y[row_num];
const Float64 confidence_level = data_confidence_level[row_num];
size_t getNumberOfArguments() const override { return 6; }
ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {5}; }
const Float64 props_x = static_cast<Float64>(successes_x) / trials_x;
const Float64 props_y = static_cast<Float64>(successes_y) / trials_y;
const Float64 diff = props_x - props_y;
const UInt64 trials_total = trials_x + trials_y;
bool useDefaultImplementationForNulls() const override { return false; }
bool useDefaultImplementationForConstants() const override { return true; }
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; }
static DataTypePtr getReturnType()
{
auto float_data_type = std::make_shared<DataTypeNumber<Float64>>();
DataTypes types(4, float_data_type);
Strings names
if (successes_x == 0 || successes_y == 0 || successes_x > trials_x || successes_y > trials_y || trials_total == 0
|| !std::isfinite(confidence_level) || confidence_level < 0.0 || confidence_level > 1.0)
{
"z_statistic",
"p_value",
"confidence_interval_low",
"confidence_interval_high"
};
return std::make_shared<DataTypeTuple>(
std::move(types),
std::move(names)
);
}
DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override
{
for (size_t i = 0; i < 4; ++i)
{
if (!isUnsignedInteger(arguments[i].type))
{
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"The {}th Argument of function {} must be an unsigned integer.", i + 1, getName());
}
insert_values_into_result(nan, nan, nan, nan);
continue;
}
if (!isFloat(arguments[4].type))
Float64 se = std::sqrt(props_x * (1.0 - props_x) / trials_x + props_y * (1.0 - props_y) / trials_y);
/// z-statistics
/// z = \frac{ \bar{p_{1}} - \bar{p_{2}} }{ \sqrt{ \frac{ \bar{p_{1}} \left ( 1 - \bar{p_{1}} \right ) }{ n_{1} } + \frac{ \bar{p_{2}} \left ( 1 - \bar{p_{2}} \right ) }{ n_{2} } } }
Float64 zstat;
if (is_unpooled)
{
throw Exception{ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"The fifth argument {} of function {} should be a float,", arguments[4].type->getName(), getName()};
zstat = (props_x - props_y) / se;
}
else
{
UInt64 successes_total = successes_x + successes_y;
Float64 p_pooled = static_cast<Float64>(successes_total) / trials_total;
Float64 trials_fact = 1.0 / trials_x + 1.0 / trials_y;
zstat = diff / std::sqrt(p_pooled * (1.0 - p_pooled) * trials_fact);
}
/// There is an additional check for constancy in ExecuteImpl
if (!isString(arguments[5].type) || !arguments[5].column)
if (!std::isfinite(zstat))
{
throw Exception{ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"The sixth argument {} of function {} should be a constant string", arguments[5].type->getName(), getName()};
insert_values_into_result(nan, nan, nan, nan);
continue;
}
return getReturnType();
// pvalue
Float64 pvalue = 0;
Float64 one_side = 1 - boost::math::cdf(nd, std::abs(zstat));
pvalue = one_side * 2;
// Confidence intervals
Float64 d = props_x - props_y;
Float64 z = -boost::math::quantile(nd, (1.0 - confidence_level) / 2.0);
Float64 dist = z * se;
Float64 ci_low = d - dist;
Float64 ci_high = d + dist;
insert_values_into_result(zstat, pvalue, ci_low, ci_high);
}
ColumnPtr executeImpl(const ColumnsWithTypeAndName & const_arguments, const DataTypePtr &, size_t input_rows_count) const override
{
auto arguments = const_arguments;
/// Only the last argument has to be constant
for (size_t i = 0; i < 5; ++i)
arguments[i].column = arguments[i].column->convertToFullColumnIfConst();
static const auto uint64_data_type = std::make_shared<DataTypeNumber<UInt64>>();
auto column_successes_x = castColumnAccurate(arguments[0], uint64_data_type);
const auto & data_successes_x = checkAndGetColumn<ColumnVector<UInt64>>(column_successes_x.get())->getData();
auto column_successes_y = castColumnAccurate(arguments[1], uint64_data_type);
const auto & data_successes_y = checkAndGetColumn<ColumnVector<UInt64>>(column_successes_y.get())->getData();
auto column_trials_x = castColumnAccurate(arguments[2], uint64_data_type);
const auto & data_trials_x = checkAndGetColumn<ColumnVector<UInt64>>(column_trials_x.get())->getData();
auto column_trials_y = castColumnAccurate(arguments[3], uint64_data_type);
const auto & data_trials_y = checkAndGetColumn<ColumnVector<UInt64>>(column_trials_y.get())->getData();
static const auto float64_data_type = std::make_shared<DataTypeNumber<Float64>>();
auto column_confidence_level = castColumnAccurate(arguments[4], float64_data_type);
const auto & data_confidence_level = checkAndGetColumn<ColumnVector<Float64>>(column_confidence_level.get())->getData();
String usevar = checkAndGetColumnConst<ColumnString>(arguments[5].column.get())->getValue<String>();
if (usevar != UNPOOLED && usevar != POOLED)
throw Exception{ErrorCodes::BAD_ARGUMENTS,
"The sixth argument {} of function {} must be equal to `pooled` or `unpooled`", arguments[5].type->getName(), getName()};
const bool is_unpooled = (usevar == UNPOOLED);
auto res_z_statistic = ColumnFloat64::create();
auto & data_z_statistic = res_z_statistic->getData();
data_z_statistic.reserve(input_rows_count);
auto res_p_value = ColumnFloat64::create();
auto & data_p_value = res_p_value->getData();
data_p_value.reserve(input_rows_count);
auto res_ci_lower = ColumnFloat64::create();
auto & data_ci_lower = res_ci_lower->getData();
data_ci_lower.reserve(input_rows_count);
auto res_ci_upper = ColumnFloat64::create();
auto & data_ci_upper = res_ci_upper->getData();
data_ci_upper.reserve(input_rows_count);
auto insert_values_into_result = [&data_z_statistic, &data_p_value, &data_ci_lower, &data_ci_upper](Float64 z_stat, Float64 p_value, Float64 lower, Float64 upper)
{
data_z_statistic.emplace_back(z_stat);
data_p_value.emplace_back(p_value);
data_ci_lower.emplace_back(lower);
data_ci_upper.emplace_back(upper);
};
static constexpr Float64 nan = std::numeric_limits<Float64>::quiet_NaN();
boost::math::normal_distribution<> nd(0.0, 1.0);
for (size_t row_num = 0; row_num < input_rows_count; ++row_num)
{
const UInt64 successes_x = data_successes_x[row_num];
const UInt64 successes_y = data_successes_y[row_num];
const UInt64 trials_x = data_trials_x[row_num];
const UInt64 trials_y = data_trials_y[row_num];
const Float64 confidence_level = data_confidence_level[row_num];
const Float64 props_x = static_cast<Float64>(successes_x) / trials_x;
const Float64 props_y = static_cast<Float64>(successes_y) / trials_y;
const Float64 diff = props_x - props_y;
const UInt64 trials_total = trials_x + trials_y;
if (successes_x == 0 || successes_y == 0
|| successes_x > trials_x || successes_y > trials_y
|| trials_total == 0
|| !std::isfinite(confidence_level) || confidence_level < 0.0 || confidence_level > 1.0)
{
insert_values_into_result(nan, nan, nan, nan);
continue;
}
Float64 se = std::sqrt(props_x * (1.0 - props_x) / trials_x + props_y * (1.0 - props_y) / trials_y);
/// z-statistics
/// z = \frac{ \bar{p_{1}} - \bar{p_{2}} }{ \sqrt{ \frac{ \bar{p_{1}} \left ( 1 - \bar{p_{1}} \right ) }{ n_{1} } + \frac{ \bar{p_{2}} \left ( 1 - \bar{p_{2}} \right ) }{ n_{2} } } }
Float64 zstat;
if (is_unpooled)
{
zstat = (props_x - props_y) / se;
}
else
{
UInt64 successes_total = successes_x + successes_y;
Float64 p_pooled = static_cast<Float64>(successes_total) / trials_total;
Float64 trials_fact = 1.0 / trials_x + 1.0 / trials_y;
zstat = diff / std::sqrt(p_pooled * (1.0 - p_pooled) * trials_fact);
}
if (!std::isfinite(zstat))
{
insert_values_into_result(nan, nan, nan, nan);
continue;
}
// pvalue
Float64 pvalue = 0;
Float64 one_side = 1 - boost::math::cdf(nd, std::abs(zstat));
pvalue = one_side * 2;
// Confidence intervals
Float64 d = props_x - props_y;
Float64 z = -boost::math::quantile(nd, (1.0 - confidence_level) / 2.0);
Float64 dist = z * se;
Float64 ci_low = d - dist;
Float64 ci_high = d + dist;
insert_values_into_result(zstat, pvalue, ci_low, ci_high);
}
return ColumnTuple::create(Columns{std::move(res_z_statistic), std::move(res_p_value), std::move(res_ci_lower), std::move(res_ci_upper)});
}
};
void registerFunctionZTest(FunctionFactory & factory)
{
factory.registerFunction<FunctionTwoSampleProportionsZTest>();
return ColumnTuple::create(
Columns{std::move(res_z_statistic), std::move(res_p_value), std::move(res_ci_lower), std::move(res_ci_upper)});
}
};
void registerFunctionZTest(FunctionFactory & factory)
{
factory.registerFunction<FunctionTwoSampleProportionsZTest>();
}
}
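For readability, the statistics computed above written out; x_i are successes, n_i trials, c the confidence level, and Phi the standard normal CDF. The unpooled standard error is always used for the confidence interval, matching the code:
SE = \sqrt{\frac{\hat{p}_{1}(1-\hat{p}_{1})}{n_{1}} + \frac{\hat{p}_{2}(1-\hat{p}_{2})}{n_{2}}}, \qquad \hat{p}_{i} = \frac{x_{i}}{n_{i}}
z_{\text{unpooled}} = \frac{\hat{p}_{1} - \hat{p}_{2}}{SE}, \qquad z_{\text{pooled}} = \frac{\hat{p}_{1} - \hat{p}_{2}}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_{1}} + \frac{1}{n_{2}}\right)}}, \quad \hat{p} = \frac{x_{1}+x_{2}}{n_{1}+n_{2}}
p\text{-value} = 2\,\bigl(1 - \Phi(|z|)\bigr), \qquad \text{CI} = (\hat{p}_{1} - \hat{p}_{2}) \pm \Phi^{-1}\!\left(\tfrac{1+c}{2}\right) SE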

View File

@ -517,7 +517,7 @@ private:
class S3CredentialsProviderChain : public Aws::Auth::AWSCredentialsProviderChain
{
public:
explicit S3CredentialsProviderChain(const DB::S3::PocoHTTPClientConfiguration & configuration, const Aws::Auth::AWSCredentials & credentials, bool use_environment_credentials, bool use_insecure_imds_request)
S3CredentialsProviderChain(const DB::S3::PocoHTTPClientConfiguration & configuration, const Aws::Auth::AWSCredentials & credentials, bool use_environment_credentials, bool use_insecure_imds_request)
{
auto * logger = &Poco::Logger::get("S3CredentialsProviderChain");
@ -529,17 +529,18 @@ public:
static const char AWS_EC2_METADATA_DISABLED[] = "AWS_EC2_METADATA_DISABLED";
/// The only difference from DefaultAWSCredentialsProviderChain::DefaultAWSCredentialsProviderChain()
/// is that this chain uses custom ClientConfiguration.
AddProvider(std::make_shared<Aws::Auth::EnvironmentAWSCredentialsProvider>());
AddProvider(std::make_shared<Aws::Auth::ProfileConfigFileAWSCredentialsProvider>());
AddProvider(std::make_shared<Aws::Auth::ProcessCredentialsProvider>());
/// is that this chain uses custom ClientConfiguration. Also, we removed the process provider because it's useless in our case.
///
/// AWS API tries credentials providers one by one. Some of providers (like ProfileConfigFileAWSCredentialsProvider) can be
/// quite verbose even if nobody configured them. So we use our provider first and only after it use default providers.
{
DB::S3::PocoHTTPClientConfiguration aws_client_configuration = DB::S3::ClientFactory::instance().createClientConfiguration(configuration.region, configuration.remote_host_filter, configuration.s3_max_redirects);
AddProvider(std::make_shared<AwsAuthSTSAssumeRoleWebIdentityCredentialsProvider>(aws_client_configuration));
}
AddProvider(std::make_shared<Aws::Auth::EnvironmentAWSCredentialsProvider>());
/// ECS TaskRole Credentials only available when ENVIRONMENT VARIABLE is set.
const auto relative_uri = Aws::Environment::GetEnv(AWS_ECS_CONTAINER_CREDENTIALS_RELATIVE_URI);
LOG_DEBUG(logger, "The environment variable value {} is {}", AWS_ECS_CONTAINER_CREDENTIALS_RELATIVE_URI,
@ -601,6 +602,9 @@ public:
}
AddProvider(std::make_shared<Aws::Auth::SimpleAWSCredentialsProvider>(credentials));
/// Quite a verbose provider (it complains if the credentials file doesn't exist), so it's the last one
/// in the chain.
AddProvider(std::make_shared<Aws::Auth::ProfileConfigFileAWSCredentialsProvider>());
}
};

View File

@ -113,6 +113,23 @@ namespace JoinStuff
}
}
template <bool use_flags, bool multiple_disjuncts>
void JoinUsedFlags::setUsed(const Block * block, size_t row_num, size_t offset)
{
if constexpr (!use_flags)
return;
/// Could be set simultaneously from different threads.
if constexpr (multiple_disjuncts)
{
flags[block][row_num].store(true, std::memory_order_relaxed);
}
else
{
flags[nullptr][offset].store(true, std::memory_order_relaxed);
}
}
template <bool use_flags, bool multiple_disjuncts, typename FindResult>
bool JoinUsedFlags::getUsed(const FindResult & f)
{
@ -302,7 +319,7 @@ HashJoin::HashJoin(std::shared_ptr<TableJoin> table_join_, const Block & right_s
throw Exception("ASOF join needs at least one equi-join column", ErrorCodes::SYNTAX_ERROR);
size_t asof_size;
asof_type = AsofRowRefs::getTypeSize(*key_columns.back(), asof_size);
asof_type = SortedLookupVectorBase::getTypeSize(*key_columns.back(), asof_size);
key_columns.pop_back();
/// this is going to set up the appropriate hash table for the direct lookup part of the join
@ -611,8 +628,8 @@ namespace
TypeIndex asof_type = *join.getAsofType();
if (emplace_result.isInserted())
time_series_map = new (time_series_map) typename Map::mapped_type(asof_type);
time_series_map->insert(asof_type, asof_column, stored_block, i);
time_series_map = new (time_series_map) typename Map::mapped_type(createAsofRowRef(asof_type, join.getAsofInequality()));
(*time_series_map)->insert(asof_column, stored_block, i);
}
};
@ -895,8 +912,6 @@ public:
bool is_join_get_)
: join_on_keys(join_on_keys_)
, rows_to_add(block.rows())
, asof_type(join.getAsofType())
, asof_inequality(join.getAsofInequality())
, is_join_get(is_join_get_)
{
size_t num_columns_to_add = block_with_columns_to_add.columns();
@ -978,8 +993,6 @@ public:
}
}
TypeIndex asofType() const { return *asof_type; }
ASOF::Inequality asofInequality() const { return asof_inequality; }
const IColumn & leftAsofKey() const { return *left_asof_key; }
std::vector<JoinOnKeyColumns> join_on_keys;
@ -994,8 +1007,6 @@ private:
std::vector<size_t> right_indexes;
size_t lazy_defaults_count = 0;
/// for ASOF
std::optional<TypeIndex> asof_type;
ASOF::Inequality asof_inequality;
const IColumn * left_asof_key = nullptr;
bool is_join_get;
@ -1224,19 +1235,18 @@ NO_INLINE IColumn::Filter joinRightColumns(
auto & mapped = find_result.getMapped();
if constexpr (jf.is_asof_join)
{
TypeIndex asof_type = added_columns.asofType();
ASOF::Inequality asof_inequality = added_columns.asofInequality();
const IColumn & left_asof_key = added_columns.leftAsofKey();
if (const RowRef * found = mapped.findAsof(asof_type, asof_inequality, left_asof_key, i))
auto [block, row_num] = mapped->findAsof(left_asof_key, i);
if (block)
{
setUsed<need_filter>(filter, i);
if constexpr (multiple_disjuncts)
used_flags.template setUsed<jf.need_flags, multiple_disjuncts>(FindResultImpl<const RowRef, false>(found, true, 0));
used_flags.template setUsed<jf.need_flags, multiple_disjuncts>(block, row_num, 0);
else
used_flags.template setUsed<jf.need_flags, multiple_disjuncts>(find_result);
added_columns.appendFromBlock<jf.add_missing>(*found->block, found->row_num);
added_columns.appendFromBlock<jf.add_missing>(*block, row_num);
}
else
addNotFoundRow<jf.add_missing, jf.need_replication>(added_columns, current_offset);

View File

@ -62,6 +62,9 @@ public:
template <bool use_flags, bool multiple_disjuncts, typename T>
void setUsed(const T & f);
template <bool use_flags, bool multiple_disjunct>
void setUsed(const Block * block, size_t row_num, size_t offset);
template <bool use_flags, bool multiple_disjuncts, typename T>
bool getUsed(const T & f);

View File

@ -1,12 +1,9 @@
#include <Interpreters/RowRefs.h>
#include <Core/Block.h>
#include <base/types.h>
#include <Common/typeid_cast.h>
#include <Common/ColumnsHashing.h>
#include <AggregateFunctions/Helpers.h>
#include <Columns/IColumn.h>
#include <Columns/ColumnVector.h>
#include <Columns/ColumnDecimal.h>
#include <DataTypes/IDataType.h>
#include <base/types.h>
namespace DB
@ -15,6 +12,7 @@ namespace DB
namespace ErrorCodes
{
extern const int BAD_TYPE_OF_FIELD;
extern const int LOGICAL_ERROR;
}
namespace
@ -22,145 +20,207 @@ namespace
/// maps enum values to types
template <typename F>
void callWithType(TypeIndex which, F && f)
void callWithType(TypeIndex type, F && f)
{
switch (which)
{
case TypeIndex::UInt8: return f(UInt8());
case TypeIndex::UInt16: return f(UInt16());
case TypeIndex::UInt32: return f(UInt32());
case TypeIndex::UInt64: return f(UInt64());
case TypeIndex::Int8: return f(Int8());
case TypeIndex::Int16: return f(Int16());
case TypeIndex::Int32: return f(Int32());
case TypeIndex::Int64: return f(Int64());
case TypeIndex::Float32: return f(Float32());
case TypeIndex::Float64: return f(Float64());
case TypeIndex::Decimal32: return f(Decimal32());
case TypeIndex::Decimal64: return f(Decimal64());
case TypeIndex::Decimal128: return f(Decimal128());
case TypeIndex::DateTime64: return f(DateTime64());
default:
break;
}
WhichDataType which(type);
#define DISPATCH(TYPE) \
if (which.idx == TypeIndex::TYPE) \
return f(TYPE());
FOR_NUMERIC_TYPES(DISPATCH)
DISPATCH(Decimal32)
DISPATCH(Decimal64)
DISPATCH(Decimal128)
DISPATCH(Decimal256)
DISPATCH(DateTime64)
#undef DISPATCH
__builtin_unreachable();
}
}
AsofRowRefs::AsofRowRefs(TypeIndex type)
template <typename TKey, ASOF::Inequality inequality>
class SortedLookupVector : public SortedLookupVectorBase
{
auto call = [&](const auto & t)
struct Entry
{
using T = std::decay_t<decltype(t)>;
using LookupType = typename Entry<T>::LookupType;
lookups = std::make_unique<LookupType>();
/// We don't store a RowRef and instead keep its members separately (and return a tuple) to reduce the memory usage.
/// For example, for sizeof(T) == 4 => sizeof(Entry) == 16 (while before it would be 20). When you put it into a vector, the effect is even greater.
decltype(RowRef::block) block;
decltype(RowRef::row_num) row_num;
TKey asof_value;
Entry() = delete;
Entry(TKey v, const Block * b, size_t r) : block(b), row_num(r), asof_value(v) { }
bool operator<(const Entry & other) const { return asof_value < other.asof_value; }
};
callWithType(type, call);
}
void AsofRowRefs::insert(TypeIndex type, const IColumn & asof_column, const Block * block, size_t row_num)
{
auto call = [&](const auto & t)
struct GreaterEntryOperator
{
using T = std::decay_t<decltype(t)>;
using LookupPtr = typename Entry<T>::LookupPtr;
using ColumnType = ColumnVectorOrDecimal<T>;
const auto & column = typeid_cast<const ColumnType &>(asof_column);
T key = column.getElement(row_num);
auto entry = Entry<T>(key, RowRef(block, row_num));
std::get<LookupPtr>(lookups)->insert(entry);
bool operator()(Entry const & a, Entry const & b) const { return a.asof_value > b.asof_value; }
};
callWithType(type, call);
}
const RowRef * AsofRowRefs::findAsof(TypeIndex type, ASOF::Inequality inequality, const IColumn & asof_column, size_t row_num) const
{
const RowRef * out = nullptr;
public:
using Base = std::vector<Entry>;
using Keys = std::vector<TKey>;
static constexpr bool isDescending = (inequality == ASOF::Inequality::Greater || inequality == ASOF::Inequality::GreaterOrEquals);
static constexpr bool isStrict = (inequality == ASOF::Inequality::Less) || (inequality == ASOF::Inequality::Greater);
bool ascending = (inequality == ASOF::Inequality::Less) || (inequality == ASOF::Inequality::LessOrEquals);
bool is_strict = (inequality == ASOF::Inequality::Less) || (inequality == ASOF::Inequality::Greater);
auto call = [&](const auto & t)
void insert(const IColumn & asof_column, const Block * block, size_t row_num) override
{
using T = std::decay_t<decltype(t)>;
using EntryType = Entry<T>;
using LookupPtr = typename EntryType::LookupPtr;
using ColumnType = ColumnVectorOrDecimal<TKey>;
const auto & column = assert_cast<const ColumnType &>(asof_column);
TKey k = column.getElement(row_num);
using ColumnType = ColumnVectorOrDecimal<T>;
const auto & column = typeid_cast<const ColumnType &>(asof_column);
T key = column.getElement(row_num);
auto & typed_lookup = std::get<LookupPtr>(lookups);
if (is_strict)
out = typed_lookup->upperBound(EntryType(key), ascending);
else
out = typed_lookup->lowerBound(EntryType(key), ascending);
};
callWithType(type, call);
return out;
}
std::optional<TypeIndex> AsofRowRefs::getTypeSize(const IColumn & asof_column, size_t & size)
{
TypeIndex idx = asof_column.getDataType();
switch (idx)
{
case TypeIndex::UInt8:
size = sizeof(UInt8);
return idx;
case TypeIndex::UInt16:
size = sizeof(UInt16);
return idx;
case TypeIndex::UInt32:
size = sizeof(UInt32);
return idx;
case TypeIndex::UInt64:
size = sizeof(UInt64);
return idx;
case TypeIndex::Int8:
size = sizeof(Int8);
return idx;
case TypeIndex::Int16:
size = sizeof(Int16);
return idx;
case TypeIndex::Int32:
size = sizeof(Int32);
return idx;
case TypeIndex::Int64:
size = sizeof(Int64);
return idx;
//case TypeIndex::Int128:
case TypeIndex::Float32:
size = sizeof(Float32);
return idx;
case TypeIndex::Float64:
size = sizeof(Float64);
return idx;
case TypeIndex::Decimal32:
size = sizeof(Decimal32);
return idx;
case TypeIndex::Decimal64:
size = sizeof(Decimal64);
return idx;
case TypeIndex::Decimal128:
size = sizeof(Decimal128);
return idx;
case TypeIndex::DateTime64:
size = sizeof(DateTime64);
return idx;
default:
break;
assert(!sorted.load(std::memory_order_acquire));
array.emplace_back(k, block, row_num);
}
/// Unrolled version of upper_bound and lower_bound
/// Loosely based on https://academy.realm.io/posts/how-we-beat-cpp-stl-binary-search/
/// In the future it'd be interesting to replace it with a B+Tree layout as described
/// at https://en.algorithmica.org/hpc/data-structures/s-tree/
size_t boundSearch(TKey value)
{
size_t size = array.size();
size_t low = 0;
/// This is a single binary search iteration as a macro to unroll. Takes into account the inequality:
/// isStrict -> Equal values are not requested
/// isDescending -> The vector is sorted in reverse (for greater or greaterOrEquals)
#define BOUND_ITERATION \
{ \
size_t half = size / 2; \
size_t other_half = size - half; \
size_t probe = low + half; \
size_t other_low = low + other_half; \
TKey v = array[probe].asof_value; \
size = half; \
if constexpr (isDescending) \
{ \
if constexpr (isStrict) \
low = value <= v ? other_low : low; \
else \
low = value < v ? other_low : low; \
} \
else \
{ \
if constexpr (isStrict) \
low = value >= v ? other_low : low; \
else \
low = value > v ? other_low : low; \
} \
}
while (size >= 8)
{
BOUND_ITERATION
BOUND_ITERATION
BOUND_ITERATION
}
while (size > 0)
{
BOUND_ITERATION
}
#undef BOUND_ITERATION
return low;
}
std::tuple<decltype(RowRef::block), decltype(RowRef::row_num)> findAsof(const IColumn & asof_column, size_t row_num) override
{
sort();
using ColumnType = ColumnVectorOrDecimal<TKey>;
const auto & column = assert_cast<const ColumnType &>(asof_column);
TKey k = column.getElement(row_num);
size_t pos = boundSearch(k);
if (pos != array.size())
return std::make_tuple(array[pos].block, array[pos].row_num);
return {nullptr, 0};
}
private:
std::atomic<bool> sorted = false;
mutable std::mutex lock;
Base array;
// Double checked locking with SC atomics works in C++
// https://preshing.com/20130930/double-checked-locking-is-fixed-in-cpp11/
// The first thread that calls one of the lookup methods sorts the data
// After calling the first lookup method it is no longer allowed to insert any data;
// the array becomes immutable.
void sort()
{
if (!sorted.load(std::memory_order_acquire))
{
std::lock_guard<std::mutex> l(lock);
if (!sorted.load(std::memory_order_relaxed))
{
if constexpr (isDescending)
::sort(array.begin(), array.end(), GreaterEntryOperator());
else
::sort(array.begin(), array.end());
sorted.store(true, std::memory_order_release);
}
}
}
};
}
AsofRowRefs createAsofRowRef(TypeIndex type, ASOF::Inequality inequality)
{
AsofRowRefs result;
auto call = [&](const auto & t)
{
using T = std::decay_t<decltype(t)>;
switch (inequality)
{
case ASOF::Inequality::LessOrEquals:
result = std::make_unique<SortedLookupVector<T, ASOF::Inequality::LessOrEquals>>();
break;
case ASOF::Inequality::Less:
result = std::make_unique<SortedLookupVector<T, ASOF::Inequality::Less>>();
break;
case ASOF::Inequality::GreaterOrEquals:
result = std::make_unique<SortedLookupVector<T, ASOF::Inequality::GreaterOrEquals>>();
break;
case ASOF::Inequality::Greater:
result = std::make_unique<SortedLookupVector<T, ASOF::Inequality::Greater>>();
break;
default:
throw Exception("Invalid ASOF Join order", ErrorCodes::LOGICAL_ERROR);
}
};
callWithType(type, call);
return result;
}
std::optional<TypeIndex> SortedLookupVectorBase::getTypeSize(const IColumn & asof_column, size_t & size)
{
WhichDataType which(asof_column.getDataType());
#define DISPATCH(TYPE) \
if (which.idx == TypeIndex::TYPE) \
{ \
size = sizeof(TYPE); \
return asof_column.getDataType(); \
}
FOR_NUMERIC_TYPES(DISPATCH)
DISPATCH(Decimal32)
DISPATCH(Decimal64)
DISPATCH(Decimal128)
DISPATCH(Decimal256)
DISPATCH(DateTime64)
#undef DISPATCH
throw Exception("ASOF join not supported for type: " + std::string(asof_column.getFamilyName()), ErrorCodes::BAD_TYPE_OF_FIELD);
}
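A self-contained sketch of the double-checked locking idiom that SortedLookupVector::sort() relies on, assuming only the standard library; the container and its names are illustrative. The acquire load keeps the hot lookup path lock-free once the release store has published the sorted state.
#include <atomic>
#include <mutex>
#include <vector>
#include <algorithm>
/// Same sort-once contract: inserts happen before the first lookup (externally synchronized),
/// the first lookup sorts under the mutex, later lookups only perform the atomic checks.
class SortOnce
{
public:
void insert(int x) { data.push_back(x); }
bool contains(int x)
{
sortIfNeeded();
return std::binary_search(data.begin(), data.end(), x);
}
private:
void sortIfNeeded()
{
if (!sorted.load(std::memory_order_acquire)) /// fast path: already published
{
std::lock_guard<std::mutex> guard(mutex);
if (!sorted.load(std::memory_order_relaxed)) /// re-check under the lock
{
std::sort(data.begin(), data.end());
sorted.store(true, std::memory_order_release); /// publish the sorted state
}
}
}
std::vector<int> data;
std::mutex mutex;
std::atomic<bool> sorted{false};
};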

View File

@ -1,16 +1,18 @@
#pragma once
#include <optional>
#include <variant>
#include <algorithm>
#include <cassert>
#include <list>
#include <mutex>
#include <algorithm>
#include <optional>
#include <variant>
#include <base/sort.h>
#include <Common/Arena.h>
#include <Columns/ColumnDecimal.h>
#include <Columns/ColumnVector.h>
#include <Columns/IColumn.h>
#include <Interpreters/asof.h>
#include <base/sort.h>
#include <Common/Arena.h>
namespace DB
@ -26,7 +28,7 @@ struct RowRef
const Block * block = nullptr;
SizeT row_num = 0;
RowRef() {} /// NOLINT
RowRef() = default;
RowRef(const Block * block_, size_t row_num_) : block(block_), row_num(row_num_) {}
};
@ -141,123 +143,23 @@ private:
* After calling any of the lookup methods, it is no longer allowed to insert more data as this would invalidate the
* references that can be returned by the lookup methods
*/
template <typename TEntry, typename TKey>
class SortedLookupVector
struct SortedLookupVectorBase
{
public:
using Base = std::vector<TEntry>;
// First stage, insertions into the vector
template <typename U, typename ... TAllocatorParams>
void insert(U && x, TAllocatorParams &&... allocator_params)
{
assert(!sorted.load(std::memory_order_acquire));
array.push_back(std::forward<U>(x), std::forward<TAllocatorParams>(allocator_params)...);
}
const RowRef * upperBound(const TEntry & k, bool ascending)
{
sort(ascending);
auto it = std::upper_bound(array.cbegin(), array.cend(), k, (ascending ? less : greater));
if (it != array.cend())
return &(it->row_ref);
return nullptr;
}
const RowRef * lowerBound(const TEntry & k, bool ascending)
{
sort(ascending);
auto it = std::lower_bound(array.cbegin(), array.cend(), k, (ascending ? less : greater));
if (it != array.cend())
return &(it->row_ref);
return nullptr;
}
private:
std::atomic<bool> sorted = false;
Base array;
mutable std::mutex lock;
static bool less(const TEntry & a, const TEntry & b)
{
return a.asof_value < b.asof_value;
}
static bool greater(const TEntry & a, const TEntry & b)
{
return a.asof_value > b.asof_value;
}
// Double checked locking with SC atomics works in C++
// https://preshing.com/20130930/double-checked-locking-is-fixed-in-cpp11/
// The first thread that calls one of the lookup methods sorts the data
// After calling the first lookup method it is no longer allowed to insert any data
// the array becomes immutable
void sort(bool ascending)
{
if (!sorted.load(std::memory_order_acquire))
{
std::lock_guard<std::mutex> l(lock);
if (!sorted.load(std::memory_order_relaxed))
{
if (!array.empty())
::sort(array.begin(), array.end(), (ascending ? less : greater));
sorted.store(true, std::memory_order_release);
}
}
}
};
class AsofRowRefs
{
public:
template <typename T>
struct Entry
{
using LookupType = SortedLookupVector<Entry<T>, T>;
using LookupPtr = std::unique_ptr<LookupType>;
T asof_value;
RowRef row_ref;
explicit Entry(T v) : asof_value(v) {}
Entry(T v, RowRef rr) : asof_value(v), row_ref(rr) {}
};
using Lookups = std::variant<
Entry<UInt8>::LookupPtr,
Entry<UInt16>::LookupPtr,
Entry<UInt32>::LookupPtr,
Entry<UInt64>::LookupPtr,
Entry<Int8>::LookupPtr,
Entry<Int16>::LookupPtr,
Entry<Int32>::LookupPtr,
Entry<Int64>::LookupPtr,
Entry<Float32>::LookupPtr,
Entry<Float64>::LookupPtr,
Entry<Decimal32>::LookupPtr,
Entry<Decimal64>::LookupPtr,
Entry<Decimal128>::LookupPtr,
Entry<DateTime64>::LookupPtr>;
AsofRowRefs() = default;
explicit AsofRowRefs(TypeIndex t);
SortedLookupVectorBase() = default;
virtual ~SortedLookupVectorBase() { }
static std::optional<TypeIndex> getTypeSize(const IColumn & asof_column, size_t & type_size);
// This will be synchronized by the rwlock mutex in Join.h
void insert(TypeIndex type, const IColumn & asof_column, const Block * block, size_t row_num);
virtual void insert(const IColumn &, const Block *, size_t) = 0;
// This will internally synchronize
const RowRef * findAsof(TypeIndex type, ASOF::Inequality inequality, const IColumn & asof_column, size_t row_num) const;
private:
// Lookups can be stored in a HashTable because it is memmovable
// A std::variant contains a currently active type id (memmovable), together with a union of the types
// The types are all std::unique_ptr, which contains a single pointer, which is memmovable.
// Source: https://github.com/ClickHouse/ClickHouse/issues/4906
Lookups lookups;
// This needs to be synchronized internally
virtual std::tuple<decltype(RowRef::block), decltype(RowRef::row_num)> findAsof(const IColumn &, size_t) = 0;
};
// It only contains a std::unique_ptr which is memmovable.
// Source: https://github.com/ClickHouse/ClickHouse/issues/4906
using AsofRowRefs = std::unique_ptr<SortedLookupVectorBase>;
AsofRowRefs createAsofRowRef(TypeIndex type, ASOF::Inequality inequality);
}
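
The double-checked locking comment above (with the preshing.com reference) describes how the lookup array is sorted exactly once, by the first thread that performs a lookup. A minimal standalone sketch of that pattern, with made-up names and simplified types, not part of this commit:

// double_checked_lazy_sort_sketch.cpp -- illustrative only.
#include <algorithm>
#include <atomic>
#include <mutex>
#include <vector>

class LazySortedSet
{
public:
    void insert(int x) { data.push_back(x); }   // allowed only before the first lookup

    bool contains(int x)
    {
        sortOnce();
        return std::binary_search(data.begin(), data.end(), x);
    }

private:
    void sortOnce()
    {
        // Fast path: an acquire load outside the mutex.
        if (!sorted.load(std::memory_order_acquire))
        {
            std::lock_guard<std::mutex> guard(mutex);
            // Re-check under the lock before the one-time sort.
            if (!sorted.load(std::memory_order_relaxed))
            {
                std::sort(data.begin(), data.end());
                // The release store publishes the sorted data to other threads.
                sorted.store(true, std::memory_order_release);
            }
        }
    }

    std::atomic<bool> sorted{false};
    std::mutex mutex;
    std::vector<int> data;
};

int main()
{
    LazySortedSet s;
    s.insert(3);
    s.insert(1);
    s.insert(2);
    return s.contains(2) ? 0 : 1;
}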

View File

@ -751,7 +751,7 @@ void TreeOptimizer::apply(ASTPtr & query, TreeRewriterResult & result,
if (settings.convert_query_to_cnf)
converted_to_cnf = convertQueryToCNF(select_query);
if (converted_to_cnf && settings.optimize_using_constraints)
if (converted_to_cnf && settings.optimize_using_constraints && result.storage_snapshot)
{
optimizeWithConstraints(select_query, result.aliases, result.source_columns_set,
tables_with_columns, result.storage_snapshot->metadata, settings.optimize_append_index);

View File

@ -0,0 +1,97 @@
#include "ProtobufListInputFormat.h"
#if USE_PROTOBUF
# include <Core/Block.h>
# include <Formats/FormatFactory.h>
# include <Formats/ProtobufReader.h>
# include <Formats/ProtobufSchemas.h>
# include <Formats/ProtobufSerializer.h>
namespace DB
{
ProtobufListInputFormat::ProtobufListInputFormat(ReadBuffer & in_, const Block & header_, const Params & params_, const FormatSchemaInfo & schema_info_)
: IRowInputFormat(header_, in_, params_)
, reader(std::make_unique<ProtobufReader>(in_))
, serializer(ProtobufSerializer::create(
header_.getNames(),
header_.getDataTypes(),
missing_column_indices,
*ProtobufSchemas::instance().getMessageTypeForFormatSchema(schema_info_, ProtobufSchemas::WithEnvelope::Yes),
/* with_length_delimiter = */ true,
/* with_envelope = */ true,
*reader))
{
}
bool ProtobufListInputFormat::readRow(MutableColumns & columns, RowReadExtension & row_read_extension)
{
if (reader->eof())
{
reader->endMessage(/*ignore_errors =*/ false);
return false;
}
size_t row_num = columns.empty() ? 0 : columns[0]->size();
if (!row_num)
serializer->setColumns(columns.data(), columns.size());
serializer->readRow(row_num);
row_read_extension.read_columns.clear();
row_read_extension.read_columns.resize(columns.size(), true);
for (size_t column_idx : missing_column_indices)
row_read_extension.read_columns[column_idx] = false;
return true;
}
ProtobufListSchemaReader::ProtobufListSchemaReader(const FormatSettings & format_settings)
: schema_info(
format_settings.schema.format_schema,
"Protobuf",
true,
format_settings.schema.is_server,
format_settings.schema.format_schema_path)
{
}
NamesAndTypesList ProtobufListSchemaReader::readSchema()
{
const auto * message_descriptor = ProtobufSchemas::instance().getMessageTypeForFormatSchema(schema_info, ProtobufSchemas::WithEnvelope::Yes);
return protobufSchemaToCHSchema(message_descriptor);
}
void registerInputFormatProtobufList(FormatFactory & factory)
{
factory.registerInputFormat(
"ProtobufList",
[](ReadBuffer &buf,
const Block & sample,
RowInputFormatParams params,
const FormatSettings & settings)
{
return std::make_shared<ProtobufListInputFormat>(buf, sample, std::move(params), FormatSchemaInfo(settings, "Protobuf", true));
});
factory.markFormatAsColumnOriented("ProtobufList");
}
void registerProtobufListSchemaReader(FormatFactory & factory)
{
factory.registerExternalSchemaReader("ProtobufList", [](const FormatSettings & settings)
{
return std::make_shared<ProtobufListSchemaReader>(settings);
});
}
}
#else
namespace DB
{
class FormatFactory;
void registerInputFormatProtobufList(FormatFactory &) {}
void registerProtobufListSchemaReader(FormatFactory &) {}
}
#endif

View File

@ -0,0 +1,52 @@
#pragma once
#include "config_formats.h"
#if USE_PROTOBUF
# include <Formats/FormatSchemaInfo.h>
# include <Processors/Formats/IRowInputFormat.h>
# include <Processors/Formats/ISchemaReader.h>
namespace DB
{
class Block;
class ProtobufReader;
class ProtobufSerializer;
class ReadBuffer;
/** Stream designed to deserialize data from the google protobuf format.
* One nested Protobuf message is parsed as one row of data.
*
* Parsing of the protobuf format requires the 'format_schema' setting to be set, e.g.
* INSERT INTO table FORMAT Protobuf SETTINGS format_schema = 'schema:Message'
* where schema is the name of "schema.proto" file specifying protobuf schema.
*/
class ProtobufListInputFormat final : public IRowInputFormat
{
public:
ProtobufListInputFormat(ReadBuffer & in_, const Block & header_, const Params & params_, const FormatSchemaInfo & schema_info_);
String getName() const override { return "ProtobufListInputFormat"; }
private:
bool readRow(MutableColumns & columns, RowReadExtension & row_read_extension) override;
std::unique_ptr<ProtobufReader> reader;
std::vector<size_t> missing_column_indices;
std::unique_ptr<ProtobufSerializer> serializer;
};
class ProtobufListSchemaReader : public IExternalSchemaReader
{
public:
explicit ProtobufListSchemaReader(const FormatSettings & format_settings);
NamesAndTypesList readSchema() override;
private:
const FormatSchemaInfo schema_info;
};
}
#endif

View File

@ -0,0 +1,68 @@
#include "ProtobufListOutputFormat.h"
#if USE_PROTOBUF
# include <Formats/FormatFactory.h>
# include <Formats/FormatSchemaInfo.h>
# include <Formats/ProtobufWriter.h>
# include <Formats/ProtobufSerializer.h>
# include <Formats/ProtobufSchemas.h>
namespace DB
{
ProtobufListOutputFormat::ProtobufListOutputFormat(
WriteBuffer & out_,
const Block & header_,
const RowOutputFormatParams & params_,
const FormatSchemaInfo & schema_info_)
: IRowOutputFormat(header_, out_, params_)
, writer(std::make_unique<ProtobufWriter>(out))
, serializer(ProtobufSerializer::create(
header_.getNames(),
header_.getDataTypes(),
*ProtobufSchemas::instance().getMessageTypeForFormatSchema(schema_info_, ProtobufSchemas::WithEnvelope::Yes),
/* with_length_delimiter = */ true,
/* with_envelope = */ true,
*writer))
{
}
void ProtobufListOutputFormat::write(const Columns & columns, size_t row_num)
{
if (row_num == 0)
serializer->setColumns(columns.data(), columns.size());
serializer->writeRow(row_num);
}
void ProtobufListOutputFormat::finalizeImpl()
{
serializer->finalizeWrite();
}
void registerOutputFormatProtobufList(FormatFactory & factory)
{
factory.registerOutputFormat(
"ProtobufList",
[](WriteBuffer & buf,
const Block & header,
const RowOutputFormatParams & params,
const FormatSettings & settings)
{
return std::make_shared<ProtobufListOutputFormat>(
buf, header, params,
FormatSchemaInfo(settings, "Protobuf", true));
});
}
}
#else
namespace DB
{
class FormatFactory;
void registerOutputFormatProtobufList(FormatFactory &) {}
}
#endif

View File

@ -0,0 +1,48 @@
#pragma once
#include "config_formats.h"
#if USE_PROTOBUF
# include <Processors/Formats/IRowOutputFormat.h>
namespace DB
{
class FormatSchemaInfo;
class ProtobufWriter;
class ProtobufSerializer;
/** Stream designed to serialize data in the google protobuf format.
* Each row is written as a separate nested message, and all rows are enclosed by a single
* top-level envelope message.
*
* Serializing in the protobuf format requires the 'format_schema' setting to be set, e.g.
* SELECT * from table FORMAT Protobuf SETTINGS format_schema = 'schema:Message'
* where schema is the name of "schema.proto" file specifying protobuf schema.
*/
// class ProtobufListOutputFormat final : public IOutputFormat
class ProtobufListOutputFormat final : public IRowOutputFormat
{
public:
ProtobufListOutputFormat(
WriteBuffer & out_,
const Block & header_,
const RowOutputFormatParams & params_,
const FormatSchemaInfo & schema_info_);
String getName() const override { return "ProtobufListOutputFormat"; }
String getContentType() const override { return "application/octet-stream"; }
private:
void write(const Columns & columns, size_t row_num) override;
void writeField(const IColumn &, const ISerialization &, size_t) override {}
void finalizeImpl() override;
std::unique_ptr<ProtobufWriter> writer;
std::unique_ptr<ProtobufSerializer> serializer;
};
}
#endif

View File

@ -3,16 +3,13 @@
#if USE_PROTOBUF
# include <Core/Block.h>
# include <Formats/FormatFactory.h>
# include <Formats/FormatSchemaInfo.h>
# include <Formats/ProtobufReader.h>
# include <Formats/ProtobufSchemas.h>
# include <Formats/ProtobufSerializer.h>
# include <Interpreters/Context.h>
# include <base/range.h>
namespace DB
{
ProtobufRowInputFormat::ProtobufRowInputFormat(ReadBuffer & in_, const Block & header_, const Params & params_, const FormatSchemaInfo & schema_info_, bool with_length_delimiter_)
: IRowInputFormat(header_, in_, params_)
, reader(std::make_unique<ProtobufReader>(in_))
@ -20,14 +17,13 @@ ProtobufRowInputFormat::ProtobufRowInputFormat(ReadBuffer & in_, const Block & h
header_.getNames(),
header_.getDataTypes(),
missing_column_indices,
*ProtobufSchemas::instance().getMessageTypeForFormatSchema(schema_info_),
*ProtobufSchemas::instance().getMessageTypeForFormatSchema(schema_info_, ProtobufSchemas::WithEnvelope::No),
with_length_delimiter_,
/* with_envelope = */ false,
*reader))
{
}
ProtobufRowInputFormat::~ProtobufRowInputFormat() = default;
bool ProtobufRowInputFormat::readRow(MutableColumns & columns, RowReadExtension & row_read_extension)
{
if (reader->eof())
@ -85,7 +81,7 @@ ProtobufSchemaReader::ProtobufSchemaReader(const FormatSettings & format_setting
NamesAndTypesList ProtobufSchemaReader::readSchema()
{
const auto * message_descriptor = ProtobufSchemas::instance().getMessageTypeForFormatSchema(schema_info);
const auto * message_descriptor = ProtobufSchemas::instance().getMessageTypeForFormatSchema(schema_info, ProtobufSchemas::WithEnvelope::No);
return protobufSchemaToCHSchema(message_descriptor);
}
@ -111,7 +107,6 @@ namespace DB
{
class FormatFactory;
void registerInputFormatProtobuf(FormatFactory &) {}
void registerProtobufSchemaReader(FormatFactory &) {}
}

View File

@ -3,17 +3,16 @@
#include "config_formats.h"
#if USE_PROTOBUF
# include <Formats/FormatSchemaInfo.h>
# include <Processors/Formats/IRowInputFormat.h>
# include <Processors/Formats/ISchemaReader.h>
# include <Processors/Formats/IRowInputFormat.h>
# include <Processors/Formats/ISchemaReader.h>
# include <Formats/FormatSchemaInfo.h>
namespace DB
{
class Block;
class FormatSchemaInfo;
class ProtobufReader;
class ProtobufSerializer;
class ReadBuffer;
/** Stream designed to deserialize data from the google protobuf format.
* One Protobuf message is parsed as one row of data.
@ -30,12 +29,11 @@ class ProtobufRowInputFormat final : public IRowInputFormat
{
public:
ProtobufRowInputFormat(ReadBuffer & in_, const Block & header_, const Params & params_, const FormatSchemaInfo & schema_info_, bool with_length_delimiter_);
~ProtobufRowInputFormat() override;
String getName() const override { return "ProtobufRowInputFormat"; }
private:
bool readRow(MutableColumns & columns, RowReadExtension &) override;
bool readRow(MutableColumns & columns, RowReadExtension & row_read_extension) override;
bool allowSyncAfterError() const override;
void syncAfterError() override;
@ -52,7 +50,7 @@ public:
NamesAndTypesList readSchema() override;
private:
FormatSchemaInfo schema_info;
const FormatSchemaInfo schema_info;
};
}

View File

@ -4,12 +4,12 @@
# include <Formats/FormatFactory.h>
# include <Core/Block.h>
# include <Formats/FormatSchemaInfo.h>
# include <Formats/FormatSettings.h>
# include <Formats/ProtobufSchemas.h>
# include <Formats/ProtobufSerializer.h>
# include <Formats/ProtobufWriter.h>
# include <google/protobuf/descriptor.h>
namespace DB
{
namespace ErrorCodes
@ -17,7 +17,6 @@ namespace ErrorCodes
extern const int NO_ROW_DELIMITER;
}
ProtobufRowOutputFormat::ProtobufRowOutputFormat(
WriteBuffer & out_,
const Block & header_,
@ -30,8 +29,9 @@ ProtobufRowOutputFormat::ProtobufRowOutputFormat(
, serializer(ProtobufSerializer::create(
header_.getNames(),
header_.getDataTypes(),
*ProtobufSchemas::instance().getMessageTypeForFormatSchema(schema_info_),
*ProtobufSchemas::instance().getMessageTypeForFormatSchema(schema_info_, ProtobufSchemas::WithEnvelope::No),
with_length_delimiter_,
/* with_envelope = */ false,
*writer))
, allow_multiple_rows(with_length_delimiter_ || settings_.protobuf.allow_multiple_rows_without_delimiter)
{
@ -44,13 +44,12 @@ void ProtobufRowOutputFormat::write(const Columns & columns, size_t row_num)
"The ProtobufSingle format can't be used to write multiple rows because this format doesn't have any row delimiter.",
ErrorCodes::NO_ROW_DELIMITER);
if (!row_num)
if (row_num == 0)
serializer->setColumns(columns.data(), columns.size());
serializer->writeRow(row_num);
}
void registerOutputFormatProtobuf(FormatFactory & factory)
{
for (bool with_length_delimiter : {false, true})

View File

@ -3,17 +3,15 @@
#include "config_formats.h"
#if USE_PROTOBUF
# include <Core/Block.h>
# include <Formats/FormatSchemaInfo.h>
# include <Formats/FormatSettings.h>
# include <Processors/Formats/IRowOutputFormat.h>
namespace DB
{
class ProtobufWriter;
class ProtobufSerializer;
class DB;
class FormatSchemaInfo;
class ProtobufSerializer;
class ProtobufWriter;
class WriteBuffer;
struct FormatSettings;
/** Stream designed to serialize data in the google protobuf format.

View File

@ -246,6 +246,9 @@ IProcessor::Status GroupingAggregatedTransform::prepare()
void GroupingAggregatedTransform::addChunk(Chunk chunk, size_t input)
{
if (!chunk.hasRows())
return;
const auto & info = chunk.getChunkInfo();
if (!info)
throw Exception("Chunk info was not set for chunk in GroupingAggregatedTransform.", ErrorCodes::LOGICAL_ERROR);

View File

@ -20,7 +20,7 @@ namespace ErrorCodes
}
std::pair<bool, ReplicatedMergeMutateTaskBase::PartLogWriter> MergeFromLogEntryTask::prepare()
ReplicatedMergeMutateTaskBase::PrepareResult MergeFromLogEntryTask::prepare()
{
LOG_TRACE(log, "Executing log entry to merge parts {} to {}",
fmt::join(entry.source_parts, ", "), entry.new_part_name);
@ -30,7 +30,11 @@ std::pair<bool, ReplicatedMergeMutateTaskBase::PartLogWriter> MergeFromLogEntryT
if (storage_settings_ptr->always_fetch_merged_part)
{
LOG_INFO(log, "Will fetch part {} because setting 'always_fetch_merged_part' is true", entry.new_part_name);
return {false, {}};
return PrepareResult{
.prepared_successfully = false,
.need_to_check_missing_part_in_fetch = true,
.part_log_writer = {}
};
}
if (entry.merge_type == MergeType::TTL_RECOMPRESS &&
@ -40,7 +44,12 @@ std::pair<bool, ReplicatedMergeMutateTaskBase::PartLogWriter> MergeFromLogEntryT
LOG_INFO(log, "Will try to fetch part {} until '{}' because this part assigned to recompression merge. "
"Source replica {} will try to merge this part first", entry.new_part_name,
DateLUT::instance().timeToString(entry.create_time + storage_settings_ptr->try_fetch_recompressed_part_timeout.totalSeconds()), entry.source_replica);
return {false, {}};
/// Waiting for the other replica to recompress the part. No need to check it.
return PrepareResult{
.prepared_successfully = false,
.need_to_check_missing_part_in_fetch = false,
.part_log_writer = {}
};
}
/// In some use cases merging can be more expensive than fetching
@ -56,7 +65,11 @@ std::pair<bool, ReplicatedMergeMutateTaskBase::PartLogWriter> MergeFromLogEntryT
"Prefer fetching part {} from replica {} due to execute_merges_on_single_replica_time_threshold",
entry.new_part_name, replica_to_execute_merge.value());
return {false, {}};
return PrepareResult{
.prepared_successfully = false,
.need_to_check_missing_part_in_fetch = true,
.part_log_writer = {}
};
}
}
@ -69,7 +82,11 @@ std::pair<bool, ReplicatedMergeMutateTaskBase::PartLogWriter> MergeFromLogEntryT
{
/// We do not have one of source parts locally, try to take some already merged part from someone.
LOG_DEBUG(log, "Don't have all parts for merge {}; will try to fetch it instead", entry.new_part_name);
return {false, {}};
return PrepareResult{
.prepared_successfully = false,
.need_to_check_missing_part_in_fetch = true,
.part_log_writer = {}
};
}
if (source_part_or_covering->name != source_part_name)
@ -83,7 +100,12 @@ std::pair<bool, ReplicatedMergeMutateTaskBase::PartLogWriter> MergeFromLogEntryT
LOG_WARNING(log, fmt::runtime(message), source_part_name, source_part_or_covering->name, entry.new_part_name);
if (!source_part_or_covering->info.contains(MergeTreePartInfo::fromPartName(entry.new_part_name, storage.format_version)))
throw Exception(ErrorCodes::LOGICAL_ERROR, message, source_part_name, source_part_or_covering->name, entry.new_part_name);
return {false, {}};
return PrepareResult{
.prepared_successfully = false,
.need_to_check_missing_part_in_fetch = true,
.part_log_writer = {}
};
}
parts.push_back(source_part_or_covering);
@ -106,7 +128,12 @@ std::pair<bool, ReplicatedMergeMutateTaskBase::PartLogWriter> MergeFromLogEntryT
if (!replica.empty())
{
LOG_DEBUG(log, "Prefer to fetch {} from replica {}", entry.new_part_name, replica);
return {false, {}};
/// We found covering part, no checks for missing part.
return PrepareResult{
.prepared_successfully = false,
.need_to_check_missing_part_in_fetch = false,
.part_log_writer = {}
};
}
}
}
@ -163,7 +190,12 @@ std::pair<bool, ReplicatedMergeMutateTaskBase::PartLogWriter> MergeFromLogEntryT
if (!storage.findReplicaHavingCoveringPart(entry.new_part_name, true, dummy).empty())
{
LOG_DEBUG(log, "Merge of part {} finished by some other replica, will fetch merged part", entry.new_part_name);
return {false, {}};
/// We found covering part, no checks for missing part.
return PrepareResult{
.prepared_successfully = false,
.need_to_check_missing_part_in_fetch = false,
.part_log_writer = {}
};
}
zero_copy_lock = storage.tryCreateZeroCopyExclusiveLock(entry.new_part_name, disk);
@ -171,7 +203,13 @@ std::pair<bool, ReplicatedMergeMutateTaskBase::PartLogWriter> MergeFromLogEntryT
if (!zero_copy_lock)
{
LOG_DEBUG(log, "Merge of part {} started by some other replica, will wait it and fetch merged part", entry.new_part_name);
return {false, {}};
/// Don't check for a missing part -- it's missing because the other replica has not
/// finished the merge yet.
return PrepareResult{
.prepared_successfully = false,
.need_to_check_missing_part_in_fetch = false,
.part_log_writer = {}
};
}
}
}
@ -210,7 +248,7 @@ std::pair<bool, ReplicatedMergeMutateTaskBase::PartLogWriter> MergeFromLogEntryT
for (auto & item : future_merged_part->parts)
priority += item->getBytesOnDisk();
return {true, [this, stopwatch = *stopwatch_ptr] (const ExecutionStatus & execution_status)
return {true, true, [this, stopwatch = *stopwatch_ptr] (const ExecutionStatus & execution_status)
{
storage.writePartLog(
PartLogElement::MERGE_PARTS, execution_status, stopwatch.elapsed(),

View File

@ -25,7 +25,7 @@ public:
protected:
/// Both return false if we can't execute merge.
std::pair<bool, ReplicatedMergeMutateTaskBase::PartLogWriter> prepare() override;
ReplicatedMergeMutateTaskBase::PrepareResult prepare() override;
bool finalize(ReplicatedMergeMutateTaskBase::PartLogWriter write_part_log) override;
bool executeInnerTask() override

View File

@ -286,10 +286,7 @@ MergeTreeData::MergeTreeData(
if (disk->exists(current_version_file_path))
{
if (!version_file.first.empty())
{
LOG_ERROR(log, "Duplication of version file {} and {}", fullPath(version_file.second, version_file.first), current_version_file_path);
throw Exception("Multiple format_version.txt file", ErrorCodes::CORRUPTED_DATA);
}
throw Exception(ErrorCodes::CORRUPTED_DATA, "Duplication of version file {} and {}", fullPath(version_file.second, version_file.first), current_version_file_path);
version_file = {current_version_file_path, disk};
}
}

View File

@ -39,6 +39,9 @@
namespace DB
{
/// Number of streams is not the number of parts, but the number of parts * files, hence 1000.
const size_t DEFAULT_DELAYED_STREAMS_FOR_PARALLEL_WRITE = 1000;
class AlterCommands;
class MergeTreePartsMover;
class MergeTreeDataMergerMutator;

View File

@ -693,11 +693,10 @@ MergeTreeDataMergerMutator::getColumnsForNewDataPart(
}
}
bool is_wide_part = isWidePart(source_part);
SerializationInfoByName new_serialization_infos;
for (const auto & [name, info] : serialization_infos)
{
if (is_wide_part && removed_columns.count(name))
if (removed_columns.count(name))
continue;
auto it = renamed_columns_from_to.find(name);
@ -709,7 +708,7 @@ MergeTreeDataMergerMutator::getColumnsForNewDataPart(
/// In compact parts we read all columns, because they are all stored in a
/// single file
if (!is_wide_part)
if (!isWidePart(source_part))
return {updated_header.getNamesAndTypesList(), new_serialization_infos};
Names source_column_names = source_part->getColumns().getNames();

View File

@ -52,7 +52,14 @@ void MergeTreeSink::consume(Chunk chunk)
auto block = getHeader().cloneWithColumns(chunk.detachColumns());
auto part_blocks = storage.writer.splitBlockIntoParts(block, max_parts_per_block, metadata_snapshot, context);
std::vector<MergeTreeSink::DelayedChunk::Partition> partitions;
using DelayedPartitions = std::vector<MergeTreeSink::DelayedChunk::Partition>;
DelayedPartitions partitions;
const Settings & settings = context->getSettingsRef();
size_t streams = 0;
bool support_parallel_write = false;
for (auto & current_block : part_blocks)
{
Stopwatch watch;
@ -67,9 +74,12 @@ void MergeTreeSink::consume(Chunk chunk)
if (!temp_part.part)
continue;
if (!support_parallel_write && temp_part.part->volume->getDisk()->supportParallelWrite())
support_parallel_write = true;
if (storage.getDeduplicationLog())
{
const String & dedup_token = context->getSettingsRef().insert_deduplication_token;
const String & dedup_token = settings.insert_deduplication_token;
if (!dedup_token.empty())
{
/// multiple blocks can be inserted within the same insert query
@ -79,6 +89,24 @@ void MergeTreeSink::consume(Chunk chunk)
}
}
size_t max_insert_delayed_streams_for_parallel_write = DEFAULT_DELAYED_STREAMS_FOR_PARALLEL_WRITE;
if (!support_parallel_write || settings.max_insert_delayed_streams_for_parallel_write.changed)
max_insert_delayed_streams_for_parallel_write = settings.max_insert_delayed_streams_for_parallel_write;
/// In case of too many columns/parts in the block, flush explicitly.
streams += temp_part.streams.size();
if (streams > max_insert_delayed_streams_for_parallel_write)
{
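/// Flush the currently pending delayed chunk (if any), then turn the partitions accumulated so far into a new delayed chunk and write it out immediately.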
finishDelayedChunk();
delayed_chunk = std::make_unique<MergeTreeSink::DelayedChunk>();
delayed_chunk->partitions = std::move(partitions);
finishDelayedChunk();
streams = 0;
support_parallel_write = false;
partitions = DelayedPartitions{};
}
partitions.emplace_back(MergeTreeSink::DelayedChunk::Partition
{
.temp_part = std::move(temp_part),

View File

@ -13,7 +13,7 @@ namespace ProfileEvents
namespace DB
{
std::pair<bool, ReplicatedMergeMutateTaskBase::PartLogWriter> MutateFromLogEntryTask::prepare()
ReplicatedMergeMutateTaskBase::PrepareResult MutateFromLogEntryTask::prepare()
{
const String & source_part_name = entry.source_parts.at(0);
const auto storage_settings_ptr = storage.getSettings();
@ -23,7 +23,11 @@ std::pair<bool, ReplicatedMergeMutateTaskBase::PartLogWriter> MutateFromLogEntry
if (!source_part)
{
LOG_DEBUG(log, "Source part {} for {} is not ready; will try to fetch it instead", source_part_name, entry.new_part_name);
return {false, {}};
return PrepareResult{
.prepared_successfully = false,
.need_to_check_missing_part_in_fetch = true,
.part_log_writer = {}
};
}
if (source_part->name != source_part_name)
@ -33,7 +37,12 @@ std::pair<bool, ReplicatedMergeMutateTaskBase::PartLogWriter> MutateFromLogEntry
"Possibly the mutation of this part is not needed and will be skipped. "
"This shouldn't happen often.",
source_part_name, source_part->name, entry.new_part_name);
return {false, {}};
return PrepareResult{
.prepared_successfully = false,
.need_to_check_missing_part_in_fetch = true,
.part_log_writer = {}
};
}
/// TODO - some better heuristic?
@ -48,7 +57,11 @@ std::pair<bool, ReplicatedMergeMutateTaskBase::PartLogWriter> MutateFromLogEntry
if (!replica.empty())
{
LOG_DEBUG(log, "Prefer to fetch {} from replica {}", entry.new_part_name, replica);
return {false, {}};
return PrepareResult{
.prepared_successfully = false,
.need_to_check_missing_part_in_fetch = true,
.part_log_writer = {}
};
}
}
@ -65,7 +78,12 @@ std::pair<bool, ReplicatedMergeMutateTaskBase::PartLogWriter> MutateFromLogEntry
"Prefer fetching part {} from replica {} due to execute_merges_on_single_replica_time_threshold",
entry.new_part_name, replica_to_execute_merge.value());
return {false, {}};
return PrepareResult{
.prepared_successfully = false,
.need_to_check_missing_part_in_fetch = true,
.part_log_writer = {}
};
}
}
@ -98,7 +116,11 @@ std::pair<bool, ReplicatedMergeMutateTaskBase::PartLogWriter> MutateFromLogEntry
if (!storage.findReplicaHavingCoveringPart(entry.new_part_name, true, dummy).empty())
{
LOG_DEBUG(log, "Mutation of part {} finished by some other replica, will download merged part", entry.new_part_name);
return {false, {}};
return PrepareResult{
.prepared_successfully = false,
.need_to_check_missing_part_in_fetch = true,
.part_log_writer = {}
};
}
zero_copy_lock = storage.tryCreateZeroCopyExclusiveLock(entry.new_part_name, disk);
@ -106,7 +128,11 @@ std::pair<bool, ReplicatedMergeMutateTaskBase::PartLogWriter> MutateFromLogEntry
if (!zero_copy_lock)
{
LOG_DEBUG(log, "Mutation of part {} started by some other replica, will wait it and fetch merged part", entry.new_part_name);
return {false, {}};
return PrepareResult{
.prepared_successfully = false,
.need_to_check_missing_part_in_fetch = false,
.part_log_writer = {}
};
}
}
}
@ -132,7 +158,7 @@ std::pair<bool, ReplicatedMergeMutateTaskBase::PartLogWriter> MutateFromLogEntry
for (auto & item : future_mutated_part->parts)
priority += item->getBytesOnDisk();
return {true, [this] (const ExecutionStatus & execution_status)
return {true, true, [this] (const ExecutionStatus & execution_status)
{
storage.writePartLog(
PartLogElement::MUTATE_PART, execution_status, stopwatch_ptr->elapsed(),

View File

@ -25,7 +25,7 @@ public:
UInt64 getPriority() override { return priority; }
private:
std::pair<bool, ReplicatedMergeMutateTaskBase::PartLogWriter> prepare() override;
ReplicatedMergeMutateTaskBase::PrepareResult prepare() override;
bool finalize(ReplicatedMergeMutateTaskBase::PartLogWriter write_part_log) override;
bool executeInnerTask() override

View File

@ -422,10 +422,11 @@ void finalizeMutatedPart(
const CompressionCodecPtr & codec)
{
auto disk = new_data_part->volume->getDisk();
auto part_path = fs::path(new_data_part->getFullRelativePath());
if (new_data_part->uuid != UUIDHelpers::Nil)
{
auto out = disk->writeFile(new_data_part->getFullRelativePath() + IMergeTreeDataPart::UUID_FILE_NAME, 4096);
auto out = disk->writeFile(part_path / IMergeTreeDataPart::UUID_FILE_NAME, 4096);
HashingWriteBuffer out_hashing(*out);
writeUUIDText(new_data_part->uuid, out_hashing);
new_data_part->checksums.files[IMergeTreeDataPart::UUID_FILE_NAME].file_size = out_hashing.count();
@ -435,27 +436,36 @@ void finalizeMutatedPart(
if (execute_ttl_type != ExecuteTTLType::NONE)
{
/// Write a file with ttl infos in json format.
auto out_ttl = disk->writeFile(fs::path(new_data_part->getFullRelativePath()) / "ttl.txt", 4096);
auto out_ttl = disk->writeFile(part_path / "ttl.txt", 4096);
HashingWriteBuffer out_hashing(*out_ttl);
new_data_part->ttl_infos.write(out_hashing);
new_data_part->checksums.files["ttl.txt"].file_size = out_hashing.count();
new_data_part->checksums.files["ttl.txt"].file_hash = out_hashing.getHash();
}
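/// Write a file with serialization infos in json format.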
if (!new_data_part->getSerializationInfos().empty())
{
auto out = disk->writeFile(part_path / IMergeTreeDataPart::SERIALIZATION_FILE_NAME, 4096);
HashingWriteBuffer out_hashing(*out);
new_data_part->getSerializationInfos().writeJSON(out_hashing);
new_data_part->checksums.files[IMergeTreeDataPart::SERIALIZATION_FILE_NAME].file_size = out_hashing.count();
new_data_part->checksums.files[IMergeTreeDataPart::SERIALIZATION_FILE_NAME].file_hash = out_hashing.getHash();
}
{
/// Write file with checksums.
auto out_checksums = disk->writeFile(fs::path(new_data_part->getFullRelativePath()) / "checksums.txt", 4096);
auto out_checksums = disk->writeFile(part_path / "checksums.txt", 4096);
new_data_part->checksums.write(*out_checksums);
} /// close fd
{
auto out = disk->writeFile(new_data_part->getFullRelativePath() + IMergeTreeDataPart::DEFAULT_COMPRESSION_CODEC_FILE_NAME, 4096);
auto out = disk->writeFile(part_path / IMergeTreeDataPart::DEFAULT_COMPRESSION_CODEC_FILE_NAME, 4096);
DB::writeText(queryToString(codec->getFullCodecDesc()), *out);
}
{
/// Write a file with a description of columns.
auto out_columns = disk->writeFile(fs::path(new_data_part->getFullRelativePath()) / "columns.txt", 4096);
auto out_columns = disk->writeFile(part_path / "columns.txt", 4096);
new_data_part->getColumns().writeText(*out_columns);
} /// close fd
@ -466,7 +476,7 @@ void finalizeMutatedPart(
new_data_part->modification_time = time(nullptr);
new_data_part->loadProjections(false, false);
new_data_part->setBytesOnDisk(
MergeTreeData::DataPart::calculateTotalSizeOnDisk(new_data_part->volume->getDisk(), new_data_part->getFullRelativePath()));
MergeTreeData::DataPart::calculateTotalSizeOnDisk(new_data_part->volume->getDisk(), part_path));
new_data_part->default_codec = codec;
new_data_part->calculateColumnsAndSecondaryIndicesSizesOnDisk();
new_data_part->storage.lockSharedData(*new_data_part);
@ -1308,14 +1318,13 @@ bool MutateTask::prepare()
ctx->mrk_extension = ctx->source_part->index_granularity_info.is_adaptive ? getAdaptiveMrkExtension(ctx->new_data_part->getType())
: getNonAdaptiveMrkExtension();
const auto data_settings = ctx->data-> getSettings();
const auto data_settings = ctx->data->getSettings();
ctx->need_sync = needSyncPart(ctx->source_part->rows_count, ctx->source_part->getBytesOnDisk(), *data_settings);
ctx->execute_ttl_type = ExecuteTTLType::NONE;
if (ctx->mutating_pipeline.initialized())
ctx->execute_ttl_type = MergeTreeDataMergerMutator::shouldExecuteTTL(ctx->metadata_snapshot, ctx->interpreter->getColumnDependencies());
/// All columns from the part are changed, plus possibly some more that were missing in the part before
/// TODO We can materialize compact part without copying data
if (!isWidePart(ctx->source_part)
@ -1325,7 +1334,6 @@ bool MutateTask::prepare()
}
else /// TODO: check that we modify only non-key columns in this case.
{
/// We will modify only some of the columns. Other columns and key values can be copied as-is.
for (const auto & name_type : ctx->updated_header.getNamesAndTypesList())
ctx->updated_columns.emplace(name_type.name);

View File

@ -144,9 +144,9 @@ bool ReplicatedMergeMutateTaskBase::executeImpl()
};
auto execute_fetch = [&] () -> bool
auto execute_fetch = [&] (bool need_to_check_missing_part) -> bool
{
if (storage.executeFetch(entry))
if (storage.executeFetch(entry, need_to_check_missing_part))
return remove_processed_entry();
return false;
@ -164,12 +164,13 @@ bool ReplicatedMergeMutateTaskBase::executeImpl()
return remove_processed_entry();
}
bool res = false;
std::tie(res, part_log_writer) = prepare();
auto prepare_result = prepare();
part_log_writer = prepare_result.part_log_writer;
/// Avoid rescheduling: execute the fetch here, in the same thread.
if (!res)
return execute_fetch();
if (!prepare_result.prepared_successfully)
return execute_fetch(prepare_result.need_to_check_missing_part_in_fetch);
state = State::NEED_EXECUTE_INNER_MERGE;
return true;
@ -198,7 +199,7 @@ bool ReplicatedMergeMutateTaskBase::executeImpl()
try
{
if (!finalize(part_log_writer))
return execute_fetch();
return execute_fetch(/* need_to_check_missing = */true);
}
catch (...)
{

View File

@ -38,7 +38,14 @@ public:
protected:
using PartLogWriter = std::function<void(const ExecutionStatus &)>;
virtual std::pair<bool, PartLogWriter> prepare() = 0;
struct PrepareResult
{
bool prepared_successfully;
bool need_to_check_missing_part_in_fetch;
PartLogWriter part_log_writer;
};
virtual PrepareResult prepare() = 0;
virtual bool finalize(ReplicatedMergeMutateTaskBase::PartLogWriter write_part_log) = 0;
/// Will execute a part of inner MergeTask or MutateTask

View File

@ -150,9 +150,14 @@ void ReplicatedMergeTreeSink::consume(Chunk chunk)
if (quorum)
checkQuorumPrecondition(zookeeper);
const Settings & settings = context->getSettingsRef();
auto part_blocks = storage.writer.splitBlockIntoParts(block, max_parts_per_block, metadata_snapshot, context);
std::vector<ReplicatedMergeTreeSink::DelayedChunk::Partition> partitions;
String block_dedup_token;
using DelayedPartitions = std::vector<ReplicatedMergeTreeSink::DelayedChunk::Partition>;
DelayedPartitions partitions;
size_t streams = 0;
bool support_parallel_write = false;
for (auto & current_block : part_blocks)
{
@ -171,10 +176,12 @@ void ReplicatedMergeTreeSink::consume(Chunk chunk)
if (deduplicate)
{
String block_dedup_token;
/// We add the hash from the data and partition identifier to deduplication ID.
/// That is, do not insert the same data to the same partition twice.
const String & dedup_token = context->getSettingsRef().insert_deduplication_token;
const String & dedup_token = settings.insert_deduplication_token;
if (!dedup_token.empty())
{
/// multiple blocks can be inserted within the same insert query
@ -182,6 +189,7 @@ void ReplicatedMergeTreeSink::consume(Chunk chunk)
block_dedup_token = fmt::format("{}_{}", dedup_token, chunk_dedup_seqnum);
++chunk_dedup_seqnum;
}
block_id = temp_part.part->getZeroLevelPartBlockID(block_dedup_token);
LOG_DEBUG(log, "Wrote block with ID '{}', {} rows", block_id, current_block.block.rows());
}
@ -192,6 +200,24 @@ void ReplicatedMergeTreeSink::consume(Chunk chunk)
UInt64 elapsed_ns = watch.elapsed();
size_t max_insert_delayed_streams_for_parallel_write = DEFAULT_DELAYED_STREAMS_FOR_PARALLEL_WRITE;
if (!support_parallel_write || settings.max_insert_delayed_streams_for_parallel_write.changed)
max_insert_delayed_streams_for_parallel_write = settings.max_insert_delayed_streams_for_parallel_write;
/// In case of too many columns/parts in the block, flush explicitly.
streams += temp_part.streams.size();
if (streams > max_insert_delayed_streams_for_parallel_write)
{
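/// Flush the currently pending delayed chunk (if any), then commit the partitions accumulated so far as a new delayed chunk right away.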
finishDelayedChunk(zookeeper);
delayed_chunk = std::make_unique<ReplicatedMergeTreeSink::DelayedChunk>();
delayed_chunk->partitions = std::move(partitions);
finishDelayedChunk(zookeeper);
streams = 0;
support_parallel_write = false;
partitions = DelayedPartitions{};
}
partitions.emplace_back(ReplicatedMergeTreeSink::DelayedChunk::Partition{
.temp_part = std::move(temp_part),
.elapsed_ns = elapsed_ns,
@ -207,7 +233,7 @@ void ReplicatedMergeTreeSink::consume(Chunk chunk)
/// value for `last_block_is_duplicate`, which is possible only after the part is committed.
/// Otherwise we can delay the commit.
/// TODO: we can also delay the commit if there are no MVs.
if (!context->getSettingsRef().deduplicate_blocks_in_dependent_materialized_views)
if (!settings.deduplicate_blocks_in_dependent_materialized_views)
finishDelayedChunk(zookeeper);
}

View File

@ -90,6 +90,8 @@ StorageRabbitMQ::StorageRabbitMQ(
, is_attach(is_attach_)
{
auto parsed_address = parseAddress(getContext()->getMacros()->expand(rabbitmq_settings->rabbitmq_host_port), 5672);
context_->getRemoteHostFilter().checkHostAndPort(parsed_address.first, toString(parsed_address.second));
auto rabbitmq_username = rabbitmq_settings->rabbitmq_username.value;
auto rabbitmq_password = rabbitmq_settings->rabbitmq_password.value;
configuration =

View File

@ -156,6 +156,8 @@ StorageMongoDBConfiguration StorageMongoDB::getConfiguration(ASTs engine_args, C
}
context->getRemoteHostFilter().checkHostAndPort(configuration.host, toString(configuration.port));
return configuration;
}

View File

@ -1560,7 +1560,7 @@ bool StorageReplicatedMergeTree::executeLogEntry(LogEntry & entry)
}
bool StorageReplicatedMergeTree::executeFetch(LogEntry & entry)
bool StorageReplicatedMergeTree::executeFetch(LogEntry & entry, bool need_to_check_missing_part)
{
/// Looking for covering part. After that entry.actual_new_part_name may be filled.
String replica = findReplicaHavingCoveringPart(entry, true);
@ -1684,6 +1684,10 @@ bool StorageReplicatedMergeTree::executeFetch(LogEntry & entry)
if (replica.empty())
{
ProfileEvents::increment(ProfileEvents::ReplicatedPartFailedFetches);
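/// The caller indicated that the part may legitimately be absent (e.g. another replica has not finished producing it yet), so do not treat this as an error.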
if (!need_to_check_missing_part)
return false;
throw Exception("No active replica has part " + entry.new_part_name + " or covering part", ErrorCodes::NO_REPLICA_HAS_PART);
}
}
@ -7715,7 +7719,12 @@ void StorageReplicatedMergeTree::createZeroCopyLockNode(const zkutil::ZooKeeperP
{
try
{
zookeeper->createAncestors(zookeeper_node);
/// Ephemeral locks can be created only when we fetch shared data.
/// So it never requires creating ancestors. If we create them,
/// a race condition with the source replica drop is possible.
if (mode == zkutil::CreateMode::Persistent)
zookeeper->createAncestors(zookeeper_node);
if (replace_existing_lock && zookeeper->exists(zookeeper_node))
{
Coordination::Requests ops;

View File

@ -519,7 +519,7 @@ private:
/// NOTE: Attention! First it tries to find a covering part on another replica
/// and sets it into entry.actual_new_part_name. After that it tries to fetch this new covering part.
/// If the fetch was not successful, it clears entry.actual_new_part_name.
bool executeFetch(LogEntry & entry);
bool executeFetch(LogEntry & entry, bool need_to_check_missing_part=true);
bool executeReplaceRange(const LogEntry & entry);
void executeClonePartFromShard(const LogEntry & entry);

View File

@ -4,7 +4,7 @@ import argparse
import logging
import os
import re
from typing import Tuple
from typing import List, Tuple
from artifactory import ArtifactorySaaSPath # type: ignore
from build_download_helper import dowload_build_with_progress
@ -283,27 +283,28 @@ def parse_args() -> argparse.Namespace:
return args
def process_deb(s3: S3, art_client: Artifactory):
def process_deb(s3: S3, art_clients: List[Artifactory]):
s3.download_deb()
if art_client is not None:
for art_client in art_clients:
art_client.deploy_deb(s3.packages)
def process_rpm(s3: S3, art_client: Artifactory):
def process_rpm(s3: S3, art_clients: List[Artifactory]):
s3.download_rpm()
if art_client is not None:
for art_client in art_clients:
art_client.deploy_rpm(s3.packages)
def process_tgz(s3: S3, art_client: Artifactory):
def process_tgz(s3: S3, art_clients: List[Artifactory]):
s3.download_tgz()
if art_client is not None:
for art_client in art_clients:
art_client.deploy_tgz(s3.packages)
def main():
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
args = parse_args()
os.makedirs(TEMP_PATH, exist_ok=True)
s3 = S3(
args.bucket_name,
args.pull_request,
@ -312,16 +313,18 @@ def main():
args.release.version,
args.force_download,
)
art_client = None
art_clients = []
if args.artifactory:
art_client = Artifactory(args.artifactory_url, args.release.type)
art_clients.append(Artifactory(args.artifactory_url, args.release.type))
if args.release.type == "lts":
art_clients.append(Artifactory(args.artifactory_url, "stable"))
if args.deb:
process_deb(s3, art_client)
process_deb(s3, art_clients)
if args.rpm:
process_rpm(s3, art_client)
process_rpm(s3, art_clients)
if args.tgz:
process_tgz(s3, art_client)
process_tgz(s3, art_clients)
if __name__ == "__main__":

View File

@ -100,10 +100,10 @@ class Release:
if self.release_type in self.BIG:
# Checkout to the commit, it will provide the correct current version
if with_prestable:
logging.info("Skipping prestable stage")
else:
with self.prestable():
logging.info("Prestable part of the releasing is done")
else:
logging.info("Skipping prestable stage")
with self.testing():
logging.info("Testing part of the releasing is done")

View File

@ -16,51 +16,77 @@ from build_download_helper import download_all_deb_packages
from upload_result_helper import upload_results
from docker_pull_helper import get_image_with_version
from commit_status_helper import post_commit_status
from clickhouse_helper import ClickHouseHelper, mark_flaky_tests, prepare_tests_results_for_clickhouse
from clickhouse_helper import (
ClickHouseHelper,
mark_flaky_tests,
prepare_tests_results_for_clickhouse,
)
from stopwatch import Stopwatch
from rerun_helper import RerunHelper
from tee_popen import TeePopen
def get_run_command(build_path, result_folder, repo_tests_path, server_log_folder, image):
cmd = "docker run --cap-add=SYS_PTRACE -e S3_URL='https://clickhouse-datasets.s3.amazonaws.com' " + \
f"--volume={build_path}:/package_folder " \
f"--volume={result_folder}:/test_output " \
f"--volume={repo_tests_path}:/usr/share/clickhouse-test " \
f"--volume={server_log_folder}:/var/log/clickhouse-server {image}"
def get_run_command(
build_path, result_folder, repo_tests_path, server_log_folder, image
):
cmd = (
"docker run --cap-add=SYS_PTRACE "
"-e S3_URL='https://clickhouse-datasets.s3.amazonaws.com' "
f"--volume={build_path}:/package_folder "
f"--volume={result_folder}:/test_output "
f"--volume={repo_tests_path}:/usr/share/clickhouse-test "
f"--volume={server_log_folder}:/var/log/clickhouse-server {image}"
)
return cmd
def process_results(result_folder, server_log_path, run_log_path):
test_results = []
additional_files = []
# Just upload all files from result_folder.
# If task provides processed results, then it's responsible for content of result_folder.
# If task provides processed results, then it's responsible for content
# of result_folder.
if os.path.exists(result_folder):
test_files = [f for f in os.listdir(result_folder) if os.path.isfile(os.path.join(result_folder, f))]
test_files = [
f
for f in os.listdir(result_folder)
if os.path.isfile(os.path.join(result_folder, f))
]
additional_files = [os.path.join(result_folder, f) for f in test_files]
if os.path.exists(server_log_path):
server_log_files = [f for f in os.listdir(server_log_path) if os.path.isfile(os.path.join(server_log_path, f))]
additional_files = additional_files + [os.path.join(server_log_path, f) for f in server_log_files]
server_log_files = [
f
for f in os.listdir(server_log_path)
if os.path.isfile(os.path.join(server_log_path, f))
]
additional_files = additional_files + [
os.path.join(server_log_path, f) for f in server_log_files
]
additional_files.append(run_log_path)
status_path = os.path.join(result_folder, "check_status.tsv")
if not os.path.exists(status_path):
return "failure", "check_status.tsv doesn't exists", test_results, additional_files
return (
"failure",
"check_status.tsv doesn't exists",
test_results,
additional_files,
)
logging.info("Found check_status.tsv")
with open(status_path, 'r', encoding='utf-8') as status_file:
status = list(csv.reader(status_file, delimiter='\t'))
with open(status_path, "r", encoding="utf-8") as status_file:
status = list(csv.reader(status_file, delimiter="\t"))
if len(status) != 1 or len(status[0]) != 2:
return "error", "Invalid check_status.tsv", test_results, additional_files
state, description = status[0][0], status[0][1]
results_path = os.path.join(result_folder, "test_results.tsv")
with open(results_path, 'r', encoding='utf-8') as results_file:
test_results = list(csv.reader(results_file, delimiter='\t'))
with open(results_path, "r", encoding="utf-8") as results_file:
test_results = list(csv.reader(results_file, delimiter="\t"))
if len(test_results) == 0:
raise Exception("Empty results")
@ -90,7 +116,7 @@ if __name__ == "__main__":
logging.info("Check is already finished according to github status, exiting")
sys.exit(0)
docker_image = get_image_with_version(reports_path, 'clickhouse/stress-test')
docker_image = get_image_with_version(reports_path, "clickhouse/stress-test")
packages_path = os.path.join(temp_path, "packages")
if not os.path.exists(packages_path):
@ -108,7 +134,9 @@ if __name__ == "__main__":
run_log_path = os.path.join(temp_path, "runlog.log")
run_command = get_run_command(packages_path, result_path, repo_tests_path, server_log_path, docker_image)
run_command = get_run_command(
packages_path, result_path, repo_tests_path, server_log_path, docker_image
)
logging.info("Going to run func tests: %s", run_command)
with TeePopen(run_command, run_log_path) as process:
@ -120,16 +148,32 @@ if __name__ == "__main__":
subprocess.check_call(f"sudo chown -R ubuntu:ubuntu {temp_path}", shell=True)
s3_helper = S3Helper('https://s3.amazonaws.com')
state, description, test_results, additional_logs = process_results(result_path, server_log_path, run_log_path)
s3_helper = S3Helper("https://s3.amazonaws.com")
state, description, test_results, additional_logs = process_results(
result_path, server_log_path, run_log_path
)
ch_helper = ClickHouseHelper()
mark_flaky_tests(ch_helper, check_name, test_results)
report_url = upload_results(s3_helper, pr_info.number, pr_info.sha, test_results, [run_log_path] + additional_logs, check_name)
report_url = upload_results(
s3_helper,
pr_info.number,
pr_info.sha,
test_results,
additional_logs,
check_name,
)
print(f"::notice ::Report url: {report_url}")
post_commit_status(gh, pr_info.sha, check_name, description, state, report_url)
prepared_events = prepare_tests_results_for_clickhouse(pr_info, test_results, state, stopwatch.duration_seconds, stopwatch.start_time_str, report_url, check_name)
prepared_events = prepare_tests_results_for_clickhouse(
pr_info,
test_results,
state,
stopwatch.duration_seconds,
stopwatch.start_time_str,
report_url,
check_name,
)
ch_helper.insert_events_into(db="gh-data", table="checks", events=prepared_events)

View File

@ -202,6 +202,29 @@ def get_processlist(args):
return clickhouse_execute_json(args, 'SHOW PROCESSLIST')
def get_processlist_after_test(args):
log_comment = args.testcase_basename
database = args.testcase_database
if args.replicated_database:
return clickhouse_execute_json(args, f"""
SELECT materialize((hostName(), tcpPort())) as host, *
FROM clusterAllReplicas('test_cluster_database_replicated', system.processes)
WHERE
query NOT LIKE '%system.processes%' AND
Settings['log_comment'] = '{log_comment}' AND
current_database = '{database}'
""")
else:
return clickhouse_execute_json(args, f"""
SELECT *
FROM system.processes
WHERE
query NOT LIKE '%system.processes%' AND
Settings['log_comment'] = '{log_comment}' AND
current_database = '{database}'
""")
# collect server stacktraces using gdb
def get_stacktraces_from_gdb(server_pid):
try:
@ -404,7 +427,7 @@ class TestCase:
testcase_args.testcase_start_time = datetime.now()
testcase_basename = os.path.basename(case_file)
testcase_args.testcase_client = f"{testcase_args.client} --log_comment='{testcase_basename}'"
testcase_args.testcase_client = f"{testcase_args.client} --log_comment '{testcase_basename}'"
testcase_args.testcase_basename = testcase_basename
if testcase_args.database:
@ -672,6 +695,16 @@ class TestCase:
proc.stdout is None or 'Exception' not in proc.stdout)
need_drop_database = not maybe_passed
left_queries_check = args.no_left_queries_check is False
if self.tags and 'no-left-queries-check' in self.tags:
left_queries_check = False
if left_queries_check:
processlist = get_processlist_after_test(args)
if processlist:
print(colored(f"\nFound queries left in processlist after running {args.testcase_basename} (database={database}):", args, "red", attrs=["bold"]))
print(json.dumps(processlist, indent=4))
exit_code.value = 1
if need_drop_database:
seconds_left = max(args.timeout - (datetime.now() - start_time).total_seconds(), 20)
try:
@ -1411,6 +1444,7 @@ if __name__ == '__main__':
parser.add_argument('--order', default='desc', choices=['asc', 'desc', 'random'], help='Run order')
parser.add_argument('--testname', action='store_true', default=None, dest='testname', help='Make query with test name before test run')
parser.add_argument('--hung-check', action='store_true', default=False)
parser.add_argument('--no-left-queries-check', action='store_true', default=False)
parser.add_argument('--force-color', action='store_true', default=False)
parser.add_argument('--database', help='Database for tests (random name test_XXXXXX by default)')
parser.add_argument('--no-drop-if-fail', action='store_true', help='Do not drop database for test if test has failed')

View File

@ -139,7 +139,7 @@ def assert_eq_stats(stat1, stat2):
assert stat1.version == stat2.version
assert stat1.cversion == stat2.cversion
assert stat1.aversion == stat2.aversion
assert stat1.aversion == stat2.aversion
assert stat1.ephemeralOwner == stat2.ephemeralOwner
assert stat1.dataLength == stat2.dataLength
assert stat1.numChildren == stat2.numChildren

View File

@ -4,7 +4,7 @@
<server_id>1</server_id>
<log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
<snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
<four_letter_word_white_list>*</four_letter_word_white_list>
<four_letter_word_allow_list>*</four_letter_word_allow_list>
<coordination_settings>
<operation_timeout_ms>5000</operation_timeout_ms>

View File

@ -4,7 +4,7 @@
<server_id>2</server_id>
<log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
<snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
<four_letter_word_white_list>*</four_letter_word_white_list>
<four_letter_word_allow_list>*</four_letter_word_allow_list>
<coordination_settings>
<operation_timeout_ms>5000</operation_timeout_ms>

View File

@ -4,7 +4,7 @@
<server_id>3</server_id>
<log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
<snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
<four_letter_word_white_list>*</four_letter_word_white_list>
<four_letter_word_allow_list>*</four_letter_word_allow_list>
<coordination_settings>
<operation_timeout_ms>5000</operation_timeout_ms>

View File

@ -2,7 +2,7 @@
<keeper_server>
<tcp_port>9181</tcp_port>
<server_id>1</server_id>
<four_letter_word_white_list>ruok, conf</four_letter_word_white_list>
<four_letter_word_allow_list>ruok, conf</four_letter_word_allow_list>
<raft_configuration>
<server>
<id>1</id>

View File

@ -2,7 +2,7 @@
<keeper_server>
<tcp_port>9181</tcp_port>
<server_id>3</server_id>
<four_letter_word_white_list>*</four_letter_word_white_list>
<four_letter_word_allow_list>*</four_letter_word_allow_list>
<raft_configuration>
<server>
<id>1</id>

View File

@ -281,7 +281,7 @@ def test_cmd_conf(started_cluster):
assert "tcp_port_secure" not in result
assert "superdigest" not in result
assert result["four_letter_word_white_list"] == "*"
assert result["four_letter_word_allow_list"] == "*"
assert result["log_storage_path"] == "/var/lib/clickhouse/coordination/log"
assert result["snapshot_storage_path"] == "/var/lib/clickhouse/coordination/snapshots"

Some files were not shown because too many files have changed in this diff.