Merge branch 'master' into dev_intel_iaa_deflate

This commit is contained in:
jasperzhu 2022-07-22 11:05:44 +08:00 committed by GitHub
commit 614f3b14a2
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
197 changed files with 3520 additions and 673 deletions

3
.gitmodules vendored
View File

@ -280,3 +280,6 @@
[submodule "contrib/base-x"]
path = contrib/base-x
url = https://github.com/ClickHouse/base-x.git
[submodule "contrib/c-ares"]
path = contrib/c-ares
url = https://github.com/ClickHouse/c-ares

View File

@ -1,4 +1,5 @@
### Table of Contents
**[ClickHouse release v22.7, 2022-07-21](#226)**<br>
**[ClickHouse release v22.6, 2022-06-16](#226)**<br>
**[ClickHouse release v22.5, 2022-05-19](#225)**<br>
**[ClickHouse release v22.4, 2022-04-20](#224)**<br>
@ -7,6 +8,172 @@
**[ClickHouse release v22.1, 2022-01-18](#221)**<br>
**[Changelog for 2021](https://clickhouse.com/docs/en/whats-new/changelog/2021/)**<br>
### <a id="227"></a> ClickHouse release 22.7, 2022-07-21
#### Upgrade Notes
* Enable setting `enable_positional_arguments` by default. It allows queries like `SELECT ... ORDER BY 1, 2` where 1, 2 are the references to the select clause. If you need to return the old behavior, disable this setting. [#38204](https://github.com/ClickHouse/ClickHouse/pull/38204) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* `Ordinary` database engine and old storage definition syntax for `*MergeTree` tables are deprecated. By default it's not possible to create new databases with `Ordinary` engine. If `system` database has `Ordinary` engine it will be automatically converted to `Atomic` on server startup. There are settings to keep old behavior (`allow_deprecated_database_ordinary` and `allow_deprecated_syntax_for_merge_tree`), but these settings may be removed in future releases. [#38335](https://github.com/ClickHouse/ClickHouse/pull/38335) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Force rewriting comma join to inner by default (set default value `cross_to_inner_join_rewrite = 2`). To have old behavior set `cross_to_inner_join_rewrite = 1`. [#39326](https://github.com/ClickHouse/ClickHouse/pull/39326) ([Vladimir C](https://github.com/vdimir)). If you will face any incompatibilities, you can turn this setting back.
#### New Feature
* Support expressions with window functions. Closes [#19857](https://github.com/ClickHouse/ClickHouse/issues/19857). [#37848](https://github.com/ClickHouse/ClickHouse/pull/37848) ([Dmitry Novik](https://github.com/novikd)).
* Add new `direct` join algorithm for `EmbeddedRocksDB` tables, see [#33582](https://github.com/ClickHouse/ClickHouse/issues/33582). [#35363](https://github.com/ClickHouse/ClickHouse/pull/35363) ([Vladimir C](https://github.com/vdimir)).
* Added full sorting merge join algorithm. [#35796](https://github.com/ClickHouse/ClickHouse/pull/35796) ([Vladimir C](https://github.com/vdimir)).
* Implement NATS table engine, which allows to pub/sub to NATS. Closes [#32388](https://github.com/ClickHouse/ClickHouse/issues/32388). [#37171](https://github.com/ClickHouse/ClickHouse/pull/37171) ([tchepavel](https://github.com/tchepavel)). ([Kseniia Sumarokova](https://github.com/kssenii))
* Implement table function `mongodb`. Allow writes into `MongoDB` storage / table function. [#37213](https://github.com/ClickHouse/ClickHouse/pull/37213) ([aaapetrenko](https://github.com/aaapetrenko)). ([Kseniia Sumarokova](https://github.com/kssenii))
* Add `SQLInsert` output format. Closes [#38441](https://github.com/ClickHouse/ClickHouse/issues/38441). [#38477](https://github.com/ClickHouse/ClickHouse/pull/38477) ([Kruglov Pavel](https://github.com/Avogar)).
* Introduced settings `additional_table_filters`. Using this setting, you can specify additional filtering condition for a table which will be applied directly after reading. Example: `select number, x, y from (select number from system.numbers limit 5) f any left join (select x, y from table_1) s on f.number = s.x settings additional_table_filters={'system.numbers : 'number != 3', 'table_1' : 'x != 2'}`. Introduced setting `additional_result_filter` which specifies additional filtering condition for query result. Closes [#37918](https://github.com/ClickHouse/ClickHouse/issues/37918). [#38475](https://github.com/ClickHouse/ClickHouse/pull/38475) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Add `compatibility` setting and `system.settings_changes` system table that contains information about changes in settings through ClickHouse versions. Closes [#35972](https://github.com/ClickHouse/ClickHouse/issues/35972). [#38957](https://github.com/ClickHouse/ClickHouse/pull/38957) ([Kruglov Pavel](https://github.com/Avogar)).
* Add functions `translate(string, from_string, to_string)` and `translateUTF8(string, from_string, to_string)`. It translates some characters to another. [#38935](https://github.com/ClickHouse/ClickHouse/pull/38935) ([Nikolay Degterinsky](https://github.com/evillique)).
* Support `parseTimeDelta` function. It can be used like ` ;-+,:` can be used as separators, eg. `1yr-2mo`, `2m:6s`: `SELECT parseTimeDelta('1yr-2mo-4w + 12 days, 3 hours : 1 minute ; 33 seconds')`. [#39071](https://github.com/ClickHouse/ClickHouse/pull/39071) ([jiahui-97](https://github.com/jiahui-97)).
* Added `CREATE TABLE ... EMPTY AS SELECT` query. It automatically deduces table structure from the SELECT query, but does not fill the table after creation. Resolves [#38049](https://github.com/ClickHouse/ClickHouse/issues/38049). [#38272](https://github.com/ClickHouse/ClickHouse/pull/38272) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Added options to limit IO operations with remote storage: `max_remote_read_network_bandwidth_for_server` and `max_remote_write_network_bandwidth_for_server`. [#39095](https://github.com/ClickHouse/ClickHouse/pull/39095) ([Sergei Trifonov](https://github.com/serxa)).
* Add `group_by_use_nulls` setting to make aggregation key columns nullable in the case of ROLLUP, CUBE and GROUPING SETS. Closes [#37359](https://github.com/ClickHouse/ClickHouse/issues/37359). [#38642](https://github.com/ClickHouse/ClickHouse/pull/38642) ([Dmitry Novik](https://github.com/novikd)).
* Add the ability to specify compression level during data export. [#38907](https://github.com/ClickHouse/ClickHouse/pull/38907) ([Nikolay Degterinsky](https://github.com/evillique)).
* Add an option to require explicit grants to SELECT from the `system` database. Details: [#38970](https://github.com/ClickHouse/ClickHouse/pull/38970) ([Vitaly Baranov](https://github.com/vitlibar)).
* Functions `multiMatchAny`, `multiMatchAnyIndex`, `multiMatchAllIndices` and their fuzzy variants now accept non-const pattern array argument. [#38485](https://github.com/ClickHouse/ClickHouse/pull/38485) ([Robert Schulze](https://github.com/rschu1ze)). SQL function `multiSearchAllPositions` now accepts non-const needle arguments. [#39167](https://github.com/ClickHouse/ClickHouse/pull/39167) ([Robert Schulze](https://github.com/rschu1ze)).
* Add a setting `zstd_window_log_max` to configure max memory usage on zstd decoding when importing external files. Closes [#35693](https://github.com/ClickHouse/ClickHouse/issues/35693). [#37015](https://github.com/ClickHouse/ClickHouse/pull/37015) ([wuxiaobai24](https://github.com/wuxiaobai24)).
* Add `send_logs_source_regexp` setting. Send server text logs with specified regexp to match log source name. Empty means all sources. [#39161](https://github.com/ClickHouse/ClickHouse/pull/39161) ([Amos Bird](https://github.com/amosbird)).
* Support `ALTER` for `Hive` tables. [#38214](https://github.com/ClickHouse/ClickHouse/pull/38214) ([lgbo](https://github.com/lgbo-ustc)).
* Support `isNullable` function. This function checks whether it's argument is nullable and return 1 or 0. Closes [#38611](https://github.com/ClickHouse/ClickHouse/issues/38611). [#38841](https://github.com/ClickHouse/ClickHouse/pull/38841) ([lokax](https://github.com/lokax)).
* Added functions for base58 encoding/decoding. [#38159](https://github.com/ClickHouse/ClickHouse/pull/38159) ([Andrey Zvonov](https://github.com/zvonand)).
* Add chart visualization to Play UI. [#38197](https://github.com/ClickHouse/ClickHouse/pull/38197) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Added L2 Squared distance and norm functions for both arrays and tuples. [#38545](https://github.com/ClickHouse/ClickHouse/pull/38545) ([Julian Gilyadov](https://github.com/israelg99)).
* Add ability to pass HTTP headers to the `url` table function / storage via SQL. Closes [#37897](https://github.com/ClickHouse/ClickHouse/issues/37897). [#38176](https://github.com/ClickHouse/ClickHouse/pull/38176) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Add `clickhouse-diagnostics` binary to the packages. [#38647](https://github.com/ClickHouse/ClickHouse/pull/38647) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
#### Experimental Feature
* Adds new setting `implicit_transaction` to run standalone queries inside a transaction. It handles both creation and closing (via COMMIT if the query succeeded or ROLLBACK if it didn't) of the transaction automatically. [#38344](https://github.com/ClickHouse/ClickHouse/pull/38344) ([Raúl Marín](https://github.com/Algunenano)).
#### Performance Improvement
* Distinct optimization for sorted columns. Use specialized distinct transformation in case input stream is sorted by column(s) in distinct. Optimization can be applied to pre-distinct, final distinct, or both. Initial implementation by @dimarub2000. [#37803](https://github.com/ClickHouse/ClickHouse/pull/37803) ([Igor Nikonov](https://github.com/devcrafter)).
* Improve performance of `ORDER BY`, `MergeTree` merges, window functions using batch version of `BinaryHeap`. [#38022](https://github.com/ClickHouse/ClickHouse/pull/38022) ([Maksim Kita](https://github.com/kitaisreal)).
* More parallel execution for queries with `FINAL` [#36396](https://github.com/ClickHouse/ClickHouse/pull/36396) ([Nikita Taranov](https://github.com/nickitat)).
* Fix significant join performance regression which was introduced in [#35616](https://github.com/ClickHouse/ClickHouse/pull/35616). It's interesting that common join queries such as ssb queries have been 10 times slower for almost 3 months while no one complains. [#38052](https://github.com/ClickHouse/ClickHouse/pull/38052) ([Amos Bird](https://github.com/amosbird)).
* Migrate from the Intel hyperscan library to vectorscan, this speeds up many string matching on non-x86 platforms. [#38171](https://github.com/ClickHouse/ClickHouse/pull/38171) ([Robert Schulze](https://github.com/rschu1ze)).
* Increased parallelism of query plan steps executed after aggregation. [#38295](https://github.com/ClickHouse/ClickHouse/pull/38295) ([Nikita Taranov](https://github.com/nickitat)).
* Improve performance of insertion to columns of type `JSON`. [#38320](https://github.com/ClickHouse/ClickHouse/pull/38320) ([Anton Popov](https://github.com/CurtizJ)).
* Optimized insertion and lookups in the HashTable. [#38413](https://github.com/ClickHouse/ClickHouse/pull/38413) ([Nikita Taranov](https://github.com/nickitat)).
* Fix performance degradation from [#32493](https://github.com/ClickHouse/ClickHouse/issues/32493). [#38417](https://github.com/ClickHouse/ClickHouse/pull/38417) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Improve performance of joining with numeric columns using SIMD instructions. [#37235](https://github.com/ClickHouse/ClickHouse/pull/37235) ([zzachimed](https://github.com/zzachimed)). [#38565](https://github.com/ClickHouse/ClickHouse/pull/38565) ([Maksim Kita](https://github.com/kitaisreal)).
* Norm and Distance functions for arrays speed up 1.2-2 times. [#38740](https://github.com/ClickHouse/ClickHouse/pull/38740) ([Alexander Gololobov](https://github.com/davenger)).
* Add AVX-512 VBMI optimized `copyOverlap32Shuffle` for LZ4 decompression. In other words, LZ4 decompression performance is improved. [#37891](https://github.com/ClickHouse/ClickHouse/pull/37891) ([Guo Wangyang](https://github.com/guowangy)).
* `ORDER BY (a, b)` will use all the same benefits as `ORDER BY a, b`. [#38873](https://github.com/ClickHouse/ClickHouse/pull/38873) ([Igor Nikonov](https://github.com/devcrafter)).
* Align branches within a 32B boundary to make benchmark more stable. [#38988](https://github.com/ClickHouse/ClickHouse/pull/38988) ([Guo Wangyang](https://github.com/guowangy)). It improves performance 1..2% on average for Intel.
* Executable UDF, executable dictionaries, and Executable tables will avoid wasting one second during wait for subprocess termination. [#38929](https://github.com/ClickHouse/ClickHouse/pull/38929) ([Constantine Peresypkin](https://github.com/pkit)).
* Optimize accesses to `system.stack_trace` table if not all columns are selected. [#39177](https://github.com/ClickHouse/ClickHouse/pull/39177) ([Azat Khuzhin](https://github.com/azat)).
* Improve isNullable/isConstant/isNull/isNotNull performance for LowCardinality argument. [#39192](https://github.com/ClickHouse/ClickHouse/pull/39192) ([Kruglov Pavel](https://github.com/Avogar)).
* Optimized processing of ORDER BY in window functions. [#34632](https://github.com/ClickHouse/ClickHouse/pull/34632) ([Vladimir Chebotarev](https://github.com/excitoon)).
* The table `system.asynchronous_metric_log` is further optimized for storage space. This closes [#38134](https://github.com/ClickHouse/ClickHouse/issues/38134). See the [YouTube video](https://www.youtube.com/watch?v=0fSp9SF8N8A). [#38428](https://github.com/ClickHouse/ClickHouse/pull/38428) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### Improvement
* Support SQL standard CREATE INDEX and DROP INDEX syntax. [#35166](https://github.com/ClickHouse/ClickHouse/pull/35166) ([Jianmei Zhang](https://github.com/zhangjmruc)).
* Send profile events for INSERT queries (previously only SELECT was supported). [#37391](https://github.com/ClickHouse/ClickHouse/pull/37391) ([Azat Khuzhin](https://github.com/azat)).
* Implement in order aggregation (`optimize_aggregation_in_order`) for fully materialized projections. [#37469](https://github.com/ClickHouse/ClickHouse/pull/37469) ([Azat Khuzhin](https://github.com/azat)).
* Remove subprocess run for kerberos initialization. Added new integration test. Closes [#27651](https://github.com/ClickHouse/ClickHouse/issues/27651). [#38105](https://github.com/ClickHouse/ClickHouse/pull/38105) ([Roman Vasin](https://github.com/rvasin)).
* * Add setting `multiple_joins_try_to_keep_original_names` to not rewrite identifier name on multiple JOINs rewrite, close [#34697](https://github.com/ClickHouse/ClickHouse/issues/34697). [#38149](https://github.com/ClickHouse/ClickHouse/pull/38149) ([Vladimir C](https://github.com/vdimir)).
* Improved trace-visualizer UX. [#38169](https://github.com/ClickHouse/ClickHouse/pull/38169) ([Sergei Trifonov](https://github.com/serxa)).
* Enable stack trace collection and query profiler for AArch64. [#38181](https://github.com/ClickHouse/ClickHouse/pull/38181) ([Maksim Kita](https://github.com/kitaisreal)).
* Do not skip symlinks in `user_defined` directory during SQL user defined functions loading. Closes [#38042](https://github.com/ClickHouse/ClickHouse/issues/38042). [#38184](https://github.com/ClickHouse/ClickHouse/pull/38184) ([Maksim Kita](https://github.com/kitaisreal)).
* Added background cleanup of subdirectories in `store/`. In some cases clickhouse-server might left garbage subdirectories in `store/` (for example, on unsuccessful table creation) and those dirs were never been removed. Fixes [#33710](https://github.com/ClickHouse/ClickHouse/issues/33710). [#38265](https://github.com/ClickHouse/ClickHouse/pull/38265) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Add `DESCRIBE CACHE` query to show cache settings from config. Add `SHOW CACHES` query to show available filesystem caches list. [#38279](https://github.com/ClickHouse/ClickHouse/pull/38279) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Add access check for `system drop filesystem cache`. Support ON CLUSTER. [#38319](https://github.com/ClickHouse/ClickHouse/pull/38319) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix PostgreSQL database engine incompatibility on upgrade from 21.3 to 22.3. Closes [#36659](https://github.com/ClickHouse/ClickHouse/issues/36659). [#38369](https://github.com/ClickHouse/ClickHouse/pull/38369) ([Kseniia Sumarokova](https://github.com/kssenii)).
* `filesystemAvailable` and similar functions now work in `clickhouse-local`. This closes [#38423](https://github.com/ClickHouse/ClickHouse/issues/38423). [#38424](https://github.com/ClickHouse/ClickHouse/pull/38424) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add `revision` function. [#38555](https://github.com/ClickHouse/ClickHouse/pull/38555) ([Azat Khuzhin](https://github.com/azat)).
* Fix GCS via proxy tunnel usage. [#38726](https://github.com/ClickHouse/ClickHouse/pull/38726) ([Azat Khuzhin](https://github.com/azat)).
* Support `\i file` in clickhouse client / local (similar to psql \i). [#38813](https://github.com/ClickHouse/ClickHouse/pull/38813) ([Kseniia Sumarokova](https://github.com/kssenii)).
* New option `optimize = 1` in `EXPLAIN AST`. If enabled, it shows AST after it's rewritten, otherwise AST of original query. Disabled by default. [#38910](https://github.com/ClickHouse/ClickHouse/pull/38910) ([Igor Nikonov](https://github.com/devcrafter)).
* Allow trailing comma in columns list. closes [#38425](https://github.com/ClickHouse/ClickHouse/issues/38425). [#38440](https://github.com/ClickHouse/ClickHouse/pull/38440) ([chen](https://github.com/xiedeyantu)).
* Bugfixes and performance improvements for `parallel_hash` JOIN method. [#37648](https://github.com/ClickHouse/ClickHouse/pull/37648) ([Vladimir C](https://github.com/vdimir)).
* Support hadoop secure RPC transfer (hadoop.rpc.protection=privacy and hadoop.rpc.protection=integrity). [#37852](https://github.com/ClickHouse/ClickHouse/pull/37852) ([Peng Liu](https://github.com/michael1589)).
* Add struct type support in `StorageHive`. [#38118](https://github.com/ClickHouse/ClickHouse/pull/38118) ([lgbo](https://github.com/lgbo-ustc)).
* S3 single objects are now removed with `RemoveObjectRequest`. Implement compatibility with GCP which did not allow to use `removeFileIfExists` effectively breaking approximately half of `remove` functionality. Automatic detection for `DeleteObjects` S3 API, that is not supported by GCS. This will allow to use GCS without explicit `support_batch_delete=0` in configuration. [#37882](https://github.com/ClickHouse/ClickHouse/pull/37882) ([Vladimir Chebotarev](https://github.com/excitoon)).
* Expose basic ClickHouse Keeper related monitoring data (via ProfileEvents and CurrentMetrics). [#38072](https://github.com/ClickHouse/ClickHouse/pull/38072) ([lingpeng0314](https://github.com/lingpeng0314)).
* Support `auto_close` option for PostgreSQL engine connection. Closes [#31486](https://github.com/ClickHouse/ClickHouse/issues/31486). [#38363](https://github.com/ClickHouse/ClickHouse/pull/38363) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Allow `NULL` modifier in columns declaration for table functions. [#38816](https://github.com/ClickHouse/ClickHouse/pull/38816) ([Kruglov Pavel](https://github.com/Avogar)).
* Deactivate `mutations_finalizing_task` before shutdown to avoid benign `TABLE_IS_READ_ONLY` errors during shutdown. [#38851](https://github.com/ClickHouse/ClickHouse/pull/38851) ([Raúl Marín](https://github.com/Algunenano)).
* Eliminate unnecessary waiting of SELECT queries after ALTER queries in presence of INSERT queries if you use deprecated Ordinary databases. [#38864](https://github.com/ClickHouse/ClickHouse/pull/38864) ([Azat Khuzhin](https://github.com/azat)).
* New option `rewrite` in `EXPLAIN AST`. If enabled, it shows AST after it's rewritten, otherwise AST of original query. Disabled by default. [#38910](https://github.com/ClickHouse/ClickHouse/pull/38910) ([Igor Nikonov](https://github.com/devcrafter)).
* Stop reporting Zookeeper "Node exists" exceptions in system.errors when they are expected. [#38961](https://github.com/ClickHouse/ClickHouse/pull/38961) ([Raúl Marín](https://github.com/Algunenano)).
* `clickhouse-keeper`: add support for real-time digest calculation and verification. It is disabled by default. [#37555](https://github.com/ClickHouse/ClickHouse/pull/37555) ([Antonio Andelic](https://github.com/antonio2368)).
* Allow to specify globs `* or {expr1, expr2, expr3}` inside a key for `clickhouse-extract-from-config` tool. [#38966](https://github.com/ClickHouse/ClickHouse/pull/38966) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* clearOldLogs: Don't report KEEPER_EXCEPTION on concurrent deletes. [#39016](https://github.com/ClickHouse/ClickHouse/pull/39016) ([Raúl Marín](https://github.com/Algunenano)).
* clickhouse-keeper improvement: persist meta-information about keeper servers to disk. [#39069](https://github.com/ClickHouse/ClickHouse/pull/39069) ([Antonio Andelic](https://github.com/antonio2368)). This will make it easier to operate if you shutdown or restart all keeper nodes at the same time.
* Continue without exception when running out of disk space when using filesystem cache. [#39106](https://github.com/ClickHouse/ClickHouse/pull/39106) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Handling SIGTERM signals from k8s. [#39130](https://github.com/ClickHouse/ClickHouse/pull/39130) ([Timur Solodovnikov](https://github.com/tsolodov)).
* Add `merge_algorithm` column (Undecided, Horizontal, Vertical) to system.part_log. [#39181](https://github.com/ClickHouse/ClickHouse/pull/39181) ([Azat Khuzhin](https://github.com/azat)).
* Don't increment a counter in `system.errors` when the disk is not rotational. [#39216](https://github.com/ClickHouse/ClickHouse/pull/39216) ([Raúl Marín](https://github.com/Algunenano)).
* The metric `result_bytes` for `INSERT` queries in `system.query_log` shows number of bytes inserted. Previously value was incorrect and stored the same value as `result_rows`. [#39225](https://github.com/ClickHouse/ClickHouse/pull/39225) ([Ilya Yatsishin](https://github.com/qoega)).
* The CPU usage metric in clickhouse-client will be displayed in a better way. Fixes [#38756](https://github.com/ClickHouse/ClickHouse/issues/38756). [#39280](https://github.com/ClickHouse/ClickHouse/pull/39280) ([Sergei Trifonov](https://github.com/serxa)).
* Rethrow exception on filesystem cache initialization on server startup, better error message. [#39386](https://github.com/ClickHouse/ClickHouse/pull/39386) ([Kseniia Sumarokova](https://github.com/kssenii)).
* OpenTelemetry now collects traces without Processors spans by default (there are too many). To enable Processors spans collection `opentelemetry_trace_processors` setting. [#39170](https://github.com/ClickHouse/ClickHouse/pull/39170) ([Ilya Yatsishin](https://github.com/qoega)).
* Functions `multiMatch[Fuzzy](AllIndices/Any/AnyIndex)` - don't throw a logical error if the needle argument is empty. [#39012](https://github.com/ClickHouse/ClickHouse/pull/39012) ([Robert Schulze](https://github.com/rschu1ze)).
* Allow to declare `RabbitMQ` queue without default arguments `x-max-length` and `x-overflow`. [#39259](https://github.com/ClickHouse/ClickHouse/pull/39259) ([rnbondarenko](https://github.com/rnbondarenko)).
#### Build/Testing/Packaging Improvement
* Apply Clang Thread Safety Analysis (TSA) annotations to ClickHouse. [#38068](https://github.com/ClickHouse/ClickHouse/pull/38068) ([Robert Schulze](https://github.com/rschu1ze)).
* Adapt universal installation script for FreeBSD. [#39302](https://github.com/ClickHouse/ClickHouse/pull/39302) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Preparation for building on `s390x` platform. [#39193](https://github.com/ClickHouse/ClickHouse/pull/39193) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Fix a bug in `jemalloc` library [#38757](https://github.com/ClickHouse/ClickHouse/pull/38757) ([Azat Khuzhin](https://github.com/azat)).
* Hardware benchmark now has support for automatic results uploading. [#38427](https://github.com/ClickHouse/ClickHouse/pull/38427) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* System table "system.licenses" is now correctly populated on Mac (Darwin). [#38294](https://github.com/ClickHouse/ClickHouse/pull/38294) ([Robert Schulze](https://github.com/rschu1ze)).
* Change `all|noarch` packages to architecture-dependent - Fix some documentation for it - Push aarch64|arm64 packages to artifactory and release assets - Fixes [#36443](https://github.com/ClickHouse/ClickHouse/issues/36443). [#38580](https://github.com/ClickHouse/ClickHouse/pull/38580) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
#### Bug Fix (user-visible misbehavior in official stable or prestable release)
* Fix rounding for `Decimal128/Decimal256` with more than 19-digits long scale. [#38027](https://github.com/ClickHouse/ClickHouse/pull/38027) ([Igor Nikonov](https://github.com/devcrafter)).
* Fixed crash caused by data race in storage `Hive` (integration table engine). [#38887](https://github.com/ClickHouse/ClickHouse/pull/38887) ([lgbo](https://github.com/lgbo-ustc)).
* Fix crash when executing GRANT ALL ON *.* with ON CLUSTER. It was broken in https://github.com/ClickHouse/ClickHouse/pull/35767. This closes [#38618](https://github.com/ClickHouse/ClickHouse/issues/38618). [#38674](https://github.com/ClickHouse/ClickHouse/pull/38674) ([Vitaly Baranov](https://github.com/vitlibar)).
* Correct glob expansion in case of `{0..10}` forms. Fixes [#38498](https://github.com/ClickHouse/ClickHouse/issues/38498) Current Implementation is similar to what shell does mentiond by @rschu1ze [here](https://github.com/ClickHouse/ClickHouse/pull/38502#issuecomment-1169057723). [#38502](https://github.com/ClickHouse/ClickHouse/pull/38502) ([Heena Bansal](https://github.com/HeenaBansal2009)).
* Fix crash for `mapUpdate`, `mapFilter` functions when using with constant map argument. Closes [#38547](https://github.com/ClickHouse/ClickHouse/issues/38547). [#38553](https://github.com/ClickHouse/ClickHouse/pull/38553) ([hexiaoting](https://github.com/hexiaoting)).
* Fix `toHour` monotonicity information for query optimization which can lead to incorrect query result (incorrect index analysis). This fixes [#38333](https://github.com/ClickHouse/ClickHouse/issues/38333). [#38675](https://github.com/ClickHouse/ClickHouse/pull/38675) ([Amos Bird](https://github.com/amosbird)).
* Fix checking whether s3 storage support parallel writes. It resulted in s3 parallel writes not working. [#38792](https://github.com/ClickHouse/ClickHouse/pull/38792) ([chen](https://github.com/xiedeyantu)).
* Fix s3 seekable reads with parallel read buffer. (Affected memory usage during query). Closes [#38258](https://github.com/ClickHouse/ClickHouse/issues/38258). [#38802](https://github.com/ClickHouse/ClickHouse/pull/38802) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Update `simdjson`. This fixes [#38621](https://github.com/ClickHouse/ClickHouse/issues/38621) - a buffer overflow on machines with the latest Intel CPUs with AVX-512 VBMI. [#38838](https://github.com/ClickHouse/ClickHouse/pull/38838) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix possible logical error for Vertical merges. [#38859](https://github.com/ClickHouse/ClickHouse/pull/38859) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix settings profile with seconds unit. [#38896](https://github.com/ClickHouse/ClickHouse/pull/38896) ([Raúl Marín](https://github.com/Algunenano)).
* Fix incorrect partition pruning when there is a nullable partition key. Note: most likely you don't use nullable partition keys - this is an obscure feature you should not use. Nullable keys are a nonsense and this feature is only needed for some crazy use-cases. This fixes [#38941](https://github.com/ClickHouse/ClickHouse/issues/38941). [#38946](https://github.com/ClickHouse/ClickHouse/pull/38946) ([Amos Bird](https://github.com/amosbird)).
* Improve `fsync_part_directory` for fetches. [#38993](https://github.com/ClickHouse/ClickHouse/pull/38993) ([Azat Khuzhin](https://github.com/azat)).
* Fix possible dealock inside `OvercommitTracker`. Fixes [#37794](https://github.com/ClickHouse/ClickHouse/issues/37794). [#39030](https://github.com/ClickHouse/ClickHouse/pull/39030) ([Dmitry Novik](https://github.com/novikd)).
* Fix bug in filesystem cache that could happen in some corner case which coincided with cache capacity hitting the limit. Closes [#39066](https://github.com/ClickHouse/ClickHouse/issues/39066). [#39070](https://github.com/ClickHouse/ClickHouse/pull/39070) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix some corner cases of interpretation of the arguments of window expressions. Fixes [#38538](https://github.com/ClickHouse/ClickHouse/issues/38538) Allow using of higher-order functions in window expressions. [#39112](https://github.com/ClickHouse/ClickHouse/pull/39112) ([Dmitry Novik](https://github.com/novikd)).
* Keep `LowCardinality` type in `tuple` function. Previously `LowCardinality` type was dropped and elements of created tuple had underlying type of `LowCardinality`. [#39113](https://github.com/ClickHouse/ClickHouse/pull/39113) ([Anton Popov](https://github.com/CurtizJ)).
* Fix error `Block structure mismatch` which could happen for INSERT into table with attached MATERIALIZED VIEW and enabled setting `extremes = 1`. Closes [#29759](https://github.com/ClickHouse/ClickHouse/issues/29759) and [#38729](https://github.com/ClickHouse/ClickHouse/issues/38729). [#39125](https://github.com/ClickHouse/ClickHouse/pull/39125) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix unexpected query result when both `optimize_trivial_count_query` and `empty_result_for_aggregation_by_empty_set` are set to true. This fixes [#39140](https://github.com/ClickHouse/ClickHouse/issues/39140). [#39155](https://github.com/ClickHouse/ClickHouse/pull/39155) ([Amos Bird](https://github.com/amosbird)).
* Fixed error `Not found column Type in block` in selects with `PREWHERE` and read-in-order optimizations. [#39157](https://github.com/ClickHouse/ClickHouse/pull/39157) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix extremely rare race condition in during hardlinks for remote filesystem. The only way to reproduce it is concurrent run of backups. [#39190](https://github.com/ClickHouse/ClickHouse/pull/39190) ([alesapin](https://github.com/alesapin)).
* (zero-copy replication is an experimental feature that should not be used in production) Fix fetch of in-memory part with `allow_remote_fs_zero_copy_replication`. [#39214](https://github.com/ClickHouse/ClickHouse/pull/39214) ([Azat Khuzhin](https://github.com/azat)).
* (MaterializedPostgreSQL - experimental feature). Fix segmentation fault in MaterializedPostgreSQL database engine, which could happen if some exception occurred at replication initialisation. Closes [#36939](https://github.com/ClickHouse/ClickHouse/issues/36939). [#39272](https://github.com/ClickHouse/ClickHouse/pull/39272) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix incorrect fetch of table metadata from PostgreSQL database engine. Closes [#33502](https://github.com/ClickHouse/ClickHouse/issues/33502). [#39283](https://github.com/ClickHouse/ClickHouse/pull/39283) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix projection exception when aggregation keys are wrapped inside other functions. This fixes [#37151](https://github.com/ClickHouse/ClickHouse/issues/37151). [#37155](https://github.com/ClickHouse/ClickHouse/pull/37155) ([Amos Bird](https://github.com/amosbird)).
* Fix possible logical error `... with argument with type Nothing and default implementation for Nothing is expected to return result with type Nothing, got ...` in some functions. Closes: [#37610](https://github.com/ClickHouse/ClickHouse/issues/37610) Closes: [#37741](https://github.com/ClickHouse/ClickHouse/issues/37741). [#37759](https://github.com/ClickHouse/ClickHouse/pull/37759) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix incorrect columns order in subqueries of UNION (in case of duplicated columns in subselects may produce incorrect result). [#37887](https://github.com/ClickHouse/ClickHouse/pull/37887) ([Azat Khuzhin](https://github.com/azat)).
* Fix incorrect work of MODIFY ALTER Column with column names that contain dots. Closes [#37907](https://github.com/ClickHouse/ClickHouse/issues/37907). [#37971](https://github.com/ClickHouse/ClickHouse/pull/37971) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix reading of sparse columns from `MergeTree` tables that store their data in S3. [#37978](https://github.com/ClickHouse/ClickHouse/pull/37978) ([Anton Popov](https://github.com/CurtizJ)).
* Fix possible crash in `Distributed` async insert in case of removing a replica from config. [#38029](https://github.com/ClickHouse/ClickHouse/pull/38029) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix "Missing columns" for GLOBAL JOIN with CTE without alias. [#38056](https://github.com/ClickHouse/ClickHouse/pull/38056) ([Azat Khuzhin](https://github.com/azat)).
* Rewrite tuple functions as literals in backwards-compatibility mode. [#38096](https://github.com/ClickHouse/ClickHouse/pull/38096) ([Anton Kozlov](https://github.com/tonickkozlov)).
* Fix redundant memory reservation for output block during `ORDER BY`. [#38127](https://github.com/ClickHouse/ClickHouse/pull/38127) ([iyupeng](https://github.com/iyupeng)).
* Fix possible logical error `Bad cast from type DB::IColumn* to DB::ColumnNullable*` in array mapped functions. Closes [#38006](https://github.com/ClickHouse/ClickHouse/issues/38006). [#38132](https://github.com/ClickHouse/ClickHouse/pull/38132) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix temporary name clash in partial merge join, close [#37928](https://github.com/ClickHouse/ClickHouse/issues/37928). [#38135](https://github.com/ClickHouse/ClickHouse/pull/38135) ([Vladimir C](https://github.com/vdimir)).
* Some minr issue with queries like `CREATE TABLE nested_name_tuples (`a` Tuple(x String, y Tuple(i Int32, j String))) ENGINE = Memory;` [#38136](https://github.com/ClickHouse/ClickHouse/pull/38136) ([lgbo](https://github.com/lgbo-ustc)).
* Fix bug with nested short-circuit functions that led to execution of arguments even if condition is false. Closes [#38040](https://github.com/ClickHouse/ClickHouse/issues/38040). [#38173](https://github.com/ClickHouse/ClickHouse/pull/38173) ([Kruglov Pavel](https://github.com/Avogar)).
* (Window View is a experimental feature) Fix LOGICAL_ERROR for WINDOW VIEW with incorrect structure. [#38205](https://github.com/ClickHouse/ClickHouse/pull/38205) ([Azat Khuzhin](https://github.com/azat)).
* Update librdkafka submodule to fix crash when an OAUTHBEARER refresh callback is set. [#38225](https://github.com/ClickHouse/ClickHouse/pull/38225) ([Rafael Acevedo](https://github.com/racevedoo)).
* Fix INSERT into Distributed hung due to ProfileEvents. [#38307](https://github.com/ClickHouse/ClickHouse/pull/38307) ([Azat Khuzhin](https://github.com/azat)).
* Fix retries in PostgreSQL engine. [#38310](https://github.com/ClickHouse/ClickHouse/pull/38310) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix optimization in PartialSortingTransform (SIGSEGV and possible incorrect result). [#38324](https://github.com/ClickHouse/ClickHouse/pull/38324) ([Azat Khuzhin](https://github.com/azat)).
* Fix RabbitMQ with formats based on PeekableReadBuffer. Closes [#38061](https://github.com/ClickHouse/ClickHouse/issues/38061). [#38356](https://github.com/ClickHouse/ClickHouse/pull/38356) ([Kseniia Sumarokova](https://github.com/kssenii)).
* MaterializedPostgreSQL - experimentail feature. Fix possible `Invalid number of rows in Chunk` in MaterializedPostgreSQL. Closes [#37323](https://github.com/ClickHouse/ClickHouse/issues/37323). [#38360](https://github.com/ClickHouse/ClickHouse/pull/38360) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix RabbitMQ configuration with connection string setting. Closes [#36531](https://github.com/ClickHouse/ClickHouse/issues/36531). [#38365](https://github.com/ClickHouse/ClickHouse/pull/38365) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix PostgreSQL engine not using PostgreSQL schema when retrieving array dimension size. Closes [#36755](https://github.com/ClickHouse/ClickHouse/issues/36755). Closes [#36772](https://github.com/ClickHouse/ClickHouse/issues/36772). [#38366](https://github.com/ClickHouse/ClickHouse/pull/38366) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix possibly incorrect result of distributed queries with `DISTINCT` and `LIMIT`. Fixes [#38282](https://github.com/ClickHouse/ClickHouse/issues/38282). [#38371](https://github.com/ClickHouse/ClickHouse/pull/38371) ([Anton Popov](https://github.com/CurtizJ)).
* Fix wrong results of countSubstrings() & position() on patterns with 0-bytes. [#38589](https://github.com/ClickHouse/ClickHouse/pull/38589) ([Robert Schulze](https://github.com/rschu1ze)).
* Now it's possible to start a clickhouse-server and attach/detach tables even for tables with the incorrect values of IPv4/IPv6 representation. Proper fix for issue [#35156](https://github.com/ClickHouse/ClickHouse/issues/35156). [#38590](https://github.com/ClickHouse/ClickHouse/pull/38590) ([alesapin](https://github.com/alesapin)).
* `rankCorr` function will work correctly if some arguments are NaNs. This closes [#38396](https://github.com/ClickHouse/ClickHouse/issues/38396). [#38722](https://github.com/ClickHouse/ClickHouse/pull/38722) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix `parallel_view_processing=1` with `optimize_trivial_insert_select=1`. Fix `max_insert_threads` while pushing to views. [#38731](https://github.com/ClickHouse/ClickHouse/pull/38731) ([Azat Khuzhin](https://github.com/azat)).
* Fix use-after-free for aggregate functions with `Map` combinator that leads to incorrect result. [#38748](https://github.com/ClickHouse/ClickHouse/pull/38748) ([Azat Khuzhin](https://github.com/azat)).
### <a id="226"></a> ClickHouse release 22.6, 2022-06-16
#### Backward Incompatible Change

View File

@ -1,4 +1,4 @@
cmake_minimum_required(VERSION 3.14)
cmake_minimum_required(VERSION 3.15)
project(ClickHouse LANGUAGES C CXX ASM)

View File

@ -1,176 +1,68 @@
#include "atomic.h"
#include <sys/auxv.h>
#include <fcntl.h> // open
#include <sys/stat.h> // O_RDONLY
#include <unistd.h> // read, close
#include <stdlib.h> // ssize_t
#include <stdio.h> // perror, fprintf
#include <link.h> // ElfW
#include "atomic.h"
#include <unistd.h> // __environ
#include <errno.h>
#define ARRAY_SIZE(a) sizeof((a))/sizeof((a[0]))
// We don't have libc struct available here.
// Compute aux vector manually (from /proc/self/auxv).
//
// Right now there is only 51 AT_* constants,
// so 64 should be enough until this implementation will be replaced with musl.
static unsigned long __auxv_procfs[64];
// We don't have libc struct available here. Compute aux vector manually.
static unsigned long * __auxv = NULL;
static unsigned long __auxv_secure = 0;
// Common
static unsigned long * __auxv_environ = NULL;
static void * volatile getauxval_func;
static unsigned long __auxv_init_environ(unsigned long type);
//
// auxv from procfs interface
//
ssize_t __retry_read(int fd, void * buf, size_t count)
{
for (;;)
{
ssize_t ret = read(fd, buf, count);
if (ret == -1)
{
if (errno == EINTR)
{
continue;
}
perror("Cannot read /proc/self/auxv");
abort();
}
return ret;
}
}
unsigned long __getauxval_procfs(unsigned long type)
{
if (type == AT_SECURE)
{
return __auxv_secure;
}
if (type >= ARRAY_SIZE(__auxv_procfs))
{
errno = ENOENT;
return 0;
}
return __auxv_procfs[type];
}
static unsigned long __auxv_init_procfs(unsigned long type)
{
// For debugging:
// - od -t dL /proc/self/auxv
// - LD_SHOW_AUX= ls
int fd = open("/proc/self/auxv", O_RDONLY);
// It is possible in case of:
// - no procfs mounted
// - on android you are not able to read it unless running from shell or debugging
// - some other issues
if (fd == -1)
{
// Fallback to environ.
a_cas_p(&getauxval_func, (void *)__auxv_init_procfs, (void *)__auxv_init_environ);
return __auxv_init_environ(type);
}
ElfW(auxv_t) aux;
/// NOTE: sizeof(aux) is very small (less then PAGE_SIZE), so partial read should not be possible.
_Static_assert(sizeof(aux) < 4096, "Unexpected sizeof(aux)");
while (__retry_read(fd, &aux, sizeof(aux)) == sizeof(aux))
{
if (aux.a_type >= ARRAY_SIZE(__auxv_procfs))
{
fprintf(stderr, "AT_* is out of range: %li (maximum allowed is %zu)\n", aux.a_type, ARRAY_SIZE(__auxv_procfs));
abort();
}
if (__auxv_procfs[aux.a_type])
{
fprintf(stderr, "AUXV already has value (%zu)\n", __auxv_procfs[aux.a_type]);
abort();
}
__auxv_procfs[aux.a_type] = aux.a_un.a_val;
}
close(fd);
__auxv_secure = __getauxval_procfs(AT_SECURE);
// Now we've initialized __auxv_procfs, next time getauxval() will only call __get_auxval().
a_cas_p(&getauxval_func, (void *)__auxv_init_procfs, (void *)__getauxval_procfs);
return __getauxval_procfs(type);
}
//
// auxv from environ interface
//
// NOTE: environ available only after static initializers,
// so you cannot rely on this if you need getauxval() before.
//
// Good example of such user is sanitizers, for example
// LSan will not work with __auxv_init_environ(),
// since it needs getauxval() before.
//
static size_t __find_auxv(unsigned long type)
{
size_t i;
for (i = 0; __auxv_environ[i]; i += 2)
for (i = 0; __auxv[i]; i += 2)
{
if (__auxv_environ[i] == type)
{
if (__auxv[i] == type)
return i + 1;
}
}
return (size_t) -1;
}
unsigned long __getauxval_environ(unsigned long type)
unsigned long __getauxval(unsigned long type)
{
if (type == AT_SECURE)
return __auxv_secure;
if (__auxv_environ)
if (__auxv)
{
size_t index = __find_auxv(type);
if (index != ((size_t) -1))
return __auxv_environ[index];
return __auxv[index];
}
errno = ENOENT;
return 0;
}
static unsigned long __auxv_init_environ(unsigned long type)
static void * volatile getauxval_func;
static unsigned long __auxv_init(unsigned long type)
{
if (!__environ)
{
// __environ is not initialized yet so we can't initialize __auxv_environ right now.
// __environ is not initialized yet so we can't initialize __auxv right now.
// That's normally occurred only when getauxval() is called from some sanitizer's internal code.
errno = ENOENT;
return 0;
}
// Initialize __auxv_environ and __auxv_secure.
// Initialize __auxv and __auxv_secure.
size_t i;
for (i = 0; __environ[i]; i++);
__auxv_environ = (unsigned long *) (__environ + i + 1);
__auxv = (unsigned long *) (__environ + i + 1);
size_t secure_idx = __find_auxv(AT_SECURE);
if (secure_idx != ((size_t) -1))
__auxv_secure = __auxv_environ[secure_idx];
__auxv_secure = __auxv[secure_idx];
// Now we need to switch to __getauxval_environ for all later calls, since
// everything is initialized.
a_cas_p(&getauxval_func, (void *)__auxv_init_environ, (void *)__getauxval_environ);
// Now we've initialized __auxv, next time getauxval() will only call __get_auxval().
a_cas_p(&getauxval_func, (void *)__auxv_init, (void *)__getauxval);
return __getauxval_environ(type);
return __getauxval(type);
}
// Callchain:
// - __auxv_init_procfs -> __getauxval_environ
// - __auxv_init_procfs -> __auxv_init_environ -> __getauxval_environ
static void * volatile getauxval_func = (void *)__auxv_init_procfs;
// First time getauxval() will call __auxv_init().
static void * volatile getauxval_func = (void *)__auxv_init;
unsigned long getauxval(unsigned long type)
{

View File

@ -2,11 +2,11 @@
# NOTE: has nothing common with DBMS_TCP_PROTOCOL_VERSION,
# only DBMS_TCP_PROTOCOL_VERSION should be incremented on protocol changes.
SET(VERSION_REVISION 54464)
SET(VERSION_REVISION 54465)
SET(VERSION_MAJOR 22)
SET(VERSION_MINOR 7)
SET(VERSION_MINOR 8)
SET(VERSION_PATCH 1)
SET(VERSION_GITHASH 7000c4e0033bb9e69050ab8ef73e8e7465f78059)
SET(VERSION_DESCRIBE v22.7.1.1-testing)
SET(VERSION_STRING 22.7.1.1)
SET(VERSION_GITHASH f4f05ec786a8b8966dd0ea2a2d7e39a8c7db24f4)
SET(VERSION_DESCRIBE v22.8.1.1-testing)
SET(VERSION_STRING 22.8.1.1)
# end of autochange

View File

@ -1,3 +1,19 @@
if (_CLICKHOUSE_TOOLCHAIN_FILE_LOADED)
# During first run of cmake the toolchain file will be loaded twice,
# - /usr/share/cmake-3.23/Modules/CMakeDetermineSystem.cmake
# - /bld/CMakeFiles/3.23.2/CMakeSystem.cmake
#
# But once you already have non-empty cmake cache it will be loaded only
# once:
# - /bld/CMakeFiles/3.23.2/CMakeSystem.cmake
#
# This has no harm except for double load of toolchain will add
# --gcc-toolchain multiple times that will not allow ccache to reuse the
# cache.
return()
endif()
set (_CLICKHOUSE_TOOLCHAIN_FILE_LOADED ON)
set (CMAKE_TRY_COMPILE_TARGET_TYPE STATIC_LIBRARY)
set (CMAKE_SYSTEM_NAME "Linux")

View File

@ -157,6 +157,7 @@ endif()
add_contrib (sqlite-cmake sqlite-amalgamation)
add_contrib (s2geometry-cmake s2geometry)
add_contrib (base-x-cmake base-x)
add_contrib(c-ares-cmake c-ares)
add_contrib (qpl-cmake qpl)
# Put all targets defined here and in subdirectories under "contrib/<immediate-subdir>" folders in GUI-based IDEs.

1
contrib/c-ares vendored Submodule

@ -0,0 +1 @@
Subproject commit afee6748b0b99acf4509d42fa37ac8422262f91b

View File

@ -0,0 +1,35 @@
# Choose to build static or shared library for c-ares.
if (USE_STATIC_LIBRARIES)
set(CARES_STATIC ON CACHE BOOL "" FORCE)
set(CARES_SHARED OFF CACHE BOOL "" FORCE)
else ()
set(CARES_STATIC OFF CACHE BOOL "" FORCE)
set(CARES_SHARED ON CACHE BOOL "" FORCE)
endif ()
# Disable looking for libnsl on a platforms that has gethostbyname in glibc
#
# c-ares searching for gethostbyname in the libnsl library, however in the
# version that shipped with gRPC it doing it wrong [1], since it uses
# CHECK_LIBRARY_EXISTS(), which will return TRUE even if the function exists in
# another dependent library. The upstream already contains correct macro [2],
# but it is not included in gRPC (even upstream gRPC, not the one that is
# shipped with clickhousee).
#
# [1]: https://github.com/c-ares/c-ares/blob/e982924acee7f7313b4baa4ee5ec000c5e373c30/CMakeLists.txt#L125
# [2]: https://github.com/c-ares/c-ares/blob/44fbc813685a1fa8aa3f27fcd7544faf612d376a/CMakeLists.txt#L146
#
# And because if you by some reason have libnsl [3] installed, clickhouse will
# reject to start w/o it. While this is completelly different library.
#
# [3]: https://packages.debian.org/bullseye/libnsl2
if (NOT CMAKE_SYSTEM_NAME STREQUAL "SunOS")
set(HAVE_LIBNSL OFF CACHE BOOL "" FORCE)
endif()
# Force use of c-ares inet_net_pton instead of libresolv one
set(HAVE_INET_NET_PTON OFF CACHE BOOL "" FORCE)
add_subdirectory("../c-ares/" "../c-ares/")
add_library(ch_contrib::c-ares ALIAS c-ares)

View File

@ -45,38 +45,11 @@ set(_gRPC_SSL_LIBRARIES OpenSSL::Crypto OpenSSL::SSL)
# Use abseil-cpp from ClickHouse contrib, not from gRPC third_party.
set(gRPC_ABSL_PROVIDER "clickhouse" CACHE STRING "" FORCE)
# Choose to build static or shared library for c-ares.
if (USE_STATIC_LIBRARIES)
set(CARES_STATIC ON CACHE BOOL "" FORCE)
set(CARES_SHARED OFF CACHE BOOL "" FORCE)
else ()
set(CARES_STATIC OFF CACHE BOOL "" FORCE)
set(CARES_SHARED ON CACHE BOOL "" FORCE)
endif ()
# Disable looking for libnsl on a platforms that has gethostbyname in glibc
#
# c-ares searching for gethostbyname in the libnsl library, however in the
# version that shipped with gRPC it doing it wrong [1], since it uses
# CHECK_LIBRARY_EXISTS(), which will return TRUE even if the function exists in
# another dependent library. The upstream already contains correct macro [2],
# but it is not included in gRPC (even upstream gRPC, not the one that is
# shipped with clickhousee).
#
# [1]: https://github.com/c-ares/c-ares/blob/e982924acee7f7313b4baa4ee5ec000c5e373c30/CMakeLists.txt#L125
# [2]: https://github.com/c-ares/c-ares/blob/44fbc813685a1fa8aa3f27fcd7544faf612d376a/CMakeLists.txt#L146
#
# And because if you by some reason have libnsl [3] installed, clickhouse will
# reject to start w/o it. While this is completelly different library.
#
# [3]: https://packages.debian.org/bullseye/libnsl2
if (NOT CMAKE_SYSTEM_NAME STREQUAL "SunOS")
set(HAVE_LIBNSL OFF CACHE BOOL "" FORCE)
endif()
# We don't want to build C# extensions.
set(gRPC_BUILD_CSHARP_EXT OFF)
set(_gRPC_CARES_LIBRARIES ch_contrib::c-ares)
set(gRPC_CARES_PROVIDER "clickhouse" CACHE STRING "" FORCE)
add_subdirectory("${_gRPC_SOURCE_DIR}" "${_gRPC_BINARY_DIR}")
# The contrib/grpc/CMakeLists.txt redefined the PROTOBUF_GENERATE_GRPC_CPP() function for its own purposes,

View File

@ -93,6 +93,18 @@ set (CMAKE_CXX_STANDARD 17)
set (LLVM_SOURCE_DIR "${ClickHouse_SOURCE_DIR}/contrib/llvm/llvm")
set (LLVM_BINARY_DIR "${ClickHouse_BINARY_DIR}/contrib/llvm/llvm")
add_subdirectory ("${LLVM_SOURCE_DIR}" "${LLVM_BINARY_DIR}")
set_directory_properties (PROPERTIES
# due to llvm crosscompile cmake does not know how to clean it, and on clean
# will lead to the following error:
#
# ninja: error: remove(contrib/llvm/llvm/NATIVE): Directory not empty
#
ADDITIONAL_CLEAN_FILES "${LLVM_BINARY_DIR}"
# llvm's cmake configuring this file only when cmake runs,
# and after clean cmake will not know that it should re-run,
# add explicitly depends from llvm-config.h
CMAKE_CONFIGURE_DEPENDS "${LLVM_BINARY_DIR}/include/llvm/Config/llvm-config.h"
)
add_library (_llvm INTERFACE)
target_link_libraries (_llvm INTERFACE ${REQUIRED_LLVM_LIBRARIES})

View File

@ -135,6 +135,7 @@ function clone_submodules
contrib/replxx
contrib/wyhash
contrib/hashidsxx
contrib/c-ares
)
git submodule sync

View File

@ -0,0 +1,9 @@
version: "2.3"
services:
coredns:
image: coredns/coredns:latest
restart: always
volumes:
- ${COREDNS_CONFIG_DIR}/example.com:/example.com
- ${COREDNS_CONFIG_DIR}/Corefile:/Corefile

View File

@ -362,6 +362,8 @@ else
# FIXME Not sure if it's expected, but some tests from BC check may not be finished yet when we restarting server.
# Let's just ignore all errors from queries ("} <Error> TCPHandler: Code:", "} <Error> executeQuery: Code:")
# FIXME https://github.com/ClickHouse/ClickHouse/issues/39197 ("Missing columns: 'v3' while processing query: 'v3, k, v1, v2, p'")
# NOTE Incompatibility was introduced in https://github.com/ClickHouse/ClickHouse/pull/39263, it's expected
# ("This engine is deprecated and is not supported in transactions", "[Queue = DB::MergeMutateRuntimeQueue]: Code: 235. DB::Exception: Part")
echo "Check for Error messages in server log:"
zgrep -Fav -e "Code: 236. DB::Exception: Cancelled merging parts" \
-e "Code: 236. DB::Exception: Cancelled mutating parts" \
@ -389,6 +391,8 @@ else
-e "} <Error> TCPHandler: Code:" \
-e "} <Error> executeQuery: Code:" \
-e "Missing columns: 'v3' while processing query: 'v3, k, v1, v2, p'" \
-e "This engine is deprecated and is not supported in transactions" \
-e "[Queue = DB::MergeMutateRuntimeQueue]: Code: 235. DB::Exception: Part" \
/var/log/clickhouse-server/clickhouse-server.backward.clean.log | zgrep -Fa "<Error>" > /test_output/bc_check_error_messages.txt \
&& echo -e 'Backward compatibility check: Error message in clickhouse-server.log (see bc_check_error_messages.txt)\tFAIL' >> /test_output/test_results.tsv \
|| echo -e 'Backward compatibility check: No Error messages in clickhouse-server.log\tOK' >> /test_output/test_results.tsv

View File

@ -46,6 +46,9 @@ def get_options(i, backward_compatibility_check):
if i == 13:
client_options.append("memory_tracker_fault_probability=0.001")
if i % 2 == 1 and not backward_compatibility_check:
client_options.append("group_by_use_nulls=1")
if client_options:
options.append(" --client-option " + " ".join(client_options))

View File

@ -38,9 +38,9 @@ Writing the docs is extremely useful for project's users and developers, and gro
The documentation contains information about all the aspects of the ClickHouse lifecycle: developing, testing, installing, operating, and using. The base language of the documentation is English. The English version is the most actual. All other languages are supported as much as they can by contributors from different countries.
At the moment, [documentation](https://clickhouse.com/docs) exists in English, Russian, Chinese, Japanese. We store the documentation besides the ClickHouse source code in the [GitHub repository](https://github.com/ClickHouse/ClickHouse/tree/master/docs).
At the moment, [documentation](https://clickhouse.com/docs) exists in English, Russian, and Chinese. We store the reference documentation besides the ClickHouse source code in the [GitHub repository](https://github.com/ClickHouse/ClickHouse/tree/master/docs), and user guides in a separate repo [Clickhouse/clickhouse-docs](https://github.com/ClickHouse/clickhouse-docs).
Each language lays in the corresponding folder. Files that are not translated from English are the symbolic links to the English ones.
Each language lies in the corresponding folder. Files that are not translated from English are symbolic links to the English ones.
<a name="how-to-contribute"/>
@ -48,9 +48,9 @@ Each language lays in the corresponding folder. Files that are not translated fr
You can contribute to the documentation in many ways, for example:
- Fork the ClickHouse repository, edit, commit, push, and open a pull request.
- Fork the ClickHouse and ClickHouse-docs repositories, edit, commit, push, and open a pull request.
Add the `documentation` label to this pull request for proper automatic checks applying. If you have no permissions for adding labels, the reviewer of your PR adds it.
Add the `pr-documentation` label to this pull request for proper automatic checks applying. If you do not have permission to add labels, then the reviewer of your PR will add it.
- Open a required file in the ClickHouse repository and edit it from the GitHub web interface.
@ -158,15 +158,15 @@ When everything is ready, we will add the new language to the website.
<a name="target-audience"/>
### Documentation for Different Audience
### Documentation for Different Audiences
When writing documentation, think about people who read it. Each audience has specific requirements for terms they use in communications.
When writing documentation, think about the people who read it. Each audience has specific requirements for terms they use in communications.
ClickHouse documentation can be divided by the audience for the following parts:
ClickHouse documentation can be divided up by the audience for the following parts:
- Conceptual topics in [Introduction](https://clickhouse.com/docs/en/), tutorials and overviews, changelog.
- Conceptual topics like tutorials and overviews.
These topics are for the most common auditory. When editing text in them, use the most common terms that are comfortable for the audience with basic technical skills.
These topics are for the most common audience. When editing text in them, use the most common terms that are comfortable for the audience with basic technical skills.
- Query language reference and related topics.

View File

@ -75,7 +75,7 @@ This will create the `programs/clickhouse` executable, which can be used with `c
The build requires the following components:
- Git (is used only to checkout the sources, its not needed for the build)
- CMake 3.14 or newer
- CMake 3.15 or newer
- Ninja
- C++ compiler: clang-14 or newer
- Linker: lld

View File

@ -1632,6 +1632,8 @@ kafka_topic_list = 'topic1',
kafka_group_name = 'group1',
kafka_format = 'AvroConfluent';
-- for debug purposes you can set format_avro_schema_registry_url in a session.
-- this way cannot be used in production
SET format_avro_schema_registry_url = 'http://schema-registry';
SELECT * FROM topic1_stream;

View File

@ -2626,7 +2626,7 @@ Possible values:
- Any positive integer.
- 0 - Disabled (infinite timeout).
Default value: 1800.
Default value: 180.
## http_receive_timeout {#http_receive_timeout}
@ -2637,7 +2637,7 @@ Possible values:
- Any positive integer.
- 0 - Disabled (infinite timeout).
Default value: 1800.
Default value: 180.
## check_query_single_value_result {#check_query_single_value_result}
@ -3329,6 +3329,15 @@ Read more about [memory overcommit](memory-overcommit.md).
Default value: `1GiB`.
## compatibility {#compatibility}
This setting changes other settings according to provided ClickHouse version.
If a behaviour in ClickHouse was changed by using a different default value for some setting, this compatibility setting allows you to use default values from previous versions for all the settings that were not set by the user.
This setting takes ClickHouse version number as a string, like `21.3`, `21.8`. Empty value means that this setting is disabled.
Disabled by default.
# Format settings {#format-settings}
## input_format_skip_unknown_fields {#input_format_skip_unknown_fields}

View File

@ -1,50 +1,94 @@
## How ClickHouse documentation is generated? {#how-clickhouse-documentation-is-generated}
## Generating ClickHouse documentation {#how-clickhouse-documentation-is-generated}
ClickHouse documentation is built using [build.py](build.py) script that uses [mkdocs](https://www.mkdocs.org) library and its dependencies to separately build all version of documentations (all languages in either single and multi page mode) as static HTMLs for each single page version. The results are then put in the correct directory structure. It is recommended to use Python 3.7 to run this script.
ClickHouse documentation is built using [Docusaurus](https://docusaurus.io).
[release.sh](release.sh) also pulls static files needed for [official ClickHouse website](https://clickhouse.com) from [../../website](../../website) folder then pushes to specified GitHub repo to be served via [GitHub Pages](https://pages.github.com).
## Check the look of your documentation changes {#how-to-check-if-the-documentation-will-look-fine}
## How to check if the documentation will look fine? {#how-to-check-if-the-documentation-will-look-fine}
There are a few options that are all useful depending on how large or complex your edits are.
There are few options that are all useful depending on how large or complex your edits are.
### Use the GitHub web interface to edit
### Use GitHub web interface to edit
Every page in the docs has an **Edit this page** link that opens the page in the GitHub editor. GitHub has Markdown support with a preview feature. The details of GitHub Markdown and the documentation Markdown are a bit different but generally this is close enough, and the person merging your PR will build the docs and check them.
GitHub has Markdown support with preview feature, but the details of GitHub Markdown dialect are a bit different in ClickHouse documentation.
### Install a Markdown editor or plugin for your IDE {#install-markdown-editor-or-plugin-for-your-ide}
### Install Markdown editor or plugin for your IDE {#install-markdown-editor-or-plugin-for-your-ide}
Usually, these plugins provide a preview of how the markdown will render, and they catch basic errors like unclosed tags very early.
Usually those also have some way to preview how Markdown will look like, which allows to catch basic errors like unclosed tags very early.
### Use build.py {#use-build-py}
## Build the docs locally {#use-build-py}
Itll take some effort to go through, but the result will be very close to production documentation.
You can build the docs locally. It takes a few minutes to set up, but once you have done it the first time, the process is very simple.
For the first time youll need to:
### Clone the repos
#### 1. Set up virtualenv
The documentation is in two repos, clone both of them:
- [ClickHouse/ClickHouse](https://github.com/ClickHouse/ClickHouse)
- [ClickHouse/ClickHouse-docs](https://github.com/ClickHouse/clickhouse-docs)
``` bash
$ cd ClickHouse/docs/tools
$ mkdir venv
$ virtualenv -p $(which python3) venv
$ source venv/bin/activate
$ pip3 install -r requirements.txt
### Install Node.js
The documentation is built with Docusaurus, which requires Node.js. We recommend version 16. Install [Node.js](https://nodejs.org/en/download/).
### Copy files into place
Docusaurus expects all of the markdown files to be located in the directory tree `clickhouse-docs/docs/`. This is not the way our repos are set up, so some copying of files is needed to build the docs:
```bash
# from the parent directory of both the ClickHouse/ClickHouse and ClickHouse-clickhouse-docs repos:
cp -r ClickHouse/docs/en/development clickhouse-docs/docs/en/
cp -r ClickHouse/docs/en/engines clickhouse-docs/docs/en/
cp -r ClickHouse/docs/en/getting-started clickhouse-docs/docs/en/
cp -r ClickHouse/docs/en/interfaces clickhouse-docs/docs/en/
cp -r ClickHouse/docs/en/operations clickhouse-docs/docs/en/
cp -r ClickHouse/docs/en/sql-reference clickhouse-docs/docs/en/
cp -r ClickHouse/docs/ru/* clickhouse-docs/docs/ru/
cp -r ClickHouse/docs/zh clickhouse-docs/docs/
```
#### 2. Run build.py
#### Note: Symlinks will not work.
### Setup Docusaurus
When all prerequisites are installed, running `build.py` without args (there are some, check `build.py --help`) will generate `ClickHouse/docs/build` folder with complete static html website.
There are two commands that you may need to use with Docusaurus:
- `yarn install`
- `yarn start`
The easiest way to see the result is to use `--livereload=8888` argument of build.py. Alternatively, you can manually launch a HTTP server to serve the docs, for example by running `cd ClickHouse/docs/build && python3 -m http.server 8888`. Then go to http://localhost:8888 in browser. Feel free to use any other port instead of 8888.
#### Install Docusaurus and its dependencies:
```bash
cd clickhouse-docs
yarn install
```
#### Start a development Docusaurus environment
This command will start Docusaurus in development mode, which means that as you edit source (for example, `.md` files) files the changes will be rendered into HTML files and served by the Docusaurus development server.
```bash
yarn start
```
### Make your changes to the markdown files
Edit your files. Remember that if you are editing files in the `ClickHouse/ClickHouse` repo then you should edit them
in that repo and then copy the edited file into the `ClickHouse/clickhouse-docs/` directory structure so that they are updated in your develoment environment.
`yarn start` probably opened a browser for you when you ran it; if not, open a browser to `http://localhost:3000/docs/en/intro` and navigate to the documentation that you are changing. If you have already made the changes, you can verify them here; if not, make them, and you will see the page update as you save the changes.
## How to change code highlighting? {#how-to-change-code-hl}
ClickHouse does not use mkdocs `highlightjs` feature. It uses modified pygments styles instead.
If you want to change code highlighting, edit the `website/css/highlight.css` file.
Currently, an [eighties](https://github.com/idleberg/base16-pygments/blob/master/css/base16-eighties.dark.css) theme
is used.
Code highlighting is based on the language chosen for your code blocks. Specify the language when you start the code block:
<pre lang="no-highlight"><code>```sql
SELECT firstname from imdb.actors;
```
</code></pre>
```sql
SELECT firstname from imdb.actors;
```
If you need a language supported then open an issue in [ClickHouse-docs](https://github.com/ClickHouse/clickhouse-docs/issues).
## How to subscribe on documentation changes? {#how-to-subscribe-on-documentation-changes}
At the moment theres no easy way to do just that, but you can consider:

View File

@ -102,9 +102,34 @@ void Client::processError(const String & query) const
}
void Client::showWarnings()
{
try
{
std::vector<String> messages = loadWarningMessages();
if (!messages.empty())
{
std::cout << "Warnings:" << std::endl;
for (const auto & message : messages)
std::cout << " * " << message << std::endl;
std::cout << std::endl;
}
}
catch (...)
{
/// Ignore exception
}
}
/// Make query to get all server warnings
std::vector<String> Client::loadWarningMessages()
{
/// Older server versions cannot execute the query loading warnings.
constexpr UInt64 min_server_revision_to_load_warnings = DBMS_MIN_PROTOCOL_VERSION_WITH_VIEW_IF_PERMITTED;
if (server_revision < min_server_revision_to_load_warnings)
return {};
std::vector<String> messages;
connection->sendQuery(connection_parameters.timeouts,
"SELECT * FROM viewIfPermitted(SELECT message FROM system.warnings ELSE null('message String'))",
@ -226,25 +251,9 @@ try
connect();
/// Load Warnings at the beginning of connection
/// Show warnings at the beginning of connection.
if (is_interactive && !config().has("no-warnings"))
{
try
{
std::vector<String> messages = loadWarningMessages();
if (!messages.empty())
{
std::cout << "Warnings:" << std::endl;
for (const auto & message : messages)
std::cout << " * " << message << std::endl;
std::cout << std::endl;
}
}
catch (...)
{
/// Ignore exception
}
}
showWarnings();
if (is_interactive && !delayed_interactive)
{
@ -370,7 +379,7 @@ void Client::connect()
}
server_version = toString(server_version_major) + "." + toString(server_version_minor) + "." + toString(server_version_patch);
load_suggestions = is_interactive && (server_revision >= Suggest::MIN_SERVER_REVISION && !config().getBool("disable_suggestion", false));
load_suggestions = is_interactive && (server_revision >= Suggest::MIN_SERVER_REVISION) && !config().getBool("disable_suggestion", false);
if (server_display_name = connection->getServerDisplayName(connection_parameters.timeouts); server_display_name.empty())
server_display_name = config().getString("host", "localhost");

View File

@ -45,6 +45,7 @@ protected:
private:
void printChangedSettings() const;
void showWarnings();
std::vector<String> loadWarningMessages();
};
}

View File

@ -1,6 +1,7 @@
#pragma once
#include "ICommand.h"
#include <Interpreters/Context.h>
namespace DB
{

View File

@ -1,6 +1,7 @@
#pragma once
#include "ICommand.h"
#include <Interpreters/Context.h>
namespace DB
{

View File

@ -1,6 +1,7 @@
#pragma once
#include "ICommand.h"
#include <Interpreters/Context.h>
namespace DB
{

View File

@ -1,6 +1,7 @@
#pragma once
#include "ICommand.h"
#include <Interpreters/Context.h>
namespace DB
{

View File

@ -1,6 +1,7 @@
#pragma once
#include "ICommand.h"
#include <Interpreters/Context.h>
namespace DB
{

View File

@ -1,6 +1,7 @@
#pragma once
#include "ICommand.h"
#include <Interpreters/Context.h>
namespace DB
{

View File

@ -1,6 +1,7 @@
#pragma once
#include "ICommand.h"
#include <Interpreters/Context.h>
namespace DB
{

View File

@ -1,6 +1,7 @@
#pragma once
#include "ICommand.h"
#include <Interpreters/Context.h>
namespace DB
{

View File

@ -110,18 +110,24 @@ namespace
}
/// Returns the host name by its address.
String getHostByAddress(const IPAddress & address)
Strings getHostsByAddress(const IPAddress & address)
{
String host = DNSResolver::instance().reverseResolve(address);
auto hosts = DNSResolver::instance().reverseResolve(address);
/// Check that PTR record is resolved back to client address
if (!isAddressOfHost(address, host))
throw Exception("Host " + String(host) + " isn't resolved back to " + address.toString(), ErrorCodes::DNS_ERROR);
if (hosts.empty())
throw Exception(ErrorCodes::DNS_ERROR, "{} could not be resolved", address.toString());
return host;
for (const auto & host : hosts)
{
/// Check that PTR record is resolved back to client address
if (!isAddressOfHost(address, host))
throw Exception(ErrorCodes::DNS_ERROR, "Host {} isn't resolved back to {}", host, address.toString());
}
return hosts;
}
void parseLikePatternIfIPSubnet(const String & pattern, IPSubnet & subnet, IPAddress::Family address_family)
{
size_t slash = pattern.find('/');
@ -520,20 +526,29 @@ bool AllowedClientHosts::contains(const IPAddress & client_address) const
return true;
/// Check `name_regexps`.
std::optional<String> resolved_host;
std::optional<Strings> resolved_hosts;
auto check_name_regexp = [&](const String & name_regexp_)
{
try
{
if (boost::iequals(name_regexp_, "localhost"))
return is_client_local();
if (!resolved_host)
resolved_host = getHostByAddress(client_v6);
if (resolved_host->empty())
return false;
Poco::RegularExpression re(name_regexp_);
Poco::RegularExpression::Match match;
return re.match(*resolved_host, match) != 0;
if (!resolved_hosts)
{
resolved_hosts = getHostsByAddress(client_address);
}
for (const auto & host : resolved_hosts.value())
{
Poco::RegularExpression re(name_regexp_);
Poco::RegularExpression::Match match;
if (re.match(host, match) != 0)
{
return true;
}
}
return false;
}
catch (const Exception & e)
{

View File

@ -15,6 +15,7 @@
#include <Poco/JSON/Stringifier.h>
#include <boost/algorithm/string/case_conv.hpp>
#include <boost/range/adaptor/map.hpp>
#include <base/range.h>
#include <filesystem>
#include <fstream>

View File

@ -8,6 +8,8 @@
namespace DB
{
struct Array;
Array getAggregateFunctionParametersArray(
const ASTPtr & expression_list,
const std::string & error_context,

View File

@ -7,6 +7,7 @@
#include <IO/Archives/hasRegisteredArchiveFileExtension.h>
#include <Poco/Util/AbstractConfiguration.h>
#include <filesystem>
#include <Interpreters/Context.h>
namespace DB

View File

@ -382,7 +382,7 @@ if (TARGET ch_contrib::rdkafka)
endif()
if (TARGET ch_contrib::nats_io)
dbms_target_link_libraries(PRIVATE ch_contrib::nats_io)
dbms_target_link_libraries(PRIVATE ch_contrib::nats_io ch_contrib::uv)
endif()
if (TARGET ch_contrib::sasl2)
@ -453,6 +453,9 @@ if (TARGET ch_contrib::avrocpp)
dbms_target_link_libraries(PRIVATE ch_contrib::avrocpp)
endif ()
set_source_files_properties(Common/CaresPTRResolver.cpp PROPERTIES COMPILE_FLAGS -Wno-reserved-identifier)
target_link_libraries (clickhouse_common_io PRIVATE ch_contrib::c-ares)
if (TARGET OpenSSL::Crypto)
dbms_target_link_libraries (PRIVATE OpenSSL::Crypto)
target_link_libraries (clickhouse_common_io PRIVATE OpenSSL::Crypto)

View File

@ -28,8 +28,8 @@ public:
template <typename ConnectionType>
void load(ContextPtr context, const ConnectionParameters & connection_parameters, Int32 suggestion_limit);
/// Older server versions cannot execute the query above.
static constexpr int MIN_SERVER_REVISION = 54406;
/// Older server versions cannot execute the query loading suggestions.
static constexpr int MIN_SERVER_REVISION = DBMS_MIN_PROTOCOL_VERSION_WITH_VIEW_IF_PERMITTED;
private:
void fetch(IServerConnection & connection, const ConnectionTimeouts & timeouts, const std::string & query);

View File

@ -793,4 +793,18 @@ ColumnPtr makeNullable(const ColumnPtr & column)
return ColumnNullable::create(column, ColumnUInt8::create(column->size(), 0));
}
ColumnPtr makeNullableSafe(const ColumnPtr & column)
{
if (isColumnNullable(*column))
return column;
if (isColumnConst(*column))
return ColumnConst::create(makeNullableSafe(assert_cast<const ColumnConst &>(*column).getDataColumnPtr()), column->size());
if (column->canBeInsideNullable())
return makeNullable(column);
return column;
}
}

View File

@ -223,5 +223,6 @@ private:
};
ColumnPtr makeNullable(const ColumnPtr & column);
ColumnPtr makeNullableSafe(const ColumnPtr & column);
}

View File

@ -0,0 +1,109 @@
#include "CaresPTRResolver.h"
#include <arpa/inet.h>
#include <sys/select.h>
#include <Common/Exception.h>
#include "ares.h"
#include "netdb.h"
namespace DB
{
namespace ErrorCodes
{
extern const int DNS_ERROR;
}
static void callback(void * arg, int status, int, struct hostent * host)
{
auto * ptr_records = reinterpret_cast<std::vector<std::string>*>(arg);
if (status == ARES_SUCCESS && host->h_aliases)
{
int i = 0;
while (auto * ptr_record = host->h_aliases[i])
{
ptr_records->emplace_back(ptr_record);
i++;
}
}
}
CaresPTRResolver::CaresPTRResolver(CaresPTRResolver::provider_token) : channel(nullptr)
{
/*
* ares_library_init is not thread safe. Currently, the only other usage of c-ares seems to be in grpc.
* In grpc, ares_library_init seems to be called only in Windows.
* See https://github.com/grpc/grpc/blob/master/src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc#L1187
* That means it's safe to init it here, but we should be cautious when introducing new code that depends on c-ares and even updates
* to grpc. As discussed in https://github.com/ClickHouse/ClickHouse/pull/37827#discussion_r919189085, c-ares should be adapted to be atomic
* */
if (ares_library_init(ARES_LIB_INIT_ALL) != ARES_SUCCESS || ares_init(&channel) != ARES_SUCCESS)
{
throw DB::Exception("Failed to initialize c-ares", DB::ErrorCodes::DNS_ERROR);
}
}
CaresPTRResolver::~CaresPTRResolver()
{
ares_destroy(channel);
ares_library_cleanup();
}
std::vector<std::string> CaresPTRResolver::resolve(const std::string & ip)
{
std::vector<std::string> ptr_records;
resolve(ip, ptr_records);
wait();
return ptr_records;
}
std::vector<std::string> CaresPTRResolver::resolve_v6(const std::string & ip)
{
std::vector<std::string> ptr_records;
resolve_v6(ip, ptr_records);
wait();
return ptr_records;
}
void CaresPTRResolver::resolve(const std::string & ip, std::vector<std::string> & response)
{
in_addr addr;
inet_pton(AF_INET, ip.c_str(), &addr);
ares_gethostbyaddr(channel, reinterpret_cast<const void*>(&addr), sizeof(addr), AF_INET, callback, &response);
}
void CaresPTRResolver::resolve_v6(const std::string & ip, std::vector<std::string> & response)
{
in6_addr addr;
inet_pton(AF_INET6, ip.c_str(), &addr);
ares_gethostbyaddr(channel, reinterpret_cast<const void*>(&addr), sizeof(addr), AF_INET6, callback, &response);
}
void CaresPTRResolver::wait()
{
timeval * tvp, tv;
fd_set read_fds;
fd_set write_fds;
int nfds;
for (;;)
{
FD_ZERO(&read_fds);
FD_ZERO(&write_fds);
nfds = ares_fds(channel, &read_fds,&write_fds);
if (nfds == 0)
{
break;
}
tvp = ares_timeout(channel, nullptr, &tv);
select(nfds, &read_fds, &write_fds, nullptr, tvp);
ares_process(channel, &read_fds, &write_fds);
}
}
}

View File

@ -0,0 +1,42 @@
#pragma once
#include "DNSPTRResolver.h"
using ares_channel = struct ares_channeldata *;
namespace DB
{
/*
* Implements reverse DNS resolution using c-ares lib. System reverse DNS resolution via
* gethostbyaddr or getnameinfo does not work reliably because in some systems
* it returns all PTR records for a given IP and in others it returns only one.
* */
class CaresPTRResolver : public DNSPTRResolver
{
friend class DNSPTRResolverProvider;
/*
* Allow only DNSPTRProvider to instantiate this class
* */
struct provider_token {};
public:
explicit CaresPTRResolver(provider_token);
~CaresPTRResolver() override;
std::vector<std::string> resolve(const std::string & ip) override;
std::vector<std::string> resolve_v6(const std::string & ip) override;
private:
void wait();
void resolve(const std::string & ip, std::vector<std::string> & response);
void resolve_v6(const std::string & ip, std::vector<std::string> & response);
ares_channel channel;
};
}

View File

@ -0,0 +1,18 @@
#pragma once
#include <string>
#include <vector>
namespace DB
{
struct DNSPTRResolver
{
virtual ~DNSPTRResolver() = default;
virtual std::vector<std::string> resolve(const std::string & ip) = 0;
virtual std::vector<std::string> resolve_v6(const std::string & ip) = 0;
};
}

View File

@ -0,0 +1,13 @@
#include "DNSPTRResolverProvider.h"
#include "CaresPTRResolver.h"
namespace DB
{
std::shared_ptr<DNSPTRResolver> DNSPTRResolverProvider::get()
{
static auto cares_resolver = std::make_shared<CaresPTRResolver>(
CaresPTRResolver::provider_token {}
);
return cares_resolver;
}
}

View File

@ -0,0 +1,18 @@
#pragma once
#include <memory>
#include "DNSPTRResolver.h"
namespace DB
{
/*
* Provides a ready-to-use DNSPTRResolver instance.
* It hides 3rd party lib dependencies, handles initialization and lifetime.
* Since `get` function is static, it can be called from any context. Including cached static functions.
* */
class DNSPTRResolverProvider
{
public:
static std::shared_ptr<DNSPTRResolver> get();
};
}

View File

@ -12,6 +12,7 @@
#include <atomic>
#include <optional>
#include <string_view>
#include "DNSPTRResolverProvider.h"
namespace ProfileEvents
{
@ -138,16 +139,17 @@ static DNSResolver::IPAddresses resolveIPAddressImpl(const std::string & host)
return addresses;
}
static String reverseResolveImpl(const Poco::Net::IPAddress & address)
static Strings reverseResolveImpl(const Poco::Net::IPAddress & address)
{
Poco::Net::SocketAddress sock_addr(address, 0);
auto ptr_resolver = DB::DNSPTRResolverProvider::get();
/// Resolve by hand, because Poco::Net::DNS::hostByAddress(...) does getaddrinfo(...) after getnameinfo(...)
char host[1024];
int err = getnameinfo(sock_addr.addr(), sock_addr.length(), host, sizeof(host), nullptr, 0, NI_NAMEREQD);
if (err)
throw Exception("Cannot getnameinfo(" + address.toString() + "): " + gai_strerror(err), ErrorCodes::DNS_ERROR);
return host;
if (address.family() == Poco::Net::IPAddress::Family::IPv4)
{
return ptr_resolver->resolve(address.toString());
} else
{
return ptr_resolver->resolve_v6(address.toString());
}
}
struct DNSResolver::Impl
@ -235,7 +237,7 @@ std::vector<Poco::Net::SocketAddress> DNSResolver::resolveAddressList(const std:
return addresses;
}
String DNSResolver::reverseResolve(const Poco::Net::IPAddress & address)
Strings DNSResolver::reverseResolve(const Poco::Net::IPAddress & address)
{
if (impl->disable_cache)
return reverseResolveImpl(address);

View File

@ -36,8 +36,8 @@ public:
std::vector<Poco::Net::SocketAddress> resolveAddressList(const std::string & host, UInt16 port);
/// Accepts host IP and resolves its host name
String reverseResolve(const Poco::Net::IPAddress & address);
/// Accepts host IP and resolves its host names
Strings reverseResolve(const Poco::Net::IPAddress & address);
/// Get this server host name
String getHostName();

View File

@ -45,7 +45,7 @@ void LRUFileCache::initialize()
catch (...)
{
tryLogCurrentException(__PRETTY_FUNCTION__);
return;
throw;
}
}
else
@ -841,7 +841,11 @@ void LRUFileCache::loadCacheInfoIntoMemory(std::lock_guard<std::mutex> & cache_l
/// cache_base_path / key_prefix / key / offset
if (!files.empty())
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cache already initialized");
throw Exception(
ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR,
"Cache initialization is partially made. "
"This can be a result of a failed first attempt to initialize cache. "
"Please, check log for error messages");
fs::directory_iterator key_prefix_it{cache_base_path};
for (; key_prefix_it != fs::directory_iterator(); ++key_prefix_it)

View File

@ -342,6 +342,23 @@ OptimizedRegularExpressionImpl<thread_safe>::OptimizedRegularExpressionImpl(cons
}
}
template <bool thread_safe>
OptimizedRegularExpressionImpl<thread_safe>::OptimizedRegularExpressionImpl(OptimizedRegularExpressionImpl && rhs) noexcept
: is_trivial(rhs.is_trivial)
, required_substring_is_prefix(rhs.required_substring_is_prefix)
, is_case_insensitive(rhs.is_case_insensitive)
, required_substring(std::move(rhs.required_substring))
, re2(std::move(rhs.re2))
, number_of_subpatterns(rhs.number_of_subpatterns)
{
if (!required_substring.empty())
{
if (is_case_insensitive)
case_insensitive_substring_searcher.emplace(required_substring.data(), required_substring.size());
else
case_sensitive_substring_searcher.emplace(required_substring.data(), required_substring.size());
}
}
template <bool thread_safe>
bool OptimizedRegularExpressionImpl<thread_safe>::match(const char * subject, size_t subject_size) const

View File

@ -56,6 +56,9 @@ public:
using StringPieceType = std::conditional_t<thread_safe, re2::StringPiece, re2_st::StringPiece>;
OptimizedRegularExpressionImpl(const std::string & regexp_, int options = 0); /// NOLINT
/// StringSearcher store pointers to required_substring, it must be updated on move.
OptimizedRegularExpressionImpl(OptimizedRegularExpressionImpl && rhs) noexcept;
OptimizedRegularExpressionImpl(const OptimizedRegularExpressionImpl & rhs) = delete;
bool match(const std::string & subject) const
{

View File

@ -1,9 +1,7 @@
#include <sys/types.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <dlfcn.h>
#include <unistd.h>
#include <ctime>
#include <csignal>
#include <Common/logger_useful.h>
@ -13,6 +11,7 @@
#include <Common/PipeFDs.h>
#include <IO/WriteHelpers.h>
#include <IO/Operators.h>
#include <Common/waitForPid.h>
namespace
@ -94,53 +93,15 @@ ShellCommand::~ShellCommand()
bool ShellCommand::tryWaitProcessWithTimeout(size_t timeout_in_seconds)
{
int status = 0;
LOG_TRACE(getLogger(), "Try wait for shell command pid {} with timeout {}", pid, timeout_in_seconds);
wait_called = true;
struct timespec interval {.tv_sec = 1, .tv_nsec = 0};
in.close();
out.close();
err.close();
if (timeout_in_seconds == 0)
{
/// If there is no timeout before signal try to waitpid 1 time without block so we can avoid sending
/// signal if process is already normally terminated.
int waitpid_res = waitpid(pid, &status, WNOHANG);
bool process_terminated_normally = (waitpid_res == pid);
return process_terminated_normally;
}
/// If timeout is positive try waitpid without block in loop until
/// process is normally terminated or waitpid return error
while (timeout_in_seconds != 0)
{
int waitpid_res = waitpid(pid, &status, WNOHANG);
bool process_terminated_normally = (waitpid_res == pid);
if (process_terminated_normally)
{
return true;
}
else if (waitpid_res == 0)
{
--timeout_in_seconds;
nanosleep(&interval, nullptr);
continue;
}
else if (waitpid_res == -1 && errno != EINTR)
{
return false;
}
}
return false;
return waitForPid(pid, timeout_in_seconds);
}
void ShellCommand::logCommand(const char * filename, char * const argv[])

View File

@ -3,6 +3,7 @@
#include <memory>
#include <IO/ReadBufferFromFile.h>
#include <IO/WriteBufferFromFile.h>
#include <unordered_map>
namespace DB

View File

@ -1,33 +0,0 @@
#include <base/defines.h> // ADDRESS_SANITIZER
#ifdef ADDRESS_SANITIZER
#include <cstdlib>
#include <thread>
#include <gtest/gtest.h>
#include <sanitizer/lsan_interface.h>
/// Test that ensures that LSan works.
///
/// Regression test for the case when it may not work,
/// because of broken getauxval() [1].
///
/// [1]: https://github.com/ClickHouse/ClickHouse/pull/33957
TEST(Common, LSan)
{
int sanitizers_exit_code = 1;
ASSERT_EXIT({
std::thread leak_in_thread([]()
{
void * leak = malloc(4096);
ASSERT_NE(leak, nullptr);
});
leak_in_thread.join();
__lsan_do_leak_check();
}, ::testing::ExitedWithCode(sanitizers_exit_code), ".*LeakSanitizer: detected memory leaks.*");
}
#endif

192
src/Common/waitForPid.cpp Normal file
View File

@ -0,0 +1,192 @@
#include <Common/waitForPid.h>
#include <Common/VersionNumber.h>
#include <Poco/Environment.h>
#include <Common/Stopwatch.h>
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wgnu-statement-expression"
#define HANDLE_EINTR(x) ({ \
decltype(x) eintr_wrapper_result; \
do { \
eintr_wrapper_result = (x); \
} while (eintr_wrapper_result == -1 && errno == EINTR); \
eintr_wrapper_result; \
})
#if defined(OS_LINUX)
#include <poll.h>
#include <string>
#if !defined(__NR_pidfd_open)
#if defined(__x86_64__)
#define SYS_pidfd_open 434
#elif defined(__aarch64__)
#define SYS_pidfd_open 434
#elif defined(__ppc64__)
#define SYS_pidfd_open 434
#elif defined(__riscv)
#define SYS_pidfd_open 434
#else
#error "Unsupported architecture"
#endif
#else
#define SYS_pidfd_open __NR_pidfd_open
#endif
namespace DB
{
static int syscall_pidfd_open(pid_t pid)
{
// pidfd_open cannot be interrupted, no EINTR handling
return syscall(SYS_pidfd_open, pid, 0);
}
static int dir_pidfd_open(pid_t pid)
{
std::string path = "/proc/" + std::to_string(pid);
return HANDLE_EINTR(open(path.c_str(), O_DIRECTORY));
}
static bool supportsPidFdOpen()
{
VersionNumber pidfd_open_minimal_version(5, 3, 0);
VersionNumber linux_version(Poco::Environment::osVersion());
return linux_version >= pidfd_open_minimal_version;
}
static int pidFdOpen(pid_t pid)
{
// use pidfd_open or just plain old /proc/[pid] open for Linux
if (supportsPidFdOpen())
{
return syscall_pidfd_open(pid);
}
else
{
return dir_pidfd_open(pid);
}
}
static int pollPid(pid_t pid, int timeout_in_ms)
{
struct pollfd pollfd;
int pid_fd = pidFdOpen(pid);
if (pid_fd == -1)
{
return false;
}
pollfd.fd = pid_fd;
pollfd.events = POLLIN;
int ready = poll(&pollfd, 1, timeout_in_ms);
int save_errno = errno;
close(pid_fd);
errno = save_errno;
return ready;
}
#elif defined(OS_DARWIN) || defined(OS_FREEBSD)
#include <sys/event.h>
#include <err.h>
namespace DB
{
static int pollPid(pid_t pid, int timeout_in_ms)
{
int status = 0;
int kq = HANDLE_EINTR(kqueue());
if (kq == -1)
{
return false;
}
struct kevent change = {.ident = NULL};
EV_SET(&change, pid, EVFILT_PROC, EV_ADD, NOTE_EXIT, 0, NULL);
int result = HANDLE_EINTR(kevent(kq, &change, 1, NULL, 0, NULL));
if (result == -1)
{
if (errno != ESRCH)
{
return false;
}
// check if pid already died while we called kevent()
if (waitpid(pid, &status, WNOHANG) == pid)
{
return true;
}
return false;
}
struct kevent event = {.ident = NULL};
struct timespec remaining_timespec = {.tv_sec = timeout_in_ms / 1000, .tv_nsec = (timeout_in_ms % 1000) * 1000000};
int ready = kevent(kq, nullptr, 0, &event, 1, &remaining_timespec);
int save_errno = errno;
close(kq);
errno = save_errno;
return ready;
}
#else
#error "Unsupported OS type"
#endif
bool waitForPid(pid_t pid, size_t timeout_in_seconds)
{
int status = 0;
Stopwatch watch;
if (timeout_in_seconds == 0)
{
/// If there is no timeout before signal try to waitpid 1 time without block so we can avoid sending
/// signal if process is already normally terminated.
int waitpid_res = waitpid(pid, &status, WNOHANG);
bool process_terminated_normally = (waitpid_res == pid);
return process_terminated_normally;
}
/// If timeout is positive try waitpid without block in loop until
/// process is normally terminated or waitpid return error
int timeout_in_ms = timeout_in_seconds * 1000;
while (timeout_in_ms > 0)
{
int waitpid_res = waitpid(pid, &status, WNOHANG);
bool process_terminated_normally = (waitpid_res == pid);
if (process_terminated_normally)
{
return true;
}
else if (waitpid_res == 0)
{
watch.restart();
int ready = pollPid(pid, timeout_in_ms);
if (ready <= 0)
{
if (errno == EINTR || errno == EAGAIN)
{
timeout_in_ms -= watch.elapsedMilliseconds();
}
else
{
return false;
}
}
continue;
}
else if (waitpid_res == -1 && errno != EINTR)
{
return false;
}
}
return false;
}
}
#pragma GCC diagnostic pop

12
src/Common/waitForPid.h Normal file
View File

@ -0,0 +1,12 @@
#pragma once
#include <sys/types.h>
namespace DB
{
/*
* Waits for a specific pid with timeout, using modern Linux and OSX facilities
* Returns `true` if process terminated successfully or `false` otherwise
*/
bool waitForPid(pid_t pid, size_t timeout_in_seconds);
}

View File

@ -1,5 +1,4 @@
#include <Coordination/CoordinationSettings.h>
#include <Core/Settings.h>
#include <Common/logger_useful.h>
#include <filesystem>
#include <Coordination/Defines.h>

View File

@ -21,6 +21,7 @@
#include <Poco/Util/AbstractConfiguration.h>
#include <Poco/Util/Application.h>
#include <Common/ZooKeeper/ZooKeeperIO.h>
#include <Common/Stopwatch.h>
namespace DB
{
@ -111,7 +112,7 @@ KeeperServer::KeeperServer(
configuration_and_settings_->snapshot_storage_path,
coordination_settings,
checkAndGetSuperdigest(configuration_and_settings_->super_digest),
config.getBool("keeper_server.digest_enabled", true)))
config.getBool("keeper_server.digest_enabled", false)))
, state_manager(nuraft::cs_new<KeeperStateManager>(
server_id, "keeper_server", configuration_and_settings_->log_storage_path, configuration_and_settings_->state_file_path, config, coordination_settings))
, log(&Poco::Logger::get("KeeperServer"))

View File

@ -43,9 +43,16 @@ class BaseSettings : public TTraits::Data
{
using CustomSettingMap = std::unordered_map<std::string_view, std::pair<std::shared_ptr<const String>, SettingFieldCustom>>;
public:
BaseSettings() = default;
BaseSettings(const BaseSettings &) = default;
BaseSettings(BaseSettings &&) noexcept = default;
BaseSettings & operator=(const BaseSettings &) = default;
BaseSettings & operator=(BaseSettings &&) noexcept = default;
virtual ~BaseSettings() = default;
using Traits = TTraits;
void set(std::string_view name, const Field & value);
virtual void set(std::string_view name, const Field & value);
Field get(std::string_view name) const;
void setString(std::string_view name, const String & value);
@ -62,6 +69,8 @@ public:
/// Resets all the settings to their default values.
void resetToDefault();
/// Resets specified setting to its default value.
void resetToDefault(std::string_view name);
bool has(std::string_view name) const { return hasBuiltin(name) || hasCustom(name); }
static bool hasBuiltin(std::string_view name);
@ -315,6 +324,14 @@ void BaseSettings<TTraits>::resetToDefault()
custom_settings_map.clear();
}
template <typename TTraits>
void BaseSettings<TTraits>::resetToDefault(std::string_view name)
{
const auto & accessor = Traits::Accessor::instance();
if (size_t index = accessor.find(name); index != static_cast<size_t>(-1))
accessor.resetValueToDefault(*this, index);
}
template <typename TTraits>
bool BaseSettings<TTraits>::hasBuiltin(std::string_view name)
{

View File

@ -52,8 +52,10 @@
/// NOTE: DBMS_TCP_PROTOCOL_VERSION has nothing common with VERSION_REVISION,
/// later is just a number for server version (one number instead of commit SHA)
/// for simplicity (sometimes it may be more convenient in some use cases).
#define DBMS_TCP_PROTOCOL_VERSION 54456
#define DBMS_TCP_PROTOCOL_VERSION 54457
#define DBMS_MIN_PROTOCOL_VERSION_WITH_INITIAL_QUERY_START_TIME 54449
#define DBMS_MIN_PROTOCOL_VERSION_WITH_PROFILE_EVENTS_IN_INSERT 54456
#define DBMS_MIN_PROTOCOL_VERSION_WITH_VIEW_IF_PERMITTED 54457

View File

@ -1,5 +1,6 @@
#include "Settings.h"
#include <Core/SettingsChangesHistory.h>
#include <Poco/Util/AbstractConfiguration.h>
#include <Columns/ColumnArray.h>
#include <Columns/ColumnMap.h>
@ -145,6 +146,53 @@ std::vector<String> Settings::getAllRegisteredNames() const
return all_settings;
}
void Settings::set(std::string_view name, const Field & value)
{
BaseSettings::set(name, value);
if (name == "compatibility")
applyCompatibilitySetting();
/// If we change setting that was changed by compatibility setting before
/// we should remove it from settings_changed_by_compatibility_setting,
/// otherwise the next time we will change compatibility setting
/// this setting will be changed too (and we don't want it).
else if (settings_changed_by_compatibility_setting.contains(name))
settings_changed_by_compatibility_setting.erase(name);
}
void Settings::applyCompatibilitySetting()
{
/// First, revert all changes applied by previous compatibility setting
for (const auto & setting_name : settings_changed_by_compatibility_setting)
resetToDefault(setting_name);
settings_changed_by_compatibility_setting.clear();
String compatibility = getString("compatibility");
/// If setting value is empty, we don't need to change settings
if (compatibility.empty())
return;
ClickHouseVersion version(compatibility);
/// Iterate through ClickHouse version in descending order and apply reversed
/// changes for each version that is higher that version from compatibility setting
for (auto it = settings_changes_history.rbegin(); it != settings_changes_history.rend(); ++it)
{
if (version >= it->first)
break;
/// Apply reversed changes from this version.
for (const auto & change : it->second)
{
/// If this setting was changed manually, we don't change it
if (isChanged(change.name) && !settings_changed_by_compatibility_setting.contains(change.name))
continue;
BaseSettings::set(change.name, change.previous_value);
settings_changed_by_compatibility_setting.insert(change.name);
}
}
}
IMPLEMENT_SETTINGS_TRAITS(FormatFactorySettingsTraits, FORMAT_FACTORY_SETTINGS)
}

View File

@ -35,6 +35,10 @@ static constexpr UInt64 operator""_GiB(unsigned long long value)
*
* `flags` can be either 0 or IMPORTANT.
* A setting is "IMPORTANT" if it affects the results of queries and can't be ignored by older versions.
*
* When adding new settings that control some backward incompatible changes or when changing some settings values,
* consider adding them to settings changes history in SettingsChangesHistory.h for special `compatibility` setting
* to work correctly.
*/
#define COMMON_SETTINGS(M) \
@ -132,6 +136,8 @@ static constexpr UInt64 operator""_GiB(unsigned long long value)
M(UInt64, aggregation_memory_efficient_merge_threads, 0, "Number of threads to use for merge intermediate aggregation results in memory efficient mode. When bigger, then more memory is consumed. 0 means - same as 'max_threads'.", 0) \
M(Bool, enable_positional_arguments, true, "Enable positional arguments in ORDER BY, GROUP BY and LIMIT BY", 0) \
\
M(Bool, group_by_use_nulls, false, "Treat columns mentioned in ROLLUP, CUBE or GROUPING SETS as Nullable", 0) \
\
M(UInt64, max_parallel_replicas, 1, "The maximum number of replicas of each shard used when the query is executed. For consistency (to get different parts of the same partition), this option only works for the specified sampling key. The lag of the replicas is not controlled.", 0) \
M(UInt64, parallel_replicas_count, 0, "", 0) \
M(UInt64, parallel_replica_offset, 0, "", 0) \
@ -599,6 +605,11 @@ static constexpr UInt64 operator""_GiB(unsigned long long value)
M(Bool, allow_deprecated_database_ordinary, false, "Allow to create databases with deprecated Ordinary engine", 0) \
M(Bool, allow_deprecated_syntax_for_merge_tree, false, "Allow to create *MergeTree tables with deprecated engine definition syntax", 0) \
\
M(String, compatibility, "", "Changes other settings according to provided ClickHouse version. If we know that we changed some behaviour in ClickHouse by changing some settings in some version, this compatibility setting will control these settings", 0) \
\
M(Map, additional_table_filters, "", "Additional filter expression which would be applied after reading from specified table. Syntax: {'table1': 'expression', 'database.table2': 'expression'}", 0) \
M(String, additional_result_filter, "", "Additional filter expression which would be applied to query result", 0) \
\
/** Experimental functions */ \
M(Bool, allow_experimental_funnel_functions, false, "Enable experimental functions for funnel analysis.", 0) \
M(Bool, allow_experimental_nlp_functions, false, "Enable experimental functions for natural language processing.", 0) \
@ -652,7 +663,7 @@ static constexpr UInt64 operator""_GiB(unsigned long long value)
#define FORMAT_FACTORY_SETTINGS(M) \
M(Char, format_csv_delimiter, ',', "The character to be considered as a delimiter in CSV data. If setting with a string, a string has to have a length of 1.", 0) \
M(Bool, format_csv_allow_single_quotes, true, "If it is set to true, allow strings in single quotes.", 0) \
M(Bool, format_csv_allow_single_quotes, false, "If it is set to true, allow strings in single quotes.", 0) \
M(Bool, format_csv_allow_double_quotes, true, "If it is set to true, allow strings in double quotes.", 0) \
M(Bool, output_format_csv_crlf_end_of_line, false, "If it is set true, end of line in CSV format will be \\r\\n instead of \\n.", 0) \
M(Bool, input_format_csv_enum_as_number, false, "Treat inserted enum values in CSV formats as enum indices", 0) \
@ -758,7 +769,7 @@ static constexpr UInt64 operator""_GiB(unsigned long long value)
M(Bool, output_format_pretty_row_numbers, false, "Add row numbers before each row for pretty output format", 0) \
M(Bool, insert_distributed_one_random_shard, false, "If setting is enabled, inserting into distributed table will choose a random shard to write when there is no sharding key", 0) \
\
M(UInt64, cross_to_inner_join_rewrite, 1, "Use inner join instead of comma/cross join if possible. Possible values: 0 - no rewrite, 1 - apply if possible, 2 - force rewrite all cross joins", 0) \
M(UInt64, cross_to_inner_join_rewrite, 1, "Use inner join instead of comma/cross join if there're joining expressions in the WHERE section. Values: 0 - no rewrite, 1 - apply if possible for comma/cross, 2 - force rewrite all comma joins, cross - if possible", 0) \
\
M(Bool, output_format_arrow_low_cardinality_as_dictionary, false, "Enable output LowCardinality type as Dictionary Arrow type", 0) \
M(Bool, output_format_arrow_string_as_string, false, "Use Arrow String type instead of Binary for String columns", 0) \
@ -825,6 +836,13 @@ struct Settings : public BaseSettings<SettingsTraits>, public IHints<2, Settings
void addProgramOption(boost::program_options::options_description & options, const SettingFieldRef & field);
void addProgramOptionAsMultitoken(boost::program_options::options_description & options, const SettingFieldRef & field);
void set(std::string_view name, const Field & value) override;
private:
void applyCompatibilitySetting();
std::unordered_set<std::string_view> settings_changed_by_compatibility_setting;
};
/*

View File

@ -0,0 +1,114 @@
#pragma once
#include <Core/Field.h>
#include <Core/Settings.h>
#include <IO/ReadHelpers.h>
#include <boost/algorithm/string.hpp>
#include <map>
namespace DB
{
namespace ErrorCodes
{
extern const int BAD_ARGUMENTS;
}
class ClickHouseVersion
{
public:
ClickHouseVersion(const String & version)
{
Strings split;
boost::split(split, version, [](char c){ return c == '.'; });
components.reserve(split.size());
if (split.empty())
throw Exception{ErrorCodes::BAD_ARGUMENTS, "Cannot parse ClickHouse version here: {}", version};
for (const auto & split_element : split)
{
size_t component;
if (!tryParse(component, split_element))
throw Exception{ErrorCodes::BAD_ARGUMENTS, "Cannot parse ClickHouse version here: {}", version};
components.push_back(component);
}
}
ClickHouseVersion(const char * version) : ClickHouseVersion(String(version)) {}
String toString() const
{
String version = std::to_string(components[0]);
for (size_t i = 1; i < components.size(); ++i)
version += "." + std::to_string(components[i]);
return version;
}
bool operator<(const ClickHouseVersion & other) const
{
return components < other.components;
}
bool operator>=(const ClickHouseVersion & other) const
{
return components >= other.components;
}
private:
std::vector<size_t> components;
};
namespace SettingsChangesHistory
{
struct SettingChange
{
String name;
Field previous_value;
Field new_value;
String reason;
};
using SettingsChanges = std::vector<SettingChange>;
}
/// History of settings changes that controls some backward incompatible changes
/// across all ClickHouse versions. It maps ClickHouse version to settings changes that were done
/// in this version. Settings changes is a vector of structs {setting_name, previous_value, new_value}
/// It's used to implement `compatibility` setting (see https://github.com/ClickHouse/ClickHouse/issues/35972)
static std::map<ClickHouseVersion, SettingsChangesHistory::SettingsChanges> settings_changes_history =
{
{"22.7", {{"cross_to_inner_join_rewrite", 1, 2, "Force rewrite comma join to inner"},
{"enable_positional_arguments", false, true, "Enable positional arguments feature by default"},
{"format_csv_allow_single_quotes", true, false, "Most tools don't treat single quote in CSV specially, don't do it by default too"}}},
{"22.6", {{"output_format_json_named_tuples_as_objects", false, true, "Allow to serialize named tuples as JSON objects in JSON formats by default"}}},
{"22.5", {{"memory_overcommit_ratio_denominator", 0, 1073741824, "Enable memory overcommit feature by default"},
{"memory_overcommit_ratio_denominator_for_user", 0, 1073741824, "Enable memory overcommit feature by default"}}},
{"22.4", {{"allow_settings_after_format_in_insert", true, false, "Do not allow SETTINGS after FORMAT for INSERT queries because ClickHouse interpret SETTINGS as some values, which is misleading"}}},
{"22.3", {{"cast_ipv4_ipv6_default_on_conversion_error", true, false, "Make functions cast(value, 'IPv4') and cast(value, 'IPv6') behave same as toIPv4 and toIPv6 functions"}}},
{"21.12", {{"stream_like_engine_allow_direct_select", true, false, "Do not allow direct select for Kafka/RabbitMQ/FileLog by default"}}},
{"21.9", {{"output_format_decimal_trailing_zeros", true, false, "Do not output trailing zeros in text representation of Decimal types by default for better looking output"},
{"use_hedged_requests", false, true, "Enable Hedged Requests feature bu default"}}},
{"21.7", {{"legacy_column_name_of_tuple_literal", true, false, "Add this setting only for compatibility reasons. It makes sense to set to 'true', while doing rolling update of cluster from version lower than 21.7 to higher"}}},
{"21.5", {{"async_socket_for_remote", false, true, "Fix all problems and turn on asynchronous reads from socket for remote queries by default again"}}},
{"21.3", {{"async_socket_for_remote", true, false, "Turn off asynchronous reads from socket for remote queries because of some problems"},
{"optimize_normalize_count_variants", false, true, "Rewrite aggregate functions that semantically equals to count() as count() by default"},
{"normalize_function_names", false, true, "Normalize function names to their canonical names, this was needed for projection query routing"}}},
{"21.2", {{"enable_global_with_statement", false, true, "Propagate WITH statements to UNION queries and all subqueries by default"}}},
{"21.1", {{"insert_quorum_parallel", false, true, "Use parallel quorum inserts by default. It is significantly more convenient to use than sequential quorum inserts"},
{"input_format_null_as_default", false, true, "Allow to insert NULL as default for input formats by default"},
{"optimize_on_insert", false, true, "Enable data optimization on INSERT by default for better user experience"},
{"use_compact_format_in_distributed_parts_names", false, true, "Use compact format for async INSERT into Distributed tables by default"}}},
{"20.10", {{"format_regexp_escaping_rule", "Escaped", "Raw", "Use Raw as default escaping rule for Regexp format to male the behaviour more like to what users expect"}}},
{"20.7", {{"show_table_uuid_in_table_create_query_if_not_nil", true, false, "Stop showing UID of the table in its CREATE query for Engine=Atomic"}}},
{"20.5", {{"input_format_with_names_use_header", false, true, "Enable using header with names for formats with WithNames/WithNamesAndTypes suffixes"},
{"allow_suspicious_codecs", true, false, "Don't allow to specify meaningless compression codecs"}}},
{"20.4", {{"validate_polygons", false, true, "Throw exception if polygon is invalid in function pointInPolygon by default instead of returning possibly wrong results"}}},
{"19.18", {{"enable_scalar_subquery_optimization", false, true, "Prevent scalar subqueries from (de)serializing large scalar values and possibly avoid running the same subquery more than once"}}},
{"19.14", {{"any_join_distinct_right_table_keys", true, false, "Disable ANY RIGHT and ANY FULL JOINs by default to avoid inconsistency"}}},
{"19.12", {{"input_format_defaults_for_omitted_fields", false, true, "Enable calculation of complex default expressions for omitted fields for some input formats, because it should be the expected behaviour"}}},
{"19.5", {{"max_partitions_per_insert_block", 0, 100, "Add a limit for the number of partitions in one block"}}},
{"18.12.17", {{"enable_optimize_predicate_expression", 0, 1, "Optimize predicates to subqueries by default"}}},
};
}

View File

@ -4,6 +4,8 @@
#include <Common/getNumberOfPhysicalCPUCores.h>
#include <Common/FieldVisitorConvertToNumber.h>
#include <Common/logger_useful.h>
#include <DataTypes/DataTypeMap.h>
#include <DataTypes/DataTypeString.h>
#include <IO/ReadHelpers.h>
#include <IO/ReadBufferFromString.h>
#include <IO/WriteHelpers.h>
@ -51,6 +53,37 @@ namespace
else
return applyVisitor(FieldVisitorConvertToNumber<T>(), f);
}
#ifndef KEEPER_STANDALONE_BUILD
Map stringToMap(const String & str)
{
/// Allow empty string as an empty map
if (str.empty())
return {};
auto type_string = std::make_shared<DataTypeString>();
DataTypeMap type_map(type_string, type_string);
auto serialization = type_map.getSerialization(ISerialization::Kind::DEFAULT);
auto column = type_map.createColumn();
ReadBufferFromString buf(str);
serialization->deserializeTextEscaped(*column, buf, {});
return (*column)[0].safeGet<Map>();
}
Map fieldToMap(const Field & f)
{
if (f.getType() == Field::Types::String)
{
/// Allow to parse Map from string field. For the convenience.
const auto & str = f.get<const String &>();
return stringToMap(str);
}
return f.safeGet<const Map &>();
}
#endif
}
template <typename T>
@ -291,6 +324,48 @@ void SettingFieldString::readBinary(ReadBuffer & in)
*this = std::move(str);
}
#ifndef KEEPER_STANDALONE_BUILD
SettingFieldMap::SettingFieldMap(const Field & f) : value(fieldToMap(f)) {}
String SettingFieldMap::toString() const
{
auto type_string = std::make_shared<DataTypeString>();
DataTypeMap type_map(type_string, type_string);
auto serialization = type_map.getSerialization(ISerialization::Kind::DEFAULT);
auto column = type_map.createColumn();
column->insert(value);
WriteBufferFromOwnString out;
serialization->serializeTextEscaped(*column, 0, out, {});
return out.str();
}
SettingFieldMap & SettingFieldMap::operator =(const Field & f)
{
*this = fieldToMap(f);
return *this;
}
void SettingFieldMap::parseFromString(const String & str)
{
*this = stringToMap(str);
}
void SettingFieldMap::writeBinary(WriteBuffer & out) const
{
DB::writeBinary(value, out);
}
void SettingFieldMap::readBinary(ReadBuffer & in)
{
Map map;
DB::readBinary(map, in);
*this = map;
}
#endif
namespace
{

View File

@ -168,6 +168,32 @@ struct SettingFieldString
void readBinary(ReadBuffer & in);
};
#ifndef KEEPER_STANDALONE_BUILD
struct SettingFieldMap
{
public:
Map value;
bool changed = false;
explicit SettingFieldMap(const Map & map = {}) : value(map) {}
explicit SettingFieldMap(Map && map) : value(std::move(map)) {}
explicit SettingFieldMap(const Field & f);
SettingFieldMap & operator =(const Map & map) { value = map; changed = true; return *this; }
SettingFieldMap & operator =(const Field & f);
operator const Map &() const { return value; } /// NOLINT
explicit operator Field() const { return value; }
String toString() const;
void parseFromString(const String & str);
void writeBinary(WriteBuffer & out) const;
void readBinary(ReadBuffer & in);
};
#endif
struct SettingFieldChar
{

View File

@ -85,6 +85,13 @@ DataTypePtr makeNullable(const DataTypePtr & type)
return std::make_shared<DataTypeNullable>(type);
}
DataTypePtr makeNullableSafe(const DataTypePtr & type)
{
if (type->canBeInsideNullable())
return makeNullable(type);
return type;
}
DataTypePtr removeNullable(const DataTypePtr & type)
{
if (type->isNullable())

View File

@ -51,6 +51,7 @@ private:
DataTypePtr makeNullable(const DataTypePtr & type);
DataTypePtr makeNullableSafe(const DataTypePtr & type);
DataTypePtr removeNullable(const DataTypePtr & type);
}

View File

@ -214,6 +214,19 @@ size_t DataTypeTuple::getPositionByName(const String & name) const
throw Exception("Tuple doesn't have element with name '" + name + "'", ErrorCodes::NOT_FOUND_COLUMN_IN_BLOCK);
}
std::optional<size_t> DataTypeTuple::tryGetPositionByName(const String & name) const
{
size_t size = elems.size();
for (size_t i = 0; i < size; ++i)
{
if (names[i] == name)
{
return std::optional<size_t>(i);
}
}
return std::nullopt;
}
String DataTypeTuple::getNameByPosition(size_t i) const
{
if (i == 0 || i > names.size())

View File

@ -1,6 +1,7 @@
#pragma once
#include <DataTypes/IDataType.h>
#include <optional>
namespace DB
@ -60,6 +61,7 @@ public:
const Strings & getElementNames() const { return names; }
size_t getPositionByName(const String & name) const;
std::optional<size_t> tryGetPositionByName(const String & name) const;
String getNameByPosition(size_t i) const;
bool haveExplicitNames() const { return have_explicit_names; }

View File

@ -5,6 +5,7 @@
#include <Common/typeid_cast.h>
#include <DataTypes/IDataType.h>
#include <DataTypes/DataTypeDecimalBase.h>
#include <DataTypes/DataTypeDateTime64.h>
namespace DB
@ -13,6 +14,7 @@ namespace DB
namespace ErrorCodes
{
extern const int DECIMAL_OVERFLOW;
extern const int LOGICAL_ERROR;
}
/// Implements Decimal(P, S), where P is precision, S is scale.
@ -58,7 +60,7 @@ inline const DataTypeDecimal<T> * checkDecimal(const IDataType & data_type)
return typeid_cast<const DataTypeDecimal<T> *>(&data_type);
}
inline UInt32 getDecimalScale(const IDataType & data_type, UInt32 default_value = std::numeric_limits<UInt32>::max())
inline UInt32 getDecimalScale(const IDataType & data_type)
{
if (const auto * decimal_type = checkDecimal<Decimal32>(data_type))
return decimal_type->getScale();
@ -68,7 +70,10 @@ inline UInt32 getDecimalScale(const IDataType & data_type, UInt32 default_value
return decimal_type->getScale();
if (const auto * decimal_type = checkDecimal<Decimal256>(data_type))
return decimal_type->getScale();
return default_value;
if (const auto * date_time_type = typeid_cast<const DataTypeDateTime64 *>(&data_type))
return date_time_type->getScale();
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot get decimal scale from type {}", data_type.getName());
}
inline UInt32 getDecimalPrecision(const IDataType & data_type)
@ -81,7 +86,10 @@ inline UInt32 getDecimalPrecision(const IDataType & data_type)
return decimal_type->getPrecision();
if (const auto * decimal_type = checkDecimal<Decimal256>(data_type))
return decimal_type->getPrecision();
return 0;
if (const auto * date_time_type = typeid_cast<const DataTypeDateTime64 *>(&data_type))
return date_time_type->getPrecision();
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot get decimal precision from type {}", data_type.getName());
}
template <typename T>

View File

@ -532,6 +532,12 @@ inline bool isBool(const DataTypePtr & data_type)
return data_type->getName() == "Bool";
}
inline bool isAggregateFunction(const DataTypePtr & data_type)
{
WhichDataType which(data_type);
return which.isAggregateFunction();
}
template <typename DataType> constexpr bool IsDataTypeDecimal = false;
template <typename DataType> constexpr bool IsDataTypeNumber = false;
template <typename DataType> constexpr bool IsDataTypeDateOrDateTime = false;

View File

@ -554,7 +554,11 @@ DataTypePtr getLeastSupertype(const DataTypes & types)
UInt32 max_scale = 0;
for (const auto & type : types)
{
UInt32 scale = getDecimalScale(*type, 0);
auto type_id = type->getTypeId();
if (type_id != TypeIndex::Decimal32 && type_id != TypeIndex::Decimal64 && type_id != TypeIndex::Decimal128)
continue;
UInt32 scale = getDecimalScale(*type);
if (scale > max_scale)
max_scale = scale;
}

View File

@ -8,6 +8,8 @@
#include <IO/ReadBufferFromString.h>
#include <IO/WriteBufferFromEncryptedFile.h>
#include <boost/algorithm/hex.hpp>
#include <Common/quoteString.h>
#include <Common/typeid_cast.h>
namespace DB

View File

@ -1,7 +1,6 @@
#pragma once
#include <Interpreters/Context_fwd.h>
#include <Interpreters/Context.h>
#include <Core/Defines.h>
#include <base/types.h>
#include <Common/CurrentMetrics.h>
@ -41,6 +40,10 @@ namespace ErrorCodes
extern const int NOT_IMPLEMENTED;
}
class IDisk;
using DiskPtr = std::shared_ptr<IDisk>;
using DisksMap = std::map<String, DiskPtr>;
class IReservation;
using ReservationPtr = std::unique_ptr<IReservation>;
using Reservations = std::vector<ReservationPtr>;
@ -363,7 +366,6 @@ private:
std::unique_ptr<Executor> executor;
};
using DiskPtr = std::shared_ptr<IDisk>;
using Disks = std::vector<DiskPtr>;
/**

View File

@ -6,6 +6,7 @@
#include <Common/assert_cast.h>
#include <Common/hex.h>
#include <Common/getRandomASCIIString.h>
#include <Interpreters/Context.h>
namespace ProfileEvents

View File

@ -10,6 +10,7 @@
#include <Disks/IO/WriteIndirectBufferFromRemoteFS.h>
#include <Disks/ObjectStorages/IObjectStorage.h>
#include <Common/getRandomASCIIString.h>
#include <Common/MultiVersion.h>
namespace DB

View File

@ -12,6 +12,7 @@
#include <Disks/ObjectStorages/AzureBlobStorage/AzureBlobStorageAuth.h>
#include <Disks/ObjectStorages/AzureBlobStorage/AzureObjectStorage.h>
#include <Disks/ObjectStorages/MetadataStorageFromDisk.h>
#include <Interpreters/Context.h>
namespace DB
{

View File

@ -18,6 +18,7 @@
#include <Disks/ObjectStorages/DiskObjectStorageTransaction.h>
#include <Disks/FakeDiskTransaction.h>
#include <Poco/Util/AbstractConfiguration.h>
#include <Interpreters/Context.h>
namespace DB
{

View File

@ -3,6 +3,7 @@
#include <Common/FileCacheFactory.h>
#include <Common/IFileCache.h>
#include <Common/FileCacheSettings.h>
#include <Interpreters/Context.h>
namespace DB
{

View File

@ -1,5 +1,6 @@
#pragma once
#include <Disks/IDisk.h>
#include <Disks/ObjectStorages/IMetadataStorage.h>
#include <Disks/ObjectStorages/MetadataFromDiskTransactionState.h>
#include <Disks/ObjectStorages/MetadataStorageFromDiskTransactionOperations.h>

View File

@ -2,6 +2,7 @@
#include <Disks/IO/ThreadPoolRemoteFSReader.h>
#include <IO/WriteBufferFromFileBase.h>
#include <IO/copyData.h>
#include <Interpreters/Context.h>
namespace DB
{

View File

@ -7,6 +7,7 @@
#include <optional>
#include <Poco/Timestamp.h>
#include <Poco/Util/AbstractConfiguration.h>
#include <Core/Defines.h>
#include <Common/Exception.h>
#include <IO/ReadSettings.h>

View File

@ -28,6 +28,7 @@
#include <Common/FileCacheFactory.h>
#include <Common/getRandomASCIIString.h>
#include <Common/logger_useful.h>
#include <Common/MultiVersion.h>
namespace DB
{

View File

@ -11,6 +11,7 @@
#include <aws/s3/model/HeadObjectResult.h>
#include <aws/s3/model/ListObjectsV2Result.h>
#include <Storages/StorageS3Settings.h>
#include <Common/MultiVersion.h>
namespace DB

View File

@ -66,7 +66,7 @@ ColumnsDescription readSchemaFromFormat(
}
catch (const DB::Exception & e)
{
throw Exception(ErrorCodes::CANNOT_EXTRACT_TABLE_STRUCTURE, "Cannot extract table structure from {} format file. Error: {}", format_name, e.message());
throw Exception(ErrorCodes::CANNOT_EXTRACT_TABLE_STRUCTURE, "Cannot extract table structure from {} format file. Error: {}. You can specify the structure manually", format_name, e.message());
}
}
else if (FormatFactory::instance().checkIfFormatHasSchemaReader(format_name))
@ -75,16 +75,29 @@ ColumnsDescription readSchemaFromFormat(
SchemaReaderPtr schema_reader;
size_t max_rows_to_read = format_settings ? format_settings->max_rows_to_read_for_schema_inference : context->getSettingsRef().input_format_max_rows_to_read_for_schema_inference;
size_t iterations = 0;
while ((buf = read_buffer_iterator()))
while (true)
{
bool is_eof = false;
try
{
buf = read_buffer_iterator();
if (!buf)
break;
is_eof = buf->eof();
}
catch (...)
{
auto exception_message = getCurrentExceptionMessage(false);
throw Exception(ErrorCodes::CANNOT_EXTRACT_TABLE_STRUCTURE, "Cannot extract table structure from {} format file: {}. You can specify the structure manually", format_name, exception_message);
}
++iterations;
if (buf->eof())
if (is_eof)
{
auto exception_message = fmt::format("Cannot extract table structure from {} format file, file is empty", format_name);
if (!retry)
throw Exception(ErrorCodes::CANNOT_EXTRACT_TABLE_STRUCTURE, exception_message);
throw Exception(ErrorCodes::CANNOT_EXTRACT_TABLE_STRUCTURE, "{}. You can specify the structure manually", exception_message);
exception_messages += "\n" + exception_message;
continue;
@ -118,14 +131,14 @@ ColumnsDescription readSchemaFromFormat(
}
if (!retry || !isRetryableSchemaInferenceError(getCurrentExceptionCode()))
throw Exception(ErrorCodes::CANNOT_EXTRACT_TABLE_STRUCTURE, "Cannot extract table structure from {} format file. Error: {}", format_name, exception_message);
throw Exception(ErrorCodes::CANNOT_EXTRACT_TABLE_STRUCTURE, "Cannot extract table structure from {} format file. Error: {}. You can specify the structure manually", format_name, exception_message);
exception_messages += "\n" + exception_message;
}
}
if (names_and_types.empty())
throw Exception(ErrorCodes::CANNOT_EXTRACT_TABLE_STRUCTURE, "All attempts to extract table structure from files failed. Errors:{}", exception_messages);
throw Exception(ErrorCodes::CANNOT_EXTRACT_TABLE_STRUCTURE, "All attempts to extract table structure from files failed. Errors:{}\nYou can specify the structure manually", exception_messages);
/// If we have "INSERT SELECT" query then try to order
/// columns as they are ordered in table schema for formats

View File

@ -18,7 +18,8 @@ namespace ErrorCodes
namespace
{
const std::unordered_map<std::string_view, Float64> time_unit_to_float = {
const std::unordered_map<std::string_view, Float64> time_unit_to_float =
{
{"years", 365 * 24 * 3600},
{"year", 365 * 24 * 3600},
{"yr", 365 * 24 * 3600},
@ -50,6 +51,22 @@ namespace
{"second", 1},
{"sec", 1},
{"s", 1},
{"milliseconds", 1e-3},
{"millisecond", 1e-3},
{"millisec", 1e-3},
{"ms", 1e-3},
{"microseconds", 1e-6},
{"microsecond", 1e-6},
{"microsec", 1e-6},
{"μs", 1e-6},
{"us", 1e-6},
{"nanoseconds", 1e-9},
{"nanosecond", 1e-9},
{"nanosec", 1e-9},
{"ns", 1e-9},
};
/** Prints amount of seconds in form of:
@ -248,7 +265,7 @@ namespace
static bool scanUnit(std::string_view & str, Int64 & index, Int64 last_pos)
{
int64_t begin_index = index;
while (index <= last_pos && isalpha(str[index]))
while (index <= last_pos && !isdigit(str[index]) && !isSeparator(str[index]))
{
index++;
}
@ -271,14 +288,18 @@ namespace
scanSpaces(str, index, last_pos);
/// ignore separator
if (index <= last_pos
&& (str[index] == ';' || str[index] == '-' || str[index] == '+' || str[index] == ',' || str[index] == ':'))
if (index <= last_pos && isSeparator(str[index]))
{
index++;
}
scanSpaces(str, index, last_pos);
}
static bool isSeparator(char symbol)
{
return symbol == ';' || symbol == '-' || symbol == '+' || symbol == ',' || symbol == ':' || symbol == ' ';
}
};
}

View File

@ -18,6 +18,10 @@ namespace ErrorCodes
{
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int ILLEGAL_INDEX;
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
extern const int NOT_FOUND_COLUMN_IN_BLOCK;
extern const int NUMBER_OF_DIMENSIONS_MISMATCHED;
extern const int SIZES_OF_ARRAYS_DOESNT_MATCH;
}
namespace
@ -40,9 +44,11 @@ public:
return name;
}
bool isVariadic() const override { return true; }
size_t getNumberOfArguments() const override
{
return 2;
return 0;
}
bool useDefaultImplementationForConstants() const override
@ -59,8 +65,14 @@ public:
DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override
{
size_t count_arrays = 0;
const size_t number_of_arguments = arguments.size();
if (number_of_arguments < 2 || number_of_arguments > 3)
throw Exception("Number of arguments for function " + getName() + " doesn't match: passed "
+ toString(number_of_arguments) + ", should be 2 or 3",
ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);
size_t count_arrays = 0;
const IDataType * tuple_col = arguments[0].type.get();
while (const DataTypeArray * array = checkAndGetDataType<DataTypeArray>(tuple_col))
{
@ -72,16 +84,34 @@ public:
if (!tuple)
throw Exception("First argument for function " + getName() + " must be tuple or array of tuple.", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
size_t index = getElementNum(arguments[1].column, *tuple);
DataTypePtr out_return_type = tuple->getElements()[index];
auto index = getElementNum(arguments[1].column, *tuple, number_of_arguments);
if (index.has_value())
{
DataTypePtr out_return_type = tuple->getElements()[index.value()];
for (; count_arrays; --count_arrays)
out_return_type = std::make_shared<DataTypeArray>(out_return_type);
for (; count_arrays; --count_arrays)
out_return_type = std::make_shared<DataTypeArray>(out_return_type);
return out_return_type;
return out_return_type;
}
else
{
const IDataType * default_col = arguments[2].type.get();
size_t default_argument_count_arrays = 0;
if (const DataTypeArray * array = checkAndGetDataType<DataTypeArray>(default_col))
{
default_argument_count_arrays = array->getNumberOfDimensions();
}
if (count_arrays != default_argument_count_arrays)
{
throw Exception(ErrorCodes::NUMBER_OF_DIMENSIONS_MISMATCHED, "Dimension of types mismatched between first argument and third argument. Dimension of 1st argument: {}. Dimension of 3rd argument: {}.",count_arrays, default_argument_count_arrays);
}
return arguments[2].type;
}
}
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override
{
Columns array_offsets;
@ -89,6 +119,12 @@ public:
const IDataType * tuple_type = first_arg.type.get();
const IColumn * tuple_col = first_arg.column.get();
bool first_arg_is_const = false;
if (typeid_cast<const ColumnConst *>(tuple_col))
{
tuple_col = assert_cast<const ColumnConst *>(tuple_col)->getDataColumnPtr().get();
first_arg_is_const = true;
}
while (const DataTypeArray * array_type = checkAndGetDataType<DataTypeArray>(tuple_type))
{
const ColumnArray * array_col = assert_cast<const ColumnArray *>(tuple_col);
@ -103,18 +139,87 @@ public:
if (!tuple_type_concrete || !tuple_col_concrete)
throw Exception("First argument for function " + getName() + " must be tuple or array of tuple.", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
size_t index = getElementNum(arguments[1].column, *tuple_type_concrete);
ColumnPtr res = tuple_col_concrete->getColumns()[index];
auto index = getElementNum(arguments[1].column, *tuple_type_concrete, arguments.size());
if (!index.has_value())
{
if (!array_offsets.empty())
{
recursiveCheckArrayOffsets(arguments[0].column, arguments[2].column, array_offsets.size());
}
return arguments[2].column;
}
ColumnPtr res = tuple_col_concrete->getColumns()[index.value()];
/// Wrap into Arrays
for (auto it = array_offsets.rbegin(); it != array_offsets.rend(); ++it)
res = ColumnArray::create(res, *it);
if (first_arg_is_const)
{
res = ColumnConst::create(res, input_rows_count);
}
return res;
}
private:
size_t getElementNum(const ColumnPtr & index_column, const DataTypeTuple & tuple) const
void recursiveCheckArrayOffsets(ColumnPtr col_x, ColumnPtr col_y, size_t depth) const
{
for (size_t i = 1; i < depth; ++i)
{
checkArrayOffsets(col_x, col_y);
col_x = assert_cast<const ColumnArray *>(col_x.get())->getDataPtr();
col_y = assert_cast<const ColumnArray *>(col_y.get())->getDataPtr();
}
checkArrayOffsets(col_x, col_y);
}
void checkArrayOffsets(ColumnPtr col_x, ColumnPtr col_y) const
{
if (isColumnConst(*col_x))
{
checkArrayOffsetsWithFirstArgConst(col_x, col_y);
}
else if (isColumnConst(*col_y))
{
checkArrayOffsetsWithFirstArgConst(col_y, col_x);
}
else
{
const auto & array_x = *assert_cast<const ColumnArray *>(col_x.get());
const auto & array_y = *assert_cast<const ColumnArray *>(col_y.get());
if (!array_x.hasEqualOffsets(array_y))
{
throw Exception("The argument 1 and argument 3 of function " + getName() + " have different array sizes", ErrorCodes::SIZES_OF_ARRAYS_DOESNT_MATCH);
}
}
}
void checkArrayOffsetsWithFirstArgConst(ColumnPtr col_x, ColumnPtr col_y) const
{
col_x = assert_cast<const ColumnConst *>(col_x.get())->getDataColumnPtr();
col_y = col_y->convertToFullColumnIfConst();
const auto & array_x = *assert_cast<const ColumnArray *>(col_x.get());
const auto & array_y = *assert_cast<const ColumnArray *>(col_y.get());
const auto & offsets_x = array_x.getOffsets();
const auto & offsets_y = array_y.getOffsets();
ColumnArray::Offset prev_offset = 0;
size_t row_size = offsets_y.size();
for (size_t row = 0; row < row_size; ++row)
{
if (unlikely(offsets_x[0] != offsets_y[row] - prev_offset))
{
throw Exception("The argument 1 and argument 3 of function " + getName() + " have different array sizes", ErrorCodes::SIZES_OF_ARRAYS_DOESNT_MATCH);
}
prev_offset = offsets_y[row];
}
}
std::optional<size_t> getElementNum(const ColumnPtr & index_column, const DataTypeTuple & tuple, const size_t argument_size) const
{
if (
checkAndGetColumnConst<ColumnUInt8>(index_column.get())
@ -131,11 +236,21 @@ private:
if (index > tuple.getElements().size())
throw Exception("Index for tuple element is out of range.", ErrorCodes::ILLEGAL_INDEX);
return index - 1;
return std::optional<size_t>(index - 1);
}
else if (const auto * name_col = checkAndGetColumnConst<ColumnString>(index_column.get()))
{
return tuple.getPositionByName(name_col->getValue<String>());
auto index = tuple.tryGetPositionByName(name_col->getValue<String>());
if (index.has_value())
{
return index;
}
if (argument_size == 2)
{
throw Exception("Tuple doesn't have element with name '" + name_col->getValue<String>() + "'", ErrorCodes::NOT_FOUND_COLUMN_IN_BLOCK);
}
return std::nullopt;
}
else
throw Exception("Second argument to " + getName() + " must be a constant UInt or String", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);

View File

@ -10,6 +10,7 @@
#include <IO/Progress.h>
#include <Common/filesystemHelpers.h>
#include <sys/stat.h>
#include <Interpreters/Context.h>
#ifdef HAS_RESERVED_IDENTIFIER

View File

@ -1,7 +1,7 @@
#pragma once
#include <IO/ReadBufferFromFileBase.h>
#include <Interpreters/Context.h>
#include <Interpreters/Context_fwd.h>
#include <unistd.h>

View File

@ -23,20 +23,6 @@ ActionLocksManager::ActionLocksManager(ContextPtr context_) : WithContext(contex
{
}
template <typename F>
inline void forEachTable(F && f, ContextPtr context)
{
for (auto & elem : DatabaseCatalog::instance().getDatabases())
for (auto iterator = elem.second->getTablesIterator(context); iterator->isValid(); iterator->next())
if (auto table = iterator->table())
f(table);
}
void ActionLocksManager::add(StorageActionBlockType action_type, ContextPtr context_)
{
forEachTable([&](const StoragePtr & table) { add(table, action_type); }, context_);
}
void ActionLocksManager::add(const StorageID & table_id, StorageActionBlockType action_type)
{
if (auto table = DatabaseCatalog::instance().tryGetTable(table_id, getContext()))
@ -54,14 +40,6 @@ void ActionLocksManager::add(const StoragePtr & table, StorageActionBlockType ac
}
}
void ActionLocksManager::remove(StorageActionBlockType action_type)
{
std::lock_guard lock(mutex);
for (auto & storage_elem : storage_locks)
storage_elem.second.erase(action_type);
}
void ActionLocksManager::remove(const StorageID & table_id, StorageActionBlockType action_type)
{
if (auto table = DatabaseCatalog::instance().tryGetTable(table_id, getContext()))

View File

@ -20,14 +20,10 @@ class ActionLocksManager : WithContext
public:
explicit ActionLocksManager(ContextPtr context);
/// Adds new locks for each table
void add(StorageActionBlockType action_type, ContextPtr context);
/// Add new lock for a table if it has not been already added
void add(const StorageID & table_id, StorageActionBlockType action_type);
void add(const StoragePtr & table, StorageActionBlockType action_type);
/// Remove locks for all tables
void remove(StorageActionBlockType action_type);
/// Removes a lock for a table if it exists
void remove(const StorageID & table_id, StorageActionBlockType action_type);
void remove(const StoragePtr & table, StorageActionBlockType action_type);

View File

@ -989,9 +989,15 @@ void AsynchronousMetrics::update(std::chrono::system_clock::time_point update_ti
if (s.rfind("processor", 0) == 0)
{
/// s390x example: processor 0: version = FF, identification = 039C88, machine = 3906
/// non s390x example: processor : 0
if (auto colon = s.find_first_of(':'))
{
#ifdef __s390x__
core_id = std::stoi(s.substr(10)); /// 10: length of "processor" plus 1
#else
core_id = std::stoi(s.substr(colon + 2));
#endif
}
}
else if (s.rfind("cpu MHz", 0) == 0)

View File

@ -39,7 +39,10 @@ struct JoinedElement
: element(table_element)
{
if (element.table_join)
{
join = element.table_join->as<ASTTableJoin>();
original_kind = join->kind;
}
}
void checkTableName(const DatabaseAndTableWithAlias & table, const String & current_database) const
@ -61,6 +64,8 @@ struct JoinedElement
join->kind = ASTTableJoin::Kind::Cross;
}
ASTTableJoin::Kind getOriginalKind() const { return original_kind; }
bool rewriteCrossToInner(ASTPtr on_expression)
{
if (join->kind != ASTTableJoin::Kind::Cross)
@ -83,6 +88,8 @@ struct JoinedElement
private:
const ASTTablesInSelectQueryElement & element;
ASTTableJoin * join = nullptr;
ASTTableJoin::Kind original_kind;
};
bool isAllowedToRewriteCrossJoin(const ASTPtr & node, const Aliases & aliases)
@ -251,10 +258,17 @@ void CrossToInnerJoinMatcher::visit(ASTSelectQuery & select, ASTPtr &, Data & da
}
}
if (data.cross_to_inner_join_rewrite > 1 && !rewritten)
if (joined.getOriginalKind() == ASTTableJoin::Kind::Comma &&
data.cross_to_inner_join_rewrite > 1 &&
!rewritten)
{
throw Exception(ErrorCodes::INCORRECT_QUERY, "Failed to rewrite '{} WHERE {}' to INNER JOIN",
query_before, queryToString(select.where()));
throw Exception(
ErrorCodes::INCORRECT_QUERY,
"Failed to rewrite comma join to INNER. "
"Please, try to simplify WHERE section "
"or set the setting `cross_to_inner_join_rewrite` to 1 to allow slow CROSS JOIN for this case "
"(cannot rewrite '{} WHERE {}' to INNER JOIN)",
query_before, queryToString(select.where()));
}
}
}

View File

@ -45,6 +45,9 @@
#include <Common/typeid_cast.h>
#include <Common/StringUtils/StringUtils.h>
#include <Columns/ColumnNullable.h>
#include <Core/ColumnsWithTypeAndName.h>
#include <DataTypes/IDataType.h>
#include <Core/SettingsEnums.h>
#include <Core/ColumnNumbers.h>
#include <Core/Names.h>
@ -345,6 +348,7 @@ void ExpressionAnalyzer::analyzeAggregation(ActionsDAGPtr & temp_actions)
group_by_kind = GroupByKind::GROUPING_SETS;
else
group_by_kind = GroupByKind::ORDINARY;
bool use_nulls = group_by_kind != GroupByKind::ORDINARY && getContext()->getSettingsRef().group_by_use_nulls;
/// For GROUPING SETS with multiple groups we always add virtual __grouping_set column
/// With set number, which is used as an additional key at the stage of merging aggregating data.
@ -399,7 +403,7 @@ void ExpressionAnalyzer::analyzeAggregation(ActionsDAGPtr & temp_actions)
}
}
NameAndTypePair key{column_name, node->result_type};
NameAndTypePair key{column_name, use_nulls ? makeNullableSafe(node->result_type) : node->result_type };
grouping_set_list.push_back(key);
@ -453,7 +457,7 @@ void ExpressionAnalyzer::analyzeAggregation(ActionsDAGPtr & temp_actions)
}
}
NameAndTypePair key{column_name, node->result_type};
NameAndTypePair key = NameAndTypePair{ column_name, use_nulls ? makeNullableSafe(node->result_type) : node->result_type };
/// Aggregation keys are uniqued.
if (!unique_keys.contains(key.name))
@ -1489,6 +1493,28 @@ void SelectQueryExpressionAnalyzer::appendExpressionsAfterWindowFunctions(Expres
}
}
void SelectQueryExpressionAnalyzer::appendGroupByModifiers(ActionsDAGPtr & before_aggregation, ExpressionActionsChain & chain, bool /* only_types */)
{
const auto * select_query = getAggregatingQuery();
if (!select_query->groupBy() || !(select_query->group_by_with_rollup || select_query->group_by_with_cube))
return;
auto source_columns = before_aggregation->getResultColumns();
ColumnsWithTypeAndName result_columns;
for (const auto & source_column : source_columns)
{
if (source_column.type->canBeInsideNullable())
result_columns.emplace_back(makeNullableSafe(source_column.type), source_column.name);
else
result_columns.push_back(source_column);
}
ExpressionActionsChain::Step & step = chain.lastStep(before_aggregation->getNamesAndTypesList());
step.actions() = ActionsDAG::makeConvertingActions(source_columns, result_columns, ActionsDAG::MatchColumnsMode::Position);
}
void SelectQueryExpressionAnalyzer::appendSelectSkipWindowExpressions(ExpressionActionsChain::Step & step, ASTPtr const & node)
{
if (auto * function = node->as<ASTFunction>())
@ -1816,6 +1842,7 @@ ExpressionAnalysisResult::ExpressionAnalysisResult(
bool second_stage_,
bool only_types,
const FilterDAGInfoPtr & filter_info_,
const FilterDAGInfoPtr & additional_filter,
const Block & source_header)
: first_stage(first_stage_)
, second_stage(second_stage_)
@ -1882,6 +1909,13 @@ ExpressionAnalysisResult::ExpressionAnalysisResult(
columns_for_final.begin(), columns_for_final.end());
}
if (storage && additional_filter)
{
Names columns_for_additional_filter = additional_filter->actions->getRequiredColumnsNames();
additional_required_columns_after_prewhere.insert(additional_required_columns_after_prewhere.end(),
columns_for_additional_filter.begin(), columns_for_additional_filter.end());
}
if (storage && filter_info_)
{
filter_info = filter_info_;
@ -1956,6 +1990,9 @@ ExpressionAnalysisResult::ExpressionAnalysisResult(
query_analyzer.appendAggregateFunctionsArguments(chain, only_types || !first_stage);
before_aggregation = chain.getLastActions();
if (settings.group_by_use_nulls)
query_analyzer.appendGroupByModifiers(before_aggregation, chain, only_types);
finalize_chain(chain);
if (query_analyzer.appendHaving(chain, only_types || !second_stage))

View File

@ -281,6 +281,7 @@ struct ExpressionAnalysisResult
bool second_stage,
bool only_types,
const FilterDAGInfoPtr & filter_info,
const FilterDAGInfoPtr & additional_filter, /// for setting additional_filters
const Block & source_header);
/// Filter for row-level security.
@ -412,6 +413,8 @@ private:
void appendExpressionsAfterWindowFunctions(ExpressionActionsChain & chain, bool only_types);
void appendSelectSkipWindowExpressions(ExpressionActionsChain::Step & step, ASTPtr const & node);
void appendGroupByModifiers(ActionsDAGPtr & before_aggregation, ExpressionActionsChain & chain, bool only_types);
/// After aggregation:
bool appendHaving(ExpressionActionsChain & chain, bool only_types);
/// appendSelect

View File

@ -4,6 +4,13 @@
#include <Processors/QueryPlan/BuildQueryPipelineSettings.h>
#include <Processors/QueryPlan/Optimizations/QueryPlanOptimizationSettings.h>
#include <QueryPipeline/QueryPipelineBuilder.h>
#include <Parsers/ExpressionListParsers.h>
#include <Parsers/parseQuery.h>
#include <Interpreters/TreeRewriter.h>
#include <Interpreters/ActionsDAG.h>
#include <Interpreters/ExpressionAnalyzer.h>
#include <Processors/QueryPlan/IQueryPlanStep.h>
#include <Processors/QueryPlan/FilterStep.h>
namespace DB
{
@ -81,6 +88,53 @@ void IInterpreterUnionOrSelectQuery::setQuota(QueryPipeline & pipeline) const
pipeline.setQuota(quota);
}
static ASTPtr parseAdditionalPostFilter(const Context & context)
{
const auto & settings = context.getSettingsRef();
const String & filter = settings.additional_result_filter;
if (filter.empty())
return nullptr;
ParserExpression parser;
return parseQuery(
parser, filter.data(), filter.data() + filter.size(),
"additional filter", settings.max_query_size, settings.max_parser_depth);
}
static ActionsDAGPtr makeAdditionalPostFilter(ASTPtr & ast, ContextPtr context, const Block & header)
{
auto syntax_result = TreeRewriter(context).analyze(ast, header.getNamesAndTypesList());
String result_column_name = ast->getColumnName();
auto dag = ExpressionAnalyzer(ast, syntax_result, context).getActionsDAG(false, false);
const ActionsDAG::Node * result_node = &dag->findInIndex(result_column_name);
auto & index = dag->getIndex();
index.clear();
index.reserve(dag->getInputs().size() + 1);
for (const auto * node : dag->getInputs())
index.push_back(node);
index.push_back(result_node);
return dag;
}
void IInterpreterUnionOrSelectQuery::addAdditionalPostFilter(QueryPlan & plan) const
{
if (options.subquery_depth != 0)
return;
auto ast = parseAdditionalPostFilter(*context);
if (!ast)
return;
auto dag = makeAdditionalPostFilter(ast, context, plan.getCurrentDataStream().header);
std::string filter_name = dag->getIndex().back()->result_name;
auto filter_step = std::make_unique<FilterStep>(
plan.getCurrentDataStream(), std::move(dag), std::move(filter_name), true);
filter_step->setStepDescription("Additional result filter");
plan.addStep(std::move(filter_step));
}
void IInterpreterUnionOrSelectQuery::addStorageLimits(const StorageLimitsList & limits)
{
for (const auto & val : limits)

View File

@ -72,6 +72,8 @@ protected:
/// Set quotas to query pipeline.
void setQuota(QueryPipeline & pipeline) const;
/// Add filter from additional_post_filter setting.
void addAdditionalPostFilter(QueryPlan & plan) const;
static StorageLimits getStorageLimits(const Context & context, const SelectQueryOptions & options);
};

View File

@ -146,14 +146,14 @@ namespace
struct QueryASTSettings
{
bool graph = false;
bool rewrite = false;
bool optimize = false;
constexpr static char name[] = "AST";
std::unordered_map<std::string, std::reference_wrapper<bool>> boolean_settings =
{
{"graph", graph},
{"rewrite", rewrite}
{"optimize", optimize}
};
};
@ -280,7 +280,7 @@ QueryPipeline InterpreterExplainQuery::executeImpl()
case ASTExplainQuery::ParsedAST:
{
auto settings = checkAndGetSettings<QueryASTSettings>(ast.getSettings());
if (settings.rewrite)
if (settings.optimize)
{
ExplainAnalyzedSyntaxVisitor::Data data(getContext());
ExplainAnalyzedSyntaxVisitor(data).visit(query);

View File

@ -138,6 +138,7 @@ void InterpreterSelectIntersectExceptQuery::buildQueryPlan(QueryPlan & query_pla
auto step = std::make_unique<IntersectOrExceptStep>(std::move(data_streams), final_operator, max_threads);
query_plan.unitePlans(std::move(step), std::move(plans));
addAdditionalPostFilter(query_plan);
query_plan.addInterpreterContext(context);
}

View File

@ -109,8 +109,17 @@ namespace ErrorCodes
}
/// Assumes `storage` is set and the table filter (row-level security) is not empty.
String InterpreterSelectQuery::generateFilterActions(ActionsDAGPtr & actions, const Names & prerequisite_columns) const
FilterDAGInfoPtr generateFilterActions(
const StorageID & table_id,
const ASTPtr & row_policy_filter,
const ContextPtr & context,
const StoragePtr & storage,
const StorageSnapshotPtr & storage_snapshot,
const StorageMetadataPtr & metadata_snapshot,
Names & prerequisite_columns)
{
auto filter_info = std::make_shared<FilterDAGInfo>();
const auto & db_name = table_id.getDatabaseName();
const auto & table_name = table_id.getTableName();
@ -146,16 +155,24 @@ String InterpreterSelectQuery::generateFilterActions(ActionsDAGPtr & actions, co
/// Using separate expression analyzer to prevent any possible alias injection
auto syntax_result = TreeRewriter(context).analyzeSelect(query_ast, TreeRewriterResult({}, storage, storage_snapshot));
SelectQueryExpressionAnalyzer analyzer(query_ast, syntax_result, context, metadata_snapshot);
actions = analyzer.simpleSelectActions();
filter_info->actions = analyzer.simpleSelectActions();
auto column_name = expr_list->children.at(0)->getColumnName();
actions->removeUnusedActions(NameSet{column_name});
actions->projectInput(false);
filter_info->column_name = expr_list->children.at(0)->getColumnName();
filter_info->actions->removeUnusedActions(NameSet{filter_info->column_name});
filter_info->actions->projectInput(false);
for (const auto * node : actions->getInputs())
actions->getIndex().push_back(node);
for (const auto * node : filter_info->actions->getInputs())
filter_info->actions->getIndex().push_back(node);
return column_name;
auto required_columns_from_filter = filter_info->actions->getRequiredColumns();
for (const auto & column : required_columns_from_filter)
{
if (prerequisite_columns.end() == std::find(prerequisite_columns.begin(), prerequisite_columns.end(), column.name))
prerequisite_columns.push_back(column.name);
}
return filter_info;
}
InterpreterSelectQuery::InterpreterSelectQuery(
@ -269,6 +286,33 @@ static void checkAccessRightsForSelect(
context->checkAccess(AccessType::SELECT, table_id, syntax_analyzer_result.requiredSourceColumnsForAccessCheck());
}
static ASTPtr parseAdditionalFilterConditionForTable(
const Map & setting,
const DatabaseAndTableWithAlias & target,
const Context & context)
{
for (size_t i = 0; i < setting.size(); ++i)
{
const auto & tuple = setting[i].safeGet<const Tuple &>();
auto & table = tuple.at(0).safeGet<String>();
auto & filter = tuple.at(1).safeGet<String>();
if (table == target.alias ||
(table == target.table && context.getCurrentDatabase() == target.database) ||
(table == target.database + '.' + target.table))
{
/// Try to parse expression
ParserExpression parser;
const auto & settings = context.getSettingsRef();
return parseQuery(
parser, filter.data(), filter.data() + filter.size(),
"additional filter", settings.max_query_size, settings.max_parser_depth);
}
}
return nullptr;
}
/// Returns true if we should ignore quotas and limits for a specified table in the system database.
static bool shouldIgnoreQuotaAndLimits(const StorageID & table_id)
{
@ -448,6 +492,10 @@ InterpreterSelectQuery::InterpreterSelectQuery(
if (storage)
view = dynamic_cast<StorageView *>(storage.get());
if (!settings.additional_table_filters.value.empty() && storage && !joined_tables.tablesWithColumns().empty())
query_info.additional_filter_ast = parseAdditionalFilterConditionForTable(
settings.additional_table_filters, joined_tables.tablesWithColumns().front().table, *context);
auto analyze = [&] (bool try_move_to_prewhere)
{
/// Allow push down and other optimizations for VIEW: replace with subquery and rewrite it.
@ -566,16 +614,16 @@ InterpreterSelectQuery::InterpreterSelectQuery(
/// Fix source_header for filter actions.
if (row_policy_filter)
{
filter_info = std::make_shared<FilterDAGInfo>();
filter_info->column_name = generateFilterActions(filter_info->actions, required_columns);
filter_info = generateFilterActions(
table_id, row_policy_filter, context, storage, storage_snapshot, metadata_snapshot, required_columns);
}
auto required_columns_from_filter = filter_info->actions->getRequiredColumns();
if (query_info.additional_filter_ast)
{
additional_filter_info = generateFilterActions(
table_id, query_info.additional_filter_ast, context, storage, storage_snapshot, metadata_snapshot, required_columns);
for (const auto & column : required_columns_from_filter)
{
if (required_columns.end() == std::find(required_columns.begin(), required_columns.end(), column.name))
required_columns.push_back(column.name);
}
additional_filter_info->do_remove_column = true;
}
source_header = storage_snapshot->getSampleBlockForColumns(required_columns);
@ -735,7 +783,7 @@ Block InterpreterSelectQuery::getSampleBlockImpl()
&& options.to_stage > QueryProcessingStage::WithMergeableState;
analysis_result = ExpressionAnalysisResult(
*query_analyzer, metadata_snapshot, first_stage, second_stage, options.only_analyze, filter_info, source_header);
*query_analyzer, metadata_snapshot, first_stage, second_stage, options.only_analyze, filter_info, additional_filter_info, source_header);
if (options.to_stage == QueryProcessingStage::Enum::FetchColumns)
{
@ -786,8 +834,16 @@ Block InterpreterSelectQuery::getSampleBlockImpl()
if (analysis_result.use_grouping_set_key)
res.insert({ nullptr, std::make_shared<DataTypeUInt64>(), "__grouping_set" });
for (const auto & key : query_analyzer->aggregationKeys())
res.insert({nullptr, header.getByName(key.name).type, key.name});
if (context->getSettingsRef().group_by_use_nulls && analysis_result.use_grouping_set_key)
{
for (const auto & key : query_analyzer->aggregationKeys())
res.insert({nullptr, makeNullableSafe(header.getByName(key.name).type), key.name});
}
else
{
for (const auto & key : query_analyzer->aggregationKeys())
res.insert({nullptr, header.getByName(key.name).type, key.name});
}
for (const auto & aggregate : query_analyzer->aggregates())
{
@ -1295,6 +1351,18 @@ void InterpreterSelectQuery::executeImpl(QueryPlan & query_plan, std::optional<P
query_plan.addStep(std::move(row_level_security_step));
}
if (additional_filter_info)
{
auto additional_filter_step = std::make_unique<FilterStep>(
query_plan.getCurrentDataStream(),
additional_filter_info->actions,
additional_filter_info->column_name,
additional_filter_info->do_remove_column);
additional_filter_step->setStepDescription("Additional filter");
query_plan.addStep(std::move(additional_filter_step));
}
if (expressions.before_array_join)
{
QueryPlanStepPtr before_array_join_step
@ -1937,6 +2005,7 @@ void InterpreterSelectQuery::executeFetchColumns(QueryProcessingStage::Enum proc
&& storage
&& storage->getName() != "MaterializedMySQL"
&& !row_policy_filter
&& !query_info.additional_filter_ast
&& processing_stage == QueryProcessingStage::FetchColumns
&& query_analyzer->hasAggregation()
&& (query_analyzer->aggregates().size() == 1)
@ -2036,6 +2105,7 @@ void InterpreterSelectQuery::executeFetchColumns(QueryProcessingStage::Enum proc
&& !query.limit_with_ties
&& !query.prewhere()
&& !query.where()
&& !query_info.additional_filter_ast
&& !query.groupBy()
&& !query.having()
&& !query.orderBy()
@ -2326,6 +2396,7 @@ void InterpreterSelectQuery::executeAggregation(QueryPlan & query_plan, const Ac
merge_threads,
temporary_data_merge_threads,
storage_has_evenly_distributed_read,
settings.group_by_use_nulls,
std::move(group_by_info),
std::move(group_by_sort_description),
should_produce_results_in_order_of_bucket_number);
@ -2402,9 +2473,9 @@ void InterpreterSelectQuery::executeRollupOrCube(QueryPlan & query_plan, Modific
QueryPlanStepPtr step;
if (modificator == Modificator::ROLLUP)
step = std::make_unique<RollupStep>(query_plan.getCurrentDataStream(), std::move(params), final);
step = std::make_unique<RollupStep>(query_plan.getCurrentDataStream(), std::move(params), final, settings.group_by_use_nulls);
else if (modificator == Modificator::CUBE)
step = std::make_unique<CubeStep>(query_plan.getCurrentDataStream(), std::move(params), final);
step = std::make_unique<CubeStep>(query_plan.getCurrentDataStream(), std::move(params), final, settings.group_by_use_nulls);
query_plan.addStep(std::move(step));
}

View File

@ -189,8 +189,6 @@ private:
void
executeMergeSorted(QueryPlan & query_plan, const SortDescription & sort_description, UInt64 limit, const std::string & description);
String generateFilterActions(ActionsDAGPtr & actions, const Names & prerequisite_columns = {}) const;
enum class Modificator
{
ROLLUP = 0,
@ -217,6 +215,9 @@ private:
ASTPtr row_policy_filter;
FilterDAGInfoPtr filter_info;
/// For additional_filter setting.
FilterDAGInfoPtr additional_filter_info;
QueryProcessingStage::Enum from_stage = QueryProcessingStage::FetchColumns;
/// List of columns to read to execute the query.

View File

@ -357,6 +357,7 @@ void InterpreterSelectWithUnionQuery::buildQueryPlan(QueryPlan & query_plan)
}
}
addAdditionalPostFilter(query_plan);
query_plan.addInterpreterContext(context);
}

View File

@ -5,7 +5,6 @@
#include <Parsers/IdentifierQuotingStyle.h>
#include <Common/Exception.h>
#include <Common/TypePromotion.h>
#include <Core/Settings.h>
#include <IO/WriteBufferFromString.h>
#include <algorithm>
@ -26,7 +25,7 @@ namespace ErrorCodes
using IdentifierNameSet = std::set<String>;
class WriteBuffer;
using Strings = std::vector<String>;
/** Element of the syntax tree (hereinafter - directed acyclic graph with elements of semantics)
*/

Some files were not shown because too many files have changed in this diff Show More