diff --git a/.github/ISSUE_TEMPLATE/85_bug-report.md b/.github/ISSUE_TEMPLATE/85_bug-report.md
index bd59d17db3f..d78474670ff 100644
--- a/.github/ISSUE_TEMPLATE/85_bug-report.md
+++ b/.github/ISSUE_TEMPLATE/85_bug-report.md
@@ -2,7 +2,7 @@
name: Bug report
about: Wrong behaviour (visible to users) in official ClickHouse release.
title: ''
-labels: bug
+labels: 'potential bug'
assignees: ''

---
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 10ed4dae1c9..103d8e40fd9 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,15 +2,15 @@

#### New Features

-* Collect common system metrics (in `system.asynchronous_metrics` and `system.asynchronous_metric_log`) on CPU usage, disk usage, memory usage, IO, network, files, load average, CPU frequencies, thermal sensors, EDAC counters, system uptime; also added metrics about the scheduling jitter and the time spent collecting the metrics. It works similar to `atop` in ClickHouse and allows access to monitoring data even if you have no additional tools installed. Close [#9430](https://github.com/ClickHouse/ClickHouse/issues/9430). [#24416](https://github.com/ClickHouse/ClickHouse/pull/24416) ([Yegor Levankov](https://github.com/elevankoff)).
+* Add support for a part of SQL/JSON standard. [#24148](https://github.com/ClickHouse/ClickHouse/pull/24148) ([l1tsolaiki](https://github.com/l1tsolaiki), [Kseniia Sumarokova](https://github.com/kssenii)).
+* Collect common system metrics (in `system.asynchronous_metrics` and `system.asynchronous_metric_log`) on CPU usage, disk usage, memory usage, IO, network, files, load average, CPU frequencies, thermal sensors, EDAC counters, system uptime; also added metrics about the scheduling jitter and the time spent collecting the metrics. It works similarly to `atop` in ClickHouse and allows access to monitoring data even if you have no additional tools installed. Close [#9430](https://github.com/ClickHouse/ClickHouse/issues/9430). [#24416](https://github.com/ClickHouse/ClickHouse/pull/24416) ([alexey-milovidov](https://github.com/alexey-milovidov), [Yegor Levankov](https://github.com/elevankoff)).
+* Add MaterializedPostgreSQL table engine and database engine. This database engine allows replicating a whole database or any subset of database tables. [#20470](https://github.com/ClickHouse/ClickHouse/pull/20470) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Add new functions `leftPad()`, `rightPad()`, `leftPadUTF8()`, `rightPadUTF8()`. [#26075](https://github.com/ClickHouse/ClickHouse/pull/26075) ([Vitaly Baranov](https://github.com/vitlibar)).
* Add the `FIRST` keyword to the `ADD INDEX` command to be able to add the index at the beginning of the indices list. [#25904](https://github.com/ClickHouse/ClickHouse/pull/25904) ([xjewer](https://github.com/xjewer)).
* Introduce `system.data_skipping_indices` table containing information about existing data skipping indices. Close [#7659](https://github.com/ClickHouse/ClickHouse/issues/7659). [#25693](https://github.com/ClickHouse/ClickHouse/pull/25693) ([Dmitry Novik](https://github.com/novikd)).
* Add `bin`/`unbin` functions. [#25609](https://github.com/ClickHouse/ClickHouse/pull/25609) ([zhaoyu](https://github.com/zxc111)).
-* Support `Map` and `(U)Int128`, `U(Int256) types in `mapAdd` and `mapSubtract` functions. [#25596](https://github.com/ClickHouse/ClickHouse/pull/25596) ([Ildus Kurbangaliev](https://github.com/ildus)).
+* Support `Map` and `UInt128`, `Int128`, `UInt256`, `Int256` types in `mapAdd` and `mapSubtract` functions.
[#25596](https://github.com/ClickHouse/ClickHouse/pull/25596) ([Ildus Kurbangaliev](https://github.com/ildus)). * Support `DISTINCT ON (columns)` expression, close [#25404](https://github.com/ClickHouse/ClickHouse/issues/25404). [#25589](https://github.com/ClickHouse/ClickHouse/pull/25589) ([Zijie Lu](https://github.com/TszKitLo40)). -* Add support for a part of SQLJSON standard. [#24148](https://github.com/ClickHouse/ClickHouse/pull/24148) ([l1tsolaiki](https://github.com/l1tsolaiki)). -* Add MaterializedPostgreSQL table engine and database engine. This database engine allows replicating a whole database or any subset of database tables. [#20470](https://github.com/ClickHouse/ClickHouse/pull/20470) ([Kseniia Sumarokova](https://github.com/kssenii)). * Add an ability to reset a custom setting to default and remove it from the table's metadata. It allows rolling back the change without knowing the system/config's default. Closes [#14449](https://github.com/ClickHouse/ClickHouse/issues/14449). [#17769](https://github.com/ClickHouse/ClickHouse/pull/17769) ([xjewer](https://github.com/xjewer)). * Render pipelines as graphs in Web UI if `EXPLAIN PIPELINE graph = 1` query is submitted. [#26067](https://github.com/ClickHouse/ClickHouse/pull/26067) ([alexey-milovidov](https://github.com/alexey-milovidov)). @@ -21,11 +21,11 @@ #### Improvements -* Use `Map` data type for system logs tables (`system.query_log`, `system.query_thread_log`, `system.processes`, `system.opentelemetry_span_log`). These tables will be auto-created with new data types. Virtual columns are created to support old queries. Closes [#18698](https://github.com/ClickHouse/ClickHouse/issues/18698). [#23934](https://github.com/ClickHouse/ClickHouse/pull/23934), [#25773](https://github.com/ClickHouse/ClickHouse/pull/25773) ([hexiaoting](https://github.com/hexiaoting), [sundy-li](https://github.com/sundy-li)). +* Use `Map` data type for system logs tables (`system.query_log`, `system.query_thread_log`, `system.processes`, `system.opentelemetry_span_log`). These tables will be auto-created with new data types. Virtual columns are created to support old queries. Closes [#18698](https://github.com/ClickHouse/ClickHouse/issues/18698). [#23934](https://github.com/ClickHouse/ClickHouse/pull/23934), [#25773](https://github.com/ClickHouse/ClickHouse/pull/25773) ([hexiaoting](https://github.com/hexiaoting), [sundy-li](https://github.com/sundy-li), [Maksim Kita](https://github.com/kitaisreal)). * For a dictionary with a complex key containing only one attribute, allow not wrapping the key expression in tuple for functions `dictGet`, `dictHas`. [#26130](https://github.com/ClickHouse/ClickHouse/pull/26130) ([Maksim Kita](https://github.com/kitaisreal)). * Implement function `bin`/`hex` from `AggregateFunction` states. [#26094](https://github.com/ClickHouse/ClickHouse/pull/26094) ([zhaoyu](https://github.com/zxc111)). * Support arguments of `UUID` type for `empty` and `notEmpty` functions. `UUID` is empty if it is all zeros (nil UUID). Closes [#3446](https://github.com/ClickHouse/ClickHouse/issues/3446). [#25974](https://github.com/ClickHouse/ClickHouse/pull/25974) ([zhaoyu](https://github.com/zxc111)). -* Fix error with query `SET SQL_SELECT_LIMIT` in MySQL protocol. Closes [#17115](https://github.com/ClickHouse/ClickHouse/issues/17115). [#25972](https://github.com/ClickHouse/ClickHouse/pull/25972) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Add support for `SET SQL_SELECT_LIMIT` in MySQL protocol. 
Closes [#17115](https://github.com/ClickHouse/ClickHouse/issues/17115). [#25972](https://github.com/ClickHouse/ClickHouse/pull/25972) ([Kseniia Sumarokova](https://github.com/kssenii)). * More instrumentation for network interaction: add counters for recv/send bytes; add gauges for recvs/sends. Added missing documentation. Close [#5897](https://github.com/ClickHouse/ClickHouse/issues/5897). [#25962](https://github.com/ClickHouse/ClickHouse/pull/25962) ([alexey-milovidov](https://github.com/alexey-milovidov)). * Add setting `optimize_move_to_prewhere_if_final`. If query has `FINAL`, the optimization `move_to_prewhere` will be enabled only if both `optimize_move_to_prewhere` and `optimize_move_to_prewhere_if_final` are enabled. Closes [#8684](https://github.com/ClickHouse/ClickHouse/issues/8684). [#25940](https://github.com/ClickHouse/ClickHouse/pull/25940) ([Kseniia Sumarokova](https://github.com/kssenii)). * Allow complex quoted identifiers of JOINed tables. Close [#17861](https://github.com/ClickHouse/ClickHouse/issues/17861). [#25924](https://github.com/ClickHouse/ClickHouse/pull/25924) ([alexey-milovidov](https://github.com/alexey-milovidov)). @@ -37,7 +37,7 @@ * Support for queries with a column named `"null"` (it must be specified in back-ticks or double quotes) and `ON CLUSTER`. Closes [#24035](https://github.com/ClickHouse/ClickHouse/issues/24035). [#25907](https://github.com/ClickHouse/ClickHouse/pull/25907) ([alexey-milovidov](https://github.com/alexey-milovidov)). * Support `LowCardinality`, `Decimal`, and `UUID` for `JSONExtract`. Closes [#24606](https://github.com/ClickHouse/ClickHouse/issues/24606). [#25900](https://github.com/ClickHouse/ClickHouse/pull/25900) ([Kseniia Sumarokova](https://github.com/kssenii)). * Convert history file from `readline` format to `replxx` format. [#25888](https://github.com/ClickHouse/ClickHouse/pull/25888) ([Azat Khuzhin](https://github.com/azat)). -* Fix bug which can lead to intersecting parts after `DROP PART` or background deletion of an empty part. [#25884](https://github.com/ClickHouse/ClickHouse/pull/25884) ([alesapin](https://github.com/alesapin)). +* Fix an issue which can lead to intersecting parts after `DROP PART` or background deletion of an empty part. [#25884](https://github.com/ClickHouse/ClickHouse/pull/25884) ([alesapin](https://github.com/alesapin)). * Better handling of lost parts for `ReplicatedMergeTree` tables. Fixes rare inconsistencies in `ReplicationQueue`. Fixes [#10368](https://github.com/ClickHouse/ClickHouse/issues/10368). [#25820](https://github.com/ClickHouse/ClickHouse/pull/25820) ([alesapin](https://github.com/alesapin)). * Allow starting clickhouse-client with unreadable working directory. [#25817](https://github.com/ClickHouse/ClickHouse/pull/25817) ([ianton-ru](https://github.com/ianton-ru)). * Fix "No available columns" error for `Merge` storage. [#25801](https://github.com/ClickHouse/ClickHouse/pull/25801) ([Azat Khuzhin](https://github.com/azat)). @@ -48,7 +48,7 @@ * Support materialized and aliased columns in JOIN, close [#13274](https://github.com/ClickHouse/ClickHouse/issues/13274). [#25634](https://github.com/ClickHouse/ClickHouse/pull/25634) ([Vladimir C](https://github.com/vdimir)). * Fix possible logical race condition between `ALTER TABLE ... DETACH` and background merges. [#25605](https://github.com/ClickHouse/ClickHouse/pull/25605) ([Azat Khuzhin](https://github.com/azat)). 
* Make `NetworkReceiveElapsedMicroseconds` metric to correctly include the time spent waiting for data from the client to `INSERT`. Close [#9958](https://github.com/ClickHouse/ClickHouse/issues/9958). [#25602](https://github.com/ClickHouse/ClickHouse/pull/25602) ([alexey-milovidov](https://github.com/alexey-milovidov)). -* Support `TRUNCATE TABLE` for StorageS3 and StorageHDFS. Close [#25530](https://github.com/ClickHouse/ClickHouse/issues/25530). [#25550](https://github.com/ClickHouse/ClickHouse/pull/25550) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Support `TRUNCATE TABLE` for S3 and HDFS. Close [#25530](https://github.com/ClickHouse/ClickHouse/issues/25530). [#25550](https://github.com/ClickHouse/ClickHouse/pull/25550) ([Kseniia Sumarokova](https://github.com/kssenii)). * Support for dynamic reloading of config to change number of threads in pool for background jobs execution (merges, mutations, fetches). [#25548](https://github.com/ClickHouse/ClickHouse/pull/25548) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)). * Allow extracting of non-string element as string using `JSONExtract`. This is for [#25414](https://github.com/ClickHouse/ClickHouse/issues/25414). [#25452](https://github.com/ClickHouse/ClickHouse/pull/25452) ([Amos Bird](https://github.com/amosbird)). * Support regular expression in `Database` argument for `StorageMerge`. Close [#776](https://github.com/ClickHouse/ClickHouse/issues/776). [#25064](https://github.com/ClickHouse/ClickHouse/pull/25064) ([flynn](https://github.com/ucasfl)). @@ -60,13 +60,13 @@ * Fix incorrect `SET ROLE` in some cases. [#26707](https://github.com/ClickHouse/ClickHouse/pull/26707) ([Vitaly Baranov](https://github.com/vitlibar)). * Fix potential `nullptr` dereference in window functions. Fix [#25276](https://github.com/ClickHouse/ClickHouse/issues/25276). [#26668](https://github.com/ClickHouse/ClickHouse/pull/26668) ([Alexander Kuzmenkov](https://github.com/akuzm)). * Fix incorrect function names of `groupBitmapAnd/Or/Xor`. Fix [#26557](https://github.com/ClickHouse/ClickHouse/pull/26557) ([Amos Bird](https://github.com/amosbird)). -* Fix crash in rabbitmq shutdown in case rabbitmq setup was not started. Closes [#26504](https://github.com/ClickHouse/ClickHouse/issues/26504). [#26529](https://github.com/ClickHouse/ClickHouse/pull/26529) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Fix crash in RabbitMQ shutdown in case RabbitMQ setup was not started. Closes [#26504](https://github.com/ClickHouse/ClickHouse/issues/26504). [#26529](https://github.com/ClickHouse/ClickHouse/pull/26529) ([Kseniia Sumarokova](https://github.com/kssenii)). * Fix issues with `CREATE DICTIONARY` query if dictionary name or database name was quoted. Closes [#26491](https://github.com/ClickHouse/ClickHouse/issues/26491). [#26508](https://github.com/ClickHouse/ClickHouse/pull/26508) ([Maksim Kita](https://github.com/kitaisreal)). * Fix broken name resolution after rewriting column aliases. Fix [#26432](https://github.com/ClickHouse/ClickHouse/issues/26432). [#26475](https://github.com/ClickHouse/ClickHouse/pull/26475) ([Amos Bird](https://github.com/amosbird)). * Fix infinite non-joined block stream in `partial_merge_join` close [#26325](https://github.com/ClickHouse/ClickHouse/issues/26325). [#26374](https://github.com/ClickHouse/ClickHouse/pull/26374) ([Vladimir C](https://github.com/vdimir)). * Fix possible crash when login as dropped user. Fix [#26073](https://github.com/ClickHouse/ClickHouse/issues/26073). 
[#26363](https://github.com/ClickHouse/ClickHouse/pull/26363) ([Vitaly Baranov](https://github.com/vitlibar)). * Fix `optimize_distributed_group_by_sharding_key` for multiple columns (leads to incorrect result w/ `optimize_skip_unused_shards=1`/`allow_nondeterministic_optimize_skip_unused_shards=1` and multiple columns in sharding key expression). [#26353](https://github.com/ClickHouse/ClickHouse/pull/26353) ([Azat Khuzhin](https://github.com/azat)). - * `CAST` from `Date` to `DateTime` (or `DateTime64`) was not using the timezone of the `DateTime` type. It can also affect the comparison between `Date` and `DateTime`. Inference of the common type for `Date` and `DateTime` also was not using the corresponding timezone. It affected the results of function `if` and array construction. Closes [#24128](https://github.com/ClickHouse/ClickHouse/issues/24128). [#24129](https://github.com/ClickHouse/ClickHouse/pull/24129) ([Maksim Kita](https://github.com/kitaisreal)). +* `CAST` from `Date` to `DateTime` (or `DateTime64`) was not using the timezone of the `DateTime` type. It can also affect the comparison between `Date` and `DateTime`. Inference of the common type for `Date` and `DateTime` also was not using the corresponding timezone. It affected the results of function `if` and array construction. Closes [#24128](https://github.com/ClickHouse/ClickHouse/issues/24128). [#24129](https://github.com/ClickHouse/ClickHouse/pull/24129) ([Maksim Kita](https://github.com/kitaisreal)). * Fixed rare bug in lost replica recovery that may cause replicas to diverge. [#26321](https://github.com/ClickHouse/ClickHouse/pull/26321) ([tavplubix](https://github.com/tavplubix)). * Fix zstd decompression in case there are escape sequences at the end of internal buffer. Closes [#26013](https://github.com/ClickHouse/ClickHouse/issues/26013). [#26314](https://github.com/ClickHouse/ClickHouse/pull/26314) ([Kseniia Sumarokova](https://github.com/kssenii)). * Fix logical error on join with totals, close [#26017](https://github.com/ClickHouse/ClickHouse/issues/26017). [#26250](https://github.com/ClickHouse/ClickHouse/pull/26250) ([Vladimir C](https://github.com/vdimir)). @@ -75,7 +75,7 @@ * Fix possible crash in `pointInPolygon` if the setting `validate_polygons` is turned off. [#26113](https://github.com/ClickHouse/ClickHouse/pull/26113) ([alexey-milovidov](https://github.com/alexey-milovidov)). * Fix throwing exception when iterate over non-existing remote directory. [#26087](https://github.com/ClickHouse/ClickHouse/pull/26087) ([ianton-ru](https://github.com/ianton-ru)). * Fix rare server crash because of `abort` in ZooKeeper client. Fixes [#25813](https://github.com/ClickHouse/ClickHouse/issues/25813). [#26079](https://github.com/ClickHouse/ClickHouse/pull/26079) ([alesapin](https://github.com/alesapin)). -* Fix wrong thread estimation for right subquery join in some cases. Close [#24075](https://github.com/ClickHouse/ClickHouse/issues/24075). [#26052](https://github.com/ClickHouse/ClickHouse/pull/26052) ([Vladimir C](https://github.com/vdimir)). +* Fix wrong thread count estimation for right subquery join in some cases. Close [#24075](https://github.com/ClickHouse/ClickHouse/issues/24075). [#26052](https://github.com/ClickHouse/ClickHouse/pull/26052) ([Vladimir C](https://github.com/vdimir)). * Fixed incorrect `sequence_id` in MySQL protocol packets that ClickHouse sends on exception during query execution. It might cause MySQL client to reset connection to ClickHouse server. 
Fixes [#21184](https://github.com/ClickHouse/ClickHouse/issues/21184). [#26051](https://github.com/ClickHouse/ClickHouse/pull/26051) ([tavplubix](https://github.com/tavplubix)). * Fix possible mismatched header when using normal projection with `PREWHERE`. Fix [#26020](https://github.com/ClickHouse/ClickHouse/issues/26020). [#26038](https://github.com/ClickHouse/ClickHouse/pull/26038) ([Amos Bird](https://github.com/amosbird)). * Fix formatting of type `Map` with integer keys to `JSON`. [#25982](https://github.com/ClickHouse/ClickHouse/pull/25982) ([Anton Popov](https://github.com/CurtizJ)). @@ -94,20 +94,8 @@ * Fix `ALTER MODIFY COLUMN` of columns, which participates in TTL expressions. [#25554](https://github.com/ClickHouse/ClickHouse/pull/25554) ([Anton Popov](https://github.com/CurtizJ)). * Fix assertion in `PREWHERE` with non-UInt8 type, close [#19589](https://github.com/ClickHouse/ClickHouse/issues/19589). [#25484](https://github.com/ClickHouse/ClickHouse/pull/25484) ([Vladimir C](https://github.com/vdimir)). * Fix some fuzzed msan crash. Fixes [#22517](https://github.com/ClickHouse/ClickHouse/issues/22517). [#26428](https://github.com/ClickHouse/ClickHouse/pull/26428) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Fix empty history file conversion. [#26589](https://github.com/ClickHouse/ClickHouse/pull/26589) ([Azat Khuzhin](https://github.com/azat)). * Update `chown` cmd check in `clickhouse-server` docker entrypoint. It fixes error 'cluster pod restart failed (or timeout)' on kubernetes. [#26545](https://github.com/ClickHouse/ClickHouse/pull/26545) ([Ky Li](https://github.com/Kylinrix)). -#### Build/Testing/Packaging Improvements - -* Disabling TestFlows LDAP module due to test fails. [#26065](https://github.com/ClickHouse/ClickHouse/pull/26065) ([vzakaznikov](https://github.com/vzakaznikov)). -* Enabling all TestFlows modules and fixing some tests. [#26011](https://github.com/ClickHouse/ClickHouse/pull/26011) ([vzakaznikov](https://github.com/vzakaznikov)). -* Add new tests for checking access rights for columns used in filters (`WHERE` / `PREWHERE` / row policy) of the `SELECT` statement after changes in [#24405](https://github.com/ClickHouse/ClickHouse/pull/24405). [#25619](https://github.com/ClickHouse/ClickHouse/pull/25619) ([Vitaly Baranov](https://github.com/vitlibar)). - -#### Other - -* Add `clickhouse-keeper-converter` tool which allows converting zookeeper logs and snapshots into `clickhouse-keeper` snapshot format. [#25428](https://github.com/ClickHouse/ClickHouse/pull/25428) ([alesapin](https://github.com/alesapin)). - - ### ClickHouse release v21.7, 2021-07-09 @@ -1294,13 +1282,6 @@ * PODArray: Avoid call to memcpy with (nullptr, 0) arguments (Fix UBSan report). This fixes [#18525](https://github.com/ClickHouse/ClickHouse/issues/18525). [#18526](https://github.com/ClickHouse/ClickHouse/pull/18526) ([alexey-milovidov](https://github.com/alexey-milovidov)). * Minor improvement for path concatenation of zookeeper paths inside DDLWorker. [#17767](https://github.com/ClickHouse/ClickHouse/pull/17767) ([Bharat Nallan](https://github.com/bharatnc)). * Allow to reload symbols from debug file. This PR also fixes a build-id issue. [#17637](https://github.com/ClickHouse/ClickHouse/pull/17637) ([Amos Bird](https://github.com/amosbird)). -* TestFlows: fixes to LDAP tests that fail due to slow test execution. [#18790](https://github.com/ClickHouse/ClickHouse/pull/18790) ([vzakaznikov](https://github.com/vzakaznikov)). 
-* TestFlows: Merging requirements for AES encryption functions. Updating aes_encryption tests to use new requirements. Updating TestFlows version to 1.6.72. [#18221](https://github.com/ClickHouse/ClickHouse/pull/18221) ([vzakaznikov](https://github.com/vzakaznikov)).
-* TestFlows: Updating TestFlows version to the latest 1.6.72. Re-generating requirements.py. [#18208](https://github.com/ClickHouse/ClickHouse/pull/18208) ([vzakaznikov](https://github.com/vzakaznikov)).
-* TestFlows: Updating TestFlows README.md to include "How To Debug Why Test Failed" section. [#17808](https://github.com/ClickHouse/ClickHouse/pull/17808) ([vzakaznikov](https://github.com/vzakaznikov)).
-* TestFlows: tests for RBAC [ACCESS MANAGEMENT](https://clickhouse.tech/docs/en/sql-reference/statements/grant/#grant-access-management) privileges. [#17804](https://github.com/ClickHouse/ClickHouse/pull/17804) ([MyroTk](https://github.com/MyroTk)).
-* TestFlows: RBAC tests for SHOW, TRUNCATE, KILL, and OPTIMIZE. - Updates to old tests. - Resolved comments from #https://github.com/ClickHouse/ClickHouse/pull/16977. [#17657](https://github.com/ClickHouse/ClickHouse/pull/17657) ([MyroTk](https://github.com/MyroTk)).
-* TestFlows: Added RBAC tests for `ATTACH`, `CREATE`, `DROP`, and `DETACH`. [#16977](https://github.com/ClickHouse/ClickHouse/pull/16977) ([MyroTk](https://github.com/MyroTk)).

## [Changelog for 2020](https://github.com/ClickHouse/ClickHouse/blob/master/docs/en/whats-new/changelog/2020.md)
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 1727caea766..35c22526816 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -395,9 +395,10 @@ endif ()
# Turns on all external libs like s3, kafka, ODBC, ...
option(ENABLE_LIBRARIES "Enable all external libraries by default" ON)

-# We recommend avoiding this mode for production builds because we can't guarantee all needed libraries exist in your
-# system.
+# We recommend avoiding this mode for production builds because we can't guarantee
+# all needed libraries exist in your system.
# This mode exists for enthusiastic developers who are searching for trouble.
+# The whole idea of using an unknown version of libraries from the OS distribution is deeply flawed.
# Useful for maintainers of OS packages.
option (UNBUNDLED "Use system libraries instead of ones in contrib/" OFF)

diff --git a/base/glibc-compatibility/CMakeLists.txt b/base/glibc-compatibility/CMakeLists.txt
index 8cba91de33f..4fc2a002cd8 100644
--- a/base/glibc-compatibility/CMakeLists.txt
+++ b/base/glibc-compatibility/CMakeLists.txt
@@ -9,10 +9,6 @@ if (GLIBC_COMPATIBILITY)

    check_include_file("sys/random.h" HAVE_SYS_RANDOM_H)

-    if(COMPILER_CLANG)
-        set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-builtin-requires-header")
-    endif()
-
    add_headers_and_sources(glibc_compatibility .)
add_headers_and_sources(glibc_compatibility musl)

if (ARCH_AARCH64)
@@ -35,11 +31,9 @@ if (GLIBC_COMPATIBILITY)

    add_library(glibc-compatibility STATIC ${glibc_compatibility_sources})

-    if (COMPILER_CLANG)
-        target_compile_options(glibc-compatibility PRIVATE -Wno-unused-command-line-argument)
-    elseif (COMPILER_GCC)
-        target_compile_options(glibc-compatibility PRIVATE -Wno-unused-but-set-variable)
-    endif ()
+    target_no_warning(glibc-compatibility unused-command-line-argument)
+    target_no_warning(glibc-compatibility unused-but-set-variable)
+    target_no_warning(glibc-compatibility builtin-requires-header)

    target_include_directories(glibc-compatibility PRIVATE libcxxabi ${musl_arch_include_dir})
diff --git a/base/mysqlxx/Pool.cpp b/base/mysqlxx/Pool.cpp
index 386b4544b78..2f47aa67356 100644
--- a/base/mysqlxx/Pool.cpp
+++ b/base/mysqlxx/Pool.cpp
@@ -296,7 +296,7 @@ void Pool::initialize()

Pool::Connection * Pool::allocConnection(bool dont_throw_if_failed_first_time)
{
-    std::unique_ptr<Connection> conn_ptr{new Connection};
+    std::unique_ptr<Connection> conn_ptr = std::make_unique<Connection>();

    try
    {
diff --git a/cmake/add_warning.cmake b/cmake/add_warning.cmake
index 3a776c98ab6..bc9642c9cc6 100644
--- a/cmake/add_warning.cmake
+++ b/cmake/add_warning.cmake
@@ -27,3 +27,22 @@ endmacro ()
macro (no_warning flag)
    add_warning(no-${flag})
endmacro ()
+
+
+# The same, but only for the specified target.
+macro (target_add_warning target flag)
+    string (REPLACE "-" "_" underscored_flag ${flag})
+    string (REPLACE "+" "x" underscored_flag ${underscored_flag})
+
+    check_cxx_compiler_flag("-W${flag}" SUPPORTS_CXXFLAG_${underscored_flag})
+
+    if (SUPPORTS_CXXFLAG_${underscored_flag})
+        target_compile_options (${target} PRIVATE "-W${flag}")
+    else ()
+        message (WARNING "Flag -W${flag} is unsupported")
+    endif ()
+endmacro ()
+
+macro (target_no_warning target flag)
+    target_add_warning(${target} no-${flag})
+endmacro ()
diff --git a/cmake/autogenerated_versions.txt b/cmake/autogenerated_versions.txt
index 18072566d04..2435335f669 100644
--- a/cmake/autogenerated_versions.txt
+++ b/cmake/autogenerated_versions.txt
@@ -6,7 +6,7 @@ SET(VERSION_REVISION 54454)
SET(VERSION_MAJOR 21)
SET(VERSION_MINOR 9)
SET(VERSION_PATCH 1)
-SET(VERSION_GITHASH f48c5af90c2ad51955d1ee3b6b05d006b03e4238)
-SET(VERSION_DESCRIBE v21.9.1.1-prestable)
-SET(VERSION_STRING 21.9.1.1)
+SET(VERSION_GITHASH f063e44131a048ba2d9af8075f03700fd5ec3e69)
+SET(VERSION_DESCRIBE v21.9.1.7770-prestable)
+SET(VERSION_STRING 21.9.1.7770)
# end of autochange
diff --git a/contrib/arrow-cmake/CMakeLists.txt b/contrib/arrow-cmake/CMakeLists.txt
index 2c72055a3e7..427379dc9b2 100644
--- a/contrib/arrow-cmake/CMakeLists.txt
+++ b/contrib/arrow-cmake/CMakeLists.txt
@@ -119,12 +119,9 @@ set(ORC_SRCS
    "${ORC_SOURCE_SRC_DIR}/ColumnWriter.cc"
    "${ORC_SOURCE_SRC_DIR}/Common.cc"
    "${ORC_SOURCE_SRC_DIR}/Compression.cc"
-    "${ORC_SOURCE_SRC_DIR}/Exceptions.cc"
    "${ORC_SOURCE_SRC_DIR}/Int128.cc"
    "${ORC_SOURCE_SRC_DIR}/LzoDecompressor.cc"
    "${ORC_SOURCE_SRC_DIR}/MemoryPool.cc"
-    "${ORC_SOURCE_SRC_DIR}/OrcFile.cc"
-    "${ORC_SOURCE_SRC_DIR}/Reader.cc"
    "${ORC_SOURCE_SRC_DIR}/RLE.cc"
    "${ORC_SOURCE_SRC_DIR}/RLEv1.cc"
    "${ORC_SOURCE_SRC_DIR}/RLEv2.cc"
diff --git a/contrib/croaring-cmake/CMakeLists.txt b/contrib/croaring-cmake/CMakeLists.txt
index 84cdccedbd3..5223027843d 100644
--- a/contrib/croaring-cmake/CMakeLists.txt
+++ b/contrib/croaring-cmake/CMakeLists.txt
@@ -27,8 +27,10 @@ target_include_directories(roaring SYSTEM BEFORE PUBLIC "${LIBRARY_DIR}/cpp")

# We redirect malloc/free family of functions to different functions that will track memory in ClickHouse.
# Also note that we exploit implicit function declarations.
+# Also, it is disabled on Mac OS because it fails there.

-target_compile_definitions(roaring PRIVATE
+if (NOT OS_DARWIN)
+    target_compile_definitions(roaring PRIVATE
    -Dmalloc=clickhouse_malloc
    -Dcalloc=clickhouse_calloc
    -Drealloc=clickhouse_realloc
@@ -36,4 +38,5 @@ target_compile_definitions(roaring PRIVATE
    -Dfree=clickhouse_free
    -Dposix_memalign=clickhouse_posix_memalign)

-target_link_libraries(roaring PUBLIC clickhouse_common_io)
+    target_link_libraries(roaring PUBLIC clickhouse_common_io)
+endif ()
diff --git a/contrib/libmetrohash/CMakeLists.txt b/contrib/libmetrohash/CMakeLists.txt
index 9304cb3644c..4ec5a58717d 100644
--- a/contrib/libmetrohash/CMakeLists.txt
+++ b/contrib/libmetrohash/CMakeLists.txt
@@ -2,9 +2,5 @@ set (SRCS
    src/metrohash64.cpp
    src/metrohash128.cpp
)
-if (HAVE_SSE42) # Not used. Pretty easy to port.
-    list (APPEND SRCS src/metrohash128crc.cpp)
-endif ()
-
add_library(metrohash ${SRCS})
target_include_directories(metrohash PUBLIC src)
diff --git a/docker/packager/packager b/docker/packager/packager
index c05c85d3e28..95b7fcd8568 100755
--- a/docker/packager/packager
+++ b/docker/packager/packager
@@ -151,8 +151,14 @@ def parse_env_variables(build_type, compiler, sanitizer, package_type, image_typ
        cmake_flags.append('-DENABLE_TESTS=1')
        cmake_flags.append('-DUSE_GTEST=1')

+    # "Unbundled" build is not suitable for any production usage.
+    # But it is occasionally used by some developers.
+    # The whole idea of using an unknown version of libraries from the OS distribution is deeply flawed.
+    # We wish these developers good luck.
    if unbundled:
-        cmake_flags.append('-DUNBUNDLED=1 -DUSE_INTERNAL_RDKAFKA_LIBRARY=1 -DENABLE_ARROW=0 -DENABLE_AVRO=0 -DENABLE_ORC=0 -DENABLE_PARQUET=0')
+        # We also disable all CPU features except basic x86_64.
+        # It is only slightly related to the "unbundled" build, but it is a good place to test if code compiles without these instruction sets.
+        cmake_flags.append('-DUNBUNDLED=1 -DUSE_INTERNAL_RDKAFKA_LIBRARY=1 -DENABLE_ARROW=0 -DENABLE_AVRO=0 -DENABLE_ORC=0 -DENABLE_PARQUET=0 -DENABLE_SSSE3=0 -DENABLE_SSE41=0 -DENABLE_SSE42=0 -DENABLE_PCLMULQDQ=0 -DENABLE_POPCNT=0 -DENABLE_AVX=0 -DENABLE_AVX2=0')

    if split_binary:
        cmake_flags.append('-DUSE_STATIC_LIBRARIES=0 -DSPLIT_SHARED_LIBRARIES=1 -DCLICKHOUSE_SPLIT_BINARY=1')
diff --git a/docker/test/fuzzer/run-fuzzer.sh b/docker/test/fuzzer/run-fuzzer.sh
index 3b074ae46cd..44183a50ae5 100755
--- a/docker/test/fuzzer/run-fuzzer.sh
+++ b/docker/test/fuzzer/run-fuzzer.sh
@@ -226,7 +226,7 @@ continue
        task_exit_code=$fuzzer_exit_code
        echo "failure" > status.txt
        { grep --text -o "Found error:.*" fuzzer.log \
-            || grep --text -o "Exception.*" fuzzer.log \
+            || grep --text -ao "Exception:.*" fuzzer.log \
            || echo "Fuzzer failed ($fuzzer_exit_code). See the logs." ; } \
            | tail -1 > description.txt
    fi
diff --git a/docker/test/stateless/process_functional_tests_result.py b/docker/test/stateless/process_functional_tests_result.py
index e60424ad4d1..a42b0e68d88 100755
--- a/docker/test/stateless/process_functional_tests_result.py
+++ b/docker/test/stateless/process_functional_tests_result.py
@@ -105,6 +105,10 @@ def process_result(result_path):
            description += ", skipped: {}".format(skipped)
        if unknown != 0:
            description += ", unknown: {}".format(unknown)
+
+        # Temporary green for tests with DatabaseReplicated:
+        if 1 == int(os.environ.get('USE_DATABASE_REPLICATED', 0)):
+            state = "success"
    else:
        state = "failure"
        description = "Output log doesn't exist"
diff --git a/docs/en/engines/database-engines/materialized-mysql.md b/docs/en/engines/database-engines/materialized-mysql.md
index ca550776d53..d329dff32c5 100644
--- a/docs/en/engines/database-engines/materialized-mysql.md
+++ b/docs/en/engines/database-engines/materialized-mysql.md
@@ -1,6 +1,6 @@
---
toc_priority: 29
-toc_title: MaterializedMySQL
+toc_title: "[experimental] MaterializedMySQL"
---

# [experimental] MaterializedMySQL {#materialized-mysql}
@@ -27,28 +27,33 @@ ENGINE = MaterializedMySQL('host:port', ['database' | database], 'user', 'passwo
- `password` — User password.

**Engine Settings**
-- `max_rows_in_buffer` — Max rows that data is allowed to cache in memory(for single table and the cache data unable to query). when rows is exceeded, the data will be materialized. Default: `65505`.
-- `max_bytes_in_buffer` — Max bytes that data is allowed to cache in memory(for single table and the cache data unable to query). when rows is exceeded, the data will be materialized. Default: `1048576`.
-- `max_rows_in_buffers` — Max rows that data is allowed to cache in memory(for database and the cache data unable to query). when rows is exceeded, the data will be materialized. Default: `65505`.
-- `max_bytes_in_buffers` — Max bytes that data is allowed to cache in memory(for database and the cache data unable to query). when rows is exceeded, the data will be materialized. Default: `1048576`.
-- `max_flush_data_time` — Max milliseconds that data is allowed to cache in memory(for database and the cache data unable to query). when this time is exceeded, the data will be materialized. Default: `1000`.
-- `max_wait_time_when_mysql_unavailable` — Retry interval when MySQL is not available (milliseconds). Negative value disable retry. Default: `1000`.
-- `allows_query_when_mysql_lost` — Allow query materialized table when mysql is lost. Default: `0` (`false`).
-```
+
+- `max_rows_in_buffer` — Maximum number of rows that data is allowed to cache in memory (for a single table; the cached data cannot be queried). When this number is exceeded, the data will be materialized. Default: `65 505`.
+- `max_bytes_in_buffer` — Maximum number of bytes that data is allowed to cache in memory (for a single table; the cached data cannot be queried). When this number is exceeded, the data will be materialized. Default: `1 048 576`.
+- `max_rows_in_buffers` — Maximum number of rows that data is allowed to cache in memory (for the whole database; the cached data cannot be queried). When this number is exceeded, the data will be materialized. Default: `65 505`.
+- `max_bytes_in_buffers` — Maximum number of bytes that data is allowed to cache in memory (for the whole database; the cached data cannot be queried). When this number is exceeded, the data will be materialized. Default: `1 048 576`.
+- `max_flush_data_time` — Maximum number of milliseconds that data is allowed to cache in memory (for the whole database; the cached data cannot be queried). When this time is exceeded, the data will be materialized. Default: `1000`.
+- `max_wait_time_when_mysql_unavailable` — Retry interval when MySQL is not available (milliseconds). Negative value disables retry. Default: `1000`.
+- `allows_query_when_mysql_lost` — Allows querying a materialized table when MySQL is lost. Default: `0` (`false`).
+
+```sql
CREATE DATABASE mysql ENGINE = MaterializedMySQL('localhost:3306', 'db', 'user', '***')
     SETTINGS
        allows_query_when_mysql_lost=true,
        max_wait_time_when_mysql_unavailable=10000;
```
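Once replication has caught up, the tables of the materialized database can be read like ordinary ClickHouse tables. A minimal usage sketch (the `orders` table inside the `mysql` database created above is hypothetical):

```sql
-- List the tables replicated from the MySQL source.
SHOW TABLES FROM mysql;

-- Reads are served from the ReplacingMergeTree tables that the engine maintains.
SELECT count() FROM mysql.orders;
```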
-**Settings on MySQL-server side**
+**Settings on MySQL-server Side**

-For the correct work of `MaterializedMySQL`, there are few mandatory `MySQL`-side configuration settings that should be set:
+For the correct work of `MaterializedMySQL`, there are a few mandatory `MySQL`-side configuration settings that must be set:

- `default_authentication_plugin = mysql_native_password` since `MaterializedMySQL` can only authorize with this method.
-- `gtid_mode = on` since GTID based logging is a mandatory for providing correct `MaterializedMySQL` replication. Pay attention that while turning this mode `On` you should also specify `enforce_gtid_consistency = on`.
+- `gtid_mode = on` since GTID-based logging is mandatory for providing correct `MaterializedMySQL` replication.

-## Virtual columns {#virtual-columns}
+!!! attention "Attention"
+    While turning on `gtid_mode` you should also specify `enforce_gtid_consistency = on`.
+
+## Virtual Columns {#virtual-columns}

When working with the `MaterializedMySQL` database engine, [ReplacingMergeTree](../../engines/table-engines/mergetree-family/replacingmergetree.md) tables are used with virtual `_sign` and `_version` columns.

@@ -78,13 +83,13 @@ When working with the `MaterializedMySQL` database engine, [ReplacingMergeTree](
| BLOB | [String](../../sql-reference/data-types/string.md) |
| BINARY | [FixedString](../../sql-reference/data-types/fixedstring.md) |

-Other types are not supported. If MySQL table contains a column of such type, ClickHouse throws exception "Unhandled data type" and stops replication.
-
[Nullable](../../sql-reference/data-types/nullable.md) is supported.

+Other types are not supported. If a MySQL table contains a column of such type, ClickHouse throws the exception "Unhandled data type" and stops replication.
+
## Specifics and Recommendations {#specifics-and-recommendations}

-### Compatibility restrictions
+### Compatibility Restrictions {#compatibility-restrictions}

Apart of the data types limitations there are few restrictions comparing to `MySQL` databases, that should be resolved before replication will be possible:

diff --git a/docs/en/engines/table-engines/mergetree-family/mergetree.md b/docs/en/engines/table-engines/mergetree-family/mergetree.md
index 0c900454cd0..1b1313e625c 100644
--- a/docs/en/engines/table-engines/mergetree-family/mergetree.md
+++ b/docs/en/engines/table-engines/mergetree-family/mergetree.md
@@ -79,7 +79,7 @@ For a description of parameters, see the [CREATE query description](../../../sql

- `SAMPLE BY` — An expression for sampling. Optional.

-    If a sampling expression is used, the primary key must contain it. The result of sampling expression must be unsigned integer. Example: `SAMPLE BY intHash32(UserID) ORDER BY (CounterID, EventDate, intHash32(UserID))`.
+    If a sampling expression is used, the primary key must contain it.
The result of a sampling expression must be an unsigned integer. Example: `SAMPLE BY intHash32(UserID) ORDER BY (CounterID, EventDate, intHash32(UserID))`. - `TTL` — A list of rules specifying storage duration of rows and defining logic of automatic parts movement [between disks and volumes](#table_engine-mergetree-multiple-volumes). Optional. diff --git a/docs/en/sql-reference/functions/array-functions.md b/docs/en/sql-reference/functions/array-functions.md index 422bbe4b4ea..e7918c018db 100644 --- a/docs/en/sql-reference/functions/array-functions.md +++ b/docs/en/sql-reference/functions/array-functions.md @@ -7,19 +7,89 @@ toc_title: Arrays ## empty {#function-empty} -Returns 1 for an empty array, or 0 for a non-empty array. -The result type is UInt8. -The function also works for strings. +Checks whether the input array is empty. -Can be optimized by enabling the [optimize_functions_to_subcolumns](../../operations/settings/settings.md#optimize-functions-to-subcolumns) setting. With `optimize_functions_to_subcolumns = 1` the function reads only [size0](../../sql-reference/data-types/array.md#array-size) subcolumn instead of reading and processing the whole array column. The query `SELECT empty(arr) FROM table` transforms to `SELECT arr.size0 = 0 FROM TABLE`. +**Syntax** + +``` sql +empty([x]) +``` + +An array is considered empty if it does not contain any elements. + +!!! note "Note" + Can be optimized by enabling the [optimize_functions_to_subcolumns](../../operations/settings/settings.md#optimize-functions-to-subcolumns) setting. With `optimize_functions_to_subcolumns = 1` the function reads only [size0](../../sql-reference/data-types/array.md#array-size) subcolumn instead of reading and processing the whole array column. The query `SELECT empty(arr) FROM TABLE;` transforms to `SELECT arr.size0 = 0 FROM TABLE;`. + +The function also works for [strings](string-functions.md#empty) or [UUID](uuid-functions.md#empty). + +**Arguments** + +- `[x]` — Input array. [Array](../data-types/array.md). + +**Returned value** + +- Returns `1` for an empty array or `0` for a non-empty array. + +Type: [UInt8](../data-types/int-uint.md). + +**Example** + +Query: + +```sql +SELECT empty([]); +``` + +Result: + +```text +┌─empty(array())─┐ +│ 1 │ +└────────────────┘ +``` ## notEmpty {#function-notempty} -Returns 0 for an empty array, or 1 for a non-empty array. -The result type is UInt8. -The function also works for strings. +Checks whether the input array is non-empty. -Can be optimized by enabling the [optimize_functions_to_subcolumns](../../operations/settings/settings.md#optimize-functions-to-subcolumns) setting. With `optimize_functions_to_subcolumns = 1` the function reads only [size0](../../sql-reference/data-types/array.md#array-size) subcolumn instead of reading and processing the whole array column. The query `SELECT notEmpty(arr) FROM table` transforms to `SELECT arr.size0 != 0 FROM TABLE`. +**Syntax** + +``` sql +notEmpty([x]) +``` + +An array is considered non-empty if it contains at least one element. + +!!! note "Note" + Can be optimized by enabling the [optimize_functions_to_subcolumns](../../operations/settings/settings.md#optimize-functions-to-subcolumns) setting. With `optimize_functions_to_subcolumns = 1` the function reads only [size0](../../sql-reference/data-types/array.md#array-size) subcolumn instead of reading and processing the whole array column. The query `SELECT notEmpty(arr) FROM table` transforms to `SELECT arr.size0 != 0 FROM TABLE`. 
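As an illustrative sketch of the note above (the table and column names are hypothetical), the setting turns the function call into a cheap read of the `size0` subcolumn:

```sql
CREATE TABLE t_arr (id UInt64, arr Array(UInt32)) ENGINE = MergeTree ORDER BY id;

SET optimize_functions_to_subcolumns = 1;

-- With the setting enabled, only the arr.size0 subcolumn is read,
-- not the whole array column.
SELECT notEmpty(arr) FROM t_arr;
```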
+
+The function also works for [strings](string-functions.md#notempty) or [UUID](uuid-functions.md#notempty).
+
+**Arguments**
+
+- `[x]` — Input array. [Array](../data-types/array.md).
+
+**Returned value**
+
+- Returns `1` for a non-empty array or `0` for an empty array.
+
+Type: [UInt8](../data-types/int-uint.md).
+
+**Example**
+
+Query:
+
+```sql
+SELECT notEmpty([1,2]);
+```
+
+Result:
+
+```text
+┌─notEmpty([1, 2])─┐
+│                1 │
+└──────────────────┘
+```

## length {#array_functions-length}

diff --git a/docs/en/sql-reference/functions/string-functions.md b/docs/en/sql-reference/functions/string-functions.md
index 77bb14c4f5f..c7c84c5aca6 100644
--- a/docs/en/sql-reference/functions/string-functions.md
+++ b/docs/en/sql-reference/functions/string-functions.md
@@ -10,17 +10,83 @@ toc_title: Strings

## empty {#empty}

-Returns 1 for an empty string or 0 for a non-empty string.
-The result type is UInt8.
+Checks whether the input string is empty.
+
+**Syntax**
+
+``` sql
+empty(x)
+```
+
A string is considered non-empty if it contains at least one byte, even if this is a space or a null byte.
-The function also works for arrays or UUID.
-UUID is empty if it is all zeros (nil UUID).
+
+The function also works for [arrays](array-functions.md#function-empty) or [UUID](uuid-functions.md#empty).
+
+**Arguments**
+
+- `x` — Input value. [String](../data-types/string.md).
+
+**Returned value**
+
+- Returns `1` for an empty string or `0` for a non-empty string.
+
+Type: [UInt8](../data-types/int-uint.md).
+
+**Example**
+
+Query:
+
+```sql
+SELECT empty('');
+```
+
+Result:
+
+```text
+┌─empty('')─┐
+│         1 │
+└───────────┘
+```

## notEmpty {#notempty}

-Returns 0 for an empty string or 1 for a non-empty string.
-The result type is UInt8.
-The function also works for arrays or UUID.
+Checks whether the input string is non-empty.
+
+**Syntax**
+
+``` sql
+notEmpty(x)
+```
+
+A string is considered non-empty if it contains at least one byte, even if this is a space or a null byte.
+
+The function also works for [arrays](array-functions.md#function-notempty) or [UUID](uuid-functions.md#notempty).
+
+**Arguments**
+
+- `x` — Input value. [String](../data-types/string.md).
+
+**Returned value**
+
+- Returns `1` for a non-empty string or `0` for an empty string.
+
+Type: [UInt8](../data-types/int-uint.md).
+
+**Example**
+
+Query:
+
+```sql
+SELECT notEmpty('text');
+```
+
+Result:
+
+```text
+┌─notEmpty('text')─┐
+│                1 │
+└──────────────────┘
+```

## length {#length}

diff --git a/docs/en/sql-reference/functions/uuid-functions.md b/docs/en/sql-reference/functions/uuid-functions.md
index e7e55c699cd..e5ab45bda40 100644
--- a/docs/en/sql-reference/functions/uuid-functions.md
+++ b/docs/en/sql-reference/functions/uuid-functions.md
@@ -9,7 +9,7 @@ The functions for working with UUID are listed below.

## generateUUIDv4 {#uuid-function-generate}

-Generates the [UUID](../../sql-reference/data-types/uuid.md) of [version 4](https://tools.ietf.org/html/rfc4122#section-4.4).
+Generates the [UUID](../data-types/uuid.md) of [version 4](https://tools.ietf.org/html/rfc4122#section-4.4).

``` sql
generateUUIDv4()
@@ -37,6 +37,90 @@ SELECT * FROM t_uuid
└──────────────────────────────────────┘
```

+## empty {#empty}
+
+Checks whether the input UUID is empty.
+
+**Syntax**
+
+```sql
+empty(UUID)
+```
+
+The UUID is considered empty if it contains all zeros (zero UUID).
+
+The function also works for [arrays](array-functions.md#function-empty) or [strings](string-functions.md#empty).
+
+**Arguments**
+
+- `x` — Input UUID.
[UUID](../data-types/uuid.md). + +**Returned value** + +- Returns `1` for an empty UUID or `0` for a non-empty UUID. + +Type: [UInt8](../data-types/int-uint.md). + +**Example** + +To generate the UUID value, ClickHouse provides the [generateUUIDv4](#uuid-function-generate) function. + +Query: + +```sql +SELECT empty(generateUUIDv4()); +``` + +Result: + +```text +┌─empty(generateUUIDv4())─┐ +│ 0 │ +└─────────────────────────┘ +``` + +## notEmpty {#notempty} + +Checks whether the input UUID is non-empty. + +**Syntax** + +```sql +notEmpty(UUID) +``` + +The UUID is considered empty if it contains all zeros (zero UUID). + +The function also works for [arrays](array-functions.md#function-notempty) or [strings](string-functions.md#notempty). + +**Arguments** + +- `x` — Input UUID. [UUID](../data-types/uuid.md). + +**Returned value** + +- Returns `1` for a non-empty UUID or `0` for an empty UUID. + +Type: [UInt8](../data-types/int-uint.md). + +**Example** + +To generate the UUID value, ClickHouse provides the [generateUUIDv4](#uuid-function-generate) function. + +Query: + +```sql +SELECT notEmpty(generateUUIDv4()); +``` + +Result: + +```text +┌─notEmpty(generateUUIDv4())─┐ +│ 1 │ +└────────────────────────────┘ +``` + ## toUUID (x) {#touuid-x} Converts String type value to UUID type. diff --git a/docs/en/sql-reference/statements/select/distinct.md b/docs/en/sql-reference/statements/select/distinct.md index 87154cba05a..390afa46248 100644 --- a/docs/en/sql-reference/statements/select/distinct.md +++ b/docs/en/sql-reference/statements/select/distinct.md @@ -6,23 +6,55 @@ toc_title: DISTINCT If `SELECT DISTINCT` is specified, only unique rows will remain in a query result. Thus only a single row will remain out of all the sets of fully matching rows in the result. -## Null Processing {#null-processing} +You can specify the list of columns that must have unique values: `SELECT DISTINCT ON (column1, column2,...)`. If the columns are not specified, all of them are taken into consideration. -`DISTINCT` works with [NULL](../../../sql-reference/syntax.md#null-literal) as if `NULL` were a specific value, and `NULL==NULL`. In other words, in the `DISTINCT` results, different combinations with `NULL` occur only once. It differs from `NULL` processing in most other contexts. +Consider the table: -## Alternatives {#alternatives} +```text +┌─a─┬─b─┬─c─┐ +│ 1 │ 1 │ 1 │ +│ 1 │ 1 │ 1 │ +│ 2 │ 2 │ 2 │ +│ 2 │ 2 │ 2 │ +│ 1 │ 1 │ 2 │ +│ 1 │ 2 │ 2 │ +└───┴───┴───┘ +``` -It is possible to obtain the same result by applying [GROUP BY](../../../sql-reference/statements/select/group-by.md) across the same set of values as specified as `SELECT` clause, without using any aggregate functions. But there are few differences from `GROUP BY` approach: +Using `DISTINCT` without specifying columns: -- `DISTINCT` can be applied together with `GROUP BY`. -- When [ORDER BY](../../../sql-reference/statements/select/order-by.md) is omitted and [LIMIT](../../../sql-reference/statements/select/limit.md) is defined, the query stops running immediately after the required number of different rows has been read. -- Data blocks are output as they are processed, without waiting for the entire query to finish running. 
+```sql
+SELECT DISTINCT * FROM t1;
+```

-## Examples {#examples}

+```text
+┌─a─┬─b─┬─c─┐
+│ 1 │ 1 │ 1 │
+│ 2 │ 2 │ 2 │
+│ 1 │ 1 │ 2 │
+│ 1 │ 2 │ 2 │
+└───┴───┴───┘
+```
+
+Using `DISTINCT` with specified columns:
+
+```sql
+SELECT DISTINCT ON (a,b) * FROM t1;
+```
+
+```text
+┌─a─┬─b─┬─c─┐
+│ 1 │ 1 │ 1 │
+│ 2 │ 2 │ 2 │
+│ 1 │ 2 │ 2 │
+└───┴───┴───┘
+```
+
+## DISTINCT and ORDER BY {#distinct-orderby}

ClickHouse supports using the `DISTINCT` and `ORDER BY` clauses for different columns in one query. The `DISTINCT` clause is executed before the `ORDER BY` clause.

-Example table:
+Consider the table:

``` text
┌─a─┬─b─┐
@@ -33,7 +65,11 @@ Example table:
└───┴───┘
```

-When selecting data with the `SELECT DISTINCT a FROM t1 ORDER BY b ASC` query, we get the following result:
+Selecting data:
+
+```sql
+SELECT DISTINCT a FROM t1 ORDER BY b ASC;
+```

``` text
┌─a─┐
@@ -42,8 +78,11 @@ When selecting data with the `SELECT DISTINCT a FROM t1 ORDER BY b ASC` query, w
│ 3 │
└───┘
```
+Selecting data with a different sorting direction:

-If we change the sorting direction `SELECT DISTINCT a FROM t1 ORDER BY b DESC`, we get the following result:
+```sql
+SELECT DISTINCT a FROM t1 ORDER BY b DESC;
+```

``` text
┌─a─┐
@@ -56,3 +95,15 @@ If we change the sorting direction `SELECT DISTINCT a FROM t1 ORDER BY b DESC`, 

Row `2, 4` was cut before sorting.

Take this implementation specificity into account when programming queries.
+
+## Null Processing {#null-processing}
+
+`DISTINCT` works with [NULL](../../../sql-reference/syntax.md#null-literal) as if `NULL` were a specific value, and `NULL==NULL`. In other words, in the `DISTINCT` results, different combinations with `NULL` occur only once. It differs from `NULL` processing in most other contexts.
+
+## Alternatives {#alternatives}
+
+It is possible to obtain the same result by applying [GROUP BY](../../../sql-reference/statements/select/group-by.md) across the same set of values as specified as `SELECT` clause, without using any aggregate functions. But there are few differences from `GROUP BY` approach:
+
+- `DISTINCT` can be applied together with `GROUP BY`.
+- When [ORDER BY](../../../sql-reference/statements/select/order-by.md) is omitted and [LIMIT](../../../sql-reference/statements/select/limit.md) is defined, the query stops running immediately after the required number of different rows has been read.
+- Data blocks are output as they are processed, without waiting for the entire query to finish running.
diff --git a/docs/en/sql-reference/statements/select/index.md b/docs/en/sql-reference/statements/select/index.md
index 04273ca1d4d..4e96bae8493 100644
--- a/docs/en/sql-reference/statements/select/index.md
+++ b/docs/en/sql-reference/statements/select/index.md
@@ -13,7 +13,7 @@ toc_title: Overview

``` sql
[WITH expr_list|(subquery)]
-SELECT [DISTINCT] expr_list
+SELECT [DISTINCT [ON (column1, column2, ...)]] expr_list
[FROM [db.]table | (subquery) | table_function] [FINAL]
[SAMPLE sample_coeff]
[ARRAY JOIN ...]
@@ -36,6 +36,8 @@ All clauses are optional, except for the required list of expressions immediatel Specifics of each optional clause are covered in separate sections, which are listed in the same order as they are executed: - [WITH clause](../../../sql-reference/statements/select/with.md) +- [SELECT clause](#select-clause) +- [DISTINCT clause](../../../sql-reference/statements/select/distinct.md) - [FROM clause](../../../sql-reference/statements/select/from.md) - [SAMPLE clause](../../../sql-reference/statements/select/sample.md) - [JOIN clause](../../../sql-reference/statements/select/join.md) @@ -44,8 +46,6 @@ Specifics of each optional clause are covered in separate sections, which are li - [GROUP BY clause](../../../sql-reference/statements/select/group-by.md) - [LIMIT BY clause](../../../sql-reference/statements/select/limit-by.md) - [HAVING clause](../../../sql-reference/statements/select/having.md) -- [SELECT clause](#select-clause) -- [DISTINCT clause](../../../sql-reference/statements/select/distinct.md) - [LIMIT clause](../../../sql-reference/statements/select/limit.md) - [OFFSET clause](../../../sql-reference/statements/select/offset.md) - [UNION clause](../../../sql-reference/statements/select/union.md) diff --git a/docs/ru/engines/database-engines/materialized-mysql.md b/docs/ru/engines/database-engines/materialized-mysql.md index 0175e794cd5..1cd864c01e9 100644 --- a/docs/ru/engines/database-engines/materialized-mysql.md +++ b/docs/ru/engines/database-engines/materialized-mysql.md @@ -1,10 +1,12 @@ --- toc_priority: 29 -toc_title: MaterializedMySQL +toc_title: "[experimental] MaterializedMySQL" --- # [экспериментальный] MaterializedMySQL {#materialized-mysql} +**Это экспериментальный движок, который не следует использовать в продакшене.** + Создает базу данных ClickHouse со всеми таблицами, существующими в MySQL, и всеми данными в этих таблицах. Сервер ClickHouse работает как реплика MySQL. Он читает файл binlog и выполняет DDL and DML-запросы. @@ -23,6 +25,32 @@ ENGINE = MaterializedMySQL('host:port', ['database' | database], 'user', 'passwo - `user` — пользователь MySQL. - `password` — пароль пользователя. +**Настройки движка** + +- `max_rows_in_buffer` — максимальное количество строк, содержимое которых может кешироваться в памяти (для одной таблицы и данных кеша, которые невозможно запросить). При превышении количества строк, данные будут материализованы. Значение по умолчанию: `65 505`. +- `max_bytes_in_buffer` — максимальное количество байтов, которое разрешено кешировать в памяти (для одной таблицы и данных кеша, которые невозможно запросить). При превышении количества строк, данные будут материализованы. Значение по умолчанию: `1 048 576`. +- `max_rows_in_buffers` — максимальное количество строк, содержимое которых может кешироваться в памяти (для базы данных и данных кеша, которые невозможно запросить). При превышении количества строк, данные будут материализованы. Значение по умолчанию: `65 505`. +- `max_bytes_in_buffers` — максимальное количество байтов, которое разрешено кешировать данным в памяти (для базы данных и данных кеша, которые невозможно запросить). При превышении количества строк, данные будут материализованы. Значение по умолчанию: `1 048 576`. +- `max_flush_data_time` — максимальное время в миллисекундах, в течение которого разрешено кешировать данные в памяти (для базы данных и данных кеша, которые невозможно запросить). При превышении количества указанного периода, данные будут материализованы. Значение по умолчанию: `1000`. 
+- `max_wait_time_when_mysql_unavailable` — интервал между повторными попытками, если MySQL недоступен. Указывается в миллисекундах. Отрицательное значение отключает повторные попытки. Значение по умолчанию: `1000`. +- `allows_query_when_mysql_lost` — признак, разрешен ли запрос к материализованной таблице при потере соединения с MySQL. Значение по умолчанию: `0` (`false`). + +```sql +CREATE DATABASE mysql ENGINE = MaterializedMySQL('localhost:3306', 'db', 'user', '***') + SETTINGS + allows_query_when_mysql_lost=true, + max_wait_time_when_mysql_unavailable=10000; +``` + +**Настройки на стороне MySQL-сервера** + +Для правильной работы `MaterializedMySQL` следует обязательно указать на сервере MySQL следующие параметры конфигурации: +- `default_authentication_plugin = mysql_native_password` — `MaterializedMySQL` может авторизоваться только с помощью этого метода. +- `gtid_mode = on` — ведение журнала на основе GTID является обязательным для обеспечения правильной репликации. + +!!! attention "Внимание" + При включении `gtid_mode` вы также должны указать `enforce_gtid_consistency = on`. + ## Виртуальные столбцы {#virtual-columns} При работе с движком баз данных `MaterializedMySQL` используются таблицы семейства [ReplacingMergeTree](../../engines/table-engines/mergetree-family/replacingmergetree.md) с виртуальными столбцами `_sign` и `_version`. @@ -51,13 +79,21 @@ ENGINE = MaterializedMySQL('host:port', ['database' | database], 'user', 'passwo | STRING | [String](../../sql-reference/data-types/string.md) | | VARCHAR, VAR_STRING | [String](../../sql-reference/data-types/string.md) | | BLOB | [String](../../sql-reference/data-types/string.md) | - -Другие типы не поддерживаются. Если таблица MySQL содержит столбец другого типа, ClickHouse выдаст исключение "Неподдерживаемый тип данных" ("Unhandled data type") и остановит репликацию. +| BINARY | [FixedString](../../sql-reference/data-types/fixedstring.md) | Тип [Nullable](../../sql-reference/data-types/nullable.md) поддерживается. +Другие типы не поддерживаются. Если таблица MySQL содержит столбец другого типа, ClickHouse выдаст исключение "Неподдерживаемый тип данных" ("Unhandled data type") и остановит репликацию. + ## Особенности и рекомендации {#specifics-and-recommendations} +### Ограничения совместимости {#compatibility-restrictions} + +Кроме ограничений на типы данных, существует несколько ограничений по сравнению с базами данных MySQL, которые следует решить до того, как станет возможной репликация: + +- Каждая таблица в MySQL должна содержать `PRIMARY KEY`. +- Репликация для таблиц, содержащих строки со значениями полей `ENUM` вне диапазона значений (определяется размерностью `ENUM`), не будет работать. + ### DDL-запросы {#ddl-queries} DDL-запросы в MySQL конвертируются в соответствующие DDL-запросы в ClickHouse ([ALTER](../../sql-reference/statements/alter/index.md), [CREATE](../../sql-reference/statements/create/index.md), [DROP](../../sql-reference/statements/drop.md), [RENAME](../../sql-reference/statements/rename.md)). Если ClickHouse не может конвертировать какой-либо DDL-запрос, он его игнорирует. 
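As an illustrative sketch of the DDL conversion described above (using the `mysql.test` table that appears in the example below; the added column is hypothetical), a supported MySQL DDL statement is replayed as the corresponding ClickHouse DDL:

```sql
-- Executed on the MySQL side:
--   ALTER TABLE test ADD COLUMN v FLOAT;
-- The engine converts and applies it on the ClickHouse side roughly as
-- (FLOAT maps to Float32 per the type table above):
ALTER TABLE mysql.test ADD COLUMN v Float32;
```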
@@ -158,3 +194,4 @@ SELECT * FROM mysql.test;
 └───┴─────┴──────┘
 ```
 
+[Оригинальная статья](https://clickhouse.tech/docs/ru/engines/database-engines/materialized-mysql/)
diff --git a/docs/ru/engines/table-engines/mergetree-family/mergetree.md b/docs/ru/engines/table-engines/mergetree-family/mergetree.md
index 4bced6254d1..61ed34b686c 100644
--- a/docs/ru/engines/table-engines/mergetree-family/mergetree.md
+++ b/docs/ru/engines/table-engines/mergetree-family/mergetree.md
@@ -68,7 +68,7 @@ ORDER BY expr
 - `SAMPLE BY` — выражение для сэмплирования. Необязательный параметр.
 
-    Если используется выражение для сэмплирования, то первичный ключ должен содержать его. Пример: `SAMPLE BY intHash32(UserID) ORDER BY (CounterID, EventDate, intHash32(UserID))`.
+    Если используется выражение для сэмплирования, то первичный ключ должен содержать его. Результат выражения для сэмплирования должен быть беззнаковым целым числом. Пример: `SAMPLE BY intHash32(UserID) ORDER BY (CounterID, EventDate, intHash32(UserID))`.
 
 - `TTL` — список правил, определяющих длительности хранения строк, а также задающих правила перемещения частей на определённые тома или диски. Необязательный параметр.
@@ -375,6 +375,24 @@ INDEX b (u64 * length(str), i32 + f64 * 100, date, str) TYPE set(100) GRANULARIT
 - `s != 1`
 - `NOT startsWith(s, 'test')`
 
+### Проекции {#projections}
+Проекции похожи на материализованные представления, но определяются на уровне партов. Это обеспечивает гарантии согласованности наряду с автоматическим использованием в запросах.
+
+#### Запрос {#projection-query}
+Запрос проекции — это то, что определяет проекцию. Он имеет следующую грамматику:
+
+`SELECT [GROUP BY] [ORDER BY]`
+
+Он неявно выбирает данные из родительской таблицы.
+
+#### Хранение {#projection-storage}
+Проекции хранятся в каталоге парта. Это похоже на хранение индексов, но используется подкаталог, в котором хранится анонимный парт таблицы MergeTree. Таблица создается запросом определения проекции. Если есть конструкция GROUP BY, то базовый механизм хранения становится AggregatingMergeTree, а все агрегатные функции преобразуются в AggregateFunction. Если есть конструкция ORDER BY, таблица MergeTree будет использовать это выражение в качестве выражения первичного ключа. Во время процесса слияния парт проекции будет слит с помощью процедуры слияния ее хранилища. Контрольная сумма парта родительской таблицы будет включать парт проекции. Другие процедуры аналогичны индексам пропуска данных.
+
+#### Анализ запросов {#projection-query-analysis}
+1. Проверить, можно ли использовать проекцию в данном запросе, то есть даёт ли она тот же результат, что и запрос к базовой таблице.
+2. Выбрать наиболее подходящее совпадение, содержащее наименьшее количество гранул для чтения.
+3. План запроса, который использует проекции, будет отличаться от того, который использует исходные парты. При отсутствии проекции в некоторых партах можно расширить план, чтобы «проецировать» на лету.
+
 ## Конкурентный доступ к данным {#concurrent-data-access}
 
 Для конкурентного доступа к таблице используется мультиверсионность. То есть, при одновременном чтении и обновлении таблицы, данные будут читаться из набора кусочков, актуального на момент запроса. Длинных блокировок нет. Вставки никак не мешают чтениям.
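To make the projection description above concrete, here is a sketch of defining and materializing one; the table and projection names are illustrative, not part of the patch, and reading through a projection may additionally require the experimental setting `allow_experimental_projection_optimization = 1` in this release:

```sql
-- Illustrative table; the projection query follows the `SELECT [GROUP BY] [ORDER BY]` grammar.
CREATE TABLE visits
(
    user_id UInt64,
    event_date Date,
    duration UInt32
)
ENGINE = MergeTree
ORDER BY (event_date, user_id);

-- With GROUP BY, the projection part is stored as an aggregating sub-table
-- inside each part of the parent table.
ALTER TABLE visits ADD PROJECTION user_totals
(
    SELECT user_id, sum(duration) GROUP BY user_id
);

-- Rewrite already existing parts so query analysis can pick the projection.
ALTER TABLE visits MATERIALIZE PROJECTION user_totals;
```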
diff --git a/docs/ru/sql-reference/aggregate-functions/parametric-functions.md b/docs/ru/sql-reference/aggregate-functions/parametric-functions.md index b1eefc3fc16..b3bb611e28c 100644 --- a/docs/ru/sql-reference/aggregate-functions/parametric-functions.md +++ b/docs/ru/sql-reference/aggregate-functions/parametric-functions.md @@ -172,7 +172,7 @@ SELECT sequenceMatch('(?1)(?2)')(time, number = 1, number = 2, number = 4) FROM ## sequenceCount(pattern)(time, cond1, cond2, …) {#function-sequencecount} -Вычисляет количество цепочек событий, соответствующих шаблону. Функция обнаруживает только непересекающиеся цепочки событий. Она начитает искать следующую цепочку только после того, как полностью совпала текущая цепочка событий. +Вычисляет количество цепочек событий, соответствующих шаблону. Функция обнаруживает только непересекающиеся цепочки событий. Она начинает искать следующую цепочку только после того, как полностью совпала текущая цепочка событий. !!! warning "Предупреждение" События, произошедшие в одну и ту же секунду, располагаются в последовательности в неопределенном порядке, что может повлиять на результат работы функции. diff --git a/docs/ru/sql-reference/functions/array-functions.md b/docs/ru/sql-reference/functions/array-functions.md index 52fd63864ce..b7a301d30a9 100644 --- a/docs/ru/sql-reference/functions/array-functions.md +++ b/docs/ru/sql-reference/functions/array-functions.md @@ -7,19 +7,89 @@ toc_title: "Массивы" ## empty {#function-empty} -Возвращает 1 для пустого массива, и 0 для непустого массива. -Тип результата - UInt8. -Функция также работает для строк. +Проверяет, является ли входной массив пустым. -Функцию можно оптимизировать, если включить настройку [optimize_functions_to_subcolumns](../../operations/settings/settings.md#optimize-functions-to-subcolumns). При `optimize_functions_to_subcolumns = 1` функция читает только подстолбец [size0](../../sql-reference/data-types/array.md#array-size) вместо чтения и обработки всего столбца массива. Запрос `SELECT empty(arr) FROM table` преобразуется к запросу `SELECT arr.size0 = 0 FROM TABLE`. +**Синтаксис** + +``` sql +empty([x]) +``` + +Массив считается пустым, если он не содержит ни одного элемента. + +!!! note "Примечание" + Функцию можно оптимизировать, если включить настройку [optimize_functions_to_subcolumns](../../operations/settings/settings.md#optimize-functions-to-subcolumns). При `optimize_functions_to_subcolumns = 1` функция читает только подстолбец [size0](../../sql-reference/data-types/array.md#array-size) вместо чтения и обработки всего столбца массива. Запрос `SELECT empty(arr) FROM TABLE` преобразуется к запросу `SELECT arr.size0 = 0 FROM TABLE`. + +Функция также поддерживает работу с типами [String](string-functions.md#empty) и [UUID](uuid-functions.md#empty). + +**Параметры** + +- `[x]` — массив на входе функции. [Array](../data-types/array.md). + +**Возвращаемое значение** + +- Возвращает `1` для пустого массива или `0` — для непустого массива. + +Тип: [UInt8](../data-types/int-uint.md). + +**Пример** + +Запрос: + +```sql +SELECT empty([]); +``` + +Ответ: + +```text +┌─empty(array())─┐ +│ 1 │ +└────────────────┘ +``` ## notEmpty {#function-notempty} -Возвращает 0 для пустого массива, и 1 для непустого массива. -Тип результата - UInt8. -Функция также работает для строк. +Проверяет, является ли входной массив непустым. -Функцию можно оптимизировать, если включить настройку [optimize_functions_to_subcolumns](../../operations/settings/settings.md#optimize-functions-to-subcolumns). 
При `optimize_functions_to_subcolumns = 1` функция читает только подстолбец [size0](../../sql-reference/data-types/array.md#array-size) вместо чтения и обработки всего столбца массива. Запрос `SELECT notEmpty(arr) FROM table` преобразуется к запросу `SELECT arr.size0 != 0 FROM TABLE`.
+**Синтаксис**
+
+``` sql
+notEmpty([x])
+```
+
+Массив считается непустым, если он содержит хотя бы один элемент.
+
+!!! note "Примечание"
+    Функцию можно оптимизировать, если включить настройку [optimize_functions_to_subcolumns](../../operations/settings/settings.md#optimize-functions-to-subcolumns). При `optimize_functions_to_subcolumns = 1` функция читает только подстолбец [size0](../../sql-reference/data-types/array.md#array-size) вместо чтения и обработки всего столбца массива. Запрос `SELECT notEmpty(arr) FROM table` преобразуется к запросу `SELECT arr.size0 != 0 FROM TABLE`.
+
+Функция также поддерживает работу с типами [String](string-functions.md#notempty) и [UUID](uuid-functions.md#notempty).
+
+**Параметры**
+
+- `[x]` — массив на входе функции. [Array](../data-types/array.md).
+
+**Возвращаемое значение**
+
+- Возвращает `1` для непустого массива или `0` — для пустого массива.
+
+Тип: [UInt8](../data-types/int-uint.md).
+
+**Пример**
+
+Запрос:
+
+```sql
+SELECT notEmpty([1,2]);
+```
+
+Результат:
+
+```text
+┌─notEmpty([1, 2])─┐
+│                1 │
+└──────────────────┘
+```
 
 ## length {#array_functions-length}
 
diff --git a/docs/ru/sql-reference/functions/string-functions.md b/docs/ru/sql-reference/functions/string-functions.md
index dfe57b7c870..cbda5188881 100644
--- a/docs/ru/sql-reference/functions/string-functions.md
+++ b/docs/ru/sql-reference/functions/string-functions.md
@@ -7,16 +7,83 @@ toc_title: "Функции для работы со строками"
 
 ## empty {#empty}
 
-Возвращает 1 для пустой строки, и 0 для непустой строки.
-Тип результата — UInt8.
-Строка считается непустой, если содержит хотя бы один байт, пусть даже это пробел или нулевой байт.
-Функция также работает для массивов.
+Проверяет, является ли входная строка пустой.
+
+**Синтаксис**
+
+``` sql
+empty(x)
+```
+
+Строка считается непустой, если содержит хотя бы один байт, пусть даже это пробел или нулевой байт.
+
+Функция также поддерживает работу с типами [Array](array-functions.md#function-empty) и [UUID](uuid-functions.md#empty).
+
+**Параметры**
+
+- `x` — входная строка. [String](../data-types/string.md).
+
+**Возвращаемое значение**
+
+- Возвращает `1` для пустой строки и `0` — для непустой строки.
+
+Тип: [UInt8](../data-types/int-uint.md).
+
+**Пример**
+
+Запрос:
+
+```sql
+SELECT empty('');
+```
+
+Результат:
+
+```text
+┌─empty('')─┐
+│         1 │
+└───────────┘
+```
 
 ## notEmpty {#notempty}
 
-Возвращает 0 для пустой строки, и 1 для непустой строки.
-Тип результата — UInt8.
-Функция также работает для массивов.
+Проверяет, является ли входная строка непустой.
+
+**Синтаксис**
+
+``` sql
+notEmpty(x)
+```
+
+Строка считается непустой, если содержит хотя бы один байт, пусть даже это пробел или нулевой байт.
+
+Функция также поддерживает работу с типами [Array](array-functions.md#function-notempty) и [UUID](uuid-functions.md#notempty).
+
+**Параметры**
+
+- `x` — входная строка. [String](../data-types/string.md).
+
+**Возвращаемое значение**
+
+- Возвращает `1` для непустой строки и `0` — для пустой строки.
+
+Тип: [UInt8](../data-types/int-uint.md).
+
+**Пример**
+
+Запрос:
+
+```sql
+SELECT notEmpty('text');
+```
+
+Результат:
+
+```text
+┌─notEmpty('text')─┐
+│                1 │
+└──────────────────┘
+```
 
 ## length {#length}
 
diff --git a/docs/ru/sql-reference/functions/uuid-functions.md b/docs/ru/sql-reference/functions/uuid-functions.md
index f0017adbc8b..0d534a2d7ce 100644
--- a/docs/ru/sql-reference/functions/uuid-functions.md
+++ b/docs/ru/sql-reference/functions/uuid-functions.md
@@ -35,6 +35,90 @@ SELECT * FROM t_uuid
 └──────────────────────────────────────┘
 ```
 
+## empty {#empty}
+
+Проверяет, является ли входной UUID пустым.
+
+**Синтаксис**
+
+```sql
+empty(UUID)
+```
+
+UUID считается пустым, если он содержит все нули (нулевой UUID).
+
+Функция также поддерживает работу с типами [Array](array-functions.md#function-empty) и [String](string-functions.md#empty).
+
+**Параметры**
+
+- `x` — UUID на входе функции. [UUID](../data-types/uuid.md).
+
+**Возвращаемое значение**
+
+- Возвращает `1` для пустого UUID или `0` — для непустого UUID.
+
+Тип: [UInt8](../data-types/int-uint.md).
+
+**Пример**
+
+Для генерации UUID-значений предназначена функция [generateUUIDv4](#uuid-function-generate).
+
+Запрос:
+
+```sql
+SELECT empty(generateUUIDv4());
+```
+
+Ответ:
+
+```text
+┌─empty(generateUUIDv4())─┐
+│                       0 │
+└─────────────────────────┘
+```
+
+## notEmpty {#notempty}
+
+Проверяет, является ли входной UUID непустым.
+
+**Синтаксис**
+
+```sql
+notEmpty(UUID)
+```
+
+UUID считается пустым, если он содержит все нули (нулевой UUID).
+
+Функция также поддерживает работу с типами [Array](array-functions.md#function-notempty) и [String](string-functions.md#notempty).
+
+**Параметры**
+
+- `x` — UUID на входе функции. [UUID](../data-types/uuid.md).
+
+**Возвращаемое значение**
+
+- Возвращает `1` для непустого UUID или `0` — для пустого UUID.
+
+Тип: [UInt8](../data-types/int-uint.md).
+
+**Пример**
+
+Для генерации UUID-значений предназначена функция [generateUUIDv4](#uuid-function-generate).
+
+Запрос:
+
+```sql
+SELECT notEmpty(generateUUIDv4());
+```
+
+Результат:
+
+```text
+┌─notEmpty(generateUUIDv4())─┐
+│                          1 │
+└────────────────────────────┘
+```
+
 ## toUUID (x) {#touuid-x}
 
 Преобразует значение типа String в тип UUID.
diff --git a/docs/ru/sql-reference/statements/select/distinct.md b/docs/ru/sql-reference/statements/select/distinct.md
index f57c2a42593..42c1df64540 100644
--- a/docs/ru/sql-reference/statements/select/distinct.md
+++ b/docs/ru/sql-reference/statements/select/distinct.md
@@ -6,19 +6,51 @@ toc_title: DISTINCT
 
 Если указан `SELECT DISTINCT`, то в результате запроса останутся только уникальные строки. Таким образом, из всех наборов полностью совпадающих строк в результате останется только одна строка.
 
-## Обработка NULL {#null-processing}
+Вы можете указать столбцы, по которым хотите отбирать уникальные значения: `SELECT DISTINCT ON (column1, column2,...)`. Если столбцы не указаны, то отбираются строки, в которых значения уникальны во всех столбцах.
 
-`DISTINCT` работает с [NULL](../../syntax.md#null-literal) как-будто `NULL` — обычное значение и `NULL==NULL`. Другими словами, в результате `DISTINCT`, различные комбинации с `NULL` встретятся только один раз. Это отличается от обработки `NULL` в большинстве других контекстов.
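The NULL-handling rule above (re-added further down in this hunk) is easy to demonstrate; an illustrative query, not taken from the documentation itself:

```sql
-- DISTINCT treats NULL values as equal, so duplicate NULLs collapse into one row.
SELECT DISTINCT x
FROM (SELECT arrayJoin([1, NULL, NULL, 2]) AS x);
-- Returns three rows: 1, 2 and a single NULL (order not guaranteed).
```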
+Рассмотрим таблицу:
-## Альтернативы {#alternatives}
+```text
+┌─a─┬─b─┬─c─┐
+│ 1 │ 1 │ 1 │
+│ 1 │ 1 │ 1 │
+│ 2 │ 2 │ 2 │
+│ 2 │ 2 │ 2 │
+│ 1 │ 1 │ 2 │
+│ 1 │ 2 │ 2 │
+└───┴───┴───┘
+```
-Такой же результат можно получить, применив секцию [GROUP BY](group-by.md) для того же набора значений, которые указан в секции `SELECT`, без использования каких-либо агрегатных функций. Но есть от `GROUP BY` несколько отличий:
+Использование `DISTINCT` без указания столбцов:
-- `DISTINCT` может применяться вместе с `GROUP BY`.
-- Когда секция [ORDER BY](order-by.md) опущена, а секция [LIMIT](limit.md) присутствует, запрос прекращает выполнение сразу после считывания необходимого количества различных строк.
-- Блоки данных выводятся по мере их обработки, не дожидаясь завершения выполнения всего запроса.
+```sql
+SELECT DISTINCT * FROM t1;
+```
-## Примеры {#examples}
+```text
+┌─a─┬─b─┬─c─┐
+│ 1 │ 1 │ 1 │
+│ 2 │ 2 │ 2 │
+│ 1 │ 1 │ 2 │
+│ 1 │ 2 │ 2 │
+└───┴───┴───┘
+```
+
+Использование `DISTINCT` с указанием столбцов:
+
+```sql
+SELECT DISTINCT ON (a,b) * FROM t1;
+```
+
+```text
+┌─a─┬─b─┬─c─┐
+│ 1 │ 1 │ 1 │
+│ 2 │ 2 │ 2 │
+│ 1 │ 2 │ 2 │
+└───┴───┴───┘
+```
+
+## DISTINCT и ORDER BY {#distinct-orderby}
 
 ClickHouse поддерживает использование секций `DISTINCT` и `ORDER BY` для разных столбцов в одном запросе. Секция `DISTINCT` выполняется до секции `ORDER BY`.
 
@@ -56,3 +88,16 @@ ClickHouse поддерживает использование секций `DIS
 Ряд `2, 4` был разрезан перед сортировкой. Учитывайте эту специфику при разработке запросов.
+
+## Обработка NULL {#null-processing}
+
+`DISTINCT` работает с [NULL](../../syntax.md#null-literal) как будто `NULL` — обычное значение и `NULL==NULL`. Другими словами, в результате `DISTINCT` различные комбинации с `NULL` встретятся только один раз. Это отличается от обработки `NULL` в большинстве других контекстов.
+
+## Альтернативы {#alternatives}
+
+Можно получить такой же результат, применив [GROUP BY](group-by.md) для того же набора значений, который указан в секции `SELECT`, без использования каких-либо агрегатных функций. Но есть несколько отличий от `GROUP BY`:
+
+- `DISTINCT` может применяться вместе с `GROUP BY`.
+- Когда секция [ORDER BY](order-by.md) опущена, а секция [LIMIT](limit.md) присутствует, запрос прекращает выполнение сразу после считывания необходимого количества различных строк.
+- Блоки данных выводятся по мере их обработки, не дожидаясь завершения выполнения всего запроса.
+
diff --git a/docs/ru/sql-reference/statements/select/index.md b/docs/ru/sql-reference/statements/select/index.md
index a0a862cbf55..c2820bc7be4 100644
--- a/docs/ru/sql-reference/statements/select/index.md
+++ b/docs/ru/sql-reference/statements/select/index.md
@@ -11,7 +11,7 @@ toc_title: "Обзор"
 
 ``` sql
 [WITH expr_list|(subquery)]
-SELECT [DISTINCT] expr_list
+SELECT [DISTINCT [ON (column1, column2, ...)]] expr_list
 [FROM [db.]table | (subquery) | table_function]
 [FINAL]
 [SAMPLE sample_coeff]
 [ARRAY JOIN ...]
@@ -34,6 +34,8 @@ SELECT [DISTINCT] expr_list Особенности каждой необязательной секции рассматриваются в отдельных разделах, которые перечислены в том же порядке, в каком они выполняются: - [Секция WITH](with.md) +- [Секция SELECT](#select-clause) +- [Секция DISTINCT](distinct.md) - [Секция FROM](from.md) - [Секция SAMPLE](sample.md) - [Секция JOIN](join.md) @@ -42,8 +44,6 @@ SELECT [DISTINCT] expr_list - [Секция GROUP BY](group-by.md) - [Секция LIMIT BY](limit-by.md) - [Секция HAVING](having.md) -- [Секция SELECT](#select-clause) -- [Секция DISTINCT](distinct.md) - [Секция LIMIT](limit.md) [Секция OFFSET](offset.md) - [Секция UNION ALL](union.md) diff --git a/programs/client/CMakeLists.txt b/programs/client/CMakeLists.txt index 084e1b45911..1de5ea88aee 100644 --- a/programs/client/CMakeLists.txt +++ b/programs/client/CMakeLists.txt @@ -3,6 +3,7 @@ set (CLICKHOUSE_CLIENT_SOURCES ConnectionParameters.cpp QueryFuzzer.cpp Suggest.cpp + TestHint.cpp ) set (CLICKHOUSE_CLIENT_LINK diff --git a/programs/client/TestHint.cpp b/programs/client/TestHint.cpp new file mode 100644 index 00000000000..2f3be2a5350 --- /dev/null +++ b/programs/client/TestHint.cpp @@ -0,0 +1,105 @@ +#include "TestHint.h" + +#include +#include +#include +#include +#include + +namespace +{ + +/// Parse error as number or as a string (name of the error code const) +int parseErrorCode(DB::ReadBufferFromString & in) +{ + int code = -1; + String code_name; + + auto * pos = in.position(); + tryReadText(code, in); + if (pos != in.position()) + { + return code; + } + + /// Try parse as string + readStringUntilWhitespace(code_name, in); + return DB::ErrorCodes::getErrorCodeByName(code_name); +} + +} + +namespace DB +{ + +TestHint::TestHint(bool enabled_, const String & query_) + : query(query_) +{ + if (!enabled_) + return; + + // Don't parse error hints in leading comments, because it feels weird. + // Leading 'echo' hint is OK. 
+    bool is_leading_hint = true;
+
+    Lexer lexer(query.data(), query.data() + query.size());
+
+    for (Token token = lexer.nextToken(); !token.isEnd(); token = lexer.nextToken())
+    {
+        if (token.type != TokenType::Comment
+            && token.type != TokenType::Whitespace)
+        {
+            is_leading_hint = false;
+        }
+        else if (token.type == TokenType::Comment)
+        {
+            String comment(token.begin, token.begin + token.size());
+
+            if (!comment.empty())
+            {
+                size_t pos_start = comment.find('{', 0);
+                if (pos_start != String::npos)
+                {
+                    size_t pos_end = comment.find('}', pos_start);
+                    if (pos_end != String::npos)
+                    {
+                        String hint(comment.begin() + pos_start + 1, comment.begin() + pos_end);
+                        parse(hint, is_leading_hint);
+                    }
+                }
+            }
+        }
+    }
+}
+
+void TestHint::parse(const String & hint, bool is_leading_hint)
+{
+    ReadBufferFromString in(hint);
+    String item;
+
+    while (!in.eof())
+    {
+        readStringUntilWhitespace(item, in);
+        if (in.eof())
+            break;
+
+        skipWhitespaceIfAny(in);
+
+        if (!is_leading_hint)
+        {
+            if (item == "serverError")
+                server_error = parseErrorCode(in);
+            else if (item == "clientError")
+                client_error = parseErrorCode(in);
+        }
+
+        if (item == "echo")
+            echo.emplace(true);
+        if (item == "echoOn")
+            echo.emplace(true);
+        if (item == "echoOff")
+            echo.emplace(false);
+    }
+}
+
+}
diff --git a/programs/client/TestHint.h b/programs/client/TestHint.h
index 100d47f4dd2..377637d0db8 100644
--- a/programs/client/TestHint.h
+++ b/programs/client/TestHint.h
@@ -1,11 +1,7 @@
 #pragma once
 
-#include
-#include
-#include
+#include
 #include
-#include
-#include
 
 namespace DB
@@ -19,6 +15,10 @@ namespace DB
 ///
 /// - "-- { clientError 20 }" -- in case you are expecting a client error.
 ///
+/// - "-- { serverError FUNCTION_THROW_IF_VALUE_IS_NON_ZERO }" -- by error name.
+///
+/// - "-- { clientError FUNCTION_THROW_IF_VALUE_IS_NON_ZERO }" -- by error name.
+///
 /// Remember that the client parses the query first (not the server), so for
 /// example if you are expecting syntax error, then you should use
 /// clientError not serverError.
@@ -43,45 +43,7 @@ namespace DB
 class TestHint
 {
 public:
-    TestHint(bool enabled_, const String & query_) :
-        query(query_)
-    {
-        if (!enabled_)
-            return;
-
-        // Don't parse error hints in leading comments, because it feels weird.
-        // Leading 'echo' hint is OK.
- bool is_leading_hint = true; - - Lexer lexer(query.data(), query.data() + query.size()); - - for (Token token = lexer.nextToken(); !token.isEnd(); token = lexer.nextToken()) - { - if (token.type != TokenType::Comment - && token.type != TokenType::Whitespace) - { - is_leading_hint = false; - } - else if (token.type == TokenType::Comment) - { - String comment(token.begin, token.begin + token.size()); - - if (!comment.empty()) - { - size_t pos_start = comment.find('{', 0); - if (pos_start != String::npos) - { - size_t pos_end = comment.find('}', pos_start); - if (pos_end != String::npos) - { - String hint(comment.begin() + pos_start + 1, comment.begin() + pos_end); - parse(hint, is_leading_hint); - } - } - } - } - } - } + TestHint(bool enabled_, const String & query_); int serverError() const { return server_error; } int clientError() const { return client_error; } @@ -93,34 +55,7 @@ private: int client_error = 0; std::optional echo; - void parse(const String & hint, bool is_leading_hint) - { - std::stringstream ss; // STYLE_CHECK_ALLOW_STD_STRING_STREAM - ss << hint; - String item; - - while (!ss.eof()) - { - ss >> item; - if (ss.eof()) - break; - - if (!is_leading_hint) - { - if (item == "serverError") - ss >> server_error; - else if (item == "clientError") - ss >> client_error; - } - - if (item == "echo") - echo.emplace(true); - if (item == "echoOn") - echo.emplace(true); - if (item == "echoOff") - echo.emplace(false); - } - } + void parse(const String & hint, bool is_leading_hint); bool allErrorsExpected(int actual_server_error, int actual_client_error) const { diff --git a/programs/server/Server.cpp b/programs/server/Server.cpp index c69c48bb23d..5520f920823 100644 --- a/programs/server/Server.cpp +++ b/programs/server/Server.cpp @@ -97,7 +97,7 @@ #endif #if USE_SSL -# if USE_INTERNAL_SSL_LIBRARY +# if USE_INTERNAL_SSL_LIBRARY && !defined(ARCADIA_BUILD) # include # endif # include diff --git a/programs/server/config.xml b/programs/server/config.xml index 136b982a181..510a5e230f8 100644 --- a/programs/server/config.xml +++ b/programs/server/config.xml @@ -888,13 +888,13 @@ system part_log
+ toYYYYMM(event_date) 7500
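The parser changes that follow rewrite comparison operators combined with `ANY`/`ALL` subqueries into `IN`/`NOT IN`, or into a comparison against an aggregate over the subquery, as the comment below describes; a sketch of queries this enables, using the built-in `numbers` table function:

```sql
-- Becomes: number IN (SELECT number FROM numbers(5))
SELECT count() FROM numbers(10) WHERE number = ANY (SELECT number FROM numbers(5));

-- Becomes a comparison with max() over the subquery, i.e. number greater than all of 0..4.
SELECT count() FROM numbers(10) WHERE number > ALL (SELECT number FROM numbers(5));
```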
- --> IN + * != ALL --> NOT IN + * = ALL --> IN (SELECT singleValueOrNull(*) FROM subquery) + * != ANY --> NOT IN (SELECT singleValueOrNull(*) FROM subquery) + **/ + + auto * function = assert_cast(ast.get()); + String operator_name = function->name; + + auto function_equals = operator_name == "equals"; + auto function_not_equals = operator_name == "notEquals"; + + String aggregate_function_name; + if (function_equals || function_not_equals) + { + if (operator_name == "notEquals") + function->name = "notIn"; + else + function->name = "in"; + + if ((type == SubqueryFunctionType::ANY && function_equals) + || (type == SubqueryFunctionType::ALL && function_not_equals)) + { + return true; + } + + aggregate_function_name = "singleValueOrNull"; + } + else if (operator_name == "greaterOrEquals" || operator_name == "greater") + { + aggregate_function_name = (type == SubqueryFunctionType::ANY ? "min" : "max"); + } + else if (operator_name == "lessOrEquals" || operator_name == "less") + { + aggregate_function_name = (type == SubqueryFunctionType::ANY ? "max" : "min"); + } + else + return false; + + /// subquery --> (SELECT aggregate_function(*) FROM subquery) + auto aggregate_function = makeASTFunction(aggregate_function_name, std::make_shared()); + auto subquery_node = function->children[0]->children[1]; + + auto table_expression = std::make_shared(); + table_expression->subquery = std::move(subquery_node); + table_expression->children.push_back(table_expression->subquery); + + auto tables_in_select_element = std::make_shared(); + tables_in_select_element->table_expression = std::move(table_expression); + tables_in_select_element->children.push_back(tables_in_select_element->table_expression); + + auto tables_in_select = std::make_shared(); + tables_in_select->children.push_back(std::move(tables_in_select_element)); + + auto select_exp_list = std::make_shared(); + select_exp_list->children.push_back(aggregate_function); + + auto select_query = std::make_shared(); + select_query->children.push_back(select_exp_list); + select_query->children.push_back(tables_in_select); + + select_query->setExpression(ASTSelectQuery::Expression::SELECT, select_exp_list); + select_query->setExpression(ASTSelectQuery::Expression::TABLES, tables_in_select); + + auto select_with_union_query = std::make_shared(); + select_with_union_query->list_of_selects = std::make_shared(); + select_with_union_query->list_of_selects->children.push_back(std::move(select_query)); + select_with_union_query->children.push_back(select_with_union_query->list_of_selects); + + auto new_subquery = std::make_shared(); + new_subquery->children.push_back(select_with_union_query); + ast->children[0]->children.back() = std::move(new_subquery); + + return true; +} bool ParserLeftAssociativeBinaryOperatorList::parseImpl(Pos & pos, ASTPtr & node, Expected & expected) { @@ -213,7 +320,15 @@ bool ParserLeftAssociativeBinaryOperatorList::parseImpl(Pos & pos, ASTPtr & node auto exp_list = std::make_shared(); ASTPtr elem; - if (!(remaining_elem_parser ? remaining_elem_parser : first_elem_parser)->parse(pos, elem, expected)) + SubqueryFunctionType subquery_function_type = SubqueryFunctionType::NONE; + if (allow_any_all_operators && ParserKeyword("ANY").ignore(pos, expected)) + subquery_function_type = SubqueryFunctionType::ANY; + else if (allow_any_all_operators && ParserKeyword("ALL").ignore(pos, expected)) + subquery_function_type = SubqueryFunctionType::ALL; + else if (!(remaining_elem_parser ? 
remaining_elem_parser : first_elem_parser)->parse(pos, elem, expected)) + return false; + + if (subquery_function_type != SubqueryFunctionType::NONE && !ParserSubquery().parse(pos, elem, expected)) return false; /// the first argument of the function is the previous element, the second is the next one @@ -224,6 +339,9 @@ bool ParserLeftAssociativeBinaryOperatorList::parseImpl(Pos & pos, ASTPtr & node exp_list->children.push_back(node); exp_list->children.push_back(elem); + if (allow_any_all_operators && subquery_function_type != SubqueryFunctionType::NONE && !modifyAST(function, subquery_function_type)) + return false; + /** special exception for the access operator to the element of the array `x[y]`, which * contains the infix part '[' and the suffix ''] '(specified as' [') */ @@ -855,4 +973,3 @@ bool ParserKeyValuePairsList::parseImpl(Pos & pos, ASTPtr & node, Expected & exp } } - diff --git a/src/Parsers/ExpressionListParsers.h b/src/Parsers/ExpressionListParsers.h index bd4763297d4..17deec4e9e4 100644 --- a/src/Parsers/ExpressionListParsers.h +++ b/src/Parsers/ExpressionListParsers.h @@ -79,14 +79,6 @@ private: class ParserUnionList : public IParserBase { public: - ParserUnionList(ParserPtr && elem_parser_, ParserPtr && s_union_parser_, ParserPtr && s_all_parser_, ParserPtr && s_distinct_parser_) - : elem_parser(std::move(elem_parser_)) - , s_union_parser(std::move(s_union_parser_)) - , s_all_parser(std::move(s_all_parser_)) - , s_distinct_parser(std::move(s_distinct_parser_)) - { - } - template static bool parseUtil(Pos & pos, const ElemFunc & parse_element, const SepFunc & parse_separator) { @@ -116,10 +108,6 @@ protected: const char * getName() const override { return "list of union elements"; } bool parseImpl(Pos & pos, ASTPtr & node, Expected & expected) override; private: - ParserPtr elem_parser; - ParserPtr s_union_parser; - ParserPtr s_all_parser; - ParserPtr s_distinct_parser; ASTSelectWithUnionQuery::UnionModes union_modes; }; @@ -133,6 +121,8 @@ private: Operators_t overlapping_operators_to_skip = { (const char *[]){ nullptr } }; ParserPtr first_elem_parser; ParserPtr remaining_elem_parser; + /// =, !=, <, > ALL (subquery) / ANY (subquery) + bool allow_any_all_operators = false; public: /** `operators_` - allowed operators and their corresponding functions @@ -142,8 +132,10 @@ public: { } - ParserLeftAssociativeBinaryOperatorList(Operators_t operators_, Operators_t overlapping_operators_to_skip_, ParserPtr && first_elem_parser_) - : operators(operators_), overlapping_operators_to_skip(overlapping_operators_to_skip_), first_elem_parser(std::move(first_elem_parser_)) + ParserLeftAssociativeBinaryOperatorList(Operators_t operators_, + Operators_t overlapping_operators_to_skip_, ParserPtr && first_elem_parser_, bool allow_any_all_operators_ = false) + : operators(operators_), overlapping_operators_to_skip(overlapping_operators_to_skip_), + first_elem_parser(std::move(first_elem_parser_)), allow_any_all_operators(allow_any_all_operators_) { } @@ -353,7 +345,8 @@ class ParserComparisonExpression : public IParserBase private: static const char * operators[]; static const char * overlapping_operators_to_skip[]; - ParserLeftAssociativeBinaryOperatorList operator_parser {operators, overlapping_operators_to_skip, std::make_unique()}; + ParserLeftAssociativeBinaryOperatorList operator_parser {operators, + overlapping_operators_to_skip, std::make_unique(), true}; protected: const char * getName() const override{ return "comparison expression"; } @@ -364,7 +357,6 @@ protected: } }; - /** 
Parser for nullity checking with IS (NOT) NULL. */ class ParserNullityChecking : public IParserBase diff --git a/src/Parsers/ParserQueryWithOutput.cpp b/src/Parsers/ParserQueryWithOutput.cpp index d5aa1e47533..82f9f561187 100644 --- a/src/Parsers/ParserQueryWithOutput.cpp +++ b/src/Parsers/ParserQueryWithOutput.cpp @@ -1,27 +1,27 @@ -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include #include #include #include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include #include #include #include #include #include -#include +#include +#include +#include +#include #include diff --git a/src/Parsers/ParserSelectWithUnionQuery.cpp b/src/Parsers/ParserSelectWithUnionQuery.cpp index 87e2dab1a47..532a9e20735 100644 --- a/src/Parsers/ParserSelectWithUnionQuery.cpp +++ b/src/Parsers/ParserSelectWithUnionQuery.cpp @@ -10,12 +10,7 @@ namespace DB bool ParserSelectWithUnionQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expected) { ASTPtr list_node; - - ParserUnionList parser( - std::make_unique(), - std::make_unique("UNION"), - std::make_unique("ALL"), - std::make_unique("DISTINCT")); + ParserUnionList parser; if (!parser.parse(pos, list_node, expected)) return false; diff --git a/src/Processors/Formats/Impl/ConstantExpressionTemplate.cpp b/src/Processors/Formats/Impl/ConstantExpressionTemplate.cpp index cc9ae5e65bb..1f780a206dd 100644 --- a/src/Processors/Formats/Impl/ConstantExpressionTemplate.cpp +++ b/src/Processors/Formats/Impl/ConstantExpressionTemplate.cpp @@ -639,7 +639,7 @@ void ConstantExpressionTemplate::TemplateStructure::addNodesToCastResult(const I expr = makeASTFunction("assumeNotNull", std::move(expr)); } - expr = makeASTFunction("CAST", std::move(expr), std::make_shared(result_column_type.getName())); + expr = makeASTFunction("_CAST", std::move(expr), std::make_shared(result_column_type.getName())); if (null_as_default) { diff --git a/src/Processors/QueryPlan/IntersectOrExceptStep.cpp b/src/Processors/QueryPlan/IntersectOrExceptStep.cpp new file mode 100644 index 00000000000..d1bb1eb41e9 --- /dev/null +++ b/src/Processors/QueryPlan/IntersectOrExceptStep.cpp @@ -0,0 +1,87 @@ +#include + +#include +#include +#include +#include +#include +#include +#include + + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int LOGICAL_ERROR; +} + +static Block checkHeaders(const DataStreams & input_streams_) +{ + if (input_streams_.empty()) + throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot perform intersect/except on empty set of query plan steps"); + + Block res = input_streams_.front().header; + for (const auto & stream : input_streams_) + assertBlocksHaveEqualStructure(stream.header, res, "IntersectOrExceptStep"); + + return res; +} + +IntersectOrExceptStep::IntersectOrExceptStep( + DataStreams input_streams_ , Operator operator_ , size_t max_threads_) + : header(checkHeaders(input_streams_)) + , current_operator(operator_) + , max_threads(max_threads_) +{ + input_streams = std::move(input_streams_); + output_stream = DataStream{.header = header}; +} + +QueryPipelinePtr IntersectOrExceptStep::updatePipeline(QueryPipelines pipelines, const BuildQueryPipelineSettings &) +{ + auto pipeline = std::make_unique(); + QueryPipelineProcessorsCollector collector(*pipeline, this); + + if (pipelines.empty()) + { + pipeline->init(Pipe(std::make_shared(output_stream->header))); + processors = collector.detachProcessors(); + return 
pipeline; + } + + for (auto & cur_pipeline : pipelines) + { + /// Just in case. + if (!isCompatibleHeader(cur_pipeline->getHeader(), getOutputStream().header)) + { + auto converting_dag = ActionsDAG::makeConvertingActions( + cur_pipeline->getHeader().getColumnsWithTypeAndName(), + getOutputStream().header.getColumnsWithTypeAndName(), + ActionsDAG::MatchColumnsMode::Name); + + auto converting_actions = std::make_shared(std::move(converting_dag)); + cur_pipeline->addSimpleTransform([&](const Block & cur_header) + { + return std::make_shared(cur_header, converting_actions); + }); + } + + /// For the case of union. + cur_pipeline->addTransform(std::make_shared(header, cur_pipeline->getNumStreams(), 1)); + } + + *pipeline = QueryPipeline::unitePipelines(std::move(pipelines), max_threads); + pipeline->addTransform(std::make_shared(header, current_operator)); + + processors = collector.detachProcessors(); + return pipeline; +} + +void IntersectOrExceptStep::describePipeline(FormatSettings & settings) const +{ + IQueryPlanStep::describePipeline(processors, settings); +} + +} diff --git a/src/Processors/QueryPlan/IntersectOrExceptStep.h b/src/Processors/QueryPlan/IntersectOrExceptStep.h new file mode 100644 index 00000000000..9e87c921ab2 --- /dev/null +++ b/src/Processors/QueryPlan/IntersectOrExceptStep.h @@ -0,0 +1,30 @@ +#pragma once +#include +#include + + +namespace DB +{ + +class IntersectOrExceptStep : public IQueryPlanStep +{ +using Operator = ASTSelectIntersectExceptQuery::Operator; + +public: + /// max_threads is used to limit the number of threads for result pipeline. + IntersectOrExceptStep(DataStreams input_streams_, Operator operator_, size_t max_threads_ = 0); + + String getName() const override { return "IntersectOrExcept"; } + + QueryPipelinePtr updatePipeline(QueryPipelines pipelines, const BuildQueryPipelineSettings & settings) override; + + void describePipeline(FormatSettings & settings) const override; + +private: + Block header; + Operator current_operator; + size_t max_threads; + Processors processors; +}; + +} diff --git a/src/Processors/Transforms/IntersectOrExceptTransform.cpp b/src/Processors/Transforms/IntersectOrExceptTransform.cpp new file mode 100644 index 00000000000..3e39123ae4b --- /dev/null +++ b/src/Processors/Transforms/IntersectOrExceptTransform.cpp @@ -0,0 +1,197 @@ +#include + + +namespace DB +{ + +/// After visitor is applied, ASTSelectIntersectExcept always has two child nodes. +IntersectOrExceptTransform::IntersectOrExceptTransform(const Block & header_, Operator operator_) + : IProcessor(InputPorts(2, header_), {header_}) + , current_operator(operator_) +{ + const Names & columns = header_.getNames(); + size_t num_columns = columns.empty() ? header_.columns() : columns.size(); + + key_columns_pos.reserve(columns.size()); + for (size_t i = 0; i < num_columns; ++i) + { + auto pos = columns.empty() ? 
i : header_.getPositionByName(columns[i]); + key_columns_pos.emplace_back(pos); + } +} + + +IntersectOrExceptTransform::Status IntersectOrExceptTransform::prepare() +{ + auto & output = outputs.front(); + + if (output.isFinished()) + { + for (auto & in : inputs) + in.close(); + + return Status::Finished; + } + + if (!output.canPush()) + { + for (auto & input : inputs) + input.setNotNeeded(); + + return Status::PortFull; + } + + if (current_output_chunk) + { + output.push(std::move(current_output_chunk)); + } + + if (finished_second_input) + { + if (inputs.front().isFinished()) + { + output.finish(); + return Status::Finished; + } + } + else if (inputs.back().isFinished()) + { + finished_second_input = true; + } + + if (!has_input) + { + InputPort & input = finished_second_input ? inputs.front() : inputs.back(); + + input.setNeeded(); + if (!input.hasData()) + return Status::NeedData; + + current_input_chunk = input.pull(); + has_input = true; + } + + return Status::Ready; +} + + +void IntersectOrExceptTransform::work() +{ + if (!finished_second_input) + { + accumulate(std::move(current_input_chunk)); + } + else + { + filter(current_input_chunk); + current_output_chunk = std::move(current_input_chunk); + } + + has_input = false; +} + + +template +void IntersectOrExceptTransform::addToSet(Method & method, const ColumnRawPtrs & columns, size_t rows, SetVariants & variants) const +{ + typename Method::State state(columns, key_sizes, nullptr); + + for (size_t i = 0; i < rows; ++i) + state.emplaceKey(method.data, i, variants.string_pool); +} + + +template +size_t IntersectOrExceptTransform::buildFilter( + Method & method, const ColumnRawPtrs & columns, IColumn::Filter & filter, size_t rows, SetVariants & variants) const +{ + typename Method::State state(columns, key_sizes, nullptr); + size_t new_rows_num = 0; + + for (size_t i = 0; i < rows; ++i) + { + auto find_result = state.findKey(method.data, i, variants.string_pool); + filter[i] = current_operator == ASTSelectIntersectExceptQuery::Operator::EXCEPT ? 
!find_result.isFound() : find_result.isFound(); + if (filter[i]) + ++new_rows_num; + } + return new_rows_num; +} + + +void IntersectOrExceptTransform::accumulate(Chunk chunk) +{ + auto num_rows = chunk.getNumRows(); + auto columns = chunk.detachColumns(); + + ColumnRawPtrs column_ptrs; + column_ptrs.reserve(key_columns_pos.size()); + + for (auto pos : key_columns_pos) + column_ptrs.emplace_back(columns[pos].get()); + + if (!data) + data.emplace(); + + if (data->empty()) + data->init(SetVariants::chooseMethod(column_ptrs, key_sizes)); + + auto & data_set = *data; + switch (data->type) + { + case SetVariants::Type::EMPTY: + break; +#define M(NAME) \ + case SetVariants::Type::NAME: \ + addToSet(*data_set.NAME, column_ptrs, num_rows, data_set); \ + break; + APPLY_FOR_SET_VARIANTS(M) +#undef M + } +} + + +void IntersectOrExceptTransform::filter(Chunk & chunk) +{ + auto num_rows = chunk.getNumRows(); + auto columns = chunk.detachColumns(); + + ColumnRawPtrs column_ptrs; + column_ptrs.reserve(key_columns_pos.size()); + + for (auto pos : key_columns_pos) + column_ptrs.emplace_back(columns[pos].get()); + + if (!data) + data.emplace(); + + if (data->empty()) + data->init(SetVariants::chooseMethod(column_ptrs, key_sizes)); + + size_t new_rows_num = 0; + + IColumn::Filter filter(num_rows); + auto & data_set = *data; + + switch (data->type) + { + case SetVariants::Type::EMPTY: + break; +#define M(NAME) \ + case SetVariants::Type::NAME: \ + new_rows_num = buildFilter(*data_set.NAME, column_ptrs, filter, num_rows, data_set); \ + break; + APPLY_FOR_SET_VARIANTS(M) +#undef M + } + + if (!new_rows_num) + return; + + for (auto & column : columns) + column = column->filter(filter, -1); + + chunk.setColumns(std::move(columns), new_rows_num); +} + +} diff --git a/src/Processors/Transforms/IntersectOrExceptTransform.h b/src/Processors/Transforms/IntersectOrExceptTransform.h new file mode 100644 index 00000000000..e200bfd6cc5 --- /dev/null +++ b/src/Processors/Transforms/IntersectOrExceptTransform.h @@ -0,0 +1,51 @@ +#pragma once + +#include +#include +#include +#include + + +namespace DB +{ + +class IntersectOrExceptTransform : public IProcessor +{ +using Operator = ASTSelectIntersectExceptQuery::Operator; + +public: + IntersectOrExceptTransform(const Block & header_, Operator operator_); + + String getName() const override { return "IntersectOrExcept"; } + +protected: + Status prepare() override; + + void work() override; + +private: + Operator current_operator; + + ColumnNumbers key_columns_pos; + std::optional data; + Sizes key_sizes; + + Chunk current_input_chunk; + Chunk current_output_chunk; + + bool finished_second_input = false; + bool has_input = false; + + void accumulate(Chunk chunk); + + void filter(Chunk & chunk); + + template + void addToSet(Method & method, const ColumnRawPtrs & key_columns, size_t rows, SetVariants & variants) const; + + template + size_t buildFilter(Method & method, const ColumnRawPtrs & columns, + IColumn::Filter & filter, size_t rows, SetVariants & variants) const; +}; + +} diff --git a/src/Processors/ya.make b/src/Processors/ya.make index 50faac6d97b..db0ae80c742 100644 --- a/src/Processors/ya.make +++ b/src/Processors/ya.make @@ -7,8 +7,14 @@ PEERDIR( clickhouse/src/Common contrib/libs/msgpack contrib/libs/protobuf + contrib/libs/apache/arrow ) +ADDINCL( + contrib/libs/apache/arrow/src +) + +CFLAGS(-DUSE_ARROW=1) SRCS( Chunk.cpp @@ -25,8 +31,13 @@ SRCS( Formats/IOutputFormat.cpp Formats/IRowInputFormat.cpp Formats/IRowOutputFormat.cpp + Formats/Impl/ArrowBlockInputFormat.cpp 
+ Formats/Impl/ArrowBlockOutputFormat.cpp + Formats/Impl/ArrowBufferedStreams.cpp + Formats/Impl/ArrowColumnToCHColumn.cpp Formats/Impl/BinaryRowInputFormat.cpp Formats/Impl/BinaryRowOutputFormat.cpp + Formats/Impl/CHColumnToArrowColumn.cpp Formats/Impl/CSVRowInputFormat.cpp Formats/Impl/CSVRowOutputFormat.cpp Formats/Impl/ConstantExpressionTemplate.cpp diff --git a/src/Processors/ya.make.in b/src/Processors/ya.make.in index 06230b96be8..7160e80bcce 100644 --- a/src/Processors/ya.make.in +++ b/src/Processors/ya.make.in @@ -6,11 +6,17 @@ PEERDIR( clickhouse/src/Common contrib/libs/msgpack contrib/libs/protobuf + contrib/libs/apache/arrow ) +ADDINCL( + contrib/libs/apache/arrow/src +) + +CFLAGS(-DUSE_ARROW=1) SRCS( - + ) END() diff --git a/src/Storages/MergeTree/IMergeTreeReader.cpp b/src/Storages/MergeTree/IMergeTreeReader.cpp index 4efd3d669eb..d659259e1a9 100644 --- a/src/Storages/MergeTree/IMergeTreeReader.cpp +++ b/src/Storages/MergeTree/IMergeTreeReader.cpp @@ -48,9 +48,8 @@ IMergeTreeReader::IMergeTreeReader( part_columns = Nested::collect(part_columns); } - columns_from_part.set_empty_key(StringRef()); for (const auto & column_from_part : part_columns) - columns_from_part.emplace(column_from_part.name, &column_from_part.type); + columns_from_part[column_from_part.name] = &column_from_part.type; } IMergeTreeReader::~IMergeTreeReader() = default; @@ -213,7 +212,7 @@ NameAndTypePair IMergeTreeReader::getColumnFromPart(const NameAndTypePair & requ { auto name_in_storage = required_column.getNameInStorage(); - decltype(columns_from_part.begin()) it; + ColumnsFromPart::ConstLookupResult it; if (alter_conversions.isColumnRenamed(name_in_storage)) { String old_name = alter_conversions.getColumnOldName(name_in_storage); @@ -227,7 +226,7 @@ NameAndTypePair IMergeTreeReader::getColumnFromPart(const NameAndTypePair & requ if (it == columns_from_part.end()) return required_column; - const auto & type = *it->second; + const DataTypePtr & type = *it->getMapped(); if (required_column.isSubcolumn()) { auto subcolumn_name = required_column.getSubcolumnName(); @@ -236,10 +235,10 @@ NameAndTypePair IMergeTreeReader::getColumnFromPart(const NameAndTypePair & requ if (!subcolumn_type) return required_column; - return {String(it->first), subcolumn_name, type, subcolumn_type}; + return {String(it->getKey()), subcolumn_name, type, subcolumn_type}; } - return {String(it->first), type}; + return {String(it->getKey()), type}; } void IMergeTreeReader::performRequiredConversions(Columns & res_columns) diff --git a/src/Storages/MergeTree/IMergeTreeReader.h b/src/Storages/MergeTree/IMergeTreeReader.h index ab412e48822..696cc2f105b 100644 --- a/src/Storages/MergeTree/IMergeTreeReader.h +++ b/src/Storages/MergeTree/IMergeTreeReader.h @@ -1,9 +1,9 @@ #pragma once #include +#include #include #include -#include namespace DB { @@ -95,11 +95,8 @@ private: /// Actual data type of columns in part -#if !defined(ARCADIA_BUILD) - google::dense_hash_map columns_from_part; -#else - google::sparsehash::dense_hash_map columns_from_part; -#endif + using ColumnsFromPart = HashMapWithSavedHash; + ColumnsFromPart columns_from_part; }; } diff --git a/src/Storages/MergeTree/KeyCondition.cpp b/src/Storages/MergeTree/KeyCondition.cpp index 235cadfba11..18ca00ebf0d 100644 --- a/src/Storages/MergeTree/KeyCondition.cpp +++ b/src/Storages/MergeTree/KeyCondition.cpp @@ -10,6 +10,7 @@ #include #include #include +#include #include #include #include @@ -1367,7 +1368,7 @@ bool KeyCondition::tryParseAtomFromAST(const ASTPtr & node, ContextPtr 
context, { ColumnsWithTypeAndName arguments{ {nullptr, key_expr_type, ""}, {DataTypeString().createColumnConst(1, common_type->getName()), common_type, ""}}; - FunctionOverloadResolverPtr func_builder_cast = CastOverloadResolver::createImpl(false); + FunctionOverloadResolverPtr func_builder_cast = CastInternalOverloadResolver::createImpl(); auto func_cast = func_builder_cast->build(arguments); /// If we know the given range only contains one value, then we treat all functions as positive monotonic. diff --git a/src/Storages/MergeTree/MergeTreeData.cpp b/src/Storages/MergeTree/MergeTreeData.cpp index 4730bf9f47c..2892efab12d 100644 --- a/src/Storages/MergeTree/MergeTreeData.cpp +++ b/src/Storages/MergeTree/MergeTreeData.cpp @@ -3207,6 +3207,16 @@ Pipe MergeTreeData::alterPartition( return {}; } +void checkPartitionExpressionFunction(const ASTPtr & ast) +{ + if (const auto * func = ast->as()) + if (func->name == "arrayJoin") + throw Exception("The partition expression cannot contain array joins", ErrorCodes::INVALID_PARTITION_VALUE); + for (const auto & child : ast->children) + checkPartitionExpressionFunction(child); +} + + String MergeTreeData::getPartitionIDFromQuery(const ASTPtr & ast, ContextPtr local_context) const { const auto & partition_ast = ast->as(); @@ -3217,6 +3227,9 @@ String MergeTreeData::getPartitionIDFromQuery(const ASTPtr & ast, ContextPtr loc return partition_ast.id; } + if (partition_ast.value->as()) + checkPartitionExpressionFunction(ast); + if (format_version < MERGE_TREE_DATA_MIN_FORMAT_VERSION_WITH_CUSTOM_PARTITIONING) { /// Month-partitioning specific - partition ID can be passed in the partition value. diff --git a/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp b/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp index baea7e72b21..b6d55828e85 100644 --- a/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp +++ b/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp @@ -1332,11 +1332,8 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mutatePartToTempor { /// We will modify only some of the columns. Other columns and key values can be copied as-is. NameSet updated_columns; - if (mutation_kind != MutationsInterpreter::MutationKind::MUTATE_INDEX_PROJECTION) - { - for (const auto & name_type : updated_header.getNamesAndTypesList()) - updated_columns.emplace(name_type.name); - } + for (const auto & name_type : updated_header.getNamesAndTypesList()) + updated_columns.emplace(name_type.name); auto indices_to_recalc = getIndicesToRecalculate( in, updated_columns, metadata_snapshot, context, materialized_indices, source_part); @@ -1345,7 +1342,7 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mutatePartToTempor NameSet files_to_skip = collectFilesToSkip( source_part, - mutation_kind == MutationsInterpreter::MutationKind::MUTATE_INDEX_PROJECTION ? Block{} : updated_header, + updated_header, indices_to_recalc, mrk_extension, projections_to_recalc); @@ -1413,8 +1410,7 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mutatePartToTempor metadata_snapshot, indices_to_recalc, projections_to_recalc, - // If it's an index/projection materialization, we don't write any data columns, thus empty header is used - mutation_kind == MutationsInterpreter::MutationKind::MUTATE_INDEX_PROJECTION ? 
Block{} : updated_header,
+        updated_header,
         new_data_part,
         in,
         time_of_mutation,
diff --git a/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp b/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp
index f60acca12a7..c7eb8200957 100644
--- a/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp
+++ b/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp
@@ -361,10 +361,11 @@ QueryPlanPtr MergeTreeDataSelectExecutor::read(
         pipes.emplace_back(std::move(projection_pipe));
         pipes.emplace_back(std::move(ordinary_pipe));
         auto pipe = Pipe::unitePipes(std::move(pipes));
-        // TODO what if pipe is empty?
         pipe.resize(1);
-        auto step = std::make_unique(std::move(pipe), "MergeTree(with projection)");
+        auto step = std::make_unique(
+            std::move(pipe),
+            fmt::format("MergeTree(with {} projection {})", query_info.projection->desc->type, query_info.projection->desc->name));
         auto plan = std::make_unique();
         plan->addStep(std::move(step));
         return plan;
diff --git a/src/Storages/MergeTree/ReplicatedMergeTreeQueue.cpp b/src/Storages/MergeTree/ReplicatedMergeTreeQueue.cpp
index ea5f7cfc36a..ef276a53df2 100644
--- a/src/Storages/MergeTree/ReplicatedMergeTreeQueue.cpp
+++ b/src/Storages/MergeTree/ReplicatedMergeTreeQueue.cpp
@@ -1012,8 +1012,24 @@ bool ReplicatedMergeTreeQueue::isNotCoveredByFuturePartsImpl(const String & log_
 
 bool ReplicatedMergeTreeQueue::addFuturePartIfNotCoveredByThem(const String & part_name, LogEntry & entry, String & reject_reason)
 {
+    /// We have found `part_name` on some replica and are going to fetch it instead of covered `entry->new_part_name`.
     std::lock_guard lock(state_mutex);
 
+    if (virtual_parts.getContainingPart(part_name).empty())
+    {
+        /// We should not fetch any parts that are absent in our `virtual_parts` set,
+        /// because we do not know about such parts according to our replication queue (we know about them from some side-channel).
+        /// Otherwise, it may break invariants in replication queue reordering, for example:
+        /// 1. Our queue contains GET_PART all_2_2_0, log contains DROP_RANGE all_2_2_0 and MERGE_PARTS all_1_3_1
+        /// 2. We execute GET_PART all_2_2_0, but fetch all_1_3_1 instead
+        ///    (drop_ranges.isAffectedByDropRange(...) is false-negative, because DROP_RANGE all_2_2_0 is not pulled yet).
+        ///    It actually means that MERGE_PARTS all_1_3_1 is executed too, but it's not even pulled yet.
+        /// 3. Then we pull log, trying to execute DROP_RANGE all_2_2_0
+        ///    and reveal that it was incorrectly reordered with MERGE_PARTS all_1_3_1 (drop range intersects merged part).
+        reject_reason = fmt::format("Log entry for part {} or covering part is not pulled from log to queue yet.", part_name);
+        return false;
+    }
+
     /// FIXME get rid of actual_part_name.
/// If new covering part jumps over DROP_RANGE we should execute drop range first if (drop_ranges.isAffectedByDropRange(part_name, reject_reason)) diff --git a/src/Storages/PostgreSQL/PostgreSQLReplicationHandler.cpp b/src/Storages/PostgreSQL/PostgreSQLReplicationHandler.cpp index b812c6d2923..3477397adb7 100644 --- a/src/Storages/PostgreSQL/PostgreSQLReplicationHandler.cpp +++ b/src/Storages/PostgreSQL/PostgreSQLReplicationHandler.cpp @@ -1,6 +1,6 @@ #include "PostgreSQLReplicationHandler.h" -#include +#include #include #include #include diff --git a/src/Storages/StorageInMemoryMetadata.cpp b/src/Storages/StorageInMemoryMetadata.cpp index dad83f64c70..5183b925141 100644 --- a/src/Storages/StorageInMemoryMetadata.cpp +++ b/src/Storages/StorageInMemoryMetadata.cpp @@ -1,7 +1,7 @@ #include -#include -#include +#include +#include #include #include #include @@ -320,18 +320,12 @@ Block StorageInMemoryMetadata::getSampleBlockForColumns( { Block res; -#if !defined(ARCADIA_BUILD) - google::dense_hash_map virtuals_map; -#else - google::sparsehash::dense_hash_map virtuals_map; -#endif - - virtuals_map.set_empty_key(StringRef()); + HashMapWithSavedHash virtuals_map; /// Virtual columns must be appended after ordinary, because user can /// override them. for (const auto & column : virtuals) - virtuals_map.emplace(column.name, &column.type); + virtuals_map[column.name] = &column.type; for (const auto & name : column_names) { @@ -340,9 +334,9 @@ Block StorageInMemoryMetadata::getSampleBlockForColumns( { res.insert({column->type->createColumn(), column->type, column->name}); } - else if (auto it = virtuals_map.find(name); it != virtuals_map.end()) + else if (auto * it = virtuals_map.find(name); it != virtuals_map.end()) { - const auto & type = *it->second; + const auto & type = *it->getMapped(); res.insert({type->createColumn(), type, name}); } else @@ -475,13 +469,8 @@ bool StorageInMemoryMetadata::hasSelectQuery() const namespace { -#if !defined(ARCADIA_BUILD) - using NamesAndTypesMap = google::dense_hash_map; - using UniqueStrings = google::dense_hash_set; -#else - using NamesAndTypesMap = google::sparsehash::dense_hash_map; - using UniqueStrings = google::sparsehash::dense_hash_set; -#endif + using NamesAndTypesMap = HashMapWithSavedHash; + using UniqueStrings = HashSetWithSavedHash; String listOfColumns(const NamesAndTypesList & available_columns) { @@ -498,20 +487,12 @@ namespace NamesAndTypesMap getColumnsMap(const NamesAndTypesList & columns) { NamesAndTypesMap res; - res.set_empty_key(StringRef()); for (const auto & column : columns) res.insert({column.name, column.type.get()}); return res; } - - UniqueStrings initUniqueStrings() - { - UniqueStrings strings; - strings.set_empty_key(StringRef()); - return strings; - } } void StorageInMemoryMetadata::check(const Names & column_names, const NamesAndTypesList & virtuals, const StorageID & storage_id) const @@ -524,11 +505,12 @@ void StorageInMemoryMetadata::check(const Names & column_names, const NamesAndTy } const auto virtuals_map = getColumnsMap(virtuals); - auto unique_names = initUniqueStrings(); + UniqueStrings unique_names; for (const auto & name : column_names) { - bool has_column = getColumns().hasColumnOrSubcolumn(ColumnsDescription::AllPhysical, name) || virtuals_map.count(name); + bool has_column = getColumns().hasColumnOrSubcolumn(ColumnsDescription::AllPhysical, name) + || virtuals_map.find(name) != nullptr; if (!has_column) { @@ -550,23 +532,31 @@ void StorageInMemoryMetadata::check(const NamesAndTypesList & provided_columns) const 
NamesAndTypesList & available_columns = getColumns().getAllPhysical(); const auto columns_map = getColumnsMap(available_columns); - auto unique_names = initUniqueStrings(); + UniqueStrings unique_names; + for (const NameAndTypePair & column : provided_columns) { - auto it = columns_map.find(column.name); + const auto * it = columns_map.find(column.name); if (columns_map.end() == it) throw Exception( - "There is no column with name " + column.name + ". There are columns: " + listOfColumns(available_columns), - ErrorCodes::NO_SUCH_COLUMN_IN_TABLE); + ErrorCodes::NO_SUCH_COLUMN_IN_TABLE, + "There is no column with name {}. There are columns: {}", + column.name, + listOfColumns(available_columns)); - if (!column.type->equals(*it->second)) + if (!column.type->equals(*it->getMapped())) throw Exception( - "Type mismatch for column " + column.name + ". Column has type " + it->second->getName() + ", got type " - + column.type->getName(), - ErrorCodes::TYPE_MISMATCH); + ErrorCodes::TYPE_MISMATCH, + "Type mismatch for column {}. Column has type {}, got type {}", + column.name, + it->getMapped()->getName(), + column.type->getName()); if (unique_names.end() != unique_names.find(column.name)) - throw Exception("Column " + column.name + " queried more than once", ErrorCodes::COLUMN_QUERIED_MORE_THAN_ONCE); + throw Exception(ErrorCodes::COLUMN_QUERIED_MORE_THAN_ONCE, + "Column {} queried more than once", + column.name); + unique_names.insert(column.name); } } @@ -582,26 +572,38 @@ void StorageInMemoryMetadata::check(const NamesAndTypesList & provided_columns, "Empty list of columns queried. There are columns: " + listOfColumns(available_columns), ErrorCodes::EMPTY_LIST_OF_COLUMNS_QUERIED); - auto unique_names = initUniqueStrings(); + UniqueStrings unique_names; + for (const String & name : column_names) { - auto it = provided_columns_map.find(name); + const auto * it = provided_columns_map.find(name); if (provided_columns_map.end() == it) continue; - auto jt = available_columns_map.find(name); + const auto * jt = available_columns_map.find(name); if (available_columns_map.end() == jt) throw Exception( - "There is no column with name " + name + ". There are columns: " + listOfColumns(available_columns), - ErrorCodes::NO_SUCH_COLUMN_IN_TABLE); + ErrorCodes::NO_SUCH_COLUMN_IN_TABLE, + "There is no column with name {}. There are columns: {}", + name, + listOfColumns(available_columns)); - if (!it->second->equals(*jt->second)) + const auto & provided_column_type = *it->getMapped(); + const auto & available_column_type = *jt->getMapped(); + + if (!provided_column_type.equals(available_column_type)) throw Exception( - "Type mismatch for column " + name + ". Column has type " + jt->second->getName() + ", got type " + it->second->getName(), - ErrorCodes::TYPE_MISMATCH); + ErrorCodes::TYPE_MISMATCH, + "Type mismatch for column {}. 
diff --git a/src/Storages/StorageMongoDB.cpp b/src/Storages/StorageMongoDB.cpp
index a973efd7277..3bdef7fd295 100644
--- a/src/Storages/StorageMongoDB.cpp
+++ b/src/Storages/StorageMongoDB.cpp
@@ -15,7 +15,7 @@
 #include
 #include
 #include
-#include
+#include
 
 namespace DB
 {
diff --git a/src/Storages/StorageMySQL.cpp b/src/Storages/StorageMySQL.cpp
index 431fda530f4..79bb1f59cc7 100644
--- a/src/Storages/StorageMySQL.cpp
+++ b/src/Storages/StorageMySQL.cpp
@@ -4,7 +4,7 @@
 
 #include
 #include
-#include
+#include
 #include
 #include
 #include
diff --git a/src/Storages/StoragePostgreSQL.cpp b/src/Storages/StoragePostgreSQL.cpp
index 3ea4a03d8e1..603a52b2801 100644
--- a/src/Storages/StoragePostgreSQL.cpp
+++ b/src/Storages/StoragePostgreSQL.cpp
@@ -1,7 +1,7 @@
 #include "StoragePostgreSQL.h"
 
 #if USE_LIBPQXX
-#include
+#include
 
 #include
 #include
diff --git a/src/Storages/StorageSQLite.cpp b/src/Storages/StorageSQLite.cpp
index ba66083fea5..758284e8d50 100644
--- a/src/Storages/StorageSQLite.cpp
+++ b/src/Storages/StorageSQLite.cpp
@@ -2,7 +2,7 @@
 
 #if USE_SQLITE
 #include
-#include
+#include
 #include
 #include
 #include
@@ -78,8 +78,7 @@ Pipe StorageSQLite::read(
         sample_block.insert({column_data.type, column_data.name});
     }
 
-    return Pipe(std::make_shared<SourceFromInputStream>(
-        std::make_shared<SQLiteBlockInputStream>(sqlite_db, query, sample_block, max_block_size)));
+    return Pipe(std::make_shared<SQLiteSource>(sqlite_db, query, sample_block, max_block_size));
 }
 
diff --git a/src/TableFunctions/TableFunctionMySQL.cpp b/src/TableFunctions/TableFunctionMySQL.cpp
index f8e0c41634b..09f9cf8b1f5 100644
--- a/src/TableFunctions/TableFunctionMySQL.cpp
+++ b/src/TableFunctions/TableFunctionMySQL.cpp
@@ -8,7 +8,7 @@
 #include
 #include
 #include
-#include
+#include
 #include
 #include
 #include
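`StorageSQLite::read()` above now builds the `Pipe` from a source class directly instead of wrapping a block input stream in an adapter, which is one allocation and one indirection less per read. A toy model of that simplification (illustrative names, not the real processors interfaces):

```cpp
#include <cstddef>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

// Toy source interface standing in for the processors API.
struct ISource
{
    virtual ~ISource() = default;
    virtual std::vector<int> generate() = 0;
};

// Toy Pipe that pulls from a single source.
class Pipe
{
public:
    explicit Pipe(std::shared_ptr<ISource> source_) : source(std::move(source_)) {}
    std::vector<int> pull() { return source->generate(); }

private:
    std::shared_ptr<ISource> source;
};

// After the refactor the storage-specific reader derives from the source
// interface itself, so no SourceFromInputStream-style adapter is needed.
class ToySQLiteSource : public ISource
{
public:
    ToySQLiteSource(std::string query_, size_t max_block_size_)
        : query(std::move(query_)), max_block_size(max_block_size_) {}

    std::vector<int> generate() override
    {
        // Stub: a real source would fetch up to max_block_size rows of `query`.
        return std::vector<int>(max_block_size, 0);
    }

private:
    std::string query;
    size_t max_block_size;
};

int main()
{
    // One make_shared, one layer, mirroring the single-argument Pipe in the diff.
    Pipe pipe(std::make_shared<ToySQLiteSource>("SELECT 1", 3));
    std::cout << pipe.pull().size() << " rows\n";   // prints "3 rows"
    return 0;
}
```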
diff --git a/tests/integration/ci-runner.py b/tests/integration/ci-runner.py
index ecd4cb8d4e7..bf7549a83c4 100755
--- a/tests/integration/ci-runner.py
+++ b/tests/integration/ci-runner.py
@@ -125,13 +125,13 @@ def clear_ip_tables_and_restart_daemons():
     logging.info("Dump iptables after run %s", subprocess.check_output("iptables -L", shell=True))
     try:
         logging.info("Killing all alive docker containers")
-        subprocess.check_output("docker kill $(docker ps -q)", shell=True)
+        subprocess.check_output("timeout -s 9 10m docker kill $(docker ps -q)", shell=True)
     except subprocess.CalledProcessError as err:
         logging.info("docker kill excepted: " + str(err))
 
     try:
         logging.info("Removing all docker containers")
-        subprocess.check_output("docker rm $(docker ps -a -q) --force", shell=True)
+        subprocess.check_output("timeout -s 9 10m docker rm $(docker ps -a -q) --force", shell=True)
     except subprocess.CalledProcessError as err:
         logging.info("docker rm excepted: " + str(err))
 
@@ -264,7 +264,7 @@ class ClickhouseIntegrationTestsRunner:
         out_file = "all_tests.txt"
         out_file_full = "all_tests_full.txt"
         cmd = "cd {repo_path}/tests/integration && " \
-            "./runner --tmpfs {image_cmd} ' --setup-plan' " \
+            "timeout -s 9 1h ./runner --tmpfs {image_cmd} ' --setup-plan' " \
             "| tee {out_file_full} | grep '::' | sed 's/ (fixtures used:.*//g' | sed 's/^ *//g' | sed 's/ *$//g' " \
             "| grep -v 'SKIPPED' | sort -u  > {out_file}".format(
                 repo_path=repo_path, image_cmd=image_cmd, out_file=out_file, out_file_full=out_file_full)
 
@@ -376,6 +376,24 @@ class ClickhouseIntegrationTestsRunner:
                 res.add(path)
         return res
 
+    def try_run_test_group(self, repo_path, test_group, tests_in_group, num_tries, num_workers):
+        try:
+            return self.run_test_group(repo_path, test_group, tests_in_group, num_tries, num_workers)
+        except Exception as e:
+            logging.info("Failed to run {}:\n{}".format(str(test_group), str(e)))
+            counters = {
+                "ERROR": [],
+                "PASSED": [],
+                "FAILED": [],
+                "SKIPPED": [],
+                "FLAKY": [],
+            }
+            tests_times = defaultdict(float)
+            for test in tests_in_group:
+                counters["ERROR"].append(test)
+                tests_times[test] = 0
+            return counters, tests_times, []
+
     def run_test_group(self, repo_path, test_group, tests_in_group, num_tries, num_workers):
         counters = {
             "ERROR": [],
@@ -419,7 +437,7 @@ class ClickhouseIntegrationTestsRunner:
             test_cmd = ' '.join([test for test in sorted(test_names)])
             parallel_cmd = " --parallel {} ".format(num_workers) if num_workers > 0 else ""
-            cmd = "cd {}/tests/integration && ./runner --tmpfs {} -t {} {} '-rfEp --run-id={} --color=no --durations=0 {}' | tee {}".format(
+            cmd = "cd {}/tests/integration && timeout -s 9 1h ./runner --tmpfs {} -t {} {} '-rfEp --run-id={} --color=no --durations=0 {}' | tee {}".format(
                 repo_path, image_cmd, test_cmd, parallel_cmd, i, _get_deselect_option(self.should_skip_tests()), info_path)
 
             log_basename = test_group_str + "_" + str(i) + ".log"
@@ -507,7 +525,7 @@ class ClickhouseIntegrationTestsRunner:
         for i in range(TRIES_COUNT):
             final_retry += 1
             logging.info("Running tests for the %s time", i)
-            counters, tests_times, log_paths = self.run_test_group(repo_path, "flaky", tests_to_run, 1, 1)
+            counters, tests_times, log_paths = self.try_run_test_group(repo_path, "flaky", tests_to_run, 1, 1)
             logs += log_paths
             if counters["FAILED"]:
                 logging.info("Found failed tests: %s", ' '.join(counters["FAILED"]))
@@ -583,7 +601,7 @@ class ClickhouseIntegrationTestsRunner:
 
         for group, tests in items_to_run:
             logging.info("Running test group %s containing %s tests", group, len(tests))
-            group_counters, group_test_times, log_paths = self.run_test_group(repo_path, group, tests, MAX_RETRY, NUM_WORKERS)
+            group_counters, group_test_times, log_paths = self.try_run_test_group(repo_path, group, tests, MAX_RETRY, NUM_WORKERS)
             total_tests = 0
             for counter, value in group_counters.items():
                 logging.info("Tests from group %s stats, %s count %s", group, counter, len(value))
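The new `try_run_test_group()` wrapper converts an exception thrown while running a whole test group (for example, the runner killed by the new `timeout -s 9 1h` guard) into per-test ERROR entries, so one stuck group cannot abort the entire CI run. The same defensive pattern expressed in C++ (an illustrative port, not project code):

```cpp
#include <functional>
#include <iostream>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using Counters = std::map<std::string, std::vector<std::string>>;
using GroupRunner = std::function<Counters(const std::vector<std::string> &)>;

// If running a whole group throws, record every test in it as ERROR and keep
// going, mirroring try_run_test_group() in the Python diff above.
Counters runGroupSafely(const std::vector<std::string> & tests, const GroupRunner & run_group)
{
    try
    {
        return run_group(tests);
    }
    catch (const std::exception & e)
    {
        std::cerr << "Failed to run group: " << e.what() << '\n';
        Counters counters{{"ERROR", {}}, {"PASSED", {}}, {"FAILED", {}}, {"SKIPPED", {}}, {"FLAKY", {}}};
        for (const auto & test : tests)
            counters["ERROR"].push_back(test);   // one ERROR entry per test, like the Python version
        return counters;
    }
}

int main()
{
    GroupRunner broken_runner = [](const std::vector<std::string> &) -> Counters
    {
        throw std::runtime_error("runner timed out");   // e.g. killed by `timeout -s 9 1h`
    };

    auto counters = runGroupSafely({"test_a", "test_b"}, broken_runner);
    std::cout << counters["ERROR"].size() << " tests marked ERROR\n";   // prints 2
    return 0;
}
```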
diff --git a/tests/integration/test_alter_update_cast_keep_nullable/__init__.py b/tests/integration/test_alter_update_cast_keep_nullable/__init__.py
new file mode 100644
index 00000000000..e69de29bb2d
diff --git a/tests/integration/test_alter_update_cast_keep_nullable/configs/users.xml b/tests/integration/test_alter_update_cast_keep_nullable/configs/users.xml
new file mode 100644
index 00000000000..aa2f240b831
--- /dev/null
+++ b/tests/integration/test_alter_update_cast_keep_nullable/configs/users.xml
@@ -0,0 +1,8 @@
+<?xml version="1.0"?>
+<yandex>
+    <profiles>
+        <default>
+            <cast_keep_nullable>1</cast_keep_nullable>
+        </default>
+    </profiles>
+</yandex>
diff --git a/tests/integration/test_alter_update_cast_keep_nullable/test.py b/tests/integration/test_alter_update_cast_keep_nullable/test.py
new file mode 100644
index 00000000000..497a9e21d94
--- /dev/null
+++ b/tests/integration/test_alter_update_cast_keep_nullable/test.py
@@ -0,0 +1,36 @@
+import pytest
+from helpers.cluster import ClickHouseCluster
+
+cluster = ClickHouseCluster(__file__)
+
+node1 = cluster.add_instance('node1', user_configs=['configs/users.xml'], with_zookeeper=True)
+
+@pytest.fixture(scope="module")
+def started_cluster():
+    try:
+        cluster.start()
+        yield cluster
+    finally:
+        cluster.shutdown()
+
+def test_cast_keep_nullable(started_cluster):
+    setting = node1.query("SELECT value FROM system.settings WHERE name='cast_keep_nullable'")
+    assert(setting.strip() == "1")
+
+    result = node1.query("""
+        DROP TABLE IF EXISTS t;
+        CREATE TABLE t (x UInt64) ENGINE = MergeTree ORDER BY tuple();
+        INSERT INTO t SELECT number FROM numbers(10);
+        SELECT * FROM t;
+        """)
+    assert(result.strip() == "0\n1\n2\n3\n4\n5\n6\n7\n8\n9")
+
+    error = node1.query_and_get_error("""
+        SET mutations_sync = 1;
+        ALTER TABLE t UPDATE x = x % 3 = 0 ? NULL : x WHERE x % 2 = 1;
+        """)
+    assert("DB::Exception: Cannot convert NULL value to non-Nullable type" in error)
+
+    result = node1.query("SELECT * FROM t;")
+    assert(result.strip() == "0\n1\n2\n3\n4\n5\n6\n7\n8\n9")
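The new integration test asserts that an UPDATE expression which can produce NULL is rejected when written back to a non-Nullable column, and that the failed mutation leaves the data untouched. A toy model of that check, with `std::optional` standing in for Nullable (illustrative only, not ClickHouse's cast code):

```cpp
#include <cstdint>
#include <iostream>
#include <optional>
#include <stdexcept>
#include <vector>

// Converting a Nullable value back to a plain (non-Nullable) column type
// must fail on NULL: the behaviour the integration test above relies on.
uint64_t castToNonNullable(const std::optional<uint64_t> & value)
{
    if (!value)
        throw std::runtime_error("Cannot convert NULL value to non-Nullable type");
    return *value;
}

int main()
{
    std::vector<uint64_t> column{0, 1, 2, 3, 4};   // non-Nullable UInt64 column
    try
    {
        // UPDATE x = x % 3 = 0 ? NULL : x WHERE x % 2 = 1
        for (auto & x : column)
        {
            if (x % 2 != 1)
                continue;
            std::optional<uint64_t> updated =
                (x % 3 == 0) ? std::optional<uint64_t>{} : std::optional<uint64_t>{x};
            x = castToNonNullable(updated);
        }
    }
    catch (const std::exception & e)
    {
        // x == 3 yields NULL, so the whole mutation fails and data stays unchanged.
        std::cout << "mutation failed: " << e.what() << '\n';
    }
    return 0;
}
```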
diff --git a/tests/integration/test_config_corresponding_root/configs/config.xml b/tests/integration/test_config_corresponding_root/configs/config.xml
index 4e130afa84d..a518bd88b2e 100644
--- a/tests/integration/test_config_corresponding_root/configs/config.xml
+++ b/tests/integration/test_config_corresponding_root/configs/config.xml
@@ -304,14 +304,13 @@
-    <!-- Uncomment to write part log.
-         Part log contains information about all actions with parts in MergeTree tables.
     <part_log>
         <database>system</database>
         <table>part_log</table>
+        <partition_by>toYYYYMM(event_date)</partition_by>
         <flush_interval_milliseconds>7500</flush_interval_milliseconds>
     </part_log>
-    -->
+
@@ -838,13 +838,13 @@
     <part_log>
         <database>system</database>
         <table>part_log</table>
+        <partition_by>toYYYYMM(event_date)</partition_by>
         <flush_interval_milliseconds>7500</flush_interval_milliseconds>
     </part_log>
-    -->