diff --git a/.github/ISSUE_TEMPLATE/40_bug-report.md b/.github/ISSUE_TEMPLATE/85_bug-report.md similarity index 93% rename from .github/ISSUE_TEMPLATE/40_bug-report.md rename to .github/ISSUE_TEMPLATE/85_bug-report.md index d62ec578f8d..d78474670ff 100644 --- a/.github/ISSUE_TEMPLATE/40_bug-report.md +++ b/.github/ISSUE_TEMPLATE/85_bug-report.md @@ -1,8 +1,8 @@ --- name: Bug report -about: Create a report to help us improve ClickHouse +about: Wrong behaviour (visible to users) in official ClickHouse release. title: '' -labels: bug +labels: 'potential bug' assignees: '' --- diff --git a/CHANGELOG.md b/CHANGELOG.md index 34d11c6a2cd..103d8e40fd9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,3 +1,102 @@ +### ClickHouse release v21.8, 2021-08-12 + +#### New Features + +* Add support for a part of SQL/JSON standard. [#24148](https://github.com/ClickHouse/ClickHouse/pull/24148) ([l1tsolaiki](https://github.com/l1tsolaiki), [Kseniia Sumarokova](https://github.com/kssenii)). +* Collect common system metrics (in `system.asynchronous_metrics` and `system.asynchronous_metric_log`) on CPU usage, disk usage, memory usage, IO, network, files, load average, CPU frequencies, thermal sensors, EDAC counters, system uptime; also added metrics about the scheduling jitter and the time spent collecting the metrics. It works similar to `atop` in ClickHouse and allows access to monitoring data even if you have no additional tools installed. Close [#9430](https://github.com/ClickHouse/ClickHouse/issues/9430). [#24416](https://github.com/ClickHouse/ClickHouse/pull/24416) ([alexey-milovidov](https://github.com/alexey-milovidov), [Yegor Levankov](https://github.com/elevankoff)). +* Add MaterializedPostgreSQL table engine and database engine. This database engine allows replicating a whole database or any subset of database tables. [#20470](https://github.com/ClickHouse/ClickHouse/pull/20470) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Add new functions `leftPad()`, `rightPad()`, `leftPadUTF8()`, `rightPadUTF8()`. [#26075](https://github.com/ClickHouse/ClickHouse/pull/26075) ([Vitaly Baranov](https://github.com/vitlibar)). +* Add the `FIRST` keyword to the `ADD INDEX` command to be able to add the index at the beginning of the indices list. [#25904](https://github.com/ClickHouse/ClickHouse/pull/25904) ([xjewer](https://github.com/xjewer)). +* Introduce `system.data_skipping_indices` table containing information about existing data skipping indices. Close [#7659](https://github.com/ClickHouse/ClickHouse/issues/7659). [#25693](https://github.com/ClickHouse/ClickHouse/pull/25693) ([Dmitry Novik](https://github.com/novikd)). +* Add `bin`/`unbin` functions. [#25609](https://github.com/ClickHouse/ClickHouse/pull/25609) ([zhaoyu](https://github.com/zxc111)). +* Support `Map` and `UInt128`, `Int128`, `UInt256`, `Int256` types in `mapAdd` and `mapSubtract` functions. [#25596](https://github.com/ClickHouse/ClickHouse/pull/25596) ([Ildus Kurbangaliev](https://github.com/ildus)). +* Support `DISTINCT ON (columns)` expression, close [#25404](https://github.com/ClickHouse/ClickHouse/issues/25404). [#25589](https://github.com/ClickHouse/ClickHouse/pull/25589) ([Zijie Lu](https://github.com/TszKitLo40)). +* Add an ability to reset a custom setting to default and remove it from the table's metadata. It allows rolling back the change without knowing the system/config's default. Closes [#14449](https://github.com/ClickHouse/ClickHouse/issues/14449). 
[#17769](https://github.com/ClickHouse/ClickHouse/pull/17769) ([xjewer](https://github.com/xjewer)). +* Render pipelines as graphs in Web UI if `EXPLAIN PIPELINE graph = 1` query is submitted. [#26067](https://github.com/ClickHouse/ClickHouse/pull/26067) ([alexey-milovidov](https://github.com/alexey-milovidov)). + +#### Performance Improvements + +* Compile aggregate functions. Use option `compile_aggregate_expressions` to enable it. [#24789](https://github.com/ClickHouse/ClickHouse/pull/24789) ([Maksim Kita](https://github.com/kitaisreal)). +* Improve latency of short queries that require reading from tables with many columns. [#26371](https://github.com/ClickHouse/ClickHouse/pull/26371) ([Anton Popov](https://github.com/CurtizJ)). + +#### Improvements + +* Use `Map` data type for system logs tables (`system.query_log`, `system.query_thread_log`, `system.processes`, `system.opentelemetry_span_log`). These tables will be auto-created with new data types. Virtual columns are created to support old queries. Closes [#18698](https://github.com/ClickHouse/ClickHouse/issues/18698). [#23934](https://github.com/ClickHouse/ClickHouse/pull/23934), [#25773](https://github.com/ClickHouse/ClickHouse/pull/25773) ([hexiaoting](https://github.com/hexiaoting), [sundy-li](https://github.com/sundy-li), [Maksim Kita](https://github.com/kitaisreal)). +* For a dictionary with a complex key containing only one attribute, allow not wrapping the key expression in tuple for functions `dictGet`, `dictHas`. [#26130](https://github.com/ClickHouse/ClickHouse/pull/26130) ([Maksim Kita](https://github.com/kitaisreal)). +* Implement function `bin`/`hex` from `AggregateFunction` states. [#26094](https://github.com/ClickHouse/ClickHouse/pull/26094) ([zhaoyu](https://github.com/zxc111)). +* Support arguments of `UUID` type for `empty` and `notEmpty` functions. `UUID` is empty if it is all zeros (nil UUID). Closes [#3446](https://github.com/ClickHouse/ClickHouse/issues/3446). [#25974](https://github.com/ClickHouse/ClickHouse/pull/25974) ([zhaoyu](https://github.com/zxc111)). +* Add support for `SET SQL_SELECT_LIMIT` in MySQL protocol. Closes [#17115](https://github.com/ClickHouse/ClickHouse/issues/17115). [#25972](https://github.com/ClickHouse/ClickHouse/pull/25972) ([Kseniia Sumarokova](https://github.com/kssenii)). +* More instrumentation for network interaction: add counters for recv/send bytes; add gauges for recvs/sends. Added missing documentation. Close [#5897](https://github.com/ClickHouse/ClickHouse/issues/5897). [#25962](https://github.com/ClickHouse/ClickHouse/pull/25962) ([alexey-milovidov](https://github.com/alexey-milovidov)). +* Add setting `optimize_move_to_prewhere_if_final`. If query has `FINAL`, the optimization `move_to_prewhere` will be enabled only if both `optimize_move_to_prewhere` and `optimize_move_to_prewhere_if_final` are enabled. Closes [#8684](https://github.com/ClickHouse/ClickHouse/issues/8684). [#25940](https://github.com/ClickHouse/ClickHouse/pull/25940) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Allow complex quoted identifiers of JOINed tables. Close [#17861](https://github.com/ClickHouse/ClickHouse/issues/17861). [#25924](https://github.com/ClickHouse/ClickHouse/pull/25924) ([alexey-milovidov](https://github.com/alexey-milovidov)). +* Add support for Unicode (e.g. Chinese, Cyrillic) components in `Nested` data types. Close [#25594](https://github.com/ClickHouse/ClickHouse/issues/25594). 
[#25923](https://github.com/ClickHouse/ClickHouse/pull/25923) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Allow `quantiles*` functions to work with `aggregate_functions_null_for_empty`. Close [#25892](https://github.com/ClickHouse/ClickHouse/issues/25892). [#25919](https://github.com/ClickHouse/ClickHouse/pull/25919) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Allow parameters for parametric aggregate functions to be arbitrary constant expressions (e.g., `1 + 2`), not just literals. It also allows using the query parameters (in parameterized queries like `{param:UInt8}`) inside parametric aggregate functions. Closes [#11607](https://github.com/ClickHouse/ClickHouse/issues/11607). [#25910](https://github.com/ClickHouse/ClickHouse/pull/25910) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Correctly throw the exception on the attempt to parse an invalid `Date`. Closes [#6481](https://github.com/ClickHouse/ClickHouse/issues/6481). [#25909](https://github.com/ClickHouse/ClickHouse/pull/25909) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Support for multiple includes in configuration. It is possible to include the users configuration or remote server configuration from multiple sources. Simply place an `<include>` element with a `from_zk`, `from_env` or `incl` attribute, and it will be replaced with the substitution. [#24404](https://github.com/ClickHouse/ClickHouse/pull/24404) ([nvartolomei](https://github.com/nvartolomei)).
+* Support for queries with a column named `"null"` (it must be specified in back-ticks or double quotes) and `ON CLUSTER`. Closes [#24035](https://github.com/ClickHouse/ClickHouse/issues/24035). [#25907](https://github.com/ClickHouse/ClickHouse/pull/25907) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Support `LowCardinality`, `Decimal`, and `UUID` for `JSONExtract`. Closes [#24606](https://github.com/ClickHouse/ClickHouse/issues/24606). [#25900](https://github.com/ClickHouse/ClickHouse/pull/25900) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Convert the history file from `readline` format to `replxx` format. [#25888](https://github.com/ClickHouse/ClickHouse/pull/25888) ([Azat Khuzhin](https://github.com/azat)).
+* Fix an issue which can lead to intersecting parts after `DROP PART` or background deletion of an empty part. [#25884](https://github.com/ClickHouse/ClickHouse/pull/25884) ([alesapin](https://github.com/alesapin)).
+* Better handling of lost parts for `ReplicatedMergeTree` tables. Fixes rare inconsistencies in `ReplicationQueue`. Fixes [#10368](https://github.com/ClickHouse/ClickHouse/issues/10368). [#25820](https://github.com/ClickHouse/ClickHouse/pull/25820) ([alesapin](https://github.com/alesapin)).
+* Allow starting clickhouse-client with an unreadable working directory. [#25817](https://github.com/ClickHouse/ClickHouse/pull/25817) ([ianton-ru](https://github.com/ianton-ru)).
+* Fix the "No available columns" error for `Merge` storage. [#25801](https://github.com/ClickHouse/ClickHouse/pull/25801) ([Azat Khuzhin](https://github.com/azat)).
+* MySQL Engine now supports the exchange of column comments between MySQL and ClickHouse. [#25795](https://github.com/ClickHouse/ClickHouse/pull/25795) ([Storozhuk Kostiantyn](https://github.com/sand6255)).
+* Fix inconsistent behaviour of `GROUP BY` constant on an empty set. Closes [#6842](https://github.com/ClickHouse/ClickHouse/issues/6842). [#25786](https://github.com/ClickHouse/ClickHouse/pull/25786) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Cancel already running merges in a partition on `DROP PARTITION` and `TRUNCATE` for `ReplicatedMergeTree`. Resolves [#17151](https://github.com/ClickHouse/ClickHouse/issues/17151). [#25684](https://github.com/ClickHouse/ClickHouse/pull/25684) ([tavplubix](https://github.com/tavplubix)).
+* Support `ENUM` data type for MaterializeMySQL. [#25676](https://github.com/ClickHouse/ClickHouse/pull/25676) ([Storozhuk Kostiantyn](https://github.com/sand6255)).
+* Support materialized and aliased columns in JOIN, close [#13274](https://github.com/ClickHouse/ClickHouse/issues/13274). [#25634](https://github.com/ClickHouse/ClickHouse/pull/25634) ([Vladimir C](https://github.com/vdimir)).
+* Fix possible logical race condition between `ALTER TABLE ... DETACH` and background merges. [#25605](https://github.com/ClickHouse/ClickHouse/pull/25605) ([Azat Khuzhin](https://github.com/azat)).
+* Make the `NetworkReceiveElapsedMicroseconds` metric correctly include the time spent waiting for data from the client to `INSERT`. Close [#9958](https://github.com/ClickHouse/ClickHouse/issues/9958). [#25602](https://github.com/ClickHouse/ClickHouse/pull/25602) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Support `TRUNCATE TABLE` for S3 and HDFS. Close [#25530](https://github.com/ClickHouse/ClickHouse/issues/25530). [#25550](https://github.com/ClickHouse/ClickHouse/pull/25550) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Support for dynamic reloading of the config to change the number of threads in the pool for background job execution (merges, mutations, fetches). [#25548](https://github.com/ClickHouse/ClickHouse/pull/25548) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
+* Allow extracting a non-string element as a string using `JSONExtract`. This is for [#25414](https://github.com/ClickHouse/ClickHouse/issues/25414). [#25452](https://github.com/ClickHouse/ClickHouse/pull/25452) ([Amos Bird](https://github.com/amosbird)).
+* Support regular expressions in the `Database` argument for `StorageMerge`. Close [#776](https://github.com/ClickHouse/ClickHouse/issues/776). [#25064](https://github.com/ClickHouse/ClickHouse/pull/25064) ([flynn](https://github.com/ucasfl)).
+* Web UI: if the value looks like a URL, automatically generate a link. [#25965](https://github.com/ClickHouse/ClickHouse/pull/25965) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Make `sudo service clickhouse-server start` work on systems with `systemd`, like CentOS 8. Close [#14298](https://github.com/ClickHouse/ClickHouse/issues/14298). Close [#17799](https://github.com/ClickHouse/ClickHouse/issues/17799). [#25921](https://github.com/ClickHouse/ClickHouse/pull/25921) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+
+#### Bug Fixes
+
+* Fix incorrect `SET ROLE` in some cases. [#26707](https://github.com/ClickHouse/ClickHouse/pull/26707) ([Vitaly Baranov](https://github.com/vitlibar)).
+* Fix potential `nullptr` dereference in window functions. Fix [#25276](https://github.com/ClickHouse/ClickHouse/issues/25276). [#26668](https://github.com/ClickHouse/ClickHouse/pull/26668) ([Alexander Kuzmenkov](https://github.com/akuzm)).
+* Fix incorrect function names of `groupBitmapAnd/Or/Xor`. Fix [#26557](https://github.com/ClickHouse/ClickHouse/pull/26557) ([Amos Bird](https://github.com/amosbird)).
+* Fix crash in RabbitMQ shutdown in case the RabbitMQ setup was not started. Closes [#26504](https://github.com/ClickHouse/ClickHouse/issues/26504).
[#26529](https://github.com/ClickHouse/ClickHouse/pull/26529) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fix issues with the `CREATE DICTIONARY` query if the dictionary name or database name was quoted. Closes [#26491](https://github.com/ClickHouse/ClickHouse/issues/26491). [#26508](https://github.com/ClickHouse/ClickHouse/pull/26508) ([Maksim Kita](https://github.com/kitaisreal)).
+* Fix broken name resolution after rewriting column aliases. Fix [#26432](https://github.com/ClickHouse/ClickHouse/issues/26432). [#26475](https://github.com/ClickHouse/ClickHouse/pull/26475) ([Amos Bird](https://github.com/amosbird)).
+* Fix infinite non-joined block stream in `partial_merge_join`, close [#26325](https://github.com/ClickHouse/ClickHouse/issues/26325). [#26374](https://github.com/ClickHouse/ClickHouse/pull/26374) ([Vladimir C](https://github.com/vdimir)).
+* Fix possible crash when logging in as a dropped user. Fix [#26073](https://github.com/ClickHouse/ClickHouse/issues/26073). [#26363](https://github.com/ClickHouse/ClickHouse/pull/26363) ([Vitaly Baranov](https://github.com/vitlibar)).
+* Fix `optimize_distributed_group_by_sharding_key` for multiple columns (it could lead to an incorrect result with `optimize_skip_unused_shards=1`/`allow_nondeterministic_optimize_skip_unused_shards=1` and multiple columns in the sharding key expression). [#26353](https://github.com/ClickHouse/ClickHouse/pull/26353) ([Azat Khuzhin](https://github.com/azat)).
+* `CAST` from `Date` to `DateTime` (or `DateTime64`) was not using the timezone of the `DateTime` type. It can also affect the comparison between `Date` and `DateTime`. Inference of the common type for `Date` and `DateTime` also was not using the corresponding timezone. It affected the results of function `if` and array construction. Closes [#24128](https://github.com/ClickHouse/ClickHouse/issues/24128). [#24129](https://github.com/ClickHouse/ClickHouse/pull/24129) ([Maksim Kita](https://github.com/kitaisreal)).
+* Fixed a rare bug in lost replica recovery that could cause replicas to diverge. [#26321](https://github.com/ClickHouse/ClickHouse/pull/26321) ([tavplubix](https://github.com/tavplubix)).
+* Fix zstd decompression in case there are escape sequences at the end of the internal buffer. Closes [#26013](https://github.com/ClickHouse/ClickHouse/issues/26013). [#26314](https://github.com/ClickHouse/ClickHouse/pull/26314) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fix logical error on join with totals, close [#26017](https://github.com/ClickHouse/ClickHouse/issues/26017). [#26250](https://github.com/ClickHouse/ClickHouse/pull/26250) ([Vladimir C](https://github.com/vdimir)).
+* Remove excessive newline in the `thread_name` column in the `system.stack_trace` table. Fix [#24124](https://github.com/ClickHouse/ClickHouse/issues/24124). [#26210](https://github.com/ClickHouse/ClickHouse/pull/26210) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Fix `joinGet` with `LowCardinality` columns, close [#25993](https://github.com/ClickHouse/ClickHouse/issues/25993). [#26118](https://github.com/ClickHouse/ClickHouse/pull/26118) ([Vladimir C](https://github.com/vdimir)).
+* Fix possible crash in `pointInPolygon` if the setting `validate_polygons` is turned off. [#26113](https://github.com/ClickHouse/ClickHouse/pull/26113) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Fix throwing an exception when iterating over a non-existing remote directory. [#26087](https://github.com/ClickHouse/ClickHouse/pull/26087) ([ianton-ru](https://github.com/ianton-ru)).
+* Fix a rare server crash because of `abort` in the ZooKeeper client. Fixes [#25813](https://github.com/ClickHouse/ClickHouse/issues/25813). [#26079](https://github.com/ClickHouse/ClickHouse/pull/26079) ([alesapin](https://github.com/alesapin)).
+* Fix wrong thread count estimation for right subquery join in some cases. Close [#24075](https://github.com/ClickHouse/ClickHouse/issues/24075). [#26052](https://github.com/ClickHouse/ClickHouse/pull/26052) ([Vladimir C](https://github.com/vdimir)).
+* Fixed incorrect `sequence_id` in MySQL protocol packets that ClickHouse sends on exception during query execution. It might cause the MySQL client to reset the connection to the ClickHouse server. Fixes [#21184](https://github.com/ClickHouse/ClickHouse/issues/21184). [#26051](https://github.com/ClickHouse/ClickHouse/pull/26051) ([tavplubix](https://github.com/tavplubix)).
+* Fix possible mismatched header when using a normal projection with `PREWHERE`. Fix [#26020](https://github.com/ClickHouse/ClickHouse/issues/26020). [#26038](https://github.com/ClickHouse/ClickHouse/pull/26038) ([Amos Bird](https://github.com/amosbird)).
+* Fix formatting of type `Map` with integer keys to `JSON`. [#25982](https://github.com/ClickHouse/ClickHouse/pull/25982) ([Anton Popov](https://github.com/CurtizJ)).
+* Fix possible deadlock during query profiler stack unwinding. Fix [#25968](https://github.com/ClickHouse/ClickHouse/issues/25968). [#25970](https://github.com/ClickHouse/ClickHouse/pull/25970) ([Maksim Kita](https://github.com/kitaisreal)).
+* Fix crash on calling `dictGet()` with bad arguments. [#25913](https://github.com/ClickHouse/ClickHouse/pull/25913) ([Vitaly Baranov](https://github.com/vitlibar)).
+* Fixed `scram-sha-256` authentication for PostgreSQL engines. Closes [#24516](https://github.com/ClickHouse/ClickHouse/issues/24516). [#25906](https://github.com/ClickHouse/ClickHouse/pull/25906) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fix extremely long backoff for background tasks when the background pool is full. Fixes [#25836](https://github.com/ClickHouse/ClickHouse/issues/25836). [#25893](https://github.com/ClickHouse/ClickHouse/pull/25893) ([alesapin](https://github.com/alesapin)).
+* Fix ARM exception handling with a non-default page size. Fixes [#25512](https://github.com/ClickHouse/ClickHouse/issues/25512), [#25044](https://github.com/ClickHouse/ClickHouse/issues/25044), [#24901](https://github.com/ClickHouse/ClickHouse/issues/24901), [#23183](https://github.com/ClickHouse/ClickHouse/issues/23183), [#20221](https://github.com/ClickHouse/ClickHouse/issues/20221), [#19703](https://github.com/ClickHouse/ClickHouse/issues/19703), [#19028](https://github.com/ClickHouse/ClickHouse/issues/19028), [#18391](https://github.com/ClickHouse/ClickHouse/issues/18391), [#18121](https://github.com/ClickHouse/ClickHouse/issues/18121), [#17994](https://github.com/ClickHouse/ClickHouse/issues/17994), [#12483](https://github.com/ClickHouse/ClickHouse/issues/12483). [#25854](https://github.com/ClickHouse/ClickHouse/pull/25854) ([Maksim Kita](https://github.com/kitaisreal)).
+* Fix sharding_key from column w/o function for `remote()` (before, `select * from remote('127.1', system.one, dummy)` led to an `Unknown column: dummy, there are only columns .` error). [#25824](https://github.com/ClickHouse/ClickHouse/pull/25824) ([Azat Khuzhin](https://github.com/azat)).
+* Fixed `Not found column ...` and `Missing column ...` errors when selecting from `MaterializeMySQL`.
Fixes [#23708](https://github.com/ClickHouse/ClickHouse/issues/23708), [#24830](https://github.com/ClickHouse/ClickHouse/issues/24830), [#25794](https://github.com/ClickHouse/ClickHouse/issues/25794). [#25822](https://github.com/ClickHouse/ClickHouse/pull/25822) ([tavplubix](https://github.com/tavplubix)).
+* Fix `optimize_skip_unused_shards_rewrite_in` for non-UInt64 types (it may eventually select incorrect shards or throw `Cannot infer type of an empty tuple` or `Function tuple requires at least one argument`). [#25798](https://github.com/ClickHouse/ClickHouse/pull/25798) ([Azat Khuzhin](https://github.com/azat)).
+* Fix a rare bug with the `DROP PART` query for `ReplicatedMergeTree` tables which could lead to the error message `Unexpected merged part intersecting drop range`. [#25783](https://github.com/ClickHouse/ClickHouse/pull/25783) ([alesapin](https://github.com/alesapin)).
+* Fix a bug in `TTL` with a `GROUP BY` expression which refused to execute `TTL` after its first execution in a part. [#25743](https://github.com/ClickHouse/ClickHouse/pull/25743) ([alesapin](https://github.com/alesapin)).
+* Allow StorageMerge to access tables with aliases. Closes [#6051](https://github.com/ClickHouse/ClickHouse/issues/6051). [#25694](https://github.com/ClickHouse/ClickHouse/pull/25694) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fix slow dictionary join in some cases, close [#24209](https://github.com/ClickHouse/ClickHouse/issues/24209). [#25618](https://github.com/ClickHouse/ClickHouse/pull/25618) ([Vladimir C](https://github.com/vdimir)).
+* Fix `ALTER MODIFY COLUMN` for columns which participate in TTL expressions. [#25554](https://github.com/ClickHouse/ClickHouse/pull/25554) ([Anton Popov](https://github.com/CurtizJ)).
+* Fix assertion in `PREWHERE` with non-UInt8 type, close [#19589](https://github.com/ClickHouse/ClickHouse/issues/19589). [#25484](https://github.com/ClickHouse/ClickHouse/pull/25484) ([Vladimir C](https://github.com/vdimir)).
+* Fix a crash found by MSan fuzzing. Fixes [#22517](https://github.com/ClickHouse/ClickHouse/issues/22517). [#26428](https://github.com/ClickHouse/ClickHouse/pull/26428) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Update the `chown` command check in the `clickhouse-server` Docker entrypoint. It fixes the 'cluster pod restart failed (or timeout)' error on Kubernetes. [#26545](https://github.com/ClickHouse/ClickHouse/pull/26545) ([Ky Li](https://github.com/Kylinrix)).
+
+
### ClickHouse release v21.7, 2021-07-09

#### Backward Incompatible Change
@@ -1183,13 +1282,6 @@
* PODArray: Avoid call to memcpy with (nullptr, 0) arguments (Fix UBSan report). This fixes [#18525](https://github.com/ClickHouse/ClickHouse/issues/18525). [#18526](https://github.com/ClickHouse/ClickHouse/pull/18526) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Minor improvement for path concatenation of zookeeper paths inside DDLWorker. [#17767](https://github.com/ClickHouse/ClickHouse/pull/17767) ([Bharat Nallan](https://github.com/bharatnc)).
* Allow to reload symbols from debug file. This PR also fixes a build-id issue. [#17637](https://github.com/ClickHouse/ClickHouse/pull/17637) ([Amos Bird](https://github.com/amosbird)).
-* TestFlows: fixes to LDAP tests that fail due to slow test execution. [#18790](https://github.com/ClickHouse/ClickHouse/pull/18790) ([vzakaznikov](https://github.com/vzakaznikov)).
-* TestFlows: Merging requirements for AES encryption functions. Updating aes_encryption tests to use new requirements. Updating TestFlows version to 1.6.72.
[#18221](https://github.com/ClickHouse/ClickHouse/pull/18221) ([vzakaznikov](https://github.com/vzakaznikov)). -* TestFlows: Updating TestFlows version to the latest 1.6.72. Re-generating requirements.py. [#18208](https://github.com/ClickHouse/ClickHouse/pull/18208) ([vzakaznikov](https://github.com/vzakaznikov)). -* TestFlows: Updating TestFlows README.md to include "How To Debug Why Test Failed" section. [#17808](https://github.com/ClickHouse/ClickHouse/pull/17808) ([vzakaznikov](https://github.com/vzakaznikov)). -* TestFlows: tests for RBAC [ACCESS MANAGEMENT](https://clickhouse.tech/docs/en/sql-reference/statements/grant/#grant-access-management) privileges. [#17804](https://github.com/ClickHouse/ClickHouse/pull/17804) ([MyroTk](https://github.com/MyroTk)). -* TestFlows: RBAC tests for SHOW, TRUNCATE, KILL, and OPTIMIZE. - Updates to old tests. - Resolved comments from #https://github.com/ClickHouse/ClickHouse/pull/16977. [#17657](https://github.com/ClickHouse/ClickHouse/pull/17657) ([MyroTk](https://github.com/MyroTk)). -* TestFlows: Added RBAC tests for `ATTACH`, `CREATE`, `DROP`, and `DETACH`. [#16977](https://github.com/ClickHouse/ClickHouse/pull/16977) ([MyroTk](https://github.com/MyroTk)). ## [Changelog for 2020](https://github.com/ClickHouse/ClickHouse/blob/master/docs/en/whats-new/changelog/2020.md) diff --git a/README.md b/README.md index 496a6357f44..178547ea523 100644 --- a/README.md +++ b/README.md @@ -13,3 +13,6 @@ ClickHouse® is an open-source column-oriented database management system that a * [Code Browser](https://clickhouse.tech/codebrowser/html_report/ClickHouse/index.html) with syntax highlight and navigation. * [Contacts](https://clickhouse.tech/#contacts) can help to get your questions answered if there are any. * You can also [fill this form](https://clickhouse.tech/#meet) to meet Yandex ClickHouse team in person. + +## Upcoming Events +* [SF Bay Area ClickHouse August Community Meetup (online)](https://www.meetup.com/San-Francisco-Bay-Area-ClickHouse-Meetup/events/279109379/) on 25 August 2021. diff --git a/base/common/DateLUTImpl.cpp b/base/common/DateLUTImpl.cpp index e7faeb63760..472f24f3805 100644 --- a/base/common/DateLUTImpl.cpp +++ b/base/common/DateLUTImpl.cpp @@ -60,6 +60,7 @@ DateLUTImpl::DateLUTImpl(const std::string & time_zone_) offset_at_start_of_epoch = cctz_time_zone.lookup(cctz_time_zone.lookup(epoch).pre).offset; offset_at_start_of_lut = cctz_time_zone.lookup(cctz_time_zone.lookup(lut_start).pre).offset; offset_is_whole_number_of_hours_during_epoch = true; + offset_is_whole_number_of_minutes_during_epoch = true; cctz::civil_day date = lut_start; @@ -108,6 +109,9 @@ DateLUTImpl::DateLUTImpl(const std::string & time_zone_) if (offset_is_whole_number_of_hours_during_epoch && start_of_day > 0 && start_of_day % 3600) offset_is_whole_number_of_hours_during_epoch = false; + if (offset_is_whole_number_of_minutes_during_epoch && start_of_day > 0 && start_of_day % 60) + offset_is_whole_number_of_minutes_during_epoch = false; + /// If UTC offset was changed this day. /// Change in time zone without transition is possible, e.g. Moscow 1991 Sun, 31 Mar, 02:00 MSK to EEST cctz::time_zone::civil_transition transition{}; diff --git a/base/common/DateLUTImpl.h b/base/common/DateLUTImpl.h index 202eb88a361..012d2cefe84 100644 --- a/base/common/DateLUTImpl.h +++ b/base/common/DateLUTImpl.h @@ -193,6 +193,7 @@ private: /// UTC offset at the beginning of the first supported year. 
    Time offset_at_start_of_lut;
    bool offset_is_whole_number_of_hours_during_epoch;
+   bool offset_is_whole_number_of_minutes_during_epoch;

    /// Time zone name.
    std::string time_zone;
@@ -251,18 +252,23 @@ private:
    }

    template <typename T, typename Divisor>
-   static inline T roundDown(T x, Divisor divisor)
+   inline T roundDown(T x, Divisor divisor) const
    {
        static_assert(std::is_integral_v<T> && std::is_integral_v<Divisor>);
        assert(divisor > 0);

-       if (likely(x >= 0))
-           return x / divisor * divisor;
+       if (likely(offset_is_whole_number_of_hours_during_epoch))
+       {
+           if (likely(x >= 0))
+               return x / divisor * divisor;

-       /// Integer division for negative numbers rounds them towards zero (up).
-       /// We will shift the number so it will be rounded towards -inf (down).
+           /// Integer division for negative numbers rounds them towards zero (up).
+           /// We will shift the number so it will be rounded towards -inf (down).
+           return (x + 1 - divisor) / divisor * divisor;
+       }

-       return (x + 1 - divisor) / divisor * divisor;
+       Time date = find(x).date;
+       return date + (x - date) / divisor * divisor;
    }

public:
@@ -459,10 +465,21 @@ public:
    inline unsigned toSecond(Time t) const
    {
-       auto res = t % 60;
-       if (likely(res >= 0))
-           return res;
-       return res + 60;
+       if (likely(offset_is_whole_number_of_minutes_during_epoch))
+       {
+           Time res = t % 60;
+           if (likely(res >= 0))
+               return res;
+           return res + 60;
+       }
+
+       LUTIndex index = findIndex(t);
+       Time time = t - lut[index].date;
+
+       if (time >= lut[index].time_at_offset_change())
+           time += lut[index].amount_of_offset_change();
+
+       return time % 60;
    }

    inline unsigned toMinute(Time t) const
@@ -483,29 +500,11 @@ public:
    }

    /// NOTE: Assuming timezone offset is a multiple of 15 minutes.
-   inline Time toStartOfMinute(Time t) const { return roundDown(t, 60); }
-   inline Time toStartOfFiveMinute(Time t) const { return roundDown(t, 300); }
-   inline Time toStartOfFifteenMinutes(Time t) const { return roundDown(t, 900); }
-
-   inline Time toStartOfTenMinutes(Time t) const
-   {
-       if (t >= 0 && offset_is_whole_number_of_hours_during_epoch)
-           return t / 600 * 600;
-
-       /// More complex logic is for Nepal - it has offset 05:45. Australia/Eucla is also unfortunate.
-       Time date = find(t).date;
-       return date + (t - date) / 600 * 600;
-   }
-
-   /// NOTE: Assuming timezone transitions are multiple of hours. Lord Howe Island in Australia is a notable exception.
-   inline Time toStartOfHour(Time t) const
-   {
-       if (t >= 0 && offset_is_whole_number_of_hours_during_epoch)
-           return t / 3600 * 3600;
-
-       Time date = find(t).date;
-       return date + (t - date) / 3600 * 3600;
-   }
+   inline Time toStartOfMinute(Time t) const { return toStartOfMinuteInterval(t, 1); }
+   inline Time toStartOfFiveMinute(Time t) const { return toStartOfMinuteInterval(t, 5); }
+   inline Time toStartOfFifteenMinutes(Time t) const { return toStartOfMinuteInterval(t, 15); }
+   inline Time toStartOfTenMinutes(Time t) const { return toStartOfMinuteInterval(t, 10); }
+   inline Time toStartOfHour(Time t) const { return roundDown(t, 3600); }

    /** Number of calendar day since the beginning of UNIX epoch (1970-01-01 is zero)
      * We use just two bytes for it. It covers the range up to 2105 and slightly more.
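For reference, the user-visible rounding behaviour implemented by the `roundDown`/`toStartOf*` methods above looks like this (a sketch, assuming the session timezone's UTC offset is a whole number of minutes; the sample timestamps are illustrative only):

```sql
SELECT
    toStartOfMinute(toDateTime('2021-08-12 14:37:45')),          -- 2021-08-12 14:37:00
    toStartOfFifteenMinutes(toDateTime('2021-08-12 14:37:45')),  -- 2021-08-12 14:30:00
    toStartOfHour(toDateTime('2021-08-12 14:37:45'));            -- 2021-08-12 14:00:00
```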
@@ -903,25 +902,24 @@ public:
    inline Time toStartOfMinuteInterval(Time t, UInt64 minutes) const
    {
-       if (minutes == 1)
-           return toStartOfMinute(t);
+       UInt64 divisor = 60 * minutes;
+       if (likely(offset_is_whole_number_of_minutes_during_epoch))
+       {
+           if (likely(t >= 0))
+               return t / divisor * divisor;
+           return (t + 1 - divisor) / divisor * divisor;
+       }

-       /** In contrast to "toStartOfHourInterval" function above,
-         * the minute intervals are not aligned to the midnight.
-         * You will get unexpected results if for example, you round down to 60 minute interval
-         * and there was a time shift to 30 minutes.
-         *
-         * But this is not specified in docs and can be changed in future.
-         */
-
-       UInt64 seconds = 60 * minutes;
-       return roundDown(t, seconds);
+       Time date = find(t).date;
+       return date + (t - date) / divisor * divisor;
    }

    inline Time toStartOfSecondInterval(Time t, UInt64 seconds) const
    {
        if (seconds == 1)
            return t;
+       if (seconds % 60 == 0)
+           return toStartOfMinuteInterval(t, seconds / 60);

        return roundDown(t, seconds);
    }

@@ -955,7 +953,7 @@ public:
    inline Time makeDateTime(Int16 year, UInt8 month, UInt8 day_of_month, UInt8 hour, UInt8 minute, UInt8 second) const
    {
        size_t index = makeLUTIndex(year, month, day_of_month);
-       UInt32 time_offset = hour * 3600 + minute * 60 + second;
+       Time time_offset = hour * 3600 + minute * 60 + second;

        if (time_offset >= lut[index].time_at_offset_change())
            time_offset -= lut[index].amount_of_offset_change();
diff --git a/base/glibc-compatibility/musl/getauxval.c b/base/glibc-compatibility/musl/getauxval.c
index a429273fa1a..dad7aa938d7 100644
--- a/base/glibc-compatibility/musl/getauxval.c
+++ b/base/glibc-compatibility/musl/getauxval.c
@@ -1,4 +1,5 @@
#include <sys/auxv.h>
+#include "atomic.h"
#include <unistd.h> // __environ
#include <errno.h>

@@ -17,18 +18,7 @@ static size_t __find_auxv(unsigned long type)
    return (size_t) -1;
}

-__attribute__((constructor)) static void __auxv_init()
-{
-    size_t i;
-    for (i = 0; __environ[i]; i++);
-    __auxv = (unsigned long *) (__environ + i + 1);
-
-    size_t secure_idx = __find_auxv(AT_SECURE);
-    if (secure_idx != ((size_t) -1))
-        __auxv_secure = __auxv[secure_idx];
-}
-
-unsigned long getauxval(unsigned long type)
+unsigned long __getauxval(unsigned long type)
{
    if (type == AT_SECURE)
        return __auxv_secure;

@@ -43,3 +33,38 @@ unsigned long getauxval(unsigned long type)
    errno = ENOENT;
    return 0;
}
+
+static void * volatile getauxval_func;
+
+static unsigned long __auxv_init(unsigned long type)
+{
+    if (!__environ)
+    {
+        // __environ is not initialized yet, so we can't initialize __auxv right now.
+        // That normally occurs only when getauxval() is called from some sanitizer's internal code.
+        errno = ENOENT;
+        return 0;
+    }
+
+    // Initialize __auxv and __auxv_secure.
+    size_t i;
+    for (i = 0; __environ[i]; i++);
+    __auxv = (unsigned long *) (__environ + i + 1);
+
+    size_t secure_idx = __find_auxv(AT_SECURE);
+    if (secure_idx != ((size_t) -1))
+        __auxv_secure = __auxv[secure_idx];
+
+    // Now that __auxv is initialized, subsequent calls to getauxval() will go straight to __getauxval().
+    a_cas_p(&getauxval_func, (void *)__auxv_init, (void *)__getauxval);
+
+    return __getauxval(type);
+}
+
+// The first time, getauxval() will call __auxv_init().
+static void * volatile getauxval_func = (void *)__auxv_init;
+
+unsigned long getauxval(unsigned long type)
+{
+    return ((unsigned long (*)(unsigned long))getauxval_func)(type);
+}
diff --git a/base/mysqlxx/Pool.cpp b/base/mysqlxx/Pool.cpp
index 386b4544b78..2f47aa67356 100644
--- a/base/mysqlxx/Pool.cpp
+++ b/base/mysqlxx/Pool.cpp
@@ -296,7 +296,7 @@ void Pool::initialize()
Pool::Connection * Pool::allocConnection(bool dont_throw_if_failed_first_time)
{
-    std::unique_ptr<Connection> conn_ptr{new Connection};
+    std::unique_ptr<Connection> conn_ptr = std::make_unique<Connection>();

    try
    {
diff --git a/contrib/croaring-cmake/CMakeLists.txt b/contrib/croaring-cmake/CMakeLists.txt
index f0cb378864b..84cdccedbd3 100644
--- a/contrib/croaring-cmake/CMakeLists.txt
+++ b/contrib/croaring-cmake/CMakeLists.txt
@@ -26,17 +26,14 @@ target_include_directories(roaring SYSTEM BEFORE PUBLIC "${LIBRARY_DIR}/include"
target_include_directories(roaring SYSTEM BEFORE PUBLIC "${LIBRARY_DIR}/cpp")

# We redirect malloc/free family of functions to different functions that will track memory in ClickHouse.
-# It will make this library depend on linking to 'clickhouse_common_io' library that is not done explicitly via 'target_link_libraries'.
-# And we check that all libraries dependencies are satisfied and all symbols are resolved if we do build with shared libraries.
-# That's why we enable it only in static build.
# Also note that we exploit implicit function declarations.
-if (USE_STATIC_LIBRARIES)
-    target_compile_definitions(roaring PRIVATE
+target_compile_definitions(roaring PRIVATE
    -Dmalloc=clickhouse_malloc
    -Dcalloc=clickhouse_calloc
    -Drealloc=clickhouse_realloc
    -Dreallocarray=clickhouse_reallocarray
    -Dfree=clickhouse_free
    -Dposix_memalign=clickhouse_posix_memalign)
-endif ()
+
+target_link_libraries(roaring PUBLIC clickhouse_common_io)
diff --git a/docs/en/development/build.md b/docs/en/development/build.md
index 97b477d55a5..be45c1ed5f7 100644
--- a/docs/en/development/build.md
+++ b/docs/en/development/build.md
@@ -155,6 +155,10 @@ Normally ClickHouse is statically linked into a single static `clickhouse` binar
-DUSE_STATIC_LIBRARIES=0 -DSPLIT_SHARED_LIBRARIES=1 -DCLICKHOUSE_SPLIT_BINARY=1
```
-Note that in this configuration there is no single `clickhouse` binary, and you have to run `clickhouse-server`, `clickhouse-client` etc.
+Note that the split build has several drawbacks:
+* There is no single `clickhouse` binary, and you have to run `clickhouse-server`, `clickhouse-client`, etc.
+* Risk of segfault if you run any of the programs while rebuilding the project.
+* You cannot run the integration tests since they only work with a single complete binary.
+* You can't easily copy the binaries elsewhere. Instead of moving a single binary you'll need to copy all binaries and libraries.

[Original article](https://clickhouse.tech/docs/en/development/build/)
diff --git a/docs/en/engines/database-engines/materialized-mysql.md b/docs/en/engines/database-engines/materialized-mysql.md
index ca550776d53..d329dff32c5 100644
--- a/docs/en/engines/database-engines/materialized-mysql.md
+++ b/docs/en/engines/database-engines/materialized-mysql.md
@@ -1,6 +1,6 @@
---
toc_priority: 29
-toc_title: MaterializedMySQL
+toc_title: "[experimental] MaterializedMySQL"
---

# [experimental] MaterializedMySQL {#materialized-mysql}

@@ -27,28 +27,33 @@ ENGINE = MaterializedMySQL('host:port', ['database' | database], 'user', 'passwo
- `password` — User password.
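For illustration, creating a database with this engine looks like the following (the host, database name and credentials are placeholders, not values from this patch):

```sql
CREATE DATABASE mysql_replica
ENGINE = MaterializedMySQL('mysql-host:3306', 'source_db', 'replication_user', 'secret');
```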
**Engine Settings**
-- `max_rows_in_buffer` — Max rows that data is allowed to cache in memory(for single table and the cache data unable to query). when rows is exceeded, the data will be materialized. Default: `65505`.
-- `max_bytes_in_buffer` — Max bytes that data is allowed to cache in memory(for single table and the cache data unable to query). when rows is exceeded, the data will be materialized. Default: `1048576`.
-- `max_rows_in_buffers` — Max rows that data is allowed to cache in memory(for database and the cache data unable to query). when rows is exceeded, the data will be materialized. Default: `65505`.
-- `max_bytes_in_buffers` — Max bytes that data is allowed to cache in memory(for database and the cache data unable to query). when rows is exceeded, the data will be materialized. Default: `1048576`.
-- `max_flush_data_time` — Max milliseconds that data is allowed to cache in memory(for database and the cache data unable to query). when this time is exceeded, the data will be materialized. Default: `1000`.
-- `max_wait_time_when_mysql_unavailable` — Retry interval when MySQL is not available (milliseconds). Negative value disable retry. Default: `1000`.
-- `allows_query_when_mysql_lost` — Allow query materialized table when mysql is lost. Default: `0` (`false`).
-```
+
+- `max_rows_in_buffer` — Maximum number of rows that data is allowed to cache in memory (for single table and the cache data unable to query). When this number is exceeded, the data will be materialized. Default: `65 505`.
+- `max_bytes_in_buffer` — Maximum number of bytes that data is allowed to cache in memory (for single table and the cache data unable to query). When this number is exceeded, the data will be materialized. Default: `1 048 576`.
+- `max_rows_in_buffers` — Maximum number of rows that data is allowed to cache in memory (for database and the cache data unable to query). When this number is exceeded, the data will be materialized. Default: `65 505`.
+- `max_bytes_in_buffers` — Maximum number of bytes that data is allowed to cache in memory (for database and the cache data unable to query). When this number is exceeded, the data will be materialized. Default: `1 048 576`.
+- `max_flush_data_time` — Maximum number of milliseconds that data is allowed to cache in memory (for database and the cache data unable to query). When this time is exceeded, the data will be materialized. Default: `1000`.
+- `max_wait_time_when_mysql_unavailable` — Retry interval when MySQL is not available (milliseconds). Negative value disables retry. Default: `1000`.
+- `allows_query_when_mysql_lost` — Allows querying a materialized table when MySQL is lost. Default: `0` (`false`).
+
+```sql
+CREATE DATABASE mysql ENGINE = MaterializedMySQL('localhost:3306', 'db', 'user', '***')
+     SETTINGS
+        allows_query_when_mysql_lost=true,
+        max_wait_time_when_mysql_unavailable=10000;
+```

-**Settings on MySQL-server side**
+**Settings on MySQL-server Side**

-For the correct work of `MaterializedMySQL`, there are few mandatory `MySQL`-side configuration settings that should be set:
+For the correct work of `MaterializedMySQL`, there are a few mandatory `MySQL`-side configuration settings that must be set:

- `default_authentication_plugin = mysql_native_password` since `MaterializedMySQL` can only authorize with this method.
-- `gtid_mode = on` since GTID based logging is a mandatory for providing correct `MaterializedMySQL` replication. Pay attention that while turning this mode `On` you should also specify `enforce_gtid_consistency = on`.
+- `gtid_mode = on` since GTID based logging is mandatory for providing correct `MaterializedMySQL` replication.

-## Virtual columns {#virtual-columns}
+!!! attention "Attention"
+    While turning on `gtid_mode` you should also specify `enforce_gtid_consistency = on`.
+
+## Virtual Columns {#virtual-columns}

When working with the `MaterializedMySQL` database engine, [ReplacingMergeTree](../../engines/table-engines/mergetree-family/replacingmergetree.md) tables are used with virtual `_sign` and `_version` columns.

@@ -78,13 +83,13 @@ When working with the `MaterializedMySQL` database engine, [ReplacingMergeTree](
| BLOB | [String](../../sql-reference/data-types/string.md) |
| BINARY | [FixedString](../../sql-reference/data-types/fixedstring.md) |

-Other types are not supported. If MySQL table contains a column of such type, ClickHouse throws exception "Unhandled data type" and stops replication.
-
[Nullable](../../sql-reference/data-types/nullable.md) is supported.

+Other types are not supported. If MySQL table contains a column of such type, ClickHouse throws exception "Unhandled data type" and stops replication.
+
## Specifics and Recommendations {#specifics-and-recommendations}

-### Compatibility restrictions
+### Compatibility Restrictions {#compatibility-restrictions}

Apart of the data types limitations there are few restrictions comparing to `MySQL` databases, that should be resolved before replication will be possible:
diff --git a/docs/en/engines/table-engines/mergetree-family/mergetree.md b/docs/en/engines/table-engines/mergetree-family/mergetree.md
index 561b0ad8023..0c900454cd0 100644
--- a/docs/en/engines/table-engines/mergetree-family/mergetree.md
+++ b/docs/en/engines/table-engines/mergetree-family/mergetree.md
@@ -39,7 +39,10 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [TTL expr2],
    ...
    INDEX index_name1 expr1 TYPE type1(...) GRANULARITY value1,
-    INDEX index_name2 expr2 TYPE type2(...) GRANULARITY value2
+    INDEX index_name2 expr2 TYPE type2(...) GRANULARITY value2,
+    ...
+    PROJECTION projection_name_1 (SELECT <COLUMN LIST EXPR> [GROUP BY] [ORDER BY]),
+    PROJECTION projection_name_2 (SELECT <COLUMN LIST EXPR> [GROUP BY] [ORDER BY])
) ENGINE = MergeTree()
ORDER BY expr
[PARTITION BY expr]
@@ -385,6 +388,24 @@ Functions with a constant argument that is less than ngram size can’t be used
- `s != 1`
- `NOT startsWith(s, 'test')`

+### Projections {#projections}
+Projections are like materialized views but defined at the part level. They provide consistency guarantees along with automatic usage in queries.
+
+#### Query {#projection-query}
+A projection query is what defines a projection. It has the following grammar:
+
+`SELECT <COLUMN LIST EXPR> [GROUP BY] [ORDER BY]`
+
+It implicitly selects data from the parent table.
+
+#### Storage {#projection-storage}
+Projections are stored inside the part directory. It's similar to an index but contains a subdirectory that stores an anonymous MergeTree table's part. The table is induced by the definition query of the projection. If there is a GROUP BY clause, the underlying storage engine becomes AggregatingMergeTree, and all aggregate functions are converted to AggregateFunction. If there is an ORDER BY clause, the MergeTree table will use it as its primary key expression. During the merge process, the projection part will be merged via its storage's merge routine. The checksum of the parent table's part combines the projection's part. Other maintenance jobs are similar to skip indices.
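To make the projection grammar above concrete, here is a minimal sketch (the table, columns and projection name are hypothetical, not taken from this patch):

```sql
CREATE TABLE visits
(
    user_id UInt64,
    event_date Date,
    duration UInt64,
    -- Pre-aggregated by date; stored as an anonymous part-level table.
    PROJECTION daily_totals
    (
        SELECT event_date, sum(duration) GROUP BY event_date
    )
)
ENGINE = MergeTree()
ORDER BY user_id;

-- Aggregating queries that match the projection definition can be answered
-- from the projection's pre-aggregated parts automatically:
SELECT event_date, sum(duration) FROM visits GROUP BY event_date;
```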
+
+#### Query Analysis {#projection-query-analysis}
+1. Check if the projection can be used to answer the given query, that is, it generates the same answer as querying the base table.
+2. Select the best feasible match, which contains the fewest granules to read.
+3. The query pipeline which uses projections will be different from the one that uses the original parts. If the projection is absent in some parts, we can add a pipeline step to "project" it on the fly.
+
## Concurrent Data Access {#concurrent-data-access}

For concurrent table access, we use multi-versioning. In other words, when a table is simultaneously read and updated, data is read from a set of parts that is current at the time of the query. There are no lengthy locks. Inserts do not get in the way of read operations.
diff --git a/docs/en/operations/server-configuration-parameters/settings.md b/docs/en/operations/server-configuration-parameters/settings.md
index d7ffcff35fb..a620565b71a 100644
--- a/docs/en/operations/server-configuration-parameters/settings.md
+++ b/docs/en/operations/server-configuration-parameters/settings.md
@@ -892,6 +892,33 @@ If the table does not exist, ClickHouse will create it. If the structure of the
```

+## query_views_log {#server_configuration_parameters-query_views_log}
+
+Settings for logging views that are dependent on queries received with the [log_query_views=1](../../operations/settings/settings.md#settings-log-query-views) setting.
+
+Queries are logged in the [system.query_views_log](../../operations/system-tables/query_views_log.md#system_tables-query_views_log) table, not in a separate file. You can change the name of the table in the `table` parameter (see below).
+
+Use the following parameters to configure logging:
+
+- `database` – Name of the database.
+- `table` – Name of the system table the queries will be logged in.
+- `partition_by` — [Custom partitioning key](../../engines/table-engines/mergetree-family/custom-partitioning-key.md) for a system table. Can't be used if `engine` is defined.
+- `engine` - [MergeTree Engine Definition](../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-creating-a-table) for a system table. Can't be used if `partition_by` is defined.
+- `flush_interval_milliseconds` – Interval for flushing data from the buffer in memory to the table.
+
+If the table does not exist, ClickHouse will create it. If the structure of the query views log changed when the ClickHouse server was updated, the table with the old structure is renamed, and a new table is created automatically.
+
+**Example**
+
+``` xml
+<query_views_log>
+    <database>system</database>
+    <table>query_views_log</table>
+    <partition_by>toYYYYMM(event_date)</partition_by>
+    <flush_interval_milliseconds>7500</flush_interval_milliseconds>
+</query_views_log>
+```
+
## text_log {#server_configuration_parameters-text_log}

Settings for the [text_log](../../operations/system-tables/text_log.md#system_tables-text_log) system table for logging text messages.
diff --git a/docs/en/operations/settings/settings.md b/docs/en/operations/settings/settings.md
index 4936c782299..07bfe158a0a 100644
--- a/docs/en/operations/settings/settings.md
+++ b/docs/en/operations/settings/settings.md
@@ -890,7 +890,7 @@ log_queries_min_type='EXCEPTION_WHILE_PROCESSING'

Setting up query threads logging.

-Queries’ threads runned by ClickHouse with this setup are logged according to the rules in the [query_thread_log](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-query_thread_log) server configuration parameter.
+Queries’ threads run by ClickHouse with this setup are logged according to the rules in the [query_thread_log](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-query_thread_log) server configuration parameter.

Example:

``` text
log_query_threads=1
```

+## log_query_views {#settings-log-query-views}
+
+Setting up query views logging.
+
+When a query run by ClickHouse with this setting enabled has associated views (materialized or live views), they are logged according to the [query_views_log](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-query_views_log) server configuration parameter.
+
+Example:
+
+``` text
+log_query_views=1
+```
+
+
## log_comment {#settings-log-comment}

Specifies the value for the `log_comment` field of the [system.query_log](../system-tables/query_log.md) table and comment text for the server log.
diff --git a/docs/en/operations/system-tables/query_log.md b/docs/en/operations/system-tables/query_log.md
index 987f1968356..548e454cf58 100644
--- a/docs/en/operations/system-tables/query_log.md
+++ b/docs/en/operations/system-tables/query_log.md
@@ -50,6 +50,7 @@ Columns:
- `query_kind` ([LowCardinality(String)](../../sql-reference/data-types/lowcardinality.md)) — Type of the query.
- `databases` ([Array](../../sql-reference/data-types/array.md)([LowCardinality(String)](../../sql-reference/data-types/lowcardinality.md))) — Names of the databases present in the query.
- `tables` ([Array](../../sql-reference/data-types/array.md)([LowCardinality(String)](../../sql-reference/data-types/lowcardinality.md))) — Names of the tables present in the query.
+- `views` ([Array](../../sql-reference/data-types/array.md)([LowCardinality(String)](../../sql-reference/data-types/lowcardinality.md))) — Names of the (materialized or live) views present in the query.
- `columns` ([Array](../../sql-reference/data-types/array.md)([LowCardinality(String)](../../sql-reference/data-types/lowcardinality.md))) — Names of the columns present in the query.
- `projections` ([String](../../sql-reference/data-types/string.md)) — Names of the projections used during the query execution.
- `exception_code` ([Int32](../../sql-reference/data-types/int-uint.md)) — Code of an exception.
@@ -180,5 +181,6 @@
used_table_functions: []

**See Also**

- [system.query_thread_log](../../operations/system-tables/query_thread_log.md#system_tables-query_thread_log) — This table contains information about each query execution thread.
+- [system.query_views_log](../../operations/system-tables/query_views_log.md#system_tables-query_views_log) — This table contains information about each view executed during a query.
[Original article](https://clickhouse.tech/docs/en/operations/system-tables/query_log) diff --git a/docs/en/operations/system-tables/query_thread_log.md b/docs/en/operations/system-tables/query_thread_log.md index 7ecea2971b4..152a10504bb 100644 --- a/docs/en/operations/system-tables/query_thread_log.md +++ b/docs/en/operations/system-tables/query_thread_log.md @@ -112,5 +112,6 @@ ProfileEvents: {'Query':1,'SelectQuery':1,'ReadCompressedBytes':36,'Compr **See Also** - [system.query_log](../../operations/system-tables/query_log.md#system_tables-query_log) — Description of the `query_log` system table which contains common information about queries execution. +- [system.query_views_log](../../operations/system-tables/query_views_log.md#system_tables-query_views_log) — This table contains information about each view executed during a query. [Original article](https://clickhouse.tech/docs/en/operations/system-tables/query_thread_log) diff --git a/docs/en/operations/system-tables/query_views_log.md b/docs/en/operations/system-tables/query_views_log.md new file mode 100644 index 00000000000..48d36a6a118 --- /dev/null +++ b/docs/en/operations/system-tables/query_views_log.md @@ -0,0 +1,81 @@ +# system.query_views_log {#system_tables-query_views_log} + +Contains information about the dependent views executed when running a query, for example, the view type or the execution time. + +To start logging: + +1. Configure parameters in the [query_views_log](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-query_views_log) section. +2. Set [log_query_views](../../operations/settings/settings.md#settings-log-query-views) to 1. + +The flushing period of data is set in `flush_interval_milliseconds` parameter of the [query_views_log](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-query_views_log) server settings section. To force flushing, use the [SYSTEM FLUSH LOGS](../../sql-reference/statements/system.md#query_language-system-flush_logs) query. + +ClickHouse does not delete data from the table automatically. See [Introduction](../../operations/system-tables/index.md#system-tables-introduction) for more details. + +Columns: + +- `event_date` ([Date](../../sql-reference/data-types/date.md)) — The date when the last event of the view happened. +- `event_time` ([DateTime](../../sql-reference/data-types/datetime.md)) — The date and time when the view finished execution. +- `event_time_microseconds` ([DateTime](../../sql-reference/data-types/datetime.md)) — The date and time when the view finished execution with microseconds precision. +- `view_duration_ms` ([UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Duration of view execution (sum of its stages) in milliseconds. +- `initial_query_id` ([String](../../sql-reference/data-types/string.md)) — ID of the initial query (for distributed query execution). +- `view_name` ([String](../../sql-reference/data-types/string.md)) — Name of the view. +- `view_uuid` ([UUID](../../sql-reference/data-types/uuid.md)) — UUID of the view. +- `view_type` ([Enum8](../../sql-reference/data-types/enum.md)) — Type of the view. Values: + - `'Default' = 1` — [Default views](../../sql-reference/statements/create/view.md#normal). Should not appear in this log. + - `'Materialized' = 2` — [Materialized views](../../sql-reference/statements/create/view.md#materialized). + - `'Live' = 3` — [Live views](../../sql-reference/statements/create/view.md#live-view). 
+- `view_query` ([String](../../sql-reference/data-types/string.md)) — The query executed by the view.
+- `view_target` ([String](../../sql-reference/data-types/string.md)) — The name of the view target table.
+- `read_rows` ([UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Number of read rows.
+- `read_bytes` ([UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Number of read bytes.
+- `written_rows` ([UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Number of written rows.
+- `written_bytes` ([UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Number of written bytes.
+- `peak_memory_usage` ([Int64](../../sql-reference/data-types/int-uint.md)) — The maximum difference between the amount of allocated and freed memory in the context of this view.
+- `ProfileEvents` ([Map(String, UInt64)](../../sql-reference/data-types/map.md)) — ProfileEvents that measure different metrics. Their description can be found in the table [system.events](../../operations/system-tables/events.md#system_tables-events).
+- `status` ([Enum8](../../sql-reference/data-types/enum.md)) — Status of the view. Values:
+    - `'QueryStart' = 1` — Successful start of the view execution. Should not appear.
+    - `'QueryFinish' = 2` — Successful end of the view execution.
+    - `'ExceptionBeforeStart' = 3` — Exception before the start of the view execution.
+    - `'ExceptionWhileProcessing' = 4` — Exception during the view execution.
+- `exception_code` ([Int32](../../sql-reference/data-types/int-uint.md)) — Code of an exception.
+- `exception` ([String](../../sql-reference/data-types/string.md)) — Exception message.
+- `stack_trace` ([String](../../sql-reference/data-types/string.md)) — [Stack trace](https://en.wikipedia.org/wiki/Stack_trace). An empty string, if the query was completed successfully.
+
+**Example**
+
+``` sql
+ SELECT * FROM system.query_views_log LIMIT 1 \G
+```
+
+``` text
+Row 1:
+──────
+event_date:              2021-06-22
+event_time:              2021-06-22 13:23:07
+event_time_microseconds: 2021-06-22 13:23:07.738221
+view_duration_ms:        0
+initial_query_id:        c3a1ac02-9cad-479b-af54-9e9c0a7afd70
+view_name:               default.matview_inner
+view_uuid:               00000000-0000-0000-0000-000000000000
+view_type:               Materialized
+view_query:              SELECT * FROM default.table_b
+view_target:             default.`.inner.matview_inner`
+read_rows:               4
+read_bytes:              64
+written_rows:            2
+written_bytes:           32
+peak_memory_usage:       4196188
+ProfileEvents:           {'FileOpen':2,'WriteBufferFromFileDescriptorWrite':2,'WriteBufferFromFileDescriptorWriteBytes':187,'IOBufferAllocs':3,'IOBufferAllocBytes':3145773,'FunctionExecute':3,'DiskWriteElapsedMicroseconds':13,'InsertedRows':2,'InsertedBytes':16,'SelectedRows':4,'SelectedBytes':48,'ContextLock':16,'RWLockAcquiredReadLocks':1,'RealTimeMicroseconds':698,'SoftPageFaults':4,'OSReadChars':463}
+status:                  QueryFinish
+exception_code:          0
+exception:
+stack_trace:
+```
+
+**See Also**
+
+- [system.query_log](../../operations/system-tables/query_log.md#system_tables-query_log) — Description of the `query_log` system table which contains common information about queries execution.
+- [system.query_thread_log](../../operations/system-tables/query_thread_log.md#system_tables-query_thread_log) — This table contains information about each query execution thread.
+ + +[Original article](https://clickhouse.tech/docs/en/operations/system-tables/query_views_log) diff --git a/docs/en/sql-reference/data-types/string.md b/docs/en/sql-reference/data-types/string.md index cb3a70ec7f8..2cf11ac85a3 100644 --- a/docs/en/sql-reference/data-types/string.md +++ b/docs/en/sql-reference/data-types/string.md @@ -15,6 +15,6 @@ When creating tables, numeric parameters for string fields can be set (e.g. `VAR ClickHouse does not have the concept of encodings. Strings can contain an arbitrary set of bytes, which are stored and output as-is. If you need to store texts, we recommend using UTF-8 encoding. At the very least, if your terminal uses UTF-8 (as recommended), you can read and write your values without making conversions. Similarly, certain functions for working with strings have separate variations that work under the assumption that the string contains a set of bytes representing a UTF-8 encoded text. -For example, the ‘length’ function calculates the string length in bytes, while the ‘lengthUTF8’ function calculates the string length in Unicode code points, assuming that the value is UTF-8 encoded. +For example, the [length](../functions/string-functions.md#length) function calculates the string length in bytes, while the [lengthUTF8](../functions/string-functions.md#lengthutf8) function calculates the string length in Unicode code points, assuming that the value is UTF-8 encoded. [Original article](https://clickhouse.tech/docs/en/data_types/string/) diff --git a/docs/en/sql-reference/functions/array-functions.md b/docs/en/sql-reference/functions/array-functions.md index 422bbe4b4ea..e7918c018db 100644 --- a/docs/en/sql-reference/functions/array-functions.md +++ b/docs/en/sql-reference/functions/array-functions.md @@ -7,19 +7,89 @@ toc_title: Arrays ## empty {#function-empty} -Returns 1 for an empty array, or 0 for a non-empty array. -The result type is UInt8. -The function also works for strings. +Checks whether the input array is empty. -Can be optimized by enabling the [optimize_functions_to_subcolumns](../../operations/settings/settings.md#optimize-functions-to-subcolumns) setting. With `optimize_functions_to_subcolumns = 1` the function reads only [size0](../../sql-reference/data-types/array.md#array-size) subcolumn instead of reading and processing the whole array column. The query `SELECT empty(arr) FROM table` transforms to `SELECT arr.size0 = 0 FROM TABLE`. +**Syntax** + +``` sql +empty([x]) +``` + +An array is considered empty if it does not contain any elements. + +!!! note "Note" + Can be optimized by enabling the [optimize_functions_to_subcolumns](../../operations/settings/settings.md#optimize-functions-to-subcolumns) setting. With `optimize_functions_to_subcolumns = 1` the function reads only the [size0](../../sql-reference/data-types/array.md#array-size) subcolumn instead of reading and processing the whole array column. The query `SELECT empty(arr) FROM table;` transforms to `SELECT arr.size0 = 0 FROM table;`. + +The function also works for [strings](string-functions.md#empty) or [UUID](uuid-functions.md#empty). + +**Arguments** + +- `[x]` — Input array. [Array](../data-types/array.md). + +**Returned value** + +- Returns `1` for an empty array or `0` for a non-empty array. + +Type: [UInt8](../data-types/int-uint.md). + +**Example** + +Query: + +```sql +SELECT empty([]); +``` + +Result: + +```text +┌─empty(array())─┐ +│ 1 │ +└────────────────┘ +``` ## notEmpty {#function-notempty} -Returns 0 for an empty array, or 1 for a non-empty array.
-The result type is UInt8. -The function also works for strings. +Checks whether the input array is non-empty. -Can be optimized by enabling the [optimize_functions_to_subcolumns](../../operations/settings/settings.md#optimize-functions-to-subcolumns) setting. With `optimize_functions_to_subcolumns = 1` the function reads only [size0](../../sql-reference/data-types/array.md#array-size) subcolumn instead of reading and processing the whole array column. The query `SELECT notEmpty(arr) FROM table` transforms to `SELECT arr.size0 != 0 FROM TABLE`. +**Syntax** + +``` sql +notEmpty([x]) +``` + +An array is considered non-empty if it contains at least one element. + +!!! note "Note" + Can be optimized by enabling the [optimize_functions_to_subcolumns](../../operations/settings/settings.md#optimize-functions-to-subcolumns) setting. With `optimize_functions_to_subcolumns = 1` the function reads only the [size0](../../sql-reference/data-types/array.md#array-size) subcolumn instead of reading and processing the whole array column. The query `SELECT notEmpty(arr) FROM table` transforms to `SELECT arr.size0 != 0 FROM table`. + +The function also works for [strings](string-functions.md#notempty) or [UUID](uuid-functions.md#notempty). + +**Arguments** + +- `[x]` — Input array. [Array](../data-types/array.md). + +**Returned value** + +- Returns `1` for a non-empty array or `0` for an empty array. + +Type: [UInt8](../data-types/int-uint.md). + +**Example** + +Query: + +```sql +SELECT notEmpty([1,2]); +``` + +Result: + +```text +┌─notEmpty([1, 2])─┐ +│ 1 │ +└──────────────────┘ +``` ## length {#array_functions-length} diff --git a/docs/en/sql-reference/functions/string-functions.md b/docs/en/sql-reference/functions/string-functions.md index 8ec8aa7339d..c7c84c5aca6 100644 --- a/docs/en/sql-reference/functions/string-functions.md +++ b/docs/en/sql-reference/functions/string-functions.md @@ -10,17 +10,83 @@ toc_title: Strings ## empty {#empty} -Returns 1 for an empty string or 0 for a non-empty string. -The result type is UInt8. +Checks whether the input string is empty. + +**Syntax** + +``` sql +empty(x) +``` + A string is considered non-empty if it contains at least one byte, even if this is a space or a null byte. + +The function also works for [arrays](array-functions.md#function-empty) or [UUID](uuid-functions.md#empty). + +**Arguments** + +- `x` — Input value. [String](../data-types/string.md). + +**Returned value** + +- Returns `1` for an empty string or `0` for a non-empty string. + +Type: [UInt8](../data-types/int-uint.md). + +**Example** + +Query: + +```sql +SELECT empty(''); +``` + +Result: + +```text +┌─empty('')─┐ +│ 1 │ +└───────────┘ +``` ## notEmpty {#notempty} -Returns 0 for an empty string or 1 for a non-empty string. -The result type is UInt8. -The function also works for arrays or UUID. +Checks whether the input string is non-empty. + +**Syntax** + +``` sql +notEmpty(x) +``` + +A string is considered non-empty if it contains at least one byte, even if this is a space or a null byte. + +The function also works for [arrays](array-functions.md#function-notempty) or [UUID](uuid-functions.md#notempty). + +**Arguments** + +- `x` — Input value. [String](../data-types/string.md). + +**Returned value** + +- Returns `1` for a non-empty string or `0` for an empty string. + +Type: [UInt8](../data-types/int-uint.md).
+ +**Example** + +Query: + +```sql +SELECT notEmpty('text'); +``` + +Result: + +```text +┌─notEmpty('text')─┐ +│ 1 │ +└──────────────────┘ +``` ## length {#length} @@ -43,6 +109,158 @@ The result type is UInt64. Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it returns some result (it does not throw an exception). The result type is UInt64. +## leftPad {#leftpad} + +Pads the current string from the left with spaces or a specified string (multiple times, if needed) until the resulting string reaches the given length, similar to the MySQL `LPAD` function. + +**Syntax** + +``` sql +leftPad('string', 'length'[, 'pad_string']) +``` + +**Arguments** + +- `string` — Input string that needs to be padded. [String](../data-types/string.md). +- `length` — The length of the resulting string. [UInt](../data-types/int-uint.md). If the value is less than the input string length, then the input string is returned as-is. +- `pad_string` — The string to pad the input string with. [String](../data-types/string.md). Optional. If not specified, then the input string is padded with spaces. + +**Returned value** + +- The resulting string of the given length. + +Type: [String](../data-types/string.md). + +**Example** + +Query: + +``` sql +SELECT leftPad('abc', 7, '*'), leftPad('def', 7); +``` + +Result: + +``` text +┌─leftPad('abc', 7, '*')─┬─leftPad('def', 7)─┐ +│ ****abc │ def │ +└────────────────────────┴───────────────────┘ +``` + +## leftPadUTF8 {#leftpadutf8} + +Pads the current string from the left with spaces or a specified string (multiple times, if needed) until the resulting string reaches the given length, similar to the MySQL `LPAD` function. While in the [leftPad](#leftpad) function the length is measured in bytes, in `leftPadUTF8` it is measured in code points. + +**Syntax** + +``` sql +leftPadUTF8('string','length'[, 'pad_string']) +``` + +**Arguments** + +- `string` — Input string that needs to be padded. [String](../data-types/string.md). +- `length` — The length of the resulting string. [UInt](../data-types/int-uint.md). If the value is less than the input string length, then the input string is returned as-is. +- `pad_string` — The string to pad the input string with. [String](../data-types/string.md). Optional. If not specified, then the input string is padded with spaces. + +**Returned value** + +- The resulting string of the given length. + +Type: [String](../data-types/string.md). + +**Example** + +Query: + +``` sql +SELECT leftPadUTF8('абвг', 7, '*'), leftPadUTF8('дежз', 7); +``` + +Result: + +``` text +┌─leftPadUTF8('абвг', 7, '*')─┬─leftPadUTF8('дежз', 7)─┐ +│ ***абвг │ дежз │ +└─────────────────────────────┴────────────────────────┘ +``` + +## rightPad {#rightpad} + +Pads the current string from the right with spaces or a specified string (multiple times, if needed) until the resulting string reaches the given length, similar to the MySQL `RPAD` function. + +**Syntax** + +``` sql +rightPad('string', 'length'[, 'pad_string']) +``` + +**Arguments** + +- `string` — Input string that needs to be padded. [String](../data-types/string.md). +- `length` — The length of the resulting string. [UInt](../data-types/int-uint.md). If the value is less than the input string length, then the input string is returned as-is. +- `pad_string` — The string to pad the input string with. [String](../data-types/string.md). Optional.
If not specified, then the input string is padded with spaces. + +**Returned value** + +- The resulting string of the given length. + +Type: [String](../data-types/string.md). + +**Example** + +Query: + +``` sql +SELECT rightPad('abc', 7, '*'), rightPad('abc', 7); +``` + +Result: + +``` text +┌─rightPad('abc', 7, '*')─┬─rightPad('abc', 7)─┐ +│ abc**** │ abc │ +└─────────────────────────┴────────────────────┘ +``` + +## rightPadUTF8 {#rightpadutf8} + +Pads the current string from the right with spaces or a specified string (multiple times, if needed) until the resulting string reaches the given length, similar to the MySQL `RPAD` function. While in the [rightPad](#rightpad) function the length is measured in bytes, in `rightPadUTF8` it is measured in code points. + +**Syntax** + +``` sql +rightPadUTF8('string','length'[, 'pad_string']) +``` + +**Arguments** + +- `string` — Input string that needs to be padded. [String](../data-types/string.md). +- `length` — The length of the resulting string. [UInt](../data-types/int-uint.md). If the value is less than the input string length, then the input string is returned as-is. +- `pad_string` — The string to pad the input string with. [String](../data-types/string.md). Optional. If not specified, then the input string is padded with spaces. + +**Returned value** + +- The resulting string of the given length. + +Type: [String](../data-types/string.md). + +**Example** + +Query: + +``` sql +SELECT rightPadUTF8('абвг', 7, '*'), rightPadUTF8('абвг', 7); +``` + +Result: + +``` text +┌─rightPadUTF8('абвг', 7, '*')─┬─rightPadUTF8('абвг', 7)─┐ +│ абвг*** │ абвг │ +└──────────────────────────────┴─────────────────────────┘ +``` + ## lower, lcase {#lower} Converts ASCII Latin symbols in a string to lowercase. diff --git a/docs/en/sql-reference/functions/uuid-functions.md b/docs/en/sql-reference/functions/uuid-functions.md index e7e55c699cd..e5ab45bda40 100644 --- a/docs/en/sql-reference/functions/uuid-functions.md +++ b/docs/en/sql-reference/functions/uuid-functions.md @@ -9,7 +9,7 @@ The functions for working with UUID are listed below. ## generateUUIDv4 {#uuid-function-generate} -Generates the [UUID](../../sql-reference/data-types/uuid.md) of [version 4](https://tools.ietf.org/html/rfc4122#section-4.4). +Generates the [UUID](../data-types/uuid.md) of [version 4](https://tools.ietf.org/html/rfc4122#section-4.4). ``` sql generateUUIDv4() ``` @@ -37,6 +37,90 @@ SELECT * FROM t_uuid └──────────────────────────────────────┘ ``` +## empty {#empty} + +Checks whether the input UUID is empty. + +**Syntax** + +```sql +empty(UUID) +``` + +The UUID is considered empty if it contains all zeros (zero UUID). + +The function also works for [arrays](array-functions.md#function-empty) or [strings](string-functions.md#empty). + +**Arguments** + +- `x` — Input UUID. [UUID](../data-types/uuid.md). + +**Returned value** + +- Returns `1` for an empty UUID or `0` for a non-empty UUID. + +Type: [UInt8](../data-types/int-uint.md). + +**Example** + +To generate the UUID value, ClickHouse provides the [generateUUIDv4](#uuid-function-generate) function. + +Query: + +```sql +SELECT empty(generateUUIDv4()); +``` + +Result: + +```text +┌─empty(generateUUIDv4())─┐ +│ 0 │ +└─────────────────────────┘ +``` + +## notEmpty {#notempty} + +Checks whether the input UUID is non-empty. + +**Syntax** + +```sql +notEmpty(UUID) +``` + +The UUID is considered empty if it contains all zeros (zero UUID).
+ +The function also works for [arrays](array-functions.md#function-notempty) or [strings](string-functions.md#notempty). + +**Arguments** + +- `x` — Input UUID. [UUID](../data-types/uuid.md). + +**Returned value** + +- Returns `1` for a non-empty UUID or `0` for an empty UUID. + +Type: [UInt8](../data-types/int-uint.md). + +**Example** + +To generate the UUID value, ClickHouse provides the [generateUUIDv4](#uuid-function-generate) function. + +Query: + +```sql +SELECT notEmpty(generateUUIDv4()); +``` + +Result: + +```text +┌─notEmpty(generateUUIDv4())─┐ +│ 1 │ +└────────────────────────────┘ +``` + ## toUUID (x) {#touuid-x} Converts String type value to UUID type. diff --git a/docs/en/sql-reference/statements/select/distinct.md b/docs/en/sql-reference/statements/select/distinct.md index 87154cba05a..390afa46248 100644 --- a/docs/en/sql-reference/statements/select/distinct.md +++ b/docs/en/sql-reference/statements/select/distinct.md @@ -6,23 +6,55 @@ toc_title: DISTINCT If `SELECT DISTINCT` is specified, only unique rows will remain in a query result. Thus only a single row will remain out of all the sets of fully matching rows in the result. -## Null Processing {#null-processing} +You can specify the list of columns that must have unique values: `SELECT DISTINCT ON (column1, column2,...)`. If the columns are not specified, all of them are taken into consideration. -`DISTINCT` works with [NULL](../../../sql-reference/syntax.md#null-literal) as if `NULL` were a specific value, and `NULL==NULL`. In other words, in the `DISTINCT` results, different combinations with `NULL` occur only once. It differs from `NULL` processing in most other contexts. +Consider the table: -## Alternatives {#alternatives} +```text +┌─a─┬─b─┬─c─┐ +│ 1 │ 1 │ 1 │ +│ 1 │ 1 │ 1 │ +│ 2 │ 2 │ 2 │ +│ 2 │ 2 │ 2 │ +│ 1 │ 1 │ 2 │ +│ 1 │ 2 │ 2 │ +└───┴───┴───┘ +``` -It is possible to obtain the same result by applying [GROUP BY](../../../sql-reference/statements/select/group-by.md) across the same set of values as specified as `SELECT` clause, without using any aggregate functions. But there are few differences from `GROUP BY` approach: +Using `DISTINCT` without specifying columns: -- `DISTINCT` can be applied together with `GROUP BY`. -- When [ORDER BY](../../../sql-reference/statements/select/order-by.md) is omitted and [LIMIT](../../../sql-reference/statements/select/limit.md) is defined, the query stops running immediately after the required number of different rows has been read. -- Data blocks are output as they are processed, without waiting for the entire query to finish running. +```sql +SELECT DISTINCT * FROM t1; +``` -## Examples {#examples} +```text +┌─a─┬─b─┬─c─┐ +│ 1 │ 1 │ 1 │ +│ 2 │ 2 │ 2 │ +│ 1 │ 1 │ 2 │ +│ 1 │ 2 │ 2 │ +└───┴───┴───┘ +``` + +Using `DISTINCT` with specified columns: + +```sql +SELECT DISTINCT ON (a,b) * FROM t1; +``` + +```text +┌─a─┬─b─┬─c─┐ +│ 1 │ 1 │ 1 │ +│ 2 │ 2 │ 2 │ +│ 1 │ 2 │ 2 │ +└───┴───┴───┘ +``` + +## DISTINCT and ORDER BY {#distinct-orderby} ClickHouse supports using the `DISTINCT` and `ORDER BY` clauses for different columns in one query. The `DISTINCT` clause is executed before the `ORDER BY` clause. 
-Example table: +Consider the table: ``` text ┌─a─┬─b─┐ │ 2 │ 1 │ │ 1 │ 2 │ │ 3 │ 3 │ │ 2 │ 4 │ └───┴───┘ ``` -When selecting data with the `SELECT DISTINCT a FROM t1 ORDER BY b ASC` query, we get the following result: +Selecting data: + +```sql +SELECT DISTINCT a FROM t1 ORDER BY b ASC; +``` ``` text ┌─a─┐ │ 2 │ │ 1 │ │ 3 │ └───┘ ``` +Selecting data with a different sorting direction: -If we change the sorting direction `SELECT DISTINCT a FROM t1 ORDER BY b DESC`, we get the following result: +```sql +SELECT DISTINCT a FROM t1 ORDER BY b DESC; +``` ``` text ┌─a─┐ │ 3 │ │ 1 │ │ 2 │ └───┘ ``` @@ -56,3 +95,15 @@ If we change the sorting direction `SELECT DISTINCT a FROM t1 ORDER BY b DESC`, Row `2, 4` was cut before sorting. Take this implementation specificity into account when programming queries. + +## Null Processing {#null-processing} + +`DISTINCT` works with [NULL](../../../sql-reference/syntax.md#null-literal) as if `NULL` were a specific value, and `NULL==NULL`. In other words, in the `DISTINCT` results, different combinations with `NULL` occur only once. It differs from `NULL` processing in most other contexts. + +## Alternatives {#alternatives} + +It is possible to obtain the same result by applying [GROUP BY](../../../sql-reference/statements/select/group-by.md) across the same set of values as specified in the `SELECT` clause, without using any aggregate functions. But there are a few differences from the `GROUP BY` approach: + +- `DISTINCT` can be applied together with `GROUP BY`. +- When [ORDER BY](../../../sql-reference/statements/select/order-by.md) is omitted and [LIMIT](../../../sql-reference/statements/select/limit.md) is defined, the query stops running immediately after the required number of different rows has been read. +- Data blocks are output as they are processed, without waiting for the entire query to finish running. diff --git a/docs/en/sql-reference/statements/select/index.md b/docs/en/sql-reference/statements/select/index.md index 04273ca1d4d..4e96bae8493 100644 --- a/docs/en/sql-reference/statements/select/index.md +++ b/docs/en/sql-reference/statements/select/index.md @@ -13,7 +13,7 @@ toc_title: Overview ``` sql [WITH expr_list|(subquery)] -SELECT [DISTINCT] expr_list +SELECT [DISTINCT [ON (column1, column2, ...)]] expr_list [FROM [db.]table | (subquery) | table_function] [FINAL] [SAMPLE sample_coeff] [ARRAY JOIN ...]
@@ -36,6 +36,8 @@ All clauses are optional, except for the required list of expressions immediatel Specifics of each optional clause are covered in separate sections, which are listed in the same order as they are executed: - [WITH clause](../../../sql-reference/statements/select/with.md) +- [SELECT clause](#select-clause) +- [DISTINCT clause](../../../sql-reference/statements/select/distinct.md) - [FROM clause](../../../sql-reference/statements/select/from.md) - [SAMPLE clause](../../../sql-reference/statements/select/sample.md) - [JOIN clause](../../../sql-reference/statements/select/join.md) @@ -44,8 +46,6 @@ Specifics of each optional clause are covered in separate sections, which are li - [GROUP BY clause](../../../sql-reference/statements/select/group-by.md) - [LIMIT BY clause](../../../sql-reference/statements/select/limit-by.md) - [HAVING clause](../../../sql-reference/statements/select/having.md) -- [SELECT clause](#select-clause) -- [DISTINCT clause](../../../sql-reference/statements/select/distinct.md) - [LIMIT clause](../../../sql-reference/statements/select/limit.md) - [OFFSET clause](../../../sql-reference/statements/select/offset.md) - [UNION clause](../../../sql-reference/statements/select/union.md) diff --git a/docs/ru/development/developer-instruction.md b/docs/ru/development/developer-instruction.md index d23c0bbbdca..c568db4731f 100644 --- a/docs/ru/development/developer-instruction.md +++ b/docs/ru/development/developer-instruction.md @@ -168,7 +168,13 @@ sudo bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)" cmake -D CMAKE_BUILD_TYPE=Debug .. -Вы можете изменить вариант сборки, выполнив эту команду в директории build. +В случае использования на разработческой машине старого HDD или SSD, а также при желании использовать меньше места для артефактов сборки можно использовать следующую команду: +```bash +cmake -DUSE_DEBUG_HELPERS=1 -DUSE_STATIC_LIBRARIES=0 -DSPLIT_SHARED_LIBRARIES=1 -DCLICKHOUSE_SPLIT_BINARY=1 .. +``` +При этом надо учесть, что получаемые в результате сборки исполнимые файлы будут динамически слинкованы с библиотеками, и поэтому фактически станут непереносимыми на другие компьютеры (либо для этого нужно будет предпринять значительно больше усилий по сравнению со статической сборкой). Плюсом же в данном случае является значительно меньшее время сборки (это проявляется не на первой сборке, а на последующих, после внесения изменений в исходный код - тратится меньшее время на линковку по сравнению со статической сборкой) и значительно меньшее использование места на жёстком диске (экономия более, чем в 3 раза по сравнению со статической сборкой). Для целей разработки, когда планируются только отладочные запуски на том же компьютере, где осуществлялась сборка, это может быть наиболее удобным вариантом. + +Вы можете изменить вариант сборки, выполнив новую команду в директории build. Запустите ninja для сборки: diff --git a/docs/ru/engines/database-engines/materialized-mysql.md b/docs/ru/engines/database-engines/materialized-mysql.md index 0175e794cd5..1cd864c01e9 100644 --- a/docs/ru/engines/database-engines/materialized-mysql.md +++ b/docs/ru/engines/database-engines/materialized-mysql.md @@ -1,10 +1,12 @@ --- toc_priority: 29 -toc_title: MaterializedMySQL +toc_title: "[experimental] MaterializedMySQL" --- # [экспериментальный] MaterializedMySQL {#materialized-mysql} +**Это экспериментальный движок, который не следует использовать в продакшене.** + Создает базу данных ClickHouse со всеми таблицами, существующими в MySQL, и всеми данными в этих таблицах. 
Сервер ClickHouse работает как реплика MySQL. Он читает файл binlog и выполняет DDL- и DML-запросы. @@ -23,6 +25,32 @@ ENGINE = MaterializedMySQL('host:port', ['database' | database], 'user', 'passwo - `user` — пользователь MySQL. - `password` — пароль пользователя. +**Настройки движка** + +- `max_rows_in_buffer` — максимальное количество строк, содержимое которых может кешироваться в памяти (для одной таблицы и данных кеша, которые невозможно запросить). При превышении количества строк данные будут материализованы. Значение по умолчанию: `65 505`. +- `max_bytes_in_buffer` — максимальное количество байтов, которое разрешено кешировать в памяти (для одной таблицы и данных кеша, которые невозможно запросить). При превышении количества байтов данные будут материализованы. Значение по умолчанию: `1 048 576`. +- `max_rows_in_buffers` — максимальное количество строк, содержимое которых может кешироваться в памяти (для базы данных и данных кеша, которые невозможно запросить). При превышении количества строк данные будут материализованы. Значение по умолчанию: `65 505`. +- `max_bytes_in_buffers` — максимальное количество байтов, которое разрешено кешировать в памяти (для базы данных и данных кеша, которые невозможно запросить). При превышении количества байтов данные будут материализованы. Значение по умолчанию: `1 048 576`. +- `max_flush_data_time` — максимальное время в миллисекундах, в течение которого разрешено кешировать данные в памяти (для базы данных и данных кеша, которые невозможно запросить). При превышении указанного периода данные будут материализованы. Значение по умолчанию: `1000`. +- `max_wait_time_when_mysql_unavailable` — интервал между повторными попытками, если MySQL недоступен. Указывается в миллисекундах. Отрицательное значение отключает повторные попытки. Значение по умолчанию: `1000`. +- `allows_query_when_mysql_lost` — признак, разрешен ли запрос к материализованной таблице при потере соединения с MySQL. Значение по умолчанию: `0` (`false`). + +```sql +CREATE DATABASE mysql ENGINE = MaterializedMySQL('localhost:3306', 'db', 'user', '***') + SETTINGS + allows_query_when_mysql_lost=true, + max_wait_time_when_mysql_unavailable=10000; +``` + +**Настройки на стороне MySQL-сервера** + +Для правильной работы `MaterializedMySQL` следует обязательно указать на сервере MySQL следующие параметры конфигурации: +- `default_authentication_plugin = mysql_native_password` — `MaterializedMySQL` может авторизоваться только с помощью этого метода. +- `gtid_mode = on` — ведение журнала на основе GTID является обязательным для обеспечения правильной репликации. + +!!! attention "Внимание" + При включении `gtid_mode` вы также должны указать `enforce_gtid_consistency = on`. + ## Виртуальные столбцы {#virtual-columns} При работе с движком баз данных `MaterializedMySQL` используются таблицы семейства [ReplacingMergeTree](../../engines/table-engines/mergetree-family/replacingmergetree.md) с виртуальными столбцами `_sign` и `_version`. @@ -51,13 +79,21 @@ ENGINE = MaterializedMySQL('host:port', ['database' | database], 'user', 'passwo | STRING | [String](../../sql-reference/data-types/string.md) | | VARCHAR, VAR_STRING | [String](../../sql-reference/data-types/string.md) | | BLOB | [String](../../sql-reference/data-types/string.md) | - -Другие типы не поддерживаются. Если таблица MySQL содержит столбец другого типа, ClickHouse выдаст исключение "Неподдерживаемый тип данных" ("Unhandled data type") и остановит репликацию.
+| BINARY | [FixedString](../../sql-reference/data-types/fixedstring.md) | Тип [Nullable](../../sql-reference/data-types/nullable.md) поддерживается. +Другие типы не поддерживаются. Если таблица MySQL содержит столбец другого типа, ClickHouse выдаст исключение "Неподдерживаемый тип данных" ("Unhandled data type") и остановит репликацию. + ## Особенности и рекомендации {#specifics-and-recommendations} +### Ограничения совместимости {#compatibility-restrictions} + +Кроме ограничений на типы данных, существует несколько ограничений по сравнению с базами данных MySQL, которые следует решить до того, как станет возможной репликация: + +- Каждая таблица в MySQL должна содержать `PRIMARY KEY`. + +- Репликация для таблиц, содержащих строки со значениями полей `ENUM` вне диапазона значений (определяется размерностью `ENUM`), не будет работать. + ### DDL-запросы {#ddl-queries} DDL-запросы в MySQL конвертируются в соответствующие DDL-запросы в ClickHouse ([ALTER](../../sql-reference/statements/alter/index.md), [CREATE](../../sql-reference/statements/create/index.md), [DROP](../../sql-reference/statements/drop.md), [RENAME](../../sql-reference/statements/rename.md)). Если ClickHouse не может конвертировать какой-либо DDL-запрос, он его игнорирует. @@ -158,3 +194,4 @@ SELECT * FROM mysql.test; └───┴─────┴──────┘ ``` +[Оригинальная статья](https://clickhouse.tech/docs/ru/engines/database-engines/materialized-mysql/) diff --git a/docs/ru/engines/table-engines/mergetree-family/mergetree.md b/docs/ru/engines/table-engines/mergetree-family/mergetree.md index 4bced6254d1..db6eb8154ba 100644 --- a/docs/ru/engines/table-engines/mergetree-family/mergetree.md +++ b/docs/ru/engines/table-engines/mergetree-family/mergetree.md @@ -375,6 +375,24 @@ INDEX b (u64 * length(str), i32 + f64 * 100, date, str) TYPE set(100) GRANULARIT - `s != 1` - `NOT startsWith(s, 'test')` +### Проекции {#projections} +Проекции похожи на материализованные представления, но определяются на уровне партов. Это обеспечивает гарантии согласованности наряду с автоматическим использованием в запросах. + +#### Запрос {#projection-query} +Запрос проекции — это то, что определяет проекцию. Он имеет следующую грамматику: + +`SELECT [GROUP BY] [ORDER BY]` + +Он неявно выбирает данные из родительской таблицы. + +#### Хранение {#projection-storage} +Проекции хранятся в каталоге парта. Это похоже на хранение индексов, но используется подкаталог, в котором хранится анонимный парт таблицы MergeTree. Таблица создается запросом определения проекции. Если есть конструкция GROUP BY, то базовый механизм хранения становится AggregatingMergeTree, а все агрегатные функции преобразуются в AggregateFunction. Если есть конструкция ORDER BY, таблица MergeTree будет использовать его в качестве выражения первичного ключа. Во время процесса слияния парт проекции будет слит с помощью процедуры слияния ее хранилища. Контрольная сумма парта родительской таблицы будет включать парт проекции. Другие процедуры аналогичны индексам пропуска данных. + +#### Анализ запросов {#projection-query-analysis} +1. Проверить, можно ли использовать проекцию в данном запросе, то есть что с ней выходит тот же результат, что и с запросом к базовой таблице. +2. Выбрать наиболее подходящее совпадение, содержащее наименьшее количество гранул для чтения. +3. План запроса, который использует проекции, будет отличаться от того, который использует исходные парты. При отсутствии проекции в некоторых партах можно расширить план, чтобы «проецировать» на лету.
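Примерный набросок работы с проекциями (имена таблицы, столбцов, проекции и идентификатор партиции условные; используется синтаксис `ALTER TABLE ... ADD PROJECTION name (SELECT ...)`):

``` sql
-- Добавляем описание проекции и материализуем её для уже существующей партиции.
ALTER TABLE visits ADD PROJECTION p_user_stats
(
    SELECT user_id, sum(duration)
    GROUP BY user_id
);
ALTER TABLE visits MATERIALIZE PROJECTION p_user_stats IN PARTITION 202108;

-- При включённой настройке allow_experimental_projection_optimization
-- такой запрос может читать парт проекции вместо исходных партов.
SELECT user_id, sum(duration) FROM visits GROUP BY user_id;
```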
+ ## Конкурентный доступ к данным {#concurrent-data-access} Для конкурентного доступа к таблице используется мультиверсионность. То есть, при одновременном чтении и обновлении таблицы, данные будут читаться из набора кусочков, актуального на момент запроса. Длинных блокировок нет. Вставки никак не мешают чтениям. diff --git a/docs/ru/sql-reference/functions/array-functions.md b/docs/ru/sql-reference/functions/array-functions.md index 52fd63864ce..b7a301d30a9 100644 --- a/docs/ru/sql-reference/functions/array-functions.md +++ b/docs/ru/sql-reference/functions/array-functions.md @@ -7,19 +7,89 @@ toc_title: "Массивы" ## empty {#function-empty} -Возвращает 1 для пустого массива, и 0 для непустого массива. -Тип результата - UInt8. -Функция также работает для строк. +Проверяет, является ли входной массив пустым. -Функцию можно оптимизировать, если включить настройку [optimize_functions_to_subcolumns](../../operations/settings/settings.md#optimize-functions-to-subcolumns). При `optimize_functions_to_subcolumns = 1` функция читает только подстолбец [size0](../../sql-reference/data-types/array.md#array-size) вместо чтения и обработки всего столбца массива. Запрос `SELECT empty(arr) FROM table` преобразуется к запросу `SELECT arr.size0 = 0 FROM TABLE`. +**Синтаксис** + +``` sql +empty([x]) +``` + +Массив считается пустым, если он не содержит ни одного элемента. + +!!! note "Примечание" + Функцию можно оптимизировать, если включить настройку [optimize_functions_to_subcolumns](../../operations/settings/settings.md#optimize-functions-to-subcolumns). При `optimize_functions_to_subcolumns = 1` функция читает только подстолбец [size0](../../sql-reference/data-types/array.md#array-size) вместо чтения и обработки всего столбца массива. Запрос `SELECT empty(arr) FROM TABLE` преобразуется к запросу `SELECT arr.size0 = 0 FROM TABLE`. + +Функция также поддерживает работу с типами [String](string-functions.md#empty) и [UUID](uuid-functions.md#empty). + +**Параметры** + +- `[x]` — массив на входе функции. [Array](../data-types/array.md). + +**Возвращаемое значение** + +- Возвращает `1` для пустого массива или `0` — для непустого массива. + +Тип: [UInt8](../data-types/int-uint.md). + +**Пример** + +Запрос: + +```sql +SELECT empty([]); +``` + +Ответ: + +```text +┌─empty(array())─┐ +│ 1 │ +└────────────────┘ +``` ## notEmpty {#function-notempty} -Возвращает 0 для пустого массива, и 1 для непустого массива. -Тип результата - UInt8. -Функция также работает для строк. +Проверяет, является ли входной массив непустым. -Функцию можно оптимизировать, если включить настройку [optimize_functions_to_subcolumns](../../operations/settings/settings.md#optimize-functions-to-subcolumns). При `optimize_functions_to_subcolumns = 1` функция читает только подстолбец [size0](../../sql-reference/data-types/array.md#array-size) вместо чтения и обработки всего столбца массива. Запрос `SELECT notEmpty(arr) FROM table` преобразуется к запросу `SELECT arr.size0 != 0 FROM TABLE`. +**Синтаксис** + +``` sql +notEmpty([x]) +``` + +Массив считается непустым, если он содержит хотя бы один элемент. + +!!! note "Примечание" + Функцию можно оптимизировать, если включить настройку [optimize_functions_to_subcolumns](../../operations/settings/settings.md#optimize-functions-to-subcolumns). При `optimize_functions_to_subcolumns = 1` функция читает только подстолбец [size0](../../sql-reference/data-types/array.md#array-size) вместо чтения и обработки всего столбца массива. 
Запрос `SELECT notEmpty(arr) FROM table` преобразуется к запросу `SELECT arr.size0 != 0 FROM table`. + +Функция также поддерживает работу с типами [String](string-functions.md#notempty) и [UUID](uuid-functions.md#notempty). + +**Параметры** + +- `[x]` — массив на входе функции. [Array](../data-types/array.md). + +**Возвращаемое значение** + +- Возвращает `1` для непустого массива или `0` — для пустого массива. + +Тип: [UInt8](../data-types/int-uint.md). + +**Пример** + +Запрос: + +```sql +SELECT notEmpty([1,2]); +``` + +Результат: + +```text +┌─notEmpty([1, 2])─┐ +│ 1 │ +└──────────────────┘ +``` ## length {#array_functions-length} diff --git a/docs/ru/sql-reference/functions/string-functions.md b/docs/ru/sql-reference/functions/string-functions.md index b587a991db1..cbda5188881 100644 --- a/docs/ru/sql-reference/functions/string-functions.md +++ b/docs/ru/sql-reference/functions/string-functions.md @@ -7,16 +7,83 @@ toc_title: "Функции для работы со строками" ## empty {#empty} -Возвращает 1 для пустой строки, и 0 для непустой строки. -Тип результата — UInt8. -Строка считается непустой, если содержит хотя бы один байт, пусть даже это пробел или нулевой байт. -Функция также работает для массивов. +Проверяет, является ли входная строка пустой. + +**Синтаксис** + +``` sql +empty(x) +``` + +Строка считается непустой, если содержит хотя бы один байт, пусть даже это пробел или нулевой байт. + +Функция также поддерживает работу с типами [Array](array-functions.md#function-empty) и [UUID](uuid-functions.md#empty). + +**Параметры** + +- `x` — входная строка. [String](../data-types/string.md). + +**Возвращаемое значение** + +- Возвращает `1` для пустой строки и `0` — для непустой строки. + +Тип: [UInt8](../data-types/int-uint.md). + +**Пример** + +Запрос: + +```sql +SELECT empty(''); +``` + +Результат: + +```text +┌─empty('')─┐ +│ 1 │ +└───────────┘ +``` ## notEmpty {#notempty} -Возвращает 0 для пустой строки, и 1 для непустой строки. -Тип результата — UInt8. -Функция также работает для массивов. +Проверяет, является ли входная строка непустой. + +**Синтаксис** + +``` sql +notEmpty(x) +``` + +Строка считается непустой, если содержит хотя бы один байт, пусть даже это пробел или нулевой байт. + +Функция также поддерживает работу с типами [Array](array-functions.md#function-notempty) и [UUID](uuid-functions.md#notempty). + +**Параметры** + +- `x` — входная строка. [String](../data-types/string.md). + +**Возвращаемое значение** + +- Возвращает `1` для непустой строки и `0` — для пустой строки. + +Тип: [UInt8](../data-types/int-uint.md). + +**Пример** + +Запрос: + +```sql +SELECT notEmpty('text'); +``` + +Результат: + +```text +┌─notEmpty('text')─┐ +│ 1 │ +└──────────────────┘ +``` ## length {#length} @@ -39,6 +106,158 @@ toc_title: "Функции для работы со строками" Возвращает длину строки в кодовых точках Unicode (не символах), при допущении, что строка содержит набор байтов, являющийся текстом в кодировке UTF-8. Если допущение не выполнено, возвращает какой-нибудь результат (не кидает исключение). Тип результата — UInt64. +## leftPad {#leftpad} + +Дополняет текущую строку слева пробелами или указанной строкой (несколько раз, если необходимо), пока результирующая строка не достигнет заданной длины. Соответствует MySQL функции `LPAD`. + +**Синтаксис** + +``` sql +leftPad('string', 'length'[, 'pad_string']) +``` + +**Параметры** + +- `string` — входная строка, которую необходимо дополнить. [String](../data-types/string.md). +- `length` — длина результирующей строки.
[UInt](../data-types/int-uint.md). Если указанное значение меньше, чем длина входной строки, то входная строка возвращается как есть. +- `pad_string` — строка, используемая для дополнения входной строки. [String](../data-types/string.md). Необязательный параметр. Если не указано, то входная строка дополняется пробелами. + +**Возвращаемое значение** + +- Результирующая строка заданной длины. + +Тип: [String](../data-types/string.md). + +**Пример** + +Запрос: + +``` sql +SELECT leftPad('abc', 7, '*'), leftPad('def', 7); +``` + +Результат: + +``` text +┌─leftPad('abc', 7, '*')─┬─leftPad('def', 7)─┐ +│ ****abc │ def │ +└────────────────────────┴───────────────────┘ +``` + +## leftPadUTF8 {#leftpadutf8} + +Дополняет текущую строку слева пробелами или указанной строкой (несколько раз, если необходимо), пока результирующая строка не достигнет заданной длины. Соответствует MySQL функции `LPAD`. В отличие от функции [leftPad](#leftpad), измеряет длину строки не в байтах, а в кодовых точках Unicode. + +**Синтаксис** + +``` sql +leftPadUTF8('string','length'[, 'pad_string']) +``` + +**Параметры** + +- `string` — входная строка, которую необходимо дополнить. [String](../data-types/string.md). +- `length` — длина результирующей строки. [UInt](../data-types/int-uint.md). Если указанное значение меньше, чем длина входной строки, то входная строка возвращается как есть. +- `pad_string` — строка, используемая для дополнения входной строки. [String](../data-types/string.md). Необязательный параметр. Если не указано, то входная строка дополняется пробелами. + +**Возвращаемое значение** + +- Результирующая строка заданной длины. + +Тип: [String](../data-types/string.md). + +**Пример** + +Запрос: + +``` sql +SELECT leftPadUTF8('абвг', 7, '*'), leftPadUTF8('дежз', 7); +``` + +Результат: + +``` text +┌─leftPadUTF8('абвг', 7, '*')─┬─leftPadUTF8('дежз', 7)─┐ +│ ***абвг │ дежз │ +└─────────────────────────────┴────────────────────────┘ +``` + +## rightPad {#rightpad} + +Дополняет текущую строку справа пробелами или указанной строкой (несколько раз, если необходимо), пока результирующая строка не достигнет заданной длины. Соответствует MySQL функции `RPAD`. + +**Синтаксис** + +``` sql +rightPad('string', 'length'[, 'pad_string']) +``` + +**Параметры** + +- `string` — входная строка, которую необходимо дополнить. [String](../data-types/string.md). +- `length` — длина результирующей строки. [UInt](../data-types/int-uint.md). Если указанное значение меньше, чем длина входной строки, то входная строка возвращается как есть. +- `pad_string` — строка, используемая для дополнения входной строки. [String](../data-types/string.md). Необязательный параметр. Если не указано, то входная строка дополняется пробелами. + +**Возвращаемое значение** + +- Результирующая строка заданной длины. + +Тип: [String](../data-types/string.md). + +**Пример** + +Запрос: + +``` sql +SELECT rightPad('abc', 7, '*'), rightPad('abc', 7); +``` + +Результат: + +``` text +┌─rightPad('abc', 7, '*')─┬─rightPad('abc', 7)─┐ +│ abc**** │ abc │ +└─────────────────────────┴────────────────────┘ +``` + +## rightPadUTF8 {#rightpadutf8} + +Дополняет текущую строку справа пробелами или указанной строкой (несколько раз, если необходимо), пока результирующая строка не достигнет заданной длины. Соответствует MySQL функции `RPAD`. В отличие от функции [rightPad](#rightpad), измеряет длину строки не в байтах, а в кодовых точках Unicode.
+ +**Синтаксис** + +``` sql +rightPadUTF8('string','length'[, 'pad_string']) +``` + +**Параметры** + +- `string` — входная строка, которую необходимо дополнить. [String](../data-types/string.md). +- `length` — длина результирующей строки. [UInt](../data-types/int-uint.md). Если указанное значение меньше, чем длина входной строки, то входная строка возвращается как есть. +- `pad_string` — строка, используемая для дополнения входной строки. [String](../data-types/string.md). Необязательный параметр. Если не указано, то входная строка дополняется пробелами. + +**Возвращаемое значение** + +- Результирующая строка заданной длины. + +Тип: [String](../data-types/string.md). + +**Пример** + +Запрос: + +``` sql +SELECT rightPadUTF8('абвг', 7, '*'), rightPadUTF8('абвг', 7); +``` + +Результат: + +``` text +┌─rightPadUTF8('абвг', 7, '*')─┬─rightPadUTF8('абвг', 7)─┐ +│ абвг*** │ абвг │ +└──────────────────────────────┴─────────────────────────┘ +``` + ## lower, lcase {#lower} Переводит ASCII-символы латиницы в строке в нижний регистр. diff --git a/docs/ru/sql-reference/functions/uuid-functions.md b/docs/ru/sql-reference/functions/uuid-functions.md index f0017adbc8b..0d534a2d7ce 100644 --- a/docs/ru/sql-reference/functions/uuid-functions.md +++ b/docs/ru/sql-reference/functions/uuid-functions.md @@ -35,6 +35,90 @@ SELECT * FROM t_uuid └──────────────────────────────────────┘ ``` +## empty {#empty} + +Проверяет, является ли входной UUID пустым. + +**Синтаксис** + +```sql +empty(UUID) +``` + +UUID считается пустым, если он содержит все нули (нулевой UUID). + +Функция также поддерживает работу с типами [Array](array-functions.md#function-empty) и [String](string-functions.md#empty). + +**Параметры** + +- `x` — UUID на входе функции. [UUID](../data-types/uuid.md). + +**Возвращаемое значение** + +- Возвращает `1` для пустого UUID или `0` — для непустого UUID. + +Тип: [UInt8](../data-types/int-uint.md). + +**Пример** + +Для генерации UUID-значений предназначена функция [generateUUIDv4](#uuid-function-generate). + +Запрос: + +```sql +SELECT empty(generateUUIDv4()); +``` + +Ответ: + +```text +┌─empty(generateUUIDv4())─┐ +│ 0 │ +└─────────────────────────┘ +``` + +## notEmpty {#notempty} + +Проверяет, является ли входной UUID непустым. + +**Синтаксис** + +```sql +notEmpty(UUID) +``` + +UUID считается пустым, если он содержит все нули (нулевой UUID). + +Функция также поддерживает работу с типами [Array](array-functions.md#function-notempty) и [String](string-functions.md#notempty). + +**Параметры** + +- `x` — UUID на входе функции. [UUID](../data-types/uuid.md). + +**Возвращаемое значение** + +- Возвращает `1` для непустого UUID или `0` — для пустого UUID. + +Тип: [UInt8](../data-types/int-uint.md). + +**Пример** + +Для генерации UUID-значений предназначена функция [generateUUIDv4](#uuid-function-generate). + +Запрос: + +```sql +SELECT notEmpty(generateUUIDv4()); +``` + +Результат: + +```text +┌─notEmpty(generateUUIDv4())─┐ +│ 1 │ +└────────────────────────────┘ +``` + ## toUUID (x) {#touuid-x} Преобразует значение типа String в тип UUID.
diff --git a/docs/ru/sql-reference/statements/alter/projection.md b/docs/ru/sql-reference/statements/alter/projection.md new file mode 100644 index 00000000000..db116963aa6 --- /dev/null +++ b/docs/ru/sql-reference/statements/alter/projection.md @@ -0,0 +1,23 @@ +--- +toc_priority: 49 +toc_title: PROJECTION +--- + +# Манипуляции с проекциями {#manipulations-with-projections} + +Доступны следующие операции: + +- `ALTER TABLE [db].name ADD PROJECTION name AS SELECT [GROUP BY] [ORDER BY]` — добавляет описание проекции в метаданные. + +- `ALTER TABLE [db].name DROP PROJECTION name` — удаляет описание проекции из метаданных и удаляет файлы проекции с диска. + +- `ALTER TABLE [db.]table MATERIALIZE PROJECTION name IN PARTITION partition_name` — перестраивает проекцию в указанной партиции. Реализовано как [мутация](../../../sql-reference/statements/alter/index.md#mutations). + +- `ALTER TABLE [db.]table CLEAR PROJECTION name IN PARTITION partition_name` — удаляет файлы проекции с диска без удаления описания. + +Команды ADD, DROP и CLEAR — легковесны, поскольку они только меняют метаданные или удаляют файлы. + +Также команды реплицируются, синхронизируя описания проекций в метаданных с помощью ZooKeeper. + +!!! note "Примечание" + Манипуляции с проекциями поддерживаются только для таблиц с движком [`*MergeTree`](../../../engines/table-engines/mergetree-family/mergetree.md) (включая [replicated](../../../engines/table-engines/mergetree-family/replication.md) варианты). diff --git a/docs/ru/sql-reference/statements/select/distinct.md b/docs/ru/sql-reference/statements/select/distinct.md index f57c2a42593..42c1df64540 100644 --- a/docs/ru/sql-reference/statements/select/distinct.md +++ b/docs/ru/sql-reference/statements/select/distinct.md @@ -6,19 +6,51 @@ toc_title: DISTINCT Если указан `SELECT DISTINCT`, то в результате запроса останутся только уникальные строки. Таким образом, из всех наборов полностью совпадающих строк в результате останется только одна строка. -## Обработка NULL {#null-processing} +Вы можете указать столбцы, по которым хотите отбирать уникальные значения: `SELECT DISTINCT ON (column1, column2,...)`. Если столбцы не указаны, то отбираются строки, в которых значения уникальны во всех столбцах. -`DISTINCT` работает с [NULL](../../syntax.md#null-literal) как-будто `NULL` — обычное значение и `NULL==NULL`. Другими словами, в результате `DISTINCT`, различные комбинации с `NULL` встретятся только один раз. Это отличается от обработки `NULL` в большинстве других контекстов. +Рассмотрим таблицу: -## Альтернативы {#alternatives} +```text +┌─a─┬─b─┬─c─┐ +│ 1 │ 1 │ 1 │ +│ 1 │ 1 │ 1 │ +│ 2 │ 2 │ 2 │ +│ 2 │ 2 │ 2 │ +│ 1 │ 1 │ 2 │ +│ 1 │ 2 │ 2 │ +└───┴───┴───┘ +``` -Такой же результат можно получить, применив секцию [GROUP BY](group-by.md) для того же набора значений, которые указан в секции `SELECT`, без использования каких-либо агрегатных функций. Но есть от `GROUP BY` несколько отличий: +Использование `DISTINCT` без указания столбцов: -- `DISTINCT` может применяться вместе с `GROUP BY`. -- Когда секция [ORDER BY](order-by.md) опущена, а секция [LIMIT](limit.md) присутствует, запрос прекращает выполнение сразу после считывания необходимого количества различных строк. -- Блоки данных выводятся по мере их обработки, не дожидаясь завершения выполнения всего запроса.
+```sql +SELECT DISTINCT * FROM t1; +``` -## Примеры {#examples} +```text +┌─a─┬─b─┬─c─┐ +│ 1 │ 1 │ 1 │ +│ 2 │ 2 │ 2 │ +│ 1 │ 1 │ 2 │ +│ 1 │ 2 │ 2 │ +└───┴───┴───┘ +``` + +Использование `DISTINCT` с указанием столбцов: + +```sql +SELECT DISTINCT ON (a,b) * FROM t1; +``` + +```text +┌─a─┬─b─┬─c─┐ +│ 1 │ 1 │ 1 │ +│ 2 │ 2 │ 2 │ +│ 1 │ 2 │ 2 │ +└───┴───┴───┘ +``` + +## DISTINCT и ORDER BY {#distinct-orderby} ClickHouse поддерживает использование секций `DISTINCT` и `ORDER BY` для разных столбцов в одном запросе. Секция `DISTINCT` выполняется до секции `ORDER BY`. @@ -56,3 +88,16 @@ ClickHouse поддерживает использование секций `DIS Ряд `2, 4` был разрезан перед сортировкой. Учитывайте эту специфику при разработке запросов. + +## Обработка NULL {#null-processing} + +`DISTINCT` работает с [NULL](../../syntax.md#null-literal) как будто `NULL` — обычное значение и `NULL==NULL`. Другими словами, в результате `DISTINCT`, различные комбинации с `NULL` встретятся только один раз. Это отличается от обработки `NULL` в большинстве других контекстов. + +## Альтернативы {#alternatives} + +Можно получить такой же результат, применив [GROUP BY](group-by.md) для того же набора значений, который указан в секции `SELECT`, без использования каких-либо агрегатных функций. Но есть несколько отличий от `GROUP BY`: + +- `DISTINCT` может применяться вместе с `GROUP BY`. +- Когда секция [ORDER BY](order-by.md) опущена, а секция [LIMIT](limit.md) присутствует, запрос прекращает выполнение сразу после считывания необходимого количества различных строк. +- Блоки данных выводятся по мере их обработки, не дожидаясь завершения выполнения всего запроса. + diff --git a/docs/ru/sql-reference/statements/select/index.md b/docs/ru/sql-reference/statements/select/index.md index a0a862cbf55..c2820bc7be4 100644 --- a/docs/ru/sql-reference/statements/select/index.md +++ b/docs/ru/sql-reference/statements/select/index.md @@ -11,7 +11,7 @@ toc_title: "Обзор" ``` sql [WITH expr_list|(subquery)] -SELECT [DISTINCT] expr_list +SELECT [DISTINCT [ON (column1, column2, ...)]] expr_list [FROM [db.]table | (subquery) | table_function] [FINAL] [SAMPLE sample_coeff] [ARRAY JOIN ...]
@@ -34,6 +34,8 @@ SELECT [DISTINCT] expr_list Особенности каждой необязательной секции рассматриваются в отдельных разделах, которые перечислены в том же порядке, в каком они выполняются: - [Секция WITH](with.md) +- [Секция SELECT](#select-clause) +- [Секция DISTINCT](distinct.md) - [Секция FROM](from.md) - [Секция SAMPLE](sample.md) - [Секция JOIN](join.md) @@ -42,8 +44,6 @@ SELECT [DISTINCT] expr_list - [Секция GROUP BY](group-by.md) - [Секция LIMIT BY](limit-by.md) - [Секция HAVING](having.md) -- [Секция SELECT](#select-clause) -- [Секция DISTINCT](distinct.md) - [Секция LIMIT](limit.md) [Секция OFFSET](offset.md) - [Секция UNION ALL](union.md) diff --git a/programs/client/QueryFuzzer.h b/programs/client/QueryFuzzer.h index 19f089c6c4e..09d57f4161f 100644 --- a/programs/client/QueryFuzzer.h +++ b/programs/client/QueryFuzzer.h @@ -7,7 +7,6 @@ #include #include -#include #include #include diff --git a/programs/server/Server.cpp b/programs/server/Server.cpp index 86bb04351b1..5520f920823 100644 --- a/programs/server/Server.cpp +++ b/programs/server/Server.cpp @@ -97,7 +97,7 @@ #endif #if USE_SSL -# if USE_INTERNAL_SSL_LIBRARY +# if USE_INTERNAL_SSL_LIBRARY && !defined(ARCADIA_BUILD) # include # endif # include @@ -126,6 +126,7 @@ namespace CurrentMetrics extern const Metric VersionInteger; extern const Metric MemoryTracking; extern const Metric MaxDDLEntryID; + extern const Metric MaxPushedDDLEntryID; } namespace fs = std::filesystem; @@ -1468,7 +1469,8 @@ if (ThreadFuzzer::instance().isEffective()) if (pool_size < 1) throw Exception("distributed_ddl.pool_size should be greater then 0", ErrorCodes::ARGUMENT_OUT_OF_BOUND); global_context->setDDLWorker(std::make_unique(pool_size, ddl_zookeeper_path, global_context, &config(), - "distributed_ddl", "DDLWorker", &CurrentMetrics::MaxDDLEntryID)); + "distributed_ddl", "DDLWorker", + &CurrentMetrics::MaxDDLEntryID, &CurrentMetrics::MaxPushedDDLEntryID)); } for (auto & server : *servers) diff --git a/programs/server/config.xml b/programs/server/config.xml index 78182482c1c..510a5e230f8 100644 --- a/programs/server/config.xml +++ b/programs/server/config.xml @@ -320,7 +320,7 @@ The amount of data in mapped files can be monitored in system.metrics, system.metric_log by the MMappedFiles, MMappedFileBytes metrics and in system.asynchronous_metrics, system.asynchronous_metrics_log by the MMapCacheCells metric, - and also in system.events, system.processes, system.query_log, system.query_thread_log by the + and also in system.events, system.processes, system.query_log, system.query_thread_log, system.query_views_log by the CreatedReadBufferMMap, CreatedReadBufferMMapFailed, MMappedFileCacheHits, MMappedFileCacheMisses events. Note that the amount of data in mapped files does not consume memory directly and is not accounted in query or server memory usage - because this memory can be discarded similar to OS page cache. @@ -878,14 +878,23 @@ <flush_interval_milliseconds>7500</flush_interval_milliseconds> </query_thread_log> + + <query_views_log> + <database>system</database> + <table>query_views_log</table>
+ <partition_by>toYYYYMM(event_date)</partition_by> + <flush_interval_milliseconds>7500</flush_interval_milliseconds> + </query_views_log>
+ + <part_log> + <database>system</database> + <table>part_log</table>
+ <partition_by>toYYYYMM(event_date)</partition_by> + <flush_interval_milliseconds>7500</flush_interval_milliseconds> + </part_log>
- --> diff --git a/programs/server/config.yaml.example b/programs/server/config.yaml.example index bebfd74ff58..5b2da1d3128 100644 --- a/programs/server/config.yaml.example +++ b/programs/server/config.yaml.example @@ -271,7 +271,7 @@ mark_cache_size: 5368709120 # The amount of data in mapped files can be monitored # in system.metrics, system.metric_log by the MMappedFiles, MMappedFileBytes metrics # and in system.asynchronous_metrics, system.asynchronous_metrics_log by the MMapCacheCells metric, -# and also in system.events, system.processes, system.query_log, system.query_thread_log by the +# and also in system.events, system.processes, system.query_log, system.query_thread_log, system.query_views_log by the # CreatedReadBufferMMap, CreatedReadBufferMMapFailed, MMappedFileCacheHits, MMappedFileCacheMisses events. # Note that the amount of data in mapped files does not consume memory directly and is not accounted # in query or server memory usage - because this memory can be discarded similar to OS page cache. @@ -731,12 +731,21 @@ query_thread_log: partition_by: toYYYYMM(event_date) flush_interval_milliseconds: 7500 +# Query views log. Has information about all dependent views associated with a query. +# Used only for queries with setting log_query_views = 1. +query_views_log: + database: system + table: query_views_log + partition_by: toYYYYMM(event_date) + flush_interval_milliseconds: 7500 + # Uncomment if use part log. # Part log contains information about all actions with parts in MergeTree tables (creation, deletion, merges, downloads). -# part_log: -# database: system -# table: part_log -# flush_interval_milliseconds: 7500 +part_log: + database: system + table: part_log + partition_by: toYYYYMM(event_date) + flush_interval_milliseconds: 7500 # Uncomment to write text log into table. # Text log contains all information from usual server log but stores it in structured and efficient way. 
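A quick sanity check of the two log sections above (a sketch; both tables are created lazily, so run a few queries first and either wait for the flush interval or force it):

``` sql
SYSTEM FLUSH LOGS;
-- part_log: recent part-level events (creation, merges, removals).
SELECT event_type, database, table, part_name, rows
FROM system.part_log
ORDER BY event_time DESC
LIMIT 5;
```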
diff --git a/src/Access/ya.make b/src/Access/ya.make index 5f2f410cabd..38c1b007ff8 100644 --- a/src/Access/ya.make +++ b/src/Access/ya.make @@ -46,7 +46,6 @@ SRCS( SettingsProfilesInfo.cpp User.cpp UsersConfigAccessStorage.cpp - tests/gtest_access_rights_ops.cpp ) diff --git a/src/Access/ya.make.in b/src/Access/ya.make.in index 1f11c7d7d2a..5fa69cec4bb 100644 --- a/src/Access/ya.make.in +++ b/src/Access/ya.make.in @@ -8,7 +8,7 @@ PEERDIR( SRCS( - + ) END() diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt index 6c10d3e2f2b..796c9eb4d2c 100644 --- a/src/CMakeLists.txt +++ b/src/CMakeLists.txt @@ -299,10 +299,11 @@ target_link_libraries(clickhouse_common_io ${ZLIB_LIBRARIES} pcg_random Poco::Foundation - roaring ) - +# Make dbms depend on roaring instead of clickhouse_common_io so that roaring itself can depend on clickhouse_common_io +# That way we can redirect malloc/free functions avoiding circular dependencies +dbms_target_link_libraries(PUBLIC roaring) if (USE_RDKAFKA) dbms_target_link_libraries(PRIVATE ${CPPKAFKA_LIBRARY} ${RDKAFKA_LIBRARY}) diff --git a/src/Columns/ColumnLowCardinality.h b/src/Columns/ColumnLowCardinality.h index faf5bb9e712..a78c7d88a11 100644 --- a/src/Columns/ColumnLowCardinality.h +++ b/src/Columns/ColumnLowCardinality.h @@ -194,6 +194,7 @@ public: const IColumnUnique & getDictionary() const { return dictionary.getColumnUnique(); } IColumnUnique & getDictionary() { return dictionary.getColumnUnique(); } const ColumnPtr & getDictionaryPtr() const { return dictionary.getColumnUniquePtr(); } + ColumnPtr & getDictionaryPtr() { return dictionary.getColumnUniquePtr(); } /// IColumnUnique & getUnique() { return static_cast(*column_unique); } /// ColumnPtr getUniquePtr() const { return column_unique; } diff --git a/src/Common/CurrentMetrics.cpp b/src/Common/CurrentMetrics.cpp index f94c3421107..9acefe8a2d8 100644 --- a/src/Common/CurrentMetrics.cpp +++ b/src/Common/CurrentMetrics.cpp @@ -60,6 +60,7 @@ M(BrokenDistributedFilesToInsert, "Number of files for asynchronous insertion into Distributed tables that has been marked as broken. This metric will starts from 0 on start. Number of files for every shard is summed.") \ M(TablesToDropQueueSize, "Number of dropped tables, that are waiting for background data removal.") \ M(MaxDDLEntryID, "Max processed DDL entry of DDLWorker.") \ + M(MaxPushedDDLEntryID, "Max DDL entry of DDLWorker that pushed to zookeeper.") \ M(PartsTemporary, "The part is generating now, it is not in data_parts list.") \ M(PartsPreCommitted, "The part is in data_parts, but not used for SELECTs.") \ M(PartsCommitted, "Active data part, used by current and upcoming SELECTs.") \ diff --git a/src/Common/DenseHashMap.h b/src/Common/DenseHashMap.h new file mode 100644 index 00000000000..9ac21c82676 --- /dev/null +++ b/src/Common/DenseHashMap.h @@ -0,0 +1,29 @@ +#pragma once +#include + +/// DenseHashMap is a wrapper for google::dense_hash_map. +/// Some hacks are needed to make it work in "Arcadia". +/// "Arcadia" is a proprietary monorepository in Yandex. +/// It uses a slightly changed version of sparsehash with a different set of hash functions (which we don't need). +/// Those defines are needed to make it compile.
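+/// A hypothetical usage sketch (not part of the original header): google::dense_hash_map
+/// requires an "empty key" to be set before the first insertion, e.g.
+///     DenseHashMap<std::string, size_t> map;
+///     map.set_empty_key(std::string());
+///     map["some_key"] = 1;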
+#if defined(ARCADIA_BUILD) +#define HASH_FUN_H +template +struct THash; +#endif + +#include + +#if !defined(ARCADIA_BUILD) + template , + class EqualKey = std::equal_to, + class Alloc = google::libc_allocator_with_realloc>> + using DenseHashMap = google::dense_hash_map; +#else + template , + class EqualKey = std::equal_to, + class Alloc = google::sparsehash::libc_allocator_with_realloc>> + using DenseHashMap = google::sparsehash::dense_hash_map; + + #undef THash +#endif diff --git a/src/Common/DenseHashSet.h b/src/Common/DenseHashSet.h new file mode 100644 index 00000000000..e8c06f36aa3 --- /dev/null +++ b/src/Common/DenseHashSet.h @@ -0,0 +1,25 @@ +#pragma once + +/// DenseHashSet is a wrapper for google::dense_hash_set. +/// See comment in DenseHashMap.h +#if defined(ARCADIA_BUILD) +#define HASH_FUN_H +template +struct THash; +#endif + +#include + +#if !defined(ARCADIA_BUILD) + template , + class EqualKey = std::equal_to, + class Alloc = google::libc_allocator_with_realloc> + using DenseHashSet = google::dense_hash_set; +#else + template , + class EqualKey = std::equal_to, + class Alloc = google::sparsehash::libc_allocator_with_realloc> + using DenseHashSet = google::sparsehash::dense_hash_set; + + #undef THash +#endif diff --git a/src/Common/Exception.cpp b/src/Common/Exception.cpp index 641f8bbe0f0..09629b436b2 100644 --- a/src/Common/Exception.cpp +++ b/src/Common/Exception.cpp @@ -94,6 +94,22 @@ std::string getExceptionStackTraceString(const std::exception & e) #endif } +std::string getExceptionStackTraceString(std::exception_ptr e) +{ + try + { + std::rethrow_exception(e); + } + catch (const std::exception & exception) + { + return getExceptionStackTraceString(exception); + } + catch (...) + { + return {}; + } +} + std::string Exception::getStackTraceString() const { @@ -380,6 +396,30 @@ int getCurrentExceptionCode() } } +int getExceptionErrorCode(std::exception_ptr e) +{ + try + { + std::rethrow_exception(e); + } + catch (const Exception & exception) + { + return exception.code(); + } + catch (const Poco::Exception &) + { + return ErrorCodes::POCO_EXCEPTION; + } + catch (const std::exception &) + { + return ErrorCodes::STD_EXCEPTION; + } + catch (...) + { + return ErrorCodes::UNKNOWN_EXCEPTION; + } +} + void rethrowFirstException(const Exceptions & exceptions) { diff --git a/src/Common/Exception.h b/src/Common/Exception.h index 79b4394948a..d04b0f71b9e 100644 --- a/src/Common/Exception.h +++ b/src/Common/Exception.h @@ -82,6 +82,7 @@ private: std::string getExceptionStackTraceString(const std::exception & e); +std::string getExceptionStackTraceString(std::exception_ptr e); /// Contains an additional member `saved_errno`. See the throwFromErrno function. 
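Both std::exception_ptr helpers added above rely on the standard rethrow-and-catch idiom: an exception_ptr is opaque, so the only portable way to inspect the exception it holds is to rethrow it inside a try block and catch by type, from most specific to least. A self-contained sketch of the same idiom, deliberately independent of the DB::Exception hierarchy:

#include <exception>
#include <stdexcept>
#include <string>

/// Generic rethrow-and-classify, mirroring the structure of getExceptionErrorCode() above.
std::string describeException(std::exception_ptr e)
{
    if (!e)
        return "<empty exception_ptr>";
    try
    {
        std::rethrow_exception(e);
    }
    catch (const std::runtime_error & ex) /// most specific type first
    {
        return std::string("runtime_error: ") + ex.what();
    }
    catch (const std::exception & ex)
    {
        return std::string("std::exception: ") + ex.what();
    }
    catch (...) /// anything not derived from std::exception lands here
    {
        return "<unknown exception>";
    }
}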
@@ -167,6 +168,7 @@ std::string getCurrentExceptionMessage(bool with_stacktrace, bool check_embedded /// Returns error code from ErrorCodes int getCurrentExceptionCode(); +int getExceptionErrorCode(std::exception_ptr e); /// An execution status of any piece of code, contains return code and optional error diff --git a/src/Common/MemoryTracker.cpp b/src/Common/MemoryTracker.cpp index a05fa3b5ad5..50ddcb5a9eb 100644 --- a/src/Common/MemoryTracker.cpp +++ b/src/Common/MemoryTracker.cpp @@ -183,9 +183,6 @@ void MemoryTracker::allocImpl(Int64 size, bool throw_if_memory_exceeded) std::bernoulli_distribution fault(fault_probability); if (unlikely(fault_probability && fault(thread_local_rng)) && memoryTrackerCanThrow(level, true) && throw_if_memory_exceeded) { - ProfileEvents::increment(ProfileEvents::QueryMemoryLimitExceeded); - amount.fetch_sub(size, std::memory_order_relaxed); - /// Prevent recursion. Exception::ctor -> std::string -> new[] -> MemoryTracker::alloc BlockerInThread untrack_lock(VariableContext::Global); @@ -363,7 +360,7 @@ void MemoryTracker::setOrRaiseHardLimit(Int64 value) { /// This is just atomic set to maximum. Int64 old_value = hard_limit.load(std::memory_order_relaxed); - while (old_value < value && !hard_limit.compare_exchange_weak(old_value, value)) + while ((value == 0 || old_value < value) && !hard_limit.compare_exchange_weak(old_value, value)) ; } @@ -371,6 +368,6 @@ void MemoryTracker::setOrRaiseHardLimit(Int64 value) void MemoryTracker::setOrRaiseProfilerLimit(Int64 value) { Int64 old_value = profiler_limit.load(std::memory_order_relaxed); - while (old_value < value && !profiler_limit.compare_exchange_weak(old_value, value)) + while ((value == 0 || old_value < value) && !profiler_limit.compare_exchange_weak(old_value, value)) ; } diff --git a/src/Common/SparseHashMap.h b/src/Common/SparseHashMap.h new file mode 100644 index 00000000000..f01fc633d84 --- /dev/null +++ b/src/Common/SparseHashMap.h @@ -0,0 +1,25 @@ +#pragma once + +/// SparseHashMap is a wrapper for google::sparse_hash_map. 
+/// See comment in DenseHashMap.h +#if defined(ARCADIA_BUILD) +#define HASH_FUN_H +template +struct THash; +#endif + +#include + +#if !defined(ARCADIA_BUILD) + template , + class EqualKey = std::equal_to, + class Alloc = google::libc_allocator_with_realloc>> + using SparseHashMap = google::sparse_hash_map; +#else + template , + class EqualKey = std::equal_to, + class Alloc = google::sparsehash::libc_allocator_with_realloc>> + using SparseHashMap = google::sparsehash::sparse_hash_map; + + #undef THash +#endif diff --git a/src/Common/ThreadStatus.cpp b/src/Common/ThreadStatus.cpp index 0e12830e49d..81c6b8eb1c3 100644 --- a/src/Common/ThreadStatus.cpp +++ b/src/Common/ThreadStatus.cpp @@ -149,7 +149,11 @@ ThreadStatus::~ThreadStatus() if (deleter) deleter(); - current_thread = nullptr; + + /// Only change current_thread if it's currently being used by this ThreadStatus + /// For example, PushingToViewsBlockOutputStream creates and deletes ThreadStatus instances while running in the main query thread + if (current_thread == this) + current_thread = nullptr; } void ThreadStatus::updatePerformanceCounters() diff --git a/src/Common/ThreadStatus.h b/src/Common/ThreadStatus.h index 6fc43114621..dbfb33a320c 100644 --- a/src/Common/ThreadStatus.h +++ b/src/Common/ThreadStatus.h @@ -37,6 +37,8 @@ struct RUsageCounters; struct PerfEventsCounters; class TaskStatsInfoGetter; class InternalTextLogsQueue; +struct ViewRuntimeData; +class QueryViewsLog; using InternalTextLogsQueuePtr = std::shared_ptr; using InternalTextLogsQueueWeakPtr = std::weak_ptr; @@ -143,6 +145,7 @@ protected: Poco::Logger * log = nullptr; friend class CurrentThread; + friend class PushingToViewsBlockOutputStream; /// Use ptr not to add extra dependencies in the header std::unique_ptr last_rusage; @@ -151,6 +154,9 @@ protected: /// Is used to send logs from logs_queue to client in case of fatal errors. 
std::function fatal_error_callback; + /// It is used to avoid enabling the query profiler when you have multiple ThreadStatus in the same thread + bool query_profiled_enabled = true; + public: ThreadStatus(); ~ThreadStatus(); @@ -210,9 +216,13 @@ public: /// Update ProfileEvents and dumps info to system.query_thread_log void finalizePerformanceCounters(); + /// Set the counters last usage to now + void resetPerformanceCountersLastUsage(); + /// Detaches thread from the thread group and the query, dumps performance counters if they have not been dumped void detachQuery(bool exit_if_already_detached = false, bool thread_exits = false); + protected: void applyQuerySettings(); @@ -224,6 +234,8 @@ protected: void logToQueryThreadLog(QueryThreadLog & thread_log, const String & current_database, std::chrono::time_point now); + void logToQueryViewsLog(const ViewRuntimeData & vinfo); + void assertState(const std::initializer_list & permitted_states, const char * description = nullptr) const; diff --git a/src/Common/ya.make b/src/Common/ya.make index 60dfd5f6bee..82962123e56 100644 --- a/src/Common/ya.make +++ b/src/Common/ya.make @@ -102,6 +102,7 @@ SRCS( ZooKeeper/ZooKeeperNodeCache.cpp checkStackSize.cpp clearPasswordFromCommandLine.cpp + clickhouse_malloc.cpp createHardLink.cpp escapeForFileName.cpp filesystemHelpers.cpp @@ -116,6 +117,7 @@ SRCS( hex.cpp isLocalAddress.cpp malloc.cpp + memory.cpp new_delete.cpp parseAddress.cpp parseGlobs.cpp diff --git a/src/Compression/CompressionCodecEncrypted.cpp b/src/Compression/CompressionCodecEncrypted.cpp index d0904b4bf24..6b921fb9c0a 100644 --- a/src/Compression/CompressionCodecEncrypted.cpp +++ b/src/Compression/CompressionCodecEncrypted.cpp @@ -1,13 +1,15 @@ -#include +#if !defined(ARCADIA_BUILD) +# include +#endif #include #if USE_SSL && USE_INTERNAL_SSL_LIBRARY #include #include #include -#include +#include // Y_IGNORE #include -#include +#include // Y_IGNORE #include namespace DB diff --git a/src/Compression/CompressionCodecEncrypted.h b/src/Compression/CompressionCodecEncrypted.h index e58fd4ab173..bacd58bcd2f 100644 --- a/src/Compression/CompressionCodecEncrypted.h +++ b/src/Compression/CompressionCodecEncrypted.h @@ -2,11 +2,11 @@ // This depends on BoringSSL-specific API, notably . #include -#if USE_SSL && USE_INTERNAL_SSL_LIBRARY +#if USE_SSL && USE_INTERNAL_SSL_LIBRARY && !defined(ARCADIA_BUILD) #include #include -#include +#include // Y_IGNORE #include namespace DB diff --git a/src/Coordination/KeeperStorageDispatcher.cpp b/src/Coordination/KeeperStorageDispatcher.cpp index e95a6940baa..7c416b38d8b 100644 --- a/src/Coordination/KeeperStorageDispatcher.cpp +++ b/src/Coordination/KeeperStorageDispatcher.cpp @@ -1,6 +1,5 @@ #include #include -#include #include #include #include diff --git a/src/Core/ExternalResultDescription.h b/src/Core/ExternalResultDescription.h index 78c054e805f..a9ffe8b2ed2 100644 --- a/src/Core/ExternalResultDescription.h +++ b/src/Core/ExternalResultDescription.h @@ -6,7 +6,7 @@ namespace DB { -/** Common part for implementation of MySQLBlockInputStream, MongoDBBlockInputStream and others. +/** Common part for implementation of MySQLSource, MongoDBSource and others. 
*/ struct ExternalResultDescription { diff --git a/src/Core/NamesAndTypes.cpp b/src/Core/NamesAndTypes.cpp index 91191c73fd0..54f83fc13fc 100644 --- a/src/Core/NamesAndTypes.cpp +++ b/src/Core/NamesAndTypes.cpp @@ -6,7 +6,7 @@ #include #include #include -#include +#include namespace DB @@ -163,11 +163,7 @@ NamesAndTypesList NamesAndTypesList::filter(const Names & names) const NamesAndTypesList NamesAndTypesList::addTypes(const Names & names) const { /// NOTE: It's better to make a map in `IStorage` than to create it here every time again. -#if !defined(ARCADIA_BUILD) - google::dense_hash_map types; -#else - google::sparsehash::dense_hash_map types; -#endif + DenseHashMap types; types.set_empty_key(StringRef()); for (const auto & column : *this) diff --git a/src/Core/Settings.h b/src/Core/Settings.h index e1bd1d29153..19f9f2a94c8 100644 --- a/src/Core/Settings.h +++ b/src/Core/Settings.h @@ -173,7 +173,7 @@ class IColumn; M(Bool, log_queries, 1, "Log requests and write the log to the system table.", 0) \ M(Bool, log_formatted_queries, 0, "Log formatted queries and write the log to the system table.", 0) \ M(LogQueriesType, log_queries_min_type, QueryLogElementType::QUERY_START, "Minimal type in query_log to log, possible values (from low to high): QUERY_START, QUERY_FINISH, EXCEPTION_BEFORE_START, EXCEPTION_WHILE_PROCESSING.", 0) \ - M(Milliseconds, log_queries_min_query_duration_ms, 0, "Minimal time for the query to run, to get to the query_log/query_thread_log.", 0) \ + M(Milliseconds, log_queries_min_query_duration_ms, 0, "Minimal time for the query to run, to get to the query_log/query_thread_log/query_views_log.", 0) \ M(UInt64, log_queries_cut_to_length, 100000, "If query length is greater than specified threshold (in bytes), then cut query when writing to query log. Also limit length of printed query in ordinary text log.", 0) \ \ M(DistributedProductMode, distributed_product_mode, DistributedProductMode::DENY, "How are distributed subqueries performed inside IN or JOIN sections?", IMPORTANT) \ @@ -352,9 +352,10 @@ class IColumn; M(UInt64, max_network_bandwidth_for_user, 0, "The maximum speed of data exchange over the network in bytes per second for all concurrently running user queries. Zero means unlimited.", 0)\ M(UInt64, max_network_bandwidth_for_all_users, 0, "The maximum speed of data exchange over the network in bytes per second for all concurrently running queries. Zero means unlimited.", 0) \ \ - M(Bool, log_profile_events, true, "Log query performance statistics into the query_log and query_thread_log.", 0) \ + M(Bool, log_profile_events, true, "Log query performance statistics into the query_log, query_thread_log and query_views_log.", 0) \ M(Bool, log_query_settings, true, "Log query settings into the query_log.", 0) \ M(Bool, log_query_threads, true, "Log query threads into system.query_thread_log table. This setting have effect only when 'log_queries' is true.", 0) \ + M(Bool, log_query_views, true, "Log query dependent views into system.query_views_log table. This setting has effect only when 'log_queries' is true.", 0) \ M(String, log_comment, "", "Log comment into system.query_log table and server log. It can be set to arbitrary string no longer than max_query_size.", 0) \ M(LogsLevel, send_logs_level, LogsLevel::fatal, "Send server text logs with specified minimum level to client.
Valid values: 'trace', 'debug', 'information', 'warning', 'error', 'fatal', 'none'", 0) \ M(Bool, enable_optimize_predicate_expression, 1, "If it is set to true, optimize predicates to subqueries.", 0) \ @@ -527,6 +528,9 @@ class IColumn; M(Bool, input_format_tsv_empty_as_default, false, "Treat empty fields in TSV input as default values.", 0) \ M(Bool, input_format_tsv_enum_as_number, false, "Treat inserted enum values in TSV formats as enum indices \\N", 0) \ M(Bool, input_format_null_as_default, true, "For text input formats initialize null fields with default values if data type of this field is not nullable", 0) \ + M(Bool, input_format_arrow_import_nested, false, "Allow to insert array of structs into Nested table in Arrow input format.", 0) \ + M(Bool, input_format_orc_import_nested, false, "Allow to insert array of structs into Nested table in ORC input format.", 0) \ + M(Bool, input_format_parquet_import_nested, false, "Allow to insert array of structs into Nested table in Parquet input format.", 0) \ \ M(DateTimeInputFormat, date_time_input_format, FormatSettings::DateTimeInputFormat::Basic, "Method to read DateTime from text input formats. Possible values: 'basic' and 'best_effort'.", 0) \ M(DateTimeOutputFormat, date_time_output_format, FormatSettings::DateTimeOutputFormat::Simple, "Method to write DateTime to text output. Possible values: 'simple', 'iso', 'unix_timestamp'.", 0) \ diff --git a/src/DataStreams/ExecutionSpeedLimits.h b/src/DataStreams/ExecutionSpeedLimits.h index d52dc713c1a..9c86ba2faf4 100644 --- a/src/DataStreams/ExecutionSpeedLimits.h +++ b/src/DataStreams/ExecutionSpeedLimits.h @@ -3,7 +3,8 @@ #include #include #include -#include + +class Stopwatch; namespace DB { diff --git a/src/DataStreams/MongoDBBlockInputStream.cpp b/src/DataStreams/MongoDBSource.cpp similarity index 99% rename from src/DataStreams/MongoDBBlockInputStream.cpp rename to src/DataStreams/MongoDBSource.cpp index a0a8e3e40a5..c00d214249a 100644 --- a/src/DataStreams/MongoDBBlockInputStream.cpp +++ b/src/DataStreams/MongoDBSource.cpp @@ -1,3 +1,5 @@ +#include "MongoDBSource.h" + #include #include @@ -15,7 +17,6 @@ #include #include #include -#include #include #include #include diff --git a/src/DataStreams/MongoDBBlockInputStream.h b/src/DataStreams/MongoDBSource.h similarity index 100% rename from src/DataStreams/MongoDBBlockInputStream.h rename to src/DataStreams/MongoDBSource.h diff --git a/src/DataStreams/PostgreSQLBlockInputStream.cpp b/src/DataStreams/PostgreSQLSource.cpp similarity index 98% rename from src/DataStreams/PostgreSQLBlockInputStream.cpp rename to src/DataStreams/PostgreSQLSource.cpp index 7f8949740df..c3bde8c84ad 100644 --- a/src/DataStreams/PostgreSQLBlockInputStream.cpp +++ b/src/DataStreams/PostgreSQLSource.cpp @@ -1,4 +1,4 @@ -#include "PostgreSQLBlockInputStream.h" +#include "PostgreSQLSource.h" #if USE_LIBPQXX #include @@ -73,7 +73,7 @@ void PostgreSQLSource::init(const Block & sample_block) template void PostgreSQLSource::onStart() { - if (connection_holder) + if (!tx) tx = std::make_shared(connection_holder->get()); stream = std::make_unique(*tx, pqxx::from_query, std::string_view(query_str)); diff --git a/src/DataStreams/PostgreSQLBlockInputStream.h b/src/DataStreams/PostgreSQLSource.h similarity index 86% rename from src/DataStreams/PostgreSQLBlockInputStream.h rename to src/DataStreams/PostgreSQLSource.h index 008da976619..2736afec7a9 100644 --- a/src/DataStreams/PostgreSQLBlockInputStream.h +++ b/src/DataStreams/PostgreSQLSource.h @@ -76,19 +76,6 @@ public: 
const Block & sample_block_, const UInt64 max_block_size_) : PostgreSQLSource(tx_, query_str_, sample_block_, max_block_size_, false) {} - - Chunk generate() override - { - if (!is_initialized) - { - Base::stream = std::make_unique(*Base::tx, pqxx::from_query, std::string_view(Base::query_str)); - is_initialized = true; - } - - return Base::generate(); - } - - bool is_initialized = false; }; } diff --git a/src/DataStreams/PushingToViewsBlockOutputStream.cpp b/src/DataStreams/PushingToViewsBlockOutputStream.cpp index 7729eb5fb44..dec5b710f75 100644 --- a/src/DataStreams/PushingToViewsBlockOutputStream.cpp +++ b/src/DataStreams/PushingToViewsBlockOutputStream.cpp @@ -1,24 +1,31 @@ #include +#include +#include +#include #include #include -#include -#include #include #include -#include -#include #include +#include +#include #include -#include -#include -#include -#include -#include -#include #include +#include #include +#include +#include +#include +#include +#include +#include +#include +#include #include -#include +#include + +#include +#include namespace DB { @@ -79,9 +86,12 @@ PushingToViewsBlockOutputStream::PushingToViewsBlockOutputStream( ASTPtr query; BlockOutputStreamPtr out; + QueryViewsLogElement::ViewType type = QueryViewsLogElement::ViewType::DEFAULT; + String target_name = database_table.getFullTableName(); if (auto * materialized_view = dynamic_cast(dependent_table.get())) { + type = QueryViewsLogElement::ViewType::MATERIALIZED; addTableLock( materialized_view->lockForShare(getContext()->getInitialQueryId(), getContext()->getSettingsRef().lock_acquire_timeout)); @@ -89,6 +99,7 @@ PushingToViewsBlockOutputStream::PushingToViewsBlockOutputStream( auto inner_table_id = inner_table->getStorageID(); auto inner_metadata_snapshot = inner_table->getInMemoryMetadataPtr(); query = dependent_metadata_snapshot->getSelectQuery().inner_query; + target_name = inner_table_id.getFullTableName(); std::unique_ptr insert = std::make_unique(); insert->table_id = inner_table_id; @@ -114,14 +125,57 @@ PushingToViewsBlockOutputStream::PushingToViewsBlockOutputStream( BlockIO io = interpreter.execute(); out = io.out; } - else if (dynamic_cast(dependent_table.get())) + else if (const auto * live_view = dynamic_cast(dependent_table.get())) + { + type = QueryViewsLogElement::ViewType::LIVE; + query = live_view->getInnerQuery(); // Used only to log in system.query_views_log out = std::make_shared( dependent_table, dependent_metadata_snapshot, insert_context, ASTPtr(), true); + } else out = std::make_shared( dependent_table, dependent_metadata_snapshot, insert_context, ASTPtr()); - views.emplace_back(ViewInfo{std::move(query), database_table, std::move(out), nullptr, 0 /* elapsed_ms */}); + /// If the materialized view is executed outside of a query, for example as a result of SYSTEM FLUSH LOGS or + /// SYSTEM FLUSH DISTRIBUTED ..., we can't attach to any thread group and we won't log, so there is no point in collecting metrics + std::unique_ptr thread_status = nullptr; + + ThreadGroupStatusPtr running_group = current_thread && current_thread->getThreadGroup() + ? current_thread->getThreadGroup() + : MainThreadStatus::getInstance().thread_group; + if (running_group) + { + /// We are creating a ThreadStatus per view to store its metrics individually + /// Since calling ThreadStatus() changes current_thread we save it and restore it after the calls + /// Later on, before doing any task related to a view, we'll switch to its ThreadStatus, do the work, + /// and switch back to the original thread_status.
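/// The save-and-restore pattern described above, reduced to a standalone sketch. The guard
/// class is hypothetical (the patch uses SCOPE_EXIT below for the same effect): swap a
/// thread-local pointer for the duration of a scope and restore it on exit, exception-safe.
///
///     thread_local ThreadStatus * current_thread;
///
///     class CurrentThreadSwapGuard
///     {
///     public:
///         explicit CurrentThreadSwapGuard(ThreadStatus * replacement) : saved(current_thread)
///         {
///             current_thread = replacement;
///         }
///         ~CurrentThreadSwapGuard() { current_thread = saved; }  /// runs even on exceptions
///     private:
///         ThreadStatus * saved;
///     };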
+ auto * original_thread = current_thread; + SCOPE_EXIT({ current_thread = original_thread; }); + + thread_status = std::make_unique(); + /// Disable query profiler for this ThreadStatus since the running (main query) thread should already have one + /// If we didn't disable it, then we could end up with N + 1 (N = number of dependencies) profilers which means + /// N times more interruptions + thread_status->query_profiled_enabled = false; + thread_status->setupState(running_group); + } + + QueryViewsLogElement::ViewRuntimeStats runtime_stats{ + target_name, + type, + std::move(thread_status), + 0, + std::chrono::system_clock::now(), + QueryViewsLogElement::ViewStatus::EXCEPTION_BEFORE_START}; + views.emplace_back(ViewRuntimeData{std::move(query), database_table, std::move(out), nullptr, std::move(runtime_stats)}); + + + /// Add the view to the query access info so it can appear in system.query_log + if (!no_destination) + { + getContext()->getQueryContext()->addQueryAccessInfo( + backQuoteIfNeed(database_table.getDatabaseName()), target_name, {}, "", database_table.getFullTableName()); + } } /// Do not push to destination table if the flag is set @@ -136,7 +190,6 @@ PushingToViewsBlockOutputStream::PushingToViewsBlockOutputStream( } } - Block PushingToViewsBlockOutputStream::getHeader() const { /// If we don't write directly to the destination @@ -147,6 +200,39 @@ Block PushingToViewsBlockOutputStream::getHeader() const return metadata_snapshot->getSampleBlockWithVirtuals(storage->getVirtuals()); } +/// Auxiliary function to do the setup and teardown to run a view individually and collect its metrics inside the view ThreadStatus +void inline runViewStage(ViewRuntimeData & view, const std::string & action, std::function stage) +{ + Stopwatch watch; + + auto * original_thread = current_thread; + SCOPE_EXIT({ current_thread = original_thread; }); + + if (view.runtime_stats.thread_status) + { + /// Change thread context to store individual metrics per view. Once the work is done, go back to the original thread + view.runtime_stats.thread_status->resetPerformanceCountersLastUsage(); + current_thread = view.runtime_stats.thread_status.get(); + } + + try + { + stage(); + } + catch (Exception & ex) + { + ex.addMessage(action + " " + view.table_id.getNameForLogs()); + view.setException(std::current_exception()); + } + catch (...) + { + view.setException(std::current_exception()); + } + + if (view.runtime_stats.thread_status) + view.runtime_stats.thread_status->updatePerformanceCounters(); + view.runtime_stats.elapsed_ms += watch.elapsedMilliseconds(); +} void PushingToViewsBlockOutputStream::write(const Block & block) { @@ -169,39 +255,34 @@ void PushingToViewsBlockOutputStream::write(const Block & block) output->write(block); } - /// Don't process materialized views if this block is duplicate - if (!getContext()->getSettingsRef().deduplicate_blocks_in_dependent_materialized_views && replicated_output && replicated_output->lastBlockIsDuplicate()) + if (views.empty()) return; - // Insert data into materialized views only after successful insert into main table + /// Don't process materialized views if this block is duplicate const Settings & settings = getContext()->getSettingsRef(); - if (settings.parallel_view_processing && views.size() > 1) + if (!settings.deduplicate_blocks_in_dependent_materialized_views && replicated_output && replicated_output->lastBlockIsDuplicate()) + return; + + size_t max_threads = 1; + if (settings.parallel_view_processing) + max_threads = settings.max_threads ?
std::min(static_cast(settings.max_threads), views.size()) : views.size(); + if (max_threads > 1) { - // Push to views concurrently if enabled and more than one view is attached - ThreadPool pool(std::min(size_t(settings.max_threads), views.size())); + ThreadPool pool(max_threads); for (auto & view : views) { - auto thread_group = CurrentThread::getGroup(); - pool.scheduleOrThrowOnError([=, &view, this] - { + pool.scheduleOrThrowOnError([&] { setThreadName("PushingToViews"); - if (thread_group) - CurrentThread::attachToIfDetached(thread_group); - process(block, view); + runViewStage(view, "while pushing to view", [&]() { process(block, view); }); }); } - // Wait for concurrent view processing pool.wait(); } else { - // Process sequentially for (auto & view : views) { - process(block, view); - - if (view.exception) - std::rethrow_exception(view.exception); + runViewStage(view, "while pushing to view", [&]() { process(block, view); }); } } } @@ -213,14 +294,11 @@ void PushingToViewsBlockOutputStream::writePrefix() for (auto & view : views) { - try + runViewStage(view, "while writing prefix to view", [&] { view.out->writePrefix(); }); + if (view.exception) { - view.out->writePrefix(); - } - catch (Exception & ex) - { - ex.addMessage("while write prefix to view " + view.table_id.getNameForLogs()); - throw; + logQueryViews(); + std::rethrow_exception(view.exception); } } } @@ -230,95 +308,82 @@ void PushingToViewsBlockOutputStream::writeSuffix() if (output) output->writeSuffix(); - std::exception_ptr first_exception; + if (views.empty()) + return; - const Settings & settings = getContext()->getSettingsRef(); - bool parallel_processing = false; + auto process_suffix = [](ViewRuntimeData & view) + { + view.out->writeSuffix(); + view.runtime_stats.setStatus(QueryViewsLogElement::ViewStatus::QUERY_FINISH); + }; + static std::string stage_step = "while writing suffix to view"; /// Run writeSuffix() for views in separate thread pool. /// It could have been done in PushingToViewsBlockOutputStream::process, however /// it is not good if the insert into the main table fails but the insert into a view succeeds. - if (settings.parallel_view_processing && views.size() > 1) + const Settings & settings = getContext()->getSettingsRef(); + size_t max_threads = 1; + if (settings.parallel_view_processing) + max_threads = settings.max_threads ? std::min(static_cast(settings.max_threads), views.size()) : views.size(); + bool exception_happened = false; + if (max_threads > 1) { - parallel_processing = true; - - // Push to views concurrently if enabled and more than one view is attached - ThreadPool pool(std::min(size_t(settings.max_threads), views.size())); - auto thread_group = CurrentThread::getGroup(); - + ThreadPool pool(max_threads); + std::atomic_uint8_t exception_count = 0; for (auto & view : views) { if (view.exception) - continue; - - pool.scheduleOrThrowOnError([thread_group, &view, this] { + { + exception_happened = true; + continue; + } + pool.scheduleOrThrowOnError([&] { setThreadName("PushingToViews"); - if (thread_group) - CurrentThread::attachToIfDetached(thread_group); - Stopwatch watch; - try - { - view.out->writeSuffix(); - } - catch (...)
- { - view.exception = std::current_exception(); - } - view.elapsed_ms += watch.elapsedMilliseconds(); - - LOG_TRACE(log, "Pushing from {} to {} took {} ms.", - storage->getStorageID().getNameForLogs(), - view.table_id.getNameForLogs(), - view.elapsed_ms); + runViewStage(view, stage_step, [&] { process_suffix(view); }); + if (view.exception) + exception_count.fetch_add(1, std::memory_order_relaxed); }); } - // Wait for concurrent view processing pool.wait(); + exception_happened |= exception_count.load(std::memory_order_relaxed) != 0; + } + else + { + for (auto & view : views) + { + if (view.exception) + { + exception_happened = true; + continue; + } + runViewStage(view, stage_step, [&] { process_suffix(view); }); + if (view.exception) + exception_happened = true; + } } for (auto & view : views) { - if (view.exception) - { - if (!first_exception) - first_exception = view.exception; - - continue; - } - - if (parallel_processing) - continue; - - Stopwatch watch; - try - { - view.out->writeSuffix(); - } - catch (Exception & ex) - { - ex.addMessage("while write prefix to view " + view.table_id.getNameForLogs()); - throw; - } - view.elapsed_ms += watch.elapsedMilliseconds(); - - LOG_TRACE(log, "Pushing from {} to {} took {} ms.", - storage->getStorageID().getNameForLogs(), - view.table_id.getNameForLogs(), - view.elapsed_ms); + if (!view.exception) + LOG_TRACE( + log, + "Pushing ({}) from {} to {} took {} ms.", + max_threads <= 1 ? "sequentially" : ("parallel " + std::to_string(max_threads)), + storage->getStorageID().getNameForLogs(), + view.table_id.getNameForLogs(), + view.runtime_stats.elapsed_ms); } - if (first_exception) - std::rethrow_exception(first_exception); + if (exception_happened) + checkExceptionsInViews(); - UInt64 milliseconds = main_watch.elapsedMilliseconds(); if (views.size() > 1) { - LOG_DEBUG(log, "Pushing from {} to {} views took {} ms.", - storage->getStorageID().getNameForLogs(), views.size(), - milliseconds); + UInt64 milliseconds = main_watch.elapsedMilliseconds(); + LOG_DEBUG(log, "Pushing from {} to {} views took {} ms.", storage->getStorageID().getNameForLogs(), views.size(), milliseconds); } + logQueryViews(); } void PushingToViewsBlockOutputStream::flush() @@ -330,70 +395,103 @@ void PushingToViewsBlockOutputStream::flush() view.out->flush(); } -void PushingToViewsBlockOutputStream::process(const Block & block, ViewInfo & view) +void PushingToViewsBlockOutputStream::process(const Block & block, ViewRuntimeData & view) { - Stopwatch watch; + BlockInputStreamPtr in; - try + /// We need to keep InterpreterSelectQuery until the processing is finished, since: + /// + /// - We copy Context inside InterpreterSelectQuery to support + /// modification of context (Settings) for subqueries + /// - InterpreterSelectQuery lives shorter than the query pipeline. + /// It's used just to build the query pipeline and is no longer needed afterwards + /// - ExpressionAnalyzer and the Functions created in InterpreterSelectQuery + /// **can** take a reference to Context from InterpreterSelectQuery + /// (the problem arises only when a function uses context from the + /// execute*() method, like FunctionDictGet does) + /// - These objects live inside the query pipeline (DataStreams) and the references become dangling.
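/// A self-contained illustration of the dangling-reference hazard listed above (toy types,
/// not ClickHouse code): a stage that captures a pointer into state owned by its builder must
/// not outlive the builder, hence the std::optional keeping the interpreter alive below.
///
///     #include <functional>
///     #include <optional>
///
///     struct Builder
///     {
///         int setting = 42;
///         std::function<int()> makeStage() { return [this] { return setting; }; } /// captures `this`
///     };
///
///     int main()
///     {
///         std::optional<Builder> builder(std::in_place);
///         auto stage = builder->makeStage();
///         int result = stage();  /// fine: `builder` is still alive
///         builder.reset();       /// from here on, calling stage() would be undefined behaviour
///         return result == 42 ? 0 : 1;
///     }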
+ std::optional select; + + if (view.runtime_stats.type == QueryViewsLogElement::ViewType::MATERIALIZED) { - BlockInputStreamPtr in; + /// We create a table with the same name as original table and the same alias columns, + /// but it will contain single block (that is INSERT-ed into main table). + /// InterpreterSelectQuery will do processing of alias columns. - /// We need keep InterpreterSelectQuery, until the processing will be finished, since: - /// - /// - We copy Context inside InterpreterSelectQuery to support - /// modification of context (Settings) for subqueries - /// - InterpreterSelectQuery lives shorter than query pipeline. - /// It's used just to build the query pipeline and no longer needed - /// - ExpressionAnalyzer and then, Functions, that created in InterpreterSelectQuery, - /// **can** take a reference to Context from InterpreterSelectQuery - /// (the problem raises only when function uses context from the - /// execute*() method, like FunctionDictGet do) - /// - These objects live inside query pipeline (DataStreams) and the reference become dangling. - std::optional select; + auto local_context = Context::createCopy(select_context); + local_context->addViewSource( + StorageValues::create(storage->getStorageID(), metadata_snapshot->getColumns(), block, storage->getVirtuals())); + select.emplace(view.query, local_context, SelectQueryOptions()); + in = std::make_shared(select->execute().getInputStream()); - if (view.query) - { - /// We create a table with the same name as original table and the same alias columns, - /// but it will contain single block (that is INSERT-ed into main table). - /// InterpreterSelectQuery will do processing of alias columns. - - auto local_context = Context::createCopy(select_context); - local_context->addViewSource( - StorageValues::create(storage->getStorageID(), metadata_snapshot->getColumns(), block, storage->getVirtuals())); - select.emplace(view.query, local_context, SelectQueryOptions()); - in = std::make_shared(select->execute().getInputStream()); - - /// Squashing is needed here because the materialized view query can generate a lot of blocks - /// even when only one block is inserted into the parent table (e.g. if the query is a GROUP BY - /// and two-level aggregation is triggered). - in = std::make_shared( - in, getContext()->getSettingsRef().min_insert_block_size_rows, getContext()->getSettingsRef().min_insert_block_size_bytes); - in = std::make_shared(in, view.out->getHeader(), ConvertingBlockInputStream::MatchColumnsMode::Name); - } - else - in = std::make_shared(block); - - in->readPrefix(); - - while (Block result_block = in->read()) - { - Nested::validateArraySizes(result_block); - view.out->write(result_block); - } - - in->readSuffix(); + /// Squashing is needed here because the materialized view query can generate a lot of blocks + /// even when only one block is inserted into the parent table (e.g. if the query is a GROUP BY + /// and two-level aggregation is triggered). + in = std::make_shared( + in, getContext()->getSettingsRef().min_insert_block_size_rows, getContext()->getSettingsRef().min_insert_block_size_bytes); + in = std::make_shared(in, view.out->getHeader(), ConvertingBlockInputStream::MatchColumnsMode::Name); } - catch (Exception & ex) + else + in = std::make_shared(block); + + in->setProgressCallback([this](const Progress & progress) { - ex.addMessage("while pushing to view " + view.table_id.getNameForLogs()); - view.exception = std::current_exception(); - } - catch (...) 
+ CurrentThread::updateProgressIn(progress); + this->onProgress(progress); + }); + + in->readPrefix(); + + while (Block result_block = in->read()) { - view.exception = std::current_exception(); + Nested::validateArraySizes(result_block); + view.out->write(result_block); } - view.elapsed_ms += watch.elapsedMilliseconds(); + in->readSuffix(); } +void PushingToViewsBlockOutputStream::checkExceptionsInViews() +{ + for (auto & view : views) + { + if (view.exception) + { + logQueryViews(); + std::rethrow_exception(view.exception); + } + } +} + +void PushingToViewsBlockOutputStream::logQueryViews() +{ + const auto & settings = getContext()->getSettingsRef(); + const UInt64 min_query_duration = settings.log_queries_min_query_duration_ms.totalMilliseconds(); + const QueryViewsLogElement::ViewStatus min_status = settings.log_queries_min_type; + if (views.empty() || !settings.log_queries || !settings.log_query_views) + return; + + for (auto & view : views) + { + if ((min_query_duration && view.runtime_stats.elapsed_ms <= min_query_duration) || (view.runtime_stats.event_status < min_status)) + continue; + + try + { + if (view.runtime_stats.thread_status) + view.runtime_stats.thread_status->logToQueryViewsLog(view); + } + catch (...) + { + tryLogCurrentException(__PRETTY_FUNCTION__); + } + } +} + + +void PushingToViewsBlockOutputStream::onProgress(const Progress & progress) +{ + if (getContext()->getProgressCallback()) + getContext()->getProgressCallback()(progress); +} } diff --git a/src/DataStreams/PushingToViewsBlockOutputStream.h b/src/DataStreams/PushingToViewsBlockOutputStream.h index db6b671ce2c..ba125e28829 100644 --- a/src/DataStreams/PushingToViewsBlockOutputStream.h +++ b/src/DataStreams/PushingToViewsBlockOutputStream.h @@ -1,6 +1,7 @@ #pragma once #include +#include #include #include #include @@ -8,13 +9,28 @@ namespace Poco { class Logger; -}; +} namespace DB { class ReplicatedMergeTreeSink; +struct ViewRuntimeData +{ + const ASTPtr query; + StorageID table_id; + BlockOutputStreamPtr out; + std::exception_ptr exception; + QueryViewsLogElement::ViewRuntimeStats runtime_stats; + + void setException(std::exception_ptr e) + { + exception = e; + runtime_stats.setStatus(QueryViewsLogElement::ViewStatus::EXCEPTION_WHILE_PROCESSING); + } +}; + /** Writes data to the specified table and to all dependent materialized views. 
*/ class PushingToViewsBlockOutputStream : public IBlockOutputStream, WithContext @@ -33,6 +49,7 @@ public: void flush() override; void writePrefix() override; void writeSuffix() override; + void onProgress(const Progress & progress) override; private: StoragePtr storage; @@ -44,20 +61,13 @@ private: ASTPtr query_ptr; Stopwatch main_watch; - struct ViewInfo - { - ASTPtr query; - StorageID table_id; - BlockOutputStreamPtr out; - std::exception_ptr exception; - UInt64 elapsed_ms = 0; - }; - - std::vector views; + std::vector views; ContextMutablePtr select_context; ContextMutablePtr insert_context; - void process(const Block & block, ViewInfo & view); + void process(const Block & block, ViewRuntimeData & view); + void checkExceptionsInViews(); + void logQueryViews(); }; diff --git a/src/DataStreams/SQLiteBlockInputStream.cpp b/src/DataStreams/SQLiteSource.cpp similarity index 74% rename from src/DataStreams/SQLiteBlockInputStream.cpp rename to src/DataStreams/SQLiteSource.cpp index da7645d968d..d0d8724c2dd 100644 --- a/src/DataStreams/SQLiteBlockInputStream.cpp +++ b/src/DataStreams/SQLiteSource.cpp @@ -1,4 +1,4 @@ -#include "SQLiteBlockInputStream.h" +#include "SQLiteSource.h" #if USE_SQLITE #include @@ -22,21 +22,18 @@ namespace ErrorCodes extern const int SQLITE_ENGINE_ERROR; } -SQLiteBlockInputStream::SQLiteBlockInputStream( - SQLitePtr sqlite_db_, - const String & query_str_, - const Block & sample_block, - const UInt64 max_block_size_) - : query_str(query_str_) +SQLiteSource::SQLiteSource( + SQLitePtr sqlite_db_, + const String & query_str_, + const Block & sample_block, + const UInt64 max_block_size_) + : SourceWithProgress(sample_block.cloneEmpty()) + , query_str(query_str_) , max_block_size(max_block_size_) , sqlite_db(std::move(sqlite_db_)) { description.init(sample_block); -} - -void SQLiteBlockInputStream::readPrefix() -{ sqlite3_stmt * compiled_stmt = nullptr; int status = sqlite3_prepare_v2(sqlite_db.get(), query_str.c_str(), query_str.size() + 1, &compiled_stmt, nullptr); @@ -48,11 +45,10 @@ void SQLiteBlockInputStream::readPrefix() compiled_statement = std::unique_ptr(compiled_stmt, StatementDeleter()); } - -Block SQLiteBlockInputStream::readImpl() +Chunk SQLiteSource::generate() { if (!compiled_statement) - return Block(); + return {}; MutableColumns columns = description.sample_block.cloneEmptyColumns(); size_t num_rows = 0; @@ -73,30 +69,30 @@ Block SQLiteBlockInputStream::readImpl() else if (status != SQLITE_ROW) { throw Exception(ErrorCodes::SQLITE_ENGINE_ERROR, - "Expected SQLITE_ROW status, but got status {}. Error: {}, Message: {}", - status, sqlite3_errstr(status), sqlite3_errmsg(sqlite_db.get())); + "Expected SQLITE_ROW status, but got status {}. 
Error: {}, Message: {}", + status, sqlite3_errstr(status), sqlite3_errmsg(sqlite_db.get())); } int column_count = sqlite3_column_count(compiled_statement.get()); - for (const auto idx : collections::range(0, column_count)) - { - const auto & sample = description.sample_block.getByPosition(idx); - if (sqlite3_column_type(compiled_statement.get(), idx) == SQLITE_NULL) + for (int column_index = 0; column_index < column_count; ++column_index) + { + if (sqlite3_column_type(compiled_statement.get(), column_index) == SQLITE_NULL) { - insertDefaultSQLiteValue(*columns[idx], *sample.column); + columns[column_index]->insertDefault(); continue; } - if (description.types[idx].second) + auto & [type, is_nullable] = description.types[column_index]; + if (is_nullable) { - ColumnNullable & column_nullable = assert_cast(*columns[idx]); - insertValue(column_nullable.getNestedColumn(), description.types[idx].first, idx); + ColumnNullable & column_nullable = assert_cast(*columns[column_index]); + insertValue(column_nullable.getNestedColumn(), type, column_index); column_nullable.getNullMapData().emplace_back(0); } else { - insertValue(*columns[idx], description.types[idx].first, idx); + insertValue(*columns[column_index], type, column_index); } } @@ -104,18 +100,16 @@ Block SQLiteBlockInputStream::readImpl() break; } - return description.sample_block.cloneWithColumns(std::move(columns)); -} - - -void SQLiteBlockInputStream::readSuffix() -{ - if (compiled_statement) + if (num_rows == 0) + { compiled_statement.reset(); + return {}; + } + + return Chunk(std::move(columns), num_rows); } - -void SQLiteBlockInputStream::insertValue(IColumn & column, const ExternalResultDescription::ValueType type, size_t idx) +void SQLiteSource::insertValue(IColumn & column, ExternalResultDescription::ValueType type, size_t idx) { switch (type) { diff --git a/src/DataStreams/SQLiteBlockInputStream.h b/src/DataStreams/SQLiteSource.h similarity index 59% rename from src/DataStreams/SQLiteBlockInputStream.h rename to src/DataStreams/SQLiteSource.h index 35fc4801b4b..0f8b42c536b 100644 --- a/src/DataStreams/SQLiteBlockInputStream.h +++ b/src/DataStreams/SQLiteSource.h @@ -6,32 +6,28 @@ #if USE_SQLITE #include -#include +#include #include // Y_IGNORE namespace DB { -class SQLiteBlockInputStream : public IBlockInputStream + +class SQLiteSource : public SourceWithProgress { + using SQLitePtr = std::shared_ptr; public: - SQLiteBlockInputStream(SQLitePtr sqlite_db_, + SQLiteSource(SQLitePtr sqlite_db_, const String & query_str_, const Block & sample_block, UInt64 max_block_size_); String getName() const override { return "SQLite"; } - Block getHeader() const override { return description.sample_block.cloneEmpty(); } - private: - void insertDefaultSQLiteValue(IColumn & column, const IColumn & sample_column) - { - column.insertFrom(sample_column, 0); - } using ValueType = ExternalResultDescription::ValueType; @@ -40,19 +36,14 @@ private: void operator()(sqlite3_stmt * stmt) { sqlite3_finalize(stmt); } }; - void readPrefix() override; + Chunk generate() override; - Block readImpl() override; - - void readSuffix() override; - - void insertValue(IColumn & column, const ExternalResultDescription::ValueType type, size_t idx); + void insertValue(IColumn & column, ExternalResultDescription::ValueType type, size_t idx); String query_str; UInt64 max_block_size; ExternalResultDescription description; - SQLitePtr sqlite_db; std::unique_ptr compiled_statement; }; diff --git a/src/DataStreams/ya.make b/src/DataStreams/ya.make index 
b1205828a7e..c16db808a5b 100644 --- a/src/DataStreams/ya.make +++ b/src/DataStreams/ya.make @@ -29,7 +29,7 @@ SRCS( ITTLAlgorithm.cpp InternalTextLogsRowOutputStream.cpp MaterializingBlockInputStream.cpp - MongoDBBlockInputStream.cpp + MongoDBSource.cpp NativeBlockInputStream.cpp NativeBlockOutputStream.cpp PushingToViewsBlockOutputStream.cpp @@ -37,7 +37,7 @@ SRCS( RemoteBlockOutputStream.cpp RemoteQueryExecutor.cpp RemoteQueryExecutorReadContext.cpp - SQLiteBlockInputStream.cpp + SQLiteSource.cpp SizeLimits.cpp SquashingBlockInputStream.cpp SquashingBlockOutputStream.cpp diff --git a/src/DataTypes/NestedUtils.cpp b/src/DataTypes/NestedUtils.cpp index ed9ea3e1b5c..94b3b2f3cf7 100644 --- a/src/DataTypes/NestedUtils.cpp +++ b/src/DataTypes/NestedUtils.cpp @@ -208,6 +208,18 @@ void validateArraySizes(const Block & block) } } +std::unordered_set getAllTableNames(const Block & block) +{ + std::unordered_set nested_table_names; + for (auto & name : block.getNames()) + { + auto nested_table_name = Nested::extractTableName(name); + if (!nested_table_name.empty()) + nested_table_names.insert(nested_table_name); + } + return nested_table_names; +} + } } diff --git a/src/DataTypes/NestedUtils.h b/src/DataTypes/NestedUtils.h index b8428b96d3e..d16e309fc81 100644 --- a/src/DataTypes/NestedUtils.h +++ b/src/DataTypes/NestedUtils.h @@ -28,6 +28,9 @@ namespace Nested /// Check that sizes of arrays - elements of nested data structures - are equal. void validateArraySizes(const Block & block); + + /// Get all nested tables names from a block. + std::unordered_set getAllTableNames(const Block & block); } } diff --git a/src/Databases/DatabaseReplicated.cpp b/src/Databases/DatabaseReplicated.cpp index 26dd8763c40..8e8fb4e2d6d 100644 --- a/src/Databases/DatabaseReplicated.cpp +++ b/src/Databases/DatabaseReplicated.cpp @@ -8,7 +8,6 @@ #include #include #include -#include #include #include #include diff --git a/src/Databases/MySQL/DatabaseMySQL.cpp b/src/Databases/MySQL/DatabaseMySQL.cpp index d4acd2af85e..858255e730a 100644 --- a/src/Databases/MySQL/DatabaseMySQL.cpp +++ b/src/Databases/MySQL/DatabaseMySQL.cpp @@ -11,7 +11,7 @@ # include # include # include -# include +# include # include # include # include diff --git a/src/Databases/MySQL/FetchTablesColumnsList.cpp b/src/Databases/MySQL/FetchTablesColumnsList.cpp index 353bcd877ee..c67dcefb433 100644 --- a/src/Databases/MySQL/FetchTablesColumnsList.cpp +++ b/src/Databases/MySQL/FetchTablesColumnsList.cpp @@ -10,7 +10,7 @@ #include #include #include -#include +#include #include #include #include diff --git a/src/Databases/MySQL/MaterializeMetadata.cpp b/src/Databases/MySQL/MaterializeMetadata.cpp index 9f5100991aa..f684797c675 100644 --- a/src/Databases/MySQL/MaterializeMetadata.cpp +++ b/src/Databases/MySQL/MaterializeMetadata.cpp @@ -5,7 +5,7 @@ #include #include #include -#include +#include #include #include #include diff --git a/src/Databases/MySQL/MaterializedMySQLSyncThread.cpp b/src/Databases/MySQL/MaterializedMySQLSyncThread.cpp index 5175e9d0467..53495aa3cb1 100644 --- a/src/Databases/MySQL/MaterializedMySQLSyncThread.cpp +++ b/src/Databases/MySQL/MaterializedMySQLSyncThread.cpp @@ -16,7 +16,7 @@ # include # include # include -# include +# include # include # include # include diff --git a/src/Databases/PostgreSQL/DatabasePostgreSQL.cpp b/src/Databases/PostgreSQL/DatabasePostgreSQL.cpp index c848c784712..259648f4399 100644 --- a/src/Databases/PostgreSQL/DatabasePostgreSQL.cpp +++ b/src/Databases/PostgreSQL/DatabasePostgreSQL.cpp @@ -164,7 +164,7 @@ 
StoragePtr DatabasePostgreSQL::tryGetTable(const String & table_name, ContextPtr } -StoragePtr DatabasePostgreSQL::fetchTable(const String & table_name, ContextPtr local_context, const bool table_checked) const +StoragePtr DatabasePostgreSQL::fetchTable(const String & table_name, ContextPtr, const bool table_checked) const { if (!cache_tables || !cached_tables.count(table_name)) { @@ -179,7 +179,7 @@ StoragePtr DatabasePostgreSQL::fetchTable(const String & table_name, ContextPtr auto storage = StoragePostgreSQL::create( StorageID(database_name, table_name), pool, table_name, - ColumnsDescription{*columns}, ConstraintsDescription{}, String{}, local_context, postgres_schema); + ColumnsDescription{*columns}, ConstraintsDescription{}, String{}, postgres_schema); if (cache_tables) cached_tables[table_name] = storage; diff --git a/src/Dictionaries/CacheDictionary.cpp b/src/Dictionaries/CacheDictionary.cpp index 4dfe802dd2b..a5f953ccc15 100644 --- a/src/Dictionaries/CacheDictionary.cpp +++ b/src/Dictionaries/CacheDictionary.cpp @@ -10,7 +10,7 @@ #include #include -#include +#include #include #include @@ -18,21 +18,21 @@ namespace ProfileEvents { -extern const Event DictCacheKeysRequested; -extern const Event DictCacheKeysRequestedMiss; -extern const Event DictCacheKeysRequestedFound; -extern const Event DictCacheKeysExpired; -extern const Event DictCacheKeysNotFound; -extern const Event DictCacheKeysHit; -extern const Event DictCacheRequestTimeNs; -extern const Event DictCacheRequests; -extern const Event DictCacheLockWriteNs; -extern const Event DictCacheLockReadNs; + extern const Event DictCacheKeysRequested; + extern const Event DictCacheKeysRequestedMiss; + extern const Event DictCacheKeysRequestedFound; + extern const Event DictCacheKeysExpired; + extern const Event DictCacheKeysNotFound; + extern const Event DictCacheKeysHit; + extern const Event DictCacheRequestTimeNs; + extern const Event DictCacheRequests; + extern const Event DictCacheLockWriteNs; + extern const Event DictCacheLockReadNs; } namespace CurrentMetrics { -extern const Metric DictCacheRequests; + extern const Metric DictCacheRequests; } namespace DB diff --git a/src/Dictionaries/CassandraDictionarySource.cpp b/src/Dictionaries/CassandraDictionarySource.cpp index 8b31b4d6fa2..aa8d6107508 100644 --- a/src/Dictionaries/CassandraDictionarySource.cpp +++ b/src/Dictionaries/CassandraDictionarySource.cpp @@ -36,10 +36,10 @@ void registerDictionarySourceCassandra(DictionarySourceFactory & factory) #if USE_CASSANDRA -#include -#include -#include "CassandraBlockInputStream.h" #include +#include +#include +#include namespace DB { @@ -49,7 +49,7 @@ namespace ErrorCodes extern const int INVALID_CONFIG_PARAMETER; } -CassandraSettings::CassandraSettings( +CassandraDictionarySource::Configuration::Configuration( const Poco::Util::AbstractConfiguration & config, const String & config_prefix) : host(config.getString(config_prefix + ".host")) @@ -66,7 +66,7 @@ CassandraSettings::CassandraSettings( setConsistency(config.getString(config_prefix + ".consistency", "One")); } -void CassandraSettings::setConsistency(const String & config_str) +void CassandraDictionarySource::Configuration::setConsistency(const String & config_str) { if (config_str == "One") consistency = CASS_CONSISTENCY_ONE; @@ -96,19 +96,19 @@ static const size_t max_block_size = 8192; CassandraDictionarySource::CassandraDictionarySource( const DictionaryStructure & dict_struct_, - const CassandraSettings & settings_, + const Configuration & configuration_, const Block & 
sample_block_) : log(&Poco::Logger::get("CassandraDictionarySource")) , dict_struct(dict_struct_) - , settings(settings_) + , configuration(configuration_) , sample_block(sample_block_) - , query_builder(dict_struct, settings.db, "", settings.table, settings.where, IdentifierQuotingStyle::DoubleQuotes) + , query_builder(dict_struct, configuration.db, "", configuration.table, configuration.query, configuration.where, IdentifierQuotingStyle::DoubleQuotes) { - cassandraCheck(cass_cluster_set_contact_points(cluster, settings.host.c_str())); - if (settings.port) - cassandraCheck(cass_cluster_set_port(cluster, settings.port)); - cass_cluster_set_credentials(cluster, settings.user.c_str(), settings.password.c_str()); - cassandraCheck(cass_cluster_set_consistency(cluster, settings.consistency)); + cassandraCheck(cass_cluster_set_contact_points(cluster, configuration.host.c_str())); + if (configuration.port) + cassandraCheck(cass_cluster_set_port(cluster, configuration.port)); + cass_cluster_set_credentials(cluster, configuration.user.c_str(), configuration.password.c_str()); + cassandraCheck(cass_cluster_set_consistency(cluster, configuration.consistency)); } CassandraDictionarySource::CassandraDictionarySource( @@ -118,14 +118,14 @@ CassandraDictionarySource::CassandraDictionarySource( Block & sample_block_) : CassandraDictionarySource( dict_struct_, - CassandraSettings(config, config_prefix), + Configuration(config, config_prefix), sample_block_) { } void CassandraDictionarySource::maybeAllowFiltering(String & query) const { - if (!settings.allow_filtering) + if (!configuration.allow_filtering) return; query.pop_back(); /// remove semicolon query += " ALLOW FILTERING;"; @@ -141,7 +141,7 @@ Pipe CassandraDictionarySource::loadAll() std::string CassandraDictionarySource::toString() const { - return "Cassandra: " + settings.db + '.' + settings.table; + return "Cassandra: " + configuration.db + '.' 
+ configuration.table; } Pipe CassandraDictionarySource::loadIds(const std::vector & ids) @@ -162,7 +162,7 @@ Pipe CassandraDictionarySource::loadKeys(const Columns & key_columns, const std: for (const auto & row : requested_rows) { SipHash partition_key; - for (size_t i = 0; i < settings.partition_key_prefix; ++i) + for (size_t i = 0; i < configuration.partition_key_prefix; ++i) key_columns[i]->updateHashWithValue(row, partition_key); partitions[partition_key.get64()].push_back(row); } @@ -170,7 +170,7 @@ Pipe CassandraDictionarySource::loadKeys(const Columns & key_columns, const std: Pipes pipes; for (const auto & partition : partitions) { - String query = query_builder.composeLoadKeysQuery(key_columns, partition.second, ExternalQueryBuilder::CASSANDRA_SEPARATE_PARTITION_KEY, settings.partition_key_prefix); + String query = query_builder.composeLoadKeysQuery(key_columns, partition.second, ExternalQueryBuilder::CASSANDRA_SEPARATE_PARTITION_KEY, configuration.partition_key_prefix); maybeAllowFiltering(query); LOG_INFO(log, "Loading keys for partition hash {} using query: {}", partition.first, query); pipes.push_back(Pipe(std::make_shared(getSession(), query, sample_block, max_block_size))); diff --git a/src/Dictionaries/CassandraDictionarySource.h b/src/Dictionaries/CassandraDictionarySource.h index 871e3dc4857..35419d3ea7d 100644 --- a/src/Dictionaries/CassandraDictionarySource.h +++ b/src/Dictionaries/CassandraDictionarySource.h @@ -14,33 +14,35 @@ namespace DB { -struct CassandraSettings -{ - String host; - UInt16 port; - String user; - String password; - String db; - String table; - - CassConsistency consistency; - bool allow_filtering; - /// TODO get information about key from the driver - size_t partition_key_prefix; - size_t max_threads; - String where; - - CassandraSettings(const Poco::Util::AbstractConfiguration & config, const String & config_prefix); - - void setConsistency(const String & config_str); -}; - class CassandraDictionarySource final : public IDictionarySource { public: + + struct Configuration + { + String host; + UInt16 port; + String user; + String password; + String db; + String table; + String query; + + CassConsistency consistency; + bool allow_filtering; + /// TODO get information about key from the driver + size_t partition_key_prefix; + size_t max_threads; + String where; + + Configuration(const Poco::Util::AbstractConfiguration & config, const String & config_prefix); + + void setConsistency(const String & config_str); + }; + CassandraDictionarySource( const DictionaryStructure & dict_struct, - const CassandraSettings & settings_, + const Configuration & configuration, const Block & sample_block); CassandraDictionarySource( @@ -59,7 +61,7 @@ public: DictionarySourcePtr clone() const override { - return std::make_unique(dict_struct, settings, sample_block); + return std::make_unique(dict_struct, configuration, sample_block); } Pipe loadIds(const std::vector & ids) override; @@ -76,7 +78,7 @@ private: Poco::Logger * log; const DictionaryStructure dict_struct; - const CassandraSettings settings; + const Configuration configuration; Block sample_block; ExternalQueryBuilder query_builder; diff --git a/src/Dictionaries/CassandraBlockInputStream.cpp b/src/Dictionaries/CassandraSource.cpp similarity index 99% rename from src/Dictionaries/CassandraBlockInputStream.cpp rename to src/Dictionaries/CassandraSource.cpp index 384717e2ba2..1ebacdb2c2f 100644 --- a/src/Dictionaries/CassandraBlockInputStream.cpp +++ b/src/Dictionaries/CassandraSource.cpp @@ -10,7 +10,7 @@ 
#include #include #include -#include "CassandraBlockInputStream.h" +#include "CassandraSource.h" namespace DB diff --git a/src/Dictionaries/CassandraBlockInputStream.h b/src/Dictionaries/CassandraSource.h similarity index 100% rename from src/Dictionaries/CassandraBlockInputStream.h rename to src/Dictionaries/CassandraSource.h diff --git a/src/Dictionaries/ClickHouseDictionarySource.cpp b/src/Dictionaries/ClickHouseDictionarySource.cpp index 8b2373302c8..0f085a7c1a2 100644 --- a/src/Dictionaries/ClickHouseDictionarySource.cpp +++ b/src/Dictionaries/ClickHouseDictionarySource.cpp @@ -67,7 +67,7 @@ ClickHouseDictionarySource::ClickHouseDictionarySource( : update_time{std::chrono::system_clock::from_time_t(0)} , dict_struct{dict_struct_} , configuration{configuration_} - , query_builder{dict_struct, configuration.db, "", configuration.table, configuration.where, IdentifierQuotingStyle::Backticks} + , query_builder{dict_struct, configuration.db, "", configuration.table, configuration.query, configuration.where, IdentifierQuotingStyle::Backticks} , sample_block{sample_block_} , context(Context::createCopy(context_)) , pool{createPool(configuration)} @@ -83,7 +83,7 @@ ClickHouseDictionarySource::ClickHouseDictionarySource(const ClickHouseDictionar , dict_struct{other.dict_struct} , configuration{other.configuration} , invalidate_query_response{other.invalidate_query_response} - , query_builder{dict_struct, configuration.db, "", configuration.table, configuration.where, IdentifierQuotingStyle::Backticks} + , query_builder{dict_struct, configuration.db, "", configuration.table, configuration.query, configuration.where, IdentifierQuotingStyle::Backticks} , sample_block{other.sample_block} , context(Context::createCopy(other.context)) , pool{createPool(configuration)} @@ -241,7 +241,8 @@ void registerDictionarySourceClickHouse(DictionarySourceFactory & factory) .user = config.getString(settings_config_prefix + ".user", "default"), .password = config.getString(settings_config_prefix + ".password", ""), .db = config.getString(settings_config_prefix + ".db", default_database), - .table = config.getString(settings_config_prefix + ".table"), + .table = config.getString(settings_config_prefix + ".table", ""), + .query = config.getString(settings_config_prefix + ".query", ""), .where = config.getString(settings_config_prefix + ".where", ""), .invalidate_query = config.getString(settings_config_prefix + ".invalidate_query", ""), .update_field = config.getString(settings_config_prefix + ".update_field", ""), diff --git a/src/Dictionaries/ClickHouseDictionarySource.h b/src/Dictionaries/ClickHouseDictionarySource.h index f293c010ec3..2daa296af3e 100644 --- a/src/Dictionaries/ClickHouseDictionarySource.h +++ b/src/Dictionaries/ClickHouseDictionarySource.h @@ -25,6 +25,7 @@ public: const std::string password; const std::string db; const std::string table; + const std::string query; const std::string where; const std::string invalidate_query; const std::string update_field; diff --git a/src/Dictionaries/DictionaryHelpers.h b/src/Dictionaries/DictionaryHelpers.h index 5a050d68326..dde41864ddc 100644 --- a/src/Dictionaries/DictionaryHelpers.h +++ b/src/Dictionaries/DictionaryHelpers.h @@ -648,6 +648,16 @@ static const PaddedPODArray & getColumnVectorData( } } +template +static ColumnPtr getColumnFromPODArray(const PaddedPODArray & array) +{ + auto column_vector = ColumnVector::create(); + column_vector->getData().reserve(array.size()); + column_vector->getData().insert(array.begin(), array.end()); + + return 
column_vector; +} + } diff --git a/src/Dictionaries/DictionaryBlockInputStream.cpp b/src/Dictionaries/DictionarySource.cpp similarity index 89% rename from src/Dictionaries/DictionaryBlockInputStream.cpp rename to src/Dictionaries/DictionarySource.cpp index fedde8bd886..7ba6ea82ca9 100644 --- a/src/Dictionaries/DictionaryBlockInputStream.cpp +++ b/src/Dictionaries/DictionarySource.cpp @@ -1,4 +1,5 @@ -#include "DictionaryBlockInputStream.h" +#include "DictionarySource.h" +#include namespace DB { @@ -12,7 +13,7 @@ DictionarySourceData::DictionarySourceData( std::shared_ptr dictionary_, PaddedPODArray && ids_, const Names & column_names_) : num_rows(ids_.size()) , dictionary(dictionary_) - , column_names(column_names_) + , column_names(column_names_.begin(), column_names_.end()) , ids(std::move(ids_)) , key_type(DictionaryInputStreamKeyType::Id) { @@ -24,7 +25,7 @@ DictionarySourceData::DictionarySourceData( const Names & column_names_) : num_rows(keys.size()) , dictionary(dictionary_) - , column_names(column_names_) + , column_names(column_names_.begin(), column_names_.end()) , key_type(DictionaryInputStreamKeyType::ComplexKey) { const DictionaryStructure & dictionary_structure = dictionary->getStructure(); @@ -39,7 +40,7 @@ DictionarySourceData::DictionarySourceData( GetColumnsFunction && get_view_columns_function_) : num_rows(data_columns_.front()->size()) , dictionary(dictionary_) - , column_names(column_names_) + , column_names(column_names_.begin(), column_names_.end()) , data_columns(data_columns_) , get_key_columns_function(std::move(get_key_columns_function_)) , get_view_columns_function(std::move(get_view_columns_function_)) @@ -102,8 +103,6 @@ Block DictionarySourceData::fillBlock( const DataTypes & types, ColumnsWithTypeAndName && view) const { - std::unordered_set names(column_names.begin(), column_names.end()); - DataTypes data_types = types; ColumnsWithTypeAndName block_columns; @@ -114,13 +113,13 @@ Block DictionarySourceData::fillBlock( data_types.push_back(key.type); for (const auto & column : view) - if (names.find(column.name) != names.end()) + if (column_names.find(column.name) != column_names.end()) block_columns.push_back(column); const DictionaryStructure & structure = dictionary->getStructure(); - ColumnPtr ids_column = getColumnFromIds(ids_to_fill); + ColumnPtr ids_column = getColumnFromPODArray(ids_to_fill); - if (structure.id && names.find(structure.id->name) != names.end()) + if (structure.id && column_names.find(structure.id->name) != column_names.end()) { block_columns.emplace_back(ids_column, std::make_shared(), structure.id->name); } @@ -129,7 +128,7 @@ Block DictionarySourceData::fillBlock( for (const auto & attribute : structure.attributes) { - if (names.find(attribute.name) != names.end()) + if (column_names.find(attribute.name) != column_names.end()) { ColumnPtr column; @@ -159,13 +158,6 @@ Block DictionarySourceData::fillBlock( return Block(block_columns); } -ColumnPtr DictionarySourceData::getColumnFromIds(const PaddedPODArray & ids_to_fill) -{ - auto column_vector = ColumnVector::create(); - column_vector->getData().assign(ids_to_fill); - return column_vector; -} - void DictionarySourceData::fillKeyColumns( const PaddedPODArray & keys, size_t start, diff --git a/src/Dictionaries/DictionaryBlockInputStream.h b/src/Dictionaries/DictionarySource.h similarity index 86% rename from src/Dictionaries/DictionaryBlockInputStream.h rename to src/Dictionaries/DictionarySource.h index c15406487e2..195a3c66484 100644 --- 
a/src/Dictionaries/DictionaryBlockInputStream.h
+++ b/src/Dictionaries/DictionarySource.h
@@ -7,19 +7,14 @@
 #include
 #include
 #include
-#include
-#include "DictionaryBlockInputStreamBase.h"
-#include "DictionaryStructure.h"
-#include "IDictionary.h"
+#include
+#include
+#include

 namespace DB
 {

-/// TODO: Remove this class
-/* BlockInputStream implementation for external dictionaries
- * read() returns blocks consisting of the in-memory contents of the dictionaries
- */
 class DictionarySourceData
 {
 public:
@@ -56,8 +51,6 @@ private:
         const DataTypes & types,
         ColumnsWithTypeAndName && view) const;

-    static ColumnPtr getColumnFromIds(const PaddedPODArray & ids_to_fill);
-
     static void fillKeyColumns(
         const PaddedPODArray & keys,
         size_t start,
@@ -67,7 +60,7 @@ private:
     const size_t num_rows;
     std::shared_ptr dictionary;

-    Names column_names;
+    std::unordered_set column_names;
     PaddedPODArray ids;
     ColumnsWithTypeAndName key_columns;

diff --git a/src/Dictionaries/DictionaryBlockInputStreamBase.cpp b/src/Dictionaries/DictionarySourceBase.cpp
similarity index 91%
rename from src/Dictionaries/DictionaryBlockInputStreamBase.cpp
rename to src/Dictionaries/DictionarySourceBase.cpp
index 0eac8edac3d..cc420b33144 100644
--- a/src/Dictionaries/DictionaryBlockInputStreamBase.cpp
+++ b/src/Dictionaries/DictionarySourceBase.cpp
@@ -1,4 +1,4 @@
-#include "DictionaryBlockInputStreamBase.h"
+#include "DictionarySourceBase.h"

 namespace DB
 {
diff --git a/src/Dictionaries/DictionaryBlockInputStreamBase.h b/src/Dictionaries/DictionarySourceBase.h
similarity index 100%
rename from src/Dictionaries/DictionaryBlockInputStreamBase.h
rename to src/Dictionaries/DictionarySourceBase.h
diff --git a/src/Dictionaries/ExternalQueryBuilder.cpp b/src/Dictionaries/ExternalQueryBuilder.cpp
index e0920535e33..10c4f67d809 100644
--- a/src/Dictionaries/ExternalQueryBuilder.cpp
+++ b/src/Dictionaries/ExternalQueryBuilder.cpp
@@ -21,10 +21,23 @@ ExternalQueryBuilder::ExternalQueryBuilder(
     const std::string & db_,
     const std::string & schema_,
     const std::string & table_,
+    const std::string & query_,
     const std::string & where_,
     IdentifierQuotingStyle quoting_style_)
-    : dict_struct(dict_struct_), db(db_), schema(schema_), table(table_), where(where_), quoting_style(quoting_style_)
-{}
+    : dict_struct(dict_struct_)
+    , db(db_)
+    , schema(schema_)
+    , table(table_)
+    , query(query_)
+    , where(where_)
+    , quoting_style(quoting_style_)
+{
+    if (table.empty() && query.empty())
+        throw Exception(ErrorCodes::UNSUPPORTED_METHOD, "Either the `table` or the `query` setting must be non-empty");
+
+    if (!query.empty() && (!table.empty() || !where.empty()))
+        throw Exception(ErrorCodes::UNSUPPORTED_METHOD, "The `table` and `where` settings cannot be combined with the `query` parameter");
+}


 void ExternalQueryBuilder::writeQuoted(const std::string & s, WriteBuffer & out) const
@@ -52,10 +65,17 @@ void ExternalQueryBuilder::writeQuoted(const std::string & s, WriteBuffer & out)

 std::string ExternalQueryBuilder::composeLoadAllQuery() const
 {
-    WriteBufferFromOwnString out;
-    composeLoadAllQuery(out);
-    writeChar(';', out);
-    return out.str();
+    if (query.empty())
+    {
+        WriteBufferFromOwnString out;
+        composeLoadAllQuery(out);
+        writeChar(';', out);
+        return out.str();
+    }
+    else
+    {
+        return query;
+    }
 }

 void ExternalQueryBuilder::composeLoadAllQuery(WriteBuffer & out) const
@@ -152,74 +172,314 @@ void ExternalQueryBuilder::composeLoadAllQuery(WriteBuffer & out) const

 std::string ExternalQueryBuilder::composeUpdateQuery(const std::string & update_field, const std::string &
time_point) const { WriteBufferFromOwnString out; - composeLoadAllQuery(out); - if (!where.empty()) - writeString(" AND ", out); + if (query.empty()) + { + composeLoadAllQuery(out); + + if (!where.empty()) + writeString(" AND ", out); + else + writeString(" WHERE ", out); + + composeUpdateCondition(update_field, time_point, out); + + writeChar(';', out); + + return out.str(); + } else - writeString(" WHERE ", out); + { + writeString(query, out); - writeString(update_field, out); - writeString(" >= '", out); - writeString(time_point, out); - writeChar('\'', out); + auto condition_position = query.find("{condition}"); + if (condition_position == std::string::npos) + { + writeString(" WHERE ", out); + composeUpdateCondition(update_field, time_point, out); + writeString(";", out); - writeChar(';', out); - return out.str(); + return out.str(); + } + + WriteBufferFromOwnString condition_value_buffer; + composeUpdateCondition(update_field, time_point, condition_value_buffer); + const auto & condition_value = condition_value_buffer.str(); + + auto query_copy = query; + query_copy.replace(condition_position, condition_value.size(), condition_value); + + return query_copy; + } } -std::string ExternalQueryBuilder::composeLoadIdsQuery(const std::vector & ids) +std::string ExternalQueryBuilder::composeLoadIdsQuery(const std::vector & ids) const { if (!dict_struct.id) throw Exception(ErrorCodes::UNSUPPORTED_METHOD, "Simple key required for method"); WriteBufferFromOwnString out; - writeString("SELECT ", out); - if (!dict_struct.id->expression.empty()) + if (query.empty()) { - writeParenthesisedString(dict_struct.id->expression, out); - writeString(" AS ", out); - } + writeString("SELECT ", out); - writeQuoted(dict_struct.id->name, out); - - for (const auto & attr : dict_struct.attributes) - { - writeString(", ", out); - - if (!attr.expression.empty()) + if (!dict_struct.id->expression.empty()) { - writeParenthesisedString(attr.expression, out); + writeParenthesisedString(dict_struct.id->expression, out); writeString(" AS ", out); } - writeQuoted(attr.name, out); - } + writeQuoted(dict_struct.id->name, out); - writeString(" FROM ", out); - if (!db.empty()) + for (const auto & attr : dict_struct.attributes) + { + writeString(", ", out); + + if (!attr.expression.empty()) + { + writeParenthesisedString(attr.expression, out); + writeString(" AS ", out); + } + + writeQuoted(attr.name, out); + } + + writeString(" FROM ", out); + if (!db.empty()) + { + writeQuoted(db, out); + writeChar('.', out); + } + if (!schema.empty()) + { + writeQuoted(schema, out); + writeChar('.', out); + } + + writeQuoted(table, out); + + writeString(" WHERE ", out); + + if (!where.empty()) + { + writeString(where, out); + writeString(" AND ", out); + } + + composeIdsCondition(ids, out); + writeString(";", out); + + return out.str(); + } + else { - writeQuoted(db, out); - writeChar('.', out); + writeString(query, out); + + auto condition_position = query.find("{condition}"); + if (condition_position == std::string::npos) + { + writeString(" WHERE ", out); + composeIdsCondition(ids, out); + writeString(";", out); + + return out.str(); + } + + WriteBufferFromOwnString condition_value_buffer; + composeIdsCondition(ids, condition_value_buffer); + const auto & condition_value = condition_value_buffer.str(); + + auto query_copy = query; + query_copy.replace(condition_position, condition_value.size(), condition_value); + + return query_copy; } - if (!schema.empty()) +} + + +std::string ExternalQueryBuilder::composeLoadKeysQuery( + const 
Columns & key_columns, const std::vector & requested_rows, LoadKeysMethod method, size_t partition_key_prefix) const +{ + if (!dict_struct.key) + throw Exception(ErrorCodes::UNSUPPORTED_METHOD, "Composite key required for method"); + + if (key_columns.size() != dict_struct.key->size()) + throw Exception(ErrorCodes::LOGICAL_ERROR, "The size of key_columns does not equal to the size of dictionary key"); + + WriteBufferFromOwnString out; + + if (query.empty()) { - writeQuoted(schema, out); - writeChar('.', out); + writeString("SELECT ", out); + + auto first = true; + for (const auto & key_or_attribute : boost::join(*dict_struct.key, dict_struct.attributes)) + { + if (!first) + writeString(", ", out); + + first = false; + + if (!key_or_attribute.expression.empty()) + { + writeParenthesisedString(key_or_attribute.expression, out); + writeString(" AS ", out); + } + + writeQuoted(key_or_attribute.name, out); + } + + writeString(" FROM ", out); + if (!db.empty()) + { + writeQuoted(db, out); + writeChar('.', out); + } + if (!schema.empty()) + { + writeQuoted(schema, out); + writeChar('.', out); + } + + writeQuoted(table, out); + + writeString(" WHERE ", out); + + if (!where.empty()) + { + if (method != CASSANDRA_SEPARATE_PARTITION_KEY) + writeString("(", out); + writeString(where, out); + if (method != CASSANDRA_SEPARATE_PARTITION_KEY) + writeString(") AND (", out); + else + writeString(" AND ", out); + } + + composeKeysCondition(key_columns, requested_rows, method, partition_key_prefix, out); + + writeString(";", out); + + return out.str(); } - - writeQuoted(table, out); - - writeString(" WHERE ", out); - - if (!where.empty()) + else { - writeString(where, out); - writeString(" AND ", out); + writeString(query, out); + + auto condition_position = query.find("{condition}"); + if (condition_position == std::string::npos) + { + writeString(" WHERE ", out); + composeKeysCondition(key_columns, requested_rows, method, partition_key_prefix, out); + writeString(";", out); + + return out.str(); + } + + WriteBufferFromOwnString condition_value_buffer; + composeKeysCondition(key_columns, requested_rows, method, partition_key_prefix, condition_value_buffer); + const auto & condition_value = condition_value_buffer.str(); + + auto query_copy = query; + query_copy.replace(condition_position, condition_value.size(), condition_value); + + return query_copy; + } +} + + +void ExternalQueryBuilder::composeKeyCondition(const Columns & key_columns, size_t row, WriteBuffer & out, + size_t beg, size_t end) const +{ + auto first = true; + for (size_t i = beg; i < end; ++i) + { + if (!first) + writeString(" AND ", out); + + first = false; + + const auto & key_description = (*dict_struct.key)[i]; + + /// key_i=value_i + writeQuoted(key_description.name, out); + writeString("=", out); + key_description.type_serialization->serializeTextQuoted(*key_columns[i], row, out, format_settings); + } +} + + +void ExternalQueryBuilder::composeInWithTuples(const Columns & key_columns, const std::vector & requested_rows, + WriteBuffer & out, size_t beg, size_t end) const +{ + composeKeyTupleDefinition(out, beg, end); + writeString(" IN (", out); + + bool first = true; + for (const auto row : requested_rows) + { + if (!first) + writeString(", ", out); + + first = false; + composeKeyTuple(key_columns, row, out, beg, end); } + writeString(")", out); +} + + +void ExternalQueryBuilder::composeKeyTupleDefinition(WriteBuffer & out, size_t beg, size_t end) const +{ + if (!dict_struct.key) + throw Exception(ErrorCodes::UNSUPPORTED_METHOD, 
"Composite key required for method"); + + writeChar('(', out); + + auto first = true; + for (size_t i = beg; i < end; ++i) + { + if (!first) + writeString(", ", out); + + first = false; + writeQuoted((*dict_struct.key)[i].name, out); + } + + writeChar(')', out); +} + + +void ExternalQueryBuilder::composeKeyTuple(const Columns & key_columns, size_t row, WriteBuffer & out, size_t beg, size_t end) const +{ + writeString("(", out); + + auto first = true; + for (size_t i = beg; i < end; ++i) + { + if (!first) + writeString(", ", out); + + first = false; + auto serialization = (*dict_struct.key)[i].type_serialization; + serialization->serializeTextQuoted(*key_columns[i], row, out, format_settings); + } + + writeString(")", out); +} + +void ExternalQueryBuilder::composeUpdateCondition(const std::string & update_field, const std::string & time_point, WriteBuffer & out) +{ + writeString(update_field, out); + writeString(" >= '", out); + writeString(time_point, out); + writeChar('\'', out); +} + +void ExternalQueryBuilder::composeIdsCondition(const std::vector & ids, WriteBuffer & out) const +{ writeQuoted(dict_struct.id->name, out); writeString(" IN (", out); @@ -233,67 +493,12 @@ std::string ExternalQueryBuilder::composeLoadIdsQuery(const std::vector writeString(DB::toString(id), out); } - writeString(");", out); - - return out.str(); + writeString(")", out); } - -std::string ExternalQueryBuilder::composeLoadKeysQuery( - const Columns & key_columns, const std::vector & requested_rows, LoadKeysMethod method, size_t partition_key_prefix) +void ExternalQueryBuilder::composeKeysCondition(const Columns & key_columns, const std::vector & requested_rows, LoadKeysMethod method, size_t partition_key_prefix, WriteBuffer & out) const { - if (!dict_struct.key) - throw Exception(ErrorCodes::UNSUPPORTED_METHOD, "Composite key required for method"); - - if (key_columns.size() != dict_struct.key->size()) - throw Exception(ErrorCodes::LOGICAL_ERROR, "The size of key_columns does not equal to the size of dictionary key"); - - WriteBufferFromOwnString out; - writeString("SELECT ", out); - - auto first = true; - for (const auto & key_or_attribute : boost::join(*dict_struct.key, dict_struct.attributes)) - { - if (!first) - writeString(", ", out); - - first = false; - - if (!key_or_attribute.expression.empty()) - { - writeParenthesisedString(key_or_attribute.expression, out); - writeString(" AS ", out); - } - - writeQuoted(key_or_attribute.name, out); - } - - writeString(" FROM ", out); - if (!db.empty()) - { - writeQuoted(db, out); - writeChar('.', out); - } - if (!schema.empty()) - { - writeQuoted(schema, out); - writeChar('.', out); - } - - writeQuoted(table, out); - - writeString(" WHERE ", out); - - if (!where.empty()) - { - if (method != CASSANDRA_SEPARATE_PARTITION_KEY) - writeString("(", out); - writeString(where, out); - if (method != CASSANDRA_SEPARATE_PARTITION_KEY) - writeString(") AND (", out); - else - writeString(" AND ", out); - } + bool first = true; if (method == AND_OR_CHAIN) { @@ -334,92 +539,6 @@ std::string ExternalQueryBuilder::composeLoadKeysQuery( { writeString(")", out); } - - writeString(";", out); - - return out.str(); } - -void ExternalQueryBuilder::composeKeyCondition(const Columns & key_columns, const size_t row, WriteBuffer & out, - size_t beg, size_t end) const -{ - auto first = true; - for (size_t i = beg; i < end; ++i) - { - if (!first) - writeString(" AND ", out); - - first = false; - - const auto & key_description = (*dict_struct.key)[i]; - - /// key_i=value_i - 
writeQuoted(key_description.name, out); - writeString("=", out); - key_description.type_serialization->serializeTextQuoted(*key_columns[i], row, out, format_settings); - } -} - - -void ExternalQueryBuilder::composeInWithTuples(const Columns & key_columns, const std::vector & requested_rows, - WriteBuffer & out, size_t beg, size_t end) -{ - composeKeyTupleDefinition(out, beg, end); - writeString(" IN (", out); - - bool first = true; - for (const auto row : requested_rows) - { - if (!first) - writeString(", ", out); - - first = false; - composeKeyTuple(key_columns, row, out, beg, end); - } - - writeString(")", out); -} - - -void ExternalQueryBuilder::composeKeyTupleDefinition(WriteBuffer & out, size_t beg, size_t end) const -{ - if (!dict_struct.key) - throw Exception(ErrorCodes::UNSUPPORTED_METHOD, "Composite key required for method"); - - writeChar('(', out); - - auto first = true; - for (size_t i = beg; i < end; ++i) - { - if (!first) - writeString(", ", out); - - first = false; - writeQuoted((*dict_struct.key)[i].name, out); - } - - writeChar(')', out); -} - - -void ExternalQueryBuilder::composeKeyTuple(const Columns & key_columns, const size_t row, WriteBuffer & out, size_t beg, size_t end) const -{ - writeString("(", out); - - auto first = true; - for (size_t i = beg; i < end; ++i) - { - if (!first) - writeString(", ", out); - - first = false; - auto serialization = (*dict_struct.key)[i].type_serialization; - serialization->serializeTextQuoted(*key_columns[i], row, out, format_settings); - } - - writeString(")", out); -} - - } diff --git a/src/Dictionaries/ExternalQueryBuilder.h b/src/Dictionaries/ExternalQueryBuilder.h index 9f9ccd65001..9d79ec3e702 100644 --- a/src/Dictionaries/ExternalQueryBuilder.h +++ b/src/Dictionaries/ExternalQueryBuilder.h @@ -21,6 +21,7 @@ struct ExternalQueryBuilder const std::string db; const std::string schema; const std::string table; + const std::string query; const std::string where; IdentifierQuotingStyle quoting_style; @@ -31,6 +32,7 @@ struct ExternalQueryBuilder const std::string & db_, const std::string & schema_, const std::string & table_, + const std::string & query_, const std::string & where_, IdentifierQuotingStyle quoting_style_); @@ -41,7 +43,7 @@ struct ExternalQueryBuilder std::string composeUpdateQuery(const std::string & update_field, const std::string & time_point) const; /** Generate a query to load data by set of UInt64 keys. */ - std::string composeLoadIdsQuery(const std::vector & ids); + std::string composeLoadIdsQuery(const std::vector & ids) const; /** Generate a query to load data by set of composite keys. * There are three methods of specification of composite keys in WHERE: @@ -56,7 +58,7 @@ struct ExternalQueryBuilder CASSANDRA_SEPARATE_PARTITION_KEY, }; - std::string composeLoadKeysQuery(const Columns & key_columns, const std::vector & requested_rows, LoadKeysMethod method, size_t partition_key_prefix = 0); + std::string composeLoadKeysQuery(const Columns & key_columns, const std::vector & requested_rows, LoadKeysMethod method, size_t partition_key_prefix = 0) const; private: @@ -67,16 +69,25 @@ private: /// In the following methods `beg` and `end` specifies which columns to write in expression /// Expression in form (x = c1 AND y = c2 ...) - void composeKeyCondition(const Columns & key_columns, const size_t row, WriteBuffer & out, size_t beg, size_t end) const; + void composeKeyCondition(const Columns & key_columns, size_t row, WriteBuffer & out, size_t beg, size_t end) const; /// Expression in form (x, y, ...) 
IN ((c1, c2, ...), ...) - void composeInWithTuples(const Columns & key_columns, const std::vector & requested_rows, WriteBuffer & out, size_t beg, size_t end); + void composeInWithTuples(const Columns & key_columns, const std::vector & requested_rows, WriteBuffer & out, size_t beg, size_t end) const; /// Expression in form (x, y, ...) void composeKeyTupleDefinition(WriteBuffer & out, size_t beg, size_t end) const; /// Expression in form (c1, c2, ...) - void composeKeyTuple(const Columns & key_columns, const size_t row, WriteBuffer & out, size_t beg, size_t end) const; + void composeKeyTuple(const Columns & key_columns, size_t row, WriteBuffer & out, size_t beg, size_t end) const; + + /// Compose update condition + static void composeUpdateCondition(const std::string & update_field, const std::string & time_point, WriteBuffer & out); + + /// Compose ids condition + void composeIdsCondition(const std::vector & ids, WriteBuffer & out) const; + + /// Compose keys condition + void composeKeysCondition(const Columns & key_columns, const std::vector & requested_rows, LoadKeysMethod method, size_t partition_key_prefix, WriteBuffer & out) const; /// Write string with specified quoting style. void writeQuoted(const std::string & s, WriteBuffer & out) const; diff --git a/src/Dictionaries/FlatDictionary.cpp b/src/Dictionaries/FlatDictionary.cpp index 58cb5048737..639895ac8ac 100644 --- a/src/Dictionaries/FlatDictionary.cpp +++ b/src/Dictionaries/FlatDictionary.cpp @@ -13,7 +13,7 @@ #include #include -#include +#include #include #include diff --git a/src/Dictionaries/HashedDictionary.cpp b/src/Dictionaries/HashedDictionary.cpp index b50b6a72707..189994dabf4 100644 --- a/src/Dictionaries/HashedDictionary.cpp +++ b/src/Dictionaries/HashedDictionary.cpp @@ -6,7 +6,7 @@ #include #include -#include +#include #include #include diff --git a/src/Dictionaries/HashedDictionary.h b/src/Dictionaries/HashedDictionary.h index 33c5fbf98bf..bf58638effc 100644 --- a/src/Dictionaries/HashedDictionary.h +++ b/src/Dictionaries/HashedDictionary.h @@ -5,7 +5,7 @@ #include #include -#include +#include #include #include @@ -125,14 +125,6 @@ private: HashMap, HashMapWithSavedHash>>; -#if !defined(ARCADIA_BUILD) - template - using SparseHashMap = google::sparse_hash_map>; -#else - template - using SparseHashMap = google::sparsehash::sparse_hash_map>; -#endif - template using CollectionTypeSparse = std::conditional_t< dictionary_key_type == DictionaryKeyType::simple, diff --git a/src/Dictionaries/IPAddressDictionary.cpp b/src/Dictionaries/IPAddressDictionary.cpp index 380ad460cba..fbe911c1d49 100644 --- a/src/Dictionaries/IPAddressDictionary.cpp +++ b/src/Dictionaries/IPAddressDictionary.cpp @@ -13,7 +13,7 @@ #include #include #include -#include +#include #include #include diff --git a/src/Dictionaries/MongoDBDictionarySource.cpp b/src/Dictionaries/MongoDBDictionarySource.cpp index a3c5119ade1..23ea9bc00e2 100644 --- a/src/Dictionaries/MongoDBDictionarySource.cpp +++ b/src/Dictionaries/MongoDBDictionarySource.cpp @@ -50,7 +50,7 @@ void registerDictionarySourceMongoDB(DictionarySourceFactory & factory) // Poco/MongoDB/BSONWriter.h:54: void writeCString(const std::string & value); // src/IO/WriteHelpers.h:146 #define writeCString(s, buf) #include -#include +#include namespace DB diff --git a/src/Dictionaries/MySQLDictionarySource.cpp b/src/Dictionaries/MySQLDictionarySource.cpp index c7309ddb950..2eebb6970d0 100644 --- a/src/Dictionaries/MySQLDictionarySource.cpp +++ b/src/Dictionaries/MySQLDictionarySource.cpp @@ -22,6 +22,7 @@ 
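Every compose*Query path above follows the same contract for the new `query` setting: if the user-supplied query contains the `{condition}` token, the generated condition is spliced in at that position, otherwise the condition is appended as a trailing WHERE clause. A self-contained sketch of that contract; the applyCondition helper and the sample queries are illustrative, not ClickHouse code. Note that the sketch erases exactly the placeholder's length, which keeps any SQL written after the token intact:

#include <iostream>
#include <string>

// Splice a generated condition into a user-supplied query, mirroring the
// {condition} contract of ExternalQueryBuilder (illustrative sketch).
std::string applyCondition(std::string query, const std::string & condition)
{
    static const std::string placeholder = "{condition}";

    const auto position = query.find(placeholder);
    if (position == std::string::npos)
        return query + " WHERE " + condition + ";";   // no token: append a WHERE clause

    query.replace(position, placeholder.size(), condition);   // erase only the token itself
    return query;
}

int main()
{
    std::cout << applyCondition("SELECT id, value FROM t WHERE {condition} AND value > 0", "id IN (1, 2, 3)") << '\n';
    std::cout << applyCondition("SELECT id, value FROM t", "id IN (1, 2, 3)") << '\n';
}
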
static const size_t default_num_tries_on_connection_loss = 3; namespace ErrorCodes { extern const int SUPPORT_IS_DISABLED; + extern const int UNSUPPORTED_METHOD; } void registerDictionarySourceMysql(DictionarySourceFactory & factory) @@ -41,11 +42,19 @@ void registerDictionarySourceMysql(DictionarySourceFactory & factory) auto settings_config_prefix = config_prefix + ".mysql"; + auto table = config.getString(settings_config_prefix + ".table", ""); + auto where = config.getString(settings_config_prefix + ".where", ""); + auto query = config.getString(settings_config_prefix + ".query", ""); + + if (query.empty() && table.empty()) + throw Exception(ErrorCodes::UNSUPPORTED_METHOD, "MySQL dictionary source configuration must contain table or query field"); + MySQLDictionarySource::Configuration configuration { .db = config.getString(settings_config_prefix + ".db", ""), - .table = config.getString(settings_config_prefix + ".table"), - .where = config.getString(settings_config_prefix + ".where", ""), + .table = table, + .query = query, + .where = where, .invalidate_query = config.getString(settings_config_prefix + ".invalidate_query", ""), .update_field = config.getString(settings_config_prefix + ".update_field", ""), .update_lag = config.getUInt64(settings_config_prefix + ".update_lag", 1), @@ -94,7 +103,7 @@ MySQLDictionarySource::MySQLDictionarySource( , configuration(configuration_) , pool(std::move(pool_)) , sample_block(sample_block_) - , query_builder(dict_struct, configuration.db, "", configuration.table, configuration.where, IdentifierQuotingStyle::Backticks) + , query_builder(dict_struct, configuration.db, "", configuration.table, configuration.query, configuration.where, IdentifierQuotingStyle::Backticks) , load_all_query(query_builder.composeLoadAllQuery()) , settings(settings_) { @@ -108,7 +117,7 @@ MySQLDictionarySource::MySQLDictionarySource(const MySQLDictionarySource & other , configuration(other.configuration) , pool(other.pool) , sample_block(other.sample_block) - , query_builder{dict_struct, configuration.db, "", configuration.table, configuration.where, IdentifierQuotingStyle::Backticks} + , query_builder{dict_struct, configuration.db, "", configuration.table, configuration.query, configuration.where, IdentifierQuotingStyle::Backticks} , load_all_query{other.load_all_query} , last_modification{other.last_modification} , invalidate_query_response{other.invalidate_query_response} @@ -128,7 +137,7 @@ std::string MySQLDictionarySource::getUpdateFieldAndDate() else { update_time = std::chrono::system_clock::now(); - return query_builder.composeLoadAllQuery(); + return load_all_query; } } diff --git a/src/Dictionaries/MySQLDictionarySource.h b/src/Dictionaries/MySQLDictionarySource.h index 49ddc924a86..66c3f216e07 100644 --- a/src/Dictionaries/MySQLDictionarySource.h +++ b/src/Dictionaries/MySQLDictionarySource.h @@ -12,7 +12,7 @@ # include "DictionaryStructure.h" # include "ExternalQueryBuilder.h" # include "IDictionarySource.h" -# include +# include namespace Poco { @@ -35,6 +35,7 @@ public: { const std::string db; const std::string table; + const std::string query; const std::string where; const std::string invalidate_query; const std::string update_field; diff --git a/src/Dictionaries/PolygonDictionary.cpp b/src/Dictionaries/PolygonDictionary.cpp index 39152963ede..f10aa071442 100644 --- a/src/Dictionaries/PolygonDictionary.cpp +++ b/src/Dictionaries/PolygonDictionary.cpp @@ -3,14 +3,14 @@ #include #include -#include "DictionaryBlockInputStream.h" -#include 
"DictionaryFactory.h" - #include #include #include #include #include +#include +#include + namespace DB { diff --git a/src/Dictionaries/PostgreSQLDictionarySource.cpp b/src/Dictionaries/PostgreSQLDictionarySource.cpp index f226b7a9165..5a546820959 100644 --- a/src/Dictionaries/PostgreSQLDictionarySource.cpp +++ b/src/Dictionaries/PostgreSQLDictionarySource.cpp @@ -7,7 +7,7 @@ #if USE_LIBPQXX #include #include -#include +#include #include "readInvalidateQuery.h" #include #endif @@ -27,7 +27,7 @@ static const UInt64 max_block_size = 8192; namespace { - ExternalQueryBuilder makeExternalQueryBuilder(const DictionaryStructure & dict_struct, const String & schema, const String & table, const String & where) + ExternalQueryBuilder makeExternalQueryBuilder(const DictionaryStructure & dict_struct, const String & schema, const String & table, const String & query, const String & where) { auto schema_value = schema; auto table_value = table; @@ -41,7 +41,7 @@ namespace } } /// Do not need db because it is already in a connection string. - return {dict_struct, "", schema_value, table_value, where, IdentifierQuotingStyle::DoubleQuotes}; + return {dict_struct, "", schema_value, table_value, query, where, IdentifierQuotingStyle::DoubleQuotes}; } } @@ -56,7 +56,7 @@ PostgreSQLDictionarySource::PostgreSQLDictionarySource( , pool(std::move(pool_)) , sample_block(sample_block_) , log(&Poco::Logger::get("PostgreSQLDictionarySource")) - , query_builder(makeExternalQueryBuilder(dict_struct, configuration.schema, configuration.table, configuration.where)) + , query_builder(makeExternalQueryBuilder(dict_struct, configuration.schema, configuration.table, configuration.query, configuration.where)) , load_all_query(query_builder.composeLoadAllQuery()) { } @@ -69,7 +69,7 @@ PostgreSQLDictionarySource::PostgreSQLDictionarySource(const PostgreSQLDictionar , pool(other.pool) , sample_block(other.sample_block) , log(&Poco::Logger::get("PostgreSQLDictionarySource")) - , query_builder(makeExternalQueryBuilder(dict_struct, configuration.schema, configuration.table, configuration.where)) + , query_builder(makeExternalQueryBuilder(dict_struct, configuration.schema, configuration.table, configuration.query, configuration.where)) , load_all_query(query_builder.composeLoadAllQuery()) , update_time(other.update_time) , invalidate_query_response(other.invalidate_query_response) @@ -198,6 +198,7 @@ void registerDictionarySourcePostgreSQL(DictionarySourceFactory & factory) .db = config.getString(fmt::format("{}.db", settings_config_prefix), ""), .schema = config.getString(fmt::format("{}.schema", settings_config_prefix), ""), .table = config.getString(fmt::format("{}.table", settings_config_prefix), ""), + .query = config.getString(fmt::format("{}.query", settings_config_prefix), ""), .where = config.getString(fmt::format("{}.where", settings_config_prefix), ""), .invalidate_query = config.getString(fmt::format("{}.invalidate_query", settings_config_prefix), ""), .update_field = config.getString(fmt::format("{}.update_field", settings_config_prefix), ""), diff --git a/src/Dictionaries/PostgreSQLDictionarySource.h b/src/Dictionaries/PostgreSQLDictionarySource.h index 28ad28661ed..c5ade4d259a 100644 --- a/src/Dictionaries/PostgreSQLDictionarySource.h +++ b/src/Dictionaries/PostgreSQLDictionarySource.h @@ -26,6 +26,7 @@ public: const String db; const String schema; const String table; + const String query; const String where; const String invalidate_query; const String update_field; diff --git 
a/src/Dictionaries/RangeDictionaryBlockInputStream.h b/src/Dictionaries/RangeDictionarySource.h similarity index 90% rename from src/Dictionaries/RangeDictionaryBlockInputStream.h rename to src/Dictionaries/RangeDictionarySource.h index d17687b7164..d4fce32a54f 100644 --- a/src/Dictionaries/RangeDictionaryBlockInputStream.h +++ b/src/Dictionaries/RangeDictionarySource.h @@ -1,14 +1,14 @@ #pragma once +#include +#include #include #include #include -#include -#include -#include -#include "DictionaryBlockInputStreamBase.h" -#include "DictionaryStructure.h" -#include "IDictionary.h" -#include "RangeHashedDictionary.h" +#include +#include +#include +#include +#include namespace DB @@ -31,8 +31,6 @@ public: size_t getNumRows() const { return ids.size(); } private: - template - ColumnPtr getColumnFromPODArray(const PaddedPODArray & array) const; Block fillBlock( const PaddedPODArray & ids_to_fill, @@ -86,17 +84,6 @@ Block RangeDictionarySourceData::getBlock(size_t start, size_t length return fillBlock(block_ids, block_start_dates, block_end_dates); } -template -template -ColumnPtr RangeDictionarySourceData::getColumnFromPODArray(const PaddedPODArray & array) const -{ - auto column_vector = ColumnVector::create(); - column_vector->getData().reserve(array.size()); - column_vector->getData().insert(array.begin(), array.end()); - - return column_vector; -} - template PaddedPODArray RangeDictionarySourceData::makeDateKey( const PaddedPODArray & block_start_dates, const PaddedPODArray & block_end_dates) const diff --git a/src/Dictionaries/RangeHashedDictionary.cpp b/src/Dictionaries/RangeHashedDictionary.cpp index 8b882b5a107..bbd70b51437 100644 --- a/src/Dictionaries/RangeHashedDictionary.cpp +++ b/src/Dictionaries/RangeHashedDictionary.cpp @@ -2,11 +2,11 @@ #include #include #include -#include -#include "DictionaryFactory.h" -#include "RangeDictionaryBlockInputStream.h" #include #include +#include +#include + namespace { diff --git a/src/Dictionaries/RangeHashedDictionary.h b/src/Dictionaries/RangeHashedDictionary.h index 01ee2b3c773..13fa6ad570f 100644 --- a/src/Dictionaries/RangeHashedDictionary.h +++ b/src/Dictionaries/RangeHashedDictionary.h @@ -9,10 +9,10 @@ #include #include #include -#include "DictionaryStructure.h" -#include "IDictionary.h" -#include "IDictionarySource.h" -#include "DictionaryHelpers.h" +#include +#include +#include +#include namespace DB { diff --git a/src/Dictionaries/RedisDictionarySource.cpp b/src/Dictionaries/RedisDictionarySource.cpp index bf309dd0e8a..6561a122e9d 100644 --- a/src/Dictionaries/RedisDictionarySource.cpp +++ b/src/Dictionaries/RedisDictionarySource.cpp @@ -31,7 +31,7 @@ void registerDictionarySourceRedis(DictionarySourceFactory & factory) #include -#include "RedisBlockInputStream.h" +#include "RedisSource.h" namespace DB diff --git a/src/Dictionaries/RedisBlockInputStream.cpp b/src/Dictionaries/RedisSource.cpp similarity index 99% rename from src/Dictionaries/RedisBlockInputStream.cpp rename to src/Dictionaries/RedisSource.cpp index c6e2546cf1e..ad5cf8a0977 100644 --- a/src/Dictionaries/RedisBlockInputStream.cpp +++ b/src/Dictionaries/RedisSource.cpp @@ -1,4 +1,4 @@ -#include "RedisBlockInputStream.h" +#include "RedisSource.h" #include #include diff --git a/src/Dictionaries/RedisBlockInputStream.h b/src/Dictionaries/RedisSource.h similarity index 100% rename from src/Dictionaries/RedisBlockInputStream.h rename to src/Dictionaries/RedisSource.h diff --git a/src/Dictionaries/SSDCacheDictionaryStorage.h b/src/Dictionaries/SSDCacheDictionaryStorage.h index 
395328a904d..bdb640c90be 100644 --- a/src/Dictionaries/SSDCacheDictionaryStorage.h +++ b/src/Dictionaries/SSDCacheDictionaryStorage.h @@ -12,7 +12,6 @@ #include #include -#include #include #include #include diff --git a/src/Dictionaries/XDBCDictionarySource.cpp b/src/Dictionaries/XDBCDictionarySource.cpp index 26b6c24cd2d..e79e55910b7 100644 --- a/src/Dictionaries/XDBCDictionarySource.cpp +++ b/src/Dictionaries/XDBCDictionarySource.cpp @@ -34,6 +34,7 @@ namespace const std::string & db_, const std::string & schema_, const std::string & table_, + const std::string & query_, const std::string & where_, IXDBCBridgeHelper & bridge_) { @@ -59,7 +60,7 @@ namespace bridge_.getName()); } - return {dict_struct_, db_, schema, table, where_, bridge_.getIdentifierQuotingStyle()}; + return {dict_struct_, db_, schema, table, query_, where_, bridge_.getIdentifierQuotingStyle()}; } } @@ -78,7 +79,7 @@ XDBCDictionarySource::XDBCDictionarySource( , dict_struct(dict_struct_) , configuration(configuration_) , sample_block(sample_block_) - , query_builder(makeExternalQueryBuilder(dict_struct, configuration.db, configuration.schema, configuration.table, configuration.where, *bridge_)) + , query_builder(makeExternalQueryBuilder(dict_struct, configuration.db, configuration.schema, configuration.table, configuration.query, configuration.where, *bridge_)) , load_all_query(query_builder.composeLoadAllQuery()) , bridge_helper(bridge_) , bridge_url(bridge_helper->getMainURI()) @@ -119,7 +120,7 @@ std::string XDBCDictionarySource::getUpdateFieldAndDate() else { update_time = std::chrono::system_clock::now(); - return query_builder.composeLoadAllQuery(); + return load_all_query; } } @@ -221,7 +222,7 @@ Pipe XDBCDictionarySource::loadFromQuery(const Poco::URI & url, const Block & re }; auto read_buf = std::make_unique(url, Poco::Net::HTTPRequest::HTTP_POST, write_body_callback, timeouts); - auto format = FormatFactory::instance().getInput(IXDBCBridgeHelper::DEFAULT_FORMAT, *read_buf, sample_block, getContext(), max_block_size); + auto format = FormatFactory::instance().getInput(IXDBCBridgeHelper::DEFAULT_FORMAT, *read_buf, required_sample_block, getContext(), max_block_size); format->addBuffer(std::move(read_buf)); return Pipe(std::move(format)); @@ -246,7 +247,8 @@ void registerDictionarySourceXDBC(DictionarySourceFactory & factory) { .db = config.getString(settings_config_prefix + ".db", ""), .schema = config.getString(settings_config_prefix + ".schema", ""), - .table = config.getString(settings_config_prefix + ".table"), + .table = config.getString(settings_config_prefix + ".table", ""), + .query = config.getString(settings_config_prefix + ".query", ""), .where = config.getString(settings_config_prefix + ".where", ""), .invalidate_query = config.getString(settings_config_prefix + ".invalidate_query", ""), .update_field = config.getString(settings_config_prefix + ".update_field", ""), diff --git a/src/Dictionaries/XDBCDictionarySource.h b/src/Dictionaries/XDBCDictionarySource.h index ebced022b62..df31e8a87cf 100644 --- a/src/Dictionaries/XDBCDictionarySource.h +++ b/src/Dictionaries/XDBCDictionarySource.h @@ -32,6 +32,7 @@ public: const std::string db; const std::string schema; const std::string table; + const std::string query; const std::string where; const std::string invalidate_query; const std::string update_field; diff --git a/src/Dictionaries/ya.make b/src/Dictionaries/ya.make index 36152fe439a..3f287f8bddc 100644 --- a/src/Dictionaries/ya.make +++ b/src/Dictionaries/ya.make @@ -22,13 +22,13 @@ 
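RangeDictionarySource.h above drops its private getColumnFromPODArray in favour of the shared copy hoisted into DictionaryHelpers.h earlier in this diff. The helper is a plain reserve-then-bulk-insert; a standalone analog, assuming std::vector in place of ClickHouse's PaddedPODArray and ColumnVector:

#include <cstdint>
#include <iostream>
#include <vector>

// Standalone analog of getColumnFromPODArray: reserve once, then bulk-insert,
// so building the column costs a single allocation.
template <typename T>
std::vector<T> columnFromArray(const std::vector<T> & array)
{
    std::vector<T> column;
    column.reserve(array.size());
    column.insert(column.end(), array.begin(), array.end());
    return column;
}

int main()
{
    const std::vector<uint64_t> ids = {1, 2, 3};
    std::cout << columnFromArray(ids).size() << '\n'; // 3
}
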
NO_COMPILER_WARNINGS() SRCS( CacheDictionary.cpp CacheDictionaryUpdateQueue.cpp - CassandraBlockInputStream.cpp CassandraDictionarySource.cpp CassandraHelpers.cpp + CassandraSource.cpp ClickHouseDictionarySource.cpp - DictionaryBlockInputStream.cpp - DictionaryBlockInputStreamBase.cpp DictionaryFactory.cpp + DictionarySource.cpp + DictionarySourceBase.cpp DictionarySourceFactory.cpp DictionarySourceHelpers.cpp DictionaryStructure.cpp @@ -57,8 +57,8 @@ SRCS( PolygonDictionaryImplementations.cpp PolygonDictionaryUtils.cpp RangeHashedDictionary.cpp - RedisBlockInputStream.cpp RedisDictionarySource.cpp + RedisSource.cpp XDBCDictionarySource.cpp getDictionaryConfigurationFromAST.cpp readInvalidateQuery.cpp diff --git a/src/Formats/FormatFactory.cpp b/src/Formats/FormatFactory.cpp index d2d6d92dea3..7b2aac78067 100644 --- a/src/Formats/FormatFactory.cpp +++ b/src/Formats/FormatFactory.cpp @@ -5,7 +5,6 @@ #include #include #include -#include #include #include #include @@ -19,10 +18,6 @@ #include -#if !defined(ARCADIA_BUILD) -# include -#endif - namespace DB { @@ -32,7 +27,6 @@ namespace ErrorCodes extern const int LOGICAL_ERROR; extern const int FORMAT_IS_NOT_SUITABLE_FOR_INPUT; extern const int FORMAT_IS_NOT_SUITABLE_FOR_OUTPUT; - extern const int UNSUPPORTED_METHOD; } const FormatFactory::Creators & FormatFactory::getCreators(const String & name) const @@ -88,6 +82,7 @@ FormatSettings getFormatSettings(ContextPtr context, const Settings & settings) format_settings.json.quote_denormals = settings.output_format_json_quote_denormals; format_settings.null_as_default = settings.input_format_null_as_default; format_settings.parquet.row_group_size = settings.output_format_parquet_row_group_size; + format_settings.parquet.import_nested = settings.input_format_parquet_import_nested; format_settings.pretty.charset = settings.output_format_pretty_grid_charset.toString() == "ASCII" ? FormatSettings::Pretty::Charset::ASCII : FormatSettings::Pretty::Charset::UTF8; format_settings.pretty.color = settings.output_format_pretty_color; format_settings.pretty.max_column_pad_width = settings.output_format_pretty_max_column_pad_width; @@ -114,6 +109,8 @@ FormatSettings getFormatSettings(ContextPtr context, const Settings & settings) format_settings.with_names_use_header = settings.input_format_with_names_use_header; format_settings.write_statistics = settings.output_format_write_statistics; format_settings.arrow.low_cardinality_as_dictionary = settings.output_format_arrow_low_cardinality_as_dictionary; + format_settings.arrow.import_nested = settings.input_format_arrow_import_nested; + format_settings.orc.import_nested = settings.input_format_orc_import_nested; /// Validate avro_schema_registry_url with RemoteHostFilter when non-empty and in Server context if (format_settings.schema.is_server) @@ -212,13 +209,11 @@ BlockOutputStreamPtr FormatFactory::getOutputStreamParallelIfPossible( const Settings & settings = context->getSettingsRef(); bool parallel_formatting = settings.output_format_parallel_formatting; + auto format_settings = _format_settings ? *_format_settings : getFormatSettings(context); if (output_getter && parallel_formatting && getCreators(name).supports_parallel_formatting && !settings.output_format_json_array_of_rows) { - auto format_settings = _format_settings - ? 
*_format_settings : getFormatSettings(context); - auto formatter_creator = [output_getter, sample, callback, format_settings] (WriteBuffer & output) -> OutputFormatPtr { return output_getter(output, sample, {std::move(callback)}, format_settings);}; @@ -355,10 +350,6 @@ OutputFormatPtr FormatFactory::getOutputFormat( auto format_settings = _format_settings ? *_format_settings : getFormatSettings(context); - /// If we're handling MySQL protocol connection right now then MySQLWire is only allowed output format. - if (format_settings.mysql_wire.sequence_id && (name != "MySQLWire")) - throw Exception(ErrorCodes::UNSUPPORTED_METHOD, "MySQL protocol does not support custom output formats"); - /** TODO: Materialization is needed, because formats can use the functions `IDataType`, * which only work with full columns. */ diff --git a/src/Formats/FormatSettings.h b/src/Formats/FormatSettings.h index 69df095bca8..d77a7c95d69 100644 --- a/src/Formats/FormatSettings.h +++ b/src/Formats/FormatSettings.h @@ -53,6 +53,7 @@ struct FormatSettings { UInt64 row_group_size = 1000000; bool low_cardinality_as_dictionary = false; + bool import_nested = false; } arrow; struct @@ -100,6 +101,7 @@ struct FormatSettings struct { UInt64 row_group_size = 1000000; + bool import_nested = false; } parquet; struct Pretty @@ -174,6 +176,11 @@ struct FormatSettings bool deduce_templates_of_expressions = true; bool accurate_types_of_literals = true; } values; + + struct + { + bool import_nested = false; + } orc; }; } diff --git a/src/Formats/MySQLBlockInputStream.cpp b/src/Formats/MySQLSource.cpp similarity index 99% rename from src/Formats/MySQLBlockInputStream.cpp rename to src/Formats/MySQLSource.cpp index 401d85f3d6b..2d305a29df6 100644 --- a/src/Formats/MySQLBlockInputStream.cpp +++ b/src/Formats/MySQLSource.cpp @@ -19,7 +19,7 @@ #include #include #include -#include "MySQLBlockInputStream.h" +#include "MySQLSource.h" namespace DB diff --git a/src/Formats/MySQLBlockInputStream.h b/src/Formats/MySQLSource.h similarity index 96% rename from src/Formats/MySQLBlockInputStream.h rename to src/Formats/MySQLSource.h index 9c33b4404ae..5938cb4b57f 100644 --- a/src/Formats/MySQLBlockInputStream.h +++ b/src/Formats/MySQLSource.h @@ -58,7 +58,7 @@ protected: ExternalResultDescription description; }; -/// Like MySQLBlockInputStream, but allocates connection only when reading is starting. +/// Like MySQLSource, but allocates connection only when reading is starting. /// It allows to create a lot of stream objects without occupation of all connection pool. /// Also makes attempts to reconnect in case of connection failures. 
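FormatSettings above gains an import_nested flag in the arrow, parquet and orc sub-structs, each defaulting to false and filled from the matching input_format_*_import_nested setting in getFormatSettings. A compilable sketch of that per-format sub-struct layout, with shortened illustrative names:

#include <cstdint>
#include <iostream>

// Per-format sub-structs with in-class defaults, as in FormatSettings above.
struct FormatSettingsSketch
{
    struct { bool import_nested = false; } arrow;
    struct { uint64_t row_group_size = 1000000; bool import_nested = false; } parquet;
    struct { bool import_nested = false; } orc;
};

int main()
{
    FormatSettingsSketch settings;
    settings.parquet.import_nested = true; // would come from input_format_parquet_import_nested
    std::cout << settings.parquet.import_nested << '\n';
}
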
class MySQLWithFailoverSource final : public MySQLSource diff --git a/src/Formats/ya.make b/src/Formats/ya.make index 476e13f9a4f..90184350359 100644 --- a/src/Formats/ya.make +++ b/src/Formats/ya.make @@ -14,7 +14,7 @@ SRCS( FormatFactory.cpp FormatSchemaInfo.cpp JSONEachRowUtils.cpp - MySQLBlockInputStream.cpp + MySQLSource.cpp NativeFormat.cpp NullFormat.cpp ParsedTemplateFormatString.cpp diff --git a/src/Functions/CastOverloadResolver.cpp b/src/Functions/CastOverloadResolver.cpp new file mode 100644 index 00000000000..fd6fecc37d6 --- /dev/null +++ b/src/Functions/CastOverloadResolver.cpp @@ -0,0 +1,19 @@ +#include +#include + + +namespace DB +{ + +void registerCastOverloadResolvers(FunctionFactory & factory) +{ + factory.registerFunction>(FunctionFactory::CaseInsensitive); + factory.registerFunction>(); + factory.registerFunction>(); + + factory.registerFunction>(FunctionFactory::CaseInsensitive); + factory.registerFunction>(); + factory.registerFunction>(); +} + +} diff --git a/src/Functions/CastOverloadResolver.h b/src/Functions/CastOverloadResolver.h new file mode 100644 index 00000000000..ffd5dda4af3 --- /dev/null +++ b/src/Functions/CastOverloadResolver.h @@ -0,0 +1,121 @@ +#pragma once +#include + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int ILLEGAL_TYPE_OF_ARGUMENT; +} + +/* + * CastInternal does not preserve nullability of the data type, + * i.e. CastInternal(toNullable(toInt8(1)) as Int32) will be Int32(1). + * + * Cast preserves nullability according to setting `cast_keep_nullable`, + * i.e. Cast(toNullable(toInt8(1)) as Int32) will be Nullable(Int32(1)) if `cast_keep_nullable` == 1. +**/ +template +class CastOverloadResolverImpl : public IFunctionOverloadResolver +{ +public: + using MonotonicityForRange = FunctionCastBase::MonotonicityForRange; + using Diagnostic = FunctionCastBase::Diagnostic; + + static constexpr auto name = cast_type == CastType::accurate + ? CastName::accurate_cast_name + : (cast_type == CastType::accurateOrNull ? 
CastName::accurate_cast_or_null_name : CastName::cast_name); + + String getName() const override { return name; } + + size_t getNumberOfArguments() const override { return 2; } + + ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {1}; } + + explicit CastOverloadResolverImpl(std::optional diagnostic_, bool keep_nullable_) + : diagnostic(std::move(diagnostic_)), keep_nullable(keep_nullable_) + { + } + + static FunctionOverloadResolverPtr create(ContextPtr context) + { + if constexpr (internal) + return createImpl(); + return createImpl({}, context->getSettingsRef().cast_keep_nullable); + } + + static FunctionOverloadResolverPtr createImpl(std::optional diagnostic = {}, bool keep_nullable = false) + { + assert(!internal || !keep_nullable); + return std::make_unique(std::move(diagnostic), keep_nullable); + } + +protected: + + FunctionBasePtr buildImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & return_type) const override + { + DataTypes data_types(arguments.size()); + + for (size_t i = 0; i < arguments.size(); ++i) + data_types[i] = arguments[i].type; + + auto monotonicity = MonotonicityHelper::getMonotonicityInformation(arguments.front().type, return_type.get()); + return std::make_unique>(name, std::move(monotonicity), data_types, return_type, diagnostic, cast_type); + } + + DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override + { + const auto & column = arguments.back().column; + if (!column) + throw Exception("Second argument to " + getName() + " must be a constant string describing type." + " Instead there is non-constant column of type " + arguments.back().type->getName(), + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + const auto * type_col = checkAndGetColumnConst(column.get()); + if (!type_col) + throw Exception("Second argument to " + getName() + " must be a constant string describing type." 
+ " Instead there is a column with the following structure: " + column->dumpStructure(), + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + DataTypePtr type = DataTypeFactory::instance().get(type_col->getValue()); + + if constexpr (cast_type == CastType::accurateOrNull) + return makeNullable(type); + + if constexpr (internal) + return type; + + if (keep_nullable && arguments.front().type->isNullable() && type->canBeInsideNullable()) + return makeNullable(type); + + return type; + } + + bool useDefaultImplementationForNulls() const override { return false; } + bool useDefaultImplementationForLowCardinalityColumns() const override { return false; } + +private: + std::optional diagnostic; + bool keep_nullable; +}; + + +struct CastOverloadName +{ + static constexpr auto cast_name = "CAST"; + static constexpr auto accurate_cast_name = "accurateCast"; + static constexpr auto accurate_cast_or_null_name = "accurateCastOrNull"; +}; + +struct CastInternalOverloadName +{ + static constexpr auto cast_name = "_CAST"; + static constexpr auto accurate_cast_name = "accurate_Cast"; + static constexpr auto accurate_cast_or_null_name = "accurate_CastOrNull"; +}; + +template using CastOverloadResolver = CastOverloadResolverImpl; +template using CastInternalOverloadResolver = CastOverloadResolverImpl; + +} diff --git a/src/Functions/DateOrDateTimeFunctionsConvertion.cpp b/src/Functions/DateOrDateTimeFunctionsConvertion.cpp new file mode 100644 index 00000000000..e69de29bb2d diff --git a/src/Functions/FunctionsConversion.cpp b/src/Functions/FunctionsConversion.cpp index d7686318efc..f32d5df8a21 100644 --- a/src/Functions/FunctionsConversion.cpp +++ b/src/Functions/FunctionsConversion.cpp @@ -7,6 +7,8 @@ namespace DB void registerFunctionFixedString(FunctionFactory & factory); +void registerCastOverloadResolvers(FunctionFactory & factory); + void registerFunctionsConversion(FunctionFactory & factory) { factory.registerFunction(); @@ -43,9 +45,7 @@ void registerFunctionsConversion(FunctionFactory & factory) factory.registerFunction(); - factory.registerFunction>(FunctionFactory::CaseInsensitive); - factory.registerFunction>(); - factory.registerFunction>(); + registerCastOverloadResolvers(factory); factory.registerFunction(); factory.registerFunction(); diff --git a/src/Functions/FunctionsConversion.h b/src/Functions/FunctionsConversion.h index 67a02e3fd34..e57998e4a72 100644 --- a/src/Functions/FunctionsConversion.h +++ b/src/Functions/FunctionsConversion.h @@ -2412,7 +2412,8 @@ private: std::optional diagnostic; }; -struct NameCast { static constexpr auto name = "CAST"; }; +struct CastName { static constexpr auto name = "CAST"; }; +struct CastInternalName { static constexpr auto name = "_CAST"; }; enum class CastType { @@ -2421,17 +2422,26 @@ enum class CastType accurateOrNull }; -class FunctionCast final : public IFunctionBase +class FunctionCastBase : public IFunctionBase +{ +public: + using MonotonicityForRange = std::function; + using Diagnostic = ExecutableFunctionCast::Diagnostic; +}; + +template +class FunctionCast final : public FunctionCastBase { public: using WrapperType = std::function; - using MonotonicityForRange = std::function; - using Diagnostic = ExecutableFunctionCast::Diagnostic; - FunctionCast(const char * name_, MonotonicityForRange && monotonicity_for_range_ - , const DataTypes & argument_types_, const DataTypePtr & return_type_ - , std::optional diagnostic_, CastType cast_type_) - : name(name_), monotonicity_for_range(std::move(monotonicity_for_range_)) + FunctionCast(const char * cast_name_ + 
, MonotonicityForRange && monotonicity_for_range_ + , const DataTypes & argument_types_ + , const DataTypePtr & return_type_ + , std::optional diagnostic_ + , CastType cast_type_) + : cast_name(cast_name_), monotonicity_for_range(std::move(monotonicity_for_range_)) , argument_types(argument_types_), return_type(return_type_), diagnostic(std::move(diagnostic_)) , cast_type(cast_type_) { @@ -2445,7 +2455,7 @@ public: try { return std::make_unique( - prepareUnpackDictionaries(getArgumentTypes()[0], getResultType()), name, diagnostic); + prepareUnpackDictionaries(getArgumentTypes()[0], getResultType()), cast_name, diagnostic); } catch (Exception & e) { @@ -2456,7 +2466,7 @@ public: } } - String getName() const override { return name; } + String getName() const override { return cast_name; } bool isDeterministic() const override { return true; } bool isDeterministicInScopeOfQuery() const override { return true; } @@ -2473,7 +2483,7 @@ public: private: - const char * name; + const char * cast_name; MonotonicityForRange monotonicity_for_range; DataTypes argument_types; @@ -2515,7 +2525,7 @@ private: { /// In case when converting to Nullable type, we apply different parsing rule, /// that will not throw an exception but return NULL in case of malformed input. - FunctionPtr function = FunctionConvertFromString::create(); + FunctionPtr function = FunctionConvertFromString::create(); return createFunctionAdaptor(function, from_type); } else if (!can_apply_accurate_cast) @@ -2539,12 +2549,12 @@ private: { if (wrapper_cast_type == CastType::accurate) { - result_column = ConvertImpl::execute( + result_column = ConvertImpl::execute( arguments, result_type, input_rows_count, AccurateConvertStrategyAdditions()); } else { - result_column = ConvertImpl::execute( + result_column = ConvertImpl::execute( arguments, result_type, input_rows_count, AccurateOrNullConvertStrategyAdditions()); } @@ -2559,7 +2569,7 @@ private: { if (wrapper_cast_type == CastType::accurateOrNull) { - auto nullable_column_wrapper = FunctionCast::createToNullableColumnWrapper(); + auto nullable_column_wrapper = FunctionCast::createToNullableColumnWrapper(); return nullable_column_wrapper(arguments, result_type, column_nullable, input_rows_count); } else @@ -2631,7 +2641,7 @@ private: { AccurateConvertStrategyAdditions additions; additions.scale = scale; - result_column = ConvertImpl::execute( + result_column = ConvertImpl::execute( arguments, result_type, input_rows_count, additions); return true; @@ -2640,7 +2650,7 @@ private: { AccurateOrNullConvertStrategyAdditions additions; additions.scale = scale; - result_column = ConvertImpl::execute( + result_column = ConvertImpl::execute( arguments, result_type, input_rows_count, additions); return true; @@ -2653,14 +2663,14 @@ private: /// Consistent with CAST(Nullable(String) AS Nullable(Numbers)) /// In case when converting to Nullable type, we apply different parsing rule, /// that will not throw an exception but return NULL in case of malformed input. 
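The cast refactoring above splits a non-template FunctionCastBase from a FunctionCast templated on a name policy, so the user-facing CAST and the internal _CAST share one implementation while reporting different names. A minimal sketch of the policy-struct pattern, reusing the CastName and CastInternalName structs from the diff around an illustrative wrapper:

#include <iostream>

struct CastName { static constexpr auto name = "CAST"; };
struct CastInternalName { static constexpr auto name = "_CAST"; };

// Illustrative stand-in for FunctionCast<CastName>: the template parameter
// only injects the reported name; the behaviour is shared.
template <typename Name>
struct FunctionCastSketch
{
    const char * getName() const { return Name::name; }
};

int main()
{
    std::cout << FunctionCastSketch<CastName>{}.getName() << '\n';          // CAST
    std::cout << FunctionCastSketch<CastInternalName>{}.getName() << '\n';  // _CAST
}
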
- result_column = ConvertImpl::execute( + result_column = ConvertImpl::execute( arguments, result_type, input_rows_count, scale); return true; } } - result_column = ConvertImpl::execute(arguments, result_type, input_rows_count, scale); + result_column = ConvertImpl::execute(arguments, result_type, input_rows_count, scale); return true; }); @@ -2670,7 +2680,7 @@ private: { if (wrapper_cast_type == CastType::accurateOrNull) { - auto nullable_column_wrapper = FunctionCast::createToNullableColumnWrapper(); + auto nullable_column_wrapper = FunctionCast::createToNullableColumnWrapper(); return nullable_column_wrapper(arguments, result_type, column_nullable, input_rows_count); } else @@ -2990,7 +3000,7 @@ private: template WrapperType createStringToEnumWrapper() const { - const char * function_name = name; + const char * function_name = cast_name; return [function_name] ( ColumnsWithTypeAndName & arguments, const DataTypePtr & res_type, const ColumnNullable * nullable_col, size_t /*input_rows_count*/) { @@ -3324,7 +3334,7 @@ private: class MonotonicityHelper { public: - using MonotonicityForRange = FunctionCast::MonotonicityForRange; + using MonotonicityForRange = FunctionCastBase::MonotonicityForRange; template static auto monotonicityForType(const DataType * const) @@ -3382,89 +3392,4 @@ public: } }; -template -class CastOverloadResolver : public IFunctionOverloadResolver -{ -public: - using MonotonicityForRange = FunctionCast::MonotonicityForRange; - using Diagnostic = FunctionCast::Diagnostic; - - static constexpr auto accurate_cast_name = "accurateCast"; - static constexpr auto accurate_cast_or_null_name = "accurateCastOrNull"; - static constexpr auto cast_name = "CAST"; - - static constexpr auto name = cast_type == CastType::accurate - ? accurate_cast_name - : (cast_type == CastType::accurateOrNull ? accurate_cast_or_null_name : cast_name); - - static FunctionOverloadResolverPtr create(ContextPtr context) - { - return createImpl(context->getSettingsRef().cast_keep_nullable); - } - - static FunctionOverloadResolverPtr createImpl(bool keep_nullable, std::optional diagnostic = {}) - { - return std::make_unique(keep_nullable, std::move(diagnostic)); - } - - - explicit CastOverloadResolver(bool keep_nullable_, std::optional diagnostic_ = {}) - : keep_nullable(keep_nullable_), diagnostic(std::move(diagnostic_)) - {} - - String getName() const override { return name; } - - size_t getNumberOfArguments() const override { return 2; } - - ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {1}; } - -protected: - - FunctionBasePtr buildImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & return_type) const override - { - DataTypes data_types(arguments.size()); - - for (size_t i = 0; i < arguments.size(); ++i) - data_types[i] = arguments[i].type; - - auto monotonicity = MonotonicityHelper::getMonotonicityInformation(arguments.front().type, return_type.get()); - return std::make_unique(name, std::move(monotonicity), data_types, return_type, diagnostic, cast_type); - } - - DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override - { - const auto & column = arguments.back().column; - if (!column) - throw Exception("Second argument to " + getName() + " must be a constant string describing type." 
- " Instead there is non-constant column of type " + arguments.back().type->getName(), - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - const auto * type_col = checkAndGetColumnConst(column.get()); - if (!type_col) - throw Exception("Second argument to " + getName() + " must be a constant string describing type." - " Instead there is a column with the following structure: " + column->dumpStructure(), - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - DataTypePtr type = DataTypeFactory::instance().get(type_col->getValue()); - - if constexpr (cast_type == CastType::accurateOrNull) - { - return makeNullable(type); - } - else - { - if (keep_nullable && arguments.front().type->isNullable()) - return makeNullable(type); - return type; - } - } - - bool useDefaultImplementationForNulls() const override { return false; } - bool useDefaultImplementationForLowCardinalityColumns() const override { return false; } - -private: - bool keep_nullable; - std::optional diagnostic; -}; - } diff --git a/src/Functions/registerFunctionsTuple.cpp b/src/Functions/registerFunctionsTuple.cpp index 12092e1e7e0..33f078675e9 100644 --- a/src/Functions/registerFunctionsTuple.cpp +++ b/src/Functions/registerFunctionsTuple.cpp @@ -5,11 +5,13 @@ class FunctionFactory; void registerFunctionTuple(FunctionFactory &); void registerFunctionTupleElement(FunctionFactory &); +void registerFunctionTupleToNameValuePairs(FunctionFactory &); void registerFunctionsTuple(FunctionFactory & factory) { registerFunctionTuple(factory); registerFunctionTupleElement(factory); + registerFunctionTupleToNameValuePairs(factory); } } diff --git a/src/Functions/stem.cpp b/src/Functions/stem.cpp index 98dcbccd005..7092bac06ec 100644 --- a/src/Functions/stem.cpp +++ b/src/Functions/stem.cpp @@ -11,7 +11,7 @@ #include #include -#include +#include // Y_IGNORE namespace DB diff --git a/src/Functions/tupleToNameValuePairs.cpp b/src/Functions/tupleToNameValuePairs.cpp new file mode 100644 index 00000000000..c3e5f28037b --- /dev/null +++ b/src/Functions/tupleToNameValuePairs.cpp @@ -0,0 +1,131 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +namespace DB +{ +namespace ErrorCodes +{ + extern const int ILLEGAL_TYPE_OF_ARGUMENT; +} + +namespace +{ + +/** Transform a named tuple into an array of pairs, where the first element + * of the pair corresponds to the tuple field name and the second one to the + * tuple value. 
+ */ +class FunctionTupleToNameValuePairs : public IFunction +{ +public: + static constexpr auto name = "tupleToNameValuePairs"; + static FunctionPtr create(ContextPtr) + { + return std::make_shared(); + } + + String getName() const override + { + return name; + } + + size_t getNumberOfArguments() const override + { + return 1; + } + + bool useDefaultImplementationForConstants() const override + { + return true; + } + + DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override + { + // get the type of all the fields in the tuple + const IDataType * col = arguments[0].type.get(); + const DataTypeTuple * tuple = checkAndGetDataType(col); + + if (!tuple) + throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, + "First argument for function {} must be a tuple.", + getName()); + + const auto & element_types = tuple->getElements(); + + if (element_types.empty()) + throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, + "The argument tuple for function {} must not be empty.", + getName()); + + const auto & first_element_type = element_types[0]; + + bool all_value_types_equal = std::all_of(element_types.begin() + 1, + element_types.end(), + [&](const auto &other) + { + return first_element_type->equals(*other); + }); + + if (!all_value_types_equal) + { + throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, + "The argument tuple for function {} must contain just one type.", + getName()); + } + + DataTypePtr tuple_name_type = std::make_shared(); + DataTypes item_data_types = {tuple_name_type, + first_element_type}; + + auto item_data_type = std::make_shared(item_data_types); + + return std::make_shared(item_data_type); + } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override + { + const IColumn * tuple_col = arguments[0].column.get(); + const DataTypeTuple * tuple = checkAndGetDataType(arguments[0].type.get()); + const auto * tuple_col_concrete = assert_cast(tuple_col); + + auto keys = ColumnString::create(); + MutableColumnPtr values = tuple_col_concrete->getColumn(0).cloneEmpty(); + auto offsets = ColumnVector::create(); + for (size_t row = 0; row < tuple_col_concrete->size(); ++row) + { + for (size_t col = 0; col < tuple_col_concrete->tupleSize(); ++col) + { + const std::string & key = tuple->getElementNames()[col]; + const IColumn & value_column = tuple_col_concrete->getColumn(col); + + values->insertFrom(value_column, row); + keys->insertData(key.data(), key.size()); + } + offsets->insertValue(tuple_col_concrete->tupleSize() * (row + 1)); + } + + std::vector tuple_columns = { std::move(keys), std::move(values) }; + auto tuple_column = ColumnTuple::create(std::move(tuple_columns)); + return ColumnArray::create(std::move(tuple_column), std::move(offsets)); + } +}; + +} + +void registerFunctionTupleToNameValuePairs(FunctionFactory & factory) +{ + factory.registerFunction(); +} + +} diff --git a/src/Functions/ya.make b/src/Functions/ya.make index 2b9b3d94313..b231866b4fd 100644 --- a/src/Functions/ya.make +++ b/src/Functions/ya.make @@ -312,6 +312,7 @@ SRCS( hasToken.cpp hasTokenCaseInsensitive.cpp hostName.cpp + hyperscanRegexpChecker.cpp hypot.cpp identity.cpp if.cpp @@ -564,6 +565,7 @@ SRCS( tuple.cpp tupleElement.cpp tupleHammingDistance.cpp + tupleToNameValuePairs.cpp upper.cpp upperUTF8.cpp uptime.cpp diff --git a/src/IO/Bzip2ReadBuffer.cpp b/src/IO/Bzip2ReadBuffer.cpp index e264ce75444..99798bca325 100644 --- a/src/IO/Bzip2ReadBuffer.cpp +++ b/src/IO/Bzip2ReadBuffer.cpp @@ -4,7 +4,7 
@@ #if USE_BZIP2 # include -# include +# include // Y_IGNORE namespace DB { diff --git a/src/IO/Bzip2WriteBuffer.cpp b/src/IO/Bzip2WriteBuffer.cpp index 41cb972966c..39c5356b792 100644 --- a/src/IO/Bzip2WriteBuffer.cpp +++ b/src/IO/Bzip2WriteBuffer.cpp @@ -4,7 +4,7 @@ #if USE_BROTLI # include -# include +# include // Y_IGNORE #include diff --git a/src/IO/Progress.h b/src/IO/Progress.h index e1253ab8eb8..772131d8cb7 100644 --- a/src/IO/Progress.h +++ b/src/IO/Progress.h @@ -6,8 +6,6 @@ #include #include -#include - namespace DB { diff --git a/src/Interpreters/ActionsDAG.cpp b/src/Interpreters/ActionsDAG.cpp index 63b0345b372..e1f1d498367 100644 --- a/src/Interpreters/ActionsDAG.cpp +++ b/src/Interpreters/ActionsDAG.cpp @@ -7,6 +7,7 @@ #include #include #include +#include #include #include #include @@ -1110,8 +1111,8 @@ ActionsDAGPtr ActionsDAG::makeConvertingActions( const auto * right_arg = &actions_dag->addColumn(std::move(column)); const auto * left_arg = dst_node; - FunctionCast::Diagnostic diagnostic = {dst_node->result_name, res_elem.name}; - FunctionOverloadResolverPtr func_builder_cast = CastOverloadResolver::createImpl(false, std::move(diagnostic)); + FunctionCastBase::Diagnostic diagnostic = {dst_node->result_name, res_elem.name}; + FunctionOverloadResolverPtr func_builder_cast = CastInternalOverloadResolver::createImpl(std::move(diagnostic)); NodeRawConstPtrs children = { left_arg, right_arg }; dst_node = &actions_dag->addFunction(func_builder_cast, std::move(children), {}); @@ -1876,7 +1877,7 @@ ActionsDAGPtr ActionsDAG::cloneActionsForFilterPushDown( predicate->children = {left_arg, right_arg}; auto arguments = prepareFunctionArguments(predicate->children); - FunctionOverloadResolverPtr func_builder_cast = CastOverloadResolver::createImpl(false); + FunctionOverloadResolverPtr func_builder_cast = CastInternalOverloadResolver::createImpl(); predicate->function_builder = func_builder_cast; predicate->function_base = predicate->function_builder->build(arguments); diff --git a/src/Interpreters/Aggregator.cpp b/src/Interpreters/Aggregator.cpp index 0b97d403d01..c26eb10e697 100644 --- a/src/Interpreters/Aggregator.cpp +++ b/src/Interpreters/Aggregator.cpp @@ -977,13 +977,14 @@ bool Aggregator::executeOnBlock(Columns columns, UInt64 num_rows, AggregatedData /// For the case when there are no keys (all aggregate into one row). 
if (result.type == AggregatedDataVariants::Type::without_key) { -#if USE_EMBEDDED_COMPILER - if (compiled_aggregate_functions_holder) - { - executeWithoutKeyImpl(result.without_key, num_rows, aggregate_functions_instructions.data(), result.aggregates_pool); - } - else -#endif + /// TODO: Enable compilation after investigation +// #if USE_EMBEDDED_COMPILER +// if (compiled_aggregate_functions_holder) +// { +// executeWithoutKeyImpl(result.without_key, num_rows, aggregate_functions_instructions.data(), result.aggregates_pool); +// } +// else +// #endif { executeWithoutKeyImpl(result.without_key, num_rows, aggregate_functions_instructions.data(), result.aggregates_pool); } diff --git a/src/Interpreters/AsynchronousMetrics.cpp b/src/Interpreters/AsynchronousMetrics.cpp index 8efe959a623..fd02aa4abec 100644 --- a/src/Interpreters/AsynchronousMetrics.cpp +++ b/src/Interpreters/AsynchronousMetrics.cpp @@ -88,6 +88,20 @@ AsynchronousMetrics::AsynchronousMetrics( openFileIfExists("/proc/uptime", uptime); openFileIfExists("/proc/net/dev", net_dev); + openSensors(); + openBlockDevices(); + openEDAC(); + openSensorsChips(); +#endif +} + +#if defined(OS_LINUX) +void AsynchronousMetrics::openSensors() +{ + LOG_TRACE(log, "Scanning /sys/class/thermal"); + + thermal.clear(); + for (size_t thermal_device_index = 0;; ++thermal_device_index) { std::unique_ptr file = openFileIfExists(fmt::format("/sys/class/thermal/thermal_zone{}/temp", thermal_device_index)); @@ -101,6 +115,71 @@ AsynchronousMetrics::AsynchronousMetrics( } thermal.emplace_back(std::move(file)); } +} + +void AsynchronousMetrics::openBlockDevices() +{ + LOG_TRACE(log, "Scanning /sys/block"); + + if (!std::filesystem::exists("/sys/block")) + return; + + block_devices_rescan_delay.restart(); + + block_devs.clear(); + + for (const auto & device_dir : std::filesystem::directory_iterator("/sys/block")) + { + String device_name = device_dir.path().filename(); + + /// We are not interested in loopback devices. 
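+ /// (their names start with "loop"); devices without a readable "stat" file are likewise skipped just below.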
+ if (device_name.starts_with("loop")) + continue; + + std::unique_ptr file = openFileIfExists(device_dir.path() / "stat"); + if (!file) + continue; + + block_devs[device_name] = std::move(file); + } +} + +void AsynchronousMetrics::openEDAC() +{ + LOG_TRACE(log, "Scanning /sys/devices/system/edac"); + + edac.clear(); + + for (size_t edac_index = 0;; ++edac_index) + { + String edac_correctable_file = fmt::format("/sys/devices/system/edac/mc/mc{}/ce_count", edac_index); + String edac_uncorrectable_file = fmt::format("/sys/devices/system/edac/mc/mc{}/ue_count", edac_index); + + bool edac_correctable_file_exists = std::filesystem::exists(edac_correctable_file); + bool edac_uncorrectable_file_exists = std::filesystem::exists(edac_uncorrectable_file); + + if (!edac_correctable_file_exists && !edac_uncorrectable_file_exists) + { + if (edac_index == 0) + continue; + else + break; + } + + edac.emplace_back(); + + if (edac_correctable_file_exists) + edac.back().first = openFileIfExists(edac_correctable_file); + if (edac_uncorrectable_file_exists) + edac.back().second = openFileIfExists(edac_uncorrectable_file); + } +} + +void AsynchronousMetrics::openSensorsChips() +{ + LOG_TRACE(log, "Scanning /sys/class/hwmon"); + + hwmon_devices.clear(); for (size_t hwmon_index = 0;; ++hwmon_index) { @@ -150,61 +229,6 @@ AsynchronousMetrics::AsynchronousMetrics( hwmon_devices[hwmon_name][sensor_name] = std::move(file); } } - - for (size_t edac_index = 0;; ++edac_index) - { - String edac_correctable_file = fmt::format("/sys/devices/system/edac/mc/mc{}/ce_count", edac_index); - String edac_uncorrectable_file = fmt::format("/sys/devices/system/edac/mc/mc{}/ue_count", edac_index); - - bool edac_correctable_file_exists = std::filesystem::exists(edac_correctable_file); - bool edac_uncorrectable_file_exists = std::filesystem::exists(edac_uncorrectable_file); - - if (!edac_correctable_file_exists && !edac_uncorrectable_file_exists) - { - if (edac_index == 0) - continue; - else - break; - } - - edac.emplace_back(); - - if (edac_correctable_file_exists) - edac.back().first = openFileIfExists(edac_correctable_file); - if (edac_uncorrectable_file_exists) - edac.back().second = openFileIfExists(edac_uncorrectable_file); - } - - openBlockDevices(); -#endif -} - -#if defined(OS_LINUX) -void AsynchronousMetrics::openBlockDevices() -{ - LOG_TRACE(log, "Scanning /sys/block"); - - if (!std::filesystem::exists("/sys/block")) - return; - - block_devices_rescan_delay.restart(); - - block_devs.clear(); - - for (const auto & device_dir : std::filesystem::directory_iterator("/sys/block")) - { - String device_name = device_dir.path().filename(); - - /// We are not interested in loopback devices. - if (device_name.starts_with("loop")) - continue; - - std::unique_ptr file = openFileIfExists(device_dir.path() / "stat"); - if (!file) - continue; - - block_devs[device_name] = std::move(file); - } } #endif @@ -967,6 +991,8 @@ void AsynchronousMetrics::update(std::chrono::system_clock::time_point update_ti } catch (...) { + tryLogCurrentException(__PRETTY_FUNCTION__); + /// Try to reopen block devices in case of error /// (i.e. 
ENOENT means that some disk had been replaced, and it may appear with a new name) try @@ -977,7 +1003,6 @@ void AsynchronousMetrics::update(std::chrono::system_clock::time_point update_ti { tryLogCurrentException(__PRETTY_FUNCTION__); } - tryLogCurrentException(__PRETTY_FUNCTION__); } if (net_dev) @@ -1066,9 +1091,9 @@ void AsynchronousMetrics::update(std::chrono::system_clock::time_point update_ti } } - for (size_t i = 0, size = thermal.size(); i < size; ++i) + try { - try + for (size_t i = 0, size = thermal.size(); i < size; ++i) { ReadBufferFromFilePRead & in = *thermal[i]; @@ -1077,15 +1102,25 @@ void AsynchronousMetrics::update(std::chrono::system_clock::time_point update_ti readText(temperature, in); new_values[fmt::format("Temperature{}", i)] = temperature * 0.001; } + } + catch (...) + { + tryLogCurrentException(__PRETTY_FUNCTION__); + + /// Files may be re-created on module load/unload + try + { + openSensors(); + } catch (...) { tryLogCurrentException(__PRETTY_FUNCTION__); } } - for (const auto & [hwmon_name, sensors] : hwmon_devices) + try { - try + for (const auto & [hwmon_name, sensors] : hwmon_devices) { for (const auto & [sensor_name, sensor_file] : sensors) { @@ -1106,19 +1141,32 @@ void AsynchronousMetrics::update(std::chrono::system_clock::time_point update_ti new_values[fmt::format("Temperature_{}_{}", hwmon_name, sensor_name)] = temperature * 0.001; } } + } + catch (...) + { + tryLogCurrentException(__PRETTY_FUNCTION__); + + /// Files can be re-created on: + /// - module load/unload + /// - suspend/resume cycle + /// So file descriptors should be reopened. + try + { + openSensorsChips(); + } catch (...) { tryLogCurrentException(__PRETTY_FUNCTION__); } } - for (size_t i = 0, size = edac.size(); i < size; ++i) + try { - /// NOTE maybe we need to take difference with previous values. - /// But these metrics should be exceptionally rare, so it's ok to keep them accumulated. - - try + for (size_t i = 0, size = edac.size(); i < size; ++i) { + /// NOTE maybe we need to take difference with previous values. + /// But these metrics should be exceptionally rare, so it's ok to keep them accumulated. + if (edac[i].first) { ReadBufferFromFilePRead & in = *edac[i].first; @@ -1137,6 +1185,16 @@ void AsynchronousMetrics::update(std::chrono::system_clock::time_point update_ti new_values[fmt::format("EDAC{}_Uncorrectable", i)] = errors; } } + } + catch (...) + { + tryLogCurrentException(__PRETTY_FUNCTION__); + + /// EDAC files can be re-created on module load/unload + try + { + openEDAC(); + } catch (...)
{ tryLogCurrentException(__PRETTY_FUNCTION__); diff --git a/src/Interpreters/AsynchronousMetrics.h b/src/Interpreters/AsynchronousMetrics.h index c8677ac3ced..93e77b6bde8 100644 --- a/src/Interpreters/AsynchronousMetrics.h +++ b/src/Interpreters/AsynchronousMetrics.h @@ -183,7 +183,10 @@ private: Stopwatch block_devices_rescan_delay; + void openSensors(); void openBlockDevices(); + void openSensorsChips(); + void openEDAC(); #endif std::unique_ptr thread; diff --git a/src/Interpreters/Context.cpp b/src/Interpreters/Context.cpp index bda174776e0..bd15af76db0 100644 --- a/src/Interpreters/Context.cpp +++ b/src/Interpreters/Context.cpp @@ -1088,7 +1088,11 @@ bool Context::hasScalar(const String & name) const void Context::addQueryAccessInfo( - const String & quoted_database_name, const String & full_quoted_table_name, const Names & column_names, const String & projection_name) + const String & quoted_database_name, + const String & full_quoted_table_name, + const Names & column_names, + const String & projection_name, + const String & view_name) { assert(!isGlobalContext() || getApplicationType() == ApplicationType::LOCAL); std::lock_guard lock(query_access_info.mutex); @@ -1098,6 +1102,8 @@ void Context::addQueryAccessInfo( query_access_info.columns.emplace(full_quoted_table_name + "." + backQuoteIfNeed(column_name)); if (!projection_name.empty()) query_access_info.projections.emplace(full_quoted_table_name + "." + backQuoteIfNeed(projection_name)); + if (!view_name.empty()) + query_access_info.views.emplace(view_name); } @@ -2118,7 +2124,6 @@ std::shared_ptr Context::getQueryLog() const return shared->system_logs->query_log; } - std::shared_ptr Context::getQueryThreadLog() const { auto lock = getLock(); @@ -2129,6 +2134,15 @@ std::shared_ptr Context::getQueryThreadLog() const return shared->system_logs->query_thread_log; } +std::shared_ptr Context::getQueryViewsLog() const +{ + auto lock = getLock(); + + if (!shared->system_logs) + return {}; + + return shared->system_logs->query_views_log; +} std::shared_ptr Context::getPartLog(const String & part_database) const { diff --git a/src/Interpreters/Context.h b/src/Interpreters/Context.h index 66fac7e6e70..d3a77e0039b 100644 --- a/src/Interpreters/Context.h +++ b/src/Interpreters/Context.h @@ -70,6 +70,7 @@ struct Progress; class Clusters; class QueryLog; class QueryThreadLog; +class QueryViewsLog; class PartLog; class TextLog; class TraceLog; @@ -219,6 +220,7 @@ private: tables = rhs.tables; columns = rhs.columns; projections = rhs.projections; + views = rhs.views; } QueryAccessInfo(QueryAccessInfo && rhs) = delete; @@ -235,6 +237,7 @@ private: std::swap(tables, rhs.tables); std::swap(columns, rhs.columns); std::swap(projections, rhs.projections); + std::swap(views, rhs.views); } /// To prevent a race between copy-constructor and other uses of this structure. @@ -242,7 +245,8 @@ private: std::set databases{}; std::set tables{}; std::set columns{}; - std::set projections; + std::set projections{}; + std::set views{}; }; QueryAccessInfo query_access_info; @@ -469,7 +473,8 @@ public: const String & quoted_database_name, const String & full_quoted_table_name, const Names & column_names, - const String & projection_name = {}); + const String & projection_name = {}, + const String & view_name = {}); /// Supported factories for records in query_log enum class QueryLogFactories @@ -730,6 +735,7 @@ public: /// Nullptr if the query log is not ready for this moment. 
std::shared_ptr getQueryLog() const; std::shared_ptr getQueryThreadLog() const; + std::shared_ptr getQueryViewsLog() const; std::shared_ptr getTraceLog() const; std::shared_ptr getTextLog() const; std::shared_ptr getMetricLog() const; diff --git a/src/Interpreters/ConvertStringsToEnumVisitor.cpp b/src/Interpreters/ConvertStringsToEnumVisitor.cpp index fa2e0b6613a..e483bc9b5b6 100644 --- a/src/Interpreters/ConvertStringsToEnumVisitor.cpp +++ b/src/Interpreters/ConvertStringsToEnumVisitor.cpp @@ -43,11 +43,11 @@ void changeIfArguments(ASTPtr & first, ASTPtr & second) String enum_string = makeStringsEnum(values); auto enum_literal = std::make_shared(enum_string); - auto first_cast = makeASTFunction("CAST"); + auto first_cast = makeASTFunction("_CAST"); first_cast->arguments->children.push_back(first); first_cast->arguments->children.push_back(enum_literal); - auto second_cast = makeASTFunction("CAST"); + auto second_cast = makeASTFunction("_CAST"); second_cast->arguments->children.push_back(second); second_cast->arguments->children.push_back(enum_literal); @@ -65,12 +65,12 @@ void changeTransformArguments(ASTPtr & array_to, ASTPtr & other) String enum_string = makeStringsEnum(values); - auto array_cast = makeASTFunction("CAST"); + auto array_cast = makeASTFunction("_CAST"); array_cast->arguments->children.push_back(array_to); array_cast->arguments->children.push_back(std::make_shared("Array(" + enum_string + ")")); array_to = array_cast; - auto other_cast = makeASTFunction("CAST"); + auto other_cast = makeASTFunction("_CAST"); other_cast->arguments->children.push_back(other); other_cast->arguments->children.push_back(std::make_shared(enum_string)); other = other_cast; @@ -183,4 +183,3 @@ void ConvertStringsToEnumMatcher::visit(ASTFunction & function_node, Data & data } } - diff --git a/src/Interpreters/CrashLog.cpp b/src/Interpreters/CrashLog.cpp index a9da804f1d2..6bc23d6cf62 100644 --- a/src/Interpreters/CrashLog.cpp +++ b/src/Interpreters/CrashLog.cpp @@ -6,6 +6,7 @@ #include #include #include +#include #if !defined(ARCADIA_BUILD) # include diff --git a/src/Interpreters/DDLWorker.cpp b/src/Interpreters/DDLWorker.cpp index 47ca2b72db8..c00f62f5133 100644 --- a/src/Interpreters/DDLWorker.cpp +++ b/src/Interpreters/DDLWorker.cpp @@ -158,15 +158,20 @@ DDLWorker::DDLWorker( const Poco::Util::AbstractConfiguration * config, const String & prefix, const String & logger_name, - const CurrentMetrics::Metric * max_entry_metric_) + const CurrentMetrics::Metric * max_entry_metric_, + const CurrentMetrics::Metric * max_pushed_entry_metric_) : context(Context::createCopy(context_)) , log(&Poco::Logger::get(logger_name)) , pool_size(pool_size_) , max_entry_metric(max_entry_metric_) + , max_pushed_entry_metric(max_pushed_entry_metric_) { if (max_entry_metric) CurrentMetrics::set(*max_entry_metric, 0); + if (max_pushed_entry_metric) + CurrentMetrics::set(*max_pushed_entry_metric, 0); + if (1 < pool_size) { LOG_WARNING(log, "DDLWorker is configured to use multiple threads. 
" @@ -1046,6 +1051,15 @@ String DDLWorker::enqueueQuery(DDLLogEntry & entry) zookeeper->createAncestors(query_path_prefix); String node_path = zookeeper->create(query_path_prefix, entry.toString(), zkutil::CreateMode::PersistentSequential); + if (max_pushed_entry_metric) + { + String str_buf = node_path.substr(query_path_prefix.length()); + DB::ReadBufferFromString in(str_buf); + CurrentMetrics::Metric id; + readText(id, in); + id = std::max(*max_pushed_entry_metric, id); + CurrentMetrics::set(*max_pushed_entry_metric, id); + } /// We cannot create status dirs in a single transaction with previous request, /// because we don't know node_path until previous request is executed. diff --git a/src/Interpreters/DDLWorker.h b/src/Interpreters/DDLWorker.h index d05b9b27611..d2b7c9d169d 100644 --- a/src/Interpreters/DDLWorker.h +++ b/src/Interpreters/DDLWorker.h @@ -44,7 +44,7 @@ class DDLWorker { public: DDLWorker(int pool_size_, const std::string & zk_root_dir, ContextPtr context_, const Poco::Util::AbstractConfiguration * config, const String & prefix, - const String & logger_name = "DDLWorker", const CurrentMetrics::Metric * max_entry_metric_ = nullptr); + const String & logger_name = "DDLWorker", const CurrentMetrics::Metric * max_entry_metric_ = nullptr, const CurrentMetrics::Metric * max_pushed_entry_metric_ = nullptr); virtual ~DDLWorker(); /// Pushes query into DDL queue, returns path to created node @@ -148,6 +148,7 @@ protected: std::atomic max_id = 0; const CurrentMetrics::Metric * max_entry_metric; + const CurrentMetrics::Metric * max_pushed_entry_metric; }; diff --git a/src/Interpreters/DNSCacheUpdater.h b/src/Interpreters/DNSCacheUpdater.h index bbbc2ab3d21..5d5486bd012 100644 --- a/src/Interpreters/DNSCacheUpdater.h +++ b/src/Interpreters/DNSCacheUpdater.h @@ -2,8 +2,6 @@ #include #include -#include - namespace DB { diff --git a/src/Interpreters/InterpreterCreateQuery.cpp b/src/Interpreters/InterpreterCreateQuery.cpp index bf2cf6338aa..4c1a3064c3d 100644 --- a/src/Interpreters/InterpreterCreateQuery.cpp +++ b/src/Interpreters/InterpreterCreateQuery.cpp @@ -764,7 +764,7 @@ void InterpreterCreateQuery::assertOrSetUUID(ASTCreateQuery & create, const Data const auto * kind = create.is_dictionary ? "Dictionary" : "Table"; const auto * kind_upper = create.is_dictionary ? 
"DICTIONARY" : "TABLE"; - if (database->getEngineName() == "Replicated" && getContext()->getClientInfo().query_kind == ClientInfo::QueryKind::SECONDARY_QUERY + if (database->getEngineName() == "Replicated" && getContext()->getClientInfo().is_replicated_database_internal && !internal) { if (create.uuid == UUIDHelpers::Nil) diff --git a/src/Interpreters/InterpreterInsertQuery.cpp b/src/Interpreters/InterpreterInsertQuery.cpp index e5d4d952a0c..3589176f231 100644 --- a/src/Interpreters/InterpreterInsertQuery.cpp +++ b/src/Interpreters/InterpreterInsertQuery.cpp @@ -261,7 +261,6 @@ BlockIO InterpreterInsertQuery::execute() { InterpreterWatchQuery interpreter_watch{ query.watch, getContext() }; res = interpreter_watch.execute(); - res.pipeline.init(Pipe(std::make_shared(std::move(res.in)))); } for (size_t i = 0; i < out_streams_size; i++) diff --git a/src/Interpreters/InterpreterSelectQuery.cpp b/src/Interpreters/InterpreterSelectQuery.cpp index 49ebd3d48b0..edcef191e73 100644 --- a/src/Interpreters/InterpreterSelectQuery.cpp +++ b/src/Interpreters/InterpreterSelectQuery.cpp @@ -1965,12 +1965,14 @@ void InterpreterSelectQuery::executeFetchColumns(QueryProcessingStage::Enum proc if (context->hasQueryContext() && !options.is_internal) { + const String view_name{}; auto local_storage_id = storage->getStorageID(); context->getQueryContext()->addQueryAccessInfo( backQuoteIfNeed(local_storage_id.getDatabaseName()), local_storage_id.getFullTableName(), required_columns, - query_info.projection ? query_info.projection->desc->name : ""); + query_info.projection ? query_info.projection->desc->name : "", + view_name); } /// Create step which reads from empty source if storage has no data. diff --git a/src/Interpreters/InterpreterSystemQuery.cpp b/src/Interpreters/InterpreterSystemQuery.cpp index e1ca021deeb..d4ac555add0 100644 --- a/src/Interpreters/InterpreterSystemQuery.cpp +++ b/src/Interpreters/InterpreterSystemQuery.cpp @@ -20,6 +20,7 @@ #include #include #include +#include #include #include #include @@ -418,6 +419,7 @@ BlockIO InterpreterSystemQuery::execute() [&] { if (auto metric_log = getContext()->getMetricLog()) metric_log->flush(true); }, [&] { if (auto asynchronous_metric_log = getContext()->getAsynchronousMetricLog()) asynchronous_metric_log->flush(true); }, [&] { if (auto opentelemetry_span_log = getContext()->getOpenTelemetrySpanLog()) opentelemetry_span_log->flush(true); }, + [&] { if (auto query_views_log = getContext()->getQueryViewsLog()) query_views_log->flush(true); }, [&] { if (auto zookeeper_log = getContext()->getZooKeeperLog()) zookeeper_log->flush(true); } ); break; diff --git a/src/Interpreters/InterpreterWatchQuery.cpp b/src/Interpreters/InterpreterWatchQuery.cpp index edf0f37c00e..ee96045bbc4 100644 --- a/src/Interpreters/InterpreterWatchQuery.cpp +++ b/src/Interpreters/InterpreterWatchQuery.cpp @@ -71,10 +71,9 @@ BlockIO InterpreterWatchQuery::execute() QueryProcessingStage::Enum from_stage = QueryProcessingStage::FetchColumns; /// Watch storage - streams = storage->watch(required_columns, query_info, getContext(), from_stage, max_block_size, max_streams); + auto pipe = storage->watch(required_columns, query_info, getContext(), from_stage, max_block_size, max_streams); /// Constraints on the result, the quota on the result, and also callback for progress. 
- if (IBlockInputStream * stream = dynamic_cast(streams[0].get())) { StreamLocalLimits limits; limits.mode = LimitsMode::LIMITS_CURRENT; //-V1048 @@ -82,11 +81,11 @@ BlockIO InterpreterWatchQuery::execute() limits.size_limits.max_bytes = settings.max_result_bytes; limits.size_limits.overflow_mode = settings.result_overflow_mode; - stream->setLimits(limits); - stream->setQuota(getContext()->getQuota()); + pipe.setLimits(limits); + pipe.setQuota(getContext()->getQuota()); } - res.in = streams[0]; + res.pipeline.init(std::move(pipe)); return res; } diff --git a/src/Interpreters/JIT/compileFunction.cpp b/src/Interpreters/JIT/compileFunction.cpp index 5e1d4ca0375..aaf722b505e 100644 --- a/src/Interpreters/JIT/compileFunction.cpp +++ b/src/Interpreters/JIT/compileFunction.cpp @@ -143,8 +143,6 @@ static void compileFunction(llvm::Module & module, const IFunctionBase & functio * } */ - ProfileEvents::increment(ProfileEvents::CompileFunction); - const auto & arg_types = function.getArgumentTypes(); llvm::IRBuilder<> b(module.getContext()); diff --git a/src/Interpreters/Lemmatizers.cpp b/src/Interpreters/Lemmatizers.cpp index 38cd4c33678..78af43285ef 100644 --- a/src/Interpreters/Lemmatizers.cpp +++ b/src/Interpreters/Lemmatizers.cpp @@ -6,8 +6,8 @@ #if USE_NLP #include -#include -#include +#include // Y_IGNORE +#include // Y_IGNORE #include #include diff --git a/src/Interpreters/MutationsInterpreter.cpp b/src/Interpreters/MutationsInterpreter.cpp index fe0594bb58f..4e5e3b4e86b 100644 --- a/src/Interpreters/MutationsInterpreter.cpp +++ b/src/Interpreters/MutationsInterpreter.cpp @@ -503,10 +503,10 @@ ASTPtr MutationsInterpreter::prepare(bool dry_run) } } - auto updated_column = makeASTFunction("CAST", + auto updated_column = makeASTFunction("_CAST", makeASTFunction("if", condition, - makeASTFunction("CAST", + makeASTFunction("_CAST", update_expr->clone(), type_literal), std::make_shared(column)), @@ -920,9 +920,10 @@ BlockInputStreamPtr MutationsInterpreter::execute() return result_stream; } -const Block & MutationsInterpreter::getUpdatedHeader() const +Block MutationsInterpreter::getUpdatedHeader() const { - return *updated_header; + // If it's an index/projection materialization, we don't write any data columns, thus empty header is used + return mutation_kind.mutation_kind == MutationKind::MUTATE_INDEX_PROJECTION ? Block{} : *updated_header; } const ColumnDependencies & MutationsInterpreter::getColumnDependencies() const diff --git a/src/Interpreters/MutationsInterpreter.h b/src/Interpreters/MutationsInterpreter.h index c9a589e6b6d..4f8960ae8f7 100644 --- a/src/Interpreters/MutationsInterpreter.h +++ b/src/Interpreters/MutationsInterpreter.h @@ -53,7 +53,7 @@ public: BlockInputStreamPtr execute(); /// Only changed columns. 
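/// (returned by value now: index/projection materializations produce an empty header rather than referencing the stored one)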
- const Block & getUpdatedHeader() const; + Block getUpdatedHeader() const; const ColumnDependencies & getColumnDependencies() const; diff --git a/src/Interpreters/OptimizeIfWithConstantConditionVisitor.cpp b/src/Interpreters/OptimizeIfWithConstantConditionVisitor.cpp index a8e2d371e05..a9814ce50f5 100644 --- a/src/Interpreters/OptimizeIfWithConstantConditionVisitor.cpp +++ b/src/Interpreters/OptimizeIfWithConstantConditionVisitor.cpp @@ -1,6 +1,7 @@ #include #include #include +#include #include #include #include @@ -29,7 +30,7 @@ static bool tryExtractConstValueFromCondition(const ASTPtr & condition, bool & v /// cast of numeric constant in condition to UInt8 if (const auto * function = condition->as()) { - if (function->name == "CAST") + if (isFunctionCast(function)) { if (const auto * expr_list = function->arguments->as()) { diff --git a/src/Interpreters/QueryLog.cpp b/src/Interpreters/QueryLog.cpp index 0f7ff579f5d..2cbb9634446 100644 --- a/src/Interpreters/QueryLog.cpp +++ b/src/Interpreters/QueryLog.cpp @@ -68,6 +68,8 @@ NamesAndTypesList QueryLogElement::getNamesAndTypes() std::make_shared(std::make_shared()))}, {"projections", std::make_shared( std::make_shared(std::make_shared()))}, + {"views", std::make_shared( + std::make_shared(std::make_shared()))}, {"exception_code", std::make_shared()}, {"exception", std::make_shared()}, {"stack_trace", std::make_shared()}, @@ -161,6 +163,7 @@ void QueryLogElement::appendToBlock(MutableColumns & columns) const auto & column_tables = typeid_cast(*columns[i++]); auto & column_columns = typeid_cast(*columns[i++]); auto & column_projections = typeid_cast(*columns[i++]); + auto & column_views = typeid_cast(*columns[i++]); auto fill_column = [](const std::set & data, ColumnArray & column) { @@ -178,6 +181,7 @@ void QueryLogElement::appendToBlock(MutableColumns & columns) const fill_column(query_tables, column_tables); fill_column(query_columns, column_columns); fill_column(query_projections, column_projections); + fill_column(query_views, column_views); } columns[i++]->insert(exception_code); diff --git a/src/Interpreters/QueryLog.h b/src/Interpreters/QueryLog.h index aad3e56190b..2713febe1b6 100644 --- a/src/Interpreters/QueryLog.h +++ b/src/Interpreters/QueryLog.h @@ -59,6 +59,7 @@ struct QueryLogElement std::set query_tables; std::set query_columns; std::set query_projections; + std::set query_views; std::unordered_set used_aggregate_functions; std::unordered_set used_aggregate_function_combinators; diff --git a/src/Interpreters/QueryNormalizer.cpp b/src/Interpreters/QueryNormalizer.cpp index ea61ade2b49..7c820622c37 100644 --- a/src/Interpreters/QueryNormalizer.cpp +++ b/src/Interpreters/QueryNormalizer.cpp @@ -256,6 +256,9 @@ void QueryNormalizer::visit(ASTPtr & ast, Data & data) visit(*node_select, ast, data); else if (auto * node_param = ast->as()) throw Exception("Query parameter " + backQuote(node_param->name) + " was not set", ErrorCodes::UNKNOWN_QUERY_PARAMETER); + else if (auto * node_function = ast->as()) + if (node_function->parameters) + visit(node_function->parameters, data); /// If we replace the root of the subtree, we will be called again for the new root, in case the alias is replaced by an alias. 
if (ast.get() != initial_ast.get()) diff --git a/src/Interpreters/QueryPriorities.h b/src/Interpreters/QueryPriorities.h index 4a271510537..9e18e7bcff3 100644 --- a/src/Interpreters/QueryPriorities.h +++ b/src/Interpreters/QueryPriorities.h @@ -6,8 +6,6 @@ #include #include #include -#include - namespace CurrentMetrics { diff --git a/src/Interpreters/QueryViewsLog.cpp b/src/Interpreters/QueryViewsLog.cpp new file mode 100644 index 00000000000..fa6fcf66a87 --- /dev/null +++ b/src/Interpreters/QueryViewsLog.cpp @@ -0,0 +1,104 @@ +#include "QueryViewsLog.h" + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +namespace DB +{ +NamesAndTypesList QueryViewsLogElement::getNamesAndTypes() +{ + auto view_status_datatype = std::make_shared(DataTypeEnum8::Values{ + {"QueryStart", static_cast(QUERY_START)}, + {"QueryFinish", static_cast(QUERY_FINISH)}, + {"ExceptionBeforeStart", static_cast(EXCEPTION_BEFORE_START)}, + {"ExceptionWhileProcessing", static_cast(EXCEPTION_WHILE_PROCESSING)}}); + + auto view_type_datatype = std::make_shared(DataTypeEnum8::Values{ + {"Default", static_cast(ViewType::DEFAULT)}, + {"Materialized", static_cast(ViewType::MATERIALIZED)}, + {"Live", static_cast(ViewType::LIVE)}}); + + return { + {"event_date", std::make_shared()}, + {"event_time", std::make_shared()}, + {"event_time_microseconds", std::make_shared(6)}, + {"view_duration_ms", std::make_shared()}, + + {"initial_query_id", std::make_shared()}, + {"view_name", std::make_shared()}, + {"view_uuid", std::make_shared()}, + {"view_type", std::move(view_type_datatype)}, + {"view_query", std::make_shared()}, + {"view_target", std::make_shared()}, + + {"read_rows", std::make_shared()}, + {"read_bytes", std::make_shared()}, + {"written_rows", std::make_shared()}, + {"written_bytes", std::make_shared()}, + {"peak_memory_usage", std::make_shared()}, + {"ProfileEvents", std::make_shared(std::make_shared(), std::make_shared())}, + + {"status", std::move(view_status_datatype)}, + {"exception_code", std::make_shared()}, + {"exception", std::make_shared()}, + {"stack_trace", std::make_shared()}}; +} + +NamesAndAliases QueryViewsLogElement::getNamesAndAliases() +{ + return { + {"ProfileEvents.Names", {std::make_shared(std::make_shared())}, "mapKeys(ProfileEvents)"}, + {"ProfileEvents.Values", {std::make_shared(std::make_shared())}, "mapValues(ProfileEvents)"}}; +} + +void QueryViewsLogElement::appendToBlock(MutableColumns & columns) const +{ + size_t i = 0; + + columns[i++]->insert(DateLUT::instance().toDayNum(event_time).toUnderType()); // event_date + columns[i++]->insert(event_time); + columns[i++]->insert(event_time_microseconds); + columns[i++]->insert(view_duration_ms); + + columns[i++]->insertData(initial_query_id.data(), initial_query_id.size()); + columns[i++]->insertData(view_name.data(), view_name.size()); + columns[i++]->insert(view_uuid); + columns[i++]->insert(view_type); + columns[i++]->insertData(view_query.data(), view_query.size()); + columns[i++]->insertData(view_target.data(), view_target.size()); + + columns[i++]->insert(read_rows); + columns[i++]->insert(read_bytes); + columns[i++]->insert(written_rows); + columns[i++]->insert(written_bytes); + columns[i++]->insert(peak_memory_usage); + + if (profile_counters) + { + auto * column = columns[i++].get(); + ProfileEvents::dumpToMapColumn(*profile_counters, column, true); + } + else + { + columns[i++]->insertDefault(); + } + + columns[i++]->insert(status); + 
columns[i++]->insert(exception_code); + columns[i++]->insertData(exception.data(), exception.size()); + columns[i++]->insertData(stack_trace.data(), stack_trace.size()); +} + +} diff --git a/src/Interpreters/QueryViewsLog.h b/src/Interpreters/QueryViewsLog.h new file mode 100644 index 00000000000..e751224a51e --- /dev/null +++ b/src/Interpreters/QueryViewsLog.h @@ -0,0 +1,87 @@ +#pragma once + +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include + +namespace ProfileEvents +{ +class Counters; +} + +namespace DB +{ +class ThreadStatus; + +struct QueryViewsLogElement +{ + using ViewStatus = QueryLogElementType; + + enum class ViewType : int8_t + { + DEFAULT = 1, + MATERIALIZED = 2, + LIVE = 3 + }; + + struct ViewRuntimeStats + { + String target_name; + ViewType type = ViewType::DEFAULT; + std::unique_ptr thread_status = nullptr; + UInt64 elapsed_ms = 0; + std::chrono::time_point event_time; + ViewStatus event_status = ViewStatus::QUERY_START; + + void setStatus(ViewStatus s) + { + event_status = s; + event_time = std::chrono::system_clock::now(); + } + }; + + time_t event_time{}; + Decimal64 event_time_microseconds{}; + UInt64 view_duration_ms{}; + + String initial_query_id; + String view_name; + UUID view_uuid{UUIDHelpers::Nil}; + ViewType view_type{ViewType::DEFAULT}; + String view_query; + String view_target; + + UInt64 read_rows{}; + UInt64 read_bytes{}; + UInt64 written_rows{}; + UInt64 written_bytes{}; + Int64 peak_memory_usage{}; + std::shared_ptr profile_counters; + + ViewStatus status = ViewStatus::QUERY_START; + Int32 exception_code{}; + String exception; + String stack_trace; + + static std::string name() { return "QueryLog"; } + + static NamesAndTypesList getNamesAndTypes(); + static NamesAndAliases getNamesAndAliases(); + void appendToBlock(MutableColumns & columns) const; +}; + + +class QueryViewsLog : public SystemLog +{ + using SystemLog::SystemLog; +}; + +} diff --git a/src/Interpreters/RequiredSourceColumnsVisitor.cpp b/src/Interpreters/RequiredSourceColumnsVisitor.cpp index 2f2a68656bc..21ec94a6917 100644 --- a/src/Interpreters/RequiredSourceColumnsVisitor.cpp +++ b/src/Interpreters/RequiredSourceColumnsVisitor.cpp @@ -123,6 +123,17 @@ void RequiredSourceColumnsMatcher::visit(const ASTSelectQuery & select, const AS data.addColumnAliasIfAny(*node); } + if (const auto & with = select.with()) + { + for (auto & node : with->children) + { + if (const auto * identifier = node->as()) + data.addColumnIdentifier(*identifier); + else + data.addColumnAliasIfAny(*node); + } + } + std::vector out; for (const auto & node : select.children) { diff --git a/src/Interpreters/SynonymsExtensions.cpp b/src/Interpreters/SynonymsExtensions.cpp index 22fa91a4349..6147fa14674 100644 --- a/src/Interpreters/SynonymsExtensions.cpp +++ b/src/Interpreters/SynonymsExtensions.cpp @@ -11,7 +11,7 @@ #include #include -#include +#include // Y_IGNORE namespace DB { diff --git a/src/Interpreters/SystemLog.cpp b/src/Interpreters/SystemLog.cpp index d3224a53ccd..5c5a6e1439a 100644 --- a/src/Interpreters/SystemLog.cpp +++ b/src/Interpreters/SystemLog.cpp @@ -1,13 +1,14 @@ -#include -#include -#include -#include -#include -#include +#include #include #include -#include #include +#include +#include +#include +#include +#include +#include +#include #include #include @@ -104,6 +105,7 @@ SystemLogs::SystemLogs(ContextPtr global_context, const Poco::Util::AbstractConf opentelemetry_span_log = createSystemLog( global_context, "system", "opentelemetry_span_log", config, 
"opentelemetry_span_log"); + query_views_log = createSystemLog(global_context, "system", "query_views_log", config, "query_views_log"); zookeeper_log = createSystemLog(global_context, "system", "zookeeper_log", config, "zookeeper_log"); if (query_log) @@ -124,6 +126,8 @@ SystemLogs::SystemLogs(ContextPtr global_context, const Poco::Util::AbstractConf logs.emplace_back(asynchronous_metric_log.get()); if (opentelemetry_span_log) logs.emplace_back(opentelemetry_span_log.get()); + if (query_views_log) + logs.emplace_back(query_views_log.get()); if (zookeeper_log) logs.emplace_back(zookeeper_log.get()); diff --git a/src/Interpreters/SystemLog.h b/src/Interpreters/SystemLog.h index b94f3f7d456..a332245439b 100644 --- a/src/Interpreters/SystemLog.h +++ b/src/Interpreters/SystemLog.h @@ -12,7 +12,6 @@ #include #include #include -#include #include #include #include @@ -74,6 +73,7 @@ class CrashLog; class MetricLog; class AsynchronousMetricLog; class OpenTelemetrySpanLog; +class QueryViewsLog; class ZooKeeperLog; @@ -111,6 +111,8 @@ struct SystemLogs std::shared_ptr asynchronous_metric_log; /// OpenTelemetry trace spans. std::shared_ptr opentelemetry_span_log; + /// Used to log queries of materialized and live views + std::shared_ptr query_views_log; /// Used to log all actions of ZooKeeper client std::shared_ptr zookeeper_log; diff --git a/src/Interpreters/ThreadStatusExt.cpp b/src/Interpreters/ThreadStatusExt.cpp index 8590b3c94f3..2917a399906 100644 --- a/src/Interpreters/ThreadStatusExt.cpp +++ b/src/Interpreters/ThreadStatusExt.cpp @@ -1,12 +1,17 @@ #include +#include #include -#include #include +#include #include +#include +#include #include #include +#include #include +#include #include #include #include @@ -18,6 +23,14 @@ # include #endif +namespace ProfileEvents +{ +extern const Event SelectedRows; +extern const Event SelectedBytes; +extern const Event InsertedRows; +extern const Event InsertedBytes; +} + /// Implement some methods of ThreadStatus and CurrentThread here to avoid extra linking dependencies in clickhouse_common_io /// TODO It doesn't make sense. 
@@ -287,8 +300,18 @@ void ThreadStatus::finalizePerformanceCounters() } } +void ThreadStatus::resetPerformanceCountersLastUsage() +{ + *last_rusage = RUsageCounters::current(); + if (taskstats) + taskstats->reset(); +} + void ThreadStatus::initQueryProfiler() { + if (!query_profiled_enabled) + return; + /// query profilers are useless without trace collector auto global_context_ptr = global_context.lock(); if (!global_context_ptr || !global_context_ptr->hasTraceCollector()) @@ -455,6 +478,64 @@ void ThreadStatus::logToQueryThreadLog(QueryThreadLog & thread_log, const String thread_log.add(elem); } +static String getCleanQueryAst(const ASTPtr q, ContextPtr context) +{ + String res = serializeAST(*q, true); + if (auto * masker = SensitiveDataMasker::getInstance()) + masker->wipeSensitiveData(res); + + res = res.substr(0, context->getSettingsRef().log_queries_cut_to_length); + + return res; +} + +void ThreadStatus::logToQueryViewsLog(const ViewRuntimeData & vinfo) +{ + auto query_context_ptr = query_context.lock(); + if (!query_context_ptr) + return; + auto views_log = query_context_ptr->getQueryViewsLog(); + if (!views_log) + return; + + QueryViewsLogElement element; + + element.event_time = time_in_seconds(vinfo.runtime_stats.event_time); + element.event_time_microseconds = time_in_microseconds(vinfo.runtime_stats.event_time); + element.view_duration_ms = vinfo.runtime_stats.elapsed_ms; + + element.initial_query_id = query_id; + element.view_name = vinfo.table_id.getFullTableName(); + element.view_uuid = vinfo.table_id.uuid; + element.view_type = vinfo.runtime_stats.type; + if (vinfo.query) + element.view_query = getCleanQueryAst(vinfo.query, query_context_ptr); + element.view_target = vinfo.runtime_stats.target_name; + + auto events = std::make_shared(performance_counters.getPartiallyAtomicSnapshot()); + element.read_rows = progress_in.read_rows.load(std::memory_order_relaxed); + element.read_bytes = progress_in.read_bytes.load(std::memory_order_relaxed); + element.written_rows = (*events)[ProfileEvents::InsertedRows]; + element.written_bytes = (*events)[ProfileEvents::InsertedBytes]; + element.peak_memory_usage = memory_tracker.getPeak() > 0 ? 
memory_tracker.getPeak() : 0; + if (query_context_ptr->getSettingsRef().log_profile_events != 0) + { + element.profile_counters = events; + } + + element.status = vinfo.runtime_stats.event_status; + element.exception_code = 0; + if (vinfo.exception) + { + element.exception_code = getExceptionErrorCode(vinfo.exception); + element.exception = getExceptionMessage(vinfo.exception, false); + if (query_context_ptr->getSettingsRef().calculate_text_stack_trace) + element.stack_trace = getExceptionStackTraceString(vinfo.exception); + } + + views_log->add(element); +} + void CurrentThread::initializeQuery() { if (unlikely(!current_thread)) diff --git a/src/Interpreters/addTypeConversionToAST.cpp b/src/Interpreters/addTypeConversionToAST.cpp index ba67ec762a9..2f766880253 100644 --- a/src/Interpreters/addTypeConversionToAST.cpp +++ b/src/Interpreters/addTypeConversionToAST.cpp @@ -20,7 +20,7 @@ namespace ErrorCodes ASTPtr addTypeConversionToAST(ASTPtr && ast, const String & type_name) { - auto func = makeASTFunction("CAST", ast, std::make_shared(type_name)); + auto func = makeASTFunction("_CAST", ast, std::make_shared(type_name)); if (ASTWithAlias * ast_with_alias = dynamic_cast(ast.get())) { diff --git a/src/Interpreters/castColumn.cpp b/src/Interpreters/castColumn.cpp index 181cca1e017..fd71e02ee7e 100644 --- a/src/Interpreters/castColumn.cpp +++ b/src/Interpreters/castColumn.cpp @@ -1,6 +1,7 @@ #include #include +#include namespace DB { @@ -21,7 +22,7 @@ static ColumnPtr castColumn(const ColumnWithTypeAndName & arg, const DataTypePtr } }; - FunctionOverloadResolverPtr func_builder_cast = CastOverloadResolver::createImpl(false); + FunctionOverloadResolverPtr func_builder_cast = CastInternalOverloadResolver::createImpl(); auto func_cast = func_builder_cast->build(arguments); diff --git a/src/Interpreters/executeQuery.cpp b/src/Interpreters/executeQuery.cpp index 1b59f3bc7df..3ebc2eb142c 100644 --- a/src/Interpreters/executeQuery.cpp +++ b/src/Interpreters/executeQuery.cpp @@ -663,6 +663,7 @@ static std::tuple executeQueryImpl( elem.query_tables = info.tables; elem.query_columns = info.columns; elem.query_projections = info.projections; + elem.query_views = info.views; } interpreter->extendQueryLogElem(elem, ast, context, query_database, query_table); @@ -708,6 +709,15 @@ static std::tuple executeQueryImpl( element.thread_ids = std::move(info.thread_ids); element.profile_counters = std::move(info.profile_counters); + /// We need to refresh the access info since dependent views might have added extra information, either during + /// creation of the view (PushingToViewsBlockOutputStream) or while executing its internal SELECT + const auto & access_info = context_ptr->getQueryAccessInfo(); + element.query_databases.insert(access_info.databases.begin(), access_info.databases.end()); + element.query_tables.insert(access_info.tables.begin(), access_info.tables.end()); + element.query_columns.insert(access_info.columns.begin(), access_info.columns.end()); + element.query_projections.insert(access_info.projections.begin(), access_info.projections.end()); + element.query_views.insert(access_info.views.begin(), access_info.views.end()); + const auto & factories_info = context_ptr->getQueryFactoriesInfo(); element.used_aggregate_functions = factories_info.aggregate_functions; element.used_aggregate_function_combinators = factories_info.aggregate_function_combinators; diff --git a/src/Interpreters/inplaceBlockConversions.cpp b/src/Interpreters/inplaceBlockConversions.cpp index 26cf6912bc7..e40e0635a85 100644 --- 
a/src/Interpreters/inplaceBlockConversions.cpp +++ b/src/Interpreters/inplaceBlockConversions.cpp @@ -52,7 +52,7 @@ void addDefaultRequiredExpressionsRecursively( RequiredSourceColumnsVisitor(columns_context).visit(column_default_expr); NameSet required_columns_names = columns_context.requiredColumns(); - auto expr = makeASTFunction("CAST", column_default_expr, std::make_shared(columns.get(required_column_name).type->getName())); + auto expr = makeASTFunction("_CAST", column_default_expr, std::make_shared(columns.get(required_column_name).type->getName())); if (is_column_in_query && convert_null_to_default) expr = makeASTFunction("ifNull", std::make_shared(required_column_name), std::move(expr)); @@ -101,7 +101,7 @@ ASTPtr convertRequiredExpressions(Block & block, const NamesAndTypesList & requi continue; auto cast_func = makeASTFunction( - "CAST", std::make_shared(required_column.name), std::make_shared(required_column.type->getName())); + "_CAST", std::make_shared(required_column.name), std::make_shared(required_column.type->getName())); conversion_expr_list->children.emplace_back(setAlias(cast_func, required_column.name)); diff --git a/src/Interpreters/ya.make b/src/Interpreters/ya.make index 462c778bf3d..c0816bb671c 100644 --- a/src/Interpreters/ya.make +++ b/src/Interpreters/ya.make @@ -131,6 +131,7 @@ SRCS( QueryNormalizer.cpp QueryParameterVisitor.cpp QueryThreadLog.cpp + QueryViewsLog.cpp RemoveInjectiveFunctionsVisitor.cpp RenameColumnVisitor.cpp ReplaceQueryParameterVisitor.cpp diff --git a/src/Parsers/ASTFunctionHelpers.h b/src/Parsers/ASTFunctionHelpers.h new file mode 100644 index 00000000000..76da2dd1501 --- /dev/null +++ b/src/Parsers/ASTFunctionHelpers.h @@ -0,0 +1,16 @@ +#pragma once + +#include + + +namespace DB +{ + +static bool isFunctionCast(const ASTFunction * function) +{ + if (function) + return function->name == "CAST" || function->name == "_CAST"; + return false; +} + +} diff --git a/src/Processors/Formats/IRowInputFormat.h b/src/Processors/Formats/IRowInputFormat.h index 2ca182b7ffe..19a94d41044 100644 --- a/src/Processors/Formats/IRowInputFormat.h +++ b/src/Processors/Formats/IRowInputFormat.h @@ -5,8 +5,8 @@ #include #include #include -#include +class Stopwatch; namespace DB { diff --git a/src/Processors/Formats/Impl/ArrowBlockInputFormat.cpp b/src/Processors/Formats/Impl/ArrowBlockInputFormat.cpp index 269faac5258..84ca789261f 100644 --- a/src/Processors/Formats/Impl/ArrowBlockInputFormat.cpp +++ b/src/Processors/Formats/Impl/ArrowBlockInputFormat.cpp @@ -22,8 +22,8 @@ namespace ErrorCodes extern const int CANNOT_READ_ALL_DATA; } -ArrowBlockInputFormat::ArrowBlockInputFormat(ReadBuffer & in_, const Block & header_, bool stream_) - : IInputFormat(header_, in_), stream{stream_} +ArrowBlockInputFormat::ArrowBlockInputFormat(ReadBuffer & in_, const Block & header_, bool stream_, const FormatSettings & format_settings_) + : IInputFormat(header_, in_), stream{stream_}, format_settings(format_settings_) { } @@ -102,7 +102,7 @@ void ArrowBlockInputFormat::prepareReader() schema = file_reader->schema(); } - arrow_column_to_ch_column = std::make_unique(getPort().getHeader(), std::move(schema), "Arrow"); + arrow_column_to_ch_column = std::make_unique(getPort().getHeader(), "Arrow", format_settings.arrow.import_nested); if (stream) record_batch_total = -1; @@ -119,9 +119,9 @@ void registerInputFormatProcessorArrow(FormatFactory & factory) [](ReadBuffer & buf, const Block & sample, const RowInputFormatParams & /* params */, - const FormatSettings & /* format_settings */) + 
const FormatSettings & format_settings) { - return std::make_shared(buf, sample, false); + return std::make_shared(buf, sample, false, format_settings); }); factory.markFormatAsColumnOriented("Arrow"); factory.registerInputFormatProcessor( @@ -129,9 +129,9 @@ void registerInputFormatProcessorArrow(FormatFactory & factory) [](ReadBuffer & buf, const Block & sample, const RowInputFormatParams & /* params */, - const FormatSettings & /* format_settings */) + const FormatSettings & format_settings) { - return std::make_shared(buf, sample, true); + return std::make_shared(buf, sample, true, format_settings); }); } diff --git a/src/Processors/Formats/Impl/ArrowBlockInputFormat.h b/src/Processors/Formats/Impl/ArrowBlockInputFormat.h index 9f458dece7f..705c47c9b17 100644 --- a/src/Processors/Formats/Impl/ArrowBlockInputFormat.h +++ b/src/Processors/Formats/Impl/ArrowBlockInputFormat.h @@ -6,6 +6,7 @@ #if USE_ARROW #include +#include namespace arrow { class RecordBatchReader; } namespace arrow::ipc { class RecordBatchFileReader; } @@ -19,7 +20,7 @@ class ArrowColumnToCHColumn; class ArrowBlockInputFormat : public IInputFormat { public: - ArrowBlockInputFormat(ReadBuffer & in_, const Block & header_, bool stream_); + ArrowBlockInputFormat(ReadBuffer & in_, const Block & header_, bool stream_, const FormatSettings & format_settings_); void resetParser() override; @@ -41,6 +42,8 @@ private: int record_batch_total = 0; int record_batch_current = 0; + const FormatSettings format_settings; + void prepareReader(); }; diff --git a/src/Processors/Formats/Impl/ArrowColumnToCHColumn.cpp b/src/Processors/Formats/Impl/ArrowColumnToCHColumn.cpp index 84c56f0f2b7..2da4d7d298d 100644 --- a/src/Processors/Formats/Impl/ArrowColumnToCHColumn.cpp +++ b/src/Processors/Formats/Impl/ArrowColumnToCHColumn.cpp @@ -10,10 +10,12 @@ #include #include #include +#include +#include #include +#include #include #include -#include #include #include #include @@ -22,17 +24,18 @@ #include #include #include +#include #include #include +#include #include +/// UINT16 and UINT32 are processed separately, see comments in readColumnFromArrowColumn. 
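+/// (in ClickHouse, UInt16 and UInt32 commonly carry Date and DateTime values, which presumably is why they get the special handling referenced above)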
#define FOR_ARROW_NUMERIC_TYPES(M) \ M(arrow::Type::UINT8, DB::UInt8) \ M(arrow::Type::INT8, DB::Int8) \ - M(arrow::Type::UINT16, DB::UInt16) \ M(arrow::Type::INT16, DB::Int16) \ - M(arrow::Type::UINT32, DB::UInt32) \ M(arrow::Type::INT32, DB::Int32) \ M(arrow::Type::UINT64, DB::UInt64) \ M(arrow::Type::INT64, DB::Int64) \ @@ -50,7 +53,6 @@ M(arrow::Type::UINT64, DB::UInt64) \ M(arrow::Type::INT64, DB::UInt64) - namespace DB { @@ -58,47 +60,19 @@ namespace ErrorCodes { extern const int UNKNOWN_TYPE; extern const int VALUE_IS_OUT_OF_RANGE_OF_DATA_TYPE; - extern const int CANNOT_CONVERT_TYPE; - extern const int CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN; extern const int THERE_IS_NO_COLUMN; extern const int BAD_ARGUMENTS; + extern const int UNKNOWN_EXCEPTION; } -static const std::initializer_list> arrow_type_to_internal_type = -{ - {arrow::Type::UINT8, "UInt8"}, - {arrow::Type::INT8, "Int8"}, - {arrow::Type::UINT16, "UInt16"}, - {arrow::Type::INT16, "Int16"}, - {arrow::Type::UINT32, "UInt32"}, - {arrow::Type::INT32, "Int32"}, - {arrow::Type::UINT64, "UInt64"}, - {arrow::Type::INT64, "Int64"}, - {arrow::Type::HALF_FLOAT, "Float32"}, - {arrow::Type::FLOAT, "Float32"}, - {arrow::Type::DOUBLE, "Float64"}, - - {arrow::Type::BOOL, "UInt8"}, - {arrow::Type::DATE32, "Date"}, - {arrow::Type::DATE32, "Date32"}, - {arrow::Type::DATE64, "DateTime"}, - {arrow::Type::TIMESTAMP, "DateTime"}, - - {arrow::Type::STRING, "String"}, - {arrow::Type::BINARY, "String"}, - - // TODO: add other types that are convertible to internal ones: - // 0. ENUM? - // 1. UUID -> String - // 2. JSON -> String - // Full list of types: contrib/arrow/cpp/src/arrow/type.h -}; /// Inserts numeric data right into internal column data to reduce an overhead template > -static void fillColumnWithNumericData(std::shared_ptr & arrow_column, IColumn & internal_column) +static ColumnWithTypeAndName readColumnWithNumericData(std::shared_ptr & arrow_column, const String & column_name) { - auto & column_data = static_cast(internal_column).getData(); + auto internal_type = std::make_shared>(); + auto internal_column = internal_type->createColumn(); + auto & column_data = static_cast(*internal_column).getData(); column_data.reserve(arrow_column->length()); for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) @@ -110,15 +84,18 @@ static void fillColumnWithNumericData(std::shared_ptr & arr const auto * raw_data = reinterpret_cast(buffer->data()); column_data.insert_assume_reserved(raw_data, raw_data + chunk->length()); } + return {std::move(internal_column), std::move(internal_type), column_name}; } /// Inserts chars and offsets right into internal column data to reduce an overhead. /// Internal offsets are shifted by one to the right in comparison with Arrow ones. So the last offset should map to the end of all chars. /// Also internal strings are null terminated. 
-static void fillColumnWithStringData(std::shared_ptr & arrow_column, IColumn & internal_column) +static ColumnWithTypeAndName readColumnWithStringData(std::shared_ptr & arrow_column, const String & column_name) { - PaddedPODArray & column_chars_t = assert_cast(internal_column).getChars(); - PaddedPODArray & column_offsets = assert_cast(internal_column).getOffsets(); + auto internal_type = std::make_shared(); + auto internal_column = internal_type->createColumn(); + PaddedPODArray & column_chars_t = assert_cast(*internal_column).getChars(); + PaddedPODArray & column_offsets = assert_cast(*internal_column).getOffsets(); size_t chars_t_size = 0; for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) @@ -154,11 +131,14 @@ static void fillColumnWithStringData(std::shared_ptr & arro column_offsets.emplace_back(column_chars_t.size()); } } + return {std::move(internal_column), std::move(internal_type), column_name}; } -static void fillColumnWithBooleanData(std::shared_ptr & arrow_column, IColumn & internal_column) +static ColumnWithTypeAndName readColumnWithBooleanData(std::shared_ptr & arrow_column, const String & column_name) { - auto & column_data = assert_cast &>(internal_column).getData(); + auto internal_type = std::make_shared(); + auto internal_column = internal_type->createColumn(); + auto & column_data = assert_cast &>(*internal_column).getData(); column_data.reserve(arrow_column->length()); for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) @@ -170,35 +150,14 @@ static void fillColumnWithBooleanData(std::shared_ptr & arr for (size_t bool_i = 0; bool_i != static_cast(chunk.length()); ++bool_i) column_data.emplace_back(chunk.Value(bool_i)); } + return {std::move(internal_column), std::move(internal_type), column_name}; } -/// Arrow stores Parquet::DATE in Int32, while ClickHouse stores Date in UInt16. 
Therefore, it should be checked before saving -static void fillColumnWithDate32Data(std::shared_ptr & arrow_column, IColumn & internal_column) +static ColumnWithTypeAndName readColumnWithDate32Data(std::shared_ptr & arrow_column, const String & column_name) { - PaddedPODArray & column_data = assert_cast &>(internal_column).getData(); - column_data.reserve(arrow_column->length()); - - for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) - { - arrow::Date32Array & chunk = dynamic_cast(*(arrow_column->chunk(chunk_i))); - - for (size_t value_i = 0, length = static_cast(chunk.length()); value_i < length; ++value_i) - { - UInt32 days_num = static_cast(chunk.Value(value_i)); - - if (days_num > DATE_LUT_MAX_DAY_NUM) - throw Exception(ErrorCodes::VALUE_IS_OUT_OF_RANGE_OF_DATA_TYPE, - "Input value {} of a column '{}' is greater than max allowed Date value, which is {}", - days_num, internal_column.getName(), DATE_LUT_MAX_DAY_NUM); - - column_data.emplace_back(days_num); - } - } -} - -static void fillDate32ColumnWithDate32Data(std::shared_ptr & arrow_column, IColumn & internal_column) -{ - PaddedPODArray & column_data = assert_cast &>(internal_column).getData(); + auto internal_type = std::make_shared(); + auto internal_column = internal_type->createColumn(); + PaddedPODArray & column_data = assert_cast &>(*internal_column).getData(); column_data.reserve(arrow_column->length()); for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) @@ -209,18 +168,21 @@ static void fillDate32ColumnWithDate32Data(std::shared_ptr { Int32 days_num = static_cast(chunk.Value(value_i)); if (days_num > DATE_LUT_MAX_EXTEND_DAY_NUM) - throw Exception(ErrorCodes::VALUE_IS_OUT_OF_RANGE_OF_DATA_TYPE, - "Input value {} of a column '{}' is greater than max allowed Date value, which is {}", days_num, internal_column.getName(), DATE_LUT_MAX_DAY_NUM); + throw Exception{ErrorCodes::VALUE_IS_OUT_OF_RANGE_OF_DATA_TYPE, + "Input value {} of a column \"{}\" is greater than max allowed Date value, which is {}", days_num, column_name, DATE_LUT_MAX_DAY_NUM}; column_data.emplace_back(days_num); } } + return {std::move(internal_column), std::move(internal_type), column_name}; } /// Arrow stores Parquet::DATETIME in Int64, while ClickHouse stores DateTime in UInt32. 
Therefore, it should be checked before saving -static void fillColumnWithDate64Data(std::shared_ptr & arrow_column, IColumn & internal_column) +static ColumnWithTypeAndName readColumnWithDate64Data(std::shared_ptr & arrow_column, const String & column_name) { - auto & column_data = assert_cast &>(internal_column).getData(); + auto internal_type = std::make_shared(); + auto internal_column = internal_type->createColumn(); + auto & column_data = assert_cast &>(*internal_column).getData(); column_data.reserve(arrow_column->length()); for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) @@ -232,11 +194,14 @@ static void fillColumnWithDate64Data(std::shared_ptr & arro column_data.emplace_back(timestamp); } } + return {std::move(internal_column), std::move(internal_type), column_name}; } -static void fillColumnWithTimestampData(std::shared_ptr & arrow_column, IColumn & internal_column) +static ColumnWithTypeAndName readColumnWithTimestampData(std::shared_ptr & arrow_column, const String & column_name) { - auto & column_data = assert_cast &>(internal_column).getData(); + auto internal_type = std::make_shared(); + auto internal_column = internal_type->createColumn(); + auto & column_data = assert_cast &>(*internal_column).getData(); column_data.reserve(arrow_column->length()); for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) @@ -268,29 +233,35 @@ static void fillColumnWithTimestampData(std::shared_ptr & a column_data.emplace_back(timestamp); } } + return {std::move(internal_column), std::move(internal_type), column_name}; } template -static void fillColumnWithDecimalData(std::shared_ptr & arrow_column, IColumn & internal_column) +static ColumnWithTypeAndName readColumnWithDecimalData(std::shared_ptr & arrow_column, const String & column_name) { - auto & column = assert_cast &>(internal_column); + const auto * arrow_decimal_type = static_cast(arrow_column->type().get()); + auto internal_type = std::make_shared>(arrow_decimal_type->precision(), arrow_decimal_type->scale()); + auto internal_column = internal_type->createColumn(); + auto & column = assert_cast &>(*internal_column); auto & column_data = column.getData(); column_data.reserve(arrow_column->length()); for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) { - auto & chunk = static_cast(*(arrow_column->chunk(chunk_i))); + auto & chunk = dynamic_cast(*(arrow_column->chunk(chunk_i))); for (size_t value_i = 0, length = static_cast(chunk.length()); value_i < length; ++value_i) { column_data.emplace_back(chunk.IsNull(value_i) ? 
DecimalType(0) : *reinterpret_cast(chunk.Value(value_i))); // TODO: copy column } } + return {std::move(internal_column), std::move(internal_type), column_name}; } /// Creates a null bytemap from arrow's null bitmap -static void fillByteMapFromArrowColumn(std::shared_ptr & arrow_column, IColumn & bytemap) +static ColumnPtr readByteMapFromArrowColumn(std::shared_ptr & arrow_column) { - PaddedPODArray & bytemap_data = assert_cast &>(bytemap).getData(); + auto nullmap_column = ColumnUInt8::create(); + PaddedPODArray & bytemap_data = assert_cast &>(*nullmap_column).getData(); bytemap_data.reserve(arrow_column->length()); for (size_t chunk_i = 0; chunk_i != static_cast(arrow_column->num_chunks()); ++chunk_i) @@ -300,11 +271,13 @@ static void fillByteMapFromArrowColumn(std::shared_ptr & ar for (size_t value_i = 0; value_i != static_cast(chunk->length()); ++value_i) bytemap_data.emplace_back(chunk->IsNull(value_i)); } + return nullmap_column; } -static void fillOffsetsFromArrowListColumn(std::shared_ptr & arrow_column, IColumn & offsets) +static ColumnPtr readOffsetsFromArrowListColumn(std::shared_ptr & arrow_column) { - ColumnArray::Offsets & offsets_data = assert_cast &>(offsets).getData(); + auto offsets_column = ColumnUInt64::create(); + ColumnArray::Offsets & offsets_data = assert_cast &>(*offsets_column).getData(); offsets_data.reserve(arrow_column->length()); for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) @@ -316,18 +289,18 @@ static void fillOffsetsFromArrowListColumn(std::shared_ptr for (int64_t i = 1; i < arrow_offsets.length(); ++i) offsets_data.emplace_back(start + arrow_offsets.Value(i)); } + return offsets_column; } -static ColumnPtr createAndFillColumnWithIndexesData(std::shared_ptr & arrow_column) + +static ColumnPtr readColumnWithIndexesData(std::shared_ptr & arrow_column) { switch (arrow_column->type()->id()) { # define DISPATCH(ARROW_NUMERIC_TYPE, CPP_NUMERIC_TYPE) \ - case ARROW_NUMERIC_TYPE: \ - { \ - auto column = DataTypeNumber().createColumn(); \ - fillColumnWithNumericData(arrow_column, *column); \ - return column; \ - } + case ARROW_NUMERIC_TYPE: \ + { \ + return readColumnWithNumericData(arrow_column, "").column; \ + } FOR_ARROW_INDEXES_TYPES(DISPATCH) # undef DISPATCH default: @@ -335,32 +308,34 @@ static ColumnPtr createAndFillColumnWithIndexesData(std::shared_ptr getNestedArrowColumn(std::shared_ptr & arrow_column) +{ + arrow::ArrayVector array_vector; + array_vector.reserve(arrow_column->num_chunks()); + for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) + { + arrow::ListArray & list_chunk = dynamic_cast(*(arrow_column->chunk(chunk_i))); + std::shared_ptr chunk = list_chunk.values(); + array_vector.emplace_back(std::move(chunk)); + } + return std::make_shared(array_vector); +} + +static ColumnWithTypeAndName readColumnFromArrowColumn( std::shared_ptr & arrow_column, - IColumn & internal_column, const std::string & column_name, const std::string & format_name, bool is_nullable, - std::unordered_map dictionary_values) + std::unordered_map> & dictionary_values) { - if (internal_column.isNullable()) - { - ColumnNullable & column_nullable = assert_cast(internal_column); - readColumnFromArrowColumn( - arrow_column, column_nullable.getNestedColumn(), column_name, format_name, true, dictionary_values); - fillByteMapFromArrowColumn(arrow_column, column_nullable.getNullMapColumn()); - return; - } - - /// TODO: check if a column is const? 
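// In the new scheme nullability is assembled from two independent reads: the
// nested data column, plus the byte map produced by readByteMapFromArrowColumn
// above (one byte per row, 1 = NULL, matching chunk->IsNull). The wrapping
// step below boils down to this sketch, where readData and readNullMap are
// hypothetical stand-ins for the readers defined above:

DB::ColumnWithTypeAndName nested = readData(arrow_column, column_name);
DB::ColumnPtr null_map = readNullMap(arrow_column); /// ColumnUInt8 bytemap
auto nullable_type = std::make_shared<DB::DataTypeNullable>(nested.type);
auto nullable_column = DB::ColumnNullable::create(nested.column, null_map);
/// result: {nullable_column, nullable_type, column_name}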
if (!is_nullable && arrow_column->null_count() && arrow_column->type()->id() != arrow::Type::LIST && arrow_column->type()->id() != arrow::Type::MAP && arrow_column->type()->id() != arrow::Type::STRUCT) { - throw Exception - { - ErrorCodes::CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN, - "Can not insert NULL data into non-nullable column \"{}\".", column_name - }; + auto nested_column = readColumnFromArrowColumn(arrow_column, column_name, format_name, true, dictionary_values); + auto nullmap_column = readByteMapFromArrowColumn(arrow_column); + auto nullable_type = std::make_shared(std::move(nested_column.type)); + auto nullable_column = ColumnNullable::create(std::move(nested_column.column), std::move(nullmap_column)); + return {std::move(nullable_column), std::move(nullable_type), column_name}; } switch (arrow_column->type()->id()) @@ -368,87 +343,92 @@ static void readColumnFromArrowColumn( case arrow::Type::STRING: case arrow::Type::BINARY: //case arrow::Type::FIXED_SIZE_BINARY: - fillColumnWithStringData(arrow_column, internal_column); - break; + return readColumnWithStringData(arrow_column, column_name); case arrow::Type::BOOL: - fillColumnWithBooleanData(arrow_column, internal_column); - break; + return readColumnWithBooleanData(arrow_column, column_name); case arrow::Type::DATE32: - if (WhichDataType(internal_column.getDataType()).isUInt16()) - { - fillColumnWithDate32Data(arrow_column, internal_column); - } - else - { - fillDate32ColumnWithDate32Data(arrow_column, internal_column); - } - break; + return readColumnWithDate32Data(arrow_column, column_name); case arrow::Type::DATE64: - fillColumnWithDate64Data(arrow_column, internal_column); - break; + return readColumnWithDate64Data(arrow_column, column_name); + // ClickHouse writes Date as arrow UINT16 and DateTime as arrow UINT32, + // so, read UINT16 as Date and UINT32 as DateTime to perform correct conversion + // between Date and DateTime further. 
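// Rationale for the two cases below: every column read here is later converted
// to the header type via castColumn(). Casting Date -> DateTime turns a day
// number into the timestamp at midnight of that day, whereas casting the raw
// UInt16 -> DateTime would misread day numbers as Unix seconds; typing the
// column as Date (and UInt32 as DateTime) before the cast keeps the
// conversion correct.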
+ case arrow::Type::UINT16: + { + auto column = readColumnWithNumericData(arrow_column, column_name); + column.type = std::make_shared(); + return column; + } + case arrow::Type::UINT32: + { + auto column = readColumnWithNumericData(arrow_column, column_name); + column.type = std::make_shared(); + return column; + } case arrow::Type::TIMESTAMP: - fillColumnWithTimestampData(arrow_column, internal_column); - break; + return readColumnWithTimestampData(arrow_column, column_name); #if defined(ARCADIA_BUILD) - case arrow::Type::DECIMAL: - fillColumnWithDecimalData(arrow_column, internal_column /*, internal_nested_type*/); - break; + case arrow::Type::DECIMAL: + return readColumnWithDecimalData(arrow_column, column_name); #else case arrow::Type::DECIMAL128: - fillColumnWithDecimalData(arrow_column, internal_column /*, internal_nested_type*/); - break; + return readColumnWithDecimalData(arrow_column, column_name); case arrow::Type::DECIMAL256: - fillColumnWithDecimalData(arrow_column, internal_column /*, internal_nested_type*/); - break; + return readColumnWithDecimalData(arrow_column, column_name); #endif - case arrow::Type::MAP: [[fallthrough]]; + case arrow::Type::MAP: + { + auto arrow_nested_column = getNestedArrowColumn(arrow_column); + auto nested_column = readColumnFromArrowColumn(arrow_nested_column, column_name, format_name, false, dictionary_values); + auto offsets_column = readOffsetsFromArrowListColumn(arrow_column); + + const auto * tuple_column = assert_cast(nested_column.column.get()); + const auto * tuple_type = assert_cast(nested_column.type.get()); + auto map_column = ColumnMap::create(std::move(tuple_column->getColumnPtr(0)), std::move(tuple_column->getColumnPtr(1)), std::move(offsets_column)); + auto map_type = std::make_shared(tuple_type->getElements()[0], tuple_type->getElements()[1]); + return {std::move(map_column), std::move(map_type), column_name}; + } case arrow::Type::LIST: { - arrow::ArrayVector array_vector; - array_vector.reserve(arrow_column->num_chunks()); - for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) - { - arrow::ListArray & list_chunk = dynamic_cast(*(arrow_column->chunk(chunk_i))); - std::shared_ptr chunk = list_chunk.values(); - array_vector.emplace_back(std::move(chunk)); - } - auto arrow_nested_column = std::make_shared(array_vector); - - ColumnArray & column_array = arrow_column->type()->id() == arrow::Type::MAP - ? 
assert_cast(internal_column).getNestedColumn() - : assert_cast(internal_column); - - readColumnFromArrowColumn( - arrow_nested_column, column_array.getData(), column_name, format_name, false, dictionary_values); - - fillOffsetsFromArrowListColumn(arrow_column, column_array.getOffsetsColumn()); - break; + auto arrow_nested_column = getNestedArrowColumn(arrow_column); + auto nested_column = readColumnFromArrowColumn(arrow_nested_column, column_name, format_name, false, dictionary_values); + auto offsets_column = readOffsetsFromArrowListColumn(arrow_column); + auto array_column = ColumnArray::create(std::move(nested_column.column), std::move(offsets_column)); + auto array_type = std::make_shared(nested_column.type); + return {std::move(array_column), std::move(array_type), column_name}; } case arrow::Type::STRUCT: { - ColumnTuple & column_tuple = assert_cast(internal_column); - int fields_count = column_tuple.tupleSize(); - std::vector nested_arrow_columns(fields_count); + auto arrow_type = arrow_column->type(); + auto * arrow_struct_type = assert_cast(arrow_type.get()); + std::vector nested_arrow_columns(arrow_struct_type->num_fields()); for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) { arrow::StructArray & struct_chunk = dynamic_cast(*(arrow_column->chunk(chunk_i))); - for (int i = 0; i < fields_count; ++i) + for (int i = 0; i < arrow_struct_type->num_fields(); ++i) nested_arrow_columns[i].emplace_back(struct_chunk.field(i)); } - for (int i = 0; i != fields_count; ++i) + Columns tuple_elements; + DataTypes tuple_types; + std::vector tuple_names; + + for (int i = 0; i != arrow_struct_type->num_fields(); ++i) { auto nested_arrow_column = std::make_shared(nested_arrow_columns[i]); - readColumnFromArrowColumn( - nested_arrow_column, column_tuple.getColumn(i), column_name, format_name, false, dictionary_values); + auto element = readColumnFromArrowColumn(nested_arrow_column, arrow_struct_type->field(i)->name(), format_name, false, dictionary_values); + tuple_elements.emplace_back(std::move(element.column)); + tuple_types.emplace_back(std::move(element.type)); + tuple_names.emplace_back(std::move(element.name)); } - break; + + auto tuple_column = ColumnTuple::create(std::move(tuple_elements)); + auto tuple_type = std::make_shared(std::move(tuple_types), std::move(tuple_names)); + return {std::move(tuple_column), std::move(tuple_type), column_name}; } case arrow::Type::DICTIONARY: { - ColumnLowCardinality & column_lc = assert_cast(internal_column); auto & dict_values = dictionary_values[column_name]; - /// Load dictionary values only once and reuse it. if (!dict_values) { @@ -459,14 +439,14 @@ static void readColumnFromArrowColumn( dict_array.emplace_back(dict_chunk.dictionary()); } auto arrow_dict_column = std::make_shared(dict_array); + auto dict_column = readColumnFromArrowColumn(arrow_dict_column, column_name, format_name, false, dictionary_values); - auto dict_column = IColumn::mutate(column_lc.getDictionaryPtr()); - auto * uniq_column = static_cast(dict_column.get()); - auto values_column = uniq_column->getNestedColumn()->cloneEmpty(); - readColumnFromArrowColumn( - arrow_dict_column, *values_column, column_name, format_name, false, dictionary_values); - uniq_column->uniqueInsertRangeFrom(*values_column, 0, values_column->size()); - dict_values = std::move(dict_column); + /// We should convert read column to ColumnUnique. 
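// Background for the comment above: ColumnLowCardinality::create() expects a
// ColumnUnique dictionary rather than a plain column, so the values read from
// the Arrow dictionary are folded into one via uniqueInsertRangeFrom below.
// The result is cached in dictionary_values per column name, so later chunks
// of the same column only decode their small integer index arrays and reuse
// the dictionary as-is.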
+ auto tmp_lc_column = DataTypeLowCardinality(dict_column.type).createColumn(); + auto tmp_dict_column = IColumn::mutate(assert_cast(tmp_lc_column.get())->getDictionaryPtr()); + static_cast(tmp_dict_column.get())->uniqueInsertRangeFrom(*dict_column.column, 0, dict_column.column->size()); + dict_column.column = std::move(tmp_dict_column); + dict_values = std::make_shared(std::move(dict_column)); } arrow::ArrayVector indexes_array; @@ -477,17 +457,14 @@ static void readColumnFromArrowColumn( } auto arrow_indexes_column = std::make_shared(indexes_array); - auto indexes_column = createAndFillColumnWithIndexesData(arrow_indexes_column); - - auto new_column_lc = ColumnLowCardinality::create(dict_values, std::move(indexes_column)); - column_lc = std::move(*new_column_lc); - break; + auto indexes_column = readColumnWithIndexesData(arrow_indexes_column); + auto lc_column = ColumnLowCardinality::create(dict_values->column, std::move(indexes_column)); + auto lc_type = std::make_shared(dict_values->type); + return {std::move(lc_column), std::move(lc_type), column_name}; } # define DISPATCH(ARROW_NUMERIC_TYPE, CPP_NUMERIC_TYPE) \ - case ARROW_NUMERIC_TYPE: \ - fillColumnWithNumericData(arrow_column, internal_column); \ - break; - + case ARROW_NUMERIC_TYPE: \ + return readColumnWithNumericData(arrow_column, column_name); FOR_ARROW_NUMERIC_TYPES(DISPATCH) # undef DISPATCH // TODO: support TIMESTAMP_MICROS and TIMESTAMP_MILLIS with truncated micro- and milliseconds? @@ -495,144 +472,52 @@ static void readColumnFromArrowColumn( // TODO: read UUID as a string? default: throw Exception(ErrorCodes::UNKNOWN_TYPE, - "Unsupported {} type '{}' of an input column '{}'.", format_name, arrow_column->type()->name(), column_name); + "Unsupported {} type '{}' of an input column '{}'.", format_name, arrow_column->type()->name(), column_name); } } -static DataTypePtr getInternalType( - std::shared_ptr arrow_type, - const DataTypePtr & column_type, - const std::string & column_name, - const std::string & format_name) + +// Creating CH header by arrow schema. Will be useful in task about inserting +// data from file without knowing table structure. 
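// Net effect of the comment above: the converter can be constructed from an
// Arrow schema alone (constructor added further below), deriving the
// ClickHouse header itself instead of requiring the table structure up front.
// A hypothetical usage sketch, where file_reader is an already-opened
// arrow::ipc::RecordBatchFileReader and table is a std::shared_ptr of an
// arrow::Table built from a decoded record batch:

std::shared_ptr<arrow::Schema> schema = file_reader->schema();
DB::ArrowColumnToCHColumn converter(*schema, "Arrow", /* import_nested_ = */ false);
DB::Chunk chunk;
converter.arrowTableToCHChunk(chunk, table); /// chunk now holds ClickHouse columns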
+ +static void checkStatus(const arrow::Status & status, const String & column_name, const String & format_name) { - if (column_type->isNullable()) - { - DataTypePtr nested_type = assert_cast(column_type.get())->getNestedType(); - return makeNullable(getInternalType(arrow_type, nested_type, column_name, format_name)); - } - -#if defined(ARCADIA_BUILD) - if (arrow_type->id() == arrow::Type::DECIMAL) - { - const auto & decimal_type = dynamic_cast(*arrow_type); - return std::make_shared>(decimal_type.precision(), decimal_type.scale()); - } -#else - if (arrow_type->id() == arrow::Type::DECIMAL128) - { - const auto & decimal_type = dynamic_cast(*arrow_type); - return std::make_shared>(decimal_type.precision(), decimal_type.scale()); - } - - if (arrow_type->id() == arrow::Type::DECIMAL256) - { - const auto & decimal_type = dynamic_cast(*arrow_type); - return std::make_shared>(decimal_type.precision(), decimal_type.scale()); - } -#endif - - if (arrow_type->id() == arrow::Type::LIST) - { - const auto & list_type = dynamic_cast(*arrow_type); - auto list_nested_type = list_type.value_type(); - - const DataTypeArray * array_type = typeid_cast(column_type.get()); - if (!array_type) - throw Exception{ErrorCodes::CANNOT_CONVERT_TYPE, - "Cannot convert arrow LIST type to a not Array ClickHouse type {}.", column_type->getName()}; - - return std::make_shared(getInternalType(list_nested_type, array_type->getNestedType(), column_name, format_name)); - } - - if (arrow_type->id() == arrow::Type::STRUCT) - { - const auto & struct_type = dynamic_cast(*arrow_type); - const DataTypeTuple * tuple_type = typeid_cast(column_type.get()); - if (!tuple_type) - throw Exception{ErrorCodes::CANNOT_CONVERT_TYPE, - "Cannot convert arrow STRUCT type to a not Tuple ClickHouse type {}.", column_type->getName()}; - - const DataTypes & tuple_nested_types = tuple_type->getElements(); - int internal_fields_num = tuple_nested_types.size(); - /// If internal column has less elements then arrow struct, we will select only first internal_fields_num columns. - if (internal_fields_num > struct_type.num_fields()) - throw Exception( - ErrorCodes::CANNOT_CONVERT_TYPE, - "Cannot convert arrow STRUCT with {} fields to a ClickHouse Tuple with {} elements: {}.", - struct_type.num_fields(), - internal_fields_num, - column_type->getName()); - - DataTypes nested_types; - for (int i = 0; i < internal_fields_num; ++i) - nested_types.push_back(getInternalType(struct_type.field(i)->type(), tuple_nested_types[i], column_name, format_name)); - - return std::make_shared(std::move(nested_types)); - } - - if (arrow_type->id() == arrow::Type::DICTIONARY) - { - const auto & arrow_dict_type = dynamic_cast(*arrow_type); - const auto * lc_type = typeid_cast(column_type.get()); - /// We allow to insert arrow dictionary into a non-LowCardinality column. - const auto & dict_type = lc_type ? 
lc_type->getDictionaryType() : column_type; - return std::make_shared(getInternalType(arrow_dict_type.value_type(), dict_type, column_name, format_name)); - } - - if (arrow_type->id() == arrow::Type::MAP) - { - const auto & arrow_map_type = typeid_cast(*arrow_type); - const auto * map_type = typeid_cast(column_type.get()); - if (!map_type) - throw Exception{ErrorCodes::CANNOT_CONVERT_TYPE, "Cannot convert arrow MAP type to a not Map ClickHouse type {}.", column_type->getName()}; - - return std::make_shared( - getInternalType(arrow_map_type.key_type(), map_type->getKeyType(), column_name, format_name), - getInternalType(arrow_map_type.item_type(), map_type->getValueType(), column_name, format_name)); - } - - if (arrow_type->id() == arrow::Type::UINT16 - && (isDate(column_type) || isDateTime(column_type) || isDate32(column_type) || isDateTime64(column_type))) - { - /// Read UInt16 as Date. It will allow correct conversion to DateTime further. - return std::make_shared(); - } - - auto filter = [=](auto && elem) - { - auto which = WhichDataType(column_type); - if (arrow_type->id() == arrow::Type::DATE32 && which.isDateOrDate32()) - { - return (strcmp(elem.second, "Date") == 0 && which.isDate()) - || (strcmp(elem.second, "Date32") == 0 && which.isDate32()); - } - else - { - return elem.first == arrow_type->id(); - } - }; - if (const auto * internal_type_it = std::find_if(arrow_type_to_internal_type.begin(), arrow_type_to_internal_type.end(), filter); - internal_type_it != arrow_type_to_internal_type.end()) - { - return DataTypeFactory::instance().get(internal_type_it->second); - } - - throw Exception(ErrorCodes::CANNOT_CONVERT_TYPE, - "The type '{}' of an input column '{}' is not supported for conversion from {} data format.", - arrow_type->name(), column_name, format_name); + if (!status.ok()) + throw Exception{ErrorCodes::UNKNOWN_EXCEPTION, "Error with a {} column '{}': {}.", format_name, column_name, status.ToString()}; } -ArrowColumnToCHColumn::ArrowColumnToCHColumn(const Block & header_, std::shared_ptr schema_, const std::string & format_name_) - : header(header_), format_name(format_name_) +static Block arrowSchemaToCHHeader(const arrow::Schema & schema, const std::string & format_name) { - for (const auto & field : schema_->fields()) + ColumnsWithTypeAndName sample_columns; + for (const auto & field : schema.fields()) { - if (header.has(field->name())) - { - const auto column_type = recursiveRemoveLowCardinality(header.getByName(field->name()).type); - name_to_internal_type[field->name()] = getInternalType(field->type(), column_type, field->name(), format_name); - } + /// Create empty arrow column by it's type and convert it to ClickHouse column. 
+ arrow::MemoryPool* pool = arrow::default_memory_pool(); + std::unique_ptr array_builder; + arrow::Status status = MakeBuilder(pool, field->type(), &array_builder); + checkStatus(status, field->name(), format_name); + std::shared_ptr arrow_array; + status = array_builder->Finish(&arrow_array); + checkStatus(status, field->name(), format_name); + arrow::ArrayVector array_vector = {arrow_array}; + auto arrow_column = std::make_shared(array_vector); + std::unordered_map> dict_values; + ColumnWithTypeAndName sample_column = readColumnFromArrowColumn(arrow_column, field->name(), format_name, false, dict_values); + sample_columns.emplace_back(std::move(sample_column)); } + return Block(std::move(sample_columns)); +} + +ArrowColumnToCHColumn::ArrowColumnToCHColumn( + const arrow::Schema & schema, const std::string & format_name_, bool import_nested_) + : header(arrowSchemaToCHHeader(schema, format_name_)), format_name(format_name_), import_nested(import_nested_) +{ +} + +ArrowColumnToCHColumn::ArrowColumnToCHColumn( + const Block & header_, const std::string & format_name_, bool import_nested_) + : header(header_), format_name(format_name_), import_nested(import_nested_) +{ } void ArrowColumnToCHColumn::arrowTableToCHChunk(Chunk & res, std::shared_ptr & table) @@ -645,31 +530,48 @@ void ArrowColumnToCHColumn::arrowTableToCHChunk(Chunk & res, std::shared_ptr>; NameToColumnPtr name_to_column_ptr; - for (const auto & column_name : table->ColumnNames()) + for (const auto& column_name : table->ColumnNames()) { std::shared_ptr arrow_column = table->GetColumnByName(column_name); name_to_column_ptr[column_name] = arrow_column; } + std::unordered_map nested_tables; for (size_t column_i = 0, columns = header.columns(); column_i < columns; ++column_i) { const ColumnWithTypeAndName & header_column = header.getByPosition(column_i); - if (name_to_column_ptr.find(header_column.name) == name_to_column_ptr.end()) + bool read_from_nested = false; + String nested_table_name = Nested::extractTableName(header_column.name); + if (!name_to_column_ptr.contains(header_column.name)) + { + /// Check if it's a column from nested table. + if (import_nested && name_to_column_ptr.contains(nested_table_name)) + { + if (!nested_tables.contains(nested_table_name)) + { + std::shared_ptr arrow_column = name_to_column_ptr[nested_table_name]; + ColumnsWithTypeAndName cols = {readColumnFromArrowColumn(arrow_column, nested_table_name, format_name, false, dictionary_values)}; + Block block(cols); + nested_tables[nested_table_name] = std::make_shared(Nested::flatten(block)); + } + + read_from_nested = nested_tables[nested_table_name]->has(header_column.name); + } + + // TODO: What if some columns were not presented? Insert NULLs? What if a column is not nullable? 
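// How import_nested resolves a header column such as "n.a" in the code above:
// the Arrow file stores a single struct column "n"; it is read once, flattened
// into columns "n.a", "n.b", ... and cached in nested_tables, so all
// subcolumns of one nested table cost a single decode. A condensed sketch,
// with readStruct as a hypothetical stand-in for readColumnFromArrowColumn
// applied to the struct column:

DB::String nested_table_name = DB::Nested::extractTableName("n.a"); /// -> "n"
DB::Block flattened = DB::Nested::flatten(DB::Block({readStruct(nested_table_name)}));
/// every later lookup of "n.a", "n.b", ... is served from the cached block:
DB::ColumnWithTypeAndName column = flattened.getByName("n.a");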
- throw Exception(ErrorCodes::THERE_IS_NO_COLUMN, - "Column '{}' is not presented in input data.", header_column.name); + if (!read_from_nested) + throw Exception{ErrorCodes::THERE_IS_NO_COLUMN, "Column '{}' is not presented in input data.", header_column.name}; + } std::shared_ptr arrow_column = name_to_column_ptr[header_column.name]; - DataTypePtr & internal_type = name_to_internal_type[header_column.name]; - MutableColumnPtr read_column = internal_type->createColumn(); - readColumnFromArrowColumn(arrow_column, *read_column, header_column.name, format_name, false, dictionary_values); - ColumnWithTypeAndName column; - column.name = header_column.name; - column.type = internal_type; - column.column = std::move(read_column); + if (read_from_nested) + column = nested_tables[nested_table_name]->getByName(header_column.name); + else + column = readColumnFromArrowColumn(arrow_column, header_column.name, format_name, false, dictionary_values); column.column = castColumn(column, header_column.type); column.type = header_column.type; @@ -681,5 +583,5 @@ void ArrowColumnToCHColumn::arrowTableToCHChunk(Chunk & res, std::shared_ptr +#include +#include #include @@ -19,19 +21,23 @@ class Chunk; class ArrowColumnToCHColumn { public: - ArrowColumnToCHColumn(const Block & header_, std::shared_ptr schema_, const std::string & format_name_); + ArrowColumnToCHColumn(const Block & header_, const std::string & format_name_, bool import_nested_); + + /// Constructor that create header by arrow schema. It will be useful for inserting + /// data from file without knowing table structure. + ArrowColumnToCHColumn(const arrow::Schema & schema, const std::string & format_name, bool import_nested_); void arrowTableToCHChunk(Chunk & res, std::shared_ptr & table); private: - const Block & header; - std::unordered_map name_to_internal_type; + const Block header; const std::string format_name; + bool import_nested; /// Map {column name : dictionary column}. /// To avoid converting dictionary from Arrow Dictionary /// to LowCardinality every chunk we save it and reuse. - std::unordered_map dictionary_values; + std::unordered_map> dictionary_values; }; } diff --git a/src/Processors/Formats/Impl/CHColumnToArrowColumn.cpp b/src/Processors/Formats/Impl/CHColumnToArrowColumn.cpp index 42aa9e6ddc7..8734f9c7279 100644 --- a/src/Processors/Formats/Impl/CHColumnToArrowColumn.cpp +++ b/src/Processors/Formats/Impl/CHColumnToArrowColumn.cpp @@ -45,8 +45,8 @@ M(UINT64, arrow::UInt64Type) \ M(INT64, arrow::Int64Type) \ M(FLOAT, arrow::FloatType) \ - M(DOUBLE, arrow::DoubleType) \ - M(STRING, arrow::StringType) + M(DOUBLE, arrow::DoubleType) \ + M(BINARY, arrow::BinaryType) namespace DB { @@ -72,6 +72,7 @@ namespace DB {"Date", arrow::uint16()}, /// uint16 is used instead of date32, because Apache Arrow cannot correctly serialize Date32Array. {"DateTime", arrow::uint32()}, /// uint32 is used instead of date64, because we don't need milliseconds. 
+ {"Date32", arrow::date32()}, {"String", arrow::binary()}, {"FixedString", arrow::binary()}, @@ -295,6 +296,7 @@ namespace DB FOR_ARROW_TYPES(DISPATCH) #undef DISPATCH + throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot fill arrow array with {} data.", column_type->getName()); } template @@ -334,7 +336,6 @@ namespace DB size_t end) { const PaddedPODArray & internal_data = assert_cast &>(*write_column).getData(); - //arrow::Date32Builder date_builder; arrow::UInt16Builder & builder = assert_cast(*array_builder); arrow::Status status; @@ -343,7 +344,6 @@ namespace DB if (null_bytemap && (*null_bytemap)[value_i]) status = builder.AppendNull(); else - /// Implicitly converts UInt16 to Int32 status = builder.Append(internal_data[value_i]); checkStatus(status, write_column->getName(), format_name); } @@ -372,6 +372,28 @@ namespace DB } } + static void fillArrowArrayWithDate32ColumnData( + ColumnPtr write_column, + const PaddedPODArray * null_bytemap, + const String & format_name, + arrow::ArrayBuilder* array_builder, + size_t start, + size_t end) + { + const PaddedPODArray & internal_data = assert_cast &>(*write_column).getData(); + arrow::Date32Builder & builder = assert_cast(*array_builder); + arrow::Status status; + + for (size_t value_i = start; value_i < end; ++value_i) + { + if (null_bytemap && (*null_bytemap)[value_i]) + status = builder.AppendNull(); + else + status = builder.Append(internal_data[value_i]); + checkStatus(status, write_column->getName(), format_name); + } + } + static void fillArrowArray( const String & column_name, ColumnPtr & column, @@ -410,6 +432,10 @@ namespace DB { fillArrowArrayWithDateTimeColumnData(column, null_bytemap, format_name, array_builder, start, end); } + else if (isDate32(column_type)) + { + fillArrowArrayWithDate32ColumnData(column, null_bytemap, format_name, array_builder, start, end); + } else if (isArray(column_type)) { fillArrowArrayWithArrayColumnData(column_name, column, column_type, null_bytemap, array_builder, format_name, start, end, dictionary_values); diff --git a/src/Processors/Formats/Impl/ConstantExpressionTemplate.cpp b/src/Processors/Formats/Impl/ConstantExpressionTemplate.cpp index cc9ae5e65bb..1f780a206dd 100644 --- a/src/Processors/Formats/Impl/ConstantExpressionTemplate.cpp +++ b/src/Processors/Formats/Impl/ConstantExpressionTemplate.cpp @@ -639,7 +639,7 @@ void ConstantExpressionTemplate::TemplateStructure::addNodesToCastResult(const I expr = makeASTFunction("assumeNotNull", std::move(expr)); } - expr = makeASTFunction("CAST", std::move(expr), std::make_shared(result_column_type.getName())); + expr = makeASTFunction("_CAST", std::move(expr), std::make_shared(result_column_type.getName())); if (null_as_default) { diff --git a/src/Processors/Formats/Impl/ORCBlockInputFormat.cpp b/src/Processors/Formats/Impl/ORCBlockInputFormat.cpp index 6ee247413e9..9d56d2c8fa8 100644 --- a/src/Processors/Formats/Impl/ORCBlockInputFormat.cpp +++ b/src/Processors/Formats/Impl/ORCBlockInputFormat.cpp @@ -9,6 +9,7 @@ #include #include "ArrowBufferedStreams.h" #include "ArrowColumnToCHColumn.h" +#include namespace DB { @@ -26,7 +27,8 @@ namespace ErrorCodes throw Exception(_s.ToString(), ErrorCodes::BAD_ARGUMENTS); \ } while (false) -ORCBlockInputFormat::ORCBlockInputFormat(ReadBuffer & in_, Block header_) : IInputFormat(std::move(header_), in_) +ORCBlockInputFormat::ORCBlockInputFormat(ReadBuffer & in_, Block header_, const FormatSettings & format_settings_) + : IInputFormat(std::move(header_), in_), format_settings(format_settings_) { } @@ -98,7 
+100,11 @@ void ORCBlockInputFormat::prepareReader() std::shared_ptr schema; THROW_ARROW_NOT_OK(file_reader->ReadSchema(&schema)); - arrow_column_to_ch_column = std::make_unique(getPort().getHeader(), schema, "ORC"); + arrow_column_to_ch_column = std::make_unique(getPort().getHeader(), "ORC", format_settings.orc.import_nested); + + std::unordered_set nested_table_names; + if (format_settings.orc.import_nested) + nested_table_names = Nested::getAllTableNames(getPort().getHeader()); /// In ReadStripe column indices should be started from 1, /// because 0 indicates to select all columns. @@ -108,7 +114,8 @@ void ORCBlockInputFormat::prepareReader() /// LIST type require 2 indices, STRUCT - the number of elements + 1, /// so we should recursively count the number of indices we need for this type. int indexes_count = countIndicesForType(schema->field(i)->type()); - if (getPort().getHeader().has(schema->field(i)->name())) + const auto & name = schema->field(i)->name(); + if (getPort().getHeader().has(name) || nested_table_names.contains(name)) { for (int j = 0; j != indexes_count; ++j) include_indices.push_back(index + j); @@ -124,9 +131,9 @@ void registerInputFormatProcessorORC(FormatFactory &factory) [](ReadBuffer &buf, const Block &sample, const RowInputFormatParams &, - const FormatSettings & /* settings */) + const FormatSettings & settings) { - return std::make_shared(buf, sample); + return std::make_shared(buf, sample, settings); }); factory.markFormatAsColumnOriented("ORC"); } diff --git a/src/Processors/Formats/Impl/ORCBlockInputFormat.h b/src/Processors/Formats/Impl/ORCBlockInputFormat.h index f27685a9884..254d0554cb2 100644 --- a/src/Processors/Formats/Impl/ORCBlockInputFormat.h +++ b/src/Processors/Formats/Impl/ORCBlockInputFormat.h @@ -3,6 +3,7 @@ #if USE_ORC #include +#include namespace arrow::adapters::orc { class ORCFileReader; } @@ -14,7 +15,7 @@ class ArrowColumnToCHColumn; class ORCBlockInputFormat : public IInputFormat { public: - ORCBlockInputFormat(ReadBuffer & in_, Block header_); + ORCBlockInputFormat(ReadBuffer & in_, Block header_, const FormatSettings & format_settings_); String getName() const override { return "ORCBlockInputFormat"; } @@ -38,6 +39,8 @@ private: // indices of columns to read from ORC file std::vector include_indices; + const FormatSettings format_settings; + void prepareReader(); }; diff --git a/src/Processors/Formats/Impl/ParquetBlockInputFormat.cpp b/src/Processors/Formats/Impl/ParquetBlockInputFormat.cpp index 07a0e15cb6b..a0b92f98ca9 100644 --- a/src/Processors/Formats/Impl/ParquetBlockInputFormat.cpp +++ b/src/Processors/Formats/Impl/ParquetBlockInputFormat.cpp @@ -11,6 +11,7 @@ #include #include "ArrowBufferedStreams.h" #include "ArrowColumnToCHColumn.h" +#include #include @@ -30,8 +31,8 @@ namespace ErrorCodes throw Exception(_s.ToString(), ErrorCodes::BAD_ARGUMENTS); \ } while (false) -ParquetBlockInputFormat::ParquetBlockInputFormat(ReadBuffer & in_, Block header_) - : IInputFormat(std::move(header_), in_) +ParquetBlockInputFormat::ParquetBlockInputFormat(ReadBuffer & in_, Block header_, const FormatSettings & format_settings_) + : IInputFormat(std::move(header_), in_), format_settings(format_settings_) { } @@ -98,7 +99,11 @@ void ParquetBlockInputFormat::prepareReader() std::shared_ptr schema; THROW_ARROW_NOT_OK(file_reader->GetSchema(&schema)); - arrow_column_to_ch_column = std::make_unique(getPort().getHeader(), schema, "Parquet"); + arrow_column_to_ch_column = std::make_unique(getPort().getHeader(), "Parquet", 
format_settings.parquet.import_nested); + + std::unordered_set nested_table_names; + if (format_settings.parquet.import_nested) + nested_table_names = Nested::getAllTableNames(getPort().getHeader()); int index = 0; for (int i = 0; i < schema->num_fields(); ++i) @@ -107,7 +112,8 @@ void ParquetBlockInputFormat::prepareReader() /// nested elements, so we should recursively /// count the number of indices we need for this type. int indexes_count = countIndicesForType(schema->field(i)->type()); - if (getPort().getHeader().has(schema->field(i)->name())) + const auto & name = schema->field(i)->name(); + if (getPort().getHeader().has(name) || nested_table_names.contains(name)) { for (int j = 0; j != indexes_count; ++j) column_indices.push_back(index + j); @@ -123,9 +129,9 @@ void registerInputFormatProcessorParquet(FormatFactory &factory) [](ReadBuffer &buf, const Block &sample, const RowInputFormatParams &, - const FormatSettings & /* settings */) + const FormatSettings & settings) { - return std::make_shared(buf, sample); + return std::make_shared(buf, sample, settings); }); factory.markFormatAsColumnOriented("Parquet"); } diff --git a/src/Processors/Formats/Impl/ParquetBlockInputFormat.h b/src/Processors/Formats/Impl/ParquetBlockInputFormat.h index b68f97c005a..c2ed1552423 100644 --- a/src/Processors/Formats/Impl/ParquetBlockInputFormat.h +++ b/src/Processors/Formats/Impl/ParquetBlockInputFormat.h @@ -4,6 +4,7 @@ #if USE_PARQUET #include +#include namespace parquet::arrow { class FileReader; } @@ -17,7 +18,7 @@ class ArrowColumnToCHColumn; class ParquetBlockInputFormat : public IInputFormat { public: - ParquetBlockInputFormat(ReadBuffer & in_, Block header_); + ParquetBlockInputFormat(ReadBuffer & in_, Block header_, const FormatSettings & format_settings_); void resetParser() override; @@ -36,6 +37,7 @@ private: std::vector column_indices; std::unique_ptr arrow_column_to_ch_column; int row_group_current = 0; + const FormatSettings format_settings; }; } diff --git a/src/Processors/Transforms/SquashingChunksTransform.cpp b/src/Processors/Transforms/SquashingChunksTransform.cpp new file mode 100644 index 00000000000..398ce9eb9fb --- /dev/null +++ b/src/Processors/Transforms/SquashingChunksTransform.cpp @@ -0,0 +1,27 @@ +#include + +namespace DB +{ + +SquashingChunksTransform::SquashingChunksTransform( + const Block & header, size_t min_block_size_rows, size_t min_block_size_bytes, bool reserve_memory) + : IAccumulatingTransform(header, header) + , squashing(min_block_size_rows, min_block_size_bytes, reserve_memory) +{ +} + +void SquashingChunksTransform::consume(Chunk chunk) +{ + if (auto block = squashing.add(getInputPort().getHeader().cloneWithColumns(chunk.detachColumns()))) + { + setReadyChunk(Chunk(block.getColumns(), block.rows())); + } +} + +Chunk SquashingChunksTransform::generate() +{ + auto block = squashing.add({}); + return Chunk(block.getColumns(), block.rows()); +} + +} diff --git a/src/Processors/Transforms/SquashingChunksTransform.h b/src/Processors/Transforms/SquashingChunksTransform.h new file mode 100644 index 00000000000..bcacf5abcda --- /dev/null +++ b/src/Processors/Transforms/SquashingChunksTransform.h @@ -0,0 +1,24 @@ +#pragma once +#include +#include + +namespace DB +{ + +class SquashingChunksTransform : public IAccumulatingTransform +{ +public: + explicit SquashingChunksTransform( + const Block & header, size_t min_block_size_rows, size_t min_block_size_bytes, bool reserve_memory = false); + + String getName() const override { return "SquashingTransform"; } + 
+protected: + void consume(Chunk chunk) override; + Chunk generate() override; + +private: + SquashingTransform squashing; +}; + +} diff --git a/src/Processors/ya.make b/src/Processors/ya.make index 543a08caca5..db0ae80c742 100644 --- a/src/Processors/ya.make +++ b/src/Processors/ya.make @@ -7,8 +7,14 @@ PEERDIR( clickhouse/src/Common contrib/libs/msgpack contrib/libs/protobuf + contrib/libs/apache/arrow ) +ADDINCL( + contrib/libs/apache/arrow/src +) + +CFLAGS(-DUSE_ARROW=1) SRCS( Chunk.cpp @@ -25,8 +31,13 @@ SRCS( Formats/IOutputFormat.cpp Formats/IRowInputFormat.cpp Formats/IRowOutputFormat.cpp + Formats/Impl/ArrowBlockInputFormat.cpp + Formats/Impl/ArrowBlockOutputFormat.cpp + Formats/Impl/ArrowBufferedStreams.cpp + Formats/Impl/ArrowColumnToCHColumn.cpp Formats/Impl/BinaryRowInputFormat.cpp Formats/Impl/BinaryRowOutputFormat.cpp + Formats/Impl/CHColumnToArrowColumn.cpp Formats/Impl/CSVRowInputFormat.cpp Formats/Impl/CSVRowOutputFormat.cpp Formats/Impl/ConstantExpressionTemplate.cpp @@ -165,6 +176,7 @@ SRCS( Transforms/ReverseTransform.cpp Transforms/RollupTransform.cpp Transforms/SortingTransform.cpp + Transforms/SquashingChunksTransform.cpp Transforms/TotalsHavingTransform.cpp Transforms/WindowTransform.cpp Transforms/getSourceFromFromASTInsertQuery.cpp diff --git a/src/Processors/ya.make.in b/src/Processors/ya.make.in index 06230b96be8..7160e80bcce 100644 --- a/src/Processors/ya.make.in +++ b/src/Processors/ya.make.in @@ -6,11 +6,17 @@ PEERDIR( clickhouse/src/Common contrib/libs/msgpack contrib/libs/protobuf + contrib/libs/apache/arrow ) +ADDINCL( + contrib/libs/apache/arrow/src +) + +CFLAGS(-DUSE_ARROW=1) SRCS( - + ) END() diff --git a/src/Server/MySQLHandler.cpp b/src/Server/MySQLHandler.cpp index 375f248d939..52182257ac9 100644 --- a/src/Server/MySQLHandler.cpp +++ b/src/Server/MySQLHandler.cpp @@ -1,8 +1,6 @@ #include "MySQLHandler.h" #include -#include -#include #include #include #include @@ -10,7 +8,6 @@ #include #include #include -#include #include #include #include @@ -20,7 +17,6 @@ #include #include #include -#include #include #include #include @@ -31,7 +27,6 @@ #endif #if USE_SSL -# include # include # include # include @@ -57,6 +52,7 @@ namespace ErrorCodes extern const int NOT_IMPLEMENTED; extern const int MYSQL_CLIENT_INSUFFICIENT_CAPABILITIES; extern const int SUPPORT_IS_DISABLED; + extern const int UNSUPPORTED_METHOD; } @@ -352,8 +348,10 @@ void MySQLHandler::comQuery(ReadBuffer & payload) format_settings.mysql_wire.max_packet_size = max_packet_size; format_settings.mysql_wire.sequence_id = &sequence_id; - auto set_result_details = [&with_output](const String &, const String &, const String &, const String &) + auto set_result_details = [&with_output](const String &, const String &, const String &format, const String &) { + if (format != "MySQLWire") + throw Exception(ErrorCodes::UNSUPPORTED_METHOD, "MySQL protocol does not support custom output formats"); with_output = true; }; diff --git a/src/Storages/IStorage.h b/src/Storages/IStorage.h index 2180f92df98..85bfbfb1f84 100644 --- a/src/Storages/IStorage.h +++ b/src/Storages/IStorage.h @@ -264,7 +264,7 @@ public: * * It is guaranteed that the structure of the table will not change over the lifetime of the returned streams (that is, there will not be ALTER, RENAME and DROP). 
*/ - virtual BlockInputStreams watch( + virtual Pipe watch( const Names & /*column_names*/, const SelectQueryInfo & /*query_info*/, ContextPtr /*context*/, diff --git a/src/Storages/LiveView/LiveViewEventsBlockInputStream.h b/src/Storages/LiveView/LiveViewEventsSource.h similarity index 90% rename from src/Storages/LiveView/LiveViewEventsBlockInputStream.h rename to src/Storages/LiveView/LiveViewEventsSource.h index dc6848ec20c..daf9edfef95 100644 --- a/src/Storages/LiveView/LiveViewEventsBlockInputStream.h +++ b/src/Storages/LiveView/LiveViewEventsSource.h @@ -16,7 +16,7 @@ limitations under the License. */ #include #include #include -#include +#include #include @@ -27,7 +27,7 @@ namespace DB * Keeps stream alive by outputting blocks with no rows * based on period specified by the heartbeat interval. */ -class LiveViewEventsBlockInputStream : public IBlockInputStream +class LiveViewEventsSource : public SourceWithProgress { using NonBlockingResult = std::pair; @@ -35,13 +35,14 @@ using NonBlockingResult = std::pair; public: /// length default -2 because we want LIMIT to specify number of updates so that LIMIT 1 waits for 1 update /// and LIMIT 0 just returns data without waiting for any updates - LiveViewEventsBlockInputStream(std::shared_ptr storage_, + LiveViewEventsSource(std::shared_ptr storage_, std::shared_ptr blocks_ptr_, std::shared_ptr blocks_metadata_ptr_, std::shared_ptr active_ptr_, const bool has_limit_, const UInt64 limit_, const UInt64 heartbeat_interval_sec_) - : storage(std::move(storage_)), blocks_ptr(std::move(blocks_ptr_)), + : SourceWithProgress({ColumnWithTypeAndName(ColumnUInt64::create(), std::make_shared(), "version")}), + storage(std::move(storage_)), blocks_ptr(std::move(blocks_ptr_)), blocks_metadata_ptr(std::move(blocks_metadata_ptr_)), active_ptr(std::move(active_ptr_)), has_limit(has_limit_), limit(limit_), @@ -51,22 +52,17 @@ public: active = active_ptr.lock(); } - String getName() const override { return "LiveViewEventsBlockInputStream"; } + String getName() const override { return "LiveViewEventsSource"; } - void cancel(bool kill) override + void onCancel() override { if (isCancelled() || storage->shutdown_called) return; - IBlockInputStream::cancel(kill); + std::lock_guard lock(storage->mutex); storage->condition.notify_all(); } - Block getHeader() const override - { - return {ColumnWithTypeAndName(ColumnUInt64::create(), std::make_shared(), "version")}; - } - void refresh() { if (active && blocks && it == end) @@ -109,10 +105,11 @@ public: return res; } protected: - Block readImpl() override + Chunk generate() override { /// try reading - return tryReadImpl(true).first; + auto block = tryReadImpl(true).first; + return Chunk(block.getColumns(), block.rows()); } /** tryRead method attempts to read a block in either blocking @@ -170,7 +167,7 @@ protected: if (!end_of_blocks) { end_of_blocks = true; - return { getHeader(), true }; + return { getPort().getHeader(), true }; } while (true) { @@ -192,7 +189,7 @@ protected: { // repeat the event block as a heartbeat last_event_timestamp_usec = static_cast(timestamp.epochMicroseconds()); - return { getHeader(), true }; + return { getPort().getHeader(), true }; } } } diff --git a/src/Storages/LiveView/LiveViewBlockOutputStream.h b/src/Storages/LiveView/LiveViewSink.h similarity index 74% rename from src/Storages/LiveView/LiveViewBlockOutputStream.h rename to src/Storages/LiveView/LiveViewSink.h index 6b8a5a2cb9e..433a5554152 100644 --- a/src/Storages/LiveView/LiveViewBlockOutputStream.h +++ 
b/src/Storages/LiveView/LiveViewSink.h @@ -1,7 +1,7 @@ #pragma once #include -#include +#include #include #include @@ -9,19 +9,28 @@ namespace DB { -class LiveViewBlockOutputStream : public IBlockOutputStream +class LiveViewSink : public SinkToStorage { -public: - explicit LiveViewBlockOutputStream(StorageLiveView & storage_) : storage(storage_) {} + /// _version column is added manually in sink. + static Block updateHeader(Block block) + { + block.erase("_version"); + return block; + } - void writePrefix() override +public: + explicit LiveViewSink(StorageLiveView & storage_) : SinkToStorage(updateHeader(storage_.getHeader())), storage(storage_) {} + + String getName() const override { return "LiveViewSink"; } + + void onStart() override { new_blocks = std::make_shared(); new_blocks_metadata = std::make_shared(); new_hash = std::make_shared(); } - void writeSuffix() override + void onFinish() override { UInt128 key; String key_str; @@ -65,14 +74,13 @@ public: new_hash.reset(); } - void write(const Block & block) override + void consume(Chunk chunk) override { - new_blocks->push_back(block); + auto block = getPort().getHeader().cloneWithColumns(chunk.detachColumns()); block.updateHash(*new_hash); + new_blocks->push_back(std::move(block)); } - Block getHeader() const override { return storage.getHeader(); } - private: using SipHashPtr = std::shared_ptr; diff --git a/src/Storages/LiveView/LiveViewBlockInputStream.h b/src/Storages/LiveView/LiveViewSource.h similarity index 89% rename from src/Storages/LiveView/LiveViewBlockInputStream.h rename to src/Storages/LiveView/LiveViewSource.h index 737e76754c5..af07d8558ad 100644 --- a/src/Storages/LiveView/LiveViewBlockInputStream.h +++ b/src/Storages/LiveView/LiveViewSource.h @@ -1,6 +1,7 @@ #pragma once -#include +#include +#include namespace DB @@ -10,19 +11,20 @@ namespace DB * Keeps stream alive by outputting blocks with no rows * based on period specified by the heartbeat interval. 
*/ -class LiveViewBlockInputStream : public IBlockInputStream +class LiveViewSource : public SourceWithProgress { using NonBlockingResult = std::pair; public: - LiveViewBlockInputStream(std::shared_ptr storage_, + LiveViewSource(std::shared_ptr storage_, std::shared_ptr blocks_ptr_, std::shared_ptr blocks_metadata_ptr_, std::shared_ptr active_ptr_, const bool has_limit_, const UInt64 limit_, const UInt64 heartbeat_interval_sec_) - : storage(std::move(storage_)), blocks_ptr(std::move(blocks_ptr_)), + : SourceWithProgress(storage_->getHeader()) + , storage(std::move(storage_)), blocks_ptr(std::move(blocks_ptr_)), blocks_metadata_ptr(std::move(blocks_metadata_ptr_)), active_ptr(std::move(active_ptr_)), has_limit(has_limit_), limit(limit_), @@ -34,17 +36,15 @@ public: String getName() const override { return "LiveViewBlockInputStream"; } - void cancel(bool kill) override + void onCancel() override { if (isCancelled() || storage->shutdown_called) return; - IBlockInputStream::cancel(kill); + std::lock_guard lock(storage->mutex); storage->condition.notify_all(); } - Block getHeader() const override { return storage->getHeader(); } - void refresh() { if (active && blocks && it == end) @@ -74,10 +74,11 @@ public: } protected: - Block readImpl() override + Chunk generate() override { /// try reading - return tryReadImpl(true).first; + auto block = tryReadImpl(true).first; + return Chunk(block.getColumns(), block.rows()); } /** tryRead method attempts to read a block in either blocking @@ -135,7 +136,7 @@ protected: if (!end_of_blocks) { end_of_blocks = true; - return { getHeader(), true }; + return { getPort().getHeader(), true }; } while (true) { @@ -157,7 +158,7 @@ protected: { // heartbeat last_event_timestamp_usec = static_cast(Poco::Timestamp().epochMicroseconds()); - return { getHeader(), true }; + return { getPort().getHeader(), true }; } } } diff --git a/src/Storages/LiveView/StorageLiveView.cpp b/src/Storages/LiveView/StorageLiveView.cpp index 5f5ce8a4a37..69390850ccc 100644 --- a/src/Storages/LiveView/StorageLiveView.cpp +++ b/src/Storages/LiveView/StorageLiveView.cpp @@ -15,10 +15,11 @@ limitations under the License. */ #include #include #include -#include #include -#include -#include +#include +#include +#include +#include #include #include #include @@ -26,9 +27,9 @@ limitations under the License. 
*/ #include #include -#include -#include -#include +#include +#include +#include #include #include @@ -110,15 +111,23 @@ MergeableBlocksPtr StorageLiveView::collectMergeableBlocks(ContextPtr local_cont InterpreterSelectQuery interpreter(mergeable_query->clone(), local_context, SelectQueryOptions(QueryProcessingStage::WithMergeableState), Names()); - auto view_mergeable_stream = std::make_shared(interpreter.execute().getInputStream()); + auto io = interpreter.execute(); + io.pipeline.addSimpleTransform([&](const Block & cur_header) + { + return std::make_shared(cur_header); + }); - while (Block this_block = view_mergeable_stream->read()) + new_mergeable_blocks->sample_block = io.pipeline.getHeader(); + + PullingPipelineExecutor executor(io.pipeline); + Block this_block; + + while (executor.pull(this_block)) base_blocks->push_back(this_block); new_blocks->push_back(base_blocks); new_mergeable_blocks->blocks = new_blocks; - new_mergeable_blocks->sample_block = view_mergeable_stream->getHeader(); return new_mergeable_blocks; } @@ -133,7 +142,7 @@ Pipes StorageLiveView::blocksToPipes(BlocksPtrs blocks, Block & sample_block) } /// Complete query using input streams from mergeable blocks -BlockInputStreamPtr StorageLiveView::completeQuery(Pipes pipes) +QueryPipeline StorageLiveView::completeQuery(Pipes pipes) { //FIXME it's dangerous to create Context on stack auto block_context = Context::createCopy(getContext()); @@ -147,18 +156,25 @@ BlockInputStreamPtr StorageLiveView::completeQuery(Pipes pipes) std::move(pipes), QueryProcessingStage::WithMergeableState); }; block_context->addExternalTable(getBlocksTableName(), TemporaryTableHolder(getContext(), creator)); - InterpreterSelectQuery select(getInnerBlocksQuery(), block_context, StoragePtr(), nullptr, SelectQueryOptions(QueryProcessingStage::Complete)); - BlockInputStreamPtr data = std::make_shared(select.execute().getInputStream()); + auto io = select.execute(); + io.pipeline.addSimpleTransform([&](const Block & cur_header) + { + return std::make_shared(cur_header); + }); /// Squashing is needed here because the view query can generate a lot of blocks /// even when only one block is inserted into the parent table (e.g. if the query is a GROUP BY /// and two-level aggregation is triggered). 
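// completeQuery() now returns the QueryPipeline itself, with the squashing
// transform attached below, so each caller drains it through an executor
// instead of copyData() over an input stream. A minimal consumption sketch,
// assuming pipeline was returned by completeQuery() (essentially what
// getNewBlocks() does further below):

SipHash hash;
DB::PullingPipelineExecutor executor(pipeline);
DB::Block block;
while (executor.pull(block))
{
    /// blocks arrive already squashed to min_insert_block_size_rows/bytes
    block.updateHash(hash);
}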
- data = std::make_shared( - data, getContext()->getSettingsRef().min_insert_block_size_rows, - getContext()->getSettingsRef().min_insert_block_size_bytes); + io.pipeline.addSimpleTransform([&](const Block & cur_header) + { + return std::make_shared( + cur_header, + getContext()->getSettingsRef().min_insert_block_size_rows, + getContext()->getSettingsRef().min_insert_block_size_bytes); + }); - return data; + return std::move(io.pipeline); } void StorageLiveView::writeIntoLiveView( @@ -166,7 +182,7 @@ void StorageLiveView::writeIntoLiveView( const Block & block, ContextPtr local_context) { - BlockOutputStreamPtr output = std::make_shared(live_view); + auto output = std::make_shared(live_view); /// Check if live view has any readers if not /// just reset blocks to empty and do nothing else @@ -220,10 +236,16 @@ void StorageLiveView::writeIntoLiveView( InterpreterSelectQuery select_block(mergeable_query, local_context, blocks_storage.getTable(), blocks_storage.getTable()->getInMemoryMetadataPtr(), QueryProcessingStage::WithMergeableState); - auto data_mergeable_stream = std::make_shared( - select_block.execute().getInputStream()); + auto io = select_block.execute(); + io.pipeline.addSimpleTransform([&](const Block & cur_header) + { + return std::make_shared(cur_header); + }); - while (Block this_block = data_mergeable_stream->read()) + PullingPipelineExecutor executor(io.pipeline); + Block this_block; + + while (executor.pull(this_block)) new_mergeable_blocks->push_back(this_block); if (new_mergeable_blocks->empty()) @@ -238,8 +260,15 @@ void StorageLiveView::writeIntoLiveView( } } - BlockInputStreamPtr data = live_view.completeQuery(std::move(from)); - copyData(*data, *output); + auto pipeline = live_view.completeQuery(std::move(from)); + pipeline.resize(1); + pipeline.setSinks([&](const Block &, Pipe::StreamType) + { + return std::move(output); + }); + + auto executor = pipeline.execute(); + executor->execute(pipeline.getNumThreads()); } @@ -351,9 +380,11 @@ bool StorageLiveView::getNewBlocks() /// inserted data to be duplicated auto new_mergeable_blocks = collectMergeableBlocks(live_view_context); Pipes from = blocksToPipes(new_mergeable_blocks->blocks, new_mergeable_blocks->sample_block); - BlockInputStreamPtr data = completeQuery(std::move(from)); + auto pipeline = completeQuery(std::move(from)); - while (Block block = data->read()) + PullingPipelineExecutor executor(pipeline); + Block block; + while (executor.pull(block)) { /// calculate hash before virtual column is added block.updateHash(hash); @@ -521,7 +552,7 @@ Pipe StorageLiveView::read( return Pipe(std::make_shared(blocks_ptr, getHeader())); } -BlockInputStreams StorageLiveView::watch( +Pipe StorageLiveView::watch( const Names & /*column_names*/, const SelectQueryInfo & query_info, ContextPtr local_context, @@ -533,7 +564,7 @@ BlockInputStreams StorageLiveView::watch( bool has_limit = false; UInt64 limit = 0; - BlockInputStreamPtr reader; + Pipe reader; if (query.limit_length) { @@ -542,15 +573,15 @@ BlockInputStreams StorageLiveView::watch( } if (query.is_watch_events) - reader = std::make_shared( + reader = Pipe(std::make_shared( std::static_pointer_cast(shared_from_this()), blocks_ptr, blocks_metadata_ptr, active_ptr, has_limit, limit, - local_context->getSettingsRef().live_view_heartbeat_interval.totalSeconds()); + local_context->getSettingsRef().live_view_heartbeat_interval.totalSeconds())); else - reader = std::make_shared( + reader = Pipe(std::make_shared( std::static_pointer_cast(shared_from_this()), blocks_ptr, 
blocks_metadata_ptr, active_ptr, has_limit, limit, - local_context->getSettingsRef().live_view_heartbeat_interval.totalSeconds()); + local_context->getSettingsRef().live_view_heartbeat_interval.totalSeconds())); { std::lock_guard lock(mutex); @@ -563,7 +594,7 @@ BlockInputStreams StorageLiveView::watch( } processed_stage = QueryProcessingStage::Complete; - return { reader }; + return reader; } NamesAndTypesList StorageLiveView::getVirtuals() const diff --git a/src/Storages/LiveView/StorageLiveView.h b/src/Storages/LiveView/StorageLiveView.h index 23a9c84cb9e..15afc642989 100644 --- a/src/Storages/LiveView/StorageLiveView.h +++ b/src/Storages/LiveView/StorageLiveView.h @@ -52,9 +52,9 @@ using Pipes = std::vector<Pipe>; class StorageLiveView final : public shared_ptr_helper<StorageLiveView>, public IStorage, WithContext { friend struct shared_ptr_helper<StorageLiveView>; -friend class LiveViewBlockInputStream; -friend class LiveViewEventsBlockInputStream; -friend class LiveViewBlockOutputStream; +friend class LiveViewSource; +friend class LiveViewEventsSource; +friend class LiveViewSink; public: ~StorageLiveView() override; @@ -153,7 +153,7 @@ public: size_t max_block_size, unsigned num_streams) override; - BlockInputStreams watch( + Pipe watch( const Names & column_names, const SelectQueryInfo & query_info, ContextPtr context, @@ -167,7 +167,7 @@ public: /// Collect mergeable blocks and their sample. Must be called holding mutex MergeableBlocksPtr collectMergeableBlocks(ContextPtr context); /// Complete query using input streams from mergeable blocks - BlockInputStreamPtr completeQuery(Pipes pipes); + QueryPipeline completeQuery(Pipes pipes); void setMergeableBlocks(MergeableBlocksPtr blocks) { mergeable_blocks = blocks; } std::shared_ptr<bool> getActivePtr() { return active_ptr; } diff --git a/src/Storages/MergeTree/BackgroundJobsExecutor.cpp b/src/Storages/MergeTree/BackgroundJobsExecutor.cpp index 36803ba5197..f3d957117e8 100644 --- a/src/Storages/MergeTree/BackgroundJobsExecutor.cpp +++ b/src/Storages/MergeTree/BackgroundJobsExecutor.cpp @@ -146,6 +146,9 @@ try catch (...) /// Exception while looking for a task, reschedule { tryLogCurrentException(__PRETTY_FUNCTION__); + + /// Why do we scheduleTask again? + /// To retry on exception, since it may be a temporary one. scheduleTask(/* with_backoff = */ true); } @@ -180,10 +183,16 @@ void IBackgroundJobExecutor::triggerTask() } void IBackgroundJobExecutor::backgroundTaskFunction() +try { if (!scheduleJob()) scheduleTask(/* with_backoff = */ true); } +catch (...) /// Catch any exception to avoid thread termination.
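/// backgroundTaskFunction() is a function-try-block: the catch below covers the whole
/// function body, so an exception escaping scheduleJob() is logged and the task is
/// rescheduled with backoff instead of terminating the background thread.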
+{ + tryLogCurrentException(__PRETTY_FUNCTION__); + scheduleTask(/* with_backoff = */ true); +} IBackgroundJobExecutor::~IBackgroundJobExecutor() { diff --git a/src/Storages/MergeTree/IMergeTreeReader.cpp b/src/Storages/MergeTree/IMergeTreeReader.cpp index 4efd3d669eb..5378b84a5d0 100644 --- a/src/Storages/MergeTree/IMergeTreeReader.cpp +++ b/src/Storages/MergeTree/IMergeTreeReader.cpp @@ -50,7 +50,7 @@ IMergeTreeReader::IMergeTreeReader( columns_from_part.set_empty_key(StringRef()); for (const auto & column_from_part : part_columns) - columns_from_part.emplace(column_from_part.name, &column_from_part.type); + columns_from_part[column_from_part.name] = &column_from_part.type; } IMergeTreeReader::~IMergeTreeReader() = default; diff --git a/src/Storages/MergeTree/IMergeTreeReader.h b/src/Storages/MergeTree/IMergeTreeReader.h index ab412e48822..8d80719efaf 100644 --- a/src/Storages/MergeTree/IMergeTreeReader.h +++ b/src/Storages/MergeTree/IMergeTreeReader.h @@ -1,9 +1,9 @@ #pragma once #include +#include #include #include -#include namespace DB { @@ -95,11 +95,7 @@ private: /// Actual data type of columns in part -#if !defined(ARCADIA_BUILD) - google::dense_hash_map<StringRef, const DataTypePtr *, StringRefHash> columns_from_part; -#else - google::sparsehash::dense_hash_map<StringRef, const DataTypePtr *, StringRefHash> columns_from_part; -#endif + DenseHashMap<StringRef, const DataTypePtr *> columns_from_part; }; } diff --git a/src/Storages/MergeTree/KeyCondition.cpp b/src/Storages/MergeTree/KeyCondition.cpp index 235cadfba11..18ca00ebf0d 100644 --- a/src/Storages/MergeTree/KeyCondition.cpp +++ b/src/Storages/MergeTree/KeyCondition.cpp @@ -10,6 +10,7 @@ #include #include #include +#include #include #include #include @@ -1367,7 +1368,7 @@ bool KeyCondition::tryParseAtomFromAST(const ASTPtr & node, ContextPtr context, { ColumnsWithTypeAndName arguments{ {nullptr, key_expr_type, ""}, {DataTypeString().createColumnConst(1, common_type->getName()), common_type, ""}}; - FunctionOverloadResolverPtr func_builder_cast = CastOverloadResolver::createImpl(false); + FunctionOverloadResolverPtr func_builder_cast = CastInternalOverloadResolver::createImpl(); auto func_cast = func_builder_cast->build(arguments); /// If we know the given range only contains one value, then we treat all functions as positive monotonic. diff --git a/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp b/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp index 6279d2d7d6f..b6d55828e85 100644 --- a/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp +++ b/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp @@ -1332,11 +1332,8 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mutatePartToTempor { /// We will modify only some of the columns. Other columns and key values can be copied as-is. NameSet updated_columns; - if (mutation_kind != MutationsInterpreter::MutationKind::MUTATE_INDEX_PROJECTION) - { - for (const auto & name_type : updated_header.getNamesAndTypesList()) - updated_columns.emplace(name_type.name); - } + for (const auto & name_type : updated_header.getNamesAndTypesList()) + updated_columns.emplace(name_type.name); auto indices_to_recalc = getIndicesToRecalculate( in, updated_columns, metadata_snapshot, context, materialized_indices, source_part); @@ -1345,7 +1342,7 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mutatePartToTempor NameSet files_to_skip = collectFilesToSkip( source_part, - mutation_kind == MutationsInterpreter::MutationKind::MUTATE_INDEX_PROJECTION ?
Block{} : updated_header, + updated_header, indices_to_recalc, mrk_extension, projections_to_recalc); @@ -1413,8 +1410,7 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mutatePartToTempor metadata_snapshot, indices_to_recalc, projections_to_recalc, - // If it's an index/projection materialization, we don't write any data columns, thus empty header is used - mutation_kind == MutationsInterpreter::MutationKind::MUTATE_INDEX_PROJECTION ? Block{} : updated_header, + updated_header, new_data_part, in, time_of_mutation, @@ -1663,7 +1659,12 @@ NameToNameVector MergeTreeDataMergerMutator::collectFilesForRenames( { if (command.type == MutationCommand::Type::DROP_INDEX) { + if (source_part->checksums.has(INDEX_FILE_PREFIX + command.column_name + ".idx2")) + { + rename_vector.emplace_back(INDEX_FILE_PREFIX + command.column_name + ".idx2", ""); + rename_vector.emplace_back(INDEX_FILE_PREFIX + command.column_name + mrk_extension, ""); + } + else if (source_part->checksums.has(INDEX_FILE_PREFIX + command.column_name + ".idx")) { rename_vector.emplace_back(INDEX_FILE_PREFIX + command.column_name + ".idx", ""); rename_vector.emplace_back(INDEX_FILE_PREFIX + command.column_name + mrk_extension, ""); @@ -1749,6 +1750,7 @@ NameSet MergeTreeDataMergerMutator::collectFilesToSkip( for (const auto & index : indices_to_recalc) { files_to_skip.insert(index->getFileName() + ".idx"); + files_to_skip.insert(index->getFileName() + ".idx2"); files_to_skip.insert(index->getFileName() + mrk_extension); } for (const auto & projection : projections_to_recalc) @@ -1893,8 +1895,11 @@ std::set MergeTreeDataMergerMutator::getIndicesToRecalculate( { const auto & index = indices[i]; + bool has_index = + source_part->checksums.has(INDEX_FILE_PREFIX + index.name + ".idx") || + source_part->checksums.has(INDEX_FILE_PREFIX + index.name + ".idx2"); // If we ask to materialize and it already exists - if (!source_part->checksums.has(INDEX_FILE_PREFIX + index.name + ".idx") && materialized_indices.count(index.name)) + if (!has_index && materialized_indices.count(index.name)) { if (indices_to_recalc.insert(index_factory.get(index)).second) { diff --git a/src/Storages/MergeTree/MergeTreeDataPartWriterOnDisk.cpp b/src/Storages/MergeTree/MergeTreeDataPartWriterOnDisk.cpp index 9902add9847..4263640c1e0 100644 --- a/src/Storages/MergeTree/MergeTreeDataPartWriterOnDisk.cpp +++ b/src/Storages/MergeTree/MergeTreeDataPartWriterOnDisk.cpp @@ -9,11 +9,6 @@ namespace ErrorCodes { extern const int LOGICAL_ERROR; } -namespace -{ - constexpr auto INDEX_FILE_EXTENSION = ".idx"; -} - void MergeTreeDataPartWriterOnDisk::Stream::finalize() { compressed.next(); @@ -165,7 +160,7 @@ void MergeTreeDataPartWriterOnDisk::initSkipIndices() std::make_unique<MergeTreeDataPartWriterOnDisk::Stream>( stream_name, data_part->volume->getDisk(), - part_path + stream_name, INDEX_FILE_EXTENSION, + part_path + stream_name, index_helper->getSerializedFileExtension(), part_path + stream_name, marks_file_extension, default_codec, settings.max_compress_block_size)); skip_indices_aggregators.push_back(index_helper->createIndexAggregator()); diff --git a/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp b/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp index 0b5351dcf01..c7eb8200957 100644 --- a/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp +++ b/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp @@ -361,10 +361,11 @@ QueryPlanPtr MergeTreeDataSelectExecutor::read(
pipes.emplace_back(std::move(projection_pipe)); pipes.emplace_back(std::move(ordinary_pipe)); auto pipe = Pipe::unitePipes(std::move(pipes)); - // TODO what if pipe is empty? pipe.resize(1); - auto step = std::make_unique<ReadFromStorageStep>(std::move(pipe), "MergeTree(with projection)"); + auto step = std::make_unique<ReadFromStorageStep>( + std::move(pipe), + fmt::format("MergeTree(with {} projection {})", query_info.projection->desc->type, query_info.projection->desc->name)); auto plan = std::make_unique<QueryPlan>(); plan->addStep(std::move(step)); return plan; @@ -1457,9 +1458,10 @@ MarkRanges MergeTreeDataSelectExecutor::filterMarksUsingIndex( size_t & granules_dropped, Poco::Logger * log) { - if (!part->volume->getDisk()->exists(part->getFullRelativePath() + index_helper->getFileName() + ".idx")) + const std::string & path_prefix = part->getFullRelativePath() + index_helper->getFileName(); + if (!index_helper->getDeserializedFormat(part->volume->getDisk(), path_prefix)) { - LOG_DEBUG(log, "File for index {} does not exist. Skipping it.", backQuote(index_helper->index.name)); + LOG_DEBUG(log, "File for index {} does not exist ({}.*). Skipping it.", backQuote(index_helper->index.name), path_prefix); return ranges; } diff --git a/src/Storages/MergeTree/MergeTreeIndexFullText.cpp b/src/Storages/MergeTree/MergeTreeIndexFullText.cpp index 10136cd1069..1c71d77b334 100644 --- a/src/Storages/MergeTree/MergeTreeIndexFullText.cpp +++ b/src/Storages/MergeTree/MergeTreeIndexFullText.cpp @@ -101,14 +101,17 @@ MergeTreeIndexGranuleFullText::MergeTreeIndexGranuleFullText( void MergeTreeIndexGranuleFullText::serializeBinary(WriteBuffer & ostr) const { if (empty()) - throw Exception("Attempt to write empty fulltext index " + backQuote(index_name), ErrorCodes::LOGICAL_ERROR); + throw Exception(ErrorCodes::LOGICAL_ERROR, "Attempt to write empty fulltext index {}.", backQuote(index_name)); for (const auto & bloom_filter : bloom_filters) ostr.write(reinterpret_cast<const char *>(bloom_filter.getFilter().data()), params.filter_size); } -void MergeTreeIndexGranuleFullText::deserializeBinary(ReadBuffer & istr) +void MergeTreeIndexGranuleFullText::deserializeBinary(ReadBuffer & istr, MergeTreeIndexVersion version) { + if (version != 1) + throw Exception(ErrorCodes::LOGICAL_ERROR, "Unknown index version {}.", version); + for (auto & bloom_filter : bloom_filters) { istr.read(reinterpret_cast<char *>(bloom_filter.getFilter().data()), params.filter_size); diff --git a/src/Storages/MergeTree/MergeTreeIndexFullText.h b/src/Storages/MergeTree/MergeTreeIndexFullText.h index 1385621f97f..d34cbc61da2 100644 --- a/src/Storages/MergeTree/MergeTreeIndexFullText.h +++ b/src/Storages/MergeTree/MergeTreeIndexFullText.h @@ -45,7 +45,7 @@ struct MergeTreeIndexGranuleFullText final : public IMergeTreeIndexGranule ~MergeTreeIndexGranuleFullText() override = default; void serializeBinary(WriteBuffer & ostr) const override; - void deserializeBinary(ReadBuffer & istr) override; + void deserializeBinary(ReadBuffer & istr, MergeTreeIndexVersion version) override; bool empty() const override { return !has_elems; } diff --git a/src/Storages/MergeTree/MergeTreeIndexGranuleBloomFilter.cpp b/src/Storages/MergeTree/MergeTreeIndexGranuleBloomFilter.cpp index b513437fbe1..6a027b8cb8e 100644 --- a/src/Storages/MergeTree/MergeTreeIndexGranuleBloomFilter.cpp +++ b/src/Storages/MergeTree/MergeTreeIndexGranuleBloomFilter.cpp @@ -84,10 +84,12 @@ bool MergeTreeIndexGranuleBloomFilter::empty() const { return !total_rows; } -void MergeTreeIndexGranuleBloomFilter::deserializeBinary(ReadBuffer & istr) +void MergeTreeIndexGranuleBloomFilter::deserializeBinary(ReadBuffer & istr,
MergeTreeIndexVersion version) { if (!empty()) - throw Exception("Cannot read data to a non-empty bloom filter index.", ErrorCodes::LOGICAL_ERROR); + throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot read data to a non-empty bloom filter index."); + if (version != 1) + throw Exception(ErrorCodes::LOGICAL_ERROR, "Unknown index version {}.", version); readVarUInt(total_rows, istr); for (auto & filter : bloom_filters) @@ -102,7 +104,7 @@ void MergeTreeIndexGranuleBloomFilter::deserializeBinary(ReadBuffer & istr) void MergeTreeIndexGranuleBloomFilter::serializeBinary(WriteBuffer & ostr) const { if (empty()) - throw Exception("Attempt to write empty bloom filter index.", ErrorCodes::LOGICAL_ERROR); + throw Exception(ErrorCodes::LOGICAL_ERROR, "Attempt to write empty bloom filter index."); static size_t atom_size = 8; writeVarUInt(total_rows, ostr); diff --git a/src/Storages/MergeTree/MergeTreeIndexGranuleBloomFilter.h b/src/Storages/MergeTree/MergeTreeIndexGranuleBloomFilter.h index cdd4b92f80c..82bd91138a7 100644 --- a/src/Storages/MergeTree/MergeTreeIndexGranuleBloomFilter.h +++ b/src/Storages/MergeTree/MergeTreeIndexGranuleBloomFilter.h @@ -16,8 +16,7 @@ public: bool empty() const override; void serializeBinary(WriteBuffer & ostr) const override; - - void deserializeBinary(ReadBuffer & istr) override; + void deserializeBinary(ReadBuffer & istr, MergeTreeIndexVersion version) override; const std::vector<BloomFilterPtr> & getFilters() const { return bloom_filters; } diff --git a/src/Storages/MergeTree/MergeTreeIndexMinMax.cpp b/src/Storages/MergeTree/MergeTreeIndexMinMax.cpp index ebf553295be..3a83afbd280 100644 --- a/src/Storages/MergeTree/MergeTreeIndexMinMax.cpp +++ b/src/Storages/MergeTree/MergeTreeIndexMinMax.cpp @@ -40,28 +40,12 @@ void MergeTreeIndexGranuleMinMax::serializeBinary(WriteBuffer & ostr) const const DataTypePtr & type = index_sample_block.getByPosition(i).type; auto serialization = type->getDefaultSerialization(); - if (!type->isNullable()) - { - serialization->serializeBinary(hyperrectangle[i].left, ostr); - serialization->serializeBinary(hyperrectangle[i].right, ostr); - } - else - { - /// NOTE: this serialization differs from - /// IMergeTreeDataPart::MinMaxIndex::store() to preserve - /// backward compatibility. - bool is_null = hyperrectangle[i].left.isNull() || hyperrectangle[i].right.isNull(); // one is enough - writeBinary(is_null, ostr); - if (!is_null) - { - serialization->serializeBinary(hyperrectangle[i].left, ostr); - serialization->serializeBinary(hyperrectangle[i].right, ostr); - } - } + serialization->serializeBinary(hyperrectangle[i].left, ostr); + serialization->serializeBinary(hyperrectangle[i].right, ostr); } } -void MergeTreeIndexGranuleMinMax::deserializeBinary(ReadBuffer & istr) +void MergeTreeIndexGranuleMinMax::deserializeBinary(ReadBuffer & istr, MergeTreeIndexVersion version) { hyperrectangle.clear(); Field min_val; @@ -72,29 +56,53 @@ void MergeTreeIndexGranuleMinMax::deserializeBinary(ReadBuffer & istr) const DataTypePtr & type = index_sample_block.getByPosition(i).type; auto serialization = type->getDefaultSerialization(); - if (!type->isNullable()) + switch (version) { - serialization->deserializeBinary(min_val, istr); - serialization->deserializeBinary(max_val, istr); - } - else - { - /// NOTE: this serialization differs from - /// IMergeTreeDataPart::MinMaxIndex::load() to preserve - /// backward compatibility.
- bool is_null; - readBinary(is_null, istr); - if (!is_null) - { + case 1: + if (!type->isNullable()) + { + serialization->deserializeBinary(min_val, istr); + serialization->deserializeBinary(max_val, istr); + } + else + { + /// NOTE: this serialization differs from + /// IMergeTreeDataPart::MinMaxIndex::load() to preserve + /// backward compatibility. + /// + /// But this is a deprecated format, so this is OK. + + bool is_null; + readBinary(is_null, istr); + if (!is_null) + { + serialization->deserializeBinary(min_val, istr); + serialization->deserializeBinary(max_val, istr); + } + else + { + min_val = Null(); + max_val = Null(); + } + } + break; + + /// New format with proper Nullable support for values that include Null values + case 2: serialization->deserializeBinary(min_val, istr); serialization->deserializeBinary(max_val, istr); - } - else - { - min_val = Null(); - max_val = Null(); - } + + // NULL_LAST + if (min_val.isNull()) + min_val = PositiveInfinity(); + if (max_val.isNull()) + max_val = PositiveInfinity(); + + break; + default: + throw Exception(ErrorCodes::LOGICAL_ERROR, "Unknown index version {}.", version); } + hyperrectangle.emplace_back(min_val, true, max_val, true); } } @@ -203,6 +211,15 @@ bool MergeTreeIndexMinMax::mayBenefitFromIndexForIn(const ASTPtr & node) const return false; } +MergeTreeIndexFormat MergeTreeIndexMinMax::getDeserializedFormat(const DiskPtr disk, const std::string & relative_path_prefix) const +{ + if (disk->exists(relative_path_prefix + ".idx2")) + return {2, ".idx2"}; + else if (disk->exists(relative_path_prefix + ".idx")) + return {1, ".idx"}; + return {0 /* unknown */, ""}; +} + MergeTreeIndexPtr minmaxIndexCreator( const IndexDescription & index) { diff --git a/src/Storages/MergeTree/MergeTreeIndexMinMax.h b/src/Storages/MergeTree/MergeTreeIndexMinMax.h index 97b9b874484..0e05e25fb36 100644 --- a/src/Storages/MergeTree/MergeTreeIndexMinMax.h +++ b/src/Storages/MergeTree/MergeTreeIndexMinMax.h @@ -21,7 +21,7 @@ struct MergeTreeIndexGranuleMinMax final : public IMergeTreeIndexGranule ~MergeTreeIndexGranuleMinMax() override = default; void serializeBinary(WriteBuffer & ostr) const override; - void deserializeBinary(ReadBuffer & istr) override; + void deserializeBinary(ReadBuffer & istr, MergeTreeIndexVersion version) override; bool empty() const override { return hyperrectangle.empty(); } @@ -81,6 +81,9 @@ public: const SelectQueryInfo & query, ContextPtr context) const override; bool mayBenefitFromIndexForIn(const ASTPtr & node) const override; + + const char* getSerializedFileExtension() const override { return ".idx2"; } + MergeTreeIndexFormat getDeserializedFormat(const DiskPtr disk, const std::string & path_prefix) const override; }; } diff --git a/src/Storages/MergeTree/MergeTreeIndexReader.cpp b/src/Storages/MergeTree/MergeTreeIndexReader.cpp index eaba247009b..0a0f2511914 100644 --- a/src/Storages/MergeTree/MergeTreeIndexReader.cpp +++ b/src/Storages/MergeTree/MergeTreeIndexReader.cpp @@ -1,5 +1,29 @@ #include +namespace +{ + +using namespace DB; + +std::unique_ptr<MergeTreeReaderStream> makeIndexReader( + const std::string & extension, + MergeTreeIndexPtr index, + MergeTreeData::DataPartPtr part, + size_t marks_count, + const MarkRanges & all_mark_ranges, + MergeTreeReaderSettings settings) +{ + return std::make_unique<MergeTreeReaderStream>( + part->volume->getDisk(), + part->getFullRelativePath() + index->getFileName(), extension, marks_count, + all_mark_ranges, + std::move(settings), nullptr, nullptr, + part->getFileSizeOrZero(index->getFileName() + extension),
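    /// (the size is looked up with the caller-provided extension, so both the old
    /// ".idx" and the new ".idx2" on-disk formats are handled correctly)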
&part->index_granularity_info, + ReadBufferFromFileBase::ProfileCallback{}, CLOCK_MONOTONIC_COARSE); +} + +} namespace DB { @@ -7,27 +31,28 @@ namespace DB MergeTreeIndexReader::MergeTreeIndexReader( MergeTreeIndexPtr index_, MergeTreeData::DataPartPtr part_, size_t marks_count_, const MarkRanges & all_mark_ranges_, MergeTreeReaderSettings settings) - : index(index_), stream( - part_->volume->getDisk(), - part_->getFullRelativePath() + index->getFileName(), ".idx", marks_count_, - all_mark_ranges_, - std::move(settings), nullptr, nullptr, - part_->getFileSizeOrZero(index->getFileName() + ".idx"), - &part_->index_granularity_info, - ReadBufferFromFileBase::ProfileCallback{}, CLOCK_MONOTONIC_COARSE) + : index(index_) { - stream.seekToStart(); + const std::string & path_prefix = part_->getFullRelativePath() + index->getFileName(); + auto index_format = index->getDeserializedFormat(part_->volume->getDisk(), path_prefix); + + stream = makeIndexReader(index_format.extension, index_, part_, marks_count_, all_mark_ranges_, std::move(settings)); + version = index_format.version; + + stream->seekToStart(); } +MergeTreeIndexReader::~MergeTreeIndexReader() = default; + void MergeTreeIndexReader::seek(size_t mark) { - stream.seekToMark(mark); + stream->seekToMark(mark); } MergeTreeIndexGranulePtr MergeTreeIndexReader::read() { auto granule = index->createIndexGranule(); - granule->deserializeBinary(*stream.data_buffer); + granule->deserializeBinary(*stream->data_buffer, version); return granule; } diff --git a/src/Storages/MergeTree/MergeTreeIndexReader.h b/src/Storages/MergeTree/MergeTreeIndexReader.h index 68d681458be..4facd43c175 100644 --- a/src/Storages/MergeTree/MergeTreeIndexReader.h +++ b/src/Storages/MergeTree/MergeTreeIndexReader.h @@ -1,5 +1,6 @@ #pragma once +#include #include #include #include @@ -16,6 +17,7 @@ public: size_t marks_count_, const MarkRanges & all_mark_ranges_, MergeTreeReaderSettings settings); + ~MergeTreeIndexReader(); void seek(size_t mark); @@ -23,7 +25,8 @@ private: MergeTreeIndexPtr index; - MergeTreeReaderStream stream; + std::unique_ptr<MergeTreeReaderStream> stream; + uint8_t version = 0; }; } diff --git a/src/Storages/MergeTree/MergeTreeIndexSet.cpp b/src/Storages/MergeTree/MergeTreeIndexSet.cpp index 6cee80983d6..024b87c9a3e 100644 --- a/src/Storages/MergeTree/MergeTreeIndexSet.cpp +++ b/src/Storages/MergeTree/MergeTreeIndexSet.cpp @@ -48,8 +48,7 @@ MergeTreeIndexGranuleSet::MergeTreeIndexGranuleSet( void MergeTreeIndexGranuleSet::serializeBinary(WriteBuffer & ostr) const { if (empty()) - throw Exception( - "Attempt to write empty set index " + backQuote(index_name), ErrorCodes::LOGICAL_ERROR); + throw Exception(ErrorCodes::LOGICAL_ERROR, "Attempt to write empty set index {}.", backQuote(index_name)); const auto & size_type = DataTypePtr(std::make_shared<DataTypeUInt64>()); auto size_serialization = size_type->getDefaultSerialization(); @@ -80,8 +79,11 @@ void MergeTreeIndexGranuleSet::serializeBinary(WriteBuffer & ostr) const } } -void MergeTreeIndexGranuleSet::deserializeBinary(ReadBuffer & istr) +void MergeTreeIndexGranuleSet::deserializeBinary(ReadBuffer & istr, MergeTreeIndexVersion version) { + if (version != 1) + throw Exception(ErrorCodes::LOGICAL_ERROR, "Unknown index version {}.", version); + block.clear(); Field field_rows; diff --git a/src/Storages/MergeTree/MergeTreeIndexSet.h b/src/Storages/MergeTree/MergeTreeIndexSet.h index 28afe4f714d..23b336d274b 100644 --- a/src/Storages/MergeTree/MergeTreeIndexSet.h +++ b/src/Storages/MergeTree/MergeTreeIndexSet.h @@ -28,7 +28,7 @@ struct
MergeTreeIndexGranuleSet final : public IMergeTreeIndexGranule MutableColumns && columns_); void serializeBinary(WriteBuffer & ostr) const override; - void deserializeBinary(ReadBuffer & istr) override; + void deserializeBinary(ReadBuffer & istr, MergeTreeIndexVersion version) override; size_t size() const { return block.rows(); } bool empty() const override { return !size(); } diff --git a/src/Storages/MergeTree/MergeTreeIndices.h b/src/Storages/MergeTree/MergeTreeIndices.h index 674daeb480d..557af891b74 100644 --- a/src/Storages/MergeTree/MergeTreeIndices.h +++ b/src/Storages/MergeTree/MergeTreeIndices.h @@ -4,6 +4,7 @@ #include #include #include +#include #include #include #include @@ -17,13 +18,37 @@ constexpr auto INDEX_FILE_PREFIX = "skp_idx_"; namespace DB { +using MergeTreeIndexVersion = uint8_t; +struct MergeTreeIndexFormat +{ + MergeTreeIndexVersion version; + const char* extension; + + operator bool() const { return version != 0; } +}; + /// Stores some info about a single block of data. struct IMergeTreeIndexGranule { virtual ~IMergeTreeIndexGranule() = default; + /// Always serialize the latest format version. virtual void serializeBinary(WriteBuffer & ostr) const = 0; - virtual void deserializeBinary(ReadBuffer & istr) = 0; + + /// Version of the index to deserialize: + /// + /// - 2 -- minmax index with proper Nullable support, + /// - 1 -- everything else. + /// + /// The implementation is responsible for the version check, + /// and must throw LOGICAL_ERROR for an unsupported version. + /// + /// See also: + /// - IMergeTreeIndex::getSerializedFileExtension() + /// - IMergeTreeIndex::getDeserializedFormat() + /// - MergeTreeDataMergerMutator::collectFilesToSkip() + /// - MergeTreeDataMergerMutator::collectFilesForRenames() + virtual void deserializeBinary(ReadBuffer & istr, MergeTreeIndexVersion version) = 0; virtual bool empty() const = 0; }; @@ -73,9 +98,26 @@ struct IMergeTreeIndex virtual ~IMergeTreeIndex() = default; - /// gets filename without extension + /// Returns filename without extension. String getFileName() const { return INDEX_FILE_PREFIX + index.name; } + /// Returns extension for serialization. + /// Reimplement if you want a new index format. + /// + /// NOTE: if getSerializedFileExtension() is reimplemented, + /// getDeserializedFormat() should be reimplemented too, + /// and must check all previous extensions as well + /// (to avoid breaking backward compatibility). + virtual const char* getSerializedFileExtension() const { return ".idx"; } + + /// Returns the format to deserialize: a {version, extension} pair + /// detected from the index files present on disk. + virtual MergeTreeIndexFormat getDeserializedFormat(const DiskPtr, const std::string & /* relative_path_prefix */) const + { + return {1, ".idx"}; + } +
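    /// A new on-disk format is introduced by overriding both hooks together; a minimal
    /// sketch following the minmax index in this diff (the ".idx3" extension and
    /// version number 3 are hypothetical):
    ///
    ///     const char * getSerializedFileExtension() const override { return ".idx3"; }
    ///     MergeTreeIndexFormat getDeserializedFormat(const DiskPtr disk, const std::string & prefix) const override
    ///     {
    ///         if (disk->exists(prefix + ".idx3"))
    ///             return {3, ".idx3"};   /// newest format first
    ///         if (disk->exists(prefix + ".idx"))
    ///             return {1, ".idx"};    /// keep reading the legacy format
    ///         return {0, ""};            /// unknown, index file is absent
    ///     }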
/// Checks whether the column is in data skipping index. virtual bool mayBenefitFromIndexForIn(const ASTPtr & node) const = 0; diff --git a/src/Storages/PostgreSQL/PostgreSQLReplicationHandler.cpp b/src/Storages/PostgreSQL/PostgreSQLReplicationHandler.cpp index b812c6d2923..3477397adb7 100644 --- a/src/Storages/PostgreSQL/PostgreSQLReplicationHandler.cpp +++ b/src/Storages/PostgreSQL/PostgreSQLReplicationHandler.cpp @@ -1,6 +1,6 @@ #include "PostgreSQLReplicationHandler.h" -#include +#include #include #include #include diff --git a/src/Storages/StorageExternalDistributed.cpp b/src/Storages/StorageExternalDistributed.cpp index 32b9c7e9245..f20e49fe23a 100644 --- a/src/Storages/StorageExternalDistributed.cpp +++ b/src/Storages/StorageExternalDistributed.cpp @@ -98,7 +98,7 @@ StorageExternalDistributed::StorageExternalDistributed( context->getSettingsRef().postgresql_connection_pool_size, context->getSettingsRef().postgresql_connection_pool_wait_timeout); - shard = StoragePostgreSQL::create(table_id_, std::move(pool), remote_table, columns_, constraints_, String{}, context); + shard = StoragePostgreSQL::create(table_id_, std::move(pool), remote_table, columns_, constraints_, String{}); break; } #endif diff --git a/src/Storages/StorageInMemoryMetadata.cpp b/src/Storages/StorageInMemoryMetadata.cpp index dad83f64c70..91f69cdac7d 100644 --- a/src/Storages/StorageInMemoryMetadata.cpp +++ b/src/Storages/StorageInMemoryMetadata.cpp @@ -1,7 +1,7 @@ #include -#include -#include +#include +#include #include #include #include @@ -320,18 +320,13 @@ Block StorageInMemoryMetadata::getSampleBlockForColumns( { Block res; -#if !defined(ARCADIA_BUILD) - google::dense_hash_map<StringRef, const DataTypePtr *, StringRefHash> virtuals_map; -#else - google::sparsehash::dense_hash_map<StringRef, const DataTypePtr *, StringRefHash> virtuals_map; -#endif - + DenseHashMap<StringRef, const DataTypePtr *> virtuals_map; virtuals_map.set_empty_key(StringRef()); /// Virtual columns must be appended after ordinary, because user can /// override them. for (const auto & column : virtuals) - virtuals_map.emplace(column.name, &column.type); + virtuals_map[column.name] = &column.type; for (const auto & name : column_names) { @@ -475,13 +470,8 @@ bool StorageInMemoryMetadata::hasSelectQuery() const namespace { -#if !defined(ARCADIA_BUILD) - using NamesAndTypesMap = google::dense_hash_map<StringRef, const DataTypePtr *, StringRefHash>; - using UniqueStrings = google::dense_hash_set<StringRef, StringRefHash>; -#else - using NamesAndTypesMap = google::sparsehash::dense_hash_map<StringRef, const DataTypePtr *, StringRefHash>; - using UniqueStrings = google::sparsehash::dense_hash_set<StringRef, StringRefHash>; -#endif + using NamesAndTypesMap = DenseHashMap<StringRef, const DataTypePtr *>; + using UniqueStrings = DenseHashSet<StringRef>; String listOfColumns(const NamesAndTypesList & available_columns) { diff --git a/src/Storages/StorageMergeTree.cpp b/src/Storages/StorageMergeTree.cpp index 0763e2a25c4..32c2c76dd10 100644 --- a/src/Storages/StorageMergeTree.cpp +++ b/src/Storages/StorageMergeTree.cpp @@ -959,9 +959,19 @@ std::shared_ptr StorageMergeTree::se if (!commands_for_size_validation.empty()) { - MutationsInterpreter interpreter( - shared_from_this(), metadata_snapshot, commands_for_size_validation, getContext(), false); - commands_size += interpreter.evaluateCommandsSize(); + try + { + MutationsInterpreter interpreter( + shared_from_this(), metadata_snapshot, commands_for_size_validation, getContext(), false); + commands_size += interpreter.evaluateCommandsSize(); + } + catch (...)
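            /// (the catch below records the failure on the mutation entry and skips it,
            /// so one broken mutation no longer blocks scheduling of the others)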
+ { + MergeTreeMutationEntry & entry = it->second; + entry.latest_fail_time = time(nullptr); + entry.latest_fail_reason = getCurrentExceptionMessage(false); + continue; + } } if (current_ast_elements + commands_size >= max_ast_elements) @@ -971,17 +981,21 @@ std::shared_ptr StorageMergeTree::se commands.insert(commands.end(), it->second.commands.begin(), it->second.commands.end()); } - auto new_part_info = part->info; - new_part_info.mutation = current_mutations_by_version.rbegin()->first; + if (!commands.empty()) + { + auto new_part_info = part->info; + new_part_info.mutation = current_mutations_by_version.rbegin()->first; - future_part.parts.push_back(part); - future_part.part_info = new_part_info; - future_part.name = part->getNewName(new_part_info); - future_part.type = part->getType(); + future_part.parts.push_back(part); + future_part.part_info = new_part_info; + future_part.name = part->getNewName(new_part_info); + future_part.type = part->getType(); - tagger = std::make_unique<CurrentlyMergingPartsTagger>(future_part, MergeTreeDataMergerMutator::estimateNeededDiskSpace({part}), *this, metadata_snapshot, true); - return std::make_shared<MergeMutateSelectedEntry>(future_part, std::move(tagger), commands); + tagger = std::make_unique<CurrentlyMergingPartsTagger>(future_part, MergeTreeDataMergerMutator::estimateNeededDiskSpace({part}), *this, metadata_snapshot, true); + return std::make_shared<MergeMutateSelectedEntry>(future_part, std::move(tagger), commands); + } } + return {}; } @@ -1036,6 +1050,7 @@ bool StorageMergeTree::scheduleDataProcessingJob(IBackgroundJobExecutor & execut auto share_lock = lockForShare(RWLockImpl::NO_QUERY, getSettings()->lock_acquire_timeout_for_background_operations); + bool has_mutations; { std::unique_lock lock(currently_processing_in_background_mutex); if (merger_mutator.merges_blocker.isCancelled()) @@ -1044,6 +1059,15 @@ bool StorageMergeTree::scheduleDataProcessingJob(IBackgroundJobExecutor & execut merge_entry = selectPartsToMerge(metadata_snapshot, false, {}, false, nullptr, share_lock, lock); if (!merge_entry) mutate_entry = selectPartsToMutate(metadata_snapshot, nullptr, share_lock); + + has_mutations = !current_mutations_by_version.empty(); + } + + if (!mutate_entry && has_mutations) + { + /// Notify in case of errors + std::lock_guard lock(mutation_wait_mutex); + mutation_wait_event.notify_all(); } if (merge_entry) diff --git a/src/Storages/StorageMongoDB.cpp b/src/Storages/StorageMongoDB.cpp index a973efd7277..3bdef7fd295 100644 --- a/src/Storages/StorageMongoDB.cpp +++ b/src/Storages/StorageMongoDB.cpp @@ -15,7 +15,7 @@ #include #include #include -#include +#include namespace DB { diff --git a/src/Storages/StorageMySQL.cpp b/src/Storages/StorageMySQL.cpp index 431fda530f4..79bb1f59cc7 100644 --- a/src/Storages/StorageMySQL.cpp +++ b/src/Storages/StorageMySQL.cpp @@ -4,7 +4,7 @@ #include #include -#include +#include #include #include #include diff --git a/src/Storages/StoragePostgreSQL.cpp b/src/Storages/StoragePostgreSQL.cpp index b71f2415fd8..603a52b2801 100644 --- a/src/Storages/StoragePostgreSQL.cpp +++ b/src/Storages/StoragePostgreSQL.cpp @@ -1,7 +1,7 @@ #include "StoragePostgreSQL.h" #if USE_LIBPQXX -#include +#include #include #include @@ -47,12 +47,10 @@ StoragePostgreSQL::StoragePostgreSQL( const ColumnsDescription & columns_, const ConstraintsDescription & constraints_, const String & comment, - ContextPtr context_, const String & remote_table_schema_) : IStorage(table_id_) , remote_table_name(remote_table_name_) , remote_table_schema(remote_table_schema_) - , global_context(context_) , pool(std::move(pool_)) { StorageInMemoryMetadata
storage_metadata; @@ -347,7 +345,6 @@ void registerStoragePostgreSQL(StorageFactory & factory) args.columns, args.constraints, args.comment, - args.getContext(), remote_table_schema); }, { diff --git a/src/Storages/StoragePostgreSQL.h b/src/Storages/StoragePostgreSQL.h index 064fa481f9d..bd5cd317c3d 100644 --- a/src/Storages/StoragePostgreSQL.h +++ b/src/Storages/StoragePostgreSQL.h @@ -27,7 +27,6 @@ public: const ColumnsDescription & columns_, const ConstraintsDescription & constraints_, const String & comment, - ContextPtr context_, const std::string & remote_table_schema_ = ""); String getName() const override { return "PostgreSQL"; } @@ -48,7 +47,6 @@ private: String remote_table_name; String remote_table_schema; - ContextPtr global_context; postgres::PoolWithFailoverPtr pool; }; diff --git a/src/Storages/StorageProxy.h b/src/Storages/StorageProxy.h index 521a2b8d642..c81ef6febdc 100644 --- a/src/Storages/StorageProxy.h +++ b/src/Storages/StorageProxy.h @@ -40,7 +40,7 @@ public: return getNested()->getQueryProcessingStage(context, to_stage, getNested()->getInMemoryMetadataPtr(), info); } - BlockInputStreams watch( + Pipe watch( const Names & column_names, const SelectQueryInfo & query_info, ContextPtr context, diff --git a/src/Storages/StorageSQLite.cpp b/src/Storages/StorageSQLite.cpp index ba66083fea5..758284e8d50 100644 --- a/src/Storages/StorageSQLite.cpp +++ b/src/Storages/StorageSQLite.cpp @@ -2,7 +2,7 @@ #if USE_SQLITE #include -#include +#include #include #include #include @@ -78,8 +78,7 @@ Pipe StorageSQLite::read( sample_block.insert({column_data.type, column_data.name}); } - return Pipe(std::make_shared<SourceFromInputStream>( - std::make_shared<SQLiteBlockInputStream>(sqlite_db, query, sample_block, max_block_size))); + return Pipe(std::make_shared<SQLiteSource>(sqlite_db, query, sample_block, max_block_size)); } diff --git a/src/TableFunctions/TableFunctionMySQL.cpp b/src/TableFunctions/TableFunctionMySQL.cpp index f8e0c41634b..09f9cf8b1f5 100644 --- a/src/TableFunctions/TableFunctionMySQL.cpp +++ b/src/TableFunctions/TableFunctionMySQL.cpp @@ -8,7 +8,7 @@ #include #include #include -#include +#include #include #include #include diff --git a/src/TableFunctions/TableFunctionPostgreSQL.cpp b/src/TableFunctions/TableFunctionPostgreSQL.cpp index ceea29b335b..d701728479b 100644 --- a/src/TableFunctions/TableFunctionPostgreSQL.cpp +++ b/src/TableFunctions/TableFunctionPostgreSQL.cpp @@ -37,7 +37,6 @@ StoragePtr TableFunctionPostgreSQL::executeImpl(const ASTPtr & /*ast_function*/, columns, ConstraintsDescription{}, String{}, - context, remote_table_schema); result->startup(); diff --git a/tests/clickhouse-test b/tests/clickhouse-test index f6833cfbd09..c627810a550 100755 --- a/tests/clickhouse-test +++ b/tests/clickhouse-test @@ -650,6 +650,10 @@ def run_tests_array(all_tests_with_params): status += " - having exception in stdout:\n{}\n".format( '\n'.join(stdout.split('\n')[:100])) status += 'Database: ' + testcase_args.testcase_database + elif '@@SKIP@@' in stdout: + skipped_total += 1 + skip_reason = stdout.replace('@@SKIP@@', '').rstrip("\n") + status += MSG_SKIPPED + f" - {skip_reason}\n" elif reference_file is None: status += MSG_UNKNOWN status += print_test_time(total_time) diff --git a/tests/integration/ci-runner.py b/tests/integration/ci-runner.py index ecd4cb8d4e7..bf7549a83c4 100755 --- a/tests/integration/ci-runner.py +++ b/tests/integration/ci-runner.py @@ -125,13 +125,13 @@ def clear_ip_tables_and_restart_daemons(): logging.info("Dump iptables after run %s", subprocess.check_output("iptables -L", shell=True)) try:
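        # The docker invocations below are wrapped in `timeout -s 9` so that a wedged
        # docker daemon is killed after the deadline instead of hanging the CI run.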
logging.info("Killing all alive docker containers") - subprocess.check_output("docker kill $(docker ps -q)", shell=True) + subprocess.check_output("timeout -s 9 10m docker kill $(docker ps -q)", shell=True) except subprocess.CalledProcessError as err: logging.info("docker kill excepted: " + str(err)) try: logging.info("Removing all docker containers") - subprocess.check_output("docker rm $(docker ps -a -q) --force", shell=True) + subprocess.check_output("timeout -s 9 10m docker rm $(docker ps -a -q) --force", shell=True) except subprocess.CalledProcessError as err: logging.info("docker rm excepted: " + str(err)) @@ -264,7 +264,7 @@ class ClickhouseIntegrationTestsRunner: out_file = "all_tests.txt" out_file_full = "all_tests_full.txt" cmd = "cd {repo_path}/tests/integration && " \ - "./runner --tmpfs {image_cmd} ' --setup-plan' " \ + "timeout -s 9 1h ./runner --tmpfs {image_cmd} ' --setup-plan' " \ "| tee {out_file_full} | grep '::' | sed 's/ (fixtures used:.*//g' | sed 's/^ *//g' | sed 's/ *$//g' " \ "| grep -v 'SKIPPED' | sort -u > {out_file}".format( repo_path=repo_path, image_cmd=image_cmd, out_file=out_file, out_file_full=out_file_full) @@ -376,6 +376,24 @@ class ClickhouseIntegrationTestsRunner: res.add(path) return res + def try_run_test_group(self, repo_path, test_group, tests_in_group, num_tries, num_workers): + try: + return self.run_test_group(repo_path, test_group, tests_in_group, num_tries, num_workers) + except Exception as e: + logging.info("Failed to run {}:\n{}".format(str(test_group), str(e))) + counters = { + "ERROR": [], + "PASSED": [], + "FAILED": [], + "SKIPPED": [], + "FLAKY": [], + } + tests_times = defaultdict(float) + for test in tests_in_group: + counters["ERROR"].append(test) + tests_times[test] = 0 + return counters, tests_times, [] + def run_test_group(self, repo_path, test_group, tests_in_group, num_tries, num_workers): counters = { "ERROR": [], @@ -419,7 +437,7 @@ class ClickhouseIntegrationTestsRunner: test_cmd = ' '.join([test for test in sorted(test_names)]) parallel_cmd = " --parallel {} ".format(num_workers) if num_workers > 0 else "" - cmd = "cd {}/tests/integration && ./runner --tmpfs {} -t {} {} '-rfEp --run-id={} --color=no --durations=0 {}' | tee {}".format( + cmd = "cd {}/tests/integration && timeout -s 9 1h ./runner --tmpfs {} -t {} {} '-rfEp --run-id={} --color=no --durations=0 {}' | tee {}".format( repo_path, image_cmd, test_cmd, parallel_cmd, i, _get_deselect_option(self.should_skip_tests()), info_path) log_basename = test_group_str + "_" + str(i) + ".log" @@ -507,7 +525,7 @@ class ClickhouseIntegrationTestsRunner: for i in range(TRIES_COUNT): final_retry += 1 logging.info("Running tests for the %s time", i) - counters, tests_times, log_paths = self.run_test_group(repo_path, "flaky", tests_to_run, 1, 1) + counters, tests_times, log_paths = self.try_run_test_group(repo_path, "flaky", tests_to_run, 1, 1) logs += log_paths if counters["FAILED"]: logging.info("Found failed tests: %s", ' '.join(counters["FAILED"])) @@ -583,7 +601,7 @@ class ClickhouseIntegrationTestsRunner: for group, tests in items_to_run: logging.info("Running test group %s countaining %s tests", group, len(tests)) - group_counters, group_test_times, log_paths = self.run_test_group(repo_path, group, tests, MAX_RETRY, NUM_WORKERS) + group_counters, group_test_times, log_paths = self.try_run_test_group(repo_path, group, tests, MAX_RETRY, NUM_WORKERS) total_tests = 0 for counter, value in group_counters.items(): logging.info("Tests from group %s stats, %s count %s", group, counter, 
len(value)) diff --git a/tests/integration/test_alter_update_cast_keep_nullable/__init__.py b/tests/integration/test_alter_update_cast_keep_nullable/__init__.py new file mode 100644 index 00000000000..e69de29bb2d diff --git a/tests/integration/test_alter_update_cast_keep_nullable/configs/users.xml b/tests/integration/test_alter_update_cast_keep_nullable/configs/users.xml new file mode 100644 index 00000000000..aa2f240b831 --- /dev/null +++ b/tests/integration/test_alter_update_cast_keep_nullable/configs/users.xml @@ -0,0 +1,8 @@
+<?xml version="1.0"?>
+<yandex>
+    <profiles>
+        <default>
+            <cast_keep_nullable>1</cast_keep_nullable>
+        </default>
+    </profiles>
+</yandex>
diff --git a/tests/integration/test_alter_update_cast_keep_nullable/test.py b/tests/integration/test_alter_update_cast_keep_nullable/test.py new file mode 100644 index 00000000000..497a9e21d94 --- /dev/null +++ b/tests/integration/test_alter_update_cast_keep_nullable/test.py @@ -0,0 +1,36 @@ +import pytest +from helpers.cluster import ClickHouseCluster + +cluster = ClickHouseCluster(__file__) + +node1 = cluster.add_instance('node1', user_configs=['configs/users.xml'], with_zookeeper=True) + +@pytest.fixture(scope="module") +def started_cluster(): + try: + cluster.start() + yield cluster + finally: + cluster.shutdown() + +def test_cast_keep_nullable(started_cluster): + setting = node1.query("SELECT value FROM system.settings WHERE name='cast_keep_nullable'") + assert(setting.strip() == "1") + + result = node1.query(""" + DROP TABLE IF EXISTS t; + CREATE TABLE t (x UInt64) ENGINE = MergeTree ORDER BY tuple(); + INSERT INTO t SELECT number FROM numbers(10); + SELECT * FROM t; + """) + assert(result.strip() == "0\n1\n2\n3\n4\n5\n6\n7\n8\n9") + + error = node1.query_and_get_error(""" + SET mutations_sync = 1; + ALTER TABLE t UPDATE x = x % 3 = 0 ? NULL : x WHERE x % 2 = 1; + """) + assert("DB::Exception: Cannot convert NULL value to non-Nullable type" in error) + + result = node1.query("SELECT * FROM t;") + assert(result.strip() == "0\n1\n2\n3\n4\n5\n6\n7\n8\n9") + diff --git a/tests/integration/test_config_corresponding_root/configs/config.xml b/tests/integration/test_config_corresponding_root/configs/config.xml index 4e130afa84d..a518bd88b2e 100644 --- a/tests/integration/test_config_corresponding_root/configs/config.xml +++ b/tests/integration/test_config_corresponding_root/configs/config.xml @@ -304,14 +304,13 @@
     <part_log>
         <database>system</database>
         <table>part_log</table>
+        <partition_by>toYYYYMM(event_date)</partition_by>
         <flush_interval_milliseconds>7500</flush_interval_milliseconds>
     </part_log>
-    -->
@@ -838,13 +838,13 @@
     <part_log>
         <database>system</database>
         <table>part_log</table>
+        <partition_by>toYYYYMM(event_date)</partition_by>
         <flush_interval_milliseconds>7500</flush_interval_milliseconds>
     </part_log>
-    -->
     <part_log>
         <database>system</database>
         <table>part_log</table>
+        <partition_by>toYYYYMM(event_date)</partition_by>
         <flush_interval_milliseconds>7500</flush_interval_milliseconds>
     </part_log>
-    -->
     <part_log>
         <database>system</database>
         <table>part_log</table>
+        <partition_by>toYYYYMM(event_date)</partition_by>
         <flush_interval_milliseconds>7500</flush_interval_milliseconds>
     </part_log>
-    -->
     <part_log>
         <database>system</database>
         <table>part_log</table>
+        <partition_by>toYYYYMM(event_date)</partition_by>
         <flush_interval_milliseconds>7500</flush_interval_milliseconds>
     </part_log>
-    -->
     <part_log>
         <database>system</database>
         <table>part_log</table>
+        <partition_by>toYYYYMM(event_date)</partition_by>
         <flush_interval_milliseconds>7500</flush_interval_milliseconds>
     </part_log>
-    -->
     <part_log>
         <database>system</database>
         <table>part_log</table>
+        <partition_by>toYYYYMM(event_date)</partition_by>
         <flush_interval_milliseconds>7500</flush_interval_milliseconds>
     </part_log>
-    -->
     <part_log>
         <database>system</database>
         <table>part_log</table>
+        <partition_by>toYYYYMM(event_date)</partition_by>
         <flush_interval_milliseconds>7500</flush_interval_milliseconds>
     </part_log>
-    -->
     <part_log>
         <database>system</database>
         <table>part_log</table>
+        <partition_by>toYYYYMM(event_date)</partition_by>
         <flush_interval_milliseconds>7500</flush_interval_milliseconds>
     </part_log>
-    -->
     <part_log>
         <database>system</database>
         <table>part_log</table>
+        <partition_by>toYYYYMM(event_date)</partition_by>
         <flush_interval_milliseconds>7500</flush_interval_milliseconds>
     </part_log>
-    -->
     <part_log>
         <database>system</database>
         <table>part_log</table>
+        <partition_by>toYYYYMM(event_date)</partition_by>
         <flush_interval_milliseconds>7500</flush_interval_milliseconds>
     </part_log>
-    -->
+
+    <query_views_log>
+        <database>system</database>
+        <table>query_views_log</table>
+        <partition_by>toYYYYMM(event_date)</partition_by>
+        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
+    </query_views_log>
+        <database>system</database>
+        <table>part_log</table>
+        <partition_by>toYYYYMM(event_date)</partition_by>
         <flush_interval_milliseconds>7500</flush_interval_milliseconds>
     </part_log>
     <part_log>
         <database>system</database>
         <table>part_log</table>
+        <partition_by>toYYYYMM(event_date)</partition_by>
         <flush_interval_milliseconds>7500</flush_interval_milliseconds>
     </part_log>
-    -->
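<!-- In the blocks above, partition_by splits system.part_log (and the new
     system.query_views_log) into monthly parts via toYYYYMM(event_date), and
     flush_interval_milliseconds flushes buffered log entries every 7.5 seconds. -->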