Merge branch 'master' into fix-bad-cast

Commit d88cf81d71 by alexey-milovidov, 2021-08-15 09:06:42 +03:00, committed by GitHub.
300 changed files with 4995 additions and 2031 deletions


@ -1,8 +1,8 @@
---
name: Bug report
about: Create a report to help us improve ClickHouse
about: Wrong behaviour (visible to users) in official ClickHouse release.
title: ''
labels: bug
labels: 'potential bug'
assignees: ''
---


@ -1,3 +1,102 @@
### ClickHouse release v21.8, 2021-08-12
#### New Features
* Add support for a part of SQL/JSON standard. [#24148](https://github.com/ClickHouse/ClickHouse/pull/24148) ([l1tsolaiki](https://github.com/l1tsolaiki), [Kseniia Sumarokova](https://github.com/kssenii)).
* Collect common system metrics (in `system.asynchronous_metrics` and `system.asynchronous_metric_log`) on CPU usage, disk usage, memory usage, IO, network, files, load average, CPU frequencies, thermal sensors, EDAC counters, system uptime; also added metrics about the scheduling jitter and the time spent collecting the metrics. It works similarly to `atop`, but within ClickHouse, and allows access to monitoring data even if you have no additional tools installed. Close [#9430](https://github.com/ClickHouse/ClickHouse/issues/9430). [#24416](https://github.com/ClickHouse/ClickHouse/pull/24416) ([alexey-milovidov](https://github.com/alexey-milovidov), [Yegor Levankov](https://github.com/elevankoff)).
* Add MaterializedPostgreSQL table engine and database engine. This database engine allows replicating a whole database or any subset of database tables. [#20470](https://github.com/ClickHouse/ClickHouse/pull/20470) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Add new functions `leftPad()`, `rightPad()`, `leftPadUTF8()`, `rightPadUTF8()`. [#26075](https://github.com/ClickHouse/ClickHouse/pull/26075) ([Vitaly Baranov](https://github.com/vitlibar)).
* Add the `FIRST` keyword to the `ADD INDEX` command to be able to add the index at the beginning of the indices list. [#25904](https://github.com/ClickHouse/ClickHouse/pull/25904) ([xjewer](https://github.com/xjewer)).
* Introduce `system.data_skipping_indices` table containing information about existing data skipping indices. Close [#7659](https://github.com/ClickHouse/ClickHouse/issues/7659). [#25693](https://github.com/ClickHouse/ClickHouse/pull/25693) ([Dmitry Novik](https://github.com/novikd)).
* Add `bin`/`unbin` functions. [#25609](https://github.com/ClickHouse/ClickHouse/pull/25609) ([zhaoyu](https://github.com/zxc111)).
* Support `Map` and `UInt128`, `Int128`, `UInt256`, `Int256` types in `mapAdd` and `mapSubtract` functions. [#25596](https://github.com/ClickHouse/ClickHouse/pull/25596) ([Ildus Kurbangaliev](https://github.com/ildus)).
* Support `DISTINCT ON (columns)` expression, close [#25404](https://github.com/ClickHouse/ClickHouse/issues/25404). [#25589](https://github.com/ClickHouse/ClickHouse/pull/25589) ([Zijie Lu](https://github.com/TszKitLo40)).
* Add an ability to reset a custom setting to default and remove it from the table's metadata. It allows rolling back the change without knowing the system/config's default. Closes [#14449](https://github.com/ClickHouse/ClickHouse/issues/14449). [#17769](https://github.com/ClickHouse/ClickHouse/pull/17769) ([xjewer](https://github.com/xjewer)).
* Render pipelines as graphs in Web UI if `EXPLAIN PIPELINE graph = 1` query is submitted. [#26067](https://github.com/ClickHouse/ClickHouse/pull/26067) ([alexey-milovidov](https://github.com/alexey-milovidov)).
#### Performance Improvements
* Compile aggregate functions. Use option `compile_aggregate_expressions` to enable it. [#24789](https://github.com/ClickHouse/ClickHouse/pull/24789) ([Maksim Kita](https://github.com/kitaisreal)).
* Improve latency of short queries that require reading from tables with many columns. [#26371](https://github.com/ClickHouse/ClickHouse/pull/26371) ([Anton Popov](https://github.com/CurtizJ)).
#### Improvements
* Use `Map` data type for system logs tables (`system.query_log`, `system.query_thread_log`, `system.processes`, `system.opentelemetry_span_log`). These tables will be auto-created with new data types. Virtual columns are created to support old queries. Closes [#18698](https://github.com/ClickHouse/ClickHouse/issues/18698). [#23934](https://github.com/ClickHouse/ClickHouse/pull/23934), [#25773](https://github.com/ClickHouse/ClickHouse/pull/25773) ([hexiaoting](https://github.com/hexiaoting), [sundy-li](https://github.com/sundy-li), [Maksim Kita](https://github.com/kitaisreal)).
* For a dictionary with a complex key containing only one attribute, allow not wrapping the key expression in tuple for functions `dictGet`, `dictHas`. [#26130](https://github.com/ClickHouse/ClickHouse/pull/26130) ([Maksim Kita](https://github.com/kitaisreal)).
* Implement `bin`/`hex` functions for `AggregateFunction` states. [#26094](https://github.com/ClickHouse/ClickHouse/pull/26094) ([zhaoyu](https://github.com/zxc111)).
* Support arguments of `UUID` type for `empty` and `notEmpty` functions. `UUID` is empty if it is all zeros (nil UUID). Closes [#3446](https://github.com/ClickHouse/ClickHouse/issues/3446). [#25974](https://github.com/ClickHouse/ClickHouse/pull/25974) ([zhaoyu](https://github.com/zxc111)).
* Add support for `SET SQL_SELECT_LIMIT` in MySQL protocol. Closes [#17115](https://github.com/ClickHouse/ClickHouse/issues/17115). [#25972](https://github.com/ClickHouse/ClickHouse/pull/25972) ([Kseniia Sumarokova](https://github.com/kssenii)).
* More instrumentation for network interaction: add counters for recv/send bytes; add gauges for recvs/sends. Added missing documentation. Close [#5897](https://github.com/ClickHouse/ClickHouse/issues/5897). [#25962](https://github.com/ClickHouse/ClickHouse/pull/25962) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Add setting `optimize_move_to_prewhere_if_final`. If query has `FINAL`, the optimization `move_to_prewhere` will be enabled only if both `optimize_move_to_prewhere` and `optimize_move_to_prewhere_if_final` are enabled. Closes [#8684](https://github.com/ClickHouse/ClickHouse/issues/8684). [#25940](https://github.com/ClickHouse/ClickHouse/pull/25940) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Allow complex quoted identifiers of JOINed tables. Close [#17861](https://github.com/ClickHouse/ClickHouse/issues/17861). [#25924](https://github.com/ClickHouse/ClickHouse/pull/25924) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Add support for Unicode (e.g. Chinese, Cyrillic) components in `Nested` data types. Close [#25594](https://github.com/ClickHouse/ClickHouse/issues/25594). [#25923](https://github.com/ClickHouse/ClickHouse/pull/25923) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Allow `quantiles*` functions to work with `aggregate_functions_null_for_empty`. Close [#25892](https://github.com/ClickHouse/ClickHouse/issues/25892). [#25919](https://github.com/ClickHouse/ClickHouse/pull/25919) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Allow parameters for parametric aggregate functions to be arbitrary constant expressions (e.g., `1 + 2`), not just literals. It also allows using the query parameters (in parameterized queries like `{param:UInt8}`) inside parametric aggregate functions. Closes [#11607](https://github.com/ClickHouse/ClickHouse/issues/11607). [#25910](https://github.com/ClickHouse/ClickHouse/pull/25910) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Correctly throw the exception on the attempt to parse an invalid `Date`. Closes [#6481](https://github.com/ClickHouse/ClickHouse/issues/6481). [#25909](https://github.com/ClickHouse/ClickHouse/pull/25909) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Support for multiple includes in configuration. It is possible to include users configuration, remote server configuration from multiple sources. Simply place `<include />` element with `from_zk`, `from_env` or `incl` attribute, and it will be replaced with the substitution. [#24404](https://github.com/ClickHouse/ClickHouse/pull/24404) ([nvartolomei](https://github.com/nvartolomei)).
* Support for queries with a column named `"null"` (it must be specified in back-ticks or double quotes) and `ON CLUSTER`. Closes [#24035](https://github.com/ClickHouse/ClickHouse/issues/24035). [#25907](https://github.com/ClickHouse/ClickHouse/pull/25907) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Support `LowCardinality`, `Decimal`, and `UUID` for `JSONExtract`. Closes [#24606](https://github.com/ClickHouse/ClickHouse/issues/24606). [#25900](https://github.com/ClickHouse/ClickHouse/pull/25900) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Convert history file from `readline` format to `replxx` format. [#25888](https://github.com/ClickHouse/ClickHouse/pull/25888) ([Azat Khuzhin](https://github.com/azat)).
* Fix an issue which can lead to intersecting parts after `DROP PART` or background deletion of an empty part. [#25884](https://github.com/ClickHouse/ClickHouse/pull/25884) ([alesapin](https://github.com/alesapin)).
* Better handling of lost parts for `ReplicatedMergeTree` tables. Fixes rare inconsistencies in `ReplicationQueue`. Fixes [#10368](https://github.com/ClickHouse/ClickHouse/issues/10368). [#25820](https://github.com/ClickHouse/ClickHouse/pull/25820) ([alesapin](https://github.com/alesapin)).
* Allow starting clickhouse-client with unreadable working directory. [#25817](https://github.com/ClickHouse/ClickHouse/pull/25817) ([ianton-ru](https://github.com/ianton-ru)).
* Fix "No available columns" error for `Merge` storage. [#25801](https://github.com/ClickHouse/ClickHouse/pull/25801) ([Azat Khuzhin](https://github.com/azat)).
* MySQL Engine now supports the exchange of column comments between MySQL and ClickHouse. [#25795](https://github.com/ClickHouse/ClickHouse/pull/25795) ([Storozhuk Kostiantyn](https://github.com/sand6255)).
* Fix inconsistent behaviour of `GROUP BY` constant on empty set. Closes [#6842](https://github.com/ClickHouse/ClickHouse/issues/6842). [#25786](https://github.com/ClickHouse/ClickHouse/pull/25786) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Cancel already running merges in partition on `DROP PARTITION` and `TRUNCATE` for `ReplicatedMergeTree`. Resolves [#17151](https://github.com/ClickHouse/ClickHouse/issues/17151). [#25684](https://github.com/ClickHouse/ClickHouse/pull/25684) ([tavplubix](https://github.com/tavplubix)).
* Support `ENUM` data type for MaterializeMySQL. [#25676](https://github.com/ClickHouse/ClickHouse/pull/25676) ([Storozhuk Kostiantyn](https://github.com/sand6255)).
* Support materialized and aliased columns in JOIN, close [#13274](https://github.com/ClickHouse/ClickHouse/issues/13274). [#25634](https://github.com/ClickHouse/ClickHouse/pull/25634) ([Vladimir C](https://github.com/vdimir)).
* Fix possible logical race condition between `ALTER TABLE ... DETACH` and background merges. [#25605](https://github.com/ClickHouse/ClickHouse/pull/25605) ([Azat Khuzhin](https://github.com/azat)).
* Make the `NetworkReceiveElapsedMicroseconds` metric correctly include the time spent waiting for data from the client during `INSERT`. Close [#9958](https://github.com/ClickHouse/ClickHouse/issues/9958). [#25602](https://github.com/ClickHouse/ClickHouse/pull/25602) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Support `TRUNCATE TABLE` for S3 and HDFS. Close [#25530](https://github.com/ClickHouse/ClickHouse/issues/25530). [#25550](https://github.com/ClickHouse/ClickHouse/pull/25550) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Support for dynamic reloading of config to change number of threads in pool for background jobs execution (merges, mutations, fetches). [#25548](https://github.com/ClickHouse/ClickHouse/pull/25548) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Allow extracting a non-string element as a string using `JSONExtract`. This is for [#25414](https://github.com/ClickHouse/ClickHouse/issues/25414). [#25452](https://github.com/ClickHouse/ClickHouse/pull/25452) ([Amos Bird](https://github.com/amosbird)).
* Support regular expression in `Database` argument for `StorageMerge`. Close [#776](https://github.com/ClickHouse/ClickHouse/issues/776). [#25064](https://github.com/ClickHouse/ClickHouse/pull/25064) ([flynn](https://github.com/ucasfl)).
* Web UI: if the value looks like a URL, automatically generate a link. [#25965](https://github.com/ClickHouse/ClickHouse/pull/25965) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Make `sudo service clickhouse-server start` work on systems with `systemd`, like CentOS 8. Close [#14298](https://github.com/ClickHouse/ClickHouse/issues/14298). Close [#17799](https://github.com/ClickHouse/ClickHouse/issues/17799). [#25921](https://github.com/ClickHouse/ClickHouse/pull/25921) ([alexey-milovidov](https://github.com/alexey-milovidov)).
#### Bug Fixes
* Fix incorrect `SET ROLE` in some cases. [#26707](https://github.com/ClickHouse/ClickHouse/pull/26707) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix potential `nullptr` dereference in window functions. Fix [#25276](https://github.com/ClickHouse/ClickHouse/issues/25276). [#26668](https://github.com/ClickHouse/ClickHouse/pull/26668) ([Alexander Kuzmenkov](https://github.com/akuzm)).
* Fix incorrect function names of `groupBitmapAnd/Or/Xor`. Fix [#26557](https://github.com/ClickHouse/ClickHouse/pull/26557) ([Amos Bird](https://github.com/amosbird)).
* Fix crash in RabbitMQ shutdown in case RabbitMQ setup was not started. Closes [#26504](https://github.com/ClickHouse/ClickHouse/issues/26504). [#26529](https://github.com/ClickHouse/ClickHouse/pull/26529) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix issues with `CREATE DICTIONARY` query if dictionary name or database name was quoted. Closes [#26491](https://github.com/ClickHouse/ClickHouse/issues/26491). [#26508](https://github.com/ClickHouse/ClickHouse/pull/26508) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix broken name resolution after rewriting column aliases. Fix [#26432](https://github.com/ClickHouse/ClickHouse/issues/26432). [#26475](https://github.com/ClickHouse/ClickHouse/pull/26475) ([Amos Bird](https://github.com/amosbird)).
* Fix infinite non-joined block stream in `partial_merge_join` close [#26325](https://github.com/ClickHouse/ClickHouse/issues/26325). [#26374](https://github.com/ClickHouse/ClickHouse/pull/26374) ([Vladimir C](https://github.com/vdimir)).
* Fix possible crash when logging in as a dropped user. Fix [#26073](https://github.com/ClickHouse/ClickHouse/issues/26073). [#26363](https://github.com/ClickHouse/ClickHouse/pull/26363) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix `optimize_distributed_group_by_sharding_key` for multiple columns (leads to incorrect result w/ `optimize_skip_unused_shards=1`/`allow_nondeterministic_optimize_skip_unused_shards=1` and multiple columns in sharding key expression). [#26353](https://github.com/ClickHouse/ClickHouse/pull/26353) ([Azat Khuzhin](https://github.com/azat)).
* `CAST` from `Date` to `DateTime` (or `DateTime64`) was not using the timezone of the `DateTime` type. It can also affect the comparison between `Date` and `DateTime`. Inference of the common type for `Date` and `DateTime` also was not using the corresponding timezone. It affected the results of function `if` and array construction. Closes [#24128](https://github.com/ClickHouse/ClickHouse/issues/24128). [#24129](https://github.com/ClickHouse/ClickHouse/pull/24129) ([Maksim Kita](https://github.com/kitaisreal)).
* Fixed rare bug in lost replica recovery that may cause replicas to diverge. [#26321](https://github.com/ClickHouse/ClickHouse/pull/26321) ([tavplubix](https://github.com/tavplubix)).
* Fix zstd decompression in case there are escape sequences at the end of internal buffer. Closes [#26013](https://github.com/ClickHouse/ClickHouse/issues/26013). [#26314](https://github.com/ClickHouse/ClickHouse/pull/26314) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix logical error on join with totals, close [#26017](https://github.com/ClickHouse/ClickHouse/issues/26017). [#26250](https://github.com/ClickHouse/ClickHouse/pull/26250) ([Vladimir C](https://github.com/vdimir)).
* Remove excessive newline in `thread_name` column in `system.stack_trace` table. Fix [#24124](https://github.com/ClickHouse/ClickHouse/issues/24124). [#26210](https://github.com/ClickHouse/ClickHouse/pull/26210) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix `joinGet` with `LowCardinality` columns, close [#25993](https://github.com/ClickHouse/ClickHouse/issues/25993). [#26118](https://github.com/ClickHouse/ClickHouse/pull/26118) ([Vladimir C](https://github.com/vdimir)).
* Fix possible crash in `pointInPolygon` if the setting `validate_polygons` is turned off. [#26113](https://github.com/ClickHouse/ClickHouse/pull/26113) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix throwing an exception when iterating over a non-existing remote directory. [#26087](https://github.com/ClickHouse/ClickHouse/pull/26087) ([ianton-ru](https://github.com/ianton-ru)).
* Fix rare server crash because of `abort` in ZooKeeper client. Fixes [#25813](https://github.com/ClickHouse/ClickHouse/issues/25813). [#26079](https://github.com/ClickHouse/ClickHouse/pull/26079) ([alesapin](https://github.com/alesapin)).
* Fix wrong thread count estimation for right subquery join in some cases. Close [#24075](https://github.com/ClickHouse/ClickHouse/issues/24075). [#26052](https://github.com/ClickHouse/ClickHouse/pull/26052) ([Vladimir C](https://github.com/vdimir)).
* Fixed incorrect `sequence_id` in MySQL protocol packets that ClickHouse sends on exception during query execution. It might cause MySQL client to reset connection to ClickHouse server. Fixes [#21184](https://github.com/ClickHouse/ClickHouse/issues/21184). [#26051](https://github.com/ClickHouse/ClickHouse/pull/26051) ([tavplubix](https://github.com/tavplubix)).
* Fix possible mismatched header when using normal projection with `PREWHERE`. Fix [#26020](https://github.com/ClickHouse/ClickHouse/issues/26020). [#26038](https://github.com/ClickHouse/ClickHouse/pull/26038) ([Amos Bird](https://github.com/amosbird)).
* Fix formatting of type `Map` with integer keys to `JSON`. [#25982](https://github.com/ClickHouse/ClickHouse/pull/25982) ([Anton Popov](https://github.com/CurtizJ)).
* Fix possible deadlock during query profiler stack unwinding. Fix [#25968](https://github.com/ClickHouse/ClickHouse/issues/25968). [#25970](https://github.com/ClickHouse/ClickHouse/pull/25970) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix crash on call `dictGet()` with bad arguments. [#25913](https://github.com/ClickHouse/ClickHouse/pull/25913) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fixed `scram-sha-256` authentication for PostgreSQL engines. Closes [#24516](https://github.com/ClickHouse/ClickHouse/issues/24516). [#25906](https://github.com/ClickHouse/ClickHouse/pull/25906) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix extremely long backoff for background tasks when the background pool is full. Fixes [#25836](https://github.com/ClickHouse/ClickHouse/issues/25836). [#25893](https://github.com/ClickHouse/ClickHouse/pull/25893) ([alesapin](https://github.com/alesapin)).
* Fix ARM exception handling with non-default page size. Fixes [#25512](https://github.com/ClickHouse/ClickHouse/issues/25512), [#25044](https://github.com/ClickHouse/ClickHouse/issues/25044), [#24901](https://github.com/ClickHouse/ClickHouse/issues/24901), [#23183](https://github.com/ClickHouse/ClickHouse/issues/23183), [#20221](https://github.com/ClickHouse/ClickHouse/issues/20221), [#19703](https://github.com/ClickHouse/ClickHouse/issues/19703), [#19028](https://github.com/ClickHouse/ClickHouse/issues/19028), [#18391](https://github.com/ClickHouse/ClickHouse/issues/18391), [#18121](https://github.com/ClickHouse/ClickHouse/issues/18121), [#17994](https://github.com/ClickHouse/ClickHouse/issues/17994), [#12483](https://github.com/ClickHouse/ClickHouse/issues/12483). [#25854](https://github.com/ClickHouse/ClickHouse/pull/25854) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix sharding_key from column w/o function for `remote()` (before `select * from remote('127.1', system.one, dummy)` leads to `Unknown column: dummy, there are only columns .` error). [#25824](https://github.com/ClickHouse/ClickHouse/pull/25824) ([Azat Khuzhin](https://github.com/azat)).
* Fixed `Not found column ...` and `Missing column ...` errors when selecting from `MaterializeMySQL`. Fixes [#23708](https://github.com/ClickHouse/ClickHouse/issues/23708), [#24830](https://github.com/ClickHouse/ClickHouse/issues/24830), [#25794](https://github.com/ClickHouse/ClickHouse/issues/25794). [#25822](https://github.com/ClickHouse/ClickHouse/pull/25822) ([tavplubix](https://github.com/tavplubix)).
* Fix `optimize_skip_unused_shards_rewrite_in` for non-UInt64 types (may select incorrect shards eventually or throw `Cannot infer type of an empty tuple` or `Function tuple requires at least one argument`). [#25798](https://github.com/ClickHouse/ClickHouse/pull/25798) ([Azat Khuzhin](https://github.com/azat)).
* Fix rare bug with `DROP PART` query for `ReplicatedMergeTree` tables which can lead to error message `Unexpected merged part intersecting drop range`. [#25783](https://github.com/ClickHouse/ClickHouse/pull/25783) ([alesapin](https://github.com/alesapin)).
* Fix a bug in `TTL` with a `GROUP BY` expression which refuses to execute `TTL` after the first execution in a part. [#25743](https://github.com/ClickHouse/ClickHouse/pull/25743) ([alesapin](https://github.com/alesapin)).
* Allow StorageMerge to access tables with aliases. Closes [#6051](https://github.com/ClickHouse/ClickHouse/issues/6051). [#25694](https://github.com/ClickHouse/ClickHouse/pull/25694) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix slow dict join in some cases, close [#24209](https://github.com/ClickHouse/ClickHouse/issues/24209). [#25618](https://github.com/ClickHouse/ClickHouse/pull/25618) ([Vladimir C](https://github.com/vdimir)).
* Fix `ALTER MODIFY COLUMN` of columns, which participates in TTL expressions. [#25554](https://github.com/ClickHouse/ClickHouse/pull/25554) ([Anton Popov](https://github.com/CurtizJ)).
* Fix assertion in `PREWHERE` with non-UInt8 type, close [#19589](https://github.com/ClickHouse/ClickHouse/issues/19589). [#25484](https://github.com/ClickHouse/ClickHouse/pull/25484) ([Vladimir C](https://github.com/vdimir)).
* Fix some fuzzed MSan crashes. Fixes [#22517](https://github.com/ClickHouse/ClickHouse/issues/22517). [#26428](https://github.com/ClickHouse/ClickHouse/pull/26428) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Update the `chown` cmd check in the `clickhouse-server` docker entrypoint. It fixes the error 'cluster pod restart failed (or timeout)' on Kubernetes. [#26545](https://github.com/ClickHouse/ClickHouse/pull/26545) ([Ky Li](https://github.com/Kylinrix)).
### ClickHouse release v21.7, 2021-07-09
#### Backward Incompatible Change
@ -1183,13 +1282,6 @@
* PODArray: Avoid call to memcpy with (nullptr, 0) arguments (Fix UBSan report). This fixes [#18525](https://github.com/ClickHouse/ClickHouse/issues/18525). [#18526](https://github.com/ClickHouse/ClickHouse/pull/18526) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Minor improvement for path concatenation of zookeeper paths inside DDLWorker. [#17767](https://github.com/ClickHouse/ClickHouse/pull/17767) ([Bharat Nallan](https://github.com/bharatnc)).
* Allow to reload symbols from debug file. This PR also fixes a build-id issue. [#17637](https://github.com/ClickHouse/ClickHouse/pull/17637) ([Amos Bird](https://github.com/amosbird)).
* TestFlows: fixes to LDAP tests that fail due to slow test execution. [#18790](https://github.com/ClickHouse/ClickHouse/pull/18790) ([vzakaznikov](https://github.com/vzakaznikov)).
* TestFlows: Merging requirements for AES encryption functions. Updating aes_encryption tests to use new requirements. Updating TestFlows version to 1.6.72. [#18221](https://github.com/ClickHouse/ClickHouse/pull/18221) ([vzakaznikov](https://github.com/vzakaznikov)).
* TestFlows: Updating TestFlows version to the latest 1.6.72. Re-generating requirements.py. [#18208](https://github.com/ClickHouse/ClickHouse/pull/18208) ([vzakaznikov](https://github.com/vzakaznikov)).
* TestFlows: Updating TestFlows README.md to include "How To Debug Why Test Failed" section. [#17808](https://github.com/ClickHouse/ClickHouse/pull/17808) ([vzakaznikov](https://github.com/vzakaznikov)).
* TestFlows: tests for RBAC [ACCESS MANAGEMENT](https://clickhouse.tech/docs/en/sql-reference/statements/grant/#grant-access-management) privileges. [#17804](https://github.com/ClickHouse/ClickHouse/pull/17804) ([MyroTk](https://github.com/MyroTk)).
* TestFlows: RBAC tests for SHOW, TRUNCATE, KILL, and OPTIMIZE. - Updates to old tests. - Resolved comments from #https://github.com/ClickHouse/ClickHouse/pull/16977. [#17657](https://github.com/ClickHouse/ClickHouse/pull/17657) ([MyroTk](https://github.com/MyroTk)).
* TestFlows: Added RBAC tests for `ATTACH`, `CREATE`, `DROP`, and `DETACH`. [#16977](https://github.com/ClickHouse/ClickHouse/pull/16977) ([MyroTk](https://github.com/MyroTk)).
## [Changelog for 2020](https://github.com/ClickHouse/ClickHouse/blob/master/docs/en/whats-new/changelog/2020.md)


@ -13,3 +13,6 @@ ClickHouse® is an open-source column-oriented database management system that a
* [Code Browser](https://clickhouse.tech/codebrowser/html_report/ClickHouse/index.html) with syntax highlight and navigation.
* [Contacts](https://clickhouse.tech/#contacts) can help to get your questions answered if there are any.
* You can also [fill this form](https://clickhouse.tech/#meet) to meet Yandex ClickHouse team in person.
## Upcoming Events
* [SF Bay Area ClickHouse August Community Meetup (online)](https://www.meetup.com/San-Francisco-Bay-Area-ClickHouse-Meetup/events/279109379/) on 25 August 2021.


@ -60,6 +60,7 @@ DateLUTImpl::DateLUTImpl(const std::string & time_zone_)
offset_at_start_of_epoch = cctz_time_zone.lookup(cctz_time_zone.lookup(epoch).pre).offset;
offset_at_start_of_lut = cctz_time_zone.lookup(cctz_time_zone.lookup(lut_start).pre).offset;
offset_is_whole_number_of_hours_during_epoch = true;
offset_is_whole_number_of_minutes_during_epoch = true;
cctz::civil_day date = lut_start;
@ -108,6 +109,9 @@ DateLUTImpl::DateLUTImpl(const std::string & time_zone_)
if (offset_is_whole_number_of_hours_during_epoch && start_of_day > 0 && start_of_day % 3600)
offset_is_whole_number_of_hours_during_epoch = false;
if (offset_is_whole_number_of_minutes_during_epoch && start_of_day > 0 && start_of_day % 60)
offset_is_whole_number_of_minutes_during_epoch = false;
/// If UTC offset was changed this day.
/// Change in time zone without transition is possible, e.g. Moscow 1991 Sun, 31 Mar, 02:00 MSK to EEST
cctz::time_zone::civil_transition transition{};


@ -193,6 +193,7 @@ private:
/// UTC offset at the beginning of the first supported year.
Time offset_at_start_of_lut;
bool offset_is_whole_number_of_hours_during_epoch;
bool offset_is_whole_number_of_minutes_during_epoch;
/// Time zone name.
std::string time_zone;
@ -251,18 +252,23 @@ private:
}
template <typename T, typename Divisor>
static inline T roundDown(T x, Divisor divisor)
inline T roundDown(T x, Divisor divisor) const
{
static_assert(std::is_integral_v<T> && std::is_integral_v<Divisor>);
assert(divisor > 0);
if (likely(x >= 0))
return x / divisor * divisor;
if (likely(offset_is_whole_number_of_hours_during_epoch))
{
if (likely(x >= 0))
return x / divisor * divisor;
/// Integer division for negative numbers rounds them towards zero (up).
/// We will shift the number so it will be rounded towards -inf (down).
/// Integer division for negative numbers rounds them towards zero (up).
/// We will shift the number so it will be rounded towards -inf (down).
return (x + 1 - divisor) / divisor * divisor;
}
return (x + 1 - divisor) / divisor * divisor;
Time date = find(x).date;
return date + (x - date) / divisor * divisor;
}
public:
@ -459,10 +465,21 @@ public:
inline unsigned toSecond(Time t) const
{
auto res = t % 60;
if (likely(res >= 0))
return res;
return res + 60;
if (likely(offset_is_whole_number_of_minutes_during_epoch))
{
Time res = t % 60;
if (likely(res >= 0))
return res;
return res + 60;
}
LUTIndex index = findIndex(t);
Time time = t - lut[index].date;
if (time >= lut[index].time_at_offset_change())
time += lut[index].amount_of_offset_change();
return time % 60;
}
inline unsigned toMinute(Time t) const
@ -483,29 +500,11 @@ public:
}
/// NOTE: Assuming timezone offset is a multiple of 15 minutes.
inline Time toStartOfMinute(Time t) const { return roundDown(t, 60); }
inline Time toStartOfFiveMinute(Time t) const { return roundDown(t, 300); }
inline Time toStartOfFifteenMinutes(Time t) const { return roundDown(t, 900); }
inline Time toStartOfTenMinutes(Time t) const
{
if (t >= 0 && offset_is_whole_number_of_hours_during_epoch)
return t / 600 * 600;
/// More complex logic is for Nepal - it has offset 05:45. Australia/Eucla is also unfortunate.
Time date = find(t).date;
return date + (t - date) / 600 * 600;
}
/// NOTE: Assuming timezone transitions are multiple of hours. Lord Howe Island in Australia is a notable exception.
inline Time toStartOfHour(Time t) const
{
if (t >= 0 && offset_is_whole_number_of_hours_during_epoch)
return t / 3600 * 3600;
Time date = find(t).date;
return date + (t - date) / 3600 * 3600;
}
inline Time toStartOfMinute(Time t) const { return toStartOfMinuteInterval(t, 1); }
inline Time toStartOfFiveMinute(Time t) const { return toStartOfMinuteInterval(t, 5); }
inline Time toStartOfFifteenMinutes(Time t) const { return toStartOfMinuteInterval(t, 15); }
inline Time toStartOfTenMinutes(Time t) const { return toStartOfMinuteInterval(t, 10); }
inline Time toStartOfHour(Time t) const { return roundDown(t, 3600); }
/** Number of calendar day since the beginning of UNIX epoch (1970-01-01 is zero)
* We use just two bytes for it. It covers the range up to 2105 and slightly more.
@ -903,25 +902,24 @@ public:
inline Time toStartOfMinuteInterval(Time t, UInt64 minutes) const
{
if (minutes == 1)
return toStartOfMinute(t);
UInt64 divisor = 60 * minutes;
if (likely(offset_is_whole_number_of_minutes_during_epoch))
{
if (likely(t >= 0))
return t / divisor * divisor;
return (t + 1 - divisor) / divisor * divisor;
}
/** In contrast to "toStartOfHourInterval" function above,
* the minute intervals are not aligned to the midnight.
* You will get unexpected results if for example, you round down to 60 minute interval
* and there was a time shift to 30 minutes.
*
* But this is not specified in docs and can be changed in future.
*/
UInt64 seconds = 60 * minutes;
return roundDown(t, seconds);
Time date = find(t).date;
return date + (t - date) / divisor * divisor;
}
inline Time toStartOfSecondInterval(Time t, UInt64 seconds) const
{
if (seconds == 1)
return t;
if (seconds % 60 == 0)
return toStartOfMinuteInterval(t, seconds / 60);
return roundDown(t, seconds);
}
@ -955,7 +953,7 @@ public:
inline Time makeDateTime(Int16 year, UInt8 month, UInt8 day_of_month, UInt8 hour, UInt8 minute, UInt8 second) const
{
size_t index = makeLUTIndex(year, month, day_of_month);
UInt32 time_offset = hour * 3600 + minute * 60 + second;
Time time_offset = hour * 3600 + minute * 60 + second;
if (time_offset >= lut[index].time_at_offset_change())
time_offset -= lut[index].amount_of_offset_change();


@ -1,4 +1,5 @@
#include <sys/auxv.h>
#include "atomic.h"
#include <unistd.h> // __environ
#include <errno.h>
@ -17,18 +18,7 @@ static size_t __find_auxv(unsigned long type)
return (size_t) -1;
}
__attribute__((constructor)) static void __auxv_init()
{
size_t i;
for (i = 0; __environ[i]; i++);
__auxv = (unsigned long *) (__environ + i + 1);
size_t secure_idx = __find_auxv(AT_SECURE);
if (secure_idx != ((size_t) -1))
__auxv_secure = __auxv[secure_idx];
}
unsigned long getauxval(unsigned long type)
unsigned long __getauxval(unsigned long type)
{
if (type == AT_SECURE)
return __auxv_secure;
@ -43,3 +33,38 @@ unsigned long getauxval(unsigned long type)
errno = ENOENT;
return 0;
}
static void * volatile getauxval_func;
static unsigned long __auxv_init(unsigned long type)
{
if (!__environ)
{
// __environ is not initialized yet so we can't initialize __auxv right now.
// That's normally occurred only when getauxval() is called from some sanitizer's internal code.
errno = ENOENT;
return 0;
}
// Initialize __auxv and __auxv_secure.
size_t i;
for (i = 0; __environ[i]; i++);
__auxv = (unsigned long *) (__environ + i + 1);
size_t secure_idx = __find_auxv(AT_SECURE);
if (secure_idx != ((size_t) -1))
__auxv_secure = __auxv[secure_idx];
// Now we've initialized __auxv, next time getauxval() will only call __get_auxval().
a_cas_p(&getauxval_func, (void *)__auxv_init, (void *)__getauxval);
return __getauxval(type);
}
// First time getauxval() will call __auxv_init().
static void * volatile getauxval_func = (void *)__auxv_init;
unsigned long getauxval(unsigned long type)
{
return ((unsigned long (*)(unsigned long))getauxval_func)(type);
}


@ -296,7 +296,7 @@ void Pool::initialize()
Pool::Connection * Pool::allocConnection(bool dont_throw_if_failed_first_time)
{
std::unique_ptr<Connection> conn_ptr{new Connection};
std::unique_ptr conn_ptr = std::make_unique<Connection>();
try
{


@ -26,17 +26,14 @@ target_include_directories(roaring SYSTEM BEFORE PUBLIC "${LIBRARY_DIR}/include"
target_include_directories(roaring SYSTEM BEFORE PUBLIC "${LIBRARY_DIR}/cpp")
# We redirect malloc/free family of functions to different functions that will track memory in ClickHouse.
# It will make this library depend on linking to 'clickhouse_common_io' library that is not done explicitly via 'target_link_libraries'.
# And we check that all libraries dependencies are satisfied and all symbols are resolved if we do build with shared libraries.
# That's why we enable it only in static build.
# Also note that we exploit implicit function declarations.
if (USE_STATIC_LIBRARIES)
target_compile_definitions(roaring PRIVATE
target_compile_definitions(roaring PRIVATE
-Dmalloc=clickhouse_malloc
-Dcalloc=clickhouse_calloc
-Drealloc=clickhouse_realloc
-Dreallocarray=clickhouse_reallocarray
-Dfree=clickhouse_free
-Dposix_memalign=clickhouse_posix_memalign)
endif ()
target_link_libraries(roaring PUBLIC clickhouse_common_io)


@ -155,6 +155,10 @@ Normally ClickHouse is statically linked into a single static `clickhouse` binar
-DUSE_STATIC_LIBRARIES=0 -DSPLIT_SHARED_LIBRARIES=1 -DCLICKHOUSE_SPLIT_BINARY=1
```
Note that in this configuration there is no single `clickhouse` binary, and you have to run `clickhouse-server`, `clickhouse-client` etc.
Note that the split build has several drawbacks:
* There is no single `clickhouse` binary, and you have to run `clickhouse-server`, `clickhouse-client`, etc.
* Risk of segfault if you run any of the programs while rebuilding the project.
* You cannot run the integration tests since they only work with a single complete binary.
* You can't easily copy the binaries elsewhere. Instead of moving a single binary you'll need to copy all binaries and libraries.
[Original article](https://clickhouse.tech/docs/en/development/build/) <!--hide-->


@ -1,6 +1,6 @@
---
toc_priority: 29
toc_title: MaterializedMySQL
toc_title: "[experimental] MaterializedMySQL"
---
# [experimental] MaterializedMySQL {#materialized-mysql}
@ -27,28 +27,33 @@ ENGINE = MaterializedMySQL('host:port', ['database' | database], 'user', 'passwo
- `password` — User password.
**Engine Settings**
- `max_rows_in_buffer` — Max rows that data is allowed to cache in memory(for single table and the cache data unable to query). when rows is exceeded, the data will be materialized. Default: `65505`.
- `max_bytes_in_buffer` — Max bytes that data is allowed to cache in memory(for single table and the cache data unable to query). when rows is exceeded, the data will be materialized. Default: `1048576`.
- `max_rows_in_buffers` — Max rows that data is allowed to cache in memory(for database and the cache data unable to query). when rows is exceeded, the data will be materialized. Default: `65505`.
- `max_bytes_in_buffers` — Max bytes that data is allowed to cache in memory(for database and the cache data unable to query). when rows is exceeded, the data will be materialized. Default: `1048576`.
- `max_flush_data_time` — Max milliseconds that data is allowed to cache in memory(for database and the cache data unable to query). when this time is exceeded, the data will be materialized. Default: `1000`.
- `max_wait_time_when_mysql_unavailable` — Retry interval when MySQL is not available (milliseconds). Negative value disable retry. Default: `1000`.
- `allows_query_when_mysql_lost` — Allow query materialized table when mysql is lost. Default: `0` (`false`).
```
- `max_rows_in_buffer` — Maximum number of rows that data is allowed to cache in memory (for single table and the cache data unable to query). When this number is exceeded, the data will be materialized. Default: `65 505`.
- `max_bytes_in_buffer` — Maximum number of bytes that data is allowed to cache in memory (for single table and the cache data unable to query). When this number is exceeded, the data will be materialized. Default: `1 048 576`.
- `max_rows_in_buffers` — Maximum number of rows that data is allowed to cache in memory (for database and the cache data unable to query). When this number is exceeded, the data will be materialized. Default: `65 505`.
- `max_bytes_in_buffers` — Maximum number of bytes that data is allowed to cache in memory (for database and the cache data unable to query). When this number is exceeded, the data will be materialized. Default: `1 048 576`.
- `max_flush_data_time` — Maximum number of milliseconds that data is allowed to cache in memory (for database and the cache data unable to query). When this time is exceeded, the data will be materialized. Default: `1000`.
- `max_wait_time_when_mysql_unavailable` — Retry interval when MySQL is not available (milliseconds). Negative value disables retry. Default: `1000`.
- `allows_query_when_mysql_lost` — Allows to query a materialized table when MySQL is lost. Default: `0` (`false`).
```sql
CREATE DATABASE mysql ENGINE = MaterializedMySQL('localhost:3306', 'db', 'user', '***')
SETTINGS
allows_query_when_mysql_lost=true,
max_wait_time_when_mysql_unavailable=10000;
```
**Settings on MySQL-server side**
**Settings on MySQL-server Side**
For the correct work of `MaterializedMySQL`, there are few mandatory `MySQL`-side configuration settings that should be set:
For the correct work of `MaterializedMySQL`, there are a few mandatory `MySQL`-side configuration settings that must be set:
- `default_authentication_plugin = mysql_native_password` since `MaterializedMySQL` can only authorize with this method.
- `gtid_mode = on` since GTID based logging is a mandatory for providing correct `MaterializedMySQL` replication. Pay attention that while turning this mode `On` you should also specify `enforce_gtid_consistency = on`.
- `gtid_mode = on` since GTID-based logging is mandatory for providing correct `MaterializedMySQL` replication.
## Virtual columns {#virtual-columns}
!!! attention "Attention"
While turning on `gtid_mode` you should also specify `enforce_gtid_consistency = on`.
## Virtual Columns {#virtual-columns}
When working with the `MaterializedMySQL` database engine, [ReplacingMergeTree](../../engines/table-engines/mergetree-family/replacingmergetree.md) tables are used with virtual `_sign` and `_version` columns.
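As a quick illustration (a hedged sketch — the `mysql_db` database and `orders` table names are hypothetical), the virtual columns can be read by naming them explicitly:

``` sql
-- _sign and _version are the virtual columns provided by MaterializedMySQL
-- on top of the underlying ReplacingMergeTree tables.
SELECT id, _sign, _version
FROM mysql_db.orders
ORDER BY id;
```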
@ -78,13 +83,13 @@ When working with the `MaterializedMySQL` database engine, [ReplacingMergeTree](
| BLOB | [String](../../sql-reference/data-types/string.md) |
| BINARY | [FixedString](../../sql-reference/data-types/fixedstring.md) |
Other types are not supported. If MySQL table contains a column of such type, ClickHouse throws exception "Unhandled data type" and stops replication.
[Nullable](../../sql-reference/data-types/nullable.md) is supported.
Other types are not supported. If MySQL table contains a column of such type, ClickHouse throws exception "Unhandled data type" and stops replication.
## Specifics and Recommendations {#specifics-and-recommendations}
### Compatibility restrictions
### Compatibility Restrictions {#compatibility-restrictions}
Apart from the data type limitations, there are a few restrictions compared to `MySQL` databases that should be resolved before replication is possible:


@ -39,7 +39,10 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [TTL expr2],
...
INDEX index_name1 expr1 TYPE type1(...) GRANULARITY value1,
INDEX index_name2 expr2 TYPE type2(...) GRANULARITY value2
INDEX index_name2 expr2 TYPE type2(...) GRANULARITY value2,
...
PROJECTION projection_name_1 (SELECT <COLUMN LIST EXPR> [GROUP BY] [ORDER BY]),
PROJECTION projection_name_2 (SELECT <COLUMN LIST EXPR> [GROUP BY] [ORDER BY])
) ENGINE = MergeTree()
ORDER BY expr
[PARTITION BY expr]
@ -385,6 +388,24 @@ Functions with a constant argument that is less than ngram size cant be used
- `s != 1`
- `NOT startsWith(s, 'test')`
### Projections {#projections}
Projections are like materialized views but defined at the part level. They provide consistency guarantees along with automatic usage in queries.
#### Query {#projection-query}
A projection query is what defines a projection. It has the following grammar:
`SELECT <COLUMN LIST EXPR> [GROUP BY] [ORDER BY]`
It implicitly selects data from the parent table.
#### Storage {#projection-storage}
Projections are stored inside the part directory. It's similar to an index but contains a subdirectory that stores an anonymous MergeTree table's part. The table is induced by the definition query of the projection. If there is a GROUP BY clause, the underlying storage engine becomes AggregatingMergeTree, and all aggregate functions are converted to AggregateFunction. If there is an ORDER BY clause, the MergeTree table will use it as its primary key expression. During the merge process, the projection part is merged via its storage's merge routine. The checksum of the parent table's part combines the projection's part. Other maintenance jobs are similar to those for skip indices.
#### Query Analysis {#projection-query-analysis}
1. Check if the projection can be used to answer the given query, that is, it generates the same answer as querying the base table.
2. Select the best feasible match, which contains the fewest granules to read.
3. The query pipeline which uses projections will be different from the one that uses the original parts. If the projection is absent in some parts, we can add the pipeline to "project" it on the fly.
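For illustration, a minimal sketch of declaring and using a projection. The table, column, and projection names are made up for this example, and reading from projections may require enabling the `allow_experimental_projection_optimization` setting in this release:

``` sql
CREATE TABLE visits
(
    user_id UInt64,
    url String,
    duration UInt32,
    -- Pre-aggregate duration by user_id inside every part.
    PROJECTION user_totals
    (
        SELECT user_id, sum(duration)
        GROUP BY user_id
    )
)
ENGINE = MergeTree()
ORDER BY url;

-- This aggregation can be answered from the projection parts
-- instead of scanning the base columns.
SELECT user_id, sum(duration)
FROM visits
GROUP BY user_id;
```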
## Concurrent Data Access {#concurrent-data-access}
For concurrent table access, we use multi-versioning. In other words, when a table is simultaneously read and updated, data is read from a set of parts that is current at the time of the query. There are no lengthy locks. Inserts do not get in the way of read operations.


@ -892,6 +892,33 @@ If the table does not exist, ClickHouse will create it. If the structure of the
</query_thread_log>
```
## query_views_log {#server_configuration_parameters-query_views_log}
Setting for logging views that are dependent on queries received with the [log_query_views=1](../../operations/settings/settings.md#settings-log-query-views) setting.
Queries are logged in the [system.query_views_log](../../operations/system-tables/query_views_log.md#system_tables-query_views_log) table, not in a separate file. You can change the name of the table in the `table` parameter (see below).
Use the following parameters to configure logging:
- `database` — Name of the database.
- `table` — Name of the system table the queries will be logged in.
- `partition_by` — [Custom partitioning key](../../engines/table-engines/mergetree-family/custom-partitioning-key.md) for a system table. Can't be used if `engine` is defined.
- `engine` — [MergeTree Engine Definition](../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-creating-a-table) for a system table. Can't be used if `partition_by` is defined.
- `flush_interval_milliseconds` — Interval for flushing data from the buffer in memory to the table.
If the table does not exist, ClickHouse will create it. If the structure of the query views log changed when the ClickHouse server was updated, the table with the old structure is renamed, and a new table is created automatically.
**Example**
``` xml
<query_views_log>
<database>system</database>
<table>query_views_log</table>
<partition_by>toYYYYMM(event_date)</partition_by>
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
</query_views_log>
```
## text_log {#server_configuration_parameters-text_log}
Settings for the [text_log](../../operations/system-tables/text_log.md#system_tables-text_log) system table for logging text messages.


@ -890,7 +890,7 @@ log_queries_min_type='EXCEPTION_WHILE_PROCESSING'
Setting up query threads logging.
Queries threads runned by ClickHouse with this setup are logged according to the rules in the [query_thread_log](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-query_thread_log) server configuration parameter.
Query threads run by ClickHouse with this setup are logged according to the rules in the [query_thread_log](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-query_thread_log) server configuration parameter.
Example:
@ -898,6 +898,19 @@ Example:
log_query_threads=1
```
## log_query_views {#settings-log-query-views}
Setting up query views logging.
When a query run by ClickHouse with this setting enabled has associated views (materialized or live views), they are logged in the table specified by the [query_views_log](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-query_views_log) server configuration parameter.
Example:
``` text
log_query_views=1
```
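A minimal end-to-end sketch (the `source_table` name and the presence of a materialized view over it are assumptions for this example):

``` sql
SET log_query_views = 1;
INSERT INTO source_table VALUES (1);  -- assumes a materialized view reads from source_table
SYSTEM FLUSH LOGS;
SELECT view_name, view_type, view_duration_ms
FROM system.query_views_log
ORDER BY event_time DESC
LIMIT 5;
```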
## log_comment {#settings-log-comment}
Specifies the value for the `log_comment` field of the [system.query_log](../system-tables/query_log.md) table and comment text for the server log.


@ -50,6 +50,7 @@ Columns:
- `query_kind` ([LowCardinality(String)](../../sql-reference/data-types/lowcardinality.md)) — Type of the query.
- `databases` ([Array](../../sql-reference/data-types/array.md)([LowCardinality(String)](../../sql-reference/data-types/lowcardinality.md))) — Names of the databases present in the query.
- `tables` ([Array](../../sql-reference/data-types/array.md)([LowCardinality(String)](../../sql-reference/data-types/lowcardinality.md))) — Names of the tables present in the query.
- `views` ([Array](../../sql-reference/data-types/array.md)([LowCardinality(String)](../../sql-reference/data-types/lowcardinality.md))) — Names of the (materialized or live) views present in the query.
- `columns` ([Array](../../sql-reference/data-types/array.md)([LowCardinality(String)](../../sql-reference/data-types/lowcardinality.md))) — Names of the columns present in the query.
- `projections` ([String](../../sql-reference/data-types/string.md)) — Names of the projections used during the query execution.
- `exception_code` ([Int32](../../sql-reference/data-types/int-uint.md)) — Code of an exception.
@ -180,5 +181,6 @@ used_table_functions: []
**See Also**
- [system.query_thread_log](../../operations/system-tables/query_thread_log.md#system_tables-query_thread_log) — This table contains information about each query execution thread.
- [system.query_views_log](../../operations/system-tables/query_views_log.md#system_tables-query_views_log) — This table contains information about each view executed during a query.
[Original article](https://clickhouse.tech/docs/en/operations/system-tables/query_log) <!--hide-->


@ -112,5 +112,6 @@ ProfileEvents: {'Query':1,'SelectQuery':1,'ReadCompressedBytes':36,'Compr
**See Also**
- [system.query_log](../../operations/system-tables/query_log.md#system_tables-query_log) — Description of the `query_log` system table which contains common information about queries execution.
- [system.query_views_log](../../operations/system-tables/query_views_log.md#system_tables-query_views_log) — This table contains information about each view executed during a query.
[Original article](https://clickhouse.tech/docs/en/operations/system-tables/query_thread_log) <!--hide-->


@ -0,0 +1,81 @@
# system.query_views_log {#system_tables-query_views_log}
Contains information about the dependent views executed when running a query, for example, the view type or the execution time.
To start logging:
1. Configure parameters in the [query_views_log](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-query_views_log) section.
2. Set [log_query_views](../../operations/settings/settings.md#settings-log-query-views) to 1.
The flushing period of data is set in `flush_interval_milliseconds` parameter of the [query_views_log](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-query_views_log) server settings section. To force flushing, use the [SYSTEM FLUSH LOGS](../../sql-reference/statements/system.md#query_language-system-flush_logs) query.
ClickHouse does not delete data from the table automatically. See [Introduction](../../operations/system-tables/index.md#system-tables-introduction) for more details.
Columns:
- `event_date` ([Date](../../sql-reference/data-types/date.md)) — The date when the last event of the view happened.
- `event_time` ([DateTime](../../sql-reference/data-types/datetime.md)) — The date and time when the view finished execution.
- `event_time_microseconds` ([DateTime](../../sql-reference/data-types/datetime.md)) — The date and time when the view finished execution with microseconds precision.
- `view_duration_ms` ([UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Duration of view execution (sum of its stages) in milliseconds.
- `initial_query_id` ([String](../../sql-reference/data-types/string.md)) — ID of the initial query (for distributed query execution).
- `view_name` ([String](../../sql-reference/data-types/string.md)) — Name of the view.
- `view_uuid` ([UUID](../../sql-reference/data-types/uuid.md)) — UUID of the view.
- `view_type` ([Enum8](../../sql-reference/data-types/enum.md)) — Type of the view. Values:
- `'Default' = 1` — [Default views](../../sql-reference/statements/create/view.md#normal). Should not appear in this log.
- `'Materialized' = 2` — [Materialized views](../../sql-reference/statements/create/view.md#materialized).
- `'Live' = 3` — [Live views](../../sql-reference/statements/create/view.md#live-view).
- `view_query` ([String](../../sql-reference/data-types/string.md)) — The query executed by the view.
- `view_target` ([String](../../sql-reference/data-types/string.md)) — The name of the view target table.
- `read_rows` ([UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Number of read rows.
- `read_bytes` ([UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Number of read bytes.
- `written_rows` ([UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Number of written rows.
- `written_bytes` ([UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Number of written bytes.
- `peak_memory_usage` ([Int64](../../sql-reference/data-types/int-uint.md)) — The maximum difference between the amount of allocated and freed memory in context of this view.
- `ProfileEvents` ([Map(String, UInt64)](../../sql-reference/data-types/array.md)) — ProfileEvents that measure different metrics. The description of them could be found in the table [system.events](../../operations/system-tables/events.md#system_tables-events).
- `status` ([Enum8](../../sql-reference/data-types/enum.md)) — Status of the view. Values:
- `'QueryStart' = 1` — Successful start of the view execution. Should not appear.
- `'QueryFinish' = 2` — Successful end of the view execution.
- `'ExceptionBeforeStart' = 3` — Exception before the start of the view execution.
- `'ExceptionWhileProcessing' = 4` — Exception during the view execution.
- `exception_code` ([Int32](../../sql-reference/data-types/int-uint.md)) — Code of an exception.
- `exception` ([String](../../sql-reference/data-types/string.md)) — Exception message.
- `stack_trace` ([String](../../sql-reference/data-types/string.md)) — [Stack trace](https://en.wikipedia.org/wiki/Stack_trace). An empty string, if the query was completed successfully.
**Example**
``` sql
SELECT * FROM system.query_views_log LIMIT 1 \G
```
``` text
Row 1:
──────
event_date: 2021-06-22
event_time: 2021-06-22 13:23:07
event_time_microseconds: 2021-06-22 13:23:07.738221
view_duration_ms: 0
initial_query_id: c3a1ac02-9cad-479b-af54-9e9c0a7afd70
view_name: default.matview_inner
view_uuid: 00000000-0000-0000-0000-000000000000
view_type: Materialized
view_query: SELECT * FROM default.table_b
view_target: default.`.inner.matview_inner`
read_rows: 4
read_bytes: 64
written_rows: 2
written_bytes: 32
peak_memory_usage: 4196188
ProfileEvents: {'FileOpen':2,'WriteBufferFromFileDescriptorWrite':2,'WriteBufferFromFileDescriptorWriteBytes':187,'IOBufferAllocs':3,'IOBufferAllocBytes':3145773,'FunctionExecute':3,'DiskWriteElapsedMicroseconds':13,'InsertedRows':2,'InsertedBytes':16,'SelectedRows':4,'SelectedBytes':48,'ContextLock':16,'RWLockAcquiredReadLocks':1,'RealTimeMicroseconds':698,'SoftPageFaults':4,'OSReadChars':463}
status: QueryFinish
exception_code: 0
exception:
stack_trace:
```
**See Also**
- [system.query_log](../../operations/system-tables/query_log.md#system_tables-query_log) — Description of the `query_log` system table which contains common information about queries execution.
- [system.query_thread_log](../../operations/system-tables/query_thread_log.md#system_tables-query_thread_log) — This table contains information about each query execution thread.
[Original article](https://clickhouse.tech/docs/en/operations/system-tables/query_views_log) <!--hide-->


@ -15,6 +15,6 @@ When creating tables, numeric parameters for string fields can be set (e.g. `VAR
ClickHouse does not have the concept of encodings. Strings can contain an arbitrary set of bytes, which are stored and output as-is.
If you need to store texts, we recommend using UTF-8 encoding. At the very least, if your terminal uses UTF-8 (as recommended), you can read and write your values without making conversions.
Similarly, certain functions for working with strings have separate variations that work under the assumption that the string contains a set of bytes representing a UTF-8 encoded text.
For example, the length function calculates the string length in bytes, while the lengthUTF8 function calculates the string length in Unicode code points, assuming that the value is UTF-8 encoded.
For example, the [length](../functions/string-functions.md#length) function calculates the string length in bytes, while the [lengthUTF8](../functions/string-functions.md#lengthutf8) function calculates the string length in Unicode code points, assuming that the value is UTF-8 encoded.
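For instance, a short illustration of the difference (assuming the value is valid UTF-8):

``` sql
SELECT length('абв') AS bytes, lengthUTF8('абв') AS code_points;
-- bytes = 6 (each Cyrillic letter takes 2 bytes in UTF-8), code_points = 3
```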
[Original article](https://clickhouse.tech/docs/en/data_types/string/) <!--hide-->


@ -7,19 +7,89 @@ toc_title: Arrays
## empty {#function-empty}
Returns 1 for an empty array, or 0 for a non-empty array.
The result type is UInt8.
The function also works for strings.
Checks whether the input array is empty.
Can be optimized by enabling the [optimize_functions_to_subcolumns](../../operations/settings/settings.md#optimize-functions-to-subcolumns) setting. With `optimize_functions_to_subcolumns = 1` the function reads only [size0](../../sql-reference/data-types/array.md#array-size) subcolumn instead of reading and processing the whole array column. The query `SELECT empty(arr) FROM table` transforms to `SELECT arr.size0 = 0 FROM TABLE`.
**Syntax**
``` sql
empty([x])
```
An array is considered empty if it does not contain any elements.
!!! note "Note"
Can be optimized by enabling the [optimize_functions_to_subcolumns](../../operations/settings/settings.md#optimize-functions-to-subcolumns) setting. With `optimize_functions_to_subcolumns = 1` the function reads only [size0](../../sql-reference/data-types/array.md#array-size) subcolumn instead of reading and processing the whole array column. The query `SELECT empty(arr) FROM TABLE;` transforms to `SELECT arr.size0 = 0 FROM TABLE;`.
The function also works for [strings](string-functions.md#empty) or [UUID](uuid-functions.md#empty).
**Arguments**
- `[x]` — Input array. [Array](../data-types/array.md).
**Returned value**
- Returns `1` for an empty array or `0` for a non-empty array.
Type: [UInt8](../data-types/int-uint.md).
**Example**
Query:
```sql
SELECT empty([]);
```
Result:
```text
┌─empty(array())─┐
│ 1 │
└────────────────┘
```
## notEmpty {#function-notempty}
Returns 0 for an empty array, or 1 for a non-empty array.
The result type is UInt8.
The function also works for strings.
Checks whether the input array is non-empty.
Can be optimized by enabling the [optimize_functions_to_subcolumns](../../operations/settings/settings.md#optimize-functions-to-subcolumns) setting. With `optimize_functions_to_subcolumns = 1` the function reads only [size0](../../sql-reference/data-types/array.md#array-size) subcolumn instead of reading and processing the whole array column. The query `SELECT notEmpty(arr) FROM table` transforms to `SELECT arr.size0 != 0 FROM TABLE`.
**Syntax**
``` sql
notEmpty([x])
```
An array is considered non-empty if it contains at least one element.
!!! note "Note"
Can be optimized by enabling the [optimize_functions_to_subcolumns](../../operations/settings/settings.md#optimize-functions-to-subcolumns) setting. With `optimize_functions_to_subcolumns = 1` the function reads only [size0](../../sql-reference/data-types/array.md#array-size) subcolumn instead of reading and processing the whole array column. The query `SELECT notEmpty(arr) FROM table` transforms to `SELECT arr.size0 != 0 FROM table`.
The function also works for [strings](string-functions.md#notempty) or [UUID](uuid-functions.md#notempty).
**Arguments**
- `[x]` — Input array. [Array](../data-types/array.md).
**Returned value**
- Returns `1` for a non-empty array or `0` for an empty array.
Type: [UInt8](../data-types/int-uint.md).
**Example**
Query:
```sql
SELECT notEmpty([1,2]);
```
Result:
```text
┌─notEmpty([1, 2])─┐
│ 1 │
└──────────────────┘
```
## length {#array_functions-length}

View File

@ -10,17 +10,83 @@ toc_title: Strings
## empty {#empty}
Returns 1 for an empty string or 0 for a non-empty string.
The result type is UInt8.
Checks whether the input string is empty.
**Syntax**
``` sql
empty(x)
```
A string is considered non-empty if it contains at least one byte, even if this is a space or a null byte.
The function also works for arrays or UUID.
UUID is empty if it is all zeros (nil UUID).
The function also works for [arrays](array-functions.md#function-empty) or [UUID](uuid-functions.md#empty).
**Arguments**
- `x` — Input value. [String](../data-types/string.md).
**Returned value**
- Returns `1` for an empty string or `0` for a non-empty string.
Type: [UInt8](../data-types/int-uint.md).
**Example**
Query:
```sql
SELECT empty('');
```
Result:
```text
┌─empty('')─┐
│ 1 │
└───────────┘
```
## notEmpty {#notempty}
Returns 0 for an empty string or 1 for a non-empty string.
The result type is UInt8.
The function also works for arrays or UUID.
Checks whether the input string is non-empty.
**Syntax**
``` sql
notEmpty(x)
```
A string is considered non-empty if it contains at least one byte, even if this is a space or a null byte.
The function also works for [arrays](array-functions.md#function-notempty) or [UUID](uuid-functions.md#notempty).
**Arguments**
- `x` — Input value. [String](../data-types/string.md).
**Returned value**
- Returns `1` for a non-empty string or `0` for an empty string.
Type: [UInt8](../data-types/int-uint.md).
**Example**
Query:
```sql
SELECT notEmpty('text');
```
Result:
```text
┌─notEmpty('text')─┐
│ 1 │
└──────────────────┘
```
## length {#length}
@ -43,6 +109,158 @@ The result type is UInt64.
Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it returns some result (it does not throw an exception).
The result type is UInt64.
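For example (a small illustration; the literal is assumed to be UTF-8 encoded):

``` sql
SELECT length('café') AS bytes, lengthUTF8('café') AS code_points;
-- bytes = 5, code_points = 4: the character 'é' is encoded as two bytes in UTF-8
```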
## leftPad {#leftpad}
Pads the current string from the left with spaces or a specified string (multiple times, if needed) until the resulting string reaches the given length. Similar to the MySQL `LPAD` function.
**Syntax**
``` sql
leftPad('string', 'length'[, 'pad_string'])
```
**Arguments**
- `string` — Input string that needs to be padded. [String](../data-types/string.md).
- `length` — The length of the resulting string. [UInt](../data-types/int-uint.md). If the value is less than the input string length, then the input string is returned as-is.
- `pad_string` — The string to pad the input string with. [String](../data-types/string.md). Optional. If not specified, then the input string is padded with spaces.
**Returned value**
- The resulting string of the given length.
Type: [String](../data-types/string.md).
**Example**
Query:
``` sql
SELECT leftPad('abc', 7, '*'), leftPad('def', 7);
```
Result:
``` text
┌─leftPad('abc', 7, '*')─┬─leftPad('def', 7)─┐
│ ****abc │ def │
└────────────────────────┴───────────────────┘
```
## leftPadUTF8 {#leftpadutf8}
Pads the current string from the left with spaces or a specified string (multiple times, if needed) until the resulting string reaches the given length. Similar to the MySQL `LPAD` function. While the [leftPad](#leftpad) function measures the length in bytes, `leftPadUTF8` measures it in code points.
**Syntax**
``` sql
leftPadUTF8('string','length'[, 'pad_string'])
```
**Arguments**
- `string` — Input string that needs to be padded. [String](../data-types/string.md).
- `length` — The length of the resulting string. [UInt](../data-types/int-uint.md). If the value is less than the input string length, then the input string is returned as-is.
- `pad_string` — The string to pad the input string with. [String](../data-types/string.md). Optional. If not specified, then the input string is padded with spaces.
**Returned value**
- The resulting string of the given length.
Type: [String](../data-types/string.md).
**Example**
Query:
``` sql
SELECT leftPadUTF8('абвг', 7, '*'), leftPadUTF8('дежз', 7);
```
Result:
``` text
┌─leftPadUTF8('абвг', 7, '*')─┬─leftPadUTF8('дежз', 7)─┐
│ ***абвг │ дежз │
└─────────────────────────────┴────────────────────────┘
```
## rightPad {#rightpad}
Pads the current string from the right with spaces or a specified string (multiple times, if needed) until the resulting string reaches the given length. Similar to the MySQL `RPAD` function.
**Syntax**
``` sql
rightPad('string', 'length'[, 'pad_string'])
```
**Arguments**
- `string` — Input string that needs to be padded. [String](../data-types/string.md).
- `length` — The length of the resulting string. [UInt](../data-types/int-uint.md). If the value is less than the input string length, then the input string is returned as-is.
- `pad_string` — The string to pad the input string with. [String](../data-types/string.md). Optional. If not specified, then the input string is padded with spaces.
**Returned value**
- The resulting string of the given length.
Type: [String](../data-types/string.md).
**Example**
Query:
``` sql
SELECT rightPad('abc', 7, '*'), rightPad('abc', 7);
```
Result:
``` text
┌─rightPad('abc', 7, '*')─┬─rightPad('abc', 7)─┐
│ abc**** │ abc │
└─────────────────────────┴────────────────────┘
```
## rightPadUTF8 {#rightpadutf8}
Pads the current string from the right with spaces or a specified string (multiple times, if needed) until the resulting string reaches the given length. Similar to the MySQL `RPAD` function. While the [rightPad](#rightpad) function measures the length in bytes, `rightPadUTF8` measures it in code points.
**Syntax**
``` sql
rightPadUTF8('string','length'[, 'pad_string'])
```
**Arguments**
- `string` — Input string that needs to be padded. [String](../data-types/string.md).
- `length` — The length of the resulting string. [UInt](../data-types/int-uint.md). If the value is less than the input string length, then the input string is returned as-is.
- `pad_string` — The string to pad the input string with. [String](../data-types/string.md). Optional. If not specified, then the input string is padded with spaces.
**Returned value**
- The resulting string of the given length.
Type: [String](../data-types/string.md).
**Example**
Query:
``` sql
SELECT rightPadUTF8('абвг', 7, '*'), rightPadUTF8('абвг', 7);
```
Result:
``` text
┌─rightPadUTF8('абвг', 7, '*')─┬─rightPadUTF8('абвг', 7)─┐
│ абвг*** │ абвг │
└──────────────────────────────┴─────────────────────────┘
```
## lower, lcase {#lower}
Converts ASCII Latin symbols in a string to lowercase.
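A minimal illustration:

``` sql
SELECT lower('ClickHouse 2021') AS l;
-- returns 'clickhouse 2021': only ASCII Latin letters are converted
```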

View File

@ -9,7 +9,7 @@ The functions for working with UUID are listed below.
## generateUUIDv4 {#uuid-function-generate}
Generates the [UUID](../../sql-reference/data-types/uuid.md) of [version 4](https://tools.ietf.org/html/rfc4122#section-4.4).
Generates the [UUID](../data-types/uuid.md) of [version 4](https://tools.ietf.org/html/rfc4122#section-4.4).
``` sql
generateUUIDv4()
@ -37,6 +37,90 @@ SELECT * FROM t_uuid
└──────────────────────────────────────┘
```
## empty {#empty}
Checks whether the input UUID is empty.
**Syntax**
```sql
empty(UUID)
```
The UUID is considered empty if it contains all zeros (zero UUID).
The function also works for [arrays](array-functions.md#function-empty) or [strings](string-functions.md#empty).
**Arguments**
- `x` — Input UUID. [UUID](../data-types/uuid.md).
**Returned value**
- Returns `1` for an empty UUID or `0` for a non-empty UUID.
Type: [UInt8](../data-types/int-uint.md).
**Example**
To generate the UUID value, ClickHouse provides the [generateUUIDv4](#uuid-function-generate) function.
Query:
```sql
SELECT empty(generateUUIDv4());
```
Result:
```text
┌─empty(generateUUIDv4())─┐
│ 0 │
└─────────────────────────┘
```
## notEmpty {#notempty}
Checks whether the input UUID is non-empty.
**Syntax**
```sql
notEmpty(UUID)
```
The UUID is considered empty if it contains all zeros (zero UUID).
The function also works for [arrays](array-functions.md#function-notempty) or [strings](string-functions.md#notempty).
**Arguments**
- `x` — Input UUID. [UUID](../data-types/uuid.md).
**Returned value**
- Returns `1` for a non-empty UUID or `0` for an empty UUID.
Type: [UInt8](../data-types/int-uint.md).
**Example**
To generate the UUID value, ClickHouse provides the [generateUUIDv4](#uuid-function-generate) function.
Query:
```sql
SELECT notEmpty(generateUUIDv4());
```
Result:
```text
┌─notEmpty(generateUUIDv4())─┐
│ 1 │
└────────────────────────────┘
```
## toUUID (x) {#touuid-x}
Converts String type value to UUID type.
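For example, a minimal call with an arbitrary UUID literal might look like this:

``` sql
SELECT toUUID('61f0c404-5cb3-11e7-907b-a6006ad3dba0') AS uuid;
```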

View File

@ -6,23 +6,55 @@ toc_title: DISTINCT
If `SELECT DISTINCT` is specified, only unique rows will remain in a query result. Thus only a single row will remain out of all the sets of fully matching rows in the result.
## Null Processing {#null-processing}
You can specify the list of columns that must have unique values: `SELECT DISTINCT ON (column1, column2,...)`. If the columns are not specified, all of them are taken into consideration.
`DISTINCT` works with [NULL](../../../sql-reference/syntax.md#null-literal) as if `NULL` were a specific value, and `NULL==NULL`. In other words, in the `DISTINCT` results, different combinations with `NULL` occur only once. It differs from `NULL` processing in most other contexts.
Consider the table:
## Alternatives {#alternatives}
```text
┌─a─┬─b─┬─c─┐
│ 1 │ 1 │ 1 │
│ 1 │ 1 │ 1 │
│ 2 │ 2 │ 2 │
│ 2 │ 2 │ 2 │
│ 1 │ 1 │ 2 │
│ 1 │ 2 │ 2 │
└───┴───┴───┘
```
It is possible to obtain the same result by applying [GROUP BY](../../../sql-reference/statements/select/group-by.md) across the same set of values as specified in the `SELECT` clause, without using any aggregate functions. But there are a few differences from the `GROUP BY` approach:
Using `DISTINCT` without specifying columns:
- `DISTINCT` can be applied together with `GROUP BY`.
- When [ORDER BY](../../../sql-reference/statements/select/order-by.md) is omitted and [LIMIT](../../../sql-reference/statements/select/limit.md) is defined, the query stops running immediately after the required number of different rows has been read.
- Data blocks are output as they are processed, without waiting for the entire query to finish running.
```sql
SELECT DISTINCT * FROM t1;
```
## Examples {#examples}
```text
┌─a─┬─b─┬─c─┐
│ 1 │ 1 │ 1 │
│ 2 │ 2 │ 2 │
│ 1 │ 1 │ 2 │
│ 1 │ 2 │ 2 │
└───┴───┴───┘
```
Using `DISTINCT` with specified columns:
```sql
SELECT DISTINCT ON (a,b) * FROM t1;
```
```text
┌─a─┬─b─┬─c─┐
│ 1 │ 1 │ 1 │
│ 2 │ 2 │ 2 │
│ 1 │ 2 │ 2 │
└───┴───┴───┘
```
## DISTINCT and ORDER BY {#distinct-orderby}
ClickHouse supports using the `DISTINCT` and `ORDER BY` clauses for different columns in one query. The `DISTINCT` clause is executed before the `ORDER BY` clause.
Example table:
Consider the table:
``` text
┌─a─┬─b─┐
@ -33,7 +65,11 @@ Example table:
└───┴───┘
```
When selecting data with the `SELECT DISTINCT a FROM t1 ORDER BY b ASC` query, we get the following result:
Selecting data:
```sql
SELECT DISTINCT a FROM t1 ORDER BY b ASC;
```
``` text
┌─a─┐
@ -42,8 +78,11 @@ When selecting data with the `SELECT DISTINCT a FROM t1 ORDER BY b ASC` query, w
│ 3 │
└───┘
```
Selecting data with a different sorting direction:
If we change the sorting direction `SELECT DISTINCT a FROM t1 ORDER BY b DESC`, we get the following result:
```sql
SELECT DISTINCT a FROM t1 ORDER BY b DESC;
```
``` text
┌─a─┐
@ -56,3 +95,15 @@ If we change the sorting direction `SELECT DISTINCT a FROM t1 ORDER BY b DESC`,
Row `2, 4` was cut before sorting.
Take this implementation detail into account when writing queries.
## Null Processing {#null-processing}
`DISTINCT` works with [NULL](../../../sql-reference/syntax.md#null-literal) as if `NULL` were a specific value, and `NULL==NULL`. In other words, in the `DISTINCT` results, different combinations with `NULL` occur only once. It differs from `NULL` processing in most other contexts.
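For example, repeated `NULL` values are collapsed just like any other repeated value (a small sketch that uses `arrayJoin` to produce the rows):

```sql
SELECT DISTINCT x
FROM (SELECT arrayJoin([NULL, NULL, 1]) AS x);
-- returns two rows, 1 and NULL: the repeated NULL is kept only once
```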
## Alternatives {#alternatives}
It is possible to obtain the same result by applying [GROUP BY](../../../sql-reference/statements/select/group-by.md) across the same set of values as specified in the `SELECT` clause, without using any aggregate functions (see the example after this list). But there are a few differences from the `GROUP BY` approach:
- `DISTINCT` can be applied together with `GROUP BY`.
- When [ORDER BY](../../../sql-reference/statements/select/order-by.md) is omitted and [LIMIT](../../../sql-reference/statements/select/limit.md) is defined, the query stops running immediately after the required number of different rows has been read.
- Data blocks are output as they are processed, without waiting for the entire query to finish running.
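For instance, on the table `t1` shown above, both of the following queries return the same set of rows, subject to the execution differences listed above:

```sql
SELECT DISTINCT a, b FROM t1;

SELECT a, b FROM t1 GROUP BY a, b;
```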

View File

@ -13,7 +13,7 @@ toc_title: Overview
``` sql
[WITH expr_list|(subquery)]
SELECT [DISTINCT] expr_list
SELECT [DISTINCT [ON (column1, column2, ...)]] expr_list
[FROM [db.]table | (subquery) | table_function] [FINAL]
[SAMPLE sample_coeff]
[ARRAY JOIN ...]
@ -36,6 +36,8 @@ All clauses are optional, except for the required list of expressions immediatel
Specifics of each optional clause are covered in separate sections, which are listed in the same order as they are executed:
- [WITH clause](../../../sql-reference/statements/select/with.md)
- [SELECT clause](#select-clause)
- [DISTINCT clause](../../../sql-reference/statements/select/distinct.md)
- [FROM clause](../../../sql-reference/statements/select/from.md)
- [SAMPLE clause](../../../sql-reference/statements/select/sample.md)
- [JOIN clause](../../../sql-reference/statements/select/join.md)
@ -44,8 +46,6 @@ Specifics of each optional clause are covered in separate sections, which are li
- [GROUP BY clause](../../../sql-reference/statements/select/group-by.md)
- [LIMIT BY clause](../../../sql-reference/statements/select/limit-by.md)
- [HAVING clause](../../../sql-reference/statements/select/having.md)
- [SELECT clause](#select-clause)
- [DISTINCT clause](../../../sql-reference/statements/select/distinct.md)
- [LIMIT clause](../../../sql-reference/statements/select/limit.md)
- [OFFSET clause](../../../sql-reference/statements/select/offset.md)
- [UNION clause](../../../sql-reference/statements/select/union.md)

View File

@ -168,7 +168,13 @@ sudo bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)"
cmake -D CMAKE_BUILD_TYPE=Debug ..
Вы можете изменить вариант сборки, выполнив эту команду в директории build.
В случае использования на разработческой машине старого HDD или SSD, а также при желании использовать меньше места для артефактов сборки можно использовать следующую команду:
```bash
cmake -DUSE_DEBUG_HELPERS=1 -DUSE_STATIC_LIBRARIES=0 -DSPLIT_SHARED_LIBRARIES=1 -DCLICKHOUSE_SPLIT_BINARY=1 ..
```
При этом надо учесть, что получаемые в результате сборки исполнимые файлы будут динамически слинкованы с библиотеками, и поэтому фактически станут непереносимыми на другие компьютеры (либо для этого нужно будет предпринять значительно больше усилий по сравнению со статической сборкой). Плюсом же в данном случае является значительно меньшее время сборки (это проявляется не на первой сборке, а на последующих, после внесения изменений в исходный код - тратится меньшее время на линковку по сравнению со статической сборкой) и значительно меньшее использование места на жёстком диске (экономия более, чем в 3 раза по сравнению со статической сборкой). Для целей разработки, когда планируются только отладочные запуски на том же компьютере, где осуществлялась сборка, это может быть наиболее удобным вариантом.
Вы можете изменить вариант сборки, выполнив новую команду в директории build.
Запустите ninja для сборки:

View File

@ -1,10 +1,12 @@
---
toc_priority: 29
toc_title: MaterializedMySQL
toc_title: "[experimental] MaterializedMySQL"
---
# [экспериментальный] MaterializedMySQL {#materialized-mysql}
**Это экспериментальный движок, который не следует использовать в продакшене.**
Создает базу данных ClickHouse со всеми таблицами, существующими в MySQL, и всеми данными в этих таблицах.
Сервер ClickHouse работает как реплика MySQL. Он читает файл binlog и выполняет DDL and DML-запросы.
@ -23,6 +25,32 @@ ENGINE = MaterializedMySQL('host:port', ['database' | database], 'user', 'passwo
- `user` — пользователь MySQL.
- `password` — пароль пользователя.
**Настройки движка**
- `max_rows_in_buffer` — максимальное количество строк, содержимое которых может кешироваться в памяти (для одной таблицы и данных кеша, которые невозможно запросить). При превышении количества строк, данные будут материализованы. Значение по умолчанию: `65 505`.
- `max_bytes_in_buffer` — максимальное количество байтов, которое разрешено кешировать в памяти (для одной таблицы и данных кеша, которые невозможно запросить). При превышении количества байтов данные будут материализованы. Значение по умолчанию: `1 048 576`.
- `max_rows_in_buffers` — максимальное количество строк, содержимое которых может кешироваться в памяти (для базы данных и данных кеша, которые невозможно запросить). При превышении количества строк, данные будут материализованы. Значение по умолчанию: `65 505`.
- `max_bytes_in_buffers` — максимальное количество байтов, которое разрешено кешировать в памяти (для базы данных и данных кеша, которые невозможно запросить). При превышении количества байтов данные будут материализованы. Значение по умолчанию: `1 048 576`.
- `max_flush_data_time` — максимальное время в миллисекундах, в течение которого разрешено кешировать данные в памяти (для базы данных и данных кеша, которые невозможно запросить). При превышении указанного периода данные будут материализованы. Значение по умолчанию: `1000`.
- `max_wait_time_when_mysql_unavailable` — интервал между повторными попытками, если MySQL недоступен. Указывается в миллисекундах. Отрицательное значение отключает повторные попытки. Значение по умолчанию: `1000`.
- `allows_query_when_mysql_lost` — признак, разрешен ли запрос к материализованной таблице при потере соединения с MySQL. Значение по умолчанию: `0` (`false`).
```sql
CREATE DATABASE mysql ENGINE = MaterializedMySQL('localhost:3306', 'db', 'user', '***')
SETTINGS
allows_query_when_mysql_lost=true,
max_wait_time_when_mysql_unavailable=10000;
```
**Настройки на стороне MySQL-сервера**
Для правильной работы `MaterializedMySQL` следует обязательно указать на сервере MySQL следующие параметры конфигурации:
- `default_authentication_plugin = mysql_native_password` — `MaterializedMySQL` может авторизоваться только с помощью этого метода.
- `gtid_mode = on` — ведение журнала на основе GTID является обязательным для обеспечения правильной репликации.
!!! attention "Внимание"
При включении `gtid_mode` вы также должны указать `enforce_gtid_consistency = on`.
## Виртуальные столбцы {#virtual-columns}
При работе с движком баз данных `MaterializedMySQL` используются таблицы семейства [ReplacingMergeTree](../../engines/table-engines/mergetree-family/replacingmergetree.md) с виртуальными столбцами `_sign` и `_version`.
@ -51,13 +79,21 @@ ENGINE = MaterializedMySQL('host:port', ['database' | database], 'user', 'passwo
| STRING | [String](../../sql-reference/data-types/string.md) |
| VARCHAR, VAR_STRING | [String](../../sql-reference/data-types/string.md) |
| BLOB | [String](../../sql-reference/data-types/string.md) |
Другие типы не поддерживаются. Если таблица MySQL содержит столбец другого типа, ClickHouse выдаст исключение "Неподдерживаемый тип данных" ("Unhandled data type") и остановит репликацию.
| BINARY | [FixedString](../../sql-reference/data-types/fixedstring.md) |
Тип [Nullable](../../sql-reference/data-types/nullable.md) поддерживается.
Другие типы не поддерживаются. Если таблица MySQL содержит столбец другого типа, ClickHouse выдаст исключение "Неподдерживаемый тип данных" ("Unhandled data type") и остановит репликацию.
## Особенности и рекомендации {#specifics-and-recommendations}
### Ограничения совместимости {#compatibility-restrictions}
Кроме ограничений на типы данных, существует несколько ограничений по сравнению с базами данных MySQL, которые следует решить до того, как станет возможной репликация:
- Каждая таблица в MySQL должна содержать `PRIMARY KEY`.
- Репликация для таблиц, содержащих строки со значениями полей `ENUM` вне диапазона значений (определяется размерностью `ENUM`), не будет работать.
### DDL-запросы {#ddl-queries}
DDL-запросы в MySQL конвертируются в соответствующие DDL-запросы в ClickHouse ([ALTER](../../sql-reference/statements/alter/index.md), [CREATE](../../sql-reference/statements/create/index.md), [DROP](../../sql-reference/statements/drop.md), [RENAME](../../sql-reference/statements/rename.md)). Если ClickHouse не может конвертировать какой-либо DDL-запрос, он его игнорирует.
@ -158,3 +194,4 @@ SELECT * FROM mysql.test;
└───┴─────┴──────┘
```
[Оригинальная статья](https://clickhouse.tech/docs/ru/engines/database-engines/materialized-mysql/) <!--hide-->

View File

@ -375,6 +375,24 @@ INDEX b (u64 * length(str), i32 + f64 * 100, date, str) TYPE set(100) GRANULARIT
- `s != 1`
- `NOT startsWith(s, 'test')`
### Проекции {#projections}
Проекции похожи на материализованные представления, но определяются на уровне партов. Это обеспечивает гарантии согласованности наряду с автоматическим использованием в запросах.
#### Запрос {#projection-query}
Запрос проекции — это то, что определяет проекцию. Он имеет следующую грамматику:
`SELECT <COLUMN LIST EXPR> [GROUP BY] [ORDER BY]`
Он неявно выбирает данные из родительской таблицы.
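Например, набросок объявления проекции при создании таблицы (имена таблицы, столбцов и проекции условные):

```sql
CREATE TABLE visits
(
    user_id UInt64,
    url String,
    PROJECTION p_by_user
    (
        SELECT user_id, count()
        GROUP BY user_id
    )
)
ENGINE = MergeTree
ORDER BY url;
```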
#### Хранение {#projection-storage}
Проекции хранятся в каталоге парта. Это похоже на хранение индексов, но используется подкаталог, в котором хранится анонимный парт таблицы MergeTree. Таблица создается запросом определения проекции. Если есть конструкция GROUP BY, то базовый механизм хранения становится AggregatedMergeTree, а все агрегатные функции преобразуются в AggregateFunction. Если есть конструкция ORDER BY, таблица MergeTree будет использовать его в качестве выражения первичного ключа. Во время процесса слияния парт проекции будет слит с помощью процедуры слияния ее хранилища. Контрольная сумма парта родительской таблицы будет включать парт проекции. Другие процедуры аналогичны индексам пропуска данных.
#### Анализ запросов {#projection-query-analysis}
1. Проверить, можно ли использовать проекцию в данном запросе, то есть, что с ней выходит тот же результат, что и с запросом к базовой таблице.
2. Выбрать наиболее подходящее совпадение, содержащее наименьшее количество гранул для чтения.
3. План запроса, который использует проекции, будет отличаться от того, который использует исходные парты. При отсутствии проекции в некоторых партах можно расширить план, чтобы «проецировать» на лету.
## Конкурентный доступ к данным {#concurrent-data-access}
Для конкурентного доступа к таблице используется мультиверсионность. То есть, при одновременном чтении и обновлении таблицы, данные будут читаться из набора кусочков, актуального на момент запроса. Длинных блокировок нет. Вставки никак не мешают чтениям.

View File

@ -7,19 +7,89 @@ toc_title: "Массивы"
## empty {#function-empty}
Возвращает 1 для пустого массива, и 0 для непустого массива.
Тип результата - UInt8.
Функция также работает для строк.
Проверяет, является ли входной массив пустым.
Функцию можно оптимизировать, если включить настройку [optimize_functions_to_subcolumns](../../operations/settings/settings.md#optimize-functions-to-subcolumns). При `optimize_functions_to_subcolumns = 1` функция читает только подстолбец [size0](../../sql-reference/data-types/array.md#array-size) вместо чтения и обработки всего столбца массива. Запрос `SELECT empty(arr) FROM table` преобразуется к запросу `SELECT arr.size0 = 0 FROM TABLE`.
**Синтаксис**
``` sql
empty([x])
```
Массив считается пустым, если он не содержит ни одного элемента.
!!! note "Примечание"
Функцию можно оптимизировать, если включить настройку [optimize_functions_to_subcolumns](../../operations/settings/settings.md#optimize-functions-to-subcolumns). При `optimize_functions_to_subcolumns = 1` функция читает только подстолбец [size0](../../sql-reference/data-types/array.md#array-size) вместо чтения и обработки всего столбца массива. Запрос `SELECT empty(arr) FROM TABLE` преобразуется к запросу `SELECT arr.size0 = 0 FROM TABLE`.
Функция также поддерживает работу с типами [String](string-functions.md#empty) и [UUID](uuid-functions.md#empty).
**Параметры**
- `[x]` — массив на входе функции. [Array](../data-types/array.md).
**Возвращаемое значение**
- Возвращает `1` для пустого массива или `0` — для непустого массива.
Тип: [UInt8](../data-types/int-uint.md).
**Пример**
Запрос:
```sql
SELECT empty([]);
```
Ответ:
```text
┌─empty(array())─┐
│ 1 │
└────────────────┘
```
## notEmpty {#function-notempty}
Возвращает 0 для пустого массива, и 1 для непустого массива.
Тип результата - UInt8.
Функция также работает для строк.
Проверяет, является ли входной массив непустым.
Функцию можно оптимизировать, если включить настройку [optimize_functions_to_subcolumns](../../operations/settings/settings.md#optimize-functions-to-subcolumns). При `optimize_functions_to_subcolumns = 1` функция читает только подстолбец [size0](../../sql-reference/data-types/array.md#array-size) вместо чтения и обработки всего столбца массива. Запрос `SELECT notEmpty(arr) FROM table` преобразуется к запросу `SELECT arr.size0 != 0 FROM TABLE`.
**Синтаксис**
``` sql
notEmpty([x])
```
Массив считается непустым, если он содержит хотя бы один элемент.
!!! note "Примечание"
Функцию можно оптимизировать, если включить настройку [optimize_functions_to_subcolumns](../../operations/settings/settings.md#optimize-functions-to-subcolumns). При `optimize_functions_to_subcolumns = 1` функция читает только подстолбец [size0](../../sql-reference/data-types/array.md#array-size) вместо чтения и обработки всего столбца массива. Запрос `SELECT notEmpty(arr) FROM table` преобразуется к запросу `SELECT arr.size0 != 0 FROM table`.
Функция также поддерживает работу с типами [String](string-functions.md#notempty) и [UUID](uuid-functions.md#notempty).
**Параметры**
- `[x]` — массив на входе функции. [Array](../data-types/array.md).
**Возвращаемое значение**
- Возвращает `1` для непустого массива или `0` — для пустого массива.
Тип: [UInt8](../data-types/int-uint.md).
**Пример**
Запрос:
```sql
SELECT notEmpty([1,2]);
```
Результат:
```text
┌─notEmpty([1, 2])─┐
│ 1 │
└──────────────────┘
```
## length {#array_functions-length}

View File

@ -7,16 +7,83 @@ toc_title: "Функции для работы со строками"
## empty {#empty}
Возвращает 1 для пустой строки, и 0 для непустой строки.
Тип результата — UInt8.
Строка считается непустой, если содержит хотя бы один байт, пусть даже это пробел или нулевой байт.
Функция также работает для массивов.
Проверяет, является ли входная строка пустой.
**Синтаксис**
``` sql
empty(x)
```
Строка считается непустой, если содержит хотя бы один байт, пусть даже это пробел или нулевой байт.
Функция также поддерживает работу с типами [Array](array-functions.md#function-empty) и [UUID](uuid-functions.md#empty).
**Параметры**
- `x` — Входная строка. [String](../data-types/string.md).
**Возвращаемое значение**
- Возвращает `1` для пустой строки и `0` — для непустой строки.
Тип: [UInt8](../data-types/int-uint.md).
**Пример**
Запрос:
```sql
SELECT empty('');
```
Результат:
```text
┌─empty('')─┐
│ 1 │
└───────────┘
```
## notEmpty {#notempty}
Возвращает 0 для пустой строки, и 1 для непустой строки.
Тип результата — UInt8.
Функция также работает для массивов.
Проверяет, является ли входная строка непустой.
**Синтаксис**
``` sql
notEmpty(x)
```
Строка считается непустой, если содержит хотя бы один байт, пусть даже это пробел или нулевой байт.
Функция также поддерживает работу с типами [Array](array-functions.md#function-notempty) и [UUID](uuid-functions.md#notempty).
**Параметры**
- `x` — Входная строка. [String](../data-types/string.md).
**Возвращаемое значение**
- Возвращает `1` для непустой строки и `0` — для пустой строки.
Тип: [UInt8](../data-types/int-uint.md).
**Пример**
Запрос:
```sql
SELECT notEmpty('text');
```
Результат:
```text
┌─notEmpty('text')─┐
│ 1 │
└──────────────────┘
```
## length {#length}
@ -39,6 +106,158 @@ toc_title: "Функции для работы со строками"
Возвращает длину строки в кодовых точках Unicode (не символах), при допущении, что строка содержит набор байтов, являющийся текстом в кодировке UTF-8. Если допущение не выполнено, возвращает какой-нибудь результат (не кидает исключение).
Тип результата — UInt64.
## leftPad {#leftpad}
Дополняет текущую строку слева пробелами или указанной строкой (несколько раз, если необходимо), пока результирующая строка не достигнет заданной длины. Соответствует MySQL функции `LPAD`.
**Синтаксис**
``` sql
leftPad('string', 'length'[, 'pad_string'])
```
**Параметры**
- `string` — входная строка, которую необходимо дополнить. [String](../data-types/string.md).
- `length` — длина результирующей строки. [UInt](../data-types/int-uint.md). Если указанное значение меньше, чем длина входной строки, то входная строка возвращается как есть.
- `pad_string` — строка, используемая для дополнения входной строки. [String](../data-types/string.md). Необязательный параметр. Если не указано, то входная строка дополняется пробелами.
**Возвращаемое значение**
- Результирующая строка заданной длины.
Тип: [String](../data-types/string.md).
**Пример**
Запрос:
``` sql
SELECT leftPad('abc', 7, '*'), leftPad('def', 7);
```
Результат:
``` text
┌─leftPad('abc', 7, '*')─┬─leftPad('def', 7)─┐
│ ****abc │ def │
└────────────────────────┴───────────────────┘
```
## leftPadUTF8 {#leftpadutf8}
Дополняет текущую строку слева пробелами или указанной строкой (несколько раз, если необходимо), пока результирующая строка не достигнет заданной длины. Соответствует MySQL функции `LPAD`. В отличие от функции [leftPad](#leftpad), измеряет длину строки не в байтах, а в кодовых точках Unicode.
**Синтаксис**
``` sql
leftPadUTF8('string','length'[, 'pad_string'])
```
**Параметры**
- `string` — входная строка, которую необходимо дополнить. [String](../data-types/string.md).
- `length` — длина результирующей строки. [UInt](../data-types/int-uint.md). Если указанное значение меньше, чем длина входной строки, то входная строка возвращается как есть.
- `pad_string` — строка, используемая для дополнения входной строки. [String](../data-types/string.md). Необязательный параметр. Если не указано, то входная строка дополняется пробелами.
**Возвращаемое значение**
- Результирующая строка заданной длины.
Тип: [String](../data-types/string.md).
**Пример**
Запрос:
``` sql
SELECT leftPadUTF8('абвг', 7, '*'), leftPadUTF8('дежз', 7);
```
Результат:
``` text
┌─leftPadUTF8('абвг', 7, '*')─┬─leftPadUTF8('дежз', 7)─┐
│ ***абвг │ дежз │
└─────────────────────────────┴────────────────────────┘
```
## rightPad {#rightpad}
Дополняет текущую строку справа пробелами или указанной строкой (несколько раз, если необходимо), пока результирующая строка не достигнет заданной длины. Соответствует MySQL функции `RPAD`.
**Синтаксис**
``` sql
rightPad('string', 'length'[, 'pad_string'])
```
**Параметры**
- `string` — входная строка, которую необходимо дополнить. [String](../data-types/string.md).
- `length` — длина результирующей строки. [UInt](../data-types/int-uint.md). Если указанное значение меньше, чем длина входной строки, то входная строка возвращается как есть.
- `pad_string` — строка, используемая для дополнения входной строки. [String](../data-types/string.md). Необязательный параметр. Если не указано, то входная строка дополняется пробелами.
**Возвращаемое значение**
- Результирующая строка заданной длины.
Тип: [String](../data-types/string.md).
**Пример**
Запрос:
``` sql
SELECT rightPad('abc', 7, '*'), rightPad('abc', 7);
```
Результат:
``` text
┌─rightPad('abc', 7, '*')─┬─rightPad('abc', 7)─┐
│ abc**** │ abc │
└─────────────────────────┴────────────────────┘
```
## rightPadUTF8 {#rightpadutf8}
Дополняет текущую строку справа пробелами или указанной строкой (несколько раз, если необходимо), пока результирующая строка не достигнет заданной длины. Соответствует MySQL функции `RPAD`. В отличие от функции [rightPad](#rightpad), измеряет длину строки не в байтах, а в кодовых точках Unicode.
**Синтаксис**
``` sql
rightPadUTF8('string','length'[, 'pad_string'])
```
**Параметры**
- `string` — входная строка, которую необходимо дополнить. [String](../data-types/string.md).
- `length` — длина результирующей строки. [UInt](../data-types/int-uint.md). Если указанное значение меньше, чем длина входной строки, то входная строка возвращается как есть.
- `pad_string` — строка, используемая для дополнения входной строки. [String](../data-types/string.md). Необязательный параметр. Если не указано, то входная строка дополняется пробелами.
**Возвращаемое значение**
- Результирующая строка заданной длины.
Тип: [String](../data-types/string.md).
**Пример**
Запрос:
``` sql
SELECT rightPadUTF8('абвг', 7, '*'), rightPadUTF8('абвг', 7);
```
Результат:
``` text
┌─rightPadUTF8('абвг', 7, '*')─┬─rightPadUTF8('абвг', 7)─┐
│ абвг*** │ абвг │
└──────────────────────────────┴─────────────────────────┘
```
## lower, lcase {#lower}
Переводит ASCII-символы латиницы в строке в нижний регистр.

View File

@ -35,6 +35,90 @@ SELECT * FROM t_uuid
└──────────────────────────────────────┘
```
## empty {#empty}
Проверяет, является ли входной UUID пустым.
**Синтаксис**
```sql
empty(UUID)
```
UUID считается пустым, если он содержит все нули (нулевой UUID).
Функция также поддерживает работу с типами [Array](array-functions.md#function-empty) и [String](string-functions.md#empty).
**Параметры**
- `x` — UUID на входе функции. [UUID](../data-types/uuid.md).
**Возвращаемое значение**
- Возвращает `1` для пустого UUID или `0` — для непустого UUID.
Тип: [UInt8](../data-types/int-uint.md).
**Пример**
Для генерации UUID-значений предназначена функция [generateUUIDv4](#uuid-function-generate).
Запрос:
```sql
SELECT empty(generateUUIDv4());
```
Ответ:
```text
┌─empty(generateUUIDv4())─┐
│ 0 │
└─────────────────────────┘
```
## notEmpty {#notempty}
Проверяет, является ли входной UUID непустым.
**Синтаксис**
```sql
notEmpty(UUID)
```
UUID считается пустым, если он содержит все нули (нулевой UUID).
Функция также поддерживает работу с типами [Array](array-functions.md#function-notempty) и [String](string-functions.md#notempty).
**Параметры**
- `x` — UUID на входе функции. [UUID](../data-types/uuid.md).
**Возвращаемое значение**
- Возвращает `1` для непустого UUID или `0` — для пустого UUID.
Тип: [UInt8](../data-types/int-uint.md).
**Пример**
Для генерации UUID-значений предназначена функция [generateUUIDv4](#uuid-function-generate).
Запрос:
```sql
SELECT notEmpty(generateUUIDv4());
```
Результат:
```text
┌─notEmpty(generateUUIDv4())─┐
│ 1 │
└────────────────────────────┘
```
## toUUID (x) {#touuid-x}
Преобразует значение типа String в тип UUID.

View File

@ -0,0 +1,23 @@
---
toc_priority: 49
toc_title: PROJECTION
---
# Манипуляции с проекциями {#manipulations-with-projections}
Доступны следующие операции:
- `ALTER TABLE [db].name ADD PROJECTION name AS SELECT <COLUMN LIST EXPR> [GROUP BY] [ORDER BY]` — добавляет описание проекции в метаданные.
- `ALTER TABLE [db].name DROP PROJECTION name` — удаляет описание проекции из метаданных и удаляет файлы проекции с диска.
- `ALTER TABLE [db.]table MATERIALIZE PROJECTION name IN PARTITION partition_name` — перестраивает проекцию в указанной партиции. Реализовано как [мутация](../../../sql-reference/statements/alter/index.md#mutations).
- `ALTER TABLE [db.]table CLEAR PROJECTION name IN PARTITION partition_name` — удаляет файлы проекции с диска без удаления описания.
Команды ADD, DROP и CLEAR — легковесны, поскольку они только меняют метаданные или удаляют файлы.
Также команды реплицируются, синхронизируя описания проекций в метаданных с помощью ZooKeeper.
!!! note "Note"
Манипуляции с проекциями поддерживаются только для таблиц с движком [`*MergeTree`](../../../engines/table-engines/mergetree-family/mergetree.md) (включая [replicated](../../../engines/table-engines/mergetree-family/replication.md) варианты).

View File

@ -6,19 +6,51 @@ toc_title: DISTINCT
Если указан `SELECT DISTINCT`, то в результате запроса останутся только уникальные строки. Таким образом, из всех наборов полностью совпадающих строк в результате останется только одна строка.
## Обработка NULL {#null-processing}
Вы можете указать столбцы, по которым хотите отбирать уникальные значения: `SELECT DISTINCT ON (column1, column2,...)`. Если столбцы не указаны, то отбираются строки, в которых значения уникальны во всех столбцах.
`DISTINCT` работает с [NULL](../../syntax.md#null-literal) как-будто `NULL` — обычное значение и `NULL==NULL`. Другими словами, в результате `DISTINCT`, различные комбинации с `NULL` встретятся только один раз. Это отличается от обработки `NULL` в большинстве других контекстов.
Рассмотрим таблицу:
## Альтернативы {#alternatives}
```text
┌─a─┬─b─┬─c─┐
│ 1 │ 1 │ 1 │
│ 1 │ 1 │ 1 │
│ 2 │ 2 │ 2 │
│ 2 │ 2 │ 2 │
│ 1 │ 1 │ 2 │
│ 1 │ 2 │ 2 │
└───┴───┴───┘
```
Такой же результат можно получить, применив секцию [GROUP BY](group-by.md) для того же набора значений, которые указан в секции `SELECT`, без использования каких-либо агрегатных функций. Но есть от `GROUP BY` несколько отличий:
Использование `DISTINCT` без указания столбцов:
- `DISTINCT` может применяться вместе с `GROUP BY`.
- Когда секция [ORDER BY](order-by.md) опущена, а секция [LIMIT](limit.md) присутствует, запрос прекращает выполнение сразу после считывания необходимого количества различных строк.
- Блоки данных выводятся по мере их обработки, не дожидаясь завершения выполнения всего запроса.
```sql
SELECT DISTINCT * FROM t1;
```
## Примеры {#examples}
```text
┌─a─┬─b─┬─c─┐
│ 1 │ 1 │ 1 │
│ 2 │ 2 │ 2 │
│ 1 │ 1 │ 2 │
│ 1 │ 2 │ 2 │
└───┴───┴───┘
```
Использование `DISTINCT` с указанием столбцов:
```sql
SELECT DISTINCT ON (a,b) * FROM t1;
```
```text
┌─a─┬─b─┬─c─┐
│ 1 │ 1 │ 1 │
│ 2 │ 2 │ 2 │
│ 1 │ 2 │ 2 │
└───┴───┴───┘
```
## DISTINCT и ORDER BY {#distinct-orderby}
ClickHouse поддерживает использование секций `DISTINCT` и `ORDER BY` для разных столбцов в одном запросе. Секция `DISTINCT` выполняется до секции `ORDER BY`.
@ -56,3 +88,16 @@ ClickHouse поддерживает использование секций `DIS
Ряд `2, 4` был разрезан перед сортировкой.
Учитывайте эту специфику при разработке запросов.
## Обработка NULL {#null-processing}
`DISTINCT` работает с [NULL](../../syntax.md#null-literal) как будто `NULL` — обычное значение и `NULL==NULL`. Другими словами, в результате `DISTINCT` различные комбинации с `NULL` встретятся только один раз. Это отличается от обработки `NULL` в большинстве других контекстов.
## Альтернативы {#alternatives}
Можно получить такой же результат, применив [GROUP BY](group-by.md) для того же набора значений, который указан в секции `SELECT`, без использования каких-либо агрегатных функций. Но есть несколько отличий от `GROUP BY`:
- `DISTINCT` может применяться вместе с `GROUP BY`.
- Когда секция [ORDER BY](order-by.md) опущена, а секция [LIMIT](limit.md) присутствует, запрос прекращает выполнение сразу после считывания необходимого количества различных строк.
- Блоки данных выводятся по мере их обработки, не дожидаясь завершения выполнения всего запроса.

View File

@ -11,7 +11,7 @@ toc_title: "Обзор"
``` sql
[WITH expr_list|(subquery)]
SELECT [DISTINCT] expr_list
SELECT [DISTINCT [ON (column1, column2, ...)]] expr_list
[FROM [db.]table | (subquery) | table_function] [FINAL]
[SAMPLE sample_coeff]
[ARRAY JOIN ...]
@ -34,6 +34,8 @@ SELECT [DISTINCT] expr_list
Особенности каждой необязательной секции рассматриваются в отдельных разделах, которые перечислены в том же порядке, в каком они выполняются:
- [Секция WITH](with.md)
- [Секция SELECT](#select-clause)
- [Секция DISTINCT](distinct.md)
- [Секция FROM](from.md)
- [Секция SAMPLE](sample.md)
- [Секция JOIN](join.md)
@ -42,8 +44,6 @@ SELECT [DISTINCT] expr_list
- [Секция GROUP BY](group-by.md)
- [Секция LIMIT BY](limit-by.md)
- [Секция HAVING](having.md)
- [Секция SELECT](#select-clause)
- [Секция DISTINCT](distinct.md)
- [Секция LIMIT](limit.md)
- [Секция OFFSET](offset.md)
- [Секция UNION ALL](union.md)

View File

@ -7,7 +7,6 @@
#include <pcg-random/pcg_random.hpp>
#include <Common/randomSeed.h>
#include <Common/Stopwatch.h>
#include <Core/Field.h>
#include <Parsers/IAST.h>

View File

@ -97,7 +97,7 @@
#endif
#if USE_SSL
# if USE_INTERNAL_SSL_LIBRARY
# if USE_INTERNAL_SSL_LIBRARY && !defined(ARCADIA_BUILD)
# include <Compression/CompressionCodecEncrypted.h>
# endif
# include <Poco/Net/Context.h>
@ -126,6 +126,7 @@ namespace CurrentMetrics
extern const Metric VersionInteger;
extern const Metric MemoryTracking;
extern const Metric MaxDDLEntryID;
extern const Metric MaxPushedDDLEntryID;
}
namespace fs = std::filesystem;
@ -1468,7 +1469,8 @@ if (ThreadFuzzer::instance().isEffective())
if (pool_size < 1)
throw Exception("distributed_ddl.pool_size should be greater then 0", ErrorCodes::ARGUMENT_OUT_OF_BOUND);
global_context->setDDLWorker(std::make_unique<DDLWorker>(pool_size, ddl_zookeeper_path, global_context, &config(),
"distributed_ddl", "DDLWorker", &CurrentMetrics::MaxDDLEntryID));
"distributed_ddl", "DDLWorker",
&CurrentMetrics::MaxDDLEntryID, &CurrentMetrics::MaxPushedDDLEntryID));
}
for (auto & server : *servers)

View File

@ -320,7 +320,7 @@
The amount of data in mapped files can be monitored
in system.metrics, system.metric_log by the MMappedFiles, MMappedFileBytes metrics
and in system.asynchronous_metrics, system.asynchronous_metrics_log by the MMapCacheCells metric,
and also in system.events, system.processes, system.query_log, system.query_thread_log by the
and also in system.events, system.processes, system.query_log, system.query_thread_log, system.query_views_log by the
CreatedReadBufferMMap, CreatedReadBufferMMapFailed, MMappedFileCacheHits, MMappedFileCacheMisses events.
Note that the amount of data in mapped files does not consume memory directly and is not accounted
in query or server memory usage - because this memory can be discarded similar to OS page cache.
@ -878,14 +878,23 @@
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
</query_thread_log>
<!-- Query views log. Has information about all dependent views associated with a query.
Used only for queries with setting log_query_views = 1. -->
<query_views_log>
<database>system</database>
<table>query_views_log</table>
<partition_by>toYYYYMM(event_date)</partition_by>
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
</query_views_log>
<!-- Uncomment if use part log.
Part log contains information about all actions with parts in MergeTree tables (creation, deletion, merges, downloads).
Part log contains information about all actions with parts in MergeTree tables (creation, deletion, merges, downloads).-->
<part_log>
<database>system</database>
<table>part_log</table>
<partition_by>toYYYYMM(event_date)</partition_by>
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
</part_log>
-->
<!-- Uncomment to write text log into table.
Text log contains all information from usual server log but stores it in structured and efficient way.
@ -955,6 +964,7 @@
<flush_interval_milliseconds>1000</flush_interval_milliseconds>
</crash_log>
<!-- Parameters for embedded dictionaries, used in Yandex.Metrica.
See https://clickhouse.yandex/docs/en/dicts/internal_dicts/
-->

View File

@ -271,7 +271,7 @@ mark_cache_size: 5368709120
# The amount of data in mapped files can be monitored
# in system.metrics, system.metric_log by the MMappedFiles, MMappedFileBytes metrics
# and in system.asynchronous_metrics, system.asynchronous_metrics_log by the MMapCacheCells metric,
# and also in system.events, system.processes, system.query_log, system.query_thread_log by the
# and also in system.events, system.processes, system.query_log, system.query_thread_log, system.query_views_log by the
# CreatedReadBufferMMap, CreatedReadBufferMMapFailed, MMappedFileCacheHits, MMappedFileCacheMisses events.
# Note that the amount of data in mapped files does not consume memory directly and is not accounted
# in query or server memory usage - because this memory can be discarded similar to OS page cache.
@ -731,12 +731,21 @@ query_thread_log:
partition_by: toYYYYMM(event_date)
flush_interval_milliseconds: 7500
# Query views log. Has information about all dependent views associated with a query.
# Used only for queries with setting log_query_views = 1.
query_views_log:
database: system
table: query_views_log
partition_by: toYYYYMM(event_date)
flush_interval_milliseconds: 7500
# Uncomment if use part log.
# Part log contains information about all actions with parts in MergeTree tables (creation, deletion, merges, downloads).
# part_log:
# database: system
# table: part_log
# flush_interval_milliseconds: 7500
part_log:
database: system
table: part_log
partition_by: toYYYYMM(event_date)
flush_interval_milliseconds: 7500
# Uncomment to write text log into table.
# Text log contains all information from usual server log but stores it in structured and efficient way.

View File

@ -46,7 +46,6 @@ SRCS(
SettingsProfilesInfo.cpp
User.cpp
UsersConfigAccessStorage.cpp
tests/gtest_access_rights_ops.cpp
)

View File

@ -8,7 +8,7 @@ PEERDIR(
SRCS(
<? find . -name '*.cpp' | grep -v -F examples | sed 's/^\.\// /' | sort ?>
<? find . -name '*.cpp' | grep -v -F tests | grep -v -F examples | sed 's/^\.\// /' | sort ?>
)
END()

View File

@ -299,10 +299,11 @@ target_link_libraries(clickhouse_common_io
${ZLIB_LIBRARIES}
pcg_random
Poco::Foundation
roaring
)
# Make dbms depend on roaring instead of clickhouse_common_io so that roaring itself can depend on clickhouse_common_io
# That way we can redirect malloc/free functions avoiding circular dependencies
dbms_target_link_libraries(PUBLIC roaring)
if (USE_RDKAFKA)
dbms_target_link_libraries(PRIVATE ${CPPKAFKA_LIBRARY} ${RDKAFKA_LIBRARY})

View File

@ -194,6 +194,7 @@ public:
const IColumnUnique & getDictionary() const { return dictionary.getColumnUnique(); }
IColumnUnique & getDictionary() { return dictionary.getColumnUnique(); }
const ColumnPtr & getDictionaryPtr() const { return dictionary.getColumnUniquePtr(); }
ColumnPtr & getDictionaryPtr() { return dictionary.getColumnUniquePtr(); }
/// IColumnUnique & getUnique() { return static_cast<IColumnUnique &>(*column_unique); }
/// ColumnPtr getUniquePtr() const { return column_unique; }

View File

@ -60,6 +60,7 @@
M(BrokenDistributedFilesToInsert, "Number of files for asynchronous insertion into Distributed tables that has been marked as broken. This metric will starts from 0 on start. Number of files for every shard is summed.") \
M(TablesToDropQueueSize, "Number of dropped tables, that are waiting for background data removal.") \
M(MaxDDLEntryID, "Max processed DDL entry of DDLWorker.") \
M(MaxPushedDDLEntryID, "Max DDL entry of DDLWorker that pushed to zookeeper.") \
M(PartsTemporary, "The part is generating now, it is not in data_parts list.") \
M(PartsPreCommitted, "The part is in data_parts, but not used for SELECTs.") \
M(PartsCommitted, "Active data part, used by current and upcoming SELECTs.") \

29
src/Common/DenseHashMap.h Normal file
View File

@ -0,0 +1,29 @@
#pragma once
#include <unordered_map>
/// DenseHashMap is a wrapper for google::dense_hash_map.
/// Some hacks are needed to make it work in "Arcadia".
/// "Arcadia" is a proprietary monorepository in Yandex.
/// It uses slightly changed version of sparsehash with a different set of hash functions (which we don't need).
/// Those defines are needed to make it compile.
#if defined(ARCADIA_BUILD)
#define HASH_FUN_H <unordered_map>
template <typename T>
struct THash;
#endif
#include <sparsehash/dense_hash_map>
#if !defined(ARCADIA_BUILD)
template <class Key, class T, class HashFcn = std::hash<Key>,
class EqualKey = std::equal_to<Key>,
class Alloc = google::libc_allocator_with_realloc<std::pair<const Key, T>>>
using DenseHashMap = google::dense_hash_map<Key, T, HashFcn, EqualKey, Alloc>;
#else
template <class Key, class T, class HashFcn = std::hash<Key>,
class EqualKey = std::equal_to<Key>,
class Alloc = google::sparsehash::libc_allocator_with_realloc<std::pair<const Key, T>>>
using DenseHashMap = google::sparsehash::dense_hash_map<Key, T, HashFcn, EqualKey, Alloc>;
#undef THash
#endif

25
src/Common/DenseHashSet.h Normal file
View File

@ -0,0 +1,25 @@
#pragma once
/// DenseHashSet is a wrapper for google::dense_hash_set.
/// See comment in DenseHashMap.h
#if defined(ARCADIA_BUILD)
#define HASH_FUN_H <unordered_map>
template <typename T>
struct THash;
#endif
#include <sparsehash/dense_hash_set>
#if !defined(ARCADIA_BUILD)
template <class Value, class HashFcn = std::hash<Value>,
class EqualKey = std::equal_to<Value>,
class Alloc = google::libc_allocator_with_realloc<Value>>
using DenseHashSet = google::dense_hash_set<Value, HashFcn, EqualKey, Alloc>;
#else
template <class Value, class HashFcn = std::hash<Value>,
class EqualKey = std::equal_to<Value>,
class Alloc = google::sparsehash::libc_allocator_with_realloc<Value>>
using DenseHashSet = google::sparsehash::dense_hash_set<Value, HashFcn, EqualKey, Alloc>;
#undef THash
#endif

View File

@ -94,6 +94,22 @@ std::string getExceptionStackTraceString(const std::exception & e)
#endif
}
std::string getExceptionStackTraceString(std::exception_ptr e)
{
try
{
std::rethrow_exception(e);
}
catch (const std::exception & exception)
{
return getExceptionStackTraceString(exception);
}
catch (...)
{
return {};
}
}
std::string Exception::getStackTraceString() const
{
@ -380,6 +396,30 @@ int getCurrentExceptionCode()
}
}
int getExceptionErrorCode(std::exception_ptr e)
{
try
{
std::rethrow_exception(e);
}
catch (const Exception & exception)
{
return exception.code();
}
catch (const Poco::Exception &)
{
return ErrorCodes::POCO_EXCEPTION;
}
catch (const std::exception &)
{
return ErrorCodes::STD_EXCEPTION;
}
catch (...)
{
return ErrorCodes::UNKNOWN_EXCEPTION;
}
}
void rethrowFirstException(const Exceptions & exceptions)
{

View File

@ -82,6 +82,7 @@ private:
std::string getExceptionStackTraceString(const std::exception & e);
std::string getExceptionStackTraceString(std::exception_ptr e);
/// Contains an additional member `saved_errno`. See the throwFromErrno function.
@ -167,6 +168,7 @@ std::string getCurrentExceptionMessage(bool with_stacktrace, bool check_embedded
/// Returns error code from ErrorCodes
int getCurrentExceptionCode();
int getExceptionErrorCode(std::exception_ptr e);
/// An execution status of any piece of code, contains return code and optional error

View File

@ -183,9 +183,6 @@ void MemoryTracker::allocImpl(Int64 size, bool throw_if_memory_exceeded)
std::bernoulli_distribution fault(fault_probability);
if (unlikely(fault_probability && fault(thread_local_rng)) && memoryTrackerCanThrow(level, true) && throw_if_memory_exceeded)
{
ProfileEvents::increment(ProfileEvents::QueryMemoryLimitExceeded);
amount.fetch_sub(size, std::memory_order_relaxed);
/// Prevent recursion. Exception::ctor -> std::string -> new[] -> MemoryTracker::alloc
BlockerInThread untrack_lock(VariableContext::Global);
@ -363,7 +360,7 @@ void MemoryTracker::setOrRaiseHardLimit(Int64 value)
{
/// This is just atomic set to maximum.
Int64 old_value = hard_limit.load(std::memory_order_relaxed);
while (old_value < value && !hard_limit.compare_exchange_weak(old_value, value))
while ((value == 0 || old_value < value) && !hard_limit.compare_exchange_weak(old_value, value))
;
}
@ -371,6 +368,6 @@ void MemoryTracker::setOrRaiseHardLimit(Int64 value)
void MemoryTracker::setOrRaiseProfilerLimit(Int64 value)
{
Int64 old_value = profiler_limit.load(std::memory_order_relaxed);
while (old_value < value && !profiler_limit.compare_exchange_weak(old_value, value))
while ((value == 0 || old_value < value) && !profiler_limit.compare_exchange_weak(old_value, value))
;
}

View File

@ -0,0 +1,25 @@
#pragma once
/// SparseHashMap is a wrapper for google::sparse_hash_map.
/// See comment in DenseHashMap.h
#if defined(ARCADIA_BUILD)
#define HASH_FUN_H <unordered_map>
template <typename T>
struct THash;
#endif
#include <sparsehash/sparse_hash_map>
#if !defined(ARCADIA_BUILD)
template <class Key, class T, class HashFcn = std::hash<Key>,
class EqualKey = std::equal_to<Key>,
class Alloc = google::libc_allocator_with_realloc<std::pair<const Key, T>>>
using SparseHashMap = google::sparse_hash_map<Key, T, HashFcn, EqualKey, Alloc>;
#else
template <class Key, class T, class HashFcn = std::hash<Key>,
class EqualKey = std::equal_to<Key>,
class Alloc = google::sparsehash::libc_allocator_with_realloc<std::pair<const Key, T>>>
using SparseHashMap = google::sparsehash::sparse_hash_map<Key, T, HashFcn, EqualKey, Alloc>;
#undef THash
#endif

View File

@ -149,7 +149,11 @@ ThreadStatus::~ThreadStatus()
if (deleter)
deleter();
current_thread = nullptr;
/// Only change current_thread if it's currently being used by this ThreadStatus
/// For example, PushingToViewsBlockOutputStream creates and deletes ThreadStatus instances while running in the main query thread
if (current_thread == this)
current_thread = nullptr;
}
void ThreadStatus::updatePerformanceCounters()

View File

@ -37,6 +37,8 @@ struct RUsageCounters;
struct PerfEventsCounters;
class TaskStatsInfoGetter;
class InternalTextLogsQueue;
struct ViewRuntimeData;
class QueryViewsLog;
using InternalTextLogsQueuePtr = std::shared_ptr<InternalTextLogsQueue>;
using InternalTextLogsQueueWeakPtr = std::weak_ptr<InternalTextLogsQueue>;
@ -143,6 +145,7 @@ protected:
Poco::Logger * log = nullptr;
friend class CurrentThread;
friend class PushingToViewsBlockOutputStream;
/// Use ptr not to add extra dependencies in the header
std::unique_ptr<RUsageCounters> last_rusage;
@ -151,6 +154,9 @@ protected:
/// Is used to send logs from logs_queue to client in case of fatal errors.
std::function<void()> fatal_error_callback;
/// It is used to avoid enabling the query profiler when you have multiple ThreadStatus in the same thread
bool query_profiled_enabled = true;
public:
ThreadStatus();
~ThreadStatus();
@ -210,9 +216,13 @@ public:
/// Update ProfileEvents and dumps info to system.query_thread_log
void finalizePerformanceCounters();
/// Set the counters last usage to now
void resetPerformanceCountersLastUsage();
/// Detaches thread from the thread group and the query, dumps performance counters if they have not been dumped
void detachQuery(bool exit_if_already_detached = false, bool thread_exits = false);
protected:
void applyQuerySettings();
@ -224,6 +234,8 @@ protected:
void logToQueryThreadLog(QueryThreadLog & thread_log, const String & current_database, std::chrono::time_point<std::chrono::system_clock> now);
void logToQueryViewsLog(const ViewRuntimeData & vinfo);
void assertState(const std::initializer_list<int> & permitted_states, const char * description = nullptr) const;

View File

@ -102,6 +102,7 @@ SRCS(
ZooKeeper/ZooKeeperNodeCache.cpp
checkStackSize.cpp
clearPasswordFromCommandLine.cpp
clickhouse_malloc.cpp
createHardLink.cpp
escapeForFileName.cpp
filesystemHelpers.cpp
@ -116,6 +117,7 @@ SRCS(
hex.cpp
isLocalAddress.cpp
malloc.cpp
memory.cpp
new_delete.cpp
parseAddress.cpp
parseGlobs.cpp

View File

@ -1,13 +1,15 @@
#include <Common/config.h>
#if !defined(ARCADIA_BUILD)
# include <Common/config.h>
#endif
#include <Compression/CompressionFactory.h>
#if USE_SSL && USE_INTERNAL_SSL_LIBRARY
#include <Compression/CompressionCodecEncrypted.h>
#include <Parsers/ASTLiteral.h>
#include <cassert>
#include <openssl/digest.h>
#include <openssl/digest.h> // Y_IGNORE
#include <openssl/err.h>
#include <openssl/hkdf.h>
#include <openssl/hkdf.h> // Y_IGNORE
#include <string_view>
namespace DB

View File

@ -2,11 +2,11 @@
// This depends on BoringSSL-specific API, notably <openssl/aead.h>.
#include <Common/config.h>
#if USE_SSL && USE_INTERNAL_SSL_LIBRARY
#if USE_SSL && USE_INTERNAL_SSL_LIBRARY && !defined(ARCADIA_BUILD)
#include <Compression/ICompressionCodec.h>
#include <boost/noncopyable.hpp>
#include <openssl/aead.h>
#include <openssl/aead.h> // Y_IGNORE
#include <optional>
namespace DB

View File

@ -1,6 +1,5 @@
#include <Coordination/KeeperStorageDispatcher.h>
#include <Common/setThreadName.h>
#include <Common/Stopwatch.h>
#include <Common/ZooKeeper/KeeperException.h>
#include <future>
#include <chrono>

View File

@ -6,7 +6,7 @@
namespace DB
{
/** Common part for implementation of MySQLBlockInputStream, MongoDBBlockInputStream and others.
/** Common part for implementation of MySQLSource, MongoDBSource and others.
*/
struct ExternalResultDescription
{

View File

@ -6,7 +6,7 @@
#include <IO/WriteHelpers.h>
#include <IO/ReadBufferFromString.h>
#include <IO/WriteBufferFromString.h>
#include <sparsehash/dense_hash_map>
#include <Common/DenseHashMap.h>
namespace DB
@ -163,11 +163,7 @@ NamesAndTypesList NamesAndTypesList::filter(const Names & names) const
NamesAndTypesList NamesAndTypesList::addTypes(const Names & names) const
{
/// NOTE: It's better to make a map in `IStorage` than to create it here every time again.
#if !defined(ARCADIA_BUILD)
google::dense_hash_map<StringRef, const DataTypePtr *, StringRefHash> types;
#else
google::sparsehash::dense_hash_map<StringRef, const DataTypePtr *, StringRefHash> types;
#endif
DenseHashMap<StringRef, const DataTypePtr *, StringRefHash> types;
types.set_empty_key(StringRef());
for (const auto & column : *this)
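For reference, google::dense_hash_map (which the DenseHashMap alias presumably wraps with the same defaults as the SparseHashMap wrapper) must be given an empty-key sentinel before any insert, which is why set_empty_key(StringRef()) follows the declaration above. A minimal sketch under that assumption:

#include <Common/DenseHashMap.h>
#include <string>
#include <cassert>

int main()
{
    /// dense_hash_map refuses inserts until an "empty key" sentinel is set.
    DenseHashMap<std::string, int> positions;
    positions.set_empty_key(std::string());
    positions["id"] = 0;
    positions["value"] = 1;
    assert(positions.find("value")->second == 1);
    return 0;
}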

View File

@ -173,7 +173,7 @@ class IColumn;
M(Bool, log_queries, 1, "Log requests and write the log to the system table.", 0) \
M(Bool, log_formatted_queries, 0, "Log formatted queries and write the log to the system table.", 0) \
M(LogQueriesType, log_queries_min_type, QueryLogElementType::QUERY_START, "Minimal type in query_log to log, possible values (from low to high): QUERY_START, QUERY_FINISH, EXCEPTION_BEFORE_START, EXCEPTION_WHILE_PROCESSING.", 0) \
M(Milliseconds, log_queries_min_query_duration_ms, 0, "Minimal time for the query to run, to get to the query_log/query_thread_log.", 0) \
M(Milliseconds, log_queries_min_query_duration_ms, 0, "Minimal time for the query to run, to get to the query_log/query_thread_log/query_views_log.", 0) \
M(UInt64, log_queries_cut_to_length, 100000, "If query length is greater than specified threshold (in bytes), then cut query when writing to query log. Also limit length of printed query in ordinary text log.", 0) \
\
M(DistributedProductMode, distributed_product_mode, DistributedProductMode::DENY, "How are distributed subqueries performed inside IN or JOIN sections?", IMPORTANT) \
@ -352,9 +352,10 @@ class IColumn;
M(UInt64, max_network_bandwidth_for_user, 0, "The maximum speed of data exchange over the network in bytes per second for all concurrently running user queries. Zero means unlimited.", 0)\
M(UInt64, max_network_bandwidth_for_all_users, 0, "The maximum speed of data exchange over the network in bytes per second for all concurrently running queries. Zero means unlimited.", 0) \
\
M(Bool, log_profile_events, true, "Log query performance statistics into the query_log and query_thread_log.", 0) \
M(Bool, log_profile_events, true, "Log query performance statistics into the query_log, query_thread_log and query_views_log.", 0) \
M(Bool, log_query_settings, true, "Log query settings into the query_log.", 0) \
M(Bool, log_query_threads, true, "Log query threads into system.query_thread_log table. This setting has effect only when 'log_queries' is true.", 0) \
M(Bool, log_query_views, true, "Log query dependent views into system.query_views_log table. This setting has effect only when 'log_queries' is true.", 0) \
M(String, log_comment, "", "Log comment into system.query_log table and server log. It can be set to arbitrary string no longer than max_query_size.", 0) \
M(LogsLevel, send_logs_level, LogsLevel::fatal, "Send server text logs with specified minimum level to client. Valid values: 'trace', 'debug', 'information', 'warning', 'error', 'fatal', 'none'", 0) \
M(Bool, enable_optimize_predicate_expression, 1, "If it is set to true, optimize predicates to subqueries.", 0) \
@ -527,6 +528,9 @@ class IColumn;
M(Bool, input_format_tsv_empty_as_default, false, "Treat empty fields in TSV input as default values.", 0) \
M(Bool, input_format_tsv_enum_as_number, false, "Treat inserted enum values in TSV formats as enum indices \\N", 0) \
M(Bool, input_format_null_as_default, true, "For text input formats initialize null fields with default values if data type of this field is not nullable", 0) \
M(Bool, input_format_arrow_import_nested, false, "Allow to insert array of structs into Nested table in Arrow input format.", 0) \
M(Bool, input_format_orc_import_nested, false, "Allow to insert array of structs into Nested table in ORC input format.", 0) \
M(Bool, input_format_parquet_import_nested, false, "Allow to insert array of structs into Nested table in Parquet input format.", 0) \
\
M(DateTimeInputFormat, date_time_input_format, FormatSettings::DateTimeInputFormat::Basic, "Method to read DateTime from text input formats. Possible values: 'basic' and 'best_effort'.", 0) \
M(DateTimeOutputFormat, date_time_output_format, FormatSettings::DateTimeOutputFormat::Simple, "Method to write DateTime to text output. Possible values: 'simple', 'iso', 'unix_timestamp'.", 0) \

View File

@ -3,7 +3,8 @@
#include <Poco/Timespan.h>
#include <common/types.h>
#include <DataStreams/SizeLimits.h>
#include <Common/Stopwatch.h>
class Stopwatch;
namespace DB
{

View File

@ -1,3 +1,5 @@
#include "MongoDBSource.h"
#include <string>
#include <vector>
@ -15,7 +17,6 @@
#include <Common/assert_cast.h>
#include <Common/quoteString.h>
#include <common/range.h>
#include <DataStreams/MongoDBBlockInputStream.h>
#include <Poco/URI.h>
#include <Poco/Util/AbstractConfiguration.h>
#include <Poco/Version.h>

View File

@ -1,4 +1,4 @@
#include "PostgreSQLBlockInputStream.h"
#include "PostgreSQLSource.h"
#if USE_LIBPQXX
#include <Columns/ColumnNullable.h>
@ -73,7 +73,7 @@ void PostgreSQLSource<T>::init(const Block & sample_block)
template<typename T>
void PostgreSQLSource<T>::onStart()
{
if (connection_holder)
if (!tx)
tx = std::make_shared<T>(connection_holder->get());
stream = std::make_unique<pqxx::stream_from>(*tx, pqxx::from_query, std::string_view(query_str));

View File

@ -76,19 +76,6 @@ public:
const Block & sample_block_,
const UInt64 max_block_size_)
: PostgreSQLSource<T>(tx_, query_str_, sample_block_, max_block_size_, false) {}
Chunk generate() override
{
if (!is_initialized)
{
Base::stream = std::make_unique<pqxx::stream_from>(*Base::tx, pqxx::from_query, std::string_view(Base::query_str));
is_initialized = true;
}
return Base::generate();
}
bool is_initialized = false;
};
}

View File

@ -1,24 +1,31 @@
#include <DataStreams/ConvertingBlockInputStream.h>
#include <DataStreams/MaterializingBlockInputStream.h>
#include <DataStreams/OneBlockInputStream.h>
#include <DataStreams/PushingToSinkBlockOutputStream.h>
#include <DataStreams/PushingToViewsBlockOutputStream.h>
#include <DataStreams/SquashingBlockInputStream.h>
#include <DataStreams/OneBlockInputStream.h>
#include <DataStreams/MaterializingBlockInputStream.h>
#include <DataStreams/copyData.h>
#include <DataTypes/NestedUtils.h>
#include <Interpreters/InterpreterSelectQuery.h>
#include <Interpreters/InterpreterInsertQuery.h>
#include <Interpreters/Context.h>
#include <Interpreters/InterpreterInsertQuery.h>
#include <Interpreters/InterpreterSelectQuery.h>
#include <Parsers/ASTInsertQuery.h>
#include <Common/CurrentThread.h>
#include <Common/setThreadName.h>
#include <Common/ThreadPool.h>
#include <Common/checkStackSize.h>
#include <Storages/MergeTree/ReplicatedMergeTreeSink.h>
#include <Storages/StorageValues.h>
#include <Storages/LiveView/StorageLiveView.h>
#include <Storages/MergeTree/ReplicatedMergeTreeSink.h>
#include <Storages/StorageMaterializedView.h>
#include <Storages/StorageValues.h>
#include <Common/CurrentThread.h>
#include <Common/MemoryTracker.h>
#include <Common/ThreadPool.h>
#include <Common/ThreadProfileEvents.h>
#include <Common/ThreadStatus.h>
#include <Common/checkStackSize.h>
#include <Common/setThreadName.h>
#include <common/logger_useful.h>
#include <DataStreams/PushingToSinkBlockOutputStream.h>
#include <common/scope_guard.h>
#include <atomic>
#include <chrono>
namespace DB
{
@ -79,9 +86,12 @@ PushingToViewsBlockOutputStream::PushingToViewsBlockOutputStream(
ASTPtr query;
BlockOutputStreamPtr out;
QueryViewsLogElement::ViewType type = QueryViewsLogElement::ViewType::DEFAULT;
String target_name = database_table.getFullTableName();
if (auto * materialized_view = dynamic_cast<StorageMaterializedView *>(dependent_table.get()))
{
type = QueryViewsLogElement::ViewType::MATERIALIZED;
addTableLock(
materialized_view->lockForShare(getContext()->getInitialQueryId(), getContext()->getSettingsRef().lock_acquire_timeout));
@ -89,6 +99,7 @@ PushingToViewsBlockOutputStream::PushingToViewsBlockOutputStream(
auto inner_table_id = inner_table->getStorageID();
auto inner_metadata_snapshot = inner_table->getInMemoryMetadataPtr();
query = dependent_metadata_snapshot->getSelectQuery().inner_query;
target_name = inner_table_id.getFullTableName();
std::unique_ptr<ASTInsertQuery> insert = std::make_unique<ASTInsertQuery>();
insert->table_id = inner_table_id;
@ -114,14 +125,57 @@ PushingToViewsBlockOutputStream::PushingToViewsBlockOutputStream(
BlockIO io = interpreter.execute();
out = io.out;
}
else if (dynamic_cast<const StorageLiveView *>(dependent_table.get()))
else if (const auto * live_view = dynamic_cast<const StorageLiveView *>(dependent_table.get()))
{
type = QueryViewsLogElement::ViewType::LIVE;
query = live_view->getInnerQuery(); // Used only to log in system.query_views_log
out = std::make_shared<PushingToViewsBlockOutputStream>(
dependent_table, dependent_metadata_snapshot, insert_context, ASTPtr(), true);
}
else
out = std::make_shared<PushingToViewsBlockOutputStream>(
dependent_table, dependent_metadata_snapshot, insert_context, ASTPtr());
views.emplace_back(ViewInfo{std::move(query), database_table, std::move(out), nullptr, 0 /* elapsed_ms */});
/// If the materialized view is executed outside of a query, for example as a result of SYSTEM FLUSH LOGS or
/// SYSTEM FLUSH DISTRIBUTED ..., we can't attach to any thread group and we won't log, so there is no point in collecting metrics
std::unique_ptr<ThreadStatus> thread_status = nullptr;
ThreadGroupStatusPtr running_group = current_thread && current_thread->getThreadGroup()
? current_thread->getThreadGroup()
: MainThreadStatus::getInstance().thread_group;
if (running_group)
{
/// We are creating a ThreadStatus per view to store its metrics individually
/// Since calling ThreadStatus() changes current_thread we save it and restore it after the calls
/// Later on, before doing any task related to a view, we'll switch to its ThreadStatus, do the work,
/// and switch back to the original thread_status.
auto * original_thread = current_thread;
SCOPE_EXIT({ current_thread = original_thread; });
thread_status = std::make_unique<ThreadStatus>();
/// Disable query profiler for this ThreadStatus since the running (main query) thread should already have one
/// If we didn't disable it, then we could end up with N + 1 (N = number of dependencies) profilers which means
/// N times more interruptions
thread_status->query_profiled_enabled = false;
thread_status->setupState(running_group);
}
QueryViewsLogElement::ViewRuntimeStats runtime_stats{
target_name,
type,
std::move(thread_status),
0,
std::chrono::system_clock::now(),
QueryViewsLogElement::ViewStatus::EXCEPTION_BEFORE_START};
views.emplace_back(ViewRuntimeData{std::move(query), database_table, std::move(out), nullptr, std::move(runtime_stats)});
/// Add the view to the query access info so it can appear in system.query_log
if (!no_destination)
{
getContext()->getQueryContext()->addQueryAccessInfo(
backQuoteIfNeed(database_table.getDatabaseName()), target_name, {}, "", database_table.getFullTableName());
}
}
/// Do not push to destination table if the flag is set
@ -136,7 +190,6 @@ PushingToViewsBlockOutputStream::PushingToViewsBlockOutputStream(
}
}
Block PushingToViewsBlockOutputStream::getHeader() const
{
/// If we don't write directly to the destination
@ -147,6 +200,39 @@ Block PushingToViewsBlockOutputStream::getHeader() const
return metadata_snapshot->getSampleBlockWithVirtuals(storage->getVirtuals());
}
/// Auxiliary function to do the setup and teardown to run a view individually and collect its metrics inside the view ThreadStatus
void inline runViewStage(ViewRuntimeData & view, const std::string & action, std::function<void()> stage)
{
Stopwatch watch;
auto * original_thread = current_thread;
SCOPE_EXIT({ current_thread = original_thread; });
if (view.runtime_stats.thread_status)
{
/// Change thread context to store individual metrics per view. Once the work is done, go back to the original thread
view.runtime_stats.thread_status->resetPerformanceCountersLastUsage();
current_thread = view.runtime_stats.thread_status.get();
}
try
{
stage();
}
catch (Exception & ex)
{
ex.addMessage(action + " " + view.table_id.getNameForLogs());
view.setException(std::current_exception());
}
catch (...)
{
view.setException(std::current_exception());
}
if (view.runtime_stats.thread_status)
view.runtime_stats.thread_status->updatePerformanceCounters();
view.runtime_stats.elapsed_ms += watch.elapsedMilliseconds();
}
void PushingToViewsBlockOutputStream::write(const Block & block)
{
@ -169,39 +255,34 @@ void PushingToViewsBlockOutputStream::write(const Block & block)
output->write(block);
}
/// Don't process materialized views if this block is duplicate
if (!getContext()->getSettingsRef().deduplicate_blocks_in_dependent_materialized_views && replicated_output && replicated_output->lastBlockIsDuplicate())
if (views.empty())
return;
// Insert data into materialized views only after successful insert into main table
/// Don't process materialized views if this block is duplicate
const Settings & settings = getContext()->getSettingsRef();
if (settings.parallel_view_processing && views.size() > 1)
if (!settings.deduplicate_blocks_in_dependent_materialized_views && replicated_output && replicated_output->lastBlockIsDuplicate())
return;
size_t max_threads = 1;
if (settings.parallel_view_processing)
max_threads = settings.max_threads ? std::min(static_cast<size_t>(settings.max_threads), views.size()) : views.size();
if (max_threads > 1)
{
// Push to views concurrently if enabled and more than one view is attached
ThreadPool pool(std::min(size_t(settings.max_threads), views.size()));
ThreadPool pool(max_threads);
for (auto & view : views)
{
auto thread_group = CurrentThread::getGroup();
pool.scheduleOrThrowOnError([=, &view, this]
{
pool.scheduleOrThrowOnError([&] {
setThreadName("PushingToViews");
if (thread_group)
CurrentThread::attachToIfDetached(thread_group);
process(block, view);
runViewStage(view, "while pushing to view", [&]() { process(block, view); });
});
}
// Wait for concurrent view processing
pool.wait();
}
else
{
// Process sequentially
for (auto & view : views)
{
process(block, view);
if (view.exception)
std::rethrow_exception(view.exception);
runViewStage(view, "while pushing to view", [&]() { process(block, view); });
}
}
}
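The write() path above fans work out with ClickHouse's ThreadPool: clamp the pool size to min(max_threads, views.size()), schedule one task per view, then wait. A reduced sketch of that schedule/wait pattern (the helper and its job type are hypothetical):

#include <Common/ThreadPool.h>
#include <Common/setThreadName.h>
#include <algorithm>
#include <functional>
#include <vector>

/// Hypothetical helper: run one job per view on at most `max_threads` threads.
void runPerView(std::vector<std::function<void()>> & jobs, size_t max_threads)
{
    if (jobs.empty())
        return;
    ThreadPool pool(std::min(max_threads, jobs.size()));
    for (auto & job : jobs)
    {
        pool.scheduleOrThrowOnError([&job]
        {
            setThreadName("PushingToViews");
            job();
        });
    }
    /// Blocks until every scheduled job has finished.
    pool.wait();
}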
@ -213,14 +294,11 @@ void PushingToViewsBlockOutputStream::writePrefix()
for (auto & view : views)
{
try
runViewStage(view, "while writing prefix to view", [&] { view.out->writePrefix(); });
if (view.exception)
{
view.out->writePrefix();
}
catch (Exception & ex)
{
ex.addMessage("while write prefix to view " + view.table_id.getNameForLogs());
throw;
logQueryViews();
std::rethrow_exception(view.exception);
}
}
}
@ -230,95 +308,82 @@ void PushingToViewsBlockOutputStream::writeSuffix()
if (output)
output->writeSuffix();
std::exception_ptr first_exception;
if (views.empty())
return;
const Settings & settings = getContext()->getSettingsRef();
bool parallel_processing = false;
auto process_suffix = [](ViewRuntimeData & view)
{
view.out->writeSuffix();
view.runtime_stats.setStatus(QueryViewsLogElement::ViewStatus::QUERY_FINISH);
};
static std::string stage_step = "while writing suffix to view";
/// Run writeSuffix() for views in separate thread pool.
/// It could have been done in PushingToViewsBlockOutputStream::process, however
/// it is not good if the insert into the main table fails while the insert into a view succeeds.
if (settings.parallel_view_processing && views.size() > 1)
const Settings & settings = getContext()->getSettingsRef();
size_t max_threads = 1;
if (settings.parallel_view_processing)
max_threads = settings.max_threads ? std::min(static_cast<size_t>(settings.max_threads), views.size()) : views.size();
bool exception_happened = false;
if (max_threads > 1)
{
parallel_processing = true;
// Push to views concurrently if enabled and more than one view is attached
ThreadPool pool(std::min(size_t(settings.max_threads), views.size()));
auto thread_group = CurrentThread::getGroup();
ThreadPool pool(max_threads);
std::atomic_uint8_t exception_count = 0;
for (auto & view : views)
{
if (view.exception)
continue;
pool.scheduleOrThrowOnError([thread_group, &view, this]
{
exception_happened = true;
continue;
}
pool.scheduleOrThrowOnError([&] {
setThreadName("PushingToViews");
if (thread_group)
CurrentThread::attachToIfDetached(thread_group);
Stopwatch watch;
try
{
view.out->writeSuffix();
}
catch (...)
{
view.exception = std::current_exception();
}
view.elapsed_ms += watch.elapsedMilliseconds();
LOG_TRACE(log, "Pushing from {} to {} took {} ms.",
storage->getStorageID().getNameForLogs(),
view.table_id.getNameForLogs(),
view.elapsed_ms);
runViewStage(view, stage_step, [&] { process_suffix(view); });
if (view.exception)
exception_count.fetch_add(1, std::memory_order_relaxed);
});
}
// Wait for concurrent view processing
pool.wait();
exception_happened |= exception_count.load(std::memory_order_relaxed) != 0;
}
else
{
for (auto & view : views)
{
if (view.exception)
{
exception_happened = true;
continue;
}
runViewStage(view, stage_step, [&] { process_suffix(view); });
if (view.exception)
exception_happened = true;
}
}
for (auto & view : views)
{
if (view.exception)
{
if (!first_exception)
first_exception = view.exception;
continue;
}
if (parallel_processing)
continue;
Stopwatch watch;
try
{
view.out->writeSuffix();
}
catch (Exception & ex)
{
ex.addMessage("while write prefix to view " + view.table_id.getNameForLogs());
throw;
}
view.elapsed_ms += watch.elapsedMilliseconds();
LOG_TRACE(log, "Pushing from {} to {} took {} ms.",
storage->getStorageID().getNameForLogs(),
view.table_id.getNameForLogs(),
view.elapsed_ms);
if (!view.exception)
LOG_TRACE(
log,
"Pushing ({}) from {} to {} took {} ms.",
max_threads <= 1 ? "sequentially" : ("parallel " + std::to_string(max_threads)),
storage->getStorageID().getNameForLogs(),
view.table_id.getNameForLogs(),
view.runtime_stats.elapsed_ms);
}
if (first_exception)
std::rethrow_exception(first_exception);
if (exception_happened)
checkExceptionsInViews();
UInt64 milliseconds = main_watch.elapsedMilliseconds();
if (views.size() > 1)
{
LOG_DEBUG(log, "Pushing from {} to {} views took {} ms.",
storage->getStorageID().getNameForLogs(), views.size(),
milliseconds);
UInt64 milliseconds = main_watch.elapsedMilliseconds();
LOG_DEBUG(log, "Pushing from {} to {} views took {} ms.", storage->getStorageID().getNameForLogs(), views.size(), milliseconds);
}
logQueryViews();
}
void PushingToViewsBlockOutputStream::flush()
@ -330,70 +395,103 @@ void PushingToViewsBlockOutputStream::flush()
view.out->flush();
}
void PushingToViewsBlockOutputStream::process(const Block & block, ViewInfo & view)
void PushingToViewsBlockOutputStream::process(const Block & block, ViewRuntimeData & view)
{
Stopwatch watch;
BlockInputStreamPtr in;
try
/// We need to keep the InterpreterSelectQuery until the processing is finished, since:
///
/// - We copy the Context inside InterpreterSelectQuery to support
/// modification of the context (Settings) for subqueries.
/// - InterpreterSelectQuery lives shorter than the query pipeline.
/// It is used just to build the query pipeline and is no longer needed afterwards.
/// - ExpressionAnalyzer and, later, the Functions created in InterpreterSelectQuery
/// **can** take a reference to the Context from InterpreterSelectQuery
/// (the problem arises only when a function uses the context from the
/// execute*() method, as FunctionDictGet does).
/// - These objects live inside the query pipeline (DataStreams) and the reference becomes dangling.
std::optional<InterpreterSelectQuery> select;
if (view.runtime_stats.type == QueryViewsLogElement::ViewType::MATERIALIZED)
{
BlockInputStreamPtr in;
/// We create a table with the same name as original table and the same alias columns,
/// but it will contain single block (that is INSERT-ed into main table).
/// InterpreterSelectQuery will do processing of alias columns.
/// We need to keep the InterpreterSelectQuery until the processing is finished, since:
///
/// - We copy the Context inside InterpreterSelectQuery to support
/// modification of the context (Settings) for subqueries.
/// - InterpreterSelectQuery lives shorter than the query pipeline.
/// It is used just to build the query pipeline and is no longer needed afterwards.
/// - ExpressionAnalyzer and, later, the Functions created in InterpreterSelectQuery
/// **can** take a reference to the Context from InterpreterSelectQuery
/// (the problem arises only when a function uses the context from the
/// execute*() method, as FunctionDictGet does).
/// - These objects live inside the query pipeline (DataStreams) and the reference becomes dangling.
std::optional<InterpreterSelectQuery> select;
auto local_context = Context::createCopy(select_context);
local_context->addViewSource(
StorageValues::create(storage->getStorageID(), metadata_snapshot->getColumns(), block, storage->getVirtuals()));
select.emplace(view.query, local_context, SelectQueryOptions());
in = std::make_shared<MaterializingBlockInputStream>(select->execute().getInputStream());
if (view.query)
{
/// We create a table with the same name as original table and the same alias columns,
/// but it will contain single block (that is INSERT-ed into main table).
/// InterpreterSelectQuery will do processing of alias columns.
auto local_context = Context::createCopy(select_context);
local_context->addViewSource(
StorageValues::create(storage->getStorageID(), metadata_snapshot->getColumns(), block, storage->getVirtuals()));
select.emplace(view.query, local_context, SelectQueryOptions());
in = std::make_shared<MaterializingBlockInputStream>(select->execute().getInputStream());
/// Squashing is needed here because the materialized view query can generate a lot of blocks
/// even when only one block is inserted into the parent table (e.g. if the query is a GROUP BY
/// and two-level aggregation is triggered).
in = std::make_shared<SquashingBlockInputStream>(
in, getContext()->getSettingsRef().min_insert_block_size_rows, getContext()->getSettingsRef().min_insert_block_size_bytes);
in = std::make_shared<ConvertingBlockInputStream>(in, view.out->getHeader(), ConvertingBlockInputStream::MatchColumnsMode::Name);
}
else
in = std::make_shared<OneBlockInputStream>(block);
in->readPrefix();
while (Block result_block = in->read())
{
Nested::validateArraySizes(result_block);
view.out->write(result_block);
}
in->readSuffix();
/// Squashing is needed here because the materialized view query can generate a lot of blocks
/// even when only one block is inserted into the parent table (e.g. if the query is a GROUP BY
/// and two-level aggregation is triggered).
in = std::make_shared<SquashingBlockInputStream>(
in, getContext()->getSettingsRef().min_insert_block_size_rows, getContext()->getSettingsRef().min_insert_block_size_bytes);
in = std::make_shared<ConvertingBlockInputStream>(in, view.out->getHeader(), ConvertingBlockInputStream::MatchColumnsMode::Name);
}
catch (Exception & ex)
else
in = std::make_shared<OneBlockInputStream>(block);
in->setProgressCallback([this](const Progress & progress)
{
ex.addMessage("while pushing to view " + view.table_id.getNameForLogs());
view.exception = std::current_exception();
}
catch (...)
CurrentThread::updateProgressIn(progress);
this->onProgress(progress);
});
in->readPrefix();
while (Block result_block = in->read())
{
view.exception = std::current_exception();
Nested::validateArraySizes(result_block);
view.out->write(result_block);
}
view.elapsed_ms += watch.elapsedMilliseconds();
in->readSuffix();
}
void PushingToViewsBlockOutputStream::checkExceptionsInViews()
{
for (auto & view : views)
{
if (view.exception)
{
logQueryViews();
std::rethrow_exception(view.exception);
}
}
}
void PushingToViewsBlockOutputStream::logQueryViews()
{
const auto & settings = getContext()->getSettingsRef();
const UInt64 min_query_duration = settings.log_queries_min_query_duration_ms.totalMilliseconds();
const QueryViewsLogElement::ViewStatus min_status = settings.log_queries_min_type;
if (views.empty() || !settings.log_queries || !settings.log_query_views)
return;
for (auto & view : views)
{
if ((min_query_duration && view.runtime_stats.elapsed_ms <= min_query_duration) || (view.runtime_stats.event_status < min_status))
continue;
try
{
if (view.runtime_stats.thread_status)
view.runtime_stats.thread_status->logToQueryViewsLog(view);
}
catch (...)
{
tryLogCurrentException(__PRETTY_FUNCTION__);
}
}
}
void PushingToViewsBlockOutputStream::onProgress(const Progress & progress)
{
if (getContext()->getProgressCallback())
getContext()->getProgressCallback()(progress);
}
}

View File

@ -1,6 +1,7 @@
#pragma once
#include <DataStreams/IBlockOutputStream.h>
#include <Interpreters/QueryViewsLog.h>
#include <Parsers/IAST_fwd.h>
#include <Storages/IStorage.h>
#include <Common/Stopwatch.h>
@ -8,13 +9,28 @@
namespace Poco
{
class Logger;
};
}
namespace DB
{
class ReplicatedMergeTreeSink;
struct ViewRuntimeData
{
const ASTPtr query;
StorageID table_id;
BlockOutputStreamPtr out;
std::exception_ptr exception;
QueryViewsLogElement::ViewRuntimeStats runtime_stats;
void setException(std::exception_ptr e)
{
exception = e;
runtime_stats.setStatus(QueryViewsLogElement::ViewStatus::EXCEPTION_WHILE_PROCESSING);
}
};
/** Writes data to the specified table and to all dependent materialized views.
*/
class PushingToViewsBlockOutputStream : public IBlockOutputStream, WithContext
@ -33,6 +49,7 @@ public:
void flush() override;
void writePrefix() override;
void writeSuffix() override;
void onProgress(const Progress & progress) override;
private:
StoragePtr storage;
@ -44,20 +61,13 @@ private:
ASTPtr query_ptr;
Stopwatch main_watch;
struct ViewInfo
{
ASTPtr query;
StorageID table_id;
BlockOutputStreamPtr out;
std::exception_ptr exception;
UInt64 elapsed_ms = 0;
};
std::vector<ViewInfo> views;
std::vector<ViewRuntimeData> views;
ContextMutablePtr select_context;
ContextMutablePtr insert_context;
void process(const Block & block, ViewInfo & view);
void process(const Block & block, ViewRuntimeData & view);
void checkExceptionsInViews();
void logQueryViews();
};

View File

@ -1,4 +1,4 @@
#include "SQLiteBlockInputStream.h"
#include "SQLiteSource.h"
#if USE_SQLITE
#include <common/range.h>
@ -22,21 +22,18 @@ namespace ErrorCodes
extern const int SQLITE_ENGINE_ERROR;
}
SQLiteBlockInputStream::SQLiteBlockInputStream(
SQLitePtr sqlite_db_,
const String & query_str_,
const Block & sample_block,
const UInt64 max_block_size_)
: query_str(query_str_)
SQLiteSource::SQLiteSource(
SQLitePtr sqlite_db_,
const String & query_str_,
const Block & sample_block,
const UInt64 max_block_size_)
: SourceWithProgress(sample_block.cloneEmpty())
, query_str(query_str_)
, max_block_size(max_block_size_)
, sqlite_db(std::move(sqlite_db_))
{
description.init(sample_block);
}
void SQLiteBlockInputStream::readPrefix()
{
sqlite3_stmt * compiled_stmt = nullptr;
int status = sqlite3_prepare_v2(sqlite_db.get(), query_str.c_str(), query_str.size() + 1, &compiled_stmt, nullptr);
@ -48,11 +45,10 @@ void SQLiteBlockInputStream::readPrefix()
compiled_statement = std::unique_ptr<sqlite3_stmt, StatementDeleter>(compiled_stmt, StatementDeleter());
}
Block SQLiteBlockInputStream::readImpl()
Chunk SQLiteSource::generate()
{
if (!compiled_statement)
return Block();
return {};
MutableColumns columns = description.sample_block.cloneEmptyColumns();
size_t num_rows = 0;
@ -73,30 +69,30 @@ Block SQLiteBlockInputStream::readImpl()
else if (status != SQLITE_ROW)
{
throw Exception(ErrorCodes::SQLITE_ENGINE_ERROR,
"Expected SQLITE_ROW status, but got status {}. Error: {}, Message: {}",
status, sqlite3_errstr(status), sqlite3_errmsg(sqlite_db.get()));
"Expected SQLITE_ROW status, but got status {}. Error: {}, Message: {}",
status, sqlite3_errstr(status), sqlite3_errmsg(sqlite_db.get()));
}
int column_count = sqlite3_column_count(compiled_statement.get());
for (const auto idx : collections::range(0, column_count))
{
const auto & sample = description.sample_block.getByPosition(idx);
if (sqlite3_column_type(compiled_statement.get(), idx) == SQLITE_NULL)
for (int column_index = 0; column_index < column_count; ++column_index)
{
if (sqlite3_column_type(compiled_statement.get(), column_index) == SQLITE_NULL)
{
insertDefaultSQLiteValue(*columns[idx], *sample.column);
columns[column_index]->insertDefault();
continue;
}
if (description.types[idx].second)
auto & [type, is_nullable] = description.types[column_index];
if (is_nullable)
{
ColumnNullable & column_nullable = assert_cast<ColumnNullable &>(*columns[idx]);
insertValue(column_nullable.getNestedColumn(), description.types[idx].first, idx);
ColumnNullable & column_nullable = assert_cast<ColumnNullable &>(*columns[column_index]);
insertValue(column_nullable.getNestedColumn(), type, column_index);
column_nullable.getNullMapData().emplace_back(0);
}
else
{
insertValue(*columns[idx], description.types[idx].first, idx);
insertValue(*columns[column_index], type, column_index);
}
}
@ -104,18 +100,16 @@ Block SQLiteBlockInputStream::readImpl()
break;
}
return description.sample_block.cloneWithColumns(std::move(columns));
}
void SQLiteBlockInputStream::readSuffix()
{
if (compiled_statement)
if (num_rows == 0)
{
compiled_statement.reset();
return {};
}
return Chunk(std::move(columns), num_rows);
}
void SQLiteBlockInputStream::insertValue(IColumn & column, const ExternalResultDescription::ValueType type, size_t idx)
void SQLiteSource::insertValue(IColumn & column, ExternalResultDescription::ValueType type, size_t idx)
{
switch (type)
{

View File

@ -6,32 +6,28 @@
#if USE_SQLITE
#include <Core/ExternalResultDescription.h>
#include <DataStreams/IBlockInputStream.h>
#include <Processors/Sources/SourceWithProgress.h>
#include <sqlite3.h> // Y_IGNORE
namespace DB
{
class SQLiteBlockInputStream : public IBlockInputStream
class SQLiteSource : public SourceWithProgress
{
using SQLitePtr = std::shared_ptr<sqlite3>;
public:
SQLiteBlockInputStream(SQLitePtr sqlite_db_,
SQLiteSource(SQLitePtr sqlite_db_,
const String & query_str_,
const Block & sample_block,
UInt64 max_block_size_);
String getName() const override { return "SQLite"; }
Block getHeader() const override { return description.sample_block.cloneEmpty(); }
private:
void insertDefaultSQLiteValue(IColumn & column, const IColumn & sample_column)
{
column.insertFrom(sample_column, 0);
}
using ValueType = ExternalResultDescription::ValueType;
@ -40,19 +36,14 @@ private:
void operator()(sqlite3_stmt * stmt) { sqlite3_finalize(stmt); }
};
void readPrefix() override;
Chunk generate() override;
Block readImpl() override;
void readSuffix() override;
void insertValue(IColumn & column, const ExternalResultDescription::ValueType type, size_t idx);
void insertValue(IColumn & column, ExternalResultDescription::ValueType type, size_t idx);
String query_str;
UInt64 max_block_size;
ExternalResultDescription description;
SQLitePtr sqlite_db;
std::unique_ptr<sqlite3_stmt, StatementDeleter> compiled_statement;
};

View File

@ -29,7 +29,7 @@ SRCS(
ITTLAlgorithm.cpp
InternalTextLogsRowOutputStream.cpp
MaterializingBlockInputStream.cpp
MongoDBBlockInputStream.cpp
MongoDBSource.cpp
NativeBlockInputStream.cpp
NativeBlockOutputStream.cpp
PushingToViewsBlockOutputStream.cpp
@ -37,7 +37,7 @@ SRCS(
RemoteBlockOutputStream.cpp
RemoteQueryExecutor.cpp
RemoteQueryExecutorReadContext.cpp
SQLiteBlockInputStream.cpp
SQLiteSource.cpp
SizeLimits.cpp
SquashingBlockInputStream.cpp
SquashingBlockOutputStream.cpp

View File

@ -208,6 +208,18 @@ void validateArraySizes(const Block & block)
}
}
std::unordered_set<String> getAllTableNames(const Block & block)
{
std::unordered_set<String> nested_table_names;
for (auto & name : block.getNames())
{
auto nested_table_name = Nested::extractTableName(name);
if (!nested_table_name.empty())
nested_table_names.insert(nested_table_name);
}
return nested_table_names;
}
}
}

View File

@ -28,6 +28,9 @@ namespace Nested
/// Check that sizes of arrays - elements of nested data structures - are equal.
void validateArraySizes(const Block & block);
/// Get all nested table names from a block.
std::unordered_set<String> getAllTableNames(const Block & block);
}
}

View File

@ -8,7 +8,6 @@
#include <Interpreters/executeQuery.h>
#include <Parsers/queryToString.h>
#include <Common/Exception.h>
#include <Common/Stopwatch.h>
#include <Common/ZooKeeper/KeeperException.h>
#include <Common/ZooKeeper/Types.h>
#include <Common/ZooKeeper/ZooKeeper.h>

View File

@ -11,7 +11,7 @@
# include <DataTypes/convertMySQLDataType.h>
# include <Databases/MySQL/DatabaseMySQL.h>
# include <Databases/MySQL/FetchTablesColumnsList.h>
# include <Formats/MySQLBlockInputStream.h>
# include <Formats/MySQLSource.h>
# include <Processors/Executors/PullingPipelineExecutor.h>
# include <Processors/QueryPipeline.h>
# include <IO/Operators.h>

View File

@ -10,7 +10,7 @@
#include <DataTypes/DataTypesNumber.h>
#include <Processors/Executors/PullingPipelineExecutor.h>
#include <Processors/QueryPipeline.h>
#include <Formats/MySQLBlockInputStream.h>
#include <Formats/MySQLSource.h>
#include <IO/WriteBufferFromString.h>
#include <IO/WriteHelpers.h>
#include <IO/Operators.h>

View File

@ -5,7 +5,7 @@
#include <Core/Block.h>
#include <DataTypes/DataTypeString.h>
#include <DataTypes/DataTypesNumber.h>
#include <Formats/MySQLBlockInputStream.h>
#include <Formats/MySQLSource.h>
#include <Processors/Executors/PullingPipelineExecutor.h>
#include <Processors/QueryPipeline.h>
#include <IO/ReadBufferFromFile.h>

View File

@ -16,7 +16,7 @@
# include <DataStreams/copyData.h>
# include <Databases/MySQL/DatabaseMaterializedMySQL.h>
# include <Databases/MySQL/MaterializeMetadata.h>
# include <Formats/MySQLBlockInputStream.h>
# include <Formats/MySQLSource.h>
# include <IO/ReadBufferFromString.h>
# include <Interpreters/Context.h>
# include <Interpreters/executeQuery.h>

View File

@ -164,7 +164,7 @@ StoragePtr DatabasePostgreSQL::tryGetTable(const String & table_name, ContextPtr
}
StoragePtr DatabasePostgreSQL::fetchTable(const String & table_name, ContextPtr local_context, const bool table_checked) const
StoragePtr DatabasePostgreSQL::fetchTable(const String & table_name, ContextPtr, const bool table_checked) const
{
if (!cache_tables || !cached_tables.count(table_name))
{
@ -179,7 +179,7 @@ StoragePtr DatabasePostgreSQL::fetchTable(const String & table_name, ContextPtr
auto storage = StoragePostgreSQL::create(
StorageID(database_name, table_name), pool, table_name,
ColumnsDescription{*columns}, ConstraintsDescription{}, String{}, local_context, postgres_schema);
ColumnsDescription{*columns}, ConstraintsDescription{}, String{}, postgres_schema);
if (cache_tables)
cached_tables[table_name] = storage;

View File

@ -10,7 +10,7 @@
#include <Common/ProfileEvents.h>
#include <Common/ProfilingScopedRWLock.h>
#include <Dictionaries/DictionaryBlockInputStream.h>
#include <Dictionaries/DictionarySource.h>
#include <Dictionaries/HierarchyDictionariesUtils.h>
#include <Processors/Executors/PullingPipelineExecutor.h>
@ -18,21 +18,21 @@
namespace ProfileEvents
{
extern const Event DictCacheKeysRequested;
extern const Event DictCacheKeysRequestedMiss;
extern const Event DictCacheKeysRequestedFound;
extern const Event DictCacheKeysExpired;
extern const Event DictCacheKeysNotFound;
extern const Event DictCacheKeysHit;
extern const Event DictCacheRequestTimeNs;
extern const Event DictCacheRequests;
extern const Event DictCacheLockWriteNs;
extern const Event DictCacheLockReadNs;
extern const Event DictCacheKeysRequested;
extern const Event DictCacheKeysRequestedMiss;
extern const Event DictCacheKeysRequestedFound;
extern const Event DictCacheKeysExpired;
extern const Event DictCacheKeysNotFound;
extern const Event DictCacheKeysHit;
extern const Event DictCacheRequestTimeNs;
extern const Event DictCacheRequests;
extern const Event DictCacheLockWriteNs;
extern const Event DictCacheLockReadNs;
}
namespace CurrentMetrics
{
extern const Metric DictCacheRequests;
extern const Metric DictCacheRequests;
}
namespace DB

View File

@ -36,10 +36,10 @@ void registerDictionarySourceCassandra(DictionarySourceFactory & factory)
#if USE_CASSANDRA
#include <IO/WriteHelpers.h>
#include <Common/SipHash.h>
#include "CassandraBlockInputStream.h"
#include <common/logger_useful.h>
#include <Common/SipHash.h>
#include <IO/WriteHelpers.h>
#include <Dictionaries/CassandraSource.h>
namespace DB
{
@ -49,7 +49,7 @@ namespace ErrorCodes
extern const int INVALID_CONFIG_PARAMETER;
}
CassandraSettings::CassandraSettings(
CassandraDictionarySource::Configuration::Configuration(
const Poco::Util::AbstractConfiguration & config,
const String & config_prefix)
: host(config.getString(config_prefix + ".host"))
@ -66,7 +66,7 @@ CassandraSettings::CassandraSettings(
setConsistency(config.getString(config_prefix + ".consistency", "One"));
}
void CassandraSettings::setConsistency(const String & config_str)
void CassandraDictionarySource::Configuration::setConsistency(const String & config_str)
{
if (config_str == "One")
consistency = CASS_CONSISTENCY_ONE;
@ -96,19 +96,19 @@ static const size_t max_block_size = 8192;
CassandraDictionarySource::CassandraDictionarySource(
const DictionaryStructure & dict_struct_,
const CassandraSettings & settings_,
const Configuration & configuration_,
const Block & sample_block_)
: log(&Poco::Logger::get("CassandraDictionarySource"))
, dict_struct(dict_struct_)
, settings(settings_)
, configuration(configuration_)
, sample_block(sample_block_)
, query_builder(dict_struct, settings.db, "", settings.table, settings.where, IdentifierQuotingStyle::DoubleQuotes)
, query_builder(dict_struct, configuration.db, "", configuration.table, configuration.query, configuration.where, IdentifierQuotingStyle::DoubleQuotes)
{
cassandraCheck(cass_cluster_set_contact_points(cluster, settings.host.c_str()));
if (settings.port)
cassandraCheck(cass_cluster_set_port(cluster, settings.port));
cass_cluster_set_credentials(cluster, settings.user.c_str(), settings.password.c_str());
cassandraCheck(cass_cluster_set_consistency(cluster, settings.consistency));
cassandraCheck(cass_cluster_set_contact_points(cluster, configuration.host.c_str()));
if (configuration.port)
cassandraCheck(cass_cluster_set_port(cluster, configuration.port));
cass_cluster_set_credentials(cluster, configuration.user.c_str(), configuration.password.c_str());
cassandraCheck(cass_cluster_set_consistency(cluster, configuration.consistency));
}
CassandraDictionarySource::CassandraDictionarySource(
@ -118,14 +118,14 @@ CassandraDictionarySource::CassandraDictionarySource(
Block & sample_block_)
: CassandraDictionarySource(
dict_struct_,
CassandraSettings(config, config_prefix),
Configuration(config, config_prefix),
sample_block_)
{
}
void CassandraDictionarySource::maybeAllowFiltering(String & query) const
{
if (!settings.allow_filtering)
if (!configuration.allow_filtering)
return;
query.pop_back(); /// remove semicolon
query += " ALLOW FILTERING;";
@ -141,7 +141,7 @@ Pipe CassandraDictionarySource::loadAll()
std::string CassandraDictionarySource::toString() const
{
return "Cassandra: " + settings.db + '.' + settings.table;
return "Cassandra: " + configuration.db + '.' + configuration.table;
}
Pipe CassandraDictionarySource::loadIds(const std::vector<UInt64> & ids)
@ -162,7 +162,7 @@ Pipe CassandraDictionarySource::loadKeys(const Columns & key_columns, const std:
for (const auto & row : requested_rows)
{
SipHash partition_key;
for (size_t i = 0; i < settings.partition_key_prefix; ++i)
for (size_t i = 0; i < configuration.partition_key_prefix; ++i)
key_columns[i]->updateHashWithValue(row, partition_key);
partitions[partition_key.get64()].push_back(row);
}
@ -170,7 +170,7 @@ Pipe CassandraDictionarySource::loadKeys(const Columns & key_columns, const std:
Pipes pipes;
for (const auto & partition : partitions)
{
String query = query_builder.composeLoadKeysQuery(key_columns, partition.second, ExternalQueryBuilder::CASSANDRA_SEPARATE_PARTITION_KEY, settings.partition_key_prefix);
String query = query_builder.composeLoadKeysQuery(key_columns, partition.second, ExternalQueryBuilder::CASSANDRA_SEPARATE_PARTITION_KEY, configuration.partition_key_prefix);
maybeAllowFiltering(query);
LOG_INFO(log, "Loading keys for partition hash {} using query: {}", partition.first, query);
pipes.push_back(Pipe(std::make_shared<CassandraSource>(getSession(), query, sample_block, max_block_size)));

View File

@ -14,33 +14,35 @@
namespace DB
{
struct CassandraSettings
{
String host;
UInt16 port;
String user;
String password;
String db;
String table;
CassConsistency consistency;
bool allow_filtering;
/// TODO get information about key from the driver
size_t partition_key_prefix;
size_t max_threads;
String where;
CassandraSettings(const Poco::Util::AbstractConfiguration & config, const String & config_prefix);
void setConsistency(const String & config_str);
};
class CassandraDictionarySource final : public IDictionarySource
{
public:
struct Configuration
{
String host;
UInt16 port;
String user;
String password;
String db;
String table;
String query;
CassConsistency consistency;
bool allow_filtering;
/// TODO get information about key from the driver
size_t partition_key_prefix;
size_t max_threads;
String where;
Configuration(const Poco::Util::AbstractConfiguration & config, const String & config_prefix);
void setConsistency(const String & config_str);
};
CassandraDictionarySource(
const DictionaryStructure & dict_struct,
const CassandraSettings & settings_,
const Configuration & configuration,
const Block & sample_block);
CassandraDictionarySource(
@ -59,7 +61,7 @@ public:
DictionarySourcePtr clone() const override
{
return std::make_unique<CassandraDictionarySource>(dict_struct, settings, sample_block);
return std::make_unique<CassandraDictionarySource>(dict_struct, configuration, sample_block);
}
Pipe loadIds(const std::vector<UInt64> & ids) override;
@ -76,7 +78,7 @@ private:
Poco::Logger * log;
const DictionaryStructure dict_struct;
const CassandraSettings settings;
const Configuration configuration;
Block sample_block;
ExternalQueryBuilder query_builder;

View File

@ -10,7 +10,7 @@
#include <Columns/ColumnsNumber.h>
#include <Core/ExternalResultDescription.h>
#include <IO/ReadHelpers.h>
#include "CassandraBlockInputStream.h"
#include "CassandraSource.h"
namespace DB

View File

@ -67,7 +67,7 @@ ClickHouseDictionarySource::ClickHouseDictionarySource(
: update_time{std::chrono::system_clock::from_time_t(0)}
, dict_struct{dict_struct_}
, configuration{configuration_}
, query_builder{dict_struct, configuration.db, "", configuration.table, configuration.where, IdentifierQuotingStyle::Backticks}
, query_builder{dict_struct, configuration.db, "", configuration.table, configuration.query, configuration.where, IdentifierQuotingStyle::Backticks}
, sample_block{sample_block_}
, context(Context::createCopy(context_))
, pool{createPool(configuration)}
@ -83,7 +83,7 @@ ClickHouseDictionarySource::ClickHouseDictionarySource(const ClickHouseDictionar
, dict_struct{other.dict_struct}
, configuration{other.configuration}
, invalidate_query_response{other.invalidate_query_response}
, query_builder{dict_struct, configuration.db, "", configuration.table, configuration.where, IdentifierQuotingStyle::Backticks}
, query_builder{dict_struct, configuration.db, "", configuration.table, configuration.query, configuration.where, IdentifierQuotingStyle::Backticks}
, sample_block{other.sample_block}
, context(Context::createCopy(other.context))
, pool{createPool(configuration)}
@ -241,7 +241,8 @@ void registerDictionarySourceClickHouse(DictionarySourceFactory & factory)
.user = config.getString(settings_config_prefix + ".user", "default"),
.password = config.getString(settings_config_prefix + ".password", ""),
.db = config.getString(settings_config_prefix + ".db", default_database),
.table = config.getString(settings_config_prefix + ".table"),
.table = config.getString(settings_config_prefix + ".table", ""),
.query = config.getString(settings_config_prefix + ".query", ""),
.where = config.getString(settings_config_prefix + ".where", ""),
.invalidate_query = config.getString(settings_config_prefix + ".invalidate_query", ""),
.update_field = config.getString(settings_config_prefix + ".update_field", ""),

View File

@ -25,6 +25,7 @@ public:
const std::string password;
const std::string db;
const std::string table;
const std::string query;
const std::string where;
const std::string invalidate_query;
const std::string update_field;

View File

@ -648,6 +648,16 @@ static const PaddedPODArray<T> & getColumnVectorData(
}
}
template <typename T>
static ColumnPtr getColumnFromPODArray(const PaddedPODArray<T> & array)
{
auto column_vector = ColumnVector<T>::create();
column_vector->getData().reserve(array.size());
column_vector->getData().insert(array.begin(), array.end());
return column_vector;
}
}
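A hypothetical usage of the helper added above, assuming the header that defines it is included: wrap a plain array of ids into an IColumn.

#include <Columns/ColumnVector.h>
#include <Common/PODArray.h>
/// ... plus the header above, which defines getColumnFromPODArray

namespace DB
{

/// Illustrative only: build a ColumnUInt64 from raw ids.
ColumnPtr makeIdsColumn()
{
    PaddedPODArray<UInt64> ids;
    ids.push_back(1);
    ids.push_back(2);
    ids.push_back(3);
    return getColumnFromPODArray(ids);   /// a ColumnUInt64 holding {1, 2, 3}
}

}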

View File

@ -1,4 +1,5 @@
#include "DictionaryBlockInputStream.h"
#include "DictionarySource.h"
#include <Dictionaries/DictionaryHelpers.h>
namespace DB
{
@ -12,7 +13,7 @@ DictionarySourceData::DictionarySourceData(
std::shared_ptr<const IDictionary> dictionary_, PaddedPODArray<UInt64> && ids_, const Names & column_names_)
: num_rows(ids_.size())
, dictionary(dictionary_)
, column_names(column_names_)
, column_names(column_names_.begin(), column_names_.end())
, ids(std::move(ids_))
, key_type(DictionaryInputStreamKeyType::Id)
{
@ -24,7 +25,7 @@ DictionarySourceData::DictionarySourceData(
const Names & column_names_)
: num_rows(keys.size())
, dictionary(dictionary_)
, column_names(column_names_)
, column_names(column_names_.begin(), column_names_.end())
, key_type(DictionaryInputStreamKeyType::ComplexKey)
{
const DictionaryStructure & dictionary_structure = dictionary->getStructure();
@ -39,7 +40,7 @@ DictionarySourceData::DictionarySourceData(
GetColumnsFunction && get_view_columns_function_)
: num_rows(data_columns_.front()->size())
, dictionary(dictionary_)
, column_names(column_names_)
, column_names(column_names_.begin(), column_names_.end())
, data_columns(data_columns_)
, get_key_columns_function(std::move(get_key_columns_function_))
, get_view_columns_function(std::move(get_view_columns_function_))
@ -102,8 +103,6 @@ Block DictionarySourceData::fillBlock(
const DataTypes & types,
ColumnsWithTypeAndName && view) const
{
std::unordered_set<std::string> names(column_names.begin(), column_names.end());
DataTypes data_types = types;
ColumnsWithTypeAndName block_columns;
@ -114,13 +113,13 @@ Block DictionarySourceData::fillBlock(
data_types.push_back(key.type);
for (const auto & column : view)
if (names.find(column.name) != names.end())
if (column_names.find(column.name) != column_names.end())
block_columns.push_back(column);
const DictionaryStructure & structure = dictionary->getStructure();
ColumnPtr ids_column = getColumnFromIds(ids_to_fill);
ColumnPtr ids_column = getColumnFromPODArray(ids_to_fill);
if (structure.id && names.find(structure.id->name) != names.end())
if (structure.id && column_names.find(structure.id->name) != column_names.end())
{
block_columns.emplace_back(ids_column, std::make_shared<DataTypeUInt64>(), structure.id->name);
}
@ -129,7 +128,7 @@ Block DictionarySourceData::fillBlock(
for (const auto & attribute : structure.attributes)
{
if (names.find(attribute.name) != names.end())
if (column_names.find(attribute.name) != column_names.end())
{
ColumnPtr column;
@ -159,13 +158,6 @@ Block DictionarySourceData::fillBlock(
return Block(block_columns);
}
ColumnPtr DictionarySourceData::getColumnFromIds(const PaddedPODArray<UInt64> & ids_to_fill)
{
auto column_vector = ColumnVector<UInt64>::create();
column_vector->getData().assign(ids_to_fill);
return column_vector;
}
void DictionarySourceData::fillKeyColumns(
const PaddedPODArray<StringRef> & keys,
size_t start,

View File

@ -7,19 +7,14 @@
#include <Columns/IColumn.h>
#include <Core/Names.h>
#include <DataTypes/DataTypesNumber.h>
#include <common/logger_useful.h>
#include "DictionaryBlockInputStreamBase.h"
#include "DictionaryStructure.h"
#include "IDictionary.h"
#include <Dictionaries/DictionaryStructure.h>
#include <Dictionaries/IDictionary.h>
#include <Dictionaries/DictionarySourceBase.h>
namespace DB
{
/// TODO: Remove this class
/* BlockInputStream implementation for external dictionaries
* read() returns blocks consisting of the in-memory contents of the dictionaries
*/
class DictionarySourceData
{
public:
@ -56,8 +51,6 @@ private:
const DataTypes & types,
ColumnsWithTypeAndName && view) const;
static ColumnPtr getColumnFromIds(const PaddedPODArray<UInt64> & ids_to_fill);
static void fillKeyColumns(
const PaddedPODArray<StringRef> & keys,
size_t start,
@ -67,7 +60,7 @@ private:
const size_t num_rows;
std::shared_ptr<const IDictionary> dictionary;
Names column_names;
std::unordered_set<std::string> column_names;
PaddedPODArray<UInt64> ids;
ColumnsWithTypeAndName key_columns;

View File

@ -1,4 +1,4 @@
#include "DictionaryBlockInputStreamBase.h"
#include "DictionarySourceBase.h"
namespace DB
{

View File

@ -21,10 +21,23 @@ ExternalQueryBuilder::ExternalQueryBuilder(
const std::string & db_,
const std::string & schema_,
const std::string & table_,
const std::string & query_,
const std::string & where_,
IdentifierQuotingStyle quoting_style_)
: dict_struct(dict_struct_), db(db_), schema(schema_), table(table_), where(where_), quoting_style(quoting_style_)
{}
: dict_struct(dict_struct_)
, db(db_)
, schema(schema_)
, table(table_)
, query(query_)
, where(where_)
, quoting_style(quoting_style_)
{
if (table.empty() && query.empty())
throw Exception(ErrorCodes::UNSUPPORTED_METHOD, "Setting `table` or `query` must be non-empty");
if (!query.empty() && (!table.empty() || !where.empty()))
throw Exception(ErrorCodes::UNSUPPORTED_METHOD, "Setting `table` or `where` cannot be used with `query` parameter");
}
void ExternalQueryBuilder::writeQuoted(const std::string & s, WriteBuffer & out) const
@ -52,10 +65,17 @@ void ExternalQueryBuilder::writeQuoted(const std::string & s, WriteBuffer & out)
std::string ExternalQueryBuilder::composeLoadAllQuery() const
{
WriteBufferFromOwnString out;
composeLoadAllQuery(out);
writeChar(';', out);
return out.str();
if (query.empty())
{
WriteBufferFromOwnString out;
composeLoadAllQuery(out);
writeChar(';', out);
return out.str();
}
else
{
return query;
}
}
void ExternalQueryBuilder::composeLoadAllQuery(WriteBuffer & out) const
@ -152,74 +172,314 @@ void ExternalQueryBuilder::composeLoadAllQuery(WriteBuffer & out) const
std::string ExternalQueryBuilder::composeUpdateQuery(const std::string & update_field, const std::string & time_point) const
{
WriteBufferFromOwnString out;
composeLoadAllQuery(out);
if (!where.empty())
writeString(" AND ", out);
if (query.empty())
{
composeLoadAllQuery(out);
if (!where.empty())
writeString(" AND ", out);
else
writeString(" WHERE ", out);
composeUpdateCondition(update_field, time_point, out);
writeChar(';', out);
return out.str();
}
else
writeString(" WHERE ", out);
{
writeString(query, out);
writeString(update_field, out);
writeString(" >= '", out);
writeString(time_point, out);
writeChar('\'', out);
auto condition_position = query.find("{condition}");
if (condition_position == std::string::npos)
{
writeString(" WHERE ", out);
composeUpdateCondition(update_field, time_point, out);
writeString(";", out);
writeChar(';', out);
return out.str();
return out.str();
}
WriteBufferFromOwnString condition_value_buffer;
composeUpdateCondition(update_field, time_point, condition_value_buffer);
const auto & condition_value = condition_value_buffer.str();
auto query_copy = query;
query_copy.replace(condition_position, condition_value.size(), condition_value);
return query_copy;
}
}
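For illustration, the new `query` parameter lets a dictionary source ship its own SQL with an optional `{condition}` placeholder that the builder fills in (otherwise the condition is appended after WHERE). A self-contained sketch of that substitution idea (the query text is hypothetical, and this is not the ExternalQueryBuilder API):

#include <iostream>
#include <string>

int main()
{
    const std::string placeholder = "{condition}";
    std::string query = "SELECT id, value FROM source WHERE {condition}";
    const std::string condition = "updated_at >= '2021-08-12 00:00:00'";

    auto pos = query.find(placeholder);
    if (pos != std::string::npos)
        query.replace(pos, placeholder.size(), condition);   /// splice the condition into the user query
    else
        query += " WHERE " + condition;                      /// no placeholder: append the condition

    std::cout << query << '\n';
    return 0;
}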
std::string ExternalQueryBuilder::composeLoadIdsQuery(const std::vector<UInt64> & ids)
std::string ExternalQueryBuilder::composeLoadIdsQuery(const std::vector<UInt64> & ids) const
{
if (!dict_struct.id)
throw Exception(ErrorCodes::UNSUPPORTED_METHOD, "Simple key required for method");
WriteBufferFromOwnString out;
writeString("SELECT ", out);
if (!dict_struct.id->expression.empty())
if (query.empty())
{
writeParenthesisedString(dict_struct.id->expression, out);
writeString(" AS ", out);
}
writeString("SELECT ", out);
writeQuoted(dict_struct.id->name, out);
for (const auto & attr : dict_struct.attributes)
{
writeString(", ", out);
if (!attr.expression.empty())
if (!dict_struct.id->expression.empty())
{
writeParenthesisedString(attr.expression, out);
writeParenthesisedString(dict_struct.id->expression, out);
writeString(" AS ", out);
}
writeQuoted(attr.name, out);
}
writeQuoted(dict_struct.id->name, out);
writeString(" FROM ", out);
if (!db.empty())
for (const auto & attr : dict_struct.attributes)
{
writeString(", ", out);
if (!attr.expression.empty())
{
writeParenthesisedString(attr.expression, out);
writeString(" AS ", out);
}
writeQuoted(attr.name, out);
}
writeString(" FROM ", out);
if (!db.empty())
{
writeQuoted(db, out);
writeChar('.', out);
}
if (!schema.empty())
{
writeQuoted(schema, out);
writeChar('.', out);
}
writeQuoted(table, out);
writeString(" WHERE ", out);
if (!where.empty())
{
writeString(where, out);
writeString(" AND ", out);
}
composeIdsCondition(ids, out);
writeString(";", out);
return out.str();
}
else
{
writeQuoted(db, out);
writeChar('.', out);
writeString(query, out);
auto condition_position = query.find("{condition}");
if (condition_position == std::string::npos)
{
writeString(" WHERE ", out);
composeIdsCondition(ids, out);
writeString(";", out);
return out.str();
}
WriteBufferFromOwnString condition_value_buffer;
composeIdsCondition(ids, condition_value_buffer);
const auto & condition_value = condition_value_buffer.str();
auto query_copy = query;
query_copy.replace(condition_position, condition_value.size(), condition_value);
return query_copy;
}
if (!schema.empty())
}
std::string ExternalQueryBuilder::composeLoadKeysQuery(
const Columns & key_columns, const std::vector<size_t> & requested_rows, LoadKeysMethod method, size_t partition_key_prefix) const
{
if (!dict_struct.key)
throw Exception(ErrorCodes::UNSUPPORTED_METHOD, "Composite key required for method");
if (key_columns.size() != dict_struct.key->size())
throw Exception(ErrorCodes::LOGICAL_ERROR, "The size of key_columns does not equal the size of the dictionary key");
WriteBufferFromOwnString out;
if (query.empty())
{
writeQuoted(schema, out);
writeChar('.', out);
writeString("SELECT ", out);
auto first = true;
for (const auto & key_or_attribute : boost::join(*dict_struct.key, dict_struct.attributes))
{
if (!first)
writeString(", ", out);
first = false;
if (!key_or_attribute.expression.empty())
{
writeParenthesisedString(key_or_attribute.expression, out);
writeString(" AS ", out);
}
writeQuoted(key_or_attribute.name, out);
}
writeString(" FROM ", out);
if (!db.empty())
{
writeQuoted(db, out);
writeChar('.', out);
}
if (!schema.empty())
{
writeQuoted(schema, out);
writeChar('.', out);
}
writeQuoted(table, out);
writeString(" WHERE ", out);
if (!where.empty())
{
if (method != CASSANDRA_SEPARATE_PARTITION_KEY)
writeString("(", out);
writeString(where, out);
if (method != CASSANDRA_SEPARATE_PARTITION_KEY)
writeString(") AND (", out);
else
writeString(" AND ", out);
}
composeKeysCondition(key_columns, requested_rows, method, partition_key_prefix, out);
writeString(";", out);
return out.str();
}
writeQuoted(table, out);
writeString(" WHERE ", out);
if (!where.empty())
else
{
writeString(where, out);
writeString(" AND ", out);
writeString(query, out);
auto condition_position = query.find("{condition}");
if (condition_position == std::string::npos)
{
writeString(" WHERE ", out);
composeKeysCondition(key_columns, requested_rows, method, partition_key_prefix, out);
writeString(";", out);
return out.str();
}
WriteBufferFromOwnString condition_value_buffer;
composeKeysCondition(key_columns, requested_rows, method, partition_key_prefix, condition_value_buffer);
const auto & condition_value = condition_value_buffer.str();
auto query_copy = query;
query_copy.replace(condition_position, condition_value.size(), condition_value);
return query_copy;
}
}
void ExternalQueryBuilder::composeKeyCondition(const Columns & key_columns, size_t row, WriteBuffer & out,
    size_t beg, size_t end) const
{
    auto first = true;
    for (size_t i = beg; i < end; ++i)
    {
        if (!first)
            writeString(" AND ", out);

        first = false;

        const auto & key_description = (*dict_struct.key)[i];

        /// key_i=value_i
        writeQuoted(key_description.name, out);
        writeString("=", out);
        key_description.type_serialization->serializeTextQuoted(*key_columns[i], row, out, format_settings);
    }
}

void ExternalQueryBuilder::composeInWithTuples(const Columns & key_columns, const std::vector<size_t> & requested_rows,
    WriteBuffer & out, size_t beg, size_t end) const
{
    composeKeyTupleDefinition(out, beg, end);
    writeString(" IN (", out);

    bool first = true;
    for (const auto row : requested_rows)
    {
        if (!first)
            writeString(", ", out);

        first = false;
        composeKeyTuple(key_columns, row, out, beg, end);
    }

    writeString(")", out);
}

void ExternalQueryBuilder::composeKeyTupleDefinition(WriteBuffer & out, size_t beg, size_t end) const
{
    if (!dict_struct.key)
        throw Exception(ErrorCodes::UNSUPPORTED_METHOD, "Composite key required for method");

    writeChar('(', out);

    auto first = true;
    for (size_t i = beg; i < end; ++i)
    {
        if (!first)
            writeString(", ", out);

        first = false;
        writeQuoted((*dict_struct.key)[i].name, out);
    }

    writeChar(')', out);
}

void ExternalQueryBuilder::composeKeyTuple(const Columns & key_columns, size_t row, WriteBuffer & out, size_t beg, size_t end) const
{
    writeString("(", out);

    auto first = true;
    for (size_t i = beg; i < end; ++i)
    {
        if (!first)
            writeString(", ", out);

        first = false;

        auto serialization = (*dict_struct.key)[i].type_serialization;
        serialization->serializeTextQuoted(*key_columns[i], row, out, format_settings);
    }

    writeString(")", out);
}

void ExternalQueryBuilder::composeUpdateCondition(const std::string & update_field, const std::string & time_point, WriteBuffer & out)
{
    writeString(update_field, out);
    writeString(" >= '", out);
    writeString(time_point, out);
    writeChar('\'', out);
}
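composeUpdateCondition above yields a filter of the form <update_field> >= '<time_point>', which composeUpdateQuery uses so that incremental refreshes only fetch rows changed since the previous load. A tiny self-contained sketch with assumed field and timestamp values (not from the diff):

#include <iostream>
#include <string>

/// Illustration of the condition shape produced above.
std::string composeUpdateCondition(const std::string & update_field, const std::string & time_point)
{
    return update_field + " >= '" + time_point + "'";
}

int main()
{
    /// Example values, chosen for illustration only.
    std::cout << composeUpdateCondition("updated_at", "2021-08-15 09:06:42") << '\n';
    /// updated_at >= '2021-08-15 09:06:42'
}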
void ExternalQueryBuilder::composeIdsCondition(const std::vector<UInt64> & ids, WriteBuffer & out) const
{
    writeQuoted(dict_struct.id->name, out);
    writeString(" IN (", out);

@ -233,67 +493,12 @@ std::string ExternalQueryBuilder::composeLoadIdsQuery(const std::vector<UInt64>
        writeString(DB::toString(id), out);
    }

    writeString(")", out);
}
void ExternalQueryBuilder::composeKeysCondition(const Columns & key_columns, const std::vector<size_t> & requested_rows, LoadKeysMethod method, size_t partition_key_prefix, WriteBuffer & out) const
{
    bool first = true;

    if (method == AND_OR_CHAIN)
    {
@ -334,92 +539,6 @@ std::string ExternalQueryBuilder::composeLoadKeysQuery(
    {
        writeString(")", out);
    }
}
}
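composeKeysCondition above (its middle is elided by the hunk marker) writes the composite-key filter in one of the shapes the header below lists for LoadKeysMethod: an AND/OR chain of per-row equality conditions, a tuple IN list (cf. composeInWithTuples), or, for Cassandra, a separate IN clause over the leading partition-key columns. A standalone sketch of the first two shapes for an assumed key (region, id) and two requested rows; identifiers and values are made up:

#include <iostream>
#include <string>
#include <utility>
#include <vector>

int main()
{
    /// Assumed example: composite key (region, id) and two requested rows.
    const std::vector<std::pair<std::string, std::string>> rows = {{"1", "'a'"}, {"2", "'b'"}};

    /// AND/OR chain: ((region=1 AND id='a') OR (region=2 AND id='b'))
    std::string and_or_chain = "(";
    for (size_t i = 0; i < rows.size(); ++i)
    {
        if (i)
            and_or_chain += " OR ";
        and_or_chain += "(region=" + rows[i].first + " AND id=" + rows[i].second + ")";
    }
    and_or_chain += ")";

    /// Tuple IN form: (region, id) IN ((1, 'a'), (2, 'b'))
    std::string in_with_tuples = "(region, id) IN (";
    for (size_t i = 0; i < rows.size(); ++i)
    {
        if (i)
            in_with_tuples += ", ";
        in_with_tuples += "(" + rows[i].first + ", " + rows[i].second + ")";
    }
    in_with_tuples += ")";

    std::cout << and_or_chain << '\n' << in_with_tuples << '\n';
}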

View File

@ -21,6 +21,7 @@ struct ExternalQueryBuilder
const std::string db;
const std::string schema;
const std::string table;
const std::string query;
const std::string where;
IdentifierQuotingStyle quoting_style;
@ -31,6 +32,7 @@ struct ExternalQueryBuilder
const std::string & db_,
const std::string & schema_,
const std::string & table_,
const std::string & query_,
const std::string & where_,
IdentifierQuotingStyle quoting_style_);
@ -41,7 +43,7 @@ struct ExternalQueryBuilder
std::string composeUpdateQuery(const std::string & update_field, const std::string & time_point) const;
/** Generate a query to load data by set of UInt64 keys. */
std::string composeLoadIdsQuery(const std::vector<UInt64> & ids);
std::string composeLoadIdsQuery(const std::vector<UInt64> & ids) const;
/** Generate a query to load data by set of composite keys.
* There are three methods of specification of composite keys in WHERE:
@ -56,7 +58,7 @@ struct ExternalQueryBuilder
CASSANDRA_SEPARATE_PARTITION_KEY,
};
std::string composeLoadKeysQuery(const Columns & key_columns, const std::vector<size_t> & requested_rows, LoadKeysMethod method, size_t partition_key_prefix = 0);
std::string composeLoadKeysQuery(const Columns & key_columns, const std::vector<size_t> & requested_rows, LoadKeysMethod method, size_t partition_key_prefix = 0) const;
private:
@ -67,16 +69,25 @@ private:
/// In the following methods `beg` and `end` specifies which columns to write in expression
/// Expression in form (x = c1 AND y = c2 ...)
void composeKeyCondition(const Columns & key_columns, const size_t row, WriteBuffer & out, size_t beg, size_t end) const;
void composeKeyCondition(const Columns & key_columns, size_t row, WriteBuffer & out, size_t beg, size_t end) const;
/// Expression in form (x, y, ...) IN ((c1, c2, ...), ...)
void composeInWithTuples(const Columns & key_columns, const std::vector<size_t> & requested_rows, WriteBuffer & out, size_t beg, size_t end);
void composeInWithTuples(const Columns & key_columns, const std::vector<size_t> & requested_rows, WriteBuffer & out, size_t beg, size_t end) const;
/// Expression in form (x, y, ...)
void composeKeyTupleDefinition(WriteBuffer & out, size_t beg, size_t end) const;
/// Expression in form (c1, c2, ...)
void composeKeyTuple(const Columns & key_columns, const size_t row, WriteBuffer & out, size_t beg, size_t end) const;
void composeKeyTuple(const Columns & key_columns, size_t row, WriteBuffer & out, size_t beg, size_t end) const;
/// Compose update condition
static void composeUpdateCondition(const std::string & update_field, const std::string & time_point, WriteBuffer & out);
/// Compose ids condition
void composeIdsCondition(const std::vector<UInt64> & ids, WriteBuffer & out) const;
/// Compose keys condition
void composeKeysCondition(const Columns & key_columns, const std::vector<size_t> & requested_rows, LoadKeysMethod method, size_t partition_key_prefix, WriteBuffer & out) const;
/// Write string with specified quoting style.
void writeQuoted(const std::string & s, WriteBuffer & out) const;

View File

@ -13,7 +13,7 @@
#include <Processors/QueryPipeline.h>
#include <Processors/Executors/PullingPipelineExecutor.h>
#include <Dictionaries/DictionaryBlockInputStream.h>
#include <Dictionaries/DictionarySource.h>
#include <Dictionaries/DictionaryFactory.h>
#include <Dictionaries/HierarchyDictionariesUtils.h>

View File

@ -6,7 +6,7 @@
#include <Columns/ColumnNullable.h>
#include <Functions/FunctionHelpers.h>
#include <Dictionaries/DictionaryBlockInputStream.h>
#include <Dictionaries/DictionarySource.h>
#include <Dictionaries/DictionaryFactory.h>
#include <Dictionaries/HierarchyDictionariesUtils.h>

View File

@ -5,7 +5,7 @@
#include <variant>
#include <optional>
#include <sparsehash/sparse_hash_map>
#include <Common/SparseHashMap.h>
#include <Common/HashTable/HashMap.h>
#include <Common/HashTable/HashSet.h>
@ -125,14 +125,6 @@ private:
HashMap<UInt64, Value>,
HashMapWithSavedHash<StringRef, Value, DefaultHash<StringRef>>>;
#if !defined(ARCADIA_BUILD)
template <typename Key, typename Value>
using SparseHashMap = google::sparse_hash_map<Key, Value, DefaultHash<Key>>;
#else
template <typename Key, typename Value>
using SparseHashMap = google::sparsehash::sparse_hash_map<Key, Value, DefaultHash<Key>>;
#endif
template <typename Value>
using CollectionTypeSparse = std::conditional_t<
dictionary_key_type == DictionaryKeyType::simple,

View File

@ -13,7 +13,7 @@
#include <common/itoa.h>
#include <common/map.h>
#include <common/range.h>
#include <Dictionaries/DictionaryBlockInputStream.h>
#include <Dictionaries/DictionarySource.h>
#include <Dictionaries/DictionaryFactory.h>
#include <Functions/FunctionHelpers.h>

View File

@ -50,7 +50,7 @@ void registerDictionarySourceMongoDB(DictionarySourceFactory & factory)
// Poco/MongoDB/BSONWriter.h:54: void writeCString(const std::string & value);
// src/IO/WriteHelpers.h:146 #define writeCString(s, buf)
#include <IO/WriteHelpers.h>
#include <DataStreams/MongoDBBlockInputStream.h>
#include <DataStreams/MongoDBSource.h>
namespace DB

View File

@ -22,6 +22,7 @@ static const size_t default_num_tries_on_connection_loss = 3;
namespace ErrorCodes
{
extern const int SUPPORT_IS_DISABLED;
extern const int UNSUPPORTED_METHOD;
}
void registerDictionarySourceMysql(DictionarySourceFactory & factory)
@ -41,11 +42,19 @@ void registerDictionarySourceMysql(DictionarySourceFactory & factory)
auto settings_config_prefix = config_prefix + ".mysql";
auto table = config.getString(settings_config_prefix + ".table", "");
auto where = config.getString(settings_config_prefix + ".where", "");
auto query = config.getString(settings_config_prefix + ".query", "");
if (query.empty() && table.empty())
throw Exception(ErrorCodes::UNSUPPORTED_METHOD, "MySQL dictionary source configuration must contain table or query field");
MySQLDictionarySource::Configuration configuration
{
.db = config.getString(settings_config_prefix + ".db", ""),
.table = config.getString(settings_config_prefix + ".table"),
.where = config.getString(settings_config_prefix + ".where", ""),
.table = table,
.query = query,
.where = where,
.invalidate_query = config.getString(settings_config_prefix + ".invalidate_query", ""),
.update_field = config.getString(settings_config_prefix + ".update_field", ""),
.update_lag = config.getUInt64(settings_config_prefix + ".update_lag", 1),
@ -94,7 +103,7 @@ MySQLDictionarySource::MySQLDictionarySource(
, configuration(configuration_)
, pool(std::move(pool_))
, sample_block(sample_block_)
, query_builder(dict_struct, configuration.db, "", configuration.table, configuration.where, IdentifierQuotingStyle::Backticks)
, query_builder(dict_struct, configuration.db, "", configuration.table, configuration.query, configuration.where, IdentifierQuotingStyle::Backticks)
, load_all_query(query_builder.composeLoadAllQuery())
, settings(settings_)
{
@ -108,7 +117,7 @@ MySQLDictionarySource::MySQLDictionarySource(const MySQLDictionarySource & other
, configuration(other.configuration)
, pool(other.pool)
, sample_block(other.sample_block)
, query_builder{dict_struct, configuration.db, "", configuration.table, configuration.where, IdentifierQuotingStyle::Backticks}
, query_builder{dict_struct, configuration.db, "", configuration.table, configuration.query, configuration.where, IdentifierQuotingStyle::Backticks}
, load_all_query{other.load_all_query}
, last_modification{other.last_modification}
, invalidate_query_response{other.invalidate_query_response}
@ -128,7 +137,7 @@ std::string MySQLDictionarySource::getUpdateFieldAndDate()
else
{
update_time = std::chrono::system_clock::now();
return query_builder.composeLoadAllQuery();
return load_all_query;
}
}
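The registration code above introduces for MySQL sources the rule the query builder relies on: at least one of <table> and <query> must be configured, and a non-empty <query> is used as-is (optionally with a {condition} placeholder) instead of a statement composed from <table> and <where>. A simplified, self-contained sketch of that rule; the function name and example values are illustrative, not ClickHouse code:

#include <iostream>
#include <stdexcept>
#include <string>

/// Simplified sketch of the table-or-query configuration rule.
std::string chooseLoadAllStatement(const std::string & table, const std::string & where, const std::string & query)
{
    if (query.empty() && table.empty())
        throw std::invalid_argument("MySQL dictionary source configuration must contain table or query field");

    if (!query.empty())
        return query;   /// A custom query bypasses composition from table/where.

    std::string statement = "SELECT * FROM " + table;   /// The real builder quotes identifiers and lists the dictionary columns explicitly.
    if (!where.empty())
        statement += " WHERE " + where;
    return statement + ";";
}

int main()
{
    /// Example configuration values (assumed).
    std::cout << chooseLoadAllStatement("dict", "enabled = 1", "") << '\n';
    std::cout << chooseLoadAllStatement("", "", "SELECT id, value FROM db.dict WHERE {condition}") << '\n';
}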

View File

@ -12,7 +12,7 @@
# include "DictionaryStructure.h"
# include "ExternalQueryBuilder.h"
# include "IDictionarySource.h"
# include <Formats/MySQLBlockInputStream.h>
# include <Formats/MySQLSource.h>
namespace Poco
{
@ -35,6 +35,7 @@ public:
{
const std::string db;
const std::string table;
const std::string query;
const std::string where;
const std::string invalidate_query;
const std::string update_field;

View File

@ -3,14 +3,14 @@
#include <numeric>
#include <cmath>
#include "DictionaryBlockInputStream.h"
#include "DictionaryFactory.h"
#include <Columns/ColumnArray.h>
#include <Columns/ColumnTuple.h>
#include <DataTypes/DataTypeArray.h>
#include <Functions/FunctionHelpers.h>
#include <DataTypes/DataTypesDecimal.h>
#include <Dictionaries/DictionaryFactory.h>
#include <Dictionaries/DictionarySource.h>
namespace DB
{

View File

@ -7,7 +7,7 @@
#if USE_LIBPQXX
#include <Columns/ColumnString.h>
#include <DataTypes/DataTypeString.h>
#include <DataStreams/PostgreSQLBlockInputStream.h>
#include <DataStreams/PostgreSQLSource.h>
#include "readInvalidateQuery.h"
#include <Interpreters/Context.h>
#endif
@ -27,7 +27,7 @@ static const UInt64 max_block_size = 8192;
namespace
{
ExternalQueryBuilder makeExternalQueryBuilder(const DictionaryStructure & dict_struct, const String & schema, const String & table, const String & where)
ExternalQueryBuilder makeExternalQueryBuilder(const DictionaryStructure & dict_struct, const String & schema, const String & table, const String & query, const String & where)
{
auto schema_value = schema;
auto table_value = table;
@ -41,7 +41,7 @@ namespace
}
}
/// Do not need db because it is already in a connection string.
return {dict_struct, "", schema_value, table_value, where, IdentifierQuotingStyle::DoubleQuotes};
return {dict_struct, "", schema_value, table_value, query, where, IdentifierQuotingStyle::DoubleQuotes};
}
}
@ -56,7 +56,7 @@ PostgreSQLDictionarySource::PostgreSQLDictionarySource(
, pool(std::move(pool_))
, sample_block(sample_block_)
, log(&Poco::Logger::get("PostgreSQLDictionarySource"))
, query_builder(makeExternalQueryBuilder(dict_struct, configuration.schema, configuration.table, configuration.where))
, query_builder(makeExternalQueryBuilder(dict_struct, configuration.schema, configuration.table, configuration.query, configuration.where))
, load_all_query(query_builder.composeLoadAllQuery())
{
}
@ -69,7 +69,7 @@ PostgreSQLDictionarySource::PostgreSQLDictionarySource(const PostgreSQLDictionar
, pool(other.pool)
, sample_block(other.sample_block)
, log(&Poco::Logger::get("PostgreSQLDictionarySource"))
, query_builder(makeExternalQueryBuilder(dict_struct, configuration.schema, configuration.table, configuration.where))
, query_builder(makeExternalQueryBuilder(dict_struct, configuration.schema, configuration.table, configuration.query, configuration.where))
, load_all_query(query_builder.composeLoadAllQuery())
, update_time(other.update_time)
, invalidate_query_response(other.invalidate_query_response)
@ -198,6 +198,7 @@ void registerDictionarySourcePostgreSQL(DictionarySourceFactory & factory)
.db = config.getString(fmt::format("{}.db", settings_config_prefix), ""),
.schema = config.getString(fmt::format("{}.schema", settings_config_prefix), ""),
.table = config.getString(fmt::format("{}.table", settings_config_prefix), ""),
.query = config.getString(fmt::format("{}.query", settings_config_prefix), ""),
.where = config.getString(fmt::format("{}.where", settings_config_prefix), ""),
.invalidate_query = config.getString(fmt::format("{}.invalidate_query", settings_config_prefix), ""),
.update_field = config.getString(fmt::format("{}.update_field", settings_config_prefix), ""),

View File

@ -26,6 +26,7 @@ public:
const String db;
const String schema;
const String table;
const String query;
const String where;
const String invalidate_query;
const String update_field;

View File

@ -1,14 +1,14 @@
#pragma once
#include <DataTypes/DataTypeDate.h>
#include <DataTypes/DataTypesNumber.h>
#include <Columns/ColumnString.h>
#include <Columns/ColumnVector.h>
#include <Columns/IColumn.h>
#include <DataTypes/DataTypeDate.h>
#include <DataTypes/DataTypesNumber.h>
#include <common/range.h>
#include "DictionaryBlockInputStreamBase.h"
#include "DictionaryStructure.h"
#include "IDictionary.h"
#include "RangeHashedDictionary.h"
#include <Dictionaries/DictionaryStructure.h>
#include <Dictionaries/IDictionary.h>
#include <Dictionaries/DictionarySourceBase.h>
#include <Dictionaries/DictionaryHelpers.h>
#include <Dictionaries/RangeHashedDictionary.h>
namespace DB
@ -31,8 +31,6 @@ public:
size_t getNumRows() const { return ids.size(); }
private:
template <typename T>
ColumnPtr getColumnFromPODArray(const PaddedPODArray<T> & array) const;
Block fillBlock(
const PaddedPODArray<Key> & ids_to_fill,
@ -86,17 +84,6 @@ Block RangeDictionarySourceData<RangeType>::getBlock(size_t start, size_t length
return fillBlock(block_ids, block_start_dates, block_end_dates);
}
template <typename RangeType>
template <typename T>
ColumnPtr RangeDictionarySourceData<RangeType>::getColumnFromPODArray(const PaddedPODArray<T> & array) const
{
auto column_vector = ColumnVector<T>::create();
column_vector->getData().reserve(array.size());
column_vector->getData().insert(array.begin(), array.end());
return column_vector;
}
template <typename RangeType>
PaddedPODArray<Int64> RangeDictionarySourceData<RangeType>::makeDateKey(
const PaddedPODArray<RangeType> & block_start_dates, const PaddedPODArray<RangeType> & block_end_dates) const

View File

@ -2,11 +2,11 @@
#include <Columns/ColumnNullable.h>
#include <Functions/FunctionHelpers.h>
#include <Common/TypeList.h>
#include <common/range.h>
#include "DictionaryFactory.h"
#include "RangeDictionaryBlockInputStream.h"
#include <Interpreters/castColumn.h>
#include <DataTypes/DataTypesDecimal.h>
#include <Dictionaries/DictionaryFactory.h>
#include <Dictionaries/RangeDictionarySource.h>
namespace
{

View File

@ -9,10 +9,10 @@
#include <Columns/ColumnString.h>
#include <Common/HashTable/HashMap.h>
#include <Common/HashTable/HashSet.h>
#include "DictionaryStructure.h"
#include "IDictionary.h"
#include "IDictionarySource.h"
#include "DictionaryHelpers.h"
#include <Dictionaries/DictionaryStructure.h>
#include <Dictionaries/IDictionary.h>
#include <Dictionaries/IDictionarySource.h>
#include <Dictionaries/DictionaryHelpers.h>
namespace DB
{

Some files were not shown because too many files have changed in this diff.