Merge remote-tracking branch 'origin/master' into rf/add-deltasum

Russ Frank 2021-02-08 15:35:16 -05:00
commit 746dc1ddae
230 changed files with 6354 additions and 948 deletions


@ -1,3 +1,148 @@
## ClickHouse release 21.2
### ClickHouse release v21.2.2.8-stable, 2021-02-07
#### Backward Incompatible Change
* Bitwise functions (`bitAnd`, `bitOr`, etc) are forbidden for floating point arguments. Now you have to do explicit cast to integer. [#19853](https://github.com/ClickHouse/ClickHouse/pull/19853) ([Azat Khuzhin](https://github.com/azat)).
* Forbid `lcm`/`gcd` for floats. [#19532](https://github.com/ClickHouse/ClickHouse/pull/19532) ([Azat Khuzhin](https://github.com/azat)).
* Fix memory tracking for `OPTIMIZE TABLE`/merges; account query memory limits and sampling for `OPTIMIZE TABLE`/merges. [#18772](https://github.com/ClickHouse/ClickHouse/pull/18772) ([Azat Khuzhin](https://github.com/azat)).
* Disallow floating point column as partition key, see [#18421](https://github.com/ClickHouse/ClickHouse/issues/18421#event-4147046255). [#18464](https://github.com/ClickHouse/ClickHouse/pull/18464) ([hexiaoting](https://github.com/hexiaoting)).
* Excessive parentheses in type definitions are no longer supported, for example: `Array((UInt8))`.
#### New Feature
* Added `PostgreSQL` table engine (both select/insert, with support for multidimensional arrays), also as table function. Added `PostgreSQL` dictionary source. Added `PostgreSQL` database engine. [#18554](https://github.com/ClickHouse/ClickHouse/pull/18554) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Data type `Nested` now supports arbitrary levels of nesting. Introduced subcolumns of complex types, such as `size0` in `Array`, `null` in `Nullable`, and names of `Tuple` elements, which can be read without reading the whole column. [#17310](https://github.com/ClickHouse/ClickHouse/pull/17310) ([Anton Popov](https://github.com/CurtizJ)).
* Added `Nullable` support for `FlatDictionary`, `HashedDictionary`, `ComplexKeyHashedDictionary`, `DirectDictionary`, `ComplexKeyDirectDictionary`, `RangeHashedDictionary`. [#18236](https://github.com/ClickHouse/ClickHouse/pull/18236) ([Maksim Kita](https://github.com/kitaisreal)).
* Adds a new table called `system.distributed_ddl_queue` that displays the queries in the DDL worker queue. [#17656](https://github.com/ClickHouse/ClickHouse/pull/17656) ([Bharat Nallan](https://github.com/bharatnc)).
* Added support for mapping LDAP group names, and attribute values in general, to local roles for users from LDAP user directories. [#17211](https://github.com/ClickHouse/ClickHouse/pull/17211) ([Denis Glazachev](https://github.com/traceon)).
* Support insert into table function `cluster`, and for both table functions `remote` and `cluster`, support distributing data across nodes by specifying a sharding key. Close [#16752](https://github.com/ClickHouse/ClickHouse/issues/16752). [#18264](https://github.com/ClickHouse/ClickHouse/pull/18264) ([flynn](https://github.com/ucasFL)).
* Add function `decodeXMLComponent` to decode XML character entities. Example: `SELECT decodeXMLComponent('Hello,&quot;world&quot;!')` [#17659](https://github.com/ClickHouse/ClickHouse/issues/17659). [#18542](https://github.com/ClickHouse/ClickHouse/pull/18542) ([nauta](https://github.com/nautaa)).
* Added functions `parseDateTimeBestEffortUSOrZero`, `parseDateTimeBestEffortUSOrNull`. [#19712](https://github.com/ClickHouse/ClickHouse/pull/19712) ([Maksim Kita](https://github.com/kitaisreal)).
* Add `sign` math function. [#19527](https://github.com/ClickHouse/ClickHouse/pull/19527) ([flynn](https://github.com/ucasFL)).
* Add information about used features (functions, table engines, etc) into system.query_log. [#18495](https://github.com/ClickHouse/ClickHouse/issues/18495). [#19371](https://github.com/ClickHouse/ClickHouse/pull/19371) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Function `formatDateTime` supports the `%Q` modifier to format the quarter of the year. [#19224](https://github.com/ClickHouse/ClickHouse/pull/19224) ([Jianmei Zhang](https://github.com/zhangjmruc)).
* Support MetaKey+Enter hotkey binding in play UI. [#19012](https://github.com/ClickHouse/ClickHouse/pull/19012) ([sundyli](https://github.com/sundy-li)).
* Add three functions for the `Map` data type: 1. `mapContains(map, key)` checks whether the map contains the given key; 2. `mapKeys(map)` returns all the keys as an Array; 3. `mapValues(map)` returns all the values as an Array. See the sketch after this list. [#18788](https://github.com/ClickHouse/ClickHouse/pull/18788) ([hexiaoting](https://github.com/hexiaoting)).
* Add `log_comment` setting related to [#18494](https://github.com/ClickHouse/ClickHouse/issues/18494). [#18549](https://github.com/ClickHouse/ClickHouse/pull/18549) ([Zijie Lu](https://github.com/TszKitLo40)).
* Add support of tuple argument to `argMin` and `argMax` functions. [#17359](https://github.com/ClickHouse/ClickHouse/pull/17359) ([Ildus Kurbangaliev](https://github.com/ildus)).
* Support `EXISTS VIEW` syntax. [#18552](https://github.com/ClickHouse/ClickHouse/pull/18552) ([Du Chuan](https://github.com/spongedu)).
* Add `SELECT ALL` syntax. Closes [#18706](https://github.com/ClickHouse/ClickHouse/issues/18706). [#18723](https://github.com/ClickHouse/ClickHouse/pull/18723) ([flynn](https://github.com/ucasFL)).
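
Below is a hedged sketch of a few of the new functions listed above. The `map(...)` literal assumes the experimental `Map` type is enabled (`SET allow_experimental_map_type = 1`), and the commented results follow from the descriptions above rather than from captured output:

``` sql
-- Assumes: SET allow_experimental_map_type = 1; results in comments are illustrative.
SELECT
    mapContains(map('a', 1, 'b', 2), 'a')          AS has_a,    -- 1
    mapKeys(map('a', 1, 'b', 2))                   AS keys,     -- ['a','b']
    mapValues(map('a', 1, 'b', 2))                 AS vals,     -- [1,2]
    sign(-3.5)                                     AS s,        -- -1
    formatDateTime(toDate('2021-02-07'), '%Q')     AS quarter,  -- '1'
    decodeXMLComponent('Hello,&quot;world&quot;!') AS decoded;  -- 'Hello,"world"!'
```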
#### Performance Improvement
* Faster parts removal by lowering the number of `stat` syscalls. This restores an optimization that existed a while ago. Safer interface of `IDisk`. This closes [#19065](https://github.com/ClickHouse/ClickHouse/issues/19065). [#19086](https://github.com/ClickHouse/ClickHouse/pull/19086) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Aliases declared in `WITH` statement are properly used in index analysis. Queries like `WITH column AS alias SELECT ... WHERE alias = ...` may use index now. [#18896](https://github.com/ClickHouse/ClickHouse/pull/18896) ([Amos Bird](https://github.com/amosbird)).
* Add `optimize_alias_column_prediction` (on by default), which will: - Respect aliased columns in WHERE during partition pruning and skipping data using secondary indexes; - Respect aliased columns in WHERE for trivial count queries for optimize_trivial_count; - Respect aliased columns in GROUP BY/ORDER BY for optimize_aggregation_in_order/optimize_read_in_order. [#16995](https://github.com/ClickHouse/ClickHouse/pull/16995) ([sundyli](https://github.com/sundy-li)).
* Speed up aggregate function `sum`. Improvement only visible on synthetic benchmarks and not very practical. [#19216](https://github.com/ClickHouse/ClickHouse/pull/19216) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Update libc++ and use another ABI to provide better performance. [#18914](https://github.com/ClickHouse/ClickHouse/pull/18914) ([Danila Kutenin](https://github.com/danlark1)).
* Rewrite `sumIf()` and `sum(if())` to `countIf()` when they are logically equivalent; see the sketch after this list. [#17041](https://github.com/ClickHouse/ClickHouse/pull/17041) ([flynn](https://github.com/ucasFL)).
* Use a connection pool for S3 connections, controlled by the `s3_max_connections` setting. [#13405](https://github.com/ClickHouse/ClickHouse/pull/13405) ([Vladimir Chebotarev](https://github.com/excitoon)).
* Add support for zstd long option for better compression of string columns to save space. [#17184](https://github.com/ClickHouse/ClickHouse/pull/17184) ([ygrek](https://github.com/ygrek)).
* Slightly improve server latency by removing access to configuration on every connection. [#19863](https://github.com/ClickHouse/ClickHouse/pull/19863) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Reduce lock contention for multiple layers of the `Buffer` engine. [#19379](https://github.com/ClickHouse/ClickHouse/pull/19379) ([Azat Khuzhin](https://github.com/azat)).
* Support splitting `Filter` step of query plan into `Expression + Filter` pair. Together with `Expression + Expression` merging optimization ([#17458](https://github.com/ClickHouse/ClickHouse/issues/17458)) it may delay execution for some expressions after `Filter` step. [#19253](https://github.com/ClickHouse/ClickHouse/pull/19253) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
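
As a minimal sketch of the `sumIf`/`sum(if())` to `countIf` rewrite mentioned above, the three expressions below are logically equivalent, which is what allows the optimizer to rewrite the first two into the third:

``` sql
-- All three count rows where x > 10; the first two can be rewritten to countIf.
SELECT
    sumIf(1, x > 10)      AS via_sumif,
    sum(if(x > 10, 1, 0)) AS via_sum_if,
    countIf(x > 10)       AS via_countif
FROM (SELECT number AS x FROM numbers(100));
```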
#### Improvement
* `SELECT count() FROM table` can now be executed if any single column of the `table` can be selected. This PR fixes [#10639](https://github.com/ClickHouse/ClickHouse/issues/10639). [#18233](https://github.com/ClickHouse/ClickHouse/pull/18233) ([Vitaly Baranov](https://github.com/vitlibar)).
* Set charset to `utf8mb4` when interacting with remote MySQL servers. Fixes [#19795](https://github.com/ClickHouse/ClickHouse/issues/19795). [#19800](https://github.com/ClickHouse/ClickHouse/pull/19800) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* `S3` table function now supports `auto` compression mode (autodetect). This closes [#18754](https://github.com/ClickHouse/ClickHouse/issues/18754). [#19793](https://github.com/ClickHouse/ClickHouse/pull/19793) ([Vladimir Chebotarev](https://github.com/excitoon)).
* Correctly output infinite arguments for `formatReadableTimeDelta` function. In previous versions, there was implicit conversion to implementation specific integer value. [#19791](https://github.com/ClickHouse/ClickHouse/pull/19791) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Table function `S3` will use global region if the region can't be determined exactly. This closes [#10998](https://github.com/ClickHouse/ClickHouse/issues/10998). [#19750](https://github.com/ClickHouse/ClickHouse/pull/19750) ([Vladimir Chebotarev](https://github.com/excitoon)).
* In distributed queries if the setting `async_socket_for_remote` is enabled, it was possible to get stack overflow at least in debug build configuration if very deeply nested data type is used in table (e.g. `Array(Array(Array(...more...)))`). This fixes [#19108](https://github.com/ClickHouse/ClickHouse/issues/19108). This change introduces minor backward incompatibility: excessive parenthesis in type definitions no longer supported, example: `Array((UInt8))`. [#19736](https://github.com/ClickHouse/ClickHouse/pull/19736) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Add separate pool for message brokers (RabbitMQ and Kafka). [#19722](https://github.com/ClickHouse/ClickHouse/pull/19722) ([Azat Khuzhin](https://github.com/azat)).
* Fix rare `max_number_of_merges_with_ttl_in_pool` limit overrun (more merges with TTL can be assigned) for non-replicated MergeTree. [#19708](https://github.com/ClickHouse/ClickHouse/pull/19708) ([alesapin](https://github.com/alesapin)).
* Dictionary: better error message during attribute parsing. [#19678](https://github.com/ClickHouse/ClickHouse/pull/19678) ([Maksim Kita](https://github.com/kitaisreal)).
* Add an option to disable validation of checksums on reading. It should never be used in production. Please do not expect any benefits in disabling it; it may only be used for experiments and benchmarks. The setting is only applicable to tables of the MergeTree family. Checksums are always validated for other table engines and when receiving data over the network. In my observations there is no performance difference, or it is less than 0.5%. [#19588](https://github.com/ClickHouse/ClickHouse/pull/19588) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Support constant result in function `multiIf`. [#19533](https://github.com/ClickHouse/ClickHouse/pull/19533) ([Maksim Kita](https://github.com/kitaisreal)).
* Enable functions `length`/`empty`/`notEmpty` for the `Map` data type: `length` returns the number of keys, and `empty`/`notEmpty` check whether the Map has any keys (see the sketch after this list). [#19530](https://github.com/ClickHouse/ClickHouse/pull/19530) ([taiyang-li](https://github.com/taiyang-li)).
* Add `--reconnect` option to `clickhouse-benchmark`. When this option is specified, it will reconnect before every request. This is needed for testing. [#19872](https://github.com/ClickHouse/ClickHouse/pull/19872) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Support using the new location of `.debug` file. This fixes [#19348](https://github.com/ClickHouse/ClickHouse/issues/19348). [#19520](https://github.com/ClickHouse/ClickHouse/pull/19520) ([Amos Bird](https://github.com/amosbird)).
* `toIPv6` function parses `IPv4` addresses. [#19518](https://github.com/ClickHouse/ClickHouse/pull/19518) ([Bharat Nallan](https://github.com/bharatnc)).
* Add `http_referer` field to `system.query_log`, `system.processes`, etc. This closes [#19389](https://github.com/ClickHouse/ClickHouse/issues/19389). [#19390](https://github.com/ClickHouse/ClickHouse/pull/19390) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Improve MySQL compatibility by making more functions case insensitive and adding aliases. [#19387](https://github.com/ClickHouse/ClickHouse/pull/19387) ([Daniil Kondratyev](https://github.com/dankondr)).
* Add metrics for MergeTree parts (Wide/Compact/InMemory) types. [#19381](https://github.com/ClickHouse/ClickHouse/pull/19381) ([Azat Khuzhin](https://github.com/azat)).
* Allow the Docker image to be run with an arbitrary UID. [#19374](https://github.com/ClickHouse/ClickHouse/pull/19374) ([filimonov](https://github.com/filimonov)).
* Fix wrong alignment of values of `IPv4` data type in Pretty formats. They were aligned to the right, not to the left. This closes [#19184](https://github.com/ClickHouse/ClickHouse/issues/19184). [#19339](https://github.com/ClickHouse/ClickHouse/pull/19339) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Allow changing `max_server_memory_usage` without a restart. This closes [#18154](https://github.com/ClickHouse/ClickHouse/issues/18154). [#19186](https://github.com/ClickHouse/ClickHouse/pull/19186) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* The exception when function `bar` is called with certain NaN argument may be slightly misleading in previous versions. This fixes [#19088](https://github.com/ClickHouse/ClickHouse/issues/19088). [#19107](https://github.com/ClickHouse/ClickHouse/pull/19107) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Explicitly set uid / gid of clickhouse user & group to the fixed values (101) in clickhouse-server images. [#19096](https://github.com/ClickHouse/ClickHouse/pull/19096) ([filimonov](https://github.com/filimonov)).
* Fixed `PeekableReadBuffer: Memory limit exceed` error when inserting data with huge strings. Fixes [#18690](https://github.com/ClickHouse/ClickHouse/issues/18690). [#18979](https://github.com/ClickHouse/ClickHouse/pull/18979) ([tavplubix](https://github.com/tavplubix)).
* Docker image: several improvements for clickhouse-server entrypoint. [#18954](https://github.com/ClickHouse/ClickHouse/pull/18954) ([filimonov](https://github.com/filimonov)).
* Add `normalizeQueryKeepNames` and `normalizedQueryHashKeepNames` to normalize queries without masking long names with `?`. This helps better analyze complex query logs. [#18910](https://github.com/ClickHouse/ClickHouse/pull/18910) ([Amos Bird](https://github.com/amosbird)).
* Check the per-block checksum of a distributed batch on the sender before sending (without reading the file twice; the checksums are verified while reading), which avoids the INSERT getting stuck on the receiver when a .bin file is truncated on the sender. Avoid reading .bin files twice for batched INSERT (it was required to calculate rows/bytes to take squashing into account; now this information is included into the header, and backward compatibility is preserved). [#18853](https://github.com/ClickHouse/ClickHouse/pull/18853) ([Azat Khuzhin](https://github.com/azat)).
* Fix issues with RIGHT and FULL JOIN of tables with aggregate function states. In previous versions exception about `cloneResized` method was thrown. [#18818](https://github.com/ClickHouse/ClickHouse/pull/18818) ([templarzq](https://github.com/templarzq)).
* Added prefix-based S3 endpoint settings. [#18812](https://github.com/ClickHouse/ClickHouse/pull/18812) ([Vladimir Chebotarev](https://github.com/excitoon)).
* Add [UInt8, UInt16, UInt32, UInt64] arguments types support for bitmapTransform, bitmapSubsetInRange, bitmapSubsetLimit, bitmapContains functions. This closes [#18713](https://github.com/ClickHouse/ClickHouse/issues/18713). [#18791](https://github.com/ClickHouse/ClickHouse/pull/18791) ([sundyli](https://github.com/sundy-li)).
* Allow CTE (Common Table Expressions) to be further aliased. Propagate CSE (Common Subexpressions Elimination) to subqueries in the same level when `enable_global_with_statement = 1`. This fixes [#17378](https://github.com/ClickHouse/ClickHouse/issues/17378) . This fixes https://github.com/ClickHouse/ClickHouse/pull/16575#issuecomment-753416235 . [#18684](https://github.com/ClickHouse/ClickHouse/pull/18684) ([Amos Bird](https://github.com/amosbird)).
* Update librdkafka to v1.6.0-RC2. Fixes [#18668](https://github.com/ClickHouse/ClickHouse/issues/18668). [#18671](https://github.com/ClickHouse/ClickHouse/pull/18671) ([filimonov](https://github.com/filimonov)).
* In case of unexpected exceptions automatically restart background thread which is responsible for execution of distributed DDL queries. Fixes [#17991](https://github.com/ClickHouse/ClickHouse/issues/17991). [#18285](https://github.com/ClickHouse/ClickHouse/pull/18285) ([徐炘](https://github.com/weeds085490)).
* Updated AWS C++ SDK in order to utilize global regions in S3. [#17870](https://github.com/ClickHouse/ClickHouse/pull/17870) ([Vladimir Chebotarev](https://github.com/excitoon)).
* Added support for `WITH ... [AND] [PERIODIC] REFRESH [interval_in_sec]` clause when creating `LIVE VIEW` tables. [#14822](https://github.com/ClickHouse/ClickHouse/pull/14822) ([vzakaznikov](https://github.com/vzakaznikov)).
* Restrict `MODIFY TTL` queries for `MergeTree` tables created in old syntax. Previously the query succeeded, but actually it had no effect. [#19064](https://github.com/ClickHouse/ClickHouse/pull/19064) ([Anton Popov](https://github.com/CurtizJ)).
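
A hedged sketch of two of the improvements above (the Map literal again assumes the experimental `Map` type is enabled); the commented results follow from the descriptions, not from captured output:

``` sql
SELECT
    length(map('a', 1, 'b', 2)) AS key_count,   -- 2
    empty(map('a', 1))          AS is_empty,    -- 0
    toIPv6('192.168.0.1')       AS mapped_v6;   -- ::ffff:192.168.0.1
```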
#### Bug Fix
* Fix index analysis of binary functions with constant argument which leads to wrong query results. This fixes [#18364](https://github.com/ClickHouse/ClickHouse/issues/18364). [#18373](https://github.com/ClickHouse/ClickHouse/pull/18373) ([Amos Bird](https://github.com/amosbird)).
* Fix starting the server with tables having default expressions containing dictGet(). Allow getting return type of dictGet() without loading dictionary. [#19805](https://github.com/ClickHouse/ClickHouse/pull/19805) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix server crash after query with `if` function with `Tuple` type of then/else branches result. `Tuple` type must contain `Array` or another complex type. Fixes [#18356](https://github.com/ClickHouse/ClickHouse/issues/18356). [#20133](https://github.com/ClickHouse/ClickHouse/pull/20133) ([alesapin](https://github.com/alesapin)).
* `MaterializeMySQL` (experimental feature): Fix replication for statements that update several tables. [#20066](https://github.com/ClickHouse/ClickHouse/pull/20066) ([Håvard Kvålen](https://github.com/havardk)).
* Prevent "Connection refused" in docker during initialization script execution. [#20012](https://github.com/ClickHouse/ClickHouse/pull/20012) ([filimonov](https://github.com/filimonov)).
* `EmbeddedRocksDB` is an experimental storage. Fix the issue with lack of proper type checking. Simplified code. This closes [#19967](https://github.com/ClickHouse/ClickHouse/issues/19967). [#19972](https://github.com/ClickHouse/ClickHouse/pull/19972) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix a segfault in function `fromModifiedJulianDay` when the argument type is `Nullable(T)` for any integral types other than Int32. [#19959](https://github.com/ClickHouse/ClickHouse/pull/19959) ([PHO](https://github.com/depressed-pho)).
* The function `greatCircleAngle` returned inaccurate results in previous versions. This closes [#19769](https://github.com/ClickHouse/ClickHouse/issues/19769). [#19789](https://github.com/ClickHouse/ClickHouse/pull/19789) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix rare bug when some replicated operations (like mutation) cannot process some parts after data corruption. Fixes [#19593](https://github.com/ClickHouse/ClickHouse/issues/19593). [#19702](https://github.com/ClickHouse/ClickHouse/pull/19702) ([alesapin](https://github.com/alesapin)).
* Background thread which executes `ON CLUSTER` queries might hang waiting for dropped replicated table to do something. It's fixed. [#19684](https://github.com/ClickHouse/ClickHouse/pull/19684) ([yiguolei](https://github.com/yiguolei)).
* Fix wrong deserialization of columns description. It makes INSERT into a table with a column named `\` impossible. [#19479](https://github.com/ClickHouse/ClickHouse/pull/19479) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Mark distributed batch as broken in case of empty data block in one of files. [#19449](https://github.com/ClickHouse/ClickHouse/pull/19449) ([Azat Khuzhin](https://github.com/azat)).
* Fixed very rare bug that might cause a mutation to hang after `DROP/DETACH/REPLACE/MOVE PARTITION`. It was partially fixed by [#15537](https://github.com/ClickHouse/ClickHouse/issues/15537) for most cases. [#19443](https://github.com/ClickHouse/ClickHouse/pull/19443) ([tavplubix](https://github.com/tavplubix)).
* Fix possible error `Extremes transform was already added to pipeline`. Fixes [#14100](https://github.com/ClickHouse/ClickHouse/issues/14100). [#19430](https://github.com/ClickHouse/ClickHouse/pull/19430) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix default value in join types with non-zero default (e.g. some Enums). Closes [#18197](https://github.com/ClickHouse/ClickHouse/issues/18197). [#19360](https://github.com/ClickHouse/ClickHouse/pull/19360) ([vdimir](https://github.com/vdimir)).
* Do not mark file for distributed send as broken on EOF. [#19290](https://github.com/ClickHouse/ClickHouse/pull/19290) ([Azat Khuzhin](https://github.com/azat)).
* Fix leaking of pipe fd for `async_socket_for_remote`. [#19153](https://github.com/ClickHouse/ClickHouse/pull/19153) ([Azat Khuzhin](https://github.com/azat)).
* Fix infinite reading from file in `ORC` format (was introduced in [#10580](https://github.com/ClickHouse/ClickHouse/issues/10580)). Fixes [#19095](https://github.com/ClickHouse/ClickHouse/issues/19095). [#19134](https://github.com/ClickHouse/ClickHouse/pull/19134) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix issue in merge tree data writer which can lead to marks with bigger size than fixed granularity size. Fixes [#18913](https://github.com/ClickHouse/ClickHouse/issues/18913). [#19123](https://github.com/ClickHouse/ClickHouse/pull/19123) ([alesapin](https://github.com/alesapin)).
* Fix startup bug when clickhouse was not able to read the compression codec from `LowCardinality(Nullable(...))` and threw the exception `Attempt to read after EOF`. Fixes [#18340](https://github.com/ClickHouse/ClickHouse/issues/18340). [#19101](https://github.com/ClickHouse/ClickHouse/pull/19101) ([alesapin](https://github.com/alesapin)).
* Simplify the implementation of `tupleHammingDistance`. Support for tuples of any equal length. Fixes [#19029](https://github.com/ClickHouse/ClickHouse/issues/19029). [#19084](https://github.com/ClickHouse/ClickHouse/pull/19084) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Make sure `groupUniqArray` returns correct type for argument of Enum type. This closes [#17875](https://github.com/ClickHouse/ClickHouse/issues/17875). [#19019](https://github.com/ClickHouse/ClickHouse/pull/19019) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix possible error `Expected single dictionary argument for function` if use function `ignore` with `LowCardinality` argument. Fixes [#14275](https://github.com/ClickHouse/ClickHouse/issues/14275). [#19016](https://github.com/ClickHouse/ClickHouse/pull/19016) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix inserting of `LowCardinality` column to table with `TinyLog` engine. Fixes [#18629](https://github.com/ClickHouse/ClickHouse/issues/18629). [#19010](https://github.com/ClickHouse/ClickHouse/pull/19010) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix minor issue in JOIN: Join tries to materialize const columns, but our code waits for them in other places. [#18982](https://github.com/ClickHouse/ClickHouse/pull/18982) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Disable `optimize_move_functions_out_of_any` because optimization is not always correct. This closes [#18051](https://github.com/ClickHouse/ClickHouse/issues/18051). This closes [#18973](https://github.com/ClickHouse/ClickHouse/issues/18973). [#18981](https://github.com/ClickHouse/ClickHouse/pull/18981) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix possible exception `QueryPipeline stream: different number of columns` caused by merging of query plan's `Expression` steps. Fixes [#18190](https://github.com/ClickHouse/ClickHouse/issues/18190). [#18980](https://github.com/ClickHouse/ClickHouse/pull/18980) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fixed very rare deadlock at shutdown. [#18977](https://github.com/ClickHouse/ClickHouse/pull/18977) ([tavplubix](https://github.com/tavplubix)).
* Fixed rare crashes when server run out of memory. [#18976](https://github.com/ClickHouse/ClickHouse/pull/18976) ([tavplubix](https://github.com/tavplubix)).
* Fix incorrect behavior when `ALTER TABLE ... DROP PART 'part_name'` query removes all deduplication blocks for the whole partition. Fixes [#18874](https://github.com/ClickHouse/ClickHouse/issues/18874). [#18969](https://github.com/ClickHouse/ClickHouse/pull/18969) ([alesapin](https://github.com/alesapin)).
* Fixed issue [#18894](https://github.com/ClickHouse/ClickHouse/issues/18894): add a check to avoid an exception when a long column alias ('table.column' style, usually auto-generated by BI tools like Looker) equals a long table name. [#18968](https://github.com/ClickHouse/ClickHouse/pull/18968) ([Daniel Qin](https://github.com/mathfool)).
* Fix error `Task was not found in task queue` (possible only for remote queries, with `async_socket_for_remote = 1`). [#18964](https://github.com/ClickHouse/ClickHouse/pull/18964) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix bug when a mutation with some escaped text (like `ALTER ... UPDATE e = CAST('foo', 'Enum8(\'foo\' = 1')`) was serialized incorrectly. Fixes [#18878](https://github.com/ClickHouse/ClickHouse/issues/18878). [#18944](https://github.com/ClickHouse/ClickHouse/pull/18944) ([alesapin](https://github.com/alesapin)).
* ATTACH PARTITION will reset mutations. [#18804](https://github.com/ClickHouse/ClickHouse/issues/18804). [#18935](https://github.com/ClickHouse/ClickHouse/pull/18935) ([fastio](https://github.com/fastio)).
* Fix issue with `bitmapOrCardinality` that may lead to nullptr dereference. This closes [#18911](https://github.com/ClickHouse/ClickHouse/issues/18911). [#18912](https://github.com/ClickHouse/ClickHouse/pull/18912) ([sundyli](https://github.com/sundy-li)).
* Fixed `Attempt to read after eof` error when trying to `CAST` `NULL` from `Nullable(String)` to `Nullable(Decimal(P, S))`. Now function `CAST` returns `NULL` when it cannot parse a decimal from a nullable string; see the sketch after this list. Fixes [#7690](https://github.com/ClickHouse/ClickHouse/issues/7690). [#18718](https://github.com/ClickHouse/ClickHouse/pull/18718) ([Winter Zhang](https://github.com/zhang2014)).
* Fix data type convert issue for MySQL engine. [#18124](https://github.com/ClickHouse/ClickHouse/pull/18124) ([bo zeng](https://github.com/mis98zb)).
* Fix clickhouse-client abort exception while executing only a `SELECT` query. [#19790](https://github.com/ClickHouse/ClickHouse/pull/19790) ([taiyang-li](https://github.com/taiyang-li)).
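
A small sketch of the `CAST` behavior referenced above: casting a `NULL` or unparsable `Nullable(String)` value to `Nullable(Decimal)` now yields `NULL` instead of failing:

``` sql
SELECT
    CAST(CAST(NULL,   'Nullable(String)'), 'Nullable(Decimal(10, 2))') AS from_null,     -- NULL
    CAST(CAST('oops', 'Nullable(String)'), 'Nullable(Decimal(10, 2))') AS from_garbage,  -- NULL
    CAST(CAST('1.25', 'Nullable(String)'), 'Nullable(Decimal(10, 2))') AS from_text;     -- 1.25
```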
#### Build/Testing/Packaging Improvement
* Run [SQLancer](https://twitter.com/RiggerManuel/status/1352345625480884228) (logical SQL fuzzer) in CI. [#19006](https://github.com/ClickHouse/ClickHouse/pull/19006) ([Ilya Yatsishin](https://github.com/qoega)).
* Query Fuzzer will fuzz newly added tests more extensively. This closes [#18916](https://github.com/ClickHouse/ClickHouse/issues/18916). [#19185](https://github.com/ClickHouse/ClickHouse/pull/19185) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Integrate with [Big List of Naughty Strings](https://github.com/minimaxir/big-list-of-naughty-strings/) for better fuzzing. [#19480](https://github.com/ClickHouse/ClickHouse/pull/19480) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Add integration tests run with MSan. [#18974](https://github.com/ClickHouse/ClickHouse/pull/18974) ([alesapin](https://github.com/alesapin)).
* Fixed MemorySanitizer errors in cyrus-sasl and musl. [#19821](https://github.com/ClickHouse/ClickHouse/pull/19821) ([Ilya Yatsishin](https://github.com/qoega)).
* Insufficient arguments check in the `positionCaseInsensitiveUTF8` function triggered the address sanitizer. [#19720](https://github.com/ClickHouse/ClickHouse/pull/19720) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Remove --project-directory for docker-compose in integration test. Fix logs formatting from docker container. [#19706](https://github.com/ClickHouse/ClickHouse/pull/19706) ([Ilya Yatsishin](https://github.com/qoega)).
* Made generation of macros.xml easier for integration tests. No more excessive logging from dicttoxml. The dicttoxml project has not been active for 5+ years. [#19697](https://github.com/ClickHouse/ClickHouse/pull/19697) ([Ilya Yatsishin](https://github.com/qoega)).
* Allow to explicitly enable or disable watchdog via environment variable `CLICKHOUSE_WATCHDOG_ENABLE`. By default it is enabled if server is not attached to terminal. [#19522](https://github.com/ClickHouse/ClickHouse/pull/19522) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Allow building ClickHouse with Kafka support on arm64. [#19369](https://github.com/ClickHouse/ClickHouse/pull/19369) ([filimonov](https://github.com/filimonov)).
* Allow building librdkafka without ssl. [#19337](https://github.com/ClickHouse/ClickHouse/pull/19337) ([filimonov](https://github.com/filimonov)).
* Restore Kafka input in FreeBSD builds. [#18924](https://github.com/ClickHouse/ClickHouse/pull/18924) ([Alexandre Snarskii](https://github.com/snar)).
* Fix potential nullptr dereference in table function `VALUES`. [#19357](https://github.com/ClickHouse/ClickHouse/pull/19357) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Avoid UBSan reports in `arrayElement` function, `substring` and `arraySum`. Fixes [#19305](https://github.com/ClickHouse/ClickHouse/issues/19305). Fixes [#19287](https://github.com/ClickHouse/ClickHouse/issues/19287). This closes [#19336](https://github.com/ClickHouse/ClickHouse/issues/19336). [#19347](https://github.com/ClickHouse/ClickHouse/pull/19347) ([alexey-milovidov](https://github.com/alexey-milovidov)).
## ClickHouse release 21.1
### ClickHouse release v21.1.3.32-stable, 2021-02-03


@ -8,7 +8,7 @@ ClickHouse® is an open-source column-oriented database management system that a
* [Tutorial](https://clickhouse.tech/docs/en/getting_started/tutorial/) shows how to set up and query a small ClickHouse cluster.
* [Documentation](https://clickhouse.tech/docs/en/) provides more in-depth information.
* [YouTube channel](https://www.youtube.com/c/ClickHouseDB) has a lot of content about ClickHouse in video format.
* [Slack](https://join.slack.com/t/clickhousedb/shared_invite/zt-ly9m4w1x-6j7x5Ts_pQZqrctAbRZ3cg) and [Telegram](https://telegram.me/clickhouse_en) allow chatting with ClickHouse users in real-time.
* [Blog](https://clickhouse.yandex/blog/en/) contains various ClickHouse-related articles, as well as announcements and reports about events.
* [Code Browser](https://clickhouse.tech/codebrowser/html_report/ClickHouse/index.html) with syntax highlight and navigation.
* [Yandex.Messenger channel](https://yandex.ru/chat/#/join/20e380d9-c7be-4123-ab06-e95fb946975e) shares announcements and useful links in Russian.


@ -11,7 +11,7 @@ endif ()
target_compile_options(base64_scalar PRIVATE -falign-loops)
if (ARCH_AMD64)
    target_compile_options(base64_ssse3 PRIVATE -mno-avx -mno-avx2 -mssse3 -falign-loops)
    target_compile_options(base64_avx PRIVATE -falign-loops -mavx)
    target_compile_options(base64_avx2 PRIVATE -falign-loops -mavx2)
else ()


@ -252,6 +252,7 @@ if (NOT EXTERNAL_HYPERSCAN_LIBRARY_FOUND)
target_compile_definitions (hyperscan PUBLIC USE_HYPERSCAN=1)
target_compile_options (hyperscan
    PRIVATE -g0 # Library has too much debug information
            -mno-avx -mno-avx2 # The library is using dynamic dispatch and is confused if AVX is enabled globally
            -march=corei7 -O2 -fno-strict-aliasing -fno-omit-frame-pointer -fvisibility=hidden # The options from original build system
            -fno-sanitize=undefined # Assume the library takes care of itself
)

contrib/poco (vendored submodule)

@ -1 +1 @@
Subproject commit updated from e11f3c971570cf6a31006cd21cadf41a259c360a to fbaaba4a02e29987b8c584747a496c79528f125f


@ -319,6 +319,7 @@ function run_tests
# In fasttest, ENABLE_LIBRARIES=0, so rocksdb engine is not enabled by default
01504_rocksdb
01686_rocksdb
# Look at DistributedFilesToInsert, so cannot run in parallel.
01460_DistributedFilesToInsert


@ -61,7 +61,7 @@ RUN python3 -m pip install \
    aerospike \
    avro \
    cassandra-driver \
    confluent-kafka==1.5.0 \
    dict2xml \
    dicttoxml \
    docker \


@ -46,7 +46,7 @@ toc_title: Adopters
| <a href="https://www.exness.com" class="favicon">Exness</a> | Trading | Metrics, Logging | — | — | [Talk in Russian, May 2019](https://youtu.be/_rpU-TvSfZ8?t=3215) | | <a href="https://www.exness.com" class="favicon">Exness</a> | Trading | Metrics, Logging | — | — | [Talk in Russian, May 2019](https://youtu.be/_rpU-TvSfZ8?t=3215) |
| <a href="https://fastnetmon.com/" class="favicon">FastNetMon</a> | DDoS Protection | Main Product | | — | [Official website](https://fastnetmon.com/docs-fnm-advanced/fastnetmon-advanced-traffic-persistency/) | | <a href="https://fastnetmon.com/" class="favicon">FastNetMon</a> | DDoS Protection | Main Product | | — | [Official website](https://fastnetmon.com/docs-fnm-advanced/fastnetmon-advanced-traffic-persistency/) |
| <a href="https://www.flipkart.com/" class="favicon">Flipkart</a> | e-Commerce | — | — | — | [Talk in English, July 2020](https://youtu.be/GMiXCMFDMow?t=239) | | <a href="https://www.flipkart.com/" class="favicon">Flipkart</a> | e-Commerce | — | — | — | [Talk in English, July 2020](https://youtu.be/GMiXCMFDMow?t=239) |
| <a href="https://fun.co/rp" class="favicon">FunCorp</a> | Games | | — | | [Article](https://www.altinity.com/blog/migrating-from-redshift-to-clickhouse) | | <a href="https://fun.co/rp" class="favicon">FunCorp</a> | Games | | — | 14 bn records/day as of Jan 2021 | [Article](https://www.altinity.com/blog/migrating-from-redshift-to-clickhouse) |
| <a href="https://geniee.co.jp" class="favicon">Geniee</a> | Ad network | Main product | — | — | [Blog post in Japanese, July 2017](https://tech.geniee.co.jp/entry/2017/07/20/160100) | | <a href="https://geniee.co.jp" class="favicon">Geniee</a> | Ad network | Main product | — | — | [Blog post in Japanese, July 2017](https://tech.geniee.co.jp/entry/2017/07/20/160100) |
| <a href="https://www.genotek.ru/" class="favicon">Genotek</a> | Bioinformatics | Main product | — | — | [Video, August 2020](https://youtu.be/v3KyZbz9lEE) | | <a href="https://www.genotek.ru/" class="favicon">Genotek</a> | Bioinformatics | Main product | — | — | [Video, August 2020](https://youtu.be/v3KyZbz9lEE) |
| <a href="https://www.huya.com/" class="favicon">HUYA</a> | Video Streaming | Analytics | — | — | [Slides in Chinese, October 2018](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup19/7.%20ClickHouse万亿数据分析实践%20李本旺(sundy-li)%20虎牙.pdf) | | <a href="https://www.huya.com/" class="favicon">HUYA</a> | Video Streaming | Analytics | — | — | [Slides in Chinese, October 2018](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup19/7.%20ClickHouse万亿数据分析实践%20李本旺(sundy-li)%20虎牙.pdf) |
@ -74,6 +74,7 @@ toc_title: Adopters
| <a href="https://getnoc.com/" class="favicon">NOC Project</a> | Network Monitoring | Analytics | Main Product | — | [Official Website](https://getnoc.com/features/big-data/) | | <a href="https://getnoc.com/" class="favicon">NOC Project</a> | Network Monitoring | Analytics | Main Product | — | [Official Website](https://getnoc.com/features/big-data/) |
| <a href="https://www.nuna.com/" class="favicon">Nuna Inc.</a> | Health Data Analytics | — | — | — | [Talk in English, July 2020](https://youtu.be/GMiXCMFDMow?t=170) | | <a href="https://www.nuna.com/" class="favicon">Nuna Inc.</a> | Health Data Analytics | — | — | — | [Talk in English, July 2020](https://youtu.be/GMiXCMFDMow?t=170) |
| <a href="https://www.oneapm.com/" class="favicon">OneAPM</a> | Monitorings and Data Analysis | Main product | — | — | [Slides in Chinese, October 2018](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup19/8.%20clickhouse在OneAPM的应用%20杜龙.pdf) | | <a href="https://www.oneapm.com/" class="favicon">OneAPM</a> | Monitorings and Data Analysis | Main product | — | — | [Slides in Chinese, October 2018](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup19/8.%20clickhouse在OneAPM的应用%20杜龙.pdf) |
| <a href="https://panelbear.com/" class="favicon">Panelbear | Analytics | Monitoring and Analytics | — | — | [Tech Stack, November 2020](https://panelbear.com/blog/tech-stack/) |
| <a href="https://www.percent.cn/" class="favicon">Percent 百分点</a> | Analytics | Main Product | — | — | [Slides in Chinese, June 2019](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup24/4.%20ClickHouse万亿数据双中心的设计与实践%20.pdf) | | <a href="https://www.percent.cn/" class="favicon">Percent 百分点</a> | Analytics | Main Product | — | — | [Slides in Chinese, June 2019](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup24/4.%20ClickHouse万亿数据双中心的设计与实践%20.pdf) |
| <a href="https://www.percona.com/" class="favicon">Percona</a> | Performance analysis | Percona Monitoring and Management | — | — | [Official website, Mar 2020](https://www.percona.com/blog/2020/03/30/advanced-query-analysis-in-percona-monitoring-and-management-with-direct-clickhouse-access/) | | <a href="https://www.percona.com/" class="favicon">Percona</a> | Performance analysis | Percona Monitoring and Management | — | — | [Official website, Mar 2020](https://www.percona.com/blog/2020/03/30/advanced-query-analysis-in-percona-monitoring-and-management-with-direct-clickhouse-access/) |
| <a href="https://plausible.io/" class="favicon">Plausible</a> | Analytics | Main Product | — | — | [Blog post, June 2020](https://twitter.com/PlausibleHQ/status/1273889629087969280) | | <a href="https://plausible.io/" class="favicon">Plausible</a> | Analytics | Main Product | — | — | [Blog post, June 2020](https://twitter.com/PlausibleHQ/status/1273889629087969280) |


@ -29,6 +29,8 @@ Lets look at the section of the users.xml file that defines quotas.
<!-- Unlimited. Just collect data for the specified time interval. -->
<queries>0</queries>
<query_selects>0</query_selects>
<query_inserts>0</query_inserts>
<errors>0</errors>
<result_rows>0</result_rows>
<read_rows>0</read_rows>
@ -48,6 +50,8 @@ The resource consumption calculated for each interval is output to the server lo
<duration>3600</duration>
<queries>1000</queries>
<query_selects>100</query_selects>
<query_inserts>100</query_inserts>
<errors>100</errors>
<result_rows>1000000000</result_rows>
<read_rows>100000000000</read_rows>
@ -58,6 +62,8 @@ The resource consumption calculated for each interval is output to the server lo
<duration>86400</duration>
<queries>10000</queries>
<query_selects>10000</query_selects>
<query_inserts>10000</query_inserts>
<errors>1000</errors>
<result_rows>5000000000</result_rows>
<read_rows>500000000000</read_rows>
@ -74,6 +80,10 @@ Here are the amounts that can be restricted:
`queries` The total number of requests.
`query_selects` The total number of select requests.
`query_inserts` The total number of insert requests.
`errors` The number of queries that threw an exception.
`result_rows` The total number of rows given as a result.
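
As a hedged illustration of how these counters can be observed at runtime, the sketch below assumes the current user's quota consumption is exposed via `system.quota_usage`, whose `query_selects`/`query_inserts` columns are documented in the system-table pages further down in this changeset:

``` sql
-- Per-interval quota consumption for the current user, including the new
-- select/insert counters (table and column names taken from the docs below).
SELECT duration, queries, query_selects, query_inserts, errors
FROM system.quota_usage;
```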


@ -6,29 +6,62 @@ This table contains information about events that occurred with [data parts](../
The `system.part_log` table contains the following columns:
- `query_id` ([String](../../sql-reference/data-types/string.md)) — Identifier of the `INSERT` query that created this data part.
- `event_type` ([Enum8](../../sql-reference/data-types/enum.md)) — Type of the event that occurred with the data part. Can have one of the following values:
    - `NEW_PART` — Inserting of a new data part.
    - `MERGE_PARTS` — Merging of data parts.
    - `DOWNLOAD_PART` — Downloading a data part.
    - `REMOVE_PART` — Removing or detaching a data part using [DETACH PARTITION](../../sql-reference/statements/alter/partition.md#alter_detach-partition).
    - `MUTATE_PART` — Mutating of a data part.
    - `MOVE_PART` — Moving the data part from one disk to another.
- `event_date` ([Date](../../sql-reference/data-types/date.md)) — Event date.
- `event_time` ([DateTime](../../sql-reference/data-types/datetime.md)) — Event time.
- `duration_ms` ([UInt64](../../sql-reference/data-types/int-uint.md)) — Duration.
- `database` ([String](../../sql-reference/data-types/string.md)) — Name of the database the data part is in.
- `table` ([String](../../sql-reference/data-types/string.md)) — Name of the table the data part is in.
- `part_name` ([String](../../sql-reference/data-types/string.md)) — Name of the data part.
- `partition_id` ([String](../../sql-reference/data-types/string.md)) — ID of the partition that the data part was inserted to. The column takes the `all` value if the partitioning is by `tuple()`.
- `path_on_disk` ([String](../../sql-reference/data-types/string.md)) — Absolute path to the folder with data part files.
- `rows` ([UInt64](../../sql-reference/data-types/int-uint.md)) — The number of rows in the data part.
- `size_in_bytes` ([UInt64](../../sql-reference/data-types/int-uint.md)) — Size of the data part in bytes.
- `merged_from` ([Array(String)](../../sql-reference/data-types/array.md)) — An array of names of the parts which the current part was made up from (after the merge).
- `bytes_uncompressed` ([UInt64](../../sql-reference/data-types/int-uint.md)) — Size of uncompressed bytes.
- `read_rows` ([UInt64](../../sql-reference/data-types/int-uint.md)) — The number of rows read during the merge.
- `read_bytes` ([UInt64](../../sql-reference/data-types/int-uint.md)) — The number of bytes read during the merge.
- `peak_memory_usage` ([Int64](../../sql-reference/data-types/int-uint.md)) — The maximum difference between the amount of allocated and freed memory in the context of this thread.
- `error` ([UInt16](../../sql-reference/data-types/int-uint.md)) — The code number of the occurred error.
- `exception` ([String](../../sql-reference/data-types/string.md)) — Text message of the occurred error.

The `system.part_log` table is created after the first insert into a `MergeTree` table.
**Example**
``` sql
SELECT * FROM system.part_log LIMIT 1 FORMAT Vertical;
```
``` text
Row 1:
──────
query_id: 983ad9c7-28d5-4ae1-844e-603116b7de31
event_type: NewPart
event_date: 2021-02-02
event_time: 2021-02-02 11:14:28
duration_ms: 35
database: default
table: log_mt_2
part_name: all_1_1_0
partition_id: all
path_on_disk: db/data/default/log_mt_2/all_1_1_0/
rows: 115418
size_in_bytes: 1074311
merged_from: []
bytes_uncompressed: 0
read_rows: 0
read_bytes: 0
peak_memory_usage: 0
error: 0
exception:
```
[Original article](https://clickhouse.tech/docs/en/operations/system_tables/part_log) <!--hide-->


@ -9,6 +9,8 @@ Columns:
- `0` — Interval is not randomized.
- `1` — Interval is randomized.
- `max_queries` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) — Maximum number of queries.
- `max_query_selects` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) — Maximum number of select queries.
- `max_query_inserts` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) — Maximum number of insert queries.
- `max_errors` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) — Maximum number of errors.
- `max_result_rows` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) — Maximum number of result rows.
- `max_result_bytes` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) — Maximum RAM volume, in bytes, used to store a query result.


@ -9,6 +9,8 @@ Columns:
- `end_time` ([Nullable](../../sql-reference/data-types/nullable.md)([DateTime](../../sql-reference/data-types/datetime.md))) — End time for calculating resource consumption.
- `duration` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) — Length of the time interval for calculating resource consumption, in seconds.
- `queries` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) — The total number of requests on this interval.
- `query_selects` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) — The total number of select requests on this interval.
- `query_inserts` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) — The total number of insert requests on this interval.
- `max_queries` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) — Maximum number of requests.
- `errors` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) — The number of queries that threw an exception.
- `max_errors` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) — Maximum number of errors.


@ -11,6 +11,10 @@ Columns:
- `duration` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt32](../../sql-reference/data-types/int-uint.md))) — Length of the time interval for calculating resource consumption, in seconds.
- `queries` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) — The total number of requests in this interval.
- `max_queries` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) — Maximum number of requests.
- `query_selects` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) — The total number of select requests in this interval.
- `max_query_selects` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) — Maximum number of select requests.
- `query_inserts` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) — The total number of insert requests in this interval.
- `max_query_inserts` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) — Maximum number of insert requests.
- `errors` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) — The number of queries that threw an exception.
- `max_errors` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) — Maximum number of errors.
- `result_rows` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) — The total number of rows given as a result.


@ -1,12 +1,16 @@
# system.zookeeper {#system-zookeeper}
The table does not exist if ZooKeeper is not configured. Allows reading data from the ZooKeeper cluster defined in the config.
The query must have either a `path =` condition or a `path IN` condition in the `WHERE` clause, as shown below. This corresponds to the path in ZooKeeper of the children whose data you want to get.
The query `SELECT * FROM system.zookeeper WHERE path = '/clickhouse'` outputs data for all children on the `/clickhouse` node.
To output data for all root nodes, write `path = '/'`.
If the path specified in `path` doesn't exist, an exception will be thrown.
The query `SELECT * FROM system.zookeeper WHERE path IN ('/', '/clickhouse')` outputs data for all children on the `/` and `/clickhouse` nodes.
If any path in the specified collection does not exist, an exception will be thrown.
This makes it possible to query a batch of ZooKeeper paths in one request.
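For example, both filter forms can be used directly (a minimal sketch; the selected columns are illustrative):
``` sql
SELECT name, path FROM system.zookeeper WHERE path = '/clickhouse'
```
``` sql
SELECT name, path FROM system.zookeeper WHERE path IN ('/', '/clickhouse')
```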
Columns:
- `name` (String) — The name of the node.

View File

@ -4,13 +4,42 @@ toc_priority: 106
# argMax {#agg-function-argmax}
Calculates the `arg` value for a maximum `val` value. If there are several different values of `arg` for maximum values of `val`, returns the first of these values encountered.
The tuple version of this function returns the tuple with the maximum `val` value. It is convenient for use with [SimpleAggregateFunction](../../../sql-reference/data-types/simpleaggregatefunction.md).
**Syntax**
``` sql
argMax(arg, val)
```
or
``` sql
argMax(tuple(arg, val))
```
**Parameters**
- `arg` — Argument.
- `val` — Value.
**Returned value**
- `arg` value that corresponds to maximum `val` value.
Type: matches `arg` type.
If a tuple is passed:
- Tuple `(arg, val)`, where `val` is the maximum value and `arg` is a corresponding value.
Type: [Tuple](../../../sql-reference/data-types/tuple.md).
**Example**
Input table:
``` text
┌─user─────┬─salary─┐
@ -20,12 +49,18 @@ Tuple version of this function will return the tuple with the maximum `val` valu
└──────────┴────────┘
```
Query:
``` sql
SELECT argMax(user, salary), argMax(tuple(user, salary)) FROM salary;
```
Result:
``` text
┌─argMax(user, salary)─┬─argMax(tuple(user, salary))─┐
│ director │ ('director',5000) │
└──────────────────────┴─────────────────────────────┘
```
[Original article](https://clickhouse.tech/docs/en/sql-reference/aggregate-functions/reference/argmax/) <!--hide-->

View File

@ -4,13 +4,42 @@ toc_priority: 105
# argMin {#agg-function-argmin}
Calculates the `arg` value for a minimum `val` value. If there are several different values of `arg` for minimum values of `val`, returns the first of these values encountered.
The tuple version of this function returns the tuple with the minimum `val` value. It is convenient for use with [SimpleAggregateFunction](../../../sql-reference/data-types/simpleaggregatefunction.md).
**Syntax**
``` sql
argMin(arg, val)
```
or
``` sql
argMin(tuple(arg, val))
```
**Parameters**
- `arg` — Argument.
- `val` — Value.
**Returned value**
- `arg` value that corresponds to minimum `val` value.
Type: matches `arg` type.
If a tuple is passed:
- Tuple `(arg, val)`, where `val` is the minimum value and `arg` is a corresponding value.
Type: [Tuple](../../../sql-reference/data-types/tuple.md).
**Example**
Input table:
``` text
┌─user─────┬─salary─┐
@ -20,12 +49,18 @@ Tuple version of this function will return the tuple with the minimal `val` valu
└──────────┴────────┘
```
Query:
``` sql
SELECT argMin(user, salary), argMin(tuple(user, salary)) FROM salary;
```
Result:
``` text
┌─argMin(user, salary)─┬─argMin(tuple(user, salary))─┐
│ worker │ ('worker',1000) │
└──────────────────────┴─────────────────────────────┘
```
[Original article](https://clickhouse.tech/docs/en/sql-reference/aggregate-functions/reference/argmin/) <!--hide-->

View File

@ -0,0 +1,71 @@
---
toc_priority: 310
toc_title: mannWhitneyUTest
---
# mannWhitneyUTest {#mannwhitneyutest}
Applies the Mann-Whitney rank test to samples from two populations.
**Syntax**
``` sql
mannWhitneyUTest[(alternative[, continuity_correction])](sample_data, sample_index)
```
Values of both samples are in the `sample_data` column. If `sample_index` equals 0, the value in that row belongs to the sample from the first population; otherwise it belongs to the sample from the second population.
The null hypothesis is that the two populations are stochastically equal. One-sided hypotheses can also be tested. This test does not assume that the data are normally distributed.
**Parameters**
- `alternative` — alternative hypothesis. (Optional, default: `'two-sided'`.) [String](../../../sql-reference/data-types/string.md).
- `'two-sided'`;
- `'greater'`;
- `'less'`.
- `continuity_correction` — If not 0, continuity correction in the normal approximation for the p-value is applied. (Optional, default: 1.) [UInt64](../../../sql-reference/data-types/int-uint.md).
- `sample_data` — sample data. [Integer](../../../sql-reference/data-types/int-uint.md), [Float](../../../sql-reference/data-types/float.md) or [Decimal](../../../sql-reference/data-types/decimal.md).
- `sample_index` — sample index. [Integer](../../../sql-reference/data-types/int-uint.md).
**Returned values**
[Tuple](../../../sql-reference/data-types/tuple.md) with two elements:
- calculated U-statistic. [Float64](../../../sql-reference/data-types/float.md).
- calculated p-value. [Float64](../../../sql-reference/data-types/float.md).
**Example**
Input table:
``` text
┌─sample_data─┬─sample_index─┐
│ 10 │ 0 │
│ 11 │ 0 │
│ 12 │ 0 │
│ 1 │ 1 │
│ 2 │ 1 │
│ 3 │ 1 │
└─────────────┴──────────────┘
```
Query:
``` sql
SELECT mannWhitneyUTest('greater')(sample_data, sample_index) FROM mww_ttest;
```
Result:
``` text
┌─mannWhitneyUTest('greater')(sample_data, sample_index)─┐
│ (9,0.04042779918503192) │
└────────────────────────────────────────────────────────┘
```
**See Also**
- [MannWhitney U test](https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test)
- [Stochastic ordering](https://en.wikipedia.org/wiki/Stochastic_ordering)
[Original article](https://clickhouse.tech/docs/en/sql-reference/aggregate-functions/reference/mannwhitneyutest/) <!--hide-->

View File

@ -79,6 +79,40 @@ Result:
└───────────────────────────────────────────────┘
```
# quantilesTimingWeighted {#quantilestimingweighted}
Same as `quantileTimingWeighted`, but accepts multiple quantile levels and returns an array filled with the values of those quantiles.
**Example**
Input table:
``` text
┌─response_time─┬─weight─┐
│ 68 │ 1 │
│ 104 │ 2 │
│ 112 │ 3 │
│ 126 │ 2 │
│ 138 │ 1 │
│ 162 │ 1 │
└───────────────┴────────┘
```
Query:
``` sql
SELECT quantilesTimingWeighted(0.5, 0.99)(response_time, weight) FROM t
```
Result:
``` text
┌─quantilesTimingWeighted(0.5, 0.99)(response_time, weight)─┐
│ [112,162] │
└───────────────────────────────────────────────────────────┘
```
**See Also**
- [median](../../../sql-reference/aggregate-functions/reference/median.md#median)

View File

@ -0,0 +1,65 @@
---
toc_priority: 300
toc_title: studentTTest
---
# studentTTest {#studentttest}
Applies Student's t-test to samples from two populations.
**Syntax**
``` sql
studentTTest(sample_data, sample_index)
```
Values of both samples are in the `sample_data` column. If `sample_index` equals 0, the value in that row belongs to the sample from the first population; otherwise it belongs to the sample from the second population.
The null hypothesis is that the means of the populations are equal. Normal distribution with equal variances is assumed.
**Parameters**
- `sample_data` — sample data. [Integer](../../../sql-reference/data-types/int-uint.md), [Float](../../../sql-reference/data-types/float.md) or [Decimal](../../../sql-reference/data-types/decimal.md).
- `sample_index` — sample index. [Integer](../../../sql-reference/data-types/int-uint.md).
**Returned values**
[Tuple](../../../sql-reference/data-types/tuple.md) with two elements:
- calculated t-statistic. [Float64](../../../sql-reference/data-types/float.md).
- calculated p-value. [Float64](../../../sql-reference/data-types/float.md).
**Example**
Input table:
``` text
┌─sample_data─┬─sample_index─┐
│ 20.3 │ 0 │
│ 21.1 │ 0 │
│ 21.9 │ 1 │
│ 21.7 │ 0 │
│ 19.9 │ 1 │
│ 21.8 │ 1 │
└─────────────┴──────────────┘
```
Query:
``` sql
SELECT studentTTest(sample_data, sample_index) FROM student_ttest;
```
Result:
``` text
┌─studentTTest(sample_data, sample_index)───┐
│ (-0.21739130434783777,0.8385421208415731) │
└───────────────────────────────────────────┘
```
**See Also**
- [Student's t-test](https://en.wikipedia.org/wiki/Student%27s_t-test)
- [welchTTest function](welchttest.md#welchttest)
[Original article](https://clickhouse.tech/docs/en/sql-reference/aggregate-functions/reference/studentttest/) <!--hide-->

View File

@ -0,0 +1,65 @@
---
toc_priority: 301
toc_title: welchTTest
---
# welchTTest {#welchttest}
Applies Welch's t-test to samples from two populations.
**Syntax**
``` sql
welchTTest(sample_data, sample_index)
```
Values of both samples are in the `sample_data` column. If `sample_index` equals 0, the value in that row belongs to the sample from the first population; otherwise it belongs to the sample from the second population.
The null hypothesis is that the means of the populations are equal. Normal distribution is assumed. Populations may have unequal variance.
**Parameters**
- `sample_data` — sample data. [Integer](../../../sql-reference/data-types/int-uint.md), [Float](../../../sql-reference/data-types/float.md) or [Decimal](../../../sql-reference/data-types/decimal.md).
- `sample_index` — sample index. [Integer](../../../sql-reference/data-types/int-uint.md).
**Returned values**
[Tuple](../../../sql-reference/data-types/tuple.md) with two elements:
- calculated t-statistic. [Float64](../../../sql-reference/data-types/float.md).
- calculated p-value. [Float64](../../../sql-reference/data-types/float.md).
**Example**
Input table:
``` text
┌─sample_data─┬─sample_index─┐
│ 20.3 │ 0 │
│ 22.1 │ 0 │
│ 21.9 │ 0 │
│ 18.9 │ 1 │
│ 20.3 │ 1 │
│ 19 │ 1 │
└─────────────┴──────────────┘
```
Query:
``` sql
SELECT welchTTest(sample_data, sample_index) FROM welch_ttest;
```
Result:
``` text
┌─welchTTest(sample_data, sample_index)─────┐
│ (2.7988719532211235,0.051807360348581945) │
└───────────────────────────────────────────┘
```
**See Also**
- [Welch's t-test](https://en.wikipedia.org/wiki/Welch%27s_t-test)
- [studentTTest function](studentttest.md#studentttest)
[Original article](https://clickhouse.tech/docs/en/sql-reference/aggregate-functions/reference/welchTTest/) <!--hide-->

View File

@ -45,6 +45,8 @@ SELECT [1, 2] AS x, toTypeName(x)
## Working with Data Types {#working-with-data-types}
The maximum size of an array is limited to one million elements.
When creating an array on the fly, ClickHouse automatically defines the argument type as the narrowest data type that can store all the listed arguments. If there are any [Nullable](../../sql-reference/data-types/nullable.md#data_type-nullable) or literal [NULL](../../sql-reference/syntax.md#null-literal) values, the type of an array element also becomes [Nullable](../../sql-reference/data-types/nullable.md).
If ClickHouse couldn't determine the data type, it generates an exception. For instance, this happens when trying to create an array with strings and numbers simultaneously (`SELECT array(1, 'a')`).
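For example, the following query (a small illustration of the rule above) yields a `Nullable` element type because of the `NULL` literal:
``` sql
SELECT array(1, 2, NULL) AS x, toTypeName(x)
```
Here `toTypeName(x)` returns `Array(Nullable(UInt8))`.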

View File

@ -1288,73 +1288,226 @@ Returns the index of the first element in the `arr1` array for which `func` retu
Note that the `arrayFirstIndex` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You must pass a lambda function to it as the first argument, and it can't be omitted.
## arrayMin {#array-min}
Returns the minimum of elements in the source array.
If the `func` function is specified, returns the minimum of elements converted by this function.
Note that the `arrayMin` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You can pass a lambda function to it as the first argument.
**Syntax**
```sql
arrayMin([func,] arr)
```
**Parameters**
- `func` — Function. [Expression](../../sql-reference/data-types/special-data-types/expression.md).
- `arr` — Array. [Array](../../sql-reference/data-types/array.md).
**Returned value**
- The minimum of function values (or the array minimum).
Type: if `func` is specified, matches `func` return value type, else matches the array elements type.
**Examples**
Query:
```sql
SELECT arrayMin([1, 2, 4]) AS res;
```
Result:
```text
┌─res─┐
│ 1 │
└─────┘
```
Query:
```sql
SELECT arrayMin(x -> (-x), [1, 2, 4]) AS res;
```
Result:
```text
┌─res─┐
│ -4 │
└─────┘
```
## arrayMax {#array-max}
Returns the maximum of elements in the source array.
If the `func` function is specified, returns the maximum of elements converted by this function.
Note that the `arrayMax` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You can pass a lambda function to it as the first argument.
**Syntax**
```sql
arrayMax([func,] arr)
```
**Parameters**
- `func` — Function. [Expression](../../sql-reference/data-types/special-data-types/expression.md).
- `arr` — Array. [Array](../../sql-reference/data-types/array.md).
**Returned value**
- The maximum of function values (or the array maximum).
Type: if `func` is specified, matches `func` return value type, else matches the array elements type.
**Examples**
Query:
```sql
SELECT arrayMax([1, 2, 4]) AS res;
```
Result:
```text
┌─res─┐
│ 4 │
└─────┘
```
Query:
```sql
SELECT arrayMax(x -> (-x), [1, 2, 4]) AS res;
```
Result:
```text
┌─res─┐
│ -1 │
└─────┘
```
## arraySum {#array-sum}
Returns the sum of elements in the source array.
If the `func` function is specified, returns the sum of elements converted by this function.
Note that the `arraySum` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You can pass a lambda function to it as the first argument.
**Syntax**
```sql
arraySum([func,] arr)
```
**Parameters**
- `func` — Function. [Expression](../../sql-reference/data-types/special-data-types/expression.md).
- `arr` — Array. [Array](../../sql-reference/data-types/array.md).
**Returned value**
- The sum of the function values (or the array sum).
Type: for decimal numbers in source array (or for converted values, if `func` is specified) — [Decimal128](../../sql-reference/data-types/decimal.md), for floating point numbers — [Float64](../../sql-reference/data-types/float.md), for numeric unsigned — [UInt64](../../sql-reference/data-types/int-uint.md), and for numeric signed — [Int64](../../sql-reference/data-types/int-uint.md).
**Examples**
Query:
```sql
SELECT arraySum([2, 3]) AS res;
```
Result:
```text
┌─res─┐
│ 5 │
└─────┘
```
Query:
```sql
SELECT arraySum(x -> x*x, [2, 3]) AS res;
```
Result:
```text
┌─res─┐
│ 13 │
└─────┘
```
## arrayAvg {#array-avg}
Returns the average of elements in the source array.
If the `func` function is specified, returns the average of elements converted by this function.
Note that the `arrayAvg` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You can pass a lambda function to it as the first argument.
**Syntax**
```sql
arrayAvg([func,] arr)
```
**Parameters**
- `func` — Function. [Expression](../../sql-reference/data-types/special-data-types/expression.md).
- `arr` — Array. [Array](../../sql-reference/data-types/array.md).
**Returned value**
- The average of function values (or the array average).
Type: [Float64](../../sql-reference/data-types/float.md).
**Examples**
Query:
```sql
SELECT arrayAvg([1, 2, 4]) AS res;
```
Result:
```text
┌────────────────res─┐
│ 2.3333333333333335 │
└────────────────────┘
```
Query:
```sql
SELECT arrayAvg(x -> (x * x), [2, 4]) AS res;
```
Result:
```text
┌─res─┐
│ 10 │
└─────┘
```
## arrayCumSum(\[func,\] arr1, …) {#arraycumsumfunc-arr1}
Returns an array of partial sums of elements in the source array (a running sum). If the `func` function is specified, then the values of the array elements are converted by this function before summing.
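A minimal illustration of the running sum, consistent with the definition above:
```sql
SELECT arrayCumSum([1, 1, 1, 1]) AS res
```
The result is `[1, 2, 3, 4]`.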

View File

@ -380,7 +380,7 @@ Alias: `dateTrunc`.
**Parameters**
- `unit` — The type of interval to truncate the result. [String Literal](../syntax.md#syntax-string-literal).
Possible values:
- `second`
@ -435,6 +435,201 @@ Result:
- [toStartOfInterval](#tostartofintervaltime-or-data-interval-x-unit-time-zone) - [toStartOfInterval](#tostartofintervaltime-or-data-interval-x-unit-time-zone)
## date\_add {#date_add}
Adds the specified date/time interval to the provided date.
**Syntax**
``` sql
date_add(unit, value, date)
```
Aliases: `dateAdd`, `DATE_ADD`.
**Parameters**
- `unit` — The type of interval to add. [String](../../sql-reference/data-types/string.md).
Supported values: second, minute, hour, day, week, month, quarter, year.
- `value` — Amount of the interval to add, in the specified `unit`. [Int](../../sql-reference/data-types/int-uint.md).
- `date` — [Date](../../sql-reference/data-types/date.md) or [DateTime](../../sql-reference/data-types/datetime.md).
**Returned value**
Returns Date or DateTime with `value` expressed in `unit` added to `date`.
**Example**
```sql
select date_add(YEAR, 3, toDate('2018-01-01'));
```
```text
┌─plus(toDate('2018-01-01'), toIntervalYear(3))─┐
│ 2021-01-01 │
└───────────────────────────────────────────────┘
```
## date\_diff {#date_diff}
Returns the difference between two Date or DateTime values.
**Syntax**
``` sql
date_diff('unit', startdate, enddate, [timezone])
```
Aliases: `dateDiff`, `DATE_DIFF`.
**Parameters**
- `unit` — The type of interval for the result. [String](../../sql-reference/data-types/string.md).
Supported values: second, minute, hour, day, week, month, quarter, year.
- `startdate` — The first time value to subtract (the subtrahend). [Date](../../sql-reference/data-types/date.md) or [DateTime](../../sql-reference/data-types/datetime.md).
- `enddate` — The second time value to subtract from (the minuend). [Date](../../sql-reference/data-types/date.md) or [DateTime](../../sql-reference/data-types/datetime.md).
- `timezone` — Optional parameter. If specified, it is applied to both `startdate` and `enddate`. If not specified, timezones of `startdate` and `enddate` are used. If they are not the same, the result is unspecified.
**Returned value**
Difference between `enddate` and `startdate` expressed in `unit`.
Type: `int`.
**Example**
Query:
``` sql
SELECT dateDiff('hour', toDateTime('2018-01-01 22:00:00'), toDateTime('2018-01-02 23:00:00'));
```
Result:
``` text
┌─dateDiff('hour', toDateTime('2018-01-01 22:00:00'), toDateTime('2018-01-02 23:00:00'))─┐
│ 25 │
└────────────────────────────────────────────────────────────────────────────────────────┘
```
## date\_sub {#date_sub}
Subtracts a time/date interval from the provided date.
**Syntax**
``` sql
date_sub(unit, value, date)
```
Aliases: `dateSub`, `DATE_SUB`.
**Parameters**
- `unit` — The type of interval to subtract. [String](../../sql-reference/data-types/string.md).
Supported values: second, minute, hour, day, week, month, quarter, year.
- `value` — Amount of the interval to subtract, in the specified `unit`. [Int](../../sql-reference/data-types/int-uint.md).
- `date` — [Date](../../sql-reference/data-types/date.md) or [DateTime](../../sql-reference/data-types/datetime.md) to subtract value from.
**Returned value**
Returns Date or DateTime with `value` expressed in `unit` subtracted from `date`.
**Example**
Query:
``` sql
SELECT date_sub(YEAR, 3, toDate('2018-01-01'));
```
Result:
``` text
┌─minus(toDate('2018-01-01'), toIntervalYear(3))─┐
│ 2015-01-01 │
└────────────────────────────────────────────────┘
```
## timestamp\_add {#timestamp_add}
Adds the specified time interval to the provided date or date-with-time value.
**Syntax**
``` sql
timestamp_add(date, INTERVAL value unit)
```
Aliases: `timeStampAdd`, `TIMESTAMP_ADD`.
**Parameters**
- `date` — Date or Date with time - [Date](../../sql-reference/data-types/date.md) or [DateTime](../../sql-reference/data-types/datetime.md).
- `value` — Amount of the interval to add, in the specified `unit`. [Int](../../sql-reference/data-types/int-uint.md).
- `unit` — The type of interval to add. [String](../../sql-reference/data-types/string.md).
Supported values: second, minute, hour, day, week, month, quarter, year.
**Returned value**
Returns Date or DateTime with the specified `value` expressed in `unit` added to `date`.
**Example**
```sql
select timestamp_add(toDate('2018-01-01'), INTERVAL 3 MONTH);
```
```text
┌─plus(toDate('2018-01-01'), toIntervalMonth(3))─┐
│ 2018-04-01 │
└────────────────────────────────────────────────┘
```
## timestamp\_sub {#timestamp_sub}
Subtracts the specified time interval from the provided date or date-with-time value.
**Syntax**
``` sql
timestamp_sub(unit, value, date)
```
Aliases: `timeStampSub`, `TIMESTAMP_SUB`.
**Parameters**
- `unit` — The type of interval to subtract. [String](../../sql-reference/data-types/string.md).
Supported values: second, minute, hour, day, week, month, quarter, year.
- `value` — Amount of the interval to subtract, in the specified `unit`. [Int](../../sql-reference/data-types/int-uint.md).
- `date` — [Date](../../sql-reference/data-types/date.md) or [DateTime](../../sql-reference/data-types/datetime.md) to subtract the interval from.
**Returned value**
Returns Date or DateTime with the specified `value` expressed in `unit` subtracted from `date`.
**Example**
```sql
select timestamp_sub(MONTH, 5, toDateTime('2018-12-18 01:02:03'));
```
```text
┌─minus(toDateTime('2018-12-18 01:02:03'), toIntervalMonth(5))─┐
│ 2018-07-18 01:02:03 │
└──────────────────────────────────────────────────────────────┘
```
## now {#now}
Returns the current date and time.
@ -550,50 +745,6 @@ SELECT
└──────────────────────────┴───────────────────────────────┘
```
## dateDiff {#datediff}
Returns the difference between two Date or DateTime values.
**Syntax**
``` sql
dateDiff('unit', startdate, enddate, [timezone])
```
**Parameters**
- `unit` — Time unit, in which the returned value is expressed. [String](../../sql-reference/syntax.md#syntax-string-literal).
Supported values: second, minute, hour, day, week, month, quarter, year.
- `startdate` — The first time value to compare. [Date](../../sql-reference/data-types/date.md) or [DateTime](../../sql-reference/data-types/datetime.md).
- `enddate` — The second time value to compare. [Date](../../sql-reference/data-types/date.md) or [DateTime](../../sql-reference/data-types/datetime.md).
- `timezone` — Optional parameter. If specified, it is applied to both `startdate` and `enddate`. If not specified, timezones of `startdate` and `enddate` are used. If they are not the same, the result is unspecified.
**Returned value**
Difference between `startdate` and `enddate` expressed in `unit`.
Type: `int`.
**Example**
Query:
``` sql
SELECT dateDiff('hour', toDateTime('2018-01-01 22:00:00'), toDateTime('2018-01-02 23:00:00'));
```
Result:
``` text
┌─dateDiff('hour', toDateTime('2018-01-01 22:00:00'), toDateTime('2018-01-02 23:00:00'))─┐
│ 25 │
└────────────────────────────────────────────────────────────────────────────────────────┘
```
## timeSlots(StartTime, Duration,\[, Size\]) {#timeslotsstarttime-duration-size}
For a time interval starting at StartTime and continuing for Duration seconds, it returns an array of moments in time, consisting of points from this interval rounded down to the Size in seconds. Size is an optional parameter: a constant UInt32, set to 1800 by default.

View File

@ -5,7 +5,7 @@ toc_title: QUOTA
# ALTER QUOTA {#alter-quota-statement}
Changes quotas.
Syntax:
@ -14,13 +14,13 @@ ALTER QUOTA [IF EXISTS] name [ON CLUSTER cluster_name]
[RENAME TO new_name]
[KEYED BY {user_name | ip_address | client_key | client_key,user_name | client_key,ip_address} | NOT KEYED]
[FOR [RANDOMIZED] INTERVAL number {second | minute | hour | day | week | month | quarter | year}
{MAX { {queries | query_selects | query_inserts | errors | result_rows | result_bytes | read_rows | read_bytes | execution_time} = number } [,...] |
NO LIMITS | TRACKING ONLY} [,...]]
[TO {role [,...] | ALL | ALL EXCEPT role [,...]}]
```
Keys `user_name`, `ip_address`, `client_key`, `client_key, user_name` and `client_key, ip_address` correspond to the fields in the [system.quotas](../../../operations/system-tables/quotas.md) table.
Parameters `queries`, `query_selects`, `query_inserts`, `errors`, `result_rows`, `result_bytes`, `read_rows`, `read_bytes`, `execution_time` correspond to the fields in the [system.quotas_usage](../../../operations/system-tables/quotas_usage.md) table.
`ON CLUSTER` clause allows creating quotas on a cluster, see [Distributed DDL](../../../sql-reference/distributed-ddl.md).
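For example, the new `query_selects` and `query_inserts` limits can be set like this (a sketch following the grammar above; the quota name and numbers are illustrative):
``` sql
ALTER QUOTA IF EXISTS qA FOR INTERVAL 15 month MAX query_selects = 321, MAX query_inserts = 123 TO CURRENT_USER;
```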

View File

@ -13,14 +13,14 @@ Syntax:
CREATE QUOTA [IF NOT EXISTS | OR REPLACE] name [ON CLUSTER cluster_name]
[KEYED BY {user_name | ip_address | client_key | client_key,user_name | client_key,ip_address} | NOT KEYED]
[FOR [RANDOMIZED] INTERVAL number {second | minute | hour | day | week | month | quarter | year}
{MAX { {queries | query_selects | query_inserts | errors | result_rows | result_bytes | read_rows | read_bytes | execution_time} = number } [,...] |
NO LIMITS | TRACKING ONLY} [,...]]
[TO {role [,...] | ALL | ALL EXCEPT role [,...]}]
```
Keys `user_name`, `ip_address`, `client_key`, `client_key, user_name` and `client_key, ip_address` correspond to the fields in the [system.quotas](../../../operations/system-tables/quotas.md) table.
Parameters `queries`, `query_selects`, `query_inserts`, `errors`, `result_rows`, `result_bytes`, `read_rows`, `read_bytes`, `execution_time` correspond to the fields in the [system.quotas_usage](../../../operations/system-tables/quotas_usage.md) table.
`ON CLUSTER` clause allows creating quotas on a cluster, see [Distributed DDL](../../../sql-reference/distributed-ddl.md).
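For example, a quota limiting the new counters can be created like this (a sketch following the grammar above; the quota name and limits are illustrative):
``` sql
CREATE QUOTA IF NOT EXISTS qB FOR INTERVAL 30 minute MAX query_selects = 100, MAX query_inserts = 10 TO default;
```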

View File

@ -1,9 +1,14 @@
---
toc_priority: 62
toc_title: Window Functions
---
# [experimental] Window Functions
!!! warning "Warning" !!! warning "Warning"
This is an experimental feature that is currently in development and is not ready This is an experimental feature that is currently in development and is not ready
for general use. It will change in unpredictable backwards-incompatible ways in for general use. It will change in unpredictable backwards-incompatible ways in
the future releases. the future releases. Set `allow_experimental_window_functions = 1` to enable it.
ClickHouse currently supports calculation of aggregate functions over a window. ClickHouse currently supports calculation of aggregate functions over a window.
Pure window functions such as `rank`, `lag`, `lead` and so on are not yet supported.
@ -11,9 +16,7 @@ Pure window functions such as `rank`, `lag`, `lead` and so on are not yet suppor
The window can be specified either with an `OVER` clause or with a separate
`WINDOW` clause.
Only two variants of frame are supported, `ROWS` and `RANGE`. Offsets for the `RANGE` frame are not yet supported.
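As a quick illustration (a sketch; it assumes the experimental setting mentioned above is enabled in the session and uses the `numbers` table function):
```sql
SET allow_experimental_window_functions = 1;

SELECT
    number,
    sum(number) OVER (PARTITION BY number % 2 ORDER BY number ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum
FROM numbers(6);
```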
## References
@ -28,6 +31,7 @@ https://github.com/ClickHouse/ClickHouse/blob/master/tests/performance/window_fu
https://github.com/ClickHouse/ClickHouse/blob/master/tests/queries/0_stateless/01591_window_functions.sql
### Postgres Docs
https://www.postgresql.org/docs/current/sql-select.html#SQL-WINDOW
https://www.postgresql.org/docs/devel/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS
https://www.postgresql.org/docs/devel/functions-window.html
https://www.postgresql.org/docs/devel/tutorial-window.html

View File

@ -714,6 +714,7 @@ auto s = std::string{"Hello"};
### Пользовательская ошибка {#error-messages-user-error} ### Пользовательская ошибка {#error-messages-user-error}
Такая ошибка вызвана действиями пользователя (неверный синтаксис запроса) или конфигурацией внешних систем (кончилось место на диске). Предполагается, что пользователь может устранить её самостоятельно. Для этого в сообщении об ошибке должна содержаться следующая информация: Такая ошибка вызвана действиями пользователя (неверный синтаксис запроса) или конфигурацией внешних систем (кончилось место на диске). Предполагается, что пользователь может устранить её самостоятельно. Для этого в сообщении об ошибке должна содержаться следующая информация:
* что произошло. Это должно объясняться в пользовательских терминах (`Function pow() is not supported for data type UInt128`), а не загадочными конструкциями из кода (`runtime overload resolution failed in DB::BinaryOperationBuilder<FunctionAdaptor<pow>::Impl, UInt128, Int8>::kaboongleFastPath()`). * что произошло. Это должно объясняться в пользовательских терминах (`Function pow() is not supported for data type UInt128`), а не загадочными конструкциями из кода (`runtime overload resolution failed in DB::BinaryOperationBuilder<FunctionAdaptor<pow>::Impl, UInt128, Int8>::kaboongleFastPath()`).
* почему/где/когда -- любой контекст, который помогает отладить проблему. Представьте, как бы её отлаживали вы (программировать и пользоваться отладчиком нельзя). * почему/где/когда -- любой контекст, который помогает отладить проблему. Представьте, как бы её отлаживали вы (программировать и пользоваться отладчиком нельзя).
* что можно предпринять для устранения ошибки. Здесь можно перечислить типичные причины проблемы, настройки, влияющие на это поведение, и так далее. * что можно предпринять для устранения ошибки. Здесь можно перечислить типичные причины проблемы, настройки, влияющие на это поведение, и так далее.

View File

@ -6,29 +6,62 @@
Столбцы:
- `query_id` ([String](../../sql-reference/data-types/string.md)) — идентификатор запроса `INSERT`, создавшего этот кусок.
- `event_type` ([Enum8](../../sql-reference/data-types/enum.md)) — тип события. Столбец может содержать одно из следующих значений:
- `NEW_PART` — вставка нового куска.
- `MERGE_PARTS` — слияние кусков.
- `DOWNLOAD_PART` — загрузка с реплики.
- `REMOVE_PART` — удаление или отсоединение из таблицы с помощью [DETACH PARTITION](../../sql-reference/statements/alter/partition.md#alter_detach-partition).
- `MUTATE_PART` — изменение куска.
- `MOVE_PART` — перемещение куска между дисками.
- `event_date` ([Date](../../sql-reference/data-types/date.md)) — дата события.
- `event_time` ([DateTime](../../sql-reference/data-types/datetime.md)) — время события.
- `duration_ms` ([UInt64](../../sql-reference/data-types/int-uint.md)) — длительность.
- `database` ([String](../../sql-reference/data-types/string.md)) — имя базы данных, в которой находится кусок.
- `table` ([String](../../sql-reference/data-types/string.md)) — имя таблицы, в которой находится кусок.
- `part_name` ([String](../../sql-reference/data-types/string.md)) — имя куска.
- `partition_id` ([String](../../sql-reference/data-types/string.md)) — идентификатор партиции, в которую был добавлен кусок. В столбце будет значение `all`, если таблица партициируется по выражению `tuple()`.
- `path_on_disk` ([String](../../sql-reference/data-types/string.md)) — абсолютный путь к папке с файлами кусков данных.
- `rows` ([UInt64](../../sql-reference/data-types/int-uint.md)) — число строк в куске.
- `size_in_bytes` ([UInt64](../../sql-reference/data-types/int-uint.md)) — размер куска данных в байтах.
- `merged_from` ([Array(String)](../../sql-reference/data-types/array.md)) — массив имён кусков, из которых образован текущий кусок в результате слияния (также столбец заполняется в случае скачивания уже смерженного куска).
- `bytes_uncompressed` ([UInt64](../../sql-reference/data-types/int-uint.md)) — количество прочитанных не сжатых байт.
- `read_rows` ([UInt64](../../sql-reference/data-types/int-uint.md)) — сколько было прочитано строк при слиянии кусков.
- `read_bytes` ([UInt64](../../sql-reference/data-types/int-uint.md)) — сколько было прочитано байт при слиянии кусков.
- `peak_memory_usage` ([Int64](../../sql-reference/data-types/int-uint.md)) — максимальная разница между выделенной и освобождённой памятью в контексте потока.
- `error` ([UInt16](../../sql-reference/data-types/int-uint.md)) — код ошибки, возникшей при текущем событии.
- `exception` ([String](../../sql-reference/data-types/string.md)) — текст ошибки.
Системная таблица `system.part_log` будет создана после первой вставки данных в таблицу `MergeTree`.
**Пример**
``` sql
SELECT * FROM system.part_log LIMIT 1 FORMAT Vertical;
```
``` text
Row 1:
──────
query_id: 983ad9c7-28d5-4ae1-844e-603116b7de31
event_type: NewPart
event_date: 2021-02-02
event_time: 2021-02-02 11:14:28
duration_ms: 35
database: default
table: log_mt_2
part_name: all_1_1_0
partition_id: all
path_on_disk: db/data/default/log_mt_2/all_1_1_0/
rows: 115418
size_in_bytes: 1074311
merged_from: []
bytes_uncompressed: 0
read_rows: 0
read_bytes: 0
peak_memory_usage: 0
error: 0
exception:
```
[Оригинальная статья](https://clickhouse.tech/docs/ru/operations/system_tables/part_log) <!--hide-->

View File

@ -4,8 +4,63 @@ toc_priority: 106
# argMax {#agg-function-argmax}
Вычисляет значение `arg` при максимальном значении `val`. Если есть несколько разных значений `arg` для максимальных значений `val`, возвращает первое попавшееся из таких значений.
Если функции передан кортеж, то будет выведен кортеж с максимальным значением `val`. Удобно использовать для работы с [SimpleAggregateFunction](../../../sql-reference/data-types/simpleaggregatefunction.md).
**Синтаксис**
``` sql
argMax(arg, val)
```
или
``` sql
argMax(tuple(arg, val))
```
**Параметры**
- `arg` — аргумент.
- `val` — значение.
**Возвращаемое значение**
- Значение `arg`, соответствующее максимальному значению `val`.
Тип: соответствует типу `arg`.
Если передан кортеж:
- Кортеж `(arg, val)` c максимальным значением `val` и соответствующим ему `arg`.
Тип: [Tuple](../../../sql-reference/data-types/tuple.md).
**Пример**
Исходная таблица:
``` text
┌─user─────┬─salary─┐
│ director │ 5000 │
│ manager │ 3000 │
│ worker │ 1000 │
└──────────┴────────┘
```
Запрос:
``` sql
SELECT argMax(user, salary), argMax(tuple(user, salary)) FROM salary;
```
Результат:
``` text
┌─argMax(user, salary)─┬─argMax(tuple(user, salary))─┐
│ director │ ('director',5000) │
└──────────────────────┴─────────────────────────────┘
```
[Оригинальная статья](https://clickhouse.tech/docs/ru/sql-reference/aggregate-functions/reference/argmax/) <!--hide-->

View File

@ -4,11 +4,42 @@ toc_priority: 105
# argMin {#agg-function-argmin}
Вычисляет значение `arg` при минимальном значении `val`. Если есть несколько разных значений `arg` для минимальных значений `val`, возвращает первое попавшееся из таких значений.
Если функции передан кортеж, то будет выведен кортеж с минимальным значением `val`. Удобно использовать для работы с [SimpleAggregateFunction](../../../sql-reference/data-types/simpleaggregatefunction.md).
**Синтаксис**
``` sql
argMin(arg, val)
```
или
``` sql
argMin(tuple(arg, val))
```
**Параметры**
- `arg` — аргумент.
- `val` — значение.
**Возвращаемое значение**
- Значение `arg`, соответствующее минимальному значению `val`.
Тип: соответствует типу `arg`.
Если передан кортеж:
- Кортеж `(arg, val)` c минимальным значением `val` и соответствующим ему `arg`.
Тип: [Tuple](../../../sql-reference/data-types/tuple.md).
**Пример**
Исходная таблица:
``` text
┌─user─────┬─salary─┐
@ -18,14 +49,18 @@ toc_priority: 105
└──────────┴────────┘
```
Запрос:
``` sql
SELECT argMin(user, salary), argMin(tuple(user, salary)) FROM salary;
```
Результат:
``` text
┌─argMin(user, salary)─┬─argMin(tuple(user, salary))─┐
│ worker │ ('worker',1000) │
└──────────────────────┴─────────────────────────────┘
```
[Оригинальная статья](https://clickhouse.tech/docs/ru/sql-reference/aggregate-functions/reference/argmin/) <!--hide-->

View File

@ -0,0 +1,71 @@
---
toc_priority: 310
toc_title: mannWhitneyUTest
---
# mannWhitneyUTest {#mannwhitneyutest}
Вычисляет U-критерий Манна — Уитни для выборок из двух генеральных совокупностей.
**Синтаксис**
``` sql
mannWhitneyUTest[(alternative[, continuity_correction])](sample_data, sample_index)
```
Значения выборок берутся из столбца `sample_data`. Если `sample_index` равно 0, то значение из этой строки принадлежит первой выборке. Во всех остальных случаях значение принадлежит второй выборке.
Проверяется нулевая гипотеза, что генеральные совокупности стохастически равны. Наряду с двусторонней гипотезой могут быть проверены и односторонние.
Для применения U-критерия Манна — Уитни закон распределения генеральных совокупностей не обязан быть нормальным.
**Параметры**
- `alternative` — альтернативная гипотеза. (Необязательный параметр, по умолчанию: `'two-sided'`.) [String](../../../sql-reference/data-types/string.md).
- `'two-sided'`;
- `'greater'`;
- `'less'`.
- `continuity_correction` - если не 0, то при вычислении p-значения применяется коррекция непрерывности. (Необязательный параметр, по умолчанию: 1.) [UInt64](../../../sql-reference/data-types/int-uint.md).
- `sample_data` — данные выборок. [Integer](../../../sql-reference/data-types/int-uint.md), [Float](../../../sql-reference/data-types/float.md) or [Decimal](../../../sql-reference/data-types/decimal.md).
- `sample_index` — индексы выборок. [Integer](../../../sql-reference/data-types/int-uint.md).
**Возвращаемые значения**
[Кортеж](../../../sql-reference/data-types/tuple.md) с двумя элементами:
- вычисленное значение критерия Манна — Уитни. [Float64](../../../sql-reference/data-types/float.md).
- вычисленное p-значение. [Float64](../../../sql-reference/data-types/float.md).
**Пример**
Таблица:
``` text
┌─sample_data─┬─sample_index─┐
│ 10 │ 0 │
│ 11 │ 0 │
│ 12 │ 0 │
│ 1 │ 1 │
│ 2 │ 1 │
│ 3 │ 1 │
└─────────────┴──────────────┘
```
Запрос:
``` sql
SELECT mannWhitneyUTest('greater')(sample_data, sample_index) FROM mww_ttest;
```
Результат:
``` text
┌─mannWhitneyUTest('greater')(sample_data, sample_index)─┐
│ (9,0.04042779918503192) │
└────────────────────────────────────────────────────────┘
```
**Смотрите также**
- [U-критерий Манна — Уитни](https://ru.wikipedia.org/wiki/U-%D0%BA%D1%80%D0%B8%D1%82%D0%B5%D1%80%D0%B8%D0%B9_%D0%9C%D0%B0%D0%BD%D0%BD%D0%B0_%E2%80%94_%D0%A3%D0%B8%D1%82%D0%BD%D0%B8)
[Оригинальная статья](https://clickhouse.tech/docs/ru/sql-reference/aggregate-functions/reference/mannwhitneyutest/) <!--hide-->

View File

@ -0,0 +1,65 @@
---
toc_priority: 300
toc_title: studentTTest
---
# studentTTest {#studentttest}
Вычисляет t-критерий Стьюдента для выборок из двух генеральных совокупностей.
**Синтаксис**
``` sql
studentTTest(sample_data, sample_index)
```
Значения выборок берутся из столбца `sample_data`. Если `sample_index` равно 0, то значение из этой строки принадлежит первой выборке. Во всех остальных случаях значение принадлежит второй выборке.
Проверяется нулевая гипотеза, что средние значения генеральных совокупностей совпадают. Для применения t-критерия Стьюдента распределение в генеральных совокупностях должно быть нормальным и дисперсии должны совпадать.
**Параметры**
- `sample_data` — данные выборок. [Integer](../../../sql-reference/data-types/int-uint.md), [Float](../../../sql-reference/data-types/float.md) or [Decimal](../../../sql-reference/data-types/decimal.md).
- `sample_index` — индексы выборок. [Integer](../../../sql-reference/data-types/int-uint.md).
**Возвращаемые значения**
[Кортеж](../../../sql-reference/data-types/tuple.md) с двумя элементами:
- вычисленное значение критерия Стьюдента. [Float64](../../../sql-reference/data-types/float.md).
- вычисленное p-значение. [Float64](../../../sql-reference/data-types/float.md).
**Пример**
Таблица:
``` text
┌─sample_data─┬─sample_index─┐
│ 20.3 │ 0 │
│ 21.1 │ 0 │
│ 21.9 │ 1 │
│ 21.7 │ 0 │
│ 19.9 │ 1 │
│ 21.8 │ 1 │
└─────────────┴──────────────┘
```
Запрос:
``` sql
SELECT studentTTest(sample_data, sample_index) FROM student_ttest;
```
Результат:
``` text
┌─studentTTest(sample_data, sample_index)───┐
│ (-0.21739130434783777,0.8385421208415731) │
└───────────────────────────────────────────┘
```
**Смотрите также**
- [t-критерий Стьюдента](https://ru.wikipedia.org/wiki/T-%D0%BA%D1%80%D0%B8%D1%82%D0%B5%D1%80%D0%B8%D0%B9_%D0%A1%D1%82%D1%8C%D1%8E%D0%B4%D0%B5%D0%BD%D1%82%D0%B0)
- [welchTTest](welchttest.md#welchttest)
[Оригинальная статья](https://clickhouse.tech/docs/ru/sql-reference/aggregate-functions/reference/studentttest/) <!--hide-->

View File

@ -0,0 +1,65 @@
---
toc_priority: 301
toc_title: welchTTest
---
# welchTTest {#welchttest}
Вычисляет t-критерий Уэлча для выборок из двух генеральных совокупностей.
**Синтаксис**
``` sql
welchTTest(sample_data, sample_index)
```
Значения выборок берутся из столбца `sample_data`. Если `sample_index` равно 0, то значение из этой строки принадлежит первой выборке. Во всех остальных случаях значение принадлежит второй выборке.
Проверяется нулевая гипотеза, что средние значения генеральных совокупностей совпадают. Для применения t-критерия Уэлча распределение в генеральных совокупностях должно быть нормальным. Дисперсии могут не совпадать.
**Параметры**
- `sample_data` — данные выборок. [Integer](../../../sql-reference/data-types/int-uint.md), [Float](../../../sql-reference/data-types/float.md) or [Decimal](../../../sql-reference/data-types/decimal.md).
- `sample_index` — индексы выборок. [Integer](../../../sql-reference/data-types/int-uint.md).
**Возвращаемые значения**
[Кортеж](../../../sql-reference/data-types/tuple.md) с двумя элементами:
- вычисленное значение критерия Уэлча. [Float64](../../../sql-reference/data-types/float.md).
- вычисленное p-значение. [Float64](../../../sql-reference/data-types/float.md).
**Пример**
Таблица:
``` text
┌─sample_data─┬─sample_index─┐
│ 20.3 │ 0 │
│ 22.1 │ 0 │
│ 21.9 │ 0 │
│ 18.9 │ 1 │
│ 20.3 │ 1 │
│ 19 │ 1 │
└─────────────┴──────────────┘
```
Запрос:
``` sql
SELECT welchTTest(sample_data, sample_index) FROM welch_ttest;
```
Результат:
``` text
┌─welchTTest(sample_data, sample_index)─────┐
│ (2.7988719532211235,0.051807360348581945) │
└───────────────────────────────────────────┘
```
**Смотрите также**
- [t-критерий Уэлча](https://ru.wikipedia.org/wiki/T-%D0%BA%D1%80%D0%B8%D1%82%D0%B5%D1%80%D0%B8%D0%B9_%D0%A3%D1%8D%D0%BB%D1%87%D0%B0)
- [studentTTest](studentttest.md#studentttest)
[Оригинальная статья](https://clickhouse.tech/docs/ru/sql-reference/aggregate-functions/reference/welchTTest/) <!--hide-->

View File

@ -47,6 +47,8 @@ SELECT [1, 2] AS x, toTypeName(x)
## Особенности работы с типами данных {#osobennosti-raboty-s-tipami-dannykh} ## Особенности работы с типами данных {#osobennosti-raboty-s-tipami-dannykh}
Максимальный размер массива ограничен одним миллионом элементов.
При создании массива «на лету» ClickHouse автоматически определяет тип аргументов как наиболее узкий тип данных, в котором можно хранить все перечисленные аргументы. Если среди аргументов есть [NULL](../../sql-reference/data-types/array.md#null-literal) или аргумент типа [Nullable](nullable.md#data_type-nullable), то тип элементов массива — [Nullable](nullable.md). При создании массива «на лету» ClickHouse автоматически определяет тип аргументов как наиболее узкий тип данных, в котором можно хранить все перечисленные аргументы. Если среди аргументов есть [NULL](../../sql-reference/data-types/array.md#null-literal) или аргумент типа [Nullable](nullable.md#data_type-nullable), то тип элементов массива — [Nullable](nullable.md).
Если ClickHouse не смог подобрать тип данных, то он сгенерирует исключение. Это произойдёт, например, при попытке создать массив одновременно со строками и числами `SELECT array(1, 'a')`. Если ClickHouse не смог подобрать тип данных, то он сгенерирует исключение. Это произойдёт, например, при попытке создать массив одновременно со строками и числами `SELECT array(1, 'a')`.

View File

@ -1135,11 +1135,225 @@ SELECT
Функция `arrayFirstIndex` является [функцией высшего порядка](../../sql-reference/functions/index.md#higher-order-functions) — в качестве первого аргумента ей нужно передать лямбда-функцию, и этот аргумент не может быть опущен.
## arrayMin {#array-min}
Возвращает значение минимального элемента в исходном массиве.
Если передана функция `func`, возвращается минимум из элементов массива, преобразованных этой функцией.
Функция `arrayMin` является [функцией высшего порядка](../../sql-reference/functions/index.md#higher-order-functions) — в качестве первого аргумента ей можно передать лямбда-функцию.
**Синтаксис**
```sql
arrayMin([func,] arr)
```
**Параметры**
- `func` — функция. [Expression](../../sql-reference/data-types/special-data-types/expression.md).
- `arr` — массив. [Array](../../sql-reference/data-types/array.md).
**Возвращаемое значение**
- Минимальное значение функции (или минимальный элемент массива).
Тип: если передана `func`, соответствует типу ее возвращаемого значения, иначе соответствует типу элементов массива.
**Примеры**
Запрос:
```sql
SELECT arrayMin([1, 2, 4]) AS res;
```
Результат:
```text
┌─res─┐
│ 1 │
└─────┘
```
Запрос:
```sql
SELECT arrayMin(x -> (-x), [1, 2, 4]) AS res;
```
Результат:
```text
┌─res─┐
│ -4 │
└─────┘
```
## arrayMax {#array-max}
Возвращает значение максимального элемента в исходном массиве.
Если передана функция `func`, возвращается максимум из элементов массива, преобразованных этой функцией.
Функция `arrayMax` является [функцией высшего порядка](../../sql-reference/functions/index.md#higher-order-functions) — в качестве первого аргумента ей можно передать лямбда-функцию.
**Синтаксис**
```sql
arrayMax([func,] arr)
```
**Параметры**
- `func` — функция. [Expression](../../sql-reference/data-types/special-data-types/expression.md).
- `arr` — массив. [Array](../../sql-reference/data-types/array.md).
**Возвращаемое значение**
- Максимальное значение функции (или максимальный элемент массива).
Тип: если передана `func`, соответствует типу ее возвращаемого значения, иначе соответствует типу элементов массива.
**Примеры**
Запрос:
```sql
SELECT arrayMax([1, 2, 4]) AS res;
```
Результат:
```text
┌─res─┐
│ 4 │
└─────┘
```
Запрос:
```sql
SELECT arrayMax(x -> (-x), [1, 2, 4]) AS res;
```
Результат:
```text
┌─res─┐
│ -1 │
└─────┘
```
## arraySum {#array-sum}
Returns the sum of the elements in the source array.
If the `func` function is passed, it returns the sum of the array elements transformed by this function.
The `arraySum` function is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions): a lambda function can be passed to it as the first argument.
**Syntax**
```sql
arraySum([func,] arr)
```
**Parameters**
- `func` — function. [Expression](../../sql-reference/data-types/special-data-types/expression.md).
- `arr` — array. [Array](../../sql-reference/data-types/array.md).
**Returned value**
- The sum of the function values (or the sum of the array elements).
Type: [Decimal128](../../sql-reference/data-types/decimal.md) for Decimal numbers in the source array (or, if `func` was passed, for the numbers transformed by it), [Float64](../../sql-reference/data-types/float.md) for floating-point numbers, [UInt64](../../sql-reference/data-types/int-uint.md) for unsigned integers, and [Int64](../../sql-reference/data-types/int-uint.md) for signed integers.
**Examples**
Query:
```sql
SELECT arraySum([2, 3]) AS res;
```
Result:
```text
┌─res─┐
│ 5 │
└─────┘
```
Query:
```sql
SELECT arraySum(x -> x*x, [2, 3]) AS res;
```
Result:
```text
┌─res─┐
│ 13 │
└─────┘
```
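A quick, hedged way to check the result-type rules above is to wrap the call in `toTypeName`; the types noted in the comments are what ClickHouse is expected to report.
```sql
SELECT
    toTypeName(arraySum([1, 2, 3]))  AS unsigned_sum_type, -- expected: UInt64
    toTypeName(arraySum([-1, 2, 3])) AS signed_sum_type,   -- expected: Int64
    toTypeName(arraySum([1.5, 2.5])) AS float_sum_type;    -- expected: Float64
```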
## arrayAvg {#array-avg}
Returns the average of the elements in the source array.
If the `func` function is passed, it returns the average of the array elements transformed by this function.
The `arrayAvg` function is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions): a lambda function can be passed to it as the first argument.
**Syntax**
```sql
arrayAvg([func,] arr)
```
**Parameters**
- `func` — function. [Expression](../../sql-reference/data-types/special-data-types/expression.md).
- `arr` — array. [Array](../../sql-reference/data-types/array.md).
**Returned value**
- The average of the function values (or the average of the array elements).
Type: [Float64](../../sql-reference/data-types/float.md).
**Examples**
Query:
```sql
SELECT arrayAvg([1, 2, 4]) AS res;
```
Result:
```text
┌────────────────res─┐
│ 2.3333333333333335 │
└────────────────────┘
```
Query:
```sql
SELECT arrayAvg(x -> (x * x), [2, 4]) AS res;
```
Result:
```text
┌─res─┐
│ 10 │
└─────┘
```
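As a hedged sanity check of the `Float64` return type, the average of integer input is expected to come back fractional:
```sql
SELECT arrayAvg([1, 2]) AS avg_value, toTypeName(arrayAvg([1, 2])) AS type_name;
-- expected: avg_value = 1.5, type_name = Float64
```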
## arrayCumSum(\[func,\] arr1, …) {#arraycumsumfunc-arr1}

View File

@ -37,7 +37,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
VersionedCollapsingMergeTree(sign, version)
```
- `sign` — Name of the column that specifies the row type: `1` is a “state” row, `-1` is a “cancel” row.
The column data type should be `Int8`.
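For context, a minimal table definition using this engine might look like the sketch below; the database, table, and column names are illustrative only.
```sql
CREATE TABLE IF NOT EXISTS test.visits
(
    UserID UInt64,
    PageViews UInt8,
    Duration UInt8,
    Sign Int8,     -- 1 is a "state" row, -1 is a "cancel" row
    Version UInt8
)
ENGINE = VersionedCollapsingMergeTree(Sign, Version)
ORDER BY UserID;
```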

View File

@ -6,12 +6,16 @@ machine_translated_rev: 5decc73b5dc60054f19087d3690c4eb99446a6c3
# system.zookeeper {#system-zookeeper}
The table does not exist unless ZooKeeper is configured. It allows reading data from the ZooKeeper cluster defined in the config.
The query must have either an equality condition on `path` in the WHERE clause or a condition that `path` is in some set. This is the path in ZooKeeper to the children whose data you want to get.
The query `SELECT * FROM system.zookeeper WHERE path = '/clickhouse'` outputs data for all children of the `/clickhouse` node.
To output data for all root nodes, write path = '/'.
If the path specified in `path` does not exist, an exception is raised.
The query `SELECT * FROM system.zookeeper WHERE path IN ('/', '/clickhouse')` outputs data for all children of the `/` and `/clickhouse` nodes.
If any path in the specified `path` set does not exist, an exception is raised.
This can be used to run a batch of ZooKeeper path queries.
Columns:
- `name` (String) — The name of the node.
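As a hedged illustration of the batch form described above (the paths are just examples), one query can fetch the children of several nodes at once:
```sql
SELECT name, path, value
FROM system.zookeeper
WHERE path IN ('/', '/clickhouse')
ORDER BY path, name;
```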

View File

@ -1719,7 +1719,7 @@ private:
} }
// Remember where the data ended. We use this info later to determine // Remember where the data ended. We use this info later to determine
// where the next query begins. // where the next query begins.
parsed_insert_query->end = data_in.buffer().begin() + data_in.count(); parsed_insert_query->end = parsed_insert_query->data + data_in.count();
} }
else if (!is_interactive) else if (!is_interactive)
{ {
@ -1900,6 +1900,9 @@ private:
switch (packet.type) switch (packet.type)
{ {
case Protocol::Server::PartUUIDs:
return true;
case Protocol::Server::Data: case Protocol::Server::Data:
if (!cancelled) if (!cancelled)
onData(packet.block); onData(packet.block);

View File

@ -325,6 +325,51 @@ void QueryFuzzer::fuzzColumnLikeExpressionList(IAST * ast)
// the generic recursion into IAST.children. // the generic recursion into IAST.children.
} }
void QueryFuzzer::fuzzWindowFrame(WindowFrame & frame)
{
switch (fuzz_rand() % 40)
{
case 0:
{
const auto r = fuzz_rand() % 3;
frame.type = r == 0 ? WindowFrame::FrameType::Rows
: r == 1 ? WindowFrame::FrameType::Range
: WindowFrame::FrameType::Groups;
break;
}
case 1:
{
const auto r = fuzz_rand() % 3;
frame.begin_type = r == 0 ? WindowFrame::BoundaryType::Unbounded
: r == 1 ? WindowFrame::BoundaryType::Current
: WindowFrame::BoundaryType::Offset;
break;
}
case 2:
{
const auto r = fuzz_rand() % 3;
frame.end_type = r == 0 ? WindowFrame::BoundaryType::Unbounded
: r == 1 ? WindowFrame::BoundaryType::Current
: WindowFrame::BoundaryType::Offset;
break;
}
case 3:
{
frame.begin_offset = getRandomField(0).get<Int64>();
break;
}
case 4:
{
frame.end_offset = getRandomField(0).get<Int64>();
break;
}
default:
break;
}
frame.is_default = (frame == WindowFrame{});
}
void QueryFuzzer::fuzz(ASTs & asts) void QueryFuzzer::fuzz(ASTs & asts)
{ {
for (auto & ast : asts) for (auto & ast : asts)
@ -409,6 +454,7 @@ void QueryFuzzer::fuzz(ASTPtr & ast)
auto & def = fn->window_definition->as<ASTWindowDefinition &>(); auto & def = fn->window_definition->as<ASTWindowDefinition &>();
fuzzColumnLikeExpressionList(def.partition_by.get()); fuzzColumnLikeExpressionList(def.partition_by.get());
fuzzOrderByList(def.order_by.get()); fuzzOrderByList(def.order_by.get());
fuzzWindowFrame(def.frame);
} }
fuzz(fn->children); fuzz(fn->children);
@ -421,6 +467,23 @@ void QueryFuzzer::fuzz(ASTPtr & ast)
fuzz(select->children); fuzz(select->children);
} }
/*
* The time to fuzz the settings has not yet come.
* Apparently we don't have any infrastructure to validate the values of
* the settings, and the first query with max_block_size = -1 breaks
* because of overflows here and there.
*//*
* else if (auto * set = typeid_cast<ASTSetQuery *>(ast.get()))
* {
* for (auto & c : set->changes)
* {
* if (fuzz_rand() % 50 == 0)
* {
* c.value = fuzzField(c.value);
* }
* }
* }
*/
else if (auto * literal = typeid_cast<ASTLiteral *>(ast.get())) else if (auto * literal = typeid_cast<ASTLiteral *>(ast.get()))
{ {
// There is a caveat with fuzzing the children: many ASTs also keep the // There is a caveat with fuzzing the children: many ASTs also keep the

View File

@ -14,6 +14,7 @@ namespace DB
class ASTExpressionList; class ASTExpressionList;
class ASTOrderByElement; class ASTOrderByElement;
struct WindowFrame;
/* /*
* This is an AST-based query fuzzer that makes random modifications to query * This is an AST-based query fuzzer that makes random modifications to query
@ -65,6 +66,7 @@ struct QueryFuzzer
void fuzzOrderByElement(ASTOrderByElement * elem); void fuzzOrderByElement(ASTOrderByElement * elem);
void fuzzOrderByList(IAST * ast); void fuzzOrderByList(IAST * ast);
void fuzzColumnLikeExpressionList(IAST * ast); void fuzzColumnLikeExpressionList(IAST * ast);
void fuzzWindowFrame(WindowFrame & frame);
void fuzz(ASTs & asts); void fuzz(ASTs & asts);
void fuzz(ASTPtr & ast); void fuzz(ASTPtr & ast);
void collectFuzzInfoMain(const ASTPtr ast); void collectFuzzInfoMain(const ASTPtr ast);

View File

@ -31,6 +31,8 @@ struct Quota : public IAccessEntity
enum ResourceType enum ResourceType
{ {
QUERIES, /// Number of queries. QUERIES, /// Number of queries.
QUERY_SELECTS, /// Number of select queries.
QUERY_INSERTS, /// Number of inserts queries.
ERRORS, /// Number of queries with exceptions. ERRORS, /// Number of queries with exceptions.
RESULT_ROWS, /// Number of rows returned as result. RESULT_ROWS, /// Number of rows returned as result.
RESULT_BYTES, /// Number of bytes returned as result. RESULT_BYTES, /// Number of bytes returned as result.
@ -152,6 +154,16 @@ inline const Quota::ResourceTypeInfo & Quota::ResourceTypeInfo::get(ResourceType
static const auto info = make_info("QUERIES", 1); static const auto info = make_info("QUERIES", 1);
return info; return info;
} }
case Quota::QUERY_SELECTS:
{
static const auto info = make_info("QUERY_SELECTS", 1);
return info;
}
case Quota::QUERY_INSERTS:
{
static const auto info = make_info("QUERY_INSERTS", 1);
return info;
}
case Quota::ERRORS: case Quota::ERRORS:
{ {
static const auto info = make_info("ERRORS", 1); static const auto info = make_info("ERRORS", 1);

View File

@ -147,7 +147,7 @@ public:
} }
if (params[0].getType() != Field::Types::String) if (params[0].getType() != Field::Types::String)
throw Exception("Aggregate function " + getName() + " require require first parameter to be a String", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); throw Exception("Aggregate function " + getName() + " require first parameter to be a String", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
auto param = params[0].get<String>(); auto param = params[0].get<String>();
if (param == "two-sided") if (param == "two-sided")
@ -158,13 +158,13 @@ public:
alternative = Alternative::Greater; alternative = Alternative::Greater;
else else
throw Exception("Unknown parameter in aggregate function " + getName() + throw Exception("Unknown parameter in aggregate function " + getName() +
". It must be one of: 'two sided', 'less', 'greater'", ErrorCodes::BAD_ARGUMENTS); ". It must be one of: 'two-sided', 'less', 'greater'", ErrorCodes::BAD_ARGUMENTS);
if (params.size() != 2) if (params.size() != 2)
return; return;
if (params[1].getType() != Field::Types::UInt64) if (params[1].getType() != Field::Types::UInt64)
throw Exception("Aggregate function " + getName() + " require require second parameter to be a UInt64", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); throw Exception("Aggregate function " + getName() + " require second parameter to be a UInt64", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
continuity_correction = static_cast<bool>(params[1].get<UInt64>()); continuity_correction = static_cast<bool>(params[1].get<UInt64>());
} }

View File

@ -149,7 +149,6 @@ private:
UInt8 strict_order; // When the 'strict_order' is set, it doesn't allow interventions of other events. UInt8 strict_order; // When the 'strict_order' is set, it doesn't allow interventions of other events.
// In the case of 'A->B->D->C', it stops finding 'A->B->C' at the 'D' and the max event level is 2. // In the case of 'A->B->D->C', it stops finding 'A->B->C' at the 'D' and the max event level is 2.
// Loop through the entire events_list, update the event timestamp value // Loop through the entire events_list, update the event timestamp value
// The level path must be 1---2---3---...---check_events_size, find the max event level that satisfied the path in the sliding window. // The level path must be 1---2---3---...---check_events_size, find the max event level that satisfied the path in the sliding window.
// If found, returns the max event level, else return 0. // If found, returns the max event level, else return 0.

View File

@ -32,6 +32,8 @@ namespace ErrorCodes
* - a histogram (that is, value -> number), consisting of two parts * - a histogram (that is, value -> number), consisting of two parts
* -- for values from 0 to 1023 - in increments of 1; * -- for values from 0 to 1023 - in increments of 1;
* -- for values from 1024 to 30,000 - in increments of 16; * -- for values from 1024 to 30,000 - in increments of 16;
*
* NOTE: 64-bit integer weight can overflow, see also QuantileExactWeighted.h::get()
*/ */
#define TINY_MAX_ELEMS 31 #define TINY_MAX_ELEMS 31
@ -396,9 +398,9 @@ namespace detail
/// Get the value of the `level` quantile. The level must be between 0 and 1. /// Get the value of the `level` quantile. The level must be between 0 and 1.
UInt16 get(double level) const UInt16 get(double level) const
{ {
UInt64 pos = std::ceil(count * level); double pos = std::ceil(count * level);
UInt64 accumulated = 0; double accumulated = 0;
Iterator it(*this); Iterator it(*this);
while (it.isValid()) while (it.isValid())
@ -422,9 +424,9 @@ namespace detail
const auto * indices_end = indices + size; const auto * indices_end = indices + size;
const auto * index = indices; const auto * index = indices;
UInt64 pos = std::ceil(count * levels[*index]); double pos = std::ceil(count * levels[*index]);
UInt64 accumulated = 0; double accumulated = 0;
Iterator it(*this); Iterator it(*this);
while (it.isValid()) while (it.isValid())

View File

@ -542,6 +542,12 @@ void Connection::sendData(const Block & block, const String & name, bool scalar)
throttler->add(out->count() - prev_bytes); throttler->add(out->count() - prev_bytes);
} }
void Connection::sendIgnoredPartUUIDs(const std::vector<UUID> & uuids)
{
writeVarUInt(Protocol::Client::IgnoredPartUUIDs, *out);
writeVectorBinary(uuids, *out);
out->next();
}
void Connection::sendPreparedData(ReadBuffer & input, size_t size, const String & name) void Connection::sendPreparedData(ReadBuffer & input, size_t size, const String & name)
{ {
@ -798,6 +804,10 @@ Packet Connection::receivePacket(std::function<void(Poco::Net::Socket &)> async_
case Protocol::Server::EndOfStream: case Protocol::Server::EndOfStream:
return res; return res;
case Protocol::Server::PartUUIDs:
readVectorBinary(res.part_uuids, *in);
return res;
default: default:
/// In unknown state, disconnect - to not leave unsynchronised connection. /// In unknown state, disconnect - to not leave unsynchronised connection.
disconnect(); disconnect();

View File

@ -66,6 +66,7 @@ struct Packet
std::vector<String> multistring_message; std::vector<String> multistring_message;
Progress progress; Progress progress;
BlockStreamProfileInfo profile_info; BlockStreamProfileInfo profile_info;
std::vector<UUID> part_uuids;
Packet() : type(Protocol::Server::Hello) {} Packet() : type(Protocol::Server::Hello) {}
}; };
@ -157,6 +158,8 @@ public:
void sendScalarsData(Scalars & data); void sendScalarsData(Scalars & data);
/// Send all contents of external (temporary) tables. /// Send all contents of external (temporary) tables.
void sendExternalTablesData(ExternalTablesData & data); void sendExternalTablesData(ExternalTablesData & data);
/// Send parts' uuids to exclude them from query processing
void sendIgnoredPartUUIDs(const std::vector<UUID> & uuids);
/// Send prepared block of data (serialized and, if need, compressed), that will be read from 'input'. /// Send prepared block of data (serialized and, if need, compressed), that will be read from 'input'.
/// You could pass size of serialized/compressed block. /// You could pass size of serialized/compressed block.

View File

@ -140,6 +140,21 @@ void MultiplexedConnections::sendQuery(
sent_query = true; sent_query = true;
} }
void MultiplexedConnections::sendIgnoredPartUUIDs(const std::vector<UUID> & uuids)
{
std::lock_guard lock(cancel_mutex);
if (sent_query)
throw Exception("Cannot send uuids after query is sent.", ErrorCodes::LOGICAL_ERROR);
for (ReplicaState & state : replica_states)
{
Connection * connection = state.connection;
if (connection != nullptr)
connection->sendIgnoredPartUUIDs(uuids);
}
}
Packet MultiplexedConnections::receivePacket() Packet MultiplexedConnections::receivePacket()
{ {
std::lock_guard lock(cancel_mutex); std::lock_guard lock(cancel_mutex);
@ -195,6 +210,7 @@ Packet MultiplexedConnections::drain()
switch (packet.type) switch (packet.type)
{ {
case Protocol::Server::PartUUIDs:
case Protocol::Server::Data: case Protocol::Server::Data:
case Protocol::Server::Progress: case Protocol::Server::Progress:
case Protocol::Server::ProfileInfo: case Protocol::Server::ProfileInfo:
@ -253,6 +269,7 @@ Packet MultiplexedConnections::receivePacketUnlocked(std::function<void(Poco::Ne
switch (packet.type) switch (packet.type)
{ {
case Protocol::Server::PartUUIDs:
case Protocol::Server::Data: case Protocol::Server::Data:
case Protocol::Server::Progress: case Protocol::Server::Progress:
case Protocol::Server::ProfileInfo: case Protocol::Server::ProfileInfo:

View File

@ -50,6 +50,9 @@ public:
/// Send a request to the replica to cancel the request /// Send a request to the replica to cancel the request
void sendCancel(); void sendCancel();
/// Send parts' uuids to replicas to exclude them from query processing
void sendIgnoredPartUUIDs(const std::vector<UUID> & uuids);
/** On each replica, read and skip all packets to EndOfStream or Exception. /** On each replica, read and skip all packets to EndOfStream or Exception.
* Returns EndOfStream if no exception has been received. Otherwise * Returns EndOfStream if no exception has been received. Otherwise
* returns the last received packet of type Exception. * returns the last received packet of type Exception.

View File

@ -75,9 +75,29 @@ void ColumnAggregateFunction::set(const AggregateFunctionPtr & func_)
ColumnAggregateFunction::~ColumnAggregateFunction() ColumnAggregateFunction::~ColumnAggregateFunction()
{ {
if (!func->hasTrivialDestructor() && !src) if (!func->hasTrivialDestructor() && !src)
{
if (copiedDataInfo.empty())
{
for (auto * val : data) for (auto * val : data)
{
func->destroy(val); func->destroy(val);
} }
}
else
{
size_t pos;
for (Map::iterator it = copiedDataInfo.begin(), it_end = copiedDataInfo.end(); it != it_end; ++it)
{
pos = it->getValue().second;
if (data[pos] != nullptr)
{
func->destroy(data[pos]);
data[pos] = nullptr;
}
}
}
}
}
void ColumnAggregateFunction::addArena(ConstArenaPtr arena_) void ColumnAggregateFunction::addArena(ConstArenaPtr arena_)
{ {
@ -455,14 +475,37 @@ void ColumnAggregateFunction::insertFrom(const IColumn & from, size_t n)
/// (only as a whole, see comment above). /// (only as a whole, see comment above).
ensureOwnership(); ensureOwnership();
insertDefault(); insertDefault();
insertMergeFrom(from, n); insertCopyFrom(assert_cast<const ColumnAggregateFunction &>(from).data[n]);
} }
void ColumnAggregateFunction::insertFrom(ConstAggregateDataPtr place) void ColumnAggregateFunction::insertFrom(ConstAggregateDataPtr place)
{ {
ensureOwnership(); ensureOwnership();
insertDefault(); insertDefault();
insertMergeFrom(place); insertCopyFrom(place);
}
void ColumnAggregateFunction::insertCopyFrom(ConstAggregateDataPtr place)
{
Map::LookupResult result;
result = copiedDataInfo.find(place);
if (result == nullptr)
{
copiedDataInfo[place] = data.size()-1;
func->merge(data.back(), place, &createOrGetArena());
}
else
{
size_t pos = result->getValue().second;
if (pos != data.size() - 1)
{
data[data.size() - 1] = data[pos];
}
else /// insert same data to same pos, merge them.
{
func->merge(data.back(), place, &createOrGetArena());
}
}
} }
void ColumnAggregateFunction::insertMergeFrom(ConstAggregateDataPtr place) void ColumnAggregateFunction::insertMergeFrom(ConstAggregateDataPtr place)
@ -697,5 +740,4 @@ MutableColumnPtr ColumnAggregateFunction::cloneResized(size_t size) const
return cloned_col; return cloned_col;
} }
} }
} }

View File

@ -13,6 +13,8 @@
#include <Functions/FunctionHelpers.h> #include <Functions/FunctionHelpers.h>
#include <Common/HashTable/HashMap.h>
namespace DB namespace DB
{ {
@ -82,6 +84,17 @@ private:
/// Name of the type to distinguish different aggregation states. /// Name of the type to distinguish different aggregation states.
String type_string; String type_string;
/// MergedData records, used to avoid duplicated data copy.
/// key: src pointer, val: pos in current column.
using Map = HashMap<
ConstAggregateDataPtr,
size_t,
DefaultHash<ConstAggregateDataPtr>,
HashTableGrower<3>,
HashTableAllocatorWithStackMemory<sizeof(std::pair<ConstAggregateDataPtr, size_t>) * (1 << 3)>>;
Map copiedDataInfo;
ColumnAggregateFunction() {} ColumnAggregateFunction() {}
/// Create a new column that has another column as a source. /// Create a new column that has another column as a source.
@ -140,6 +153,8 @@ public:
void insertFrom(ConstAggregateDataPtr place); void insertFrom(ConstAggregateDataPtr place);
void insertCopyFrom(ConstAggregateDataPtr place);
/// Merge state at last row with specified state in another column. /// Merge state at last row with specified state in another column.
void insertMergeFrom(ConstAggregateDataPtr place); void insertMergeFrom(ConstAggregateDataPtr place);

View File

@ -26,4 +26,6 @@ using ColumnInt256 = ColumnVector<Int256>;
using ColumnFloat32 = ColumnVector<Float32>; using ColumnFloat32 = ColumnVector<Float32>;
using ColumnFloat64 = ColumnVector<Float64>; using ColumnFloat64 = ColumnVector<Float64>;
using ColumnUUID = ColumnVector<UInt128>;
} }

View File

@ -63,9 +63,6 @@ public:
/// Call from master thread as soon as possible (e.g. when thread accepted connection) /// Call from master thread as soon as possible (e.g. when thread accepted connection)
static void initializeQuery(); static void initializeQuery();
/// Sets query_context for current thread group
static void attachQueryContext(Context & query_context);
/// You must call one of these methods when create a query child thread: /// You must call one of these methods when create a query child thread:
/// Add current thread to a group associated with the thread group /// Add current thread to a group associated with the thread group
static void attachTo(const ThreadGroupStatusPtr & thread_group); static void attachTo(const ThreadGroupStatusPtr & thread_group);
@ -99,6 +96,10 @@ public:
private: private:
static void defaultThreadDeleter(); static void defaultThreadDeleter();
/// Sets query_context for current thread group
/// Can be used only through QueryScope
static void attachQueryContext(Context & query_context);
}; };
} }

View File

@ -533,11 +533,13 @@
M(564, INTERSERVER_SCHEME_DOESNT_MATCH) \ M(564, INTERSERVER_SCHEME_DOESNT_MATCH) \
M(565, TOO_MANY_PARTITIONS) \ M(565, TOO_MANY_PARTITIONS) \
M(566, CANNOT_RMDIR) \ M(566, CANNOT_RMDIR) \
M(567, DUPLICATED_PART_UUIDS) \
\ \
M(999, KEEPER_EXCEPTION) \ M(999, KEEPER_EXCEPTION) \
M(1000, POCO_EXCEPTION) \ M(1000, POCO_EXCEPTION) \
M(1001, STD_EXCEPTION) \ M(1001, STD_EXCEPTION) \
M(1002, UNKNOWN_EXCEPTION) M(1002, UNKNOWN_EXCEPTION) \
M(1003, INVALID_SHARD_ID)
/* See END */ /* See END */

View File

@ -109,6 +109,11 @@ struct HashMapCell
DB::assertChar(',', rb); DB::assertChar(',', rb);
DB::readDoubleQuoted(value.second, rb); DB::readDoubleQuoted(value.second, rb);
} }
static bool constexpr need_to_notify_cell_during_move = false;
static void move(HashMapCell * /* old_location */, HashMapCell * /* new_location */) {}
}; };
template <typename Key, typename TMapped, typename Hash, typename TState = HashTableNoState> template <typename Key, typename TMapped, typename Hash, typename TState = HashTableNoState>

View File

@ -69,11 +69,16 @@ namespace ZeroTraits
{ {
template <typename T> template <typename T>
bool check(const T x) { return x == 0; } inline bool check(const T x) { return x == 0; }
template <typename T> template <typename T>
void set(T & x) { x = 0; } inline void set(T & x) { x = 0; }
template <>
inline bool check(const char * x) { return x == nullptr; }
template <>
inline void set(const char *& x){ x = nullptr; }
} }
@ -204,6 +209,13 @@ struct HashTableCell
/// Deserialization, in binary and text form. /// Deserialization, in binary and text form.
void read(DB::ReadBuffer & rb) { DB::readBinary(key, rb); } void read(DB::ReadBuffer & rb) { DB::readBinary(key, rb); }
void readText(DB::ReadBuffer & rb) { DB::readDoubleQuoted(key, rb); } void readText(DB::ReadBuffer & rb) { DB::readDoubleQuoted(key, rb); }
/// When cell pointer is moved during erase, reinsert or resize operations
static constexpr bool need_to_notify_cell_during_move = false;
static void move(HashTableCell * /* old_location */, HashTableCell * /* new_location */) {}
}; };
/** /**
@ -334,6 +346,32 @@ struct ZeroValueStorage<false, Cell>
}; };
template <bool enable, typename Allocator, typename Cell>
struct AllocatorBufferDeleter;
template <typename Allocator, typename Cell>
struct AllocatorBufferDeleter<false, Allocator, Cell>
{
AllocatorBufferDeleter(Allocator &, size_t) {}
void operator()(Cell *) const {}
};
template <typename Allocator, typename Cell>
struct AllocatorBufferDeleter<true, Allocator, Cell>
{
AllocatorBufferDeleter(Allocator & allocator_, size_t size_)
: allocator(allocator_)
, size(size_) {}
void operator()(Cell * buffer) const { allocator.free(buffer, size); }
Allocator & allocator;
size_t size;
};
// The HashTable // The HashTable
template template
< <
@ -427,7 +465,6 @@ protected:
} }
} }
/// Increase the size of the buffer. /// Increase the size of the buffer.
void resize(size_t for_num_elems = 0, size_t for_buf_size = 0) void resize(size_t for_num_elems = 0, size_t for_buf_size = 0)
{ {
@ -460,7 +497,24 @@ protected:
new_grower.increaseSize(); new_grower.increaseSize();
/// Expand the space. /// Expand the space.
buf = reinterpret_cast<Cell *>(Allocator::realloc(buf, getBufferSizeInBytes(), new_grower.bufSize() * sizeof(Cell)));
size_t old_buffer_size = getBufferSizeInBytes();
/** If the cell needs to be notified during move, we temporarily keep the old buffer
* because realloc does not guarantee that the reallocated buffer has the same base address
*/
using Deleter = AllocatorBufferDeleter<Cell::need_to_notify_cell_during_move, Allocator, Cell>;
Deleter buffer_deleter(*this, old_buffer_size);
std::unique_ptr<Cell, Deleter> old_buffer(buf, buffer_deleter);
if constexpr (Cell::need_to_notify_cell_during_move)
{
buf = reinterpret_cast<Cell *>(Allocator::alloc(new_grower.bufSize() * sizeof(Cell)));
memcpy(reinterpret_cast<void *>(buf), reinterpret_cast<const void *>(old_buffer.get()), old_buffer_size);
}
else
buf = reinterpret_cast<Cell *>(Allocator::realloc(buf, old_buffer_size, new_grower.bufSize() * sizeof(Cell)));
grower = new_grower; grower = new_grower;
/** Now some items may need to be moved to a new location. /** Now some items may need to be moved to a new location.
@ -470,7 +524,12 @@ protected:
size_t i = 0; size_t i = 0;
for (; i < old_size; ++i) for (; i < old_size; ++i)
if (!buf[i].isZero(*this)) if (!buf[i].isZero(*this))
reinsert(buf[i], buf[i].getHash(*this)); {
size_t updated_place_value = reinsert(buf[i], buf[i].getHash(*this));
if constexpr (Cell::need_to_notify_cell_during_move)
Cell::move(&(old_buffer.get())[i], &buf[updated_place_value]);
}
/** There is also a special case: /** There is also a special case:
* if the element was to be at the end of the old buffer, [ x] * if the element was to be at the end of the old buffer, [ x]
@ -481,7 +540,13 @@ protected:
* process tail from the collision resolution chain immediately after it [ o x ] * process tail from the collision resolution chain immediately after it [ o x ]
*/ */
for (; !buf[i].isZero(*this); ++i) for (; !buf[i].isZero(*this); ++i)
reinsert(buf[i], buf[i].getHash(*this)); {
size_t updated_place_value = reinsert(buf[i], buf[i].getHash(*this));
if constexpr (Cell::need_to_notify_cell_during_move)
if (&buf[i] != &buf[updated_place_value])
Cell::move(&buf[i], &buf[updated_place_value]);
}
#ifdef DBMS_HASH_MAP_DEBUG_RESIZES #ifdef DBMS_HASH_MAP_DEBUG_RESIZES
watch.stop(); watch.stop();
@ -495,20 +560,20 @@ protected:
/** Paste into the new buffer the value that was in the old buffer. /** Paste into the new buffer the value that was in the old buffer.
* Used when increasing the buffer size. * Used when increasing the buffer size.
*/ */
void reinsert(Cell & x, size_t hash_value) size_t reinsert(Cell & x, size_t hash_value)
{ {
size_t place_value = grower.place(hash_value); size_t place_value = grower.place(hash_value);
/// If the element is in its place. /// If the element is in its place.
if (&x == &buf[place_value]) if (&x == &buf[place_value])
return; return place_value;
/// Compute a new location, taking into account the collision resolution chain. /// Compute a new location, taking into account the collision resolution chain.
place_value = findCell(Cell::getKey(x.getValue()), hash_value, place_value); place_value = findCell(Cell::getKey(x.getValue()), hash_value, place_value);
/// If the item remains in its place in the old collision resolution chain. /// If the item remains in its place in the old collision resolution chain.
if (!buf[place_value].isZero(*this)) if (!buf[place_value].isZero(*this))
return; return place_value;
/// Copy to a new location and zero the old one. /// Copy to a new location and zero the old one.
x.setHash(hash_value); x.setHash(hash_value);
@ -516,6 +581,7 @@ protected:
x.setZero(); x.setZero();
/// Then the elements that previously were in collision with this can move to the old place. /// Then the elements that previously were in collision with this can move to the old place.
return place_value;
} }
@ -881,7 +947,11 @@ public:
/// Reinsert node pointed to by iterator /// Reinsert node pointed to by iterator
void ALWAYS_INLINE reinsert(iterator & it, size_t hash_value) void ALWAYS_INLINE reinsert(iterator & it, size_t hash_value)
{ {
reinsert(*it.getPtr(), hash_value); size_t place_value = reinsert(*it.getPtr(), hash_value);
if constexpr (Cell::need_to_notify_cell_during_move)
if (it.getPtr() != &buf[place_value])
Cell::move(it.getPtr(), &buf[place_value]);
} }
@ -958,8 +1028,14 @@ public:
return const_cast<std::decay_t<decltype(*this)> *>(this)->find(x, hash_value); return const_cast<std::decay_t<decltype(*this)> *>(this)->find(x, hash_value);
} }
std::enable_if_t<Grower::performs_linear_probing_with_single_step, void> std::enable_if_t<Grower::performs_linear_probing_with_single_step, bool>
ALWAYS_INLINE erase(const Key & x) ALWAYS_INLINE erase(const Key & x)
{
return erase(x, hash(x));
}
std::enable_if_t<Grower::performs_linear_probing_with_single_step, bool>
ALWAYS_INLINE erase(const Key & x, size_t hash_value)
{ {
/** Deletion from open addressing hash table without tombstones /** Deletion from open addressing hash table without tombstones
* *
@ -977,21 +1053,19 @@ public:
{ {
--m_size; --m_size;
this->clearHasZero(); this->clearHasZero();
return true;
} }
else else
{ {
return; return false;
} }
} }
size_t hash_value = hash(x);
size_t erased_key_position = findCell(x, hash_value, grower.place(hash_value)); size_t erased_key_position = findCell(x, hash_value, grower.place(hash_value));
/// Key is not found /// Key is not found
if (buf[erased_key_position].isZero(*this)) if (buf[erased_key_position].isZero(*this))
{ return false;
return;
}
/// We need to guarantee loop termination because there will be empty position /// We need to guarantee loop termination because there will be empty position
assert(m_size < grower.bufSize()); assert(m_size < grower.bufSize());
@ -1056,12 +1130,18 @@ public:
/// Move the element to the freed place /// Move the element to the freed place
memcpy(static_cast<void *>(&buf[erased_key_position]), static_cast<void *>(&buf[next_position]), sizeof(Cell)); memcpy(static_cast<void *>(&buf[erased_key_position]), static_cast<void *>(&buf[next_position]), sizeof(Cell));
if constexpr (Cell::need_to_notify_cell_during_move)
Cell::move(&buf[next_position], &buf[erased_key_position]);
/// Now we have another freed place /// Now we have another freed place
erased_key_position = next_position; erased_key_position = next_position;
} }
buf[erased_key_position].setZero(); buf[erased_key_position].setZero();
--m_size; --m_size;
return true;
} }
bool ALWAYS_INLINE has(const Key & x) const bool ALWAYS_INLINE has(const Key & x) const

View File

@ -0,0 +1,244 @@
#pragma once
#include <common/types.h>
#include <boost/intrusive/trivial_value_traits.hpp>
#include <boost/intrusive/list.hpp>
#include <boost/noncopyable.hpp>
#include <Core/Defines.h>
#include <Common/Exception.h>
#include <Common/HashTable/HashMap.h>
#include <Common/PODArray.h>
template <typename TKey, typename TMapped, typename Hash, bool save_hash_in_cell>
struct LRUHashMapCell :
public std::conditional_t<save_hash_in_cell,
HashMapCellWithSavedHash<TKey, TMapped, Hash, HashTableNoState>,
HashMapCell<TKey, TMapped, Hash, HashTableNoState>>
{
public:
using Key = TKey;
using Base = std::conditional_t<save_hash_in_cell,
HashMapCellWithSavedHash<TKey, TMapped, Hash, HashTableNoState>,
HashMapCell<TKey, TMapped, Hash, HashTableNoState>>;
using Mapped = typename Base::Mapped;
using State = typename Base::State;
using mapped_type = Mapped;
using key_type = Key;
using Base::Base;
static bool constexpr need_to_notify_cell_during_move = true;
static void move(LRUHashMapCell * __restrict old_location, LRUHashMapCell * __restrict new_location)
{
/** We update new location prev and next pointers because during hash table resize
* they can be updated during move of another cell.
*/
new_location->prev = old_location->prev;
new_location->next = old_location->next;
LRUHashMapCell * prev = new_location->prev;
LRUHashMapCell * next = new_location->next;
/// Updated previous next and next previous nodes of list to point to new location
if (prev)
prev->next = new_location;
if (next)
next->prev = new_location;
}
private:
template<typename, typename, typename, bool>
friend class LRUHashMapCellNodeTraits;
LRUHashMapCell * next = nullptr;
LRUHashMapCell * prev = nullptr;
};
template<typename Key, typename Value, typename Hash, bool save_hash_in_cell>
struct LRUHashMapCellNodeTraits
{
using node = LRUHashMapCell<Key, Value, Hash, save_hash_in_cell>;
using node_ptr = LRUHashMapCell<Key, Value, Hash, save_hash_in_cell> *;
using const_node_ptr = const LRUHashMapCell<Key, Value, Hash, save_hash_in_cell> *;
static node * get_next(const node * ptr) { return ptr->next; }
static void set_next(node * __restrict ptr, node * __restrict next) { ptr->next = next; }
static node * get_previous(const node * ptr) { return ptr->prev; }
static void set_previous(node * __restrict ptr, node * __restrict prev) { ptr->prev = prev; }
};
template <typename TKey, typename TValue, typename Hash, bool save_hash_in_cells>
class LRUHashMapImpl :
private HashMapTable<
TKey,
LRUHashMapCell<TKey, TValue, Hash, save_hash_in_cells>,
Hash,
HashTableGrower<>,
HashTableAllocator>
{
using Base = HashMapTable<
TKey,
LRUHashMapCell<TKey, TValue, Hash, save_hash_in_cells>,
Hash,
HashTableGrower<>,
HashTableAllocator>;
public:
using Key = TKey;
using Value = TValue;
using Cell = LRUHashMapCell<Key, Value, Hash, save_hash_in_cells>;
using LRUHashMapCellIntrusiveValueTraits =
boost::intrusive::trivial_value_traits<
LRUHashMapCellNodeTraits<Key, Value, Hash, save_hash_in_cells>,
boost::intrusive::link_mode_type::normal_link>;
using LRUList = boost::intrusive::list<
Cell,
boost::intrusive::value_traits<LRUHashMapCellIntrusiveValueTraits>,
boost::intrusive::constant_time_size<false>>;
using iterator = typename LRUList::iterator;
using const_iterator = typename LRUList::const_iterator;
using reverse_iterator = typename LRUList::reverse_iterator;
using const_reverse_iterator = typename LRUList::const_reverse_iterator;
LRUHashMapImpl(size_t max_size_, bool preallocate_max_size_in_hash_map = false)
: Base(preallocate_max_size_in_hash_map ? max_size_ : 32)
, max_size(max_size_)
{
assert(max_size > 0);
}
std::pair<Cell *, bool> insert(const Key & key, const Value & value)
{
return emplace(key, value);
}
std::pair<Cell *, bool> insert(const Key & key, Value && value)
{
return emplace(key, std::move(value));
}
template<typename ...Args>
std::pair<Cell *, bool> emplace(const Key & key, Args&&... args)
{
size_t hash_value = Base::hash(key);
Cell * it = Base::find(key, hash_value);
if (it)
{
/// Cell already contains the element: return it and move it to the end of the LRU list
lru_list.splice(lru_list.end(), lru_list, lru_list.iterator_to(*it));
return std::make_pair(it, false);
}
if (size() == max_size)
{
/// Erase least recently used element from front of the list
Cell & node = lru_list.front();
const Key & element_to_remove_key = node.getKey();
size_t key_hash = node.getHash(*this);
lru_list.pop_front();
[[maybe_unused]] bool erased = Base::erase(element_to_remove_key, key_hash);
assert(erased);
}
[[maybe_unused]] bool inserted;
/// Insert value first try to insert in zero storage if not then insert in buffer
if (!Base::emplaceIfZero(key, it, inserted, hash_value))
Base::emplaceNonZero(key, it, inserted, hash_value);
assert(inserted);
new (&it->getMapped()) Value(std::forward<Args>(args)...);
/// Put cell to the end of lru list
lru_list.insert(lru_list.end(), *it);
return std::make_pair(it, true);
}
using Base::find;
Value & get(const Key & key)
{
auto it = Base::find(key);
assert(it);
Value & value = it->getMapped();
/// Put cell to the end of lru list
lru_list.splice(lru_list.end(), lru_list, lru_list.iterator_to(*it));
return value;
}
const Value & get(const Key & key) const
{
return const_cast<std::decay_t<decltype(*this)> *>(this)->get(key);
}
bool contains(const Key & key) const
{
return Base::has(key);
}
bool erase(const Key & key)
{
auto hash = Base::hash(key);
auto it = Base::find(key, hash);
if (!it)
return false;
lru_list.erase(lru_list.iterator_to(*it));
return Base::erase(key, hash);
}
void clear()
{
lru_list.clear();
Base::clear();
}
using Base::size;
size_t getMaxSize() const { return max_size; }
iterator begin() { return lru_list.begin(); }
const_iterator begin() const { return lru_list.cbegin(); }
iterator end() { return lru_list.end(); }
const_iterator end() const { return lru_list.cend(); }
reverse_iterator rbegin() { return lru_list.rbegin(); }
const_reverse_iterator rbegin() const { return lru_list.crbegin(); }
reverse_iterator rend() { return lru_list.rend(); }
const_reverse_iterator rend() const { return lru_list.crend(); }
private:
size_t max_size;
LRUList lru_list;
};
template <typename Key, typename Value, typename Hash = DefaultHash<Key>>
using LRUHashMap = LRUHashMapImpl<Key, Value, Hash, false>;
template <typename Key, typename Value, typename Hash = DefaultHash<Key>>
using LRUHashMapWithSavedHash = LRUHashMapImpl<Key, Value, Hash, true>;

View File

@ -34,7 +34,15 @@ public:
std::optional<std::string> file; std::optional<std::string> file;
std::optional<UInt64> line; std::optional<UInt64> line;
}; };
static constexpr size_t capacity = 32;
static constexpr size_t capacity =
#ifndef NDEBUG
/* The stacks are normally larger in debug version due to less inlining. */
64
#else
32
#endif
;
using FramePointers = std::array<void *, capacity>; using FramePointers = std::array<void *, capacity>;
using Frames = std::array<Frame, capacity>; using Frames = std::array<Frame, capacity>;

View File

@ -68,7 +68,7 @@ TasksStatsCounters::TasksStatsCounters(const UInt64 tid, const MetricsProvider p
case MetricsProvider::Netlink: case MetricsProvider::Netlink:
stats_getter = [metrics_provider = std::make_shared<TaskStatsInfoGetter>(), tid]() stats_getter = [metrics_provider = std::make_shared<TaskStatsInfoGetter>(), tid]()
{ {
::taskstats result; ::taskstats result{};
metrics_provider->getStat(result, tid); metrics_provider->getStat(result, tid);
return result; return result;
}; };
@ -76,7 +76,7 @@ TasksStatsCounters::TasksStatsCounters(const UInt64 tid, const MetricsProvider p
case MetricsProvider::Procfs: case MetricsProvider::Procfs:
stats_getter = [metrics_provider = std::make_shared<ProcfsMetricsProvider>(tid)]() stats_getter = [metrics_provider = std::make_shared<ProcfsMetricsProvider>(tid)]()
{ {
::taskstats result; ::taskstats result{};
metrics_provider->getTaskStats(result); metrics_provider->getTaskStats(result);
return result; return result;
}; };

View File

@ -99,6 +99,11 @@ ThreadStatus::~ThreadStatus()
/// We've already allocated a little bit more than the limit and cannot track it in the thread memory tracker or its parent. /// We've already allocated a little bit more than the limit and cannot track it in the thread memory tracker or its parent.
} }
#if !defined(ARCADIA_BUILD)
/// It may cause segfault if query_context was destroyed, but was not detached
assert((!query_context && query_id.empty()) || (query_context && query_id == query_context->getCurrentQueryId()));
#endif
if (deleter) if (deleter)
deleter(); deleter();
current_thread = nullptr; current_thread = nullptr;

View File

@ -201,7 +201,7 @@ public:
void setFatalErrorCallback(std::function<void()> callback); void setFatalErrorCallback(std::function<void()> callback);
void onFatalError(); void onFatalError();
/// Sets query context for current thread and its thread group /// Sets query context for current master thread and its thread group
/// NOTE: query_context have to be alive until detachQuery() is called /// NOTE: query_context have to be alive until detachQuery() is called
void attachQueryContext(Context & query_context); void attachQueryContext(Context & query_context);

View File

@ -38,6 +38,9 @@ target_link_libraries (arena_with_free_lists PRIVATE dbms)
add_executable (pod_array pod_array.cpp) add_executable (pod_array pod_array.cpp)
target_link_libraries (pod_array PRIVATE clickhouse_common_io) target_link_libraries (pod_array PRIVATE clickhouse_common_io)
add_executable (lru_hash_map_perf lru_hash_map_perf.cpp)
target_link_libraries (lru_hash_map_perf PRIVATE clickhouse_common_io)
add_executable (thread_creation_latency thread_creation_latency.cpp) add_executable (thread_creation_latency thread_creation_latency.cpp)
target_link_libraries (thread_creation_latency PRIVATE clickhouse_common_io) target_link_libraries (thread_creation_latency PRIVATE clickhouse_common_io)

View File

@ -0,0 +1,161 @@
#include <iomanip>
#include <iostream>
#include <Common/HashTable/LRUHashMap.h>
#include <gtest/gtest.h>
template<typename LRUHashMap>
std::vector<typename LRUHashMap::Key> convertToVector(const LRUHashMap & map)
{
std::vector<typename LRUHashMap::Key> result;
result.reserve(map.size());
for (auto & node: map)
result.emplace_back(node.getKey());
return result;
}
void testInsert(size_t elements_to_insert_size, size_t map_size)
{
using LRUHashMap = LRUHashMap<int, int>;
LRUHashMap map(map_size);
std::vector<int> expected;
for (size_t i = 0; i < elements_to_insert_size; ++i)
map.insert(i, i);
for (size_t i = elements_to_insert_size - map_size; i < elements_to_insert_size; ++i)
expected.emplace_back(i);
std::vector<int> actual = convertToVector(map);
ASSERT_EQ(map.size(), actual.size());
ASSERT_EQ(actual, expected);
}
TEST(LRUHashMap, Insert)
{
{
using LRUHashMap = LRUHashMap<int, int>;
LRUHashMap map(3);
map.emplace(1, 1);
map.insert(2, 2);
int v = 3;
map.insert(3, v);
map.emplace(4, 4);
std::vector<int> expected = { 2, 3, 4 };
std::vector<int> actual = convertToVector(map);
ASSERT_EQ(actual, expected);
}
testInsert(1200000, 1200000);
testInsert(10, 5);
testInsert(1200000, 2);
testInsert(1200000, 1);
}
TEST(LRUHashMap, GetModify)
{
using LRUHashMap = LRUHashMap<int, int>;
LRUHashMap map(3);
map.emplace(1, 1);
map.emplace(2, 2);
map.emplace(3, 3);
map.get(3) = 4;
std::vector<int> expected = { 1, 2, 4 };
std::vector<int> actual;
actual.reserve(map.size());
for (auto & node : map)
actual.emplace_back(node.getMapped());
ASSERT_EQ(actual, expected);
}
TEST(LRUHashMap, SetRecentKeyToTop)
{
using LRUHashMap = LRUHashMap<int, int>;
LRUHashMap map(3);
map.emplace(1, 1);
map.emplace(2, 2);
map.emplace(3, 3);
map.emplace(1, 4);
std::vector<int> expected = { 2, 3, 1 };
std::vector<int> actual = convertToVector(map);
ASSERT_EQ(actual, expected);
}
TEST(LRUHashMap, GetRecentKeyToTop)
{
using LRUHashMap = LRUHashMap<int, int>;
LRUHashMap map(3);
map.emplace(1, 1);
map.emplace(2, 2);
map.emplace(3, 3);
map.get(1);
std::vector<int> expected = { 2, 3, 1 };
std::vector<int> actual = convertToVector(map);
ASSERT_EQ(actual, expected);
}
TEST(LRUHashMap, Contains)
{
using LRUHashMap = LRUHashMap<int, int>;
LRUHashMap map(3);
map.emplace(1, 1);
map.emplace(2, 2);
map.emplace(3, 3);
ASSERT_TRUE(map.contains(1));
ASSERT_TRUE(map.contains(2));
ASSERT_TRUE(map.contains(3));
ASSERT_EQ(map.size(), 3);
map.erase(1);
map.erase(2);
map.erase(3);
ASSERT_EQ(map.size(), 0);
ASSERT_FALSE(map.contains(1));
ASSERT_FALSE(map.contains(2));
ASSERT_FALSE(map.contains(3));
}
TEST(LRUHashMap, Clear)
{
using LRUHashMap = LRUHashMap<int, int>;
LRUHashMap map(3);
map.emplace(1, 1);
map.emplace(2, 2);
map.emplace(3, 3);
map.clear();
std::vector<int> expected = {};
std::vector<int> actual = convertToVector(map);
ASSERT_EQ(actual, expected);
ASSERT_EQ(map.size(), 0);
}

View File

@ -0,0 +1,244 @@
#include <vector>
#include <list>
#include <map>
#include <random>
#include <pcg_random.hpp>
#include <Common/Stopwatch.h>
#include <Common/HashTable/LRUHashMap.h>
template<class Key, class Value>
class LRUHashMapBasic
{
public:
using key_type = Key;
using value_type = Value;
using list_type = std::list<key_type>;
using node = std::pair<value_type, typename list_type::iterator>;
using map_type = std::unordered_map<key_type, node, DefaultHash<Key>>;
LRUHashMapBasic(size_t max_size_, bool preallocated)
: hash_map(preallocated ? max_size_ : 32)
, max_size(max_size_)
{
}
void insert(const Key &key, const Value &value)
{
auto it = hash_map.find(key);
if (it == hash_map.end())
{
if (size() >= max_size)
{
auto iterator_to_remove = list.begin();
hash_map.erase(*iterator_to_remove);
list.erase(iterator_to_remove);
}
list.push_back(key);
hash_map[key] = std::make_pair(value, --list.end());
}
else
{
auto & [value_to_update, iterator_in_list_to_update] = it->second;
list.splice(list.end(), list, iterator_in_list_to_update);
iterator_in_list_to_update = list.end();
value_to_update = value;
}
}
value_type & get(const key_type &key)
{
auto iterator_in_map = hash_map.find(key);
assert(iterator_in_map != hash_map.end());
auto & [value_to_return, iterator_in_list_to_update] = iterator_in_map->second;
list.splice(list.end(), list, iterator_in_list_to_update);
iterator_in_list_to_update = list.end();
return value_to_return;
}
const value_type & get(const key_type & key) const
{
return const_cast<std::decay_t<decltype(*this)> *>(this)->get(key);
}
size_t getMaxSize() const
{
return max_size;
}
size_t size() const
{
return hash_map.size();
}
bool empty() const
{
return hash_map.empty();
}
bool contains(const Key & key)
{
return hash_map.find(key) != hash_map.end();
}
void clear()
{
hash_map.clear();
list.clear();
}
private:
map_type hash_map;
list_type list;
size_t max_size;
};
std::vector<UInt64> generateNumbersToInsert(size_t numbers_to_insert_size)
{
std::vector<UInt64> numbers;
numbers.reserve(numbers_to_insert_size);
std::random_device rd;
pcg64 gen(rd());
UInt64 min = std::numeric_limits<UInt64>::min();
UInt64 max = std::numeric_limits<UInt64>::max();
auto distribution = std::uniform_int_distribution<>(min, max);
for (size_t i = 0; i < numbers_to_insert_size; ++i)
{
UInt64 number = distribution(gen);
numbers.emplace_back(number);
}
return numbers;
}
void testInsertElementsIntoHashMap(size_t map_size, const std::vector<UInt64> & numbers_to_insert, bool preallocated)
{
size_t numbers_to_insert_size = numbers_to_insert.size();
std::cout << "TestInsertElementsIntoHashMap preallocated map size: " << map_size << " numbers to insert size: " << numbers_to_insert_size;
std::cout << std::endl;
HashMap<int, int> hash_map(preallocated ? map_size : 32);
Stopwatch watch;
for (size_t i = 0; i < numbers_to_insert_size; ++i)
hash_map.insert({ numbers_to_insert[i], numbers_to_insert[i] });
std::cout << "Inserted in " << watch.elapsedMilliseconds() << " milliseconds" << std::endl;
UInt64 summ = 0;
for (size_t i = 0; i < numbers_to_insert_size; ++i)
{
auto * it = hash_map.find(numbers_to_insert[i]);
if (it)
summ += it->getMapped();
}
std::cout << "Calculated summ: " << summ << " in " << watch.elapsedMilliseconds() << " milliseconds" << std::endl;
}
void testInsertElementsIntoStandardMap(size_t map_size, const std::vector<UInt64> & numbers_to_insert, bool preallocated)
{
size_t numbers_to_insert_size = numbers_to_insert.size();
std::cout << "TestInsertElementsIntoStandardMap map size: " << map_size << " numbers to insert size: " << numbers_to_insert_size;
std::cout << std::endl;
std::unordered_map<int, int> hash_map(preallocated ? map_size : 32);
Stopwatch watch;
for (size_t i = 0; i < numbers_to_insert_size; ++i)
hash_map.insert({ numbers_to_insert[i], numbers_to_insert[i] });
std::cout << "Inserted in " << watch.elapsedMilliseconds() << " milliseconds" << std::endl;
UInt64 summ = 0;
for (size_t i = 0; i < numbers_to_insert_size; ++i)
{
auto it = hash_map.find(numbers_to_insert[i]);
if (it != hash_map.end())
summ += it->second;
}
std::cout << "Calculated summ: " << summ << " in " << watch.elapsedMilliseconds() << " milliseconds" << std::endl;
}
template<typename LRUCache>
UInt64 testInsertIntoEmptyCache(size_t map_size, const std::vector<UInt64> & numbers_to_insert, bool preallocated)
{
size_t numbers_to_insert_size = numbers_to_insert.size();
std::cout << "Test testInsertPreallocated preallocated map size: " << map_size << " numbers to insert size: " << numbers_to_insert_size;
std::cout << std::endl;
LRUCache cache(map_size, preallocated);
Stopwatch watch;
for (size_t i = 0; i < numbers_to_insert_size; ++i)
{
cache.insert(numbers_to_insert[i], numbers_to_insert[i]);
}
std::cout << "Inserted in " << watch.elapsedMilliseconds() << " milliseconds" << std::endl;
UInt64 summ = 0;
for (size_t i = 0; i < numbers_to_insert_size; ++i)
if (cache.contains(numbers_to_insert[i]))
summ += cache.get(numbers_to_insert[i]);
std::cout << "Calculated summ: " << summ << " in " << watch.elapsedMilliseconds() << " milliseconds" << std::endl;
return summ;
}
int main(int argc, char ** argv)
{
(void)(argc);
(void)(argv);
size_t hash_map_size = 1200000;
size_t numbers_to_insert_size = 12000000;
std::vector<UInt64> numbers = generateNumbersToInsert(numbers_to_insert_size);
std::cout << "Test insert into HashMap preallocated=0" << std::endl;
testInsertElementsIntoHashMap(hash_map_size, numbers, true);
std::cout << std::endl;
std::cout << "Test insert into HashMap preallocated=1" << std::endl;
testInsertElementsIntoHashMap(hash_map_size, numbers, true);
std::cout << std::endl;
std::cout << "Test LRUHashMap preallocated=0" << std::endl;
testInsertIntoEmptyCache<LRUHashMap<UInt64, UInt64>>(hash_map_size, numbers, false);
std::cout << std::endl;
std::cout << "Test LRUHashMap preallocated=1" << std::endl;
testInsertIntoEmptyCache<LRUHashMap<UInt64, UInt64>>(hash_map_size, numbers, true);
std::cout << std::endl;
std::cout << "Test LRUHashMapBasic preallocated=0" << std::endl;
testInsertIntoEmptyCache<LRUHashMapBasic<UInt64, UInt64>>(hash_map_size, numbers, false);
std::cout << std::endl;
std::cout << "Test LRUHashMapBasic preallocated=1" << std::endl;
testInsertIntoEmptyCache<LRUHashMapBasic<UInt64, UInt64>>(hash_map_size, numbers, true);
std::cout << std::endl;
return 0;
}

View File

@ -136,6 +136,7 @@ namespace MySQLReplication
out << "XID: " << this->xid << '\n'; out << "XID: " << this->xid << '\n';
} }
/// https://dev.mysql.com/doc/internals/en/table-map-event.html
void TableMapEvent::parseImpl(ReadBuffer & payload) void TableMapEvent::parseImpl(ReadBuffer & payload)
{ {
payload.readStrict(reinterpret_cast<char *>(&table_id), 6); payload.readStrict(reinterpret_cast<char *>(&table_id), 6);
@ -257,15 +258,19 @@ namespace MySQLReplication
out << "Null Bitmap: " << bitmap_str << '\n'; out << "Null Bitmap: " << bitmap_str << '\n';
} }
void RowsEvent::parseImpl(ReadBuffer & payload) void RowsEventHeader::parse(ReadBuffer & payload)
{ {
payload.readStrict(reinterpret_cast<char *>(&table_id), 6); payload.readStrict(reinterpret_cast<char *>(&table_id), 6);
payload.readStrict(reinterpret_cast<char *>(&flags), 2); payload.readStrict(reinterpret_cast<char *>(&flags), 2);
UInt16 extra_data_len;
/// This extra_data_len contains the 2 bytes length. /// This extra_data_len contains the 2 bytes length.
payload.readStrict(reinterpret_cast<char *>(&extra_data_len), 2); payload.readStrict(reinterpret_cast<char *>(&extra_data_len), 2);
payload.ignore(extra_data_len - 2); payload.ignore(extra_data_len - 2);
}
void RowsEvent::parseImpl(ReadBuffer & payload)
{
number_columns = readLengthEncodedNumber(payload); number_columns = readLengthEncodedNumber(payload);
size_t columns_bitmap_size = (number_columns + 7) / 8; size_t columns_bitmap_size = (number_columns + 7) / 8;
switch (header.type) switch (header.type)
@ -795,37 +800,50 @@ namespace MySQLReplication
{ {
event = std::make_shared<TableMapEvent>(std::move(event_header)); event = std::make_shared<TableMapEvent>(std::move(event_header));
event->parseEvent(event_payload); event->parseEvent(event_payload);
table_map = std::static_pointer_cast<TableMapEvent>(event); auto table_map = std::static_pointer_cast<TableMapEvent>(event);
table_maps[table_map->table_id] = table_map;
break; break;
} }
case WRITE_ROWS_EVENT_V1: case WRITE_ROWS_EVENT_V1:
case WRITE_ROWS_EVENT_V2: { case WRITE_ROWS_EVENT_V2: {
if (doReplicate()) RowsEventHeader rows_header(event_header.type);
event = std::make_shared<WriteRowsEvent>(table_map, std::move(event_header)); rows_header.parse(event_payload);
if (doReplicate(rows_header.table_id))
event = std::make_shared<WriteRowsEvent>(table_maps.at(rows_header.table_id), std::move(event_header), rows_header);
else else
event = std::make_shared<DryRunEvent>(std::move(event_header)); event = std::make_shared<DryRunEvent>(std::move(event_header));
event->parseEvent(event_payload); event->parseEvent(event_payload);
if (rows_header.flags & ROWS_END_OF_STATEMENT)
table_maps.clear();
break; break;
} }
case DELETE_ROWS_EVENT_V1: case DELETE_ROWS_EVENT_V1:
case DELETE_ROWS_EVENT_V2: { case DELETE_ROWS_EVENT_V2: {
if (doReplicate()) RowsEventHeader rows_header(event_header.type);
event = std::make_shared<DeleteRowsEvent>(table_map, std::move(event_header)); rows_header.parse(event_payload);
if (doReplicate(rows_header.table_id))
event = std::make_shared<DeleteRowsEvent>(table_maps.at(rows_header.table_id), std::move(event_header), rows_header);
else else
event = std::make_shared<DryRunEvent>(std::move(event_header)); event = std::make_shared<DryRunEvent>(std::move(event_header));
event->parseEvent(event_payload); event->parseEvent(event_payload);
if (rows_header.flags & ROWS_END_OF_STATEMENT)
table_maps.clear();
break; break;
} }
case UPDATE_ROWS_EVENT_V1: case UPDATE_ROWS_EVENT_V1:
case UPDATE_ROWS_EVENT_V2: { case UPDATE_ROWS_EVENT_V2: {
if (doReplicate()) RowsEventHeader rows_header(event_header.type);
event = std::make_shared<UpdateRowsEvent>(table_map, std::move(event_header)); rows_header.parse(event_payload);
if (doReplicate(rows_header.table_id))
event = std::make_shared<UpdateRowsEvent>(table_maps.at(rows_header.table_id), std::move(event_header), rows_header);
else else
event = std::make_shared<DryRunEvent>(std::move(event_header)); event = std::make_shared<DryRunEvent>(std::move(event_header));
event->parseEvent(event_payload); event->parseEvent(event_payload);
if (rows_header.flags & ROWS_END_OF_STATEMENT)
table_maps.clear();
break; break;
} }
case GTID_EVENT: case GTID_EVENT:
@ -843,6 +861,19 @@ namespace MySQLReplication
} }
} }
} }
bool MySQLFlavor::doReplicate(UInt64 table_id)
{
if (replicate_do_db.empty())
return false;
if (table_id == 0x00ffffff)
{
// Special "dummy event"
return false;
}
auto table_map = table_maps.at(table_id);
return table_map->schema == replicate_do_db;
}
} }
} }

View File

@ -430,6 +430,22 @@ namespace MySQLReplication
void parseMeta(String meta); void parseMeta(String meta);
}; };
enum RowsEventFlags
{
ROWS_END_OF_STATEMENT = 1
};
class RowsEventHeader
{
public:
EventType type;
UInt64 table_id;
UInt16 flags;
RowsEventHeader(EventType type_) : type(type_), table_id(0), flags(0) {}
void parse(ReadBuffer & payload);
};
class RowsEvent : public EventBase class RowsEvent : public EventBase
{ {
public: public:
@ -438,9 +454,11 @@ namespace MySQLReplication
String table; String table;
std::vector<Field> rows; std::vector<Field> rows;
RowsEvent(std::shared_ptr<TableMapEvent> table_map_, EventHeader && header_) RowsEvent(std::shared_ptr<TableMapEvent> table_map_, EventHeader && header_, const RowsEventHeader & rows_header)
: EventBase(std::move(header_)), number_columns(0), table_id(0), flags(0), extra_data_len(0), table_map(table_map_) : EventBase(std::move(header_)), number_columns(0), table_map(table_map_)
{ {
table_id = rows_header.table_id;
flags = rows_header.flags;
schema = table_map->schema; schema = table_map->schema;
table = table_map->table; table = table_map->table;
} }
@ -450,7 +468,6 @@ namespace MySQLReplication
protected: protected:
UInt64 table_id; UInt64 table_id;
UInt16 flags; UInt16 flags;
UInt16 extra_data_len;
Bitmap columns_present_bitmap1; Bitmap columns_present_bitmap1;
Bitmap columns_present_bitmap2; Bitmap columns_present_bitmap2;
@ -464,21 +481,24 @@ namespace MySQLReplication
class WriteRowsEvent : public RowsEvent class WriteRowsEvent : public RowsEvent
{ {
public: public:
WriteRowsEvent(std::shared_ptr<TableMapEvent> table_map_, EventHeader && header_) : RowsEvent(table_map_, std::move(header_)) {} WriteRowsEvent(std::shared_ptr<TableMapEvent> table_map_, EventHeader && header_, const RowsEventHeader & rows_header)
: RowsEvent(table_map_, std::move(header_), rows_header) {}
MySQLEventType type() const override { return MYSQL_WRITE_ROWS_EVENT; } MySQLEventType type() const override { return MYSQL_WRITE_ROWS_EVENT; }
}; };
class DeleteRowsEvent : public RowsEvent class DeleteRowsEvent : public RowsEvent
{ {
public: public:
DeleteRowsEvent(std::shared_ptr<TableMapEvent> table_map_, EventHeader && header_) : RowsEvent(table_map_, std::move(header_)) {} DeleteRowsEvent(std::shared_ptr<TableMapEvent> table_map_, EventHeader && header_, const RowsEventHeader & rows_header)
: RowsEvent(table_map_, std::move(header_), rows_header) {}
MySQLEventType type() const override { return MYSQL_DELETE_ROWS_EVENT; } MySQLEventType type() const override { return MYSQL_DELETE_ROWS_EVENT; }
}; };
class UpdateRowsEvent : public RowsEvent class UpdateRowsEvent : public RowsEvent
{ {
public: public:
UpdateRowsEvent(std::shared_ptr<TableMapEvent> table_map_, EventHeader && header_) : RowsEvent(table_map_, std::move(header_)) {} UpdateRowsEvent(std::shared_ptr<TableMapEvent> table_map_, EventHeader && header_, const RowsEventHeader & rows_header)
: RowsEvent(table_map_, std::move(header_), rows_header) {}
MySQLEventType type() const override { return MYSQL_UPDATE_ROWS_EVENT; } MySQLEventType type() const override { return MYSQL_UPDATE_ROWS_EVENT; }
}; };
@ -546,10 +566,10 @@ namespace MySQLReplication
Position position; Position position;
BinlogEventPtr event; BinlogEventPtr event;
String replicate_do_db; String replicate_do_db;
std::shared_ptr<TableMapEvent> table_map; std::map<UInt64, std::shared_ptr<TableMapEvent> > table_maps;
size_t checksum_signature_length = 4; size_t checksum_signature_length = 4;
inline bool doReplicate() { return (replicate_do_db.empty() || table_map->schema == replicate_do_db); } bool doReplicate(UInt64 table_id);
}; };
} }

View File

@ -75,8 +75,9 @@ namespace Protocol
TablesStatusResponse = 9, /// A response to TablesStatus request. TablesStatusResponse = 9, /// A response to TablesStatus request.
Log = 10, /// System logs of the query execution Log = 10, /// System logs of the query execution
TableColumns = 11, /// Columns' description for default values calculation TableColumns = 11, /// Columns' description for default values calculation
PartUUIDs = 12, /// List of unique parts ids.
MAX = TableColumns, MAX = PartUUIDs,
}; };
/// NOTE: If the type of packet argument would be Enum, the comparison packet >= 0 && packet < 10 /// NOTE: If the type of packet argument would be Enum, the comparison packet >= 0 && packet < 10
@ -98,6 +99,7 @@ namespace Protocol
"TablesStatusResponse", "TablesStatusResponse",
"Log", "Log",
"TableColumns", "TableColumns",
"PartUUIDs",
}; };
return packet <= MAX return packet <= MAX
? data[packet] ? data[packet]
@ -132,8 +134,9 @@ namespace Protocol
TablesStatusRequest = 5, /// Check status of tables on the server. TablesStatusRequest = 5, /// Check status of tables on the server.
KeepAlive = 6, /// Keep the connection alive KeepAlive = 6, /// Keep the connection alive
Scalar = 7, /// A block of data (compressed or not). Scalar = 7, /// A block of data (compressed or not).
IgnoredPartUUIDs = 8, /// List of unique parts ids to exclude from query processing
MAX = Scalar, MAX = IgnoredPartUUIDs,
}; };
inline const char * toString(UInt64 packet) inline const char * toString(UInt64 packet)
@ -147,6 +150,7 @@ namespace Protocol
"TablesStatusRequest", "TablesStatusRequest",
"KeepAlive", "KeepAlive",
"Scalar", "Scalar",
"IgnoredPartUUIDs",
}; };
return packet <= MAX return packet <= MAX
? data[packet] ? data[packet]
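To make the purpose of the two new packet types concrete, the round trip they enable (as wired up in RemoteQueryExecutor further down) looks roughly like this; the sequence is an illustrative sketch, not a verbatim trace:

    // initiator                                         remote replica
    // ---------                                         --------------
    // sendQuery() ----------------------------------->  collects UUIDs of the data parts it plans to read
    //             <---------- Server::PartUUIDs ------  sends those UUIDs before any data
    // adds UUIDs to the query context; if some were
    // already seen from another replica, cancels and
    // resends the query, prefixed with
    // Client::IgnoredPartUUIDs(duplicates) ---------->  skips the listed parts
    //             <---------- Server::Data -----------  deduplicated result blocks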

View File

@ -86,8 +86,6 @@ class IColumn;
\ \
M(Bool, optimize_move_to_prewhere, true, "Allows disabling WHERE to PREWHERE optimization in SELECT queries from MergeTree.", 0) \ M(Bool, optimize_move_to_prewhere, true, "Allows disabling WHERE to PREWHERE optimization in SELECT queries from MergeTree.", 0) \
\ \
M(Milliseconds, insert_in_memory_parts_timeout, 600000, "", 0) \
\
M(UInt64, replication_alter_partitions_sync, 1, "Wait for actions to manipulate the partitions. 0 - do not wait, 1 - wait for execution only of itself, 2 - wait for everyone.", 0) \ M(UInt64, replication_alter_partitions_sync, 1, "Wait for actions to manipulate the partitions. 0 - do not wait, 1 - wait for execution only of itself, 2 - wait for everyone.", 0) \
M(UInt64, replication_alter_columns_timeout, 60, "Wait for actions to change the table structure within the specified number of seconds. 0 - wait unlimited time.", 0) \ M(UInt64, replication_alter_columns_timeout, 60, "Wait for actions to change the table structure within the specified number of seconds. 0 - wait unlimited time.", 0) \
\ \
@ -420,6 +418,9 @@ class IColumn;
M(Bool, async_socket_for_remote, true, "Asynchronously read from socket executing remote query", 0) \ M(Bool, async_socket_for_remote, true, "Asynchronously read from socket executing remote query", 0) \
\ \
M(Bool, optimize_rewrite_sum_if_to_count_if, true, "Rewrite sumIf() and sum(if()) function countIf() function when logically equivalent", 0) \ M(Bool, optimize_rewrite_sum_if_to_count_if, true, "Rewrite sumIf() and sum(if()) function countIf() function when logically equivalent", 0) \
M(UInt64, insert_shard_id, 0, "If non-zero, when inserting into a distributed table, the data will be inserted into shard `insert_shard_id` synchronously. Possible values range from 1 to `shards_number` of the corresponding distributed table", 0) \
M(Bool, allow_experimental_query_deduplication, false, "Allow sending parts' UUIDs for a query in order to deduplicate data parts if any", 0) \
\
/** Obsolete settings that do nothing but left for compatibility reasons. Remove each one after half a year of obsolescence. */ \ /** Obsolete settings that do nothing but left for compatibility reasons. Remove each one after half a year of obsolescence. */ \
\ \
M(UInt64, max_memory_usage_for_all_queries, 0, "Obsolete. Will be removed after 2020-10-20", 0) \ M(UInt64, max_memory_usage_for_all_queries, 0, "Obsolete. Will be removed after 2020-10-20", 0) \

View File

@ -121,7 +121,7 @@ PushingToViewsBlockOutputStream::PushingToViewsBlockOutputStream(
out = std::make_shared<PushingToViewsBlockOutputStream>( out = std::make_shared<PushingToViewsBlockOutputStream>(
dependent_table, dependent_metadata_snapshot, *insert_context, ASTPtr()); dependent_table, dependent_metadata_snapshot, *insert_context, ASTPtr());
views.emplace_back(ViewInfo{std::move(query), database_table, std::move(out), nullptr}); views.emplace_back(ViewInfo{std::move(query), database_table, std::move(out), nullptr, 0 /* elapsed_ms */});
} }
/// Do not push to destination table if the flag is set /// Do not push to destination table if the flag is set
@ -146,8 +146,6 @@ Block PushingToViewsBlockOutputStream::getHeader() const
void PushingToViewsBlockOutputStream::write(const Block & block) void PushingToViewsBlockOutputStream::write(const Block & block)
{ {
Stopwatch watch;
/** Throw an exception if the sizes of arrays - elements of nested data structures doesn't match. /** Throw an exception if the sizes of arrays - elements of nested data structures doesn't match.
* We have to make this assertion before writing to table, because storage engine may assume that they have equal sizes. * We have to make this assertion before writing to table, because storage engine may assume that they have equal sizes.
* NOTE It'd better to do this check in serialization of nested structures (in place when this assumption is required), * NOTE It'd better to do this check in serialization of nested structures (in place when this assumption is required),
@ -177,15 +175,15 @@ void PushingToViewsBlockOutputStream::write(const Block & block)
{ {
// Push to views concurrently if enabled and more than one view is attached // Push to views concurrently if enabled and more than one view is attached
ThreadPool pool(std::min(size_t(settings.max_threads), views.size())); ThreadPool pool(std::min(size_t(settings.max_threads), views.size()));
for (size_t view_num = 0; view_num < views.size(); ++view_num) for (auto & view : views)
{ {
auto thread_group = CurrentThread::getGroup(); auto thread_group = CurrentThread::getGroup();
pool.scheduleOrThrowOnError([=, this] pool.scheduleOrThrowOnError([=, &view, this]
{ {
setThreadName("PushingToViews"); setThreadName("PushingToViews");
if (thread_group) if (thread_group)
CurrentThread::attachToIfDetached(thread_group); CurrentThread::attachToIfDetached(thread_group);
process(block, view_num); process(block, view);
}); });
} }
// Wait for concurrent view processing // Wait for concurrent view processing
@ -194,22 +192,14 @@ void PushingToViewsBlockOutputStream::write(const Block & block)
else else
{ {
// Process sequentially // Process sequentially
for (size_t view_num = 0; view_num < views.size(); ++view_num) for (auto & view : views)
{ {
process(block, view_num); process(block, view);
if (views[view_num].exception) if (view.exception)
std::rethrow_exception(views[view_num].exception); std::rethrow_exception(view.exception);
} }
} }
UInt64 milliseconds = watch.elapsedMilliseconds();
if (views.size() > 1)
{
LOG_TRACE(log, "Pushing from {} to {} views took {} ms.",
storage->getStorageID().getNameForLogs(), views.size(),
milliseconds);
}
} }
void PushingToViewsBlockOutputStream::writePrefix() void PushingToViewsBlockOutputStream::writePrefix()
@ -257,12 +247,13 @@ void PushingToViewsBlockOutputStream::writeSuffix()
if (view.exception) if (view.exception)
continue; continue;
pool.scheduleOrThrowOnError([thread_group, &view] pool.scheduleOrThrowOnError([thread_group, &view, this]
{ {
setThreadName("PushingToViews"); setThreadName("PushingToViews");
if (thread_group) if (thread_group)
CurrentThread::attachToIfDetached(thread_group); CurrentThread::attachToIfDetached(thread_group);
Stopwatch watch;
try try
{ {
view.out->writeSuffix(); view.out->writeSuffix();
@ -271,6 +262,12 @@ void PushingToViewsBlockOutputStream::writeSuffix()
{ {
view.exception = std::current_exception(); view.exception = std::current_exception();
} }
view.elapsed_ms += watch.elapsedMilliseconds();
LOG_TRACE(log, "Pushing from {} to {} took {} ms.",
storage->getStorageID().getNameForLogs(),
view.table_id.getNameForLogs(),
view.elapsed_ms);
}); });
} }
// Wait for concurrent view processing // Wait for concurrent view processing
@ -290,6 +287,7 @@ void PushingToViewsBlockOutputStream::writeSuffix()
if (parallel_processing) if (parallel_processing)
continue; continue;
Stopwatch watch;
try try
{ {
view.out->writeSuffix(); view.out->writeSuffix();
@ -299,10 +297,24 @@ void PushingToViewsBlockOutputStream::writeSuffix()
ex.addMessage("while write prefix to view " + view.table_id.getNameForLogs()); ex.addMessage("while write prefix to view " + view.table_id.getNameForLogs());
throw; throw;
} }
view.elapsed_ms += watch.elapsedMilliseconds();
LOG_TRACE(log, "Pushing from {} to {} took {} ms.",
storage->getStorageID().getNameForLogs(),
view.table_id.getNameForLogs(),
view.elapsed_ms);
} }
if (first_exception) if (first_exception)
std::rethrow_exception(first_exception); std::rethrow_exception(first_exception);
UInt64 milliseconds = main_watch.elapsedMilliseconds();
if (views.size() > 1)
{
LOG_TRACE(log, "Pushing from {} to {} views took {} ms.",
storage->getStorageID().getNameForLogs(), views.size(),
milliseconds);
}
} }
void PushingToViewsBlockOutputStream::flush() void PushingToViewsBlockOutputStream::flush()
@ -314,10 +326,9 @@ void PushingToViewsBlockOutputStream::flush()
view.out->flush(); view.out->flush();
} }
void PushingToViewsBlockOutputStream::process(const Block & block, size_t view_num) void PushingToViewsBlockOutputStream::process(const Block & block, ViewInfo & view)
{ {
Stopwatch watch; Stopwatch watch;
auto & view = views[view_num];
try try
{ {
@ -379,11 +390,7 @@ void PushingToViewsBlockOutputStream::process(const Block & block, size_t view_n
view.exception = std::current_exception(); view.exception = std::current_exception();
} }
UInt64 milliseconds = watch.elapsedMilliseconds(); view.elapsed_ms += watch.elapsedMilliseconds();
LOG_TRACE(log, "Pushing from {} to {} took {} ms.",
storage->getStorageID().getNameForLogs(),
view.table_id.getNameForLogs(),
milliseconds);
} }
} }

View File

@ -1,6 +1,7 @@
#pragma once #pragma once
#include <DataStreams/IBlockOutputStream.h> #include <DataStreams/IBlockOutputStream.h>
#include <Common/Stopwatch.h>
#include <Parsers/IAST_fwd.h> #include <Parsers/IAST_fwd.h>
#include <Storages/IStorage.h> #include <Storages/IStorage.h>
@ -44,6 +45,7 @@ private:
const Context & context; const Context & context;
ASTPtr query_ptr; ASTPtr query_ptr;
Stopwatch main_watch;
struct ViewInfo struct ViewInfo
{ {
@ -51,13 +53,14 @@ private:
StorageID table_id; StorageID table_id;
BlockOutputStreamPtr out; BlockOutputStreamPtr out;
std::exception_ptr exception; std::exception_ptr exception;
UInt64 elapsed_ms = 0;
}; };
std::vector<ViewInfo> views; std::vector<ViewInfo> views;
std::unique_ptr<Context> select_context; std::unique_ptr<Context> select_context;
std::unique_ptr<Context> insert_context; std::unique_ptr<Context> insert_context;
void process(const Block & block, size_t view_num); void process(const Block & block, ViewInfo & view);
}; };

View File

@ -13,6 +13,7 @@
#include <Interpreters/InternalTextLogsQueue.h> #include <Interpreters/InternalTextLogsQueue.h>
#include <IO/ConnectionTimeoutsContext.h> #include <IO/ConnectionTimeoutsContext.h>
#include <Common/FiberStack.h> #include <Common/FiberStack.h>
#include <Storages/MergeTree/MergeTreeDataPartUUID.h>
namespace DB namespace DB
{ {
@ -20,6 +21,7 @@ namespace DB
namespace ErrorCodes namespace ErrorCodes
{ {
extern const int UNKNOWN_PACKET_FROM_SERVER; extern const int UNKNOWN_PACKET_FROM_SERVER;
extern const int DUPLICATED_PART_UUIDS;
} }
RemoteQueryExecutor::RemoteQueryExecutor( RemoteQueryExecutor::RemoteQueryExecutor(
@ -158,6 +160,7 @@ void RemoteQueryExecutor::sendQuery()
std::lock_guard guard(was_cancelled_mutex); std::lock_guard guard(was_cancelled_mutex);
established = true; established = true;
was_cancelled = false;
auto timeouts = ConnectionTimeouts::getTCPTimeoutsWithFailover(settings); auto timeouts = ConnectionTimeouts::getTCPTimeoutsWithFailover(settings);
ClientInfo modified_client_info = context.getClientInfo(); ClientInfo modified_client_info = context.getClientInfo();
@ -167,6 +170,12 @@ void RemoteQueryExecutor::sendQuery()
modified_client_info.client_trace_context = CurrentThread::get().thread_trace_context; modified_client_info.client_trace_context = CurrentThread::get().thread_trace_context;
} }
{
std::lock_guard lock(duplicated_part_uuids_mutex);
if (!duplicated_part_uuids.empty())
multiplexed_connections->sendIgnoredPartUUIDs(duplicated_part_uuids);
}
multiplexed_connections->sendQuery(timeouts, query, query_id, stage, modified_client_info, true); multiplexed_connections->sendQuery(timeouts, query, query_id, stage, modified_client_info, true);
established = false; established = false;
@ -196,6 +205,8 @@ Block RemoteQueryExecutor::read()
if (auto block = processPacket(std::move(packet))) if (auto block = processPacket(std::move(packet)))
return *block; return *block;
else if (got_duplicated_part_uuids)
return std::get<Block>(restartQueryWithoutDuplicatedUUIDs());
} }
} }
@ -211,7 +222,7 @@ std::variant<Block, int> RemoteQueryExecutor::read(std::unique_ptr<ReadContext>
return Block(); return Block();
} }
if (!read_context) if (!read_context || resent_query)
{ {
std::lock_guard lock(was_cancelled_mutex); std::lock_guard lock(was_cancelled_mutex);
if (was_cancelled) if (was_cancelled)
@ -234,6 +245,8 @@ std::variant<Block, int> RemoteQueryExecutor::read(std::unique_ptr<ReadContext>
{ {
if (auto data = processPacket(std::move(read_context->packet))) if (auto data = processPacket(std::move(read_context->packet)))
return std::move(*data); return std::move(*data);
else if (got_duplicated_part_uuids)
return restartQueryWithoutDuplicatedUUIDs(&read_context);
} }
} }
while (true); while (true);
@ -242,10 +255,39 @@ std::variant<Block, int> RemoteQueryExecutor::read(std::unique_ptr<ReadContext>
#endif #endif
} }
std::variant<Block, int> RemoteQueryExecutor::restartQueryWithoutDuplicatedUUIDs(std::unique_ptr<ReadContext> * read_context)
{
/// Cancel previous query and disconnect before retry.
cancel(read_context);
multiplexed_connections->disconnect();
/// Only resend once, otherwise throw an exception
if (!resent_query)
{
if (log)
LOG_DEBUG(log, "Found duplicate UUIDs, will retry query without those parts");
resent_query = true;
sent_query = false;
got_duplicated_part_uuids = false;
/// The subsequent read() call will implicitly send the query first.
if (!read_context)
return read();
else
return read(*read_context);
}
throw Exception("Found duplicate uuids while processing query.", ErrorCodes::DUPLICATED_PART_UUIDS);
}
std::optional<Block> RemoteQueryExecutor::processPacket(Packet packet) std::optional<Block> RemoteQueryExecutor::processPacket(Packet packet)
{ {
switch (packet.type) switch (packet.type)
{ {
case Protocol::Server::PartUUIDs:
if (!setPartUUIDs(packet.part_uuids))
got_duplicated_part_uuids = true;
break;
case Protocol::Server::Data: case Protocol::Server::Data:
/// If the block is not empty and is not a header block /// If the block is not empty and is not a header block
if (packet.block && (packet.block.rows() > 0)) if (packet.block && (packet.block.rows() > 0))
@ -306,6 +348,20 @@ std::optional<Block> RemoteQueryExecutor::processPacket(Packet packet)
return {}; return {};
} }
bool RemoteQueryExecutor::setPartUUIDs(const std::vector<UUID> & uuids)
{
Context & query_context = const_cast<Context &>(context).getQueryContext();
auto duplicates = query_context.getPartUUIDs()->add(uuids);
if (!duplicates.empty())
{
std::lock_guard lock(duplicated_part_uuids_mutex);
duplicated_part_uuids.insert(duplicated_part_uuids.begin(), duplicates.begin(), duplicates.end());
return false;
}
return true;
}
void RemoteQueryExecutor::finish(std::unique_ptr<ReadContext> * read_context) void RemoteQueryExecutor::finish(std::unique_ptr<ReadContext> * read_context)
{ {
/** If one of: /** If one of:
@ -383,6 +439,7 @@ void RemoteQueryExecutor::sendExternalTables()
{ {
std::lock_guard lock(external_tables_mutex); std::lock_guard lock(external_tables_mutex);
external_tables_data.clear();
external_tables_data.reserve(count); external_tables_data.reserve(count);
for (size_t i = 0; i < count; ++i) for (size_t i = 0; i < count; ++i)

View File

@ -57,6 +57,9 @@ public:
/// Create connection and send query, external tables and scalars. /// Create connection and send query, external tables and scalars.
void sendQuery(); void sendQuery();
/// Set when the query is resent to a replica; the query itself can be modified on resend.
std::atomic<bool> resent_query { false };
/// Read next block of data. Returns empty block if query is finished. /// Read next block of data. Returns empty block if query is finished.
Block read(); Block read();
@ -152,6 +155,14 @@ private:
*/ */
std::atomic<bool> got_unknown_packet_from_replica { false }; std::atomic<bool> got_unknown_packet_from_replica { false };
/** Got duplicated uuids from replica
*/
std::atomic<bool> got_duplicated_part_uuids{ false };
/// Parts uuids, collected from remote replicas
std::mutex duplicated_part_uuids_mutex;
std::vector<UUID> duplicated_part_uuids;
PoolMode pool_mode = PoolMode::GET_MANY; PoolMode pool_mode = PoolMode::GET_MANY;
StorageID main_table = StorageID::createEmpty(); StorageID main_table = StorageID::createEmpty();
@ -163,6 +174,14 @@ private:
/// Send all temporary tables to remote servers /// Send all temporary tables to remote servers
void sendExternalTables(); void sendExternalTables();
/// Set part UUIDs collected from remote replicas in the query context.
/// Returns false if duplicates were found (they are stored for the retry), true otherwise.
bool setPartUUIDs(const std::vector<UUID> & uuids);
/// Cancel the query and restart it with info about duplicated UUIDs,
/// only for `allow_experimental_query_deduplication`.
std::variant<Block, int> restartQueryWithoutDuplicatedUUIDs(std::unique_ptr<ReadContext> * read_context = nullptr);
/// If wasn't sent yet, send request to cancel all connections to replicas /// If wasn't sent yet, send request to cancel all connections to replicas
void tryCancel(const char * reason, std::unique_ptr<ReadContext> * read_context); void tryCancel(const char * reason, std::unique_ptr<ReadContext> * read_context);
@ -174,6 +193,10 @@ private:
/// Process packet for read and return data block if possible. /// Process packet for read and return data block if possible.
std::optional<Block> processPacket(Packet packet); std::optional<Block> processPacket(Packet packet);
/// Reads packet by packet
Block readPackets();
}; };
} }

View File

@ -104,11 +104,16 @@ template <typename A, typename B> struct ResultOfIntegerDivision
sizeof(A)>::Type; sizeof(A)>::Type;
}; };
/** Division with remainder gives a number with the same number of bits as the divisor,
  * or a larger type when the dividend is signed.
  */
template <typename A, typename B> struct ResultOfModulo template <typename A, typename B> struct ResultOfModulo
{ {
using Type0 = typename Construct<is_signed_v<A> || is_signed_v<B>, false, sizeof(B)>::Type; static constexpr bool result_is_signed = is_signed_v<A>;
/// If modulo of division can yield negative number, we need larger type to accommodate it.
/// Example: toInt32(-199) % toUInt8(200) will return -199 that does not fit in Int8, only in Int16.
static constexpr size_t size_of_result = result_is_signed ? nextSize(sizeof(B)) : sizeof(B);
using Type0 = typename Construct<result_is_signed, false, size_of_result>::Type;
using Type = std::conditional_t<std::is_floating_point_v<A> || std::is_floating_point_v<B>, Float64, Type0>; using Type = std::conditional_t<std::is_floating_point_v<A> || std::is_floating_point_v<B>, Float64, Type0>;
}; };
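A small standalone check of the example given in the comment above (plain C++, outside the ClickHouse type machinery):

    #include <cassert>
    #include <cstdint>

    int main()
    {
        int32_t a = -199;                      // signed dividend (Int32)
        uint8_t b = 200;                       // unsigned divisor (UInt8)
        auto r = a % static_cast<int32_t>(b);  // C++ remainder keeps the sign of the dividend: -199
        assert(r == -199);
        // -199 does not fit into Int8 (range -128..127), so the result type has to be
        // the next larger signed type derived from the divisor: Int16 in this case.
        int16_t as_int16 = static_cast<int16_t>(r);
        assert(as_int16 == -199);
        return 0;
    }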

View File

@ -1291,7 +1291,6 @@ void CacheDictionary::update(UpdateUnitPtr & update_unit_ptr)
BlockInputStreamPtr stream = current_source_ptr->loadIds(update_unit_ptr->requested_ids); BlockInputStreamPtr stream = current_source_ptr->loadIds(update_unit_ptr->requested_ids);
stream->readPrefix(); stream->readPrefix();
while (true) while (true)
{ {
Block block = stream->read(); Block block = stream->read();

View File

@ -186,6 +186,9 @@ namespace
if (!err.empty()) if (!err.empty())
LOG_ERROR(log, "Having stderr: {}", err); LOG_ERROR(log, "Having stderr: {}", err);
if (thread.joinable())
thread.join();
command->wait(); command->wait();
} }

View File

@ -108,7 +108,7 @@ DiskCacheWrapper::readFile(const String & path, size_t buf_size, size_t estimate
if (!cache_file_predicate(path)) if (!cache_file_predicate(path))
return DiskDecorator::readFile(path, buf_size, estimated_size, aio_threshold, mmap_threshold); return DiskDecorator::readFile(path, buf_size, estimated_size, aio_threshold, mmap_threshold);
LOG_DEBUG(&Poco::Logger::get("DiskS3"), "Read file {} from cache", backQuote(path)); LOG_DEBUG(&Poco::Logger::get("DiskCache"), "Read file {} from cache", backQuote(path));
if (cache_disk->exists(path)) if (cache_disk->exists(path))
return cache_disk->readFile(path, buf_size, estimated_size, aio_threshold, mmap_threshold); return cache_disk->readFile(path, buf_size, estimated_size, aio_threshold, mmap_threshold);
@ -122,11 +122,11 @@ DiskCacheWrapper::readFile(const String & path, size_t buf_size, size_t estimate
{ {
/// This thread will responsible for file downloading to cache. /// This thread will responsible for file downloading to cache.
metadata->status = DOWNLOADING; metadata->status = DOWNLOADING;
LOG_DEBUG(&Poco::Logger::get("DiskS3"), "File {} doesn't exist in cache. Will download it", backQuote(path)); LOG_DEBUG(&Poco::Logger::get("DiskCache"), "File {} doesn't exist in cache. Will download it", backQuote(path));
} }
else if (metadata->status == DOWNLOADING) else if (metadata->status == DOWNLOADING)
{ {
LOG_DEBUG(&Poco::Logger::get("DiskS3"), "Waiting for file {} download to cache", backQuote(path)); LOG_DEBUG(&Poco::Logger::get("DiskCache"), "Waiting for file {} download to cache", backQuote(path));
metadata->condition.wait(lock, [metadata] { return metadata->status == DOWNLOADED || metadata->status == ERROR; }); metadata->condition.wait(lock, [metadata] { return metadata->status == DOWNLOADED || metadata->status == ERROR; });
} }
} }
@ -139,7 +139,7 @@ DiskCacheWrapper::readFile(const String & path, size_t buf_size, size_t estimate
{ {
try try
{ {
auto dir_path = getDirectoryPath(path); auto dir_path = directoryPath(path);
if (!cache_disk->exists(dir_path)) if (!cache_disk->exists(dir_path))
cache_disk->createDirectories(dir_path); cache_disk->createDirectories(dir_path);
@ -151,11 +151,11 @@ DiskCacheWrapper::readFile(const String & path, size_t buf_size, size_t estimate
} }
cache_disk->moveFile(tmp_path, path); cache_disk->moveFile(tmp_path, path);
LOG_DEBUG(&Poco::Logger::get("DiskS3"), "File {} downloaded to cache", backQuote(path)); LOG_DEBUG(&Poco::Logger::get("DiskCache"), "File {} downloaded to cache", backQuote(path));
} }
catch (...) catch (...)
{ {
tryLogCurrentException("DiskS3", "Failed to download file + " + backQuote(path) + " to cache"); tryLogCurrentException("DiskCache", "Failed to download file + " + backQuote(path) + " to cache");
result_status = ERROR; result_status = ERROR;
} }
} }
@ -180,9 +180,9 @@ DiskCacheWrapper::writeFile(const String & path, size_t buf_size, WriteMode mode
if (!cache_file_predicate(path)) if (!cache_file_predicate(path))
return DiskDecorator::writeFile(path, buf_size, mode); return DiskDecorator::writeFile(path, buf_size, mode);
LOG_DEBUG(&Poco::Logger::get("DiskS3"), "Write file {} to cache", backQuote(path)); LOG_DEBUG(&Poco::Logger::get("DiskCache"), "Write file {} to cache", backQuote(path));
auto dir_path = getDirectoryPath(path); auto dir_path = directoryPath(path);
if (!cache_disk->exists(dir_path)) if (!cache_disk->exists(dir_path))
cache_disk->createDirectories(dir_path); cache_disk->createDirectories(dir_path);
@ -217,7 +217,7 @@ void DiskCacheWrapper::moveFile(const String & from_path, const String & to_path
{ {
if (cache_disk->exists(from_path)) if (cache_disk->exists(from_path))
{ {
auto dir_path = getDirectoryPath(to_path); auto dir_path = directoryPath(to_path);
if (!cache_disk->exists(dir_path)) if (!cache_disk->exists(dir_path))
cache_disk->createDirectories(dir_path); cache_disk->createDirectories(dir_path);
@ -230,7 +230,7 @@ void DiskCacheWrapper::replaceFile(const String & from_path, const String & to_p
{ {
if (cache_disk->exists(from_path)) if (cache_disk->exists(from_path))
{ {
auto dir_path = getDirectoryPath(to_path); auto dir_path = directoryPath(to_path);
if (!cache_disk->exists(dir_path)) if (!cache_disk->exists(dir_path))
cache_disk->createDirectories(dir_path); cache_disk->createDirectories(dir_path);
@ -239,19 +239,6 @@ void DiskCacheWrapper::replaceFile(const String & from_path, const String & to_p
DiskDecorator::replaceFile(from_path, to_path); DiskDecorator::replaceFile(from_path, to_path);
} }
void DiskCacheWrapper::copyFile(const String & from_path, const String & to_path)
{
if (cache_disk->exists(from_path))
{
auto dir_path = getDirectoryPath(to_path);
if (!cache_disk->exists(dir_path))
cache_disk->createDirectories(dir_path);
cache_disk->copyFile(from_path, to_path);
}
DiskDecorator::copyFile(from_path, to_path);
}
void DiskCacheWrapper::removeFile(const String & path) void DiskCacheWrapper::removeFile(const String & path)
{ {
cache_disk->removeFileIfExists(path); cache_disk->removeFileIfExists(path);
@ -280,9 +267,10 @@ void DiskCacheWrapper::removeRecursive(const String & path)
void DiskCacheWrapper::createHardLink(const String & src_path, const String & dst_path) void DiskCacheWrapper::createHardLink(const String & src_path, const String & dst_path)
{ {
if (cache_disk->exists(src_path)) /// Don't create hardlinks in the shadow directory for cache files, as that just wastes cache disk space.
if (cache_disk->exists(src_path) && !dst_path.starts_with("shadow/"))
{ {
auto dir_path = getDirectoryPath(dst_path); auto dir_path = directoryPath(dst_path);
if (!cache_disk->exists(dir_path)) if (!cache_disk->exists(dir_path))
cache_disk->createDirectories(dir_path); cache_disk->createDirectories(dir_path);
@ -303,11 +291,6 @@ void DiskCacheWrapper::createDirectories(const String & path)
DiskDecorator::createDirectories(path); DiskDecorator::createDirectories(path);
} }
inline String DiskCacheWrapper::getDirectoryPath(const String & path)
{
return Poco::Path{path}.setFileName("").toString();
}
/// TODO: Current reservation mechanism leaks IDisk abstraction details. /// TODO: Current reservation mechanism leaks IDisk abstraction details.
/// This hack is needed to return proper disk pointer (wrapper instead of implementation) from reservation object. /// This hack is needed to return proper disk pointer (wrapper instead of implementation) from reservation object.
class ReservationDelegate : public IReservation class ReservationDelegate : public IReservation

View File

@ -32,7 +32,6 @@ public:
void moveDirectory(const String & from_path, const String & to_path) override; void moveDirectory(const String & from_path, const String & to_path) override;
void moveFile(const String & from_path, const String & to_path) override; void moveFile(const String & from_path, const String & to_path) override;
void replaceFile(const String & from_path, const String & to_path) override; void replaceFile(const String & from_path, const String & to_path) override;
void copyFile(const String & from_path, const String & to_path) override;
std::unique_ptr<ReadBufferFromFileBase> std::unique_ptr<ReadBufferFromFileBase>
readFile(const String & path, size_t buf_size, size_t estimated_size, size_t aio_threshold, size_t mmap_threshold) const override; readFile(const String & path, size_t buf_size, size_t estimated_size, size_t aio_threshold, size_t mmap_threshold) const override;
std::unique_ptr<WriteBufferFromFileBase> std::unique_ptr<WriteBufferFromFileBase>
@ -46,7 +45,6 @@ public:
private: private:
std::shared_ptr<FileDownloadMetadata> acquireDownloadMetadata(const String & path) const; std::shared_ptr<FileDownloadMetadata> acquireDownloadMetadata(const String & path) const;
static String getDirectoryPath(const String & path);
/// Disk to cache files. /// Disk to cache files.
std::shared_ptr<DiskLocal> cache_disk; std::shared_ptr<DiskLocal> cache_disk;

View File

@ -103,11 +103,6 @@ void DiskDecorator::replaceFile(const String & from_path, const String & to_path
delegate->replaceFile(from_path, to_path); delegate->replaceFile(from_path, to_path);
} }
void DiskDecorator::copyFile(const String & from_path, const String & to_path)
{
delegate->copyFile(from_path, to_path);
}
void DiskDecorator::copy(const String & from_path, const std::shared_ptr<IDisk> & to_disk, const String & to_path) void DiskDecorator::copy(const String & from_path, const std::shared_ptr<IDisk> & to_disk, const String & to_path)
{ {
delegate->copy(from_path, to_disk, to_path); delegate->copy(from_path, to_disk, to_path);
@ -185,4 +180,9 @@ SyncGuardPtr DiskDecorator::getDirectorySyncGuard(const String & path) const
return delegate->getDirectorySyncGuard(path); return delegate->getDirectorySyncGuard(path);
} }
void DiskDecorator::onFreeze(const String & path)
{
delegate->onFreeze(path);
}
} }

View File

@ -32,7 +32,6 @@ public:
void createFile(const String & path) override; void createFile(const String & path) override;
void moveFile(const String & from_path, const String & to_path) override; void moveFile(const String & from_path, const String & to_path) override;
void replaceFile(const String & from_path, const String & to_path) override; void replaceFile(const String & from_path, const String & to_path) override;
void copyFile(const String & from_path, const String & to_path) override;
void copy(const String & from_path, const std::shared_ptr<IDisk> & to_disk, const String & to_path) override; void copy(const String & from_path, const std::shared_ptr<IDisk> & to_disk, const String & to_path) override;
void listFiles(const String & path, std::vector<String> & file_names) override; void listFiles(const String & path, std::vector<String> & file_names) override;
std::unique_ptr<ReadBufferFromFileBase> std::unique_ptr<ReadBufferFromFileBase>
@ -48,8 +47,9 @@ public:
void setReadOnly(const String & path) override; void setReadOnly(const String & path) override;
void createHardLink(const String & src_path, const String & dst_path) override; void createHardLink(const String & src_path, const String & dst_path) override;
void truncateFile(const String & path, size_t size) override; void truncateFile(const String & path, size_t size) override;
const String getType() const override { return delegate->getType(); } DiskType::Type getType() const override { return delegate->getType(); }
Executor & getExecutor() override; Executor & getExecutor() override;
void onFreeze(const String & path) override;
SyncGuardPtr getDirectorySyncGuard(const String & path) const override; SyncGuardPtr getDirectorySyncGuard(const String & path) const override;
protected: protected:

View File

@ -218,11 +218,6 @@ void DiskLocal::replaceFile(const String & from_path, const String & to_path)
from_file.renameTo(to_file.path()); from_file.renameTo(to_file.path());
} }
void DiskLocal::copyFile(const String & from_path, const String & to_path)
{
Poco::File(disk_path + from_path).copyTo(disk_path + to_path);
}
std::unique_ptr<ReadBufferFromFileBase> std::unique_ptr<ReadBufferFromFileBase>
DiskLocal::readFile(const String & path, size_t buf_size, size_t estimated_size, size_t aio_threshold, size_t mmap_threshold) const DiskLocal::readFile(const String & path, size_t buf_size, size_t estimated_size, size_t aio_threshold, size_t mmap_threshold) const
{ {

View File

@ -67,8 +67,6 @@ public:
void replaceFile(const String & from_path, const String & to_path) override; void replaceFile(const String & from_path, const String & to_path) override;
void copyFile(const String & from_path, const String & to_path) override;
void copy(const String & from_path, const std::shared_ptr<IDisk> & to_disk, const String & to_path) override; void copy(const String & from_path, const std::shared_ptr<IDisk> & to_disk, const String & to_path) override;
void listFiles(const String & path, std::vector<String> & file_names) override; void listFiles(const String & path, std::vector<String> & file_names) override;
@ -100,7 +98,7 @@ public:
void truncateFile(const String & path, size_t size) override; void truncateFile(const String & path, size_t size) override;
const String getType() const override { return "local"; } DiskType::Type getType() const override { return DiskType::Type::Local; }
SyncGuardPtr getDirectorySyncGuard(const String & path) const override; SyncGuardPtr getDirectorySyncGuard(const String & path) const override;

View File

@ -314,11 +314,6 @@ void DiskMemory::replaceFileImpl(const String & from_path, const String & to_pat
files.insert(std::move(node)); files.insert(std::move(node));
} }
void DiskMemory::copyFile(const String & /*from_path*/, const String & /*to_path*/)
{
throw Exception("Method copyFile is not implemented for memory disks", ErrorCodes::NOT_IMPLEMENTED);
}
std::unique_ptr<ReadBufferFromFileBase> DiskMemory::readFile(const String & path, size_t /*buf_size*/, size_t, size_t, size_t) const std::unique_ptr<ReadBufferFromFileBase> DiskMemory::readFile(const String & path, size_t /*buf_size*/, size_t, size_t, size_t) const
{ {
std::lock_guard lock(mutex); std::lock_guard lock(mutex);

View File

@ -60,8 +60,6 @@ public:
void replaceFile(const String & from_path, const String & to_path) override; void replaceFile(const String & from_path, const String & to_path) override;
void copyFile(const String & from_path, const String & to_path) override;
void listFiles(const String & path, std::vector<String> & file_names) override; void listFiles(const String & path, std::vector<String> & file_names) override;
std::unique_ptr<ReadBufferFromFileBase> readFile( std::unique_ptr<ReadBufferFromFileBase> readFile(
@ -91,7 +89,7 @@ public:
void truncateFile(const String & path, size_t size) override; void truncateFile(const String & path, size_t size) override;
const String getType() const override { return "memory"; } DiskType::Type getType() const override { return DiskType::Type::RAM; }
private: private:
void createDirectoriesImpl(const String & path); void createDirectoriesImpl(const String & path);

View File

@ -57,6 +57,29 @@ public:
using SpacePtr = std::shared_ptr<Space>; using SpacePtr = std::shared_ptr<Space>;
struct DiskType
{
enum class Type
{
Local,
RAM,
S3
};
static String toString(Type disk_type)
{
switch (disk_type)
{
case Type::Local:
return "local";
case Type::RAM:
return "memory";
case Type::S3:
return "s3";
}
__builtin_unreachable();
}
};
/** /**
* A guard, that should synchronize file's or directory's state * A guard, that should synchronize file's or directory's state
* with storage device (e.g. fsync in POSIX) in its destructor. * with storage device (e.g. fsync in POSIX) in its destructor.
@ -140,9 +163,6 @@ public:
/// If a file with `to_path` path already exists, it will be replaced. /// If a file with `to_path` path already exists, it will be replaced.
virtual void replaceFile(const String & from_path, const String & to_path) = 0; virtual void replaceFile(const String & from_path, const String & to_path) = 0;
/// Copy the file from `from_path` to `to_path`.
virtual void copyFile(const String & from_path, const String & to_path) = 0;
/// Recursively copy data containing at `from_path` to `to_path` located at `to_disk`. /// Recursively copy data containing at `from_path` to `to_path` located at `to_disk`.
virtual void copy(const String & from_path, const std::shared_ptr<IDisk> & to_disk, const String & to_path); virtual void copy(const String & from_path, const std::shared_ptr<IDisk> & to_disk, const String & to_path);
@ -191,7 +211,7 @@ public:
virtual void truncateFile(const String & path, size_t size); virtual void truncateFile(const String & path, size_t size);
/// Return disk type - "local", "s3", etc. /// Return disk type - "local", "s3", etc.
virtual const String getType() const = 0; virtual DiskType::Type getType() const = 0;
/// Invoked when Global Context is shutdown. /// Invoked when Global Context is shutdown.
virtual void shutdown() { } virtual void shutdown() { }
@ -199,6 +219,9 @@ public:
/// Returns executor to perform asynchronous operations. /// Returns executor to perform asynchronous operations.
virtual Executor & getExecutor() { return *executor; } virtual Executor & getExecutor() { return *executor; }
/// Invoked on partitions freeze query.
virtual void onFreeze(const String &) { }
/// Returns guard, that insures synchronization of directory metadata with storage device. /// Returns guard, that insures synchronization of directory metadata with storage device.
virtual SyncGuardPtr getDirectorySyncGuard(const String & path) const; virtual SyncGuardPtr getDirectorySyncGuard(const String & path) const;
@ -269,4 +292,11 @@ inline String fileName(const String & path)
{ {
return Poco::Path(path).getFileName(); return Poco::Path(path).getFileName();
} }
/// Return directory path for the specified path.
inline String directoryPath(const String & path)
{
return Poco::Path(path).setFileName("").toString();
}
} }
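Two brief usage notes for the additions above; the directoryPath() results are assumed from the Poco::Path semantics it delegates to and are not verified here:

    // DiskType::toString() pairs with the new enum-returning getType():
    //   DiskType::toString(DiskType::Type::S3)  == "s3"
    //   DiskType::toString(DiskType::Type::RAM) == "memory"
    //
    // directoryPath() complements the existing fileName() helper:
    //   directoryPath("store/123/all_1_1_0/data.bin") -> "store/123/all_1_1_0/"   (assumed)
    //   fileName("store/123/all_1_1_0/data.bin")      -> "data.bin"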

View File

@ -23,6 +23,8 @@
#include <aws/s3/model/CopyObjectRequest.h> #include <aws/s3/model/CopyObjectRequest.h>
#include <aws/s3/model/DeleteObjectsRequest.h> #include <aws/s3/model/DeleteObjectsRequest.h>
#include <aws/s3/model/GetObjectRequest.h> #include <aws/s3/model/GetObjectRequest.h>
#include <aws/s3/model/ListObjectsV2Request.h>
#include <aws/s3/model/HeadObjectRequest.h>
#include <boost/algorithm/string.hpp> #include <boost/algorithm/string.hpp>
@ -32,12 +34,15 @@ namespace DB
namespace ErrorCodes namespace ErrorCodes
{ {
extern const int S3_ERROR;
extern const int FILE_ALREADY_EXISTS; extern const int FILE_ALREADY_EXISTS;
extern const int CANNOT_SEEK_THROUGH_FILE; extern const int CANNOT_SEEK_THROUGH_FILE;
extern const int UNKNOWN_FORMAT; extern const int UNKNOWN_FORMAT;
extern const int INCORRECT_DISK_INDEX; extern const int INCORRECT_DISK_INDEX;
extern const int BAD_ARGUMENTS;
extern const int PATH_ACCESS_DENIED; extern const int PATH_ACCESS_DENIED;
extern const int CANNOT_DELETE_DIRECTORY; extern const int CANNOT_DELETE_DIRECTORY;
extern const int LOGICAL_ERROR;
} }
@ -76,12 +81,12 @@ String getRandomName()
} }
template <typename Result, typename Error> template <typename Result, typename Error>
void throwIfError(Aws::Utils::Outcome<Result, Error> && response) void throwIfError(Aws::Utils::Outcome<Result, Error> & response)
{ {
if (!response.IsSuccess()) if (!response.IsSuccess())
{ {
const auto & err = response.GetError(); const auto & err = response.GetError();
throw Exception(err.GetMessage(), static_cast<int>(err.GetErrorType())); throw Exception(std::to_string(static_cast<int>(err.GetErrorType())) + ": " + err.GetMessage(), ErrorCodes::S3_ERROR);
} }
} }
@ -244,7 +249,7 @@ public:
if (whence == SEEK_CUR) if (whence == SEEK_CUR)
{ {
/// If position within current working buffer - shift pos. /// If position within current working buffer - shift pos.
if (working_buffer.size() && size_t(getPosition() + offset_) < absolute_position) if (!working_buffer.empty() && size_t(getPosition() + offset_) < absolute_position)
{ {
pos += offset_; pos += offset_;
return getPosition(); return getPosition();
@ -257,7 +262,7 @@ public:
else if (whence == SEEK_SET) else if (whence == SEEK_SET)
{ {
/// If position within current working buffer - shift pos. /// If position within current working buffer - shift pos.
if (working_buffer.size() && size_t(offset_) >= absolute_position - working_buffer.size() if (!working_buffer.empty() && size_t(offset_) >= absolute_position - working_buffer.size()
&& size_t(offset_) < absolute_position) && size_t(offset_) < absolute_position)
{ {
pos = working_buffer.end() - (absolute_position - offset_); pos = working_buffer.end() - (absolute_position - offset_);
@ -500,17 +505,17 @@ private:
CurrentMetrics::Increment metric_increment; CurrentMetrics::Increment metric_increment;
}; };
/// Runs tasks asynchronously using global thread pool. /// Runs tasks asynchronously using thread pool.
class AsyncExecutor : public Executor class AsyncExecutor : public Executor
{ {
public: public:
explicit AsyncExecutor() = default; explicit AsyncExecutor(int thread_pool_size) : pool(ThreadPool(thread_pool_size)) { }
std::future<void> execute(std::function<void()> task) override std::future<void> execute(std::function<void()> task) override
{ {
auto promise = std::make_shared<std::promise<void>>(); auto promise = std::make_shared<std::promise<void>>();
GlobalThreadPool::instance().scheduleOrThrowOnError( pool.scheduleOrThrowOnError(
[promise, task]() [promise, task]()
{ {
try try
@ -531,6 +536,9 @@ public:
return promise->get_future(); return promise->get_future();
} }
private:
ThreadPool pool;
}; };
@ -544,8 +552,10 @@ DiskS3::DiskS3(
size_t min_upload_part_size_, size_t min_upload_part_size_,
size_t max_single_part_upload_size_, size_t max_single_part_upload_size_,
size_t min_bytes_for_seek_, size_t min_bytes_for_seek_,
bool send_metadata_) bool send_metadata_,
: IDisk(std::make_unique<AsyncExecutor>()) int thread_pool_size_,
int list_object_keys_size_)
: IDisk(std::make_unique<AsyncExecutor>(thread_pool_size_))
, name(std::move(name_)) , name(std::move(name_))
, client(std::move(client_)) , client(std::move(client_))
, proxy_configuration(std::move(proxy_configuration_)) , proxy_configuration(std::move(proxy_configuration_))
@ -556,6 +566,8 @@ DiskS3::DiskS3(
, max_single_part_upload_size(max_single_part_upload_size_) , max_single_part_upload_size(max_single_part_upload_size_)
, min_bytes_for_seek(min_bytes_for_seek_) , min_bytes_for_seek(min_bytes_for_seek_)
, send_metadata(send_metadata_) , send_metadata(send_metadata_)
, revision_counter(0)
, list_object_keys_size(list_object_keys_size_)
{ {
} }
@ -613,45 +625,31 @@ void DiskS3::moveFile(const String & from_path, const String & to_path)
{ {
if (exists(to_path)) if (exists(to_path))
throw Exception("File already exists: " + to_path, ErrorCodes::FILE_ALREADY_EXISTS); throw Exception("File already exists: " + to_path, ErrorCodes::FILE_ALREADY_EXISTS);
if (send_metadata)
{
auto revision = ++revision_counter;
const DiskS3::ObjectMetadata object_metadata {
{"from_path", from_path},
{"to_path", to_path}
};
createFileOperationObject("rename", revision, object_metadata);
}
Poco::File(metadata_path + from_path).renameTo(metadata_path + to_path); Poco::File(metadata_path + from_path).renameTo(metadata_path + to_path);
} }
void DiskS3::replaceFile(const String & from_path, const String & to_path) void DiskS3::replaceFile(const String & from_path, const String & to_path)
{ {
Poco::File from_file(metadata_path + from_path); if (exists(to_path))
Poco::File to_file(metadata_path + to_path);
if (to_file.exists())
{ {
Poco::File tmp_file(metadata_path + to_path + ".old"); const String tmp_path = to_path + ".old";
to_file.renameTo(tmp_file.path()); moveFile(to_path, tmp_path);
from_file.renameTo(metadata_path + to_path); moveFile(from_path, to_path);
removeFile(to_path + ".old"); removeFile(tmp_path);
} }
else else
from_file.renameTo(to_file.path()); moveFile(from_path, to_path);
}
void DiskS3::copyFile(const String & from_path, const String & to_path)
{
if (exists(to_path))
removeFile(to_path);
auto from = readMeta(from_path);
auto to = createMeta(to_path);
for (const auto & [path, size] : from.s3_objects)
{
auto new_path = getRandomName();
Aws::S3::Model::CopyObjectRequest req;
req.SetCopySource(bucket + "/" + s3_root_path + path);
req.SetBucket(bucket);
req.SetKey(s3_root_path + new_path);
throwIfError(client->CopyObject(req));
to.addObject(new_path, size);
}
to.save();
} }
std::unique_ptr<ReadBufferFromFileBase> DiskS3::readFile(const String & path, size_t buf_size, size_t, size_t, size_t) const std::unique_ptr<ReadBufferFromFileBase> DiskS3::readFile(const String & path, size_t buf_size, size_t, size_t, size_t) const
@ -673,7 +671,17 @@ std::unique_ptr<WriteBufferFromFileBase> DiskS3::writeFile(const String & path,
/// Path to store new S3 object. /// Path to store new S3 object.
auto s3_path = getRandomName(); auto s3_path = getRandomName();
auto object_metadata = createObjectMetadata(path);
std::optional<ObjectMetadata> object_metadata;
if (send_metadata)
{
auto revision = ++revision_counter;
object_metadata = {
{"path", path}
};
s3_path = "r" + revisionToString(revision) + "-file-" + s3_path;
}
if (!exist || mode == WriteMode::Rewrite) if (!exist || mode == WriteMode::Rewrite)
{ {
/// If metadata file exists - remove and create new. /// If metadata file exists - remove and create new.
@ -777,7 +785,8 @@ void DiskS3::removeAws(const AwsS3KeyKeeper & keys)
Aws::S3::Model::DeleteObjectsRequest request; Aws::S3::Model::DeleteObjectsRequest request;
request.SetBucket(bucket); request.SetBucket(bucket);
request.SetDelete(delkeys); request.SetDelete(delkeys);
throwIfError(client->DeleteObjects(request)); auto outcome = client->DeleteObjects(request);
throwIfError(outcome);
} }
} }
} }
@ -852,6 +861,17 @@ Poco::Timestamp DiskS3::getLastModified(const String & path)
void DiskS3::createHardLink(const String & src_path, const String & dst_path) void DiskS3::createHardLink(const String & src_path, const String & dst_path)
{ {
/// We don't need to record hardlinks created to shadow folder.
if (send_metadata && !dst_path.starts_with("shadow/"))
{
auto revision = ++revision_counter;
const ObjectMetadata object_metadata {
{"src_path", src_path},
{"dst_path", dst_path}
};
createFileOperationObject("hardlink", revision, object_metadata);
}
/// Increment number of references. /// Increment number of references.
auto src = readMeta(src_path); auto src = readMeta(src_path);
++src.ref_count; ++src.ref_count;
@ -886,12 +906,368 @@ void DiskS3::shutdown()
client->DisableRequestProcessing(); client->DisableRequestProcessing();
} }
std::optional<DiskS3::ObjectMetadata> DiskS3::createObjectMetadata(const String & path) const void DiskS3::createFileOperationObject(const String & operation_name, UInt64 revision, const DiskS3::ObjectMetadata & metadata)
{ {
const String key = "operations/r" + revisionToString(revision) + "-" + operation_name;
WriteBufferFromS3 buffer(client, bucket, s3_root_path + key, min_upload_part_size, max_single_part_upload_size, metadata);
buffer.write('0');
buffer.finalize();
}
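For orientation, the object keys produced by this naming scheme look roughly like the following; the revision values and paths are made up for illustration:

    //   Data file written at revision 42 (see writeFile above):
    //     <s3_root_path>r0000000000000000042-file-<random_name>
    //   Rename recorded at revision 43:
    //     <s3_root_path>operations/r0000000000000000043-rename      (object metadata: from_path, to_path)
    //   Hardlink recorded at revision 44:
    //     <s3_root_path>operations/r0000000000000000044-hardlink    (object metadata: src_path, dst_path)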
void DiskS3::startup()
{
if (!send_metadata)
return;
LOG_INFO(&Poco::Logger::get("DiskS3"), "Starting up disk {}", name);
/// Find last revision.
UInt64 l = 0, r = LATEST_REVISION;
while (l < r)
{
LOG_DEBUG(&Poco::Logger::get("DiskS3"), "Check revision in bounds {}-{}", l, r);
auto revision = l + (r - l + 1) / 2;
auto revision_str = revisionToString(revision);
LOG_DEBUG(&Poco::Logger::get("DiskS3"), "Check object with revision {}", revision);
/// Check file or operation with such revision exists.
if (checkObjectExists(s3_root_path + "r" + revision_str)
|| checkObjectExists(s3_root_path + "operations/r" + revision_str))
l = revision;
else
r = revision - 1;
}
revision_counter = l;
LOG_INFO(&Poco::Logger::get("DiskS3"), "Found last revision number {} for disk {}", revision_counter, name);
}
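The revision probe in startup() is a plain binary search over the zero-padded revision space; a minimal standalone sketch of the same idea, with exists() standing in for the two checkObjectExists() calls and assuming every allocated revision left an object behind:

    #include <cstdint>
    #include <functional>

    uint64_t findLastRevision(uint64_t latest_revision, const std::function<bool(uint64_t)> & exists)
    {
        uint64_t l = 0, r = latest_revision;
        while (l < r)
        {
            uint64_t mid = l + (r - l + 1) / 2;  // upper middle, so l always advances
            if (exists(mid))
                l = mid;      // an object with this revision exists, the answer is >= mid
            else
                r = mid - 1;  // nothing at this revision, the answer is < mid
        }
        return l;
    }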
bool DiskS3::checkObjectExists(const String & prefix)
{
Aws::S3::Model::ListObjectsV2Request request;
request.SetBucket(bucket);
request.SetPrefix(prefix);
request.SetMaxKeys(1);
auto outcome = client->ListObjectsV2(request);
throwIfError(outcome);
return !outcome.GetResult().GetContents().empty();
}
Aws::S3::Model::HeadObjectResult DiskS3::headObject(const String & source_bucket, const String & key)
{
Aws::S3::Model::HeadObjectRequest request;
request.SetBucket(source_bucket);
request.SetKey(key);
auto outcome = client->HeadObject(request);
throwIfError(outcome);
return outcome.GetResultWithOwnership();
}
void DiskS3::listObjects(const String & source_bucket, const String & source_path, std::function<bool(const Aws::S3::Model::ListObjectsV2Result &)> callback)
{
Aws::S3::Model::ListObjectsV2Request request;
request.SetBucket(source_bucket);
request.SetPrefix(source_path);
request.SetMaxKeys(list_object_keys_size);
Aws::S3::Model::ListObjectsV2Outcome outcome;
do
{
outcome = client->ListObjectsV2(request);
throwIfError(outcome);
bool should_continue = callback(outcome.GetResult());
if (!should_continue)
break;
request.SetContinuationToken(outcome.GetResult().GetNextContinuationToken());
} while (outcome.GetResult().GetIsTruncated());
}
void DiskS3::copyObject(const String & src_bucket, const String & src_key, const String & dst_bucket, const String & dst_key)
{
Aws::S3::Model::CopyObjectRequest request;
request.SetCopySource(src_bucket + "/" + src_key);
request.SetBucket(dst_bucket);
request.SetKey(dst_key);
auto outcome = client->CopyObject(request);
throwIfError(outcome);
}
struct DiskS3::RestoreInformation
{
UInt64 revision = LATEST_REVISION;
String source_bucket;
String source_path;
};
void DiskS3::readRestoreInformation(DiskS3::RestoreInformation & restore_information)
{
ReadBufferFromFile buffer(metadata_path + restore_file_name, 512);
buffer.next();
/// Empty file - just restore all metadata.
if (!buffer.hasPendingData())
return;
try
{
readIntText(restore_information.revision, buffer);
assertChar('\n', buffer);
if (!buffer.hasPendingData())
return;
readText(restore_information.source_bucket, buffer);
assertChar('\n', buffer);
if (!buffer.hasPendingData())
return;
readText(restore_information.source_path, buffer);
assertChar('\n', buffer);
if (buffer.hasPendingData())
throw Exception("Extra information at the end of restore file", ErrorCodes::UNKNOWN_FORMAT);
}
catch (const Exception & e)
{
throw Exception("Failed to read restore information", e, ErrorCodes::UNKNOWN_FORMAT);
}
}
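Based on the parsing above, the restore file under the disk's metadata path is a small plain-text file of up to three newline-terminated values; a hypothetical example (values are illustrative only):

    //   4019                 <- target revision; 0, or an entirely empty file, means "restore to the latest revision"
    //   my-backup-bucket     <- source bucket   (defaults to the disk's own bucket if omitted)
    //   data/clickhouse/     <- source path     (defaults to the disk's own s3_root_path if omitted)
    //
    // Anything after the third value is rejected with UNKNOWN_FORMAT.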
void DiskS3::restore()
{
if (!exists(restore_file_name))
return;
try
{
RestoreInformation information;
information.source_bucket = bucket;
information.source_path = s3_root_path;
readRestoreInformation(information);
if (information.revision == 0)
information.revision = LATEST_REVISION;
if (!information.source_path.ends_with('/'))
information.source_path += '/';
if (information.source_bucket == bucket)
{
/// In this case we would additionally need to clean up S3 objects with a later revision.
/// Otherwise it is simply a restore to a different path.
if (information.source_path == s3_root_path && information.revision != LATEST_REVISION)
throw Exception("Restoring to the same bucket and path is allowed if revision is latest (0)", ErrorCodes::BAD_ARGUMENTS);
/// This case complicates S3 cleanup in case of unsuccessful restore.
if (information.source_path != s3_root_path && s3_root_path.starts_with(information.source_path))
throw Exception("Restoring to the same bucket is allowed only if source path is not a sub-path of configured path in S3 disk", ErrorCodes::BAD_ARGUMENTS);
}
/// TODO: Clean up the local FS and the bucket if a previous restore failed.
LOG_INFO(&Poco::Logger::get("DiskS3"), "Starting to restore disk {}. Revision: {}, Source bucket: {}, Source path: {}",
name, information.revision, information.source_bucket, information.source_path);
restoreFiles(information.source_bucket, information.source_path, information.revision);
restoreFileOperations(information.source_bucket, information.source_path, information.revision);
Poco::File restore_file(metadata_path + restore_file_name);
restore_file.remove();
LOG_INFO(&Poco::Logger::get("DiskS3"), "Restore disk {} finished", name);
}
catch (const Exception & e)
{
LOG_ERROR(&Poco::Logger::get("DiskS3"), "Failed to restore disk. Code: {}, e.displayText() = {}, Stack trace:\n\n{}", e.code(), e.displayText(), e.getStackTraceString());
throw;
}
}
void DiskS3::restoreFiles(const String & source_bucket, const String & source_path, UInt64 target_revision)
{
LOG_INFO(&Poco::Logger::get("DiskS3"), "Starting restore files for disk {}", name);
std::vector<std::future<void>> results;
listObjects(source_bucket, source_path, [this, &source_bucket, &source_path, &target_revision, &results](auto list_result)
{
std::vector<String> keys;
for (const auto & row : list_result.GetContents())
{
const String & key = row.GetKey();
/// Skip file operations objects. They will be processed separately.
if (key.find("/operations/") != String::npos)
continue;
const auto [revision, _] = extractRevisionAndOperationFromKey(key);
/// Filter early if it's possible to get revision from key.
if (revision > target_revision)
continue;
keys.push_back(key);
}
if (!keys.empty())
{
auto result = getExecutor().execute([this, &source_bucket, &source_path, keys]()
{
processRestoreFiles(source_bucket, source_path, keys);
});
results.push_back(std::move(result));
}
return true;
});
for (auto & result : results)
result.wait();
for (auto & result : results)
result.get();
LOG_INFO(&Poco::Logger::get("DiskS3"), "Files are restored for disk {}", name);
}
void DiskS3::processRestoreFiles(const String & source_bucket, const String & source_path, Strings keys)
{
for (const auto & key : keys)
{
auto head_result = headObject(source_bucket, key);
auto object_metadata = head_result.GetMetadata();
/// Restore file if object has 'path' in metadata.
auto path_entry = object_metadata.find("path");
if (path_entry == object_metadata.end())
throw Exception("Failed to restore key " + key + " because it doesn't have 'path' in metadata", ErrorCodes::S3_ERROR);
const auto & path = path_entry->second;
createDirectories(directoryPath(path));
auto metadata = createMeta(path);
auto relative_key = shrinkKey(source_path, key);
/// Copy object if we restore to different bucket / path.
if (bucket != source_bucket || s3_root_path != source_path)
copyObject(source_bucket, key, bucket, s3_root_path + relative_key);
metadata.addObject(relative_key, head_result.GetContentLength());
metadata.save();
LOG_DEBUG(&Poco::Logger::get("DiskS3"), "Restored file {}", path);
}
}
void DiskS3::restoreFileOperations(const String & source_bucket, const String & source_path, UInt64 target_revision)
{
LOG_INFO(&Poco::Logger::get("DiskS3"), "Starting restore file operations for disk {}", name);
/// Enable recording file operations if we restore to different bucket / path.
send_metadata = bucket != source_bucket || s3_root_path != source_path;
listObjects(source_bucket, source_path + "operations/", [this, &source_bucket, &target_revision](auto list_result)
{
const String rename = "rename";
const String hardlink = "hardlink";
for (const auto & row : list_result.GetContents())
{
const String & key = row.GetKey();
const auto [revision, operation] = extractRevisionAndOperationFromKey(key);
if (revision == UNKNOWN_REVISION)
{
LOG_WARNING(&Poco::Logger::get("DiskS3"), "Skip key {} with unknown revision", key);
continue;
}
/// S3 ensures that keys will be listed in ascending UTF-8 bytes order (revision order).
/// We can stop processing if revision of the object is already more than required.
if (revision > target_revision)
return false;
/// Keep original revision if restore to different bucket / path.
if (send_metadata)
    revision_counter = revision - 1;

auto object_metadata = headObject(source_bucket, key).GetMetadata();
if (operation == rename)
{
auto from_path = object_metadata["from_path"];
auto to_path = object_metadata["to_path"];
if (exists(from_path))
{
moveFile(from_path, to_path);
LOG_DEBUG(&Poco::Logger::get("DiskS3"), "Revision {}. Restored rename {} -> {}", revision, from_path, to_path);
}
}
else if (operation == hardlink)
{
auto src_path = object_metadata["src_path"];
auto dst_path = object_metadata["dst_path"];
if (exists(src_path))
{
createDirectories(directoryPath(dst_path));
createHardLink(src_path, dst_path);
LOG_DEBUG(&Poco::Logger::get("DiskS3"), "Revision {}. Restored hardlink {} -> {}", revision, src_path, dst_path);
}
}
}
return true;
});
send_metadata = true;
LOG_INFO(&Poco::Logger::get("DiskS3"), "File operations restored for disk {}", name);
}
std::tuple<UInt64, String> DiskS3::extractRevisionAndOperationFromKey(const String & key)
{
UInt64 revision = UNKNOWN_REVISION;
String operation;
re2::RE2::FullMatch(key, key_regexp, &revision, &operation);
return {revision, operation};
}
String DiskS3::shrinkKey(const String & path, const String & key)
{
if (!key.starts_with(path))
throw Exception("The key " + key + " prefix mismatch with given " + path, ErrorCodes::LOGICAL_ERROR);
return key.substr(path.length());
}
String DiskS3::revisionToString(UInt64 revision)
{
static constexpr size_t max_digits = 19; /// Enough for any revision: revisions stay below LATEST_REVISION = 2^63, which has 19 decimal digits.
/// Pad the revision number with leading zeroes so that revision strings sort in strict lexicographical (i.e. numerical) order.
auto revision_str = std::to_string(revision);
auto digits_to_align = max_digits - revision_str.length();
for (size_t i = 0; i < digits_to_align; ++i)
revision_str = "0" + revision_str;
return revision_str;
}
void DiskS3::onFreeze(const String & path)
{
createDirectories(path);
WriteBufferFromFile revision_file_buf(metadata_path + path + "revision.txt", 32);
writeIntText(revision_counter.load(), revision_file_buf);
revision_file_buf.finalize();
} }
} }

View File

@ -1,11 +1,16 @@
#pragma once #pragma once
#include <atomic>
#include "Disks/DiskFactory.h" #include "Disks/DiskFactory.h"
#include "Disks/Executor.h" #include "Disks/Executor.h"
#include "ProxyConfiguration.h" #include "ProxyConfiguration.h"
#include <aws/s3/S3Client.h> #include <aws/s3/S3Client.h>
#include <aws/s3/model/HeadObjectResult.h>
#include <aws/s3/model/ListObjectsV2Result.h>
#include <Poco/DirectoryIterator.h> #include <Poco/DirectoryIterator.h>
#include <re2/re2.h>
namespace DB namespace DB
@ -25,6 +30,7 @@ public:
class AwsS3KeyKeeper; class AwsS3KeyKeeper;
struct Metadata; struct Metadata;
struct RestoreInformation;
DiskS3( DiskS3(
String name_, String name_,
@ -36,7 +42,9 @@ public:
size_t min_upload_part_size_, size_t min_upload_part_size_,
size_t max_single_part_upload_size_, size_t max_single_part_upload_size_,
size_t min_bytes_for_seek_, size_t min_bytes_for_seek_,
bool send_metadata_); bool send_metadata_,
int thread_pool_size_,
int list_object_keys_size_);
const String & getName() const override { return name; } const String & getName() const override { return name; }
@ -74,8 +82,6 @@ public:
void replaceFile(const String & from_path, const String & to_path) override; void replaceFile(const String & from_path, const String & to_path) override;
void copyFile(const String & from_path, const String & to_path) override;
void listFiles(const String & path, std::vector<String> & file_names) override; void listFiles(const String & path, std::vector<String> & file_names) override;
std::unique_ptr<ReadBufferFromFileBase> readFile( std::unique_ptr<ReadBufferFromFileBase> readFile(
@ -105,22 +111,47 @@ public:
void setReadOnly(const String & path) override; void setReadOnly(const String & path) override;
const String getType() const override { return "s3"; } DiskType::Type getType() const override { return DiskType::Type::S3; }
void shutdown() override; void shutdown() override;
/// Actions performed after disk creation.
void startup();
/// Restore S3 metadata files on file system.
void restore();
/// Dumps current revision counter into file 'revision.txt' at given path.
void onFreeze(const String & path) override;
private: private:
bool tryReserve(UInt64 bytes); bool tryReserve(UInt64 bytes);
void removeMeta(const String & path, AwsS3KeyKeeper & keys); void removeMeta(const String & path, AwsS3KeyKeeper & keys);
void removeMetaRecursive(const String & path, AwsS3KeyKeeper & keys); void removeMetaRecursive(const String & path, AwsS3KeyKeeper & keys);
void removeAws(const AwsS3KeyKeeper & keys); void removeAws(const AwsS3KeyKeeper & keys);
std::optional<ObjectMetadata> createObjectMetadata(const String & path) const;
Metadata readMeta(const String & path) const; Metadata readMeta(const String & path) const;
Metadata createMeta(const String & path) const; Metadata createMeta(const String & path) const;
private: void createFileOperationObject(const String & operation_name, UInt64 revision, const ObjectMetadata & metadata);
static String revisionToString(UInt64 revision);
bool checkObjectExists(const String & prefix);
Aws::S3::Model::HeadObjectResult headObject(const String & source_bucket, const String & key);
void listObjects(const String & source_bucket, const String & source_path, std::function<bool(const Aws::S3::Model::ListObjectsV2Result &)> callback);
void copyObject(const String & src_bucket, const String & src_key, const String & dst_bucket, const String & dst_key);
void readRestoreInformation(RestoreInformation & restore_information);
void restoreFiles(const String & source_bucket, const String & source_path, UInt64 target_revision);
void processRestoreFiles(const String & source_bucket, const String & source_path, std::vector<String> keys);
void restoreFileOperations(const String & source_bucket, const String & source_path, UInt64 target_revision);
/// Remove 'path' prefix from 'key' to get relative key.
/// It's needed to store keys to metadata files in RELATIVE_PATHS version.
static String shrinkKey(const String & path, const String & key);
std::tuple<UInt64, String> extractRevisionAndOperationFromKey(const String & key);
const String name; const String name;
std::shared_ptr<Aws::S3::S3Client> client; std::shared_ptr<Aws::S3::S3Client> client;
std::shared_ptr<S3::ProxyConfiguration> proxy_configuration; std::shared_ptr<S3::ProxyConfiguration> proxy_configuration;
@ -135,6 +166,18 @@ private:
UInt64 reserved_bytes = 0; UInt64 reserved_bytes = 0;
UInt64 reservation_count = 0; UInt64 reservation_count = 0;
std::mutex reservation_mutex; std::mutex reservation_mutex;
std::atomic<UInt64> revision_counter;
static constexpr UInt64 LATEST_REVISION = (static_cast<UInt64>(1)) << 63;
static constexpr UInt64 UNKNOWN_REVISION = 0;
/// File at path {metadata_path}/restore contains metadata restore information
const String restore_file_name = "restore";
/// The number of keys listed in one request (1000 is max value)
int list_object_keys_size;
/// Key has format: ../../r{revision}-{operation}
const re2::RE2 key_regexp {".*/r(\\d+)-(\\w+).*"};
}; };
} }

View File

@ -152,7 +152,9 @@ void registerDiskS3(DiskFactory & factory)
context.getSettingsRef().s3_min_upload_part_size, context.getSettingsRef().s3_min_upload_part_size,
context.getSettingsRef().s3_max_single_part_upload_size, context.getSettingsRef().s3_max_single_part_upload_size,
config.getUInt64(config_prefix + ".min_bytes_for_seek", 1024 * 1024), config.getUInt64(config_prefix + ".min_bytes_for_seek", 1024 * 1024),
config.getBool(config_prefix + ".send_object_metadata", false)); config.getBool(config_prefix + ".send_metadata", false),
config.getInt(config_prefix + ".thread_pool_size", 16),
config.getInt(config_prefix + ".list_object_keys_size", 1000));
/// This code is used only to check access to the corresponding disk. /// This code is used only to check access to the corresponding disk.
if (!config.getBool(config_prefix + ".skip_access_check", false)) if (!config.getBool(config_prefix + ".skip_access_check", false))
@ -162,6 +164,9 @@ void registerDiskS3(DiskFactory & factory)
checkRemoveAccess(*s3disk); checkRemoveAccess(*s3disk);
} }
s3disk->restore();
s3disk->startup();
bool cache_enabled = config.getBool(config_prefix + ".cache_enabled", true); bool cache_enabled = config.getBool(config_prefix + ".cache_enabled", true);
if (cache_enabled) if (cache_enabled)

View File

@ -117,3 +117,6 @@ target_link_libraries(clickhouse_functions PRIVATE clickhouse_functions_array)
if (USE_STATS) if (USE_STATS)
target_link_libraries(clickhouse_functions PRIVATE stats) target_link_libraries(clickhouse_functions PRIVATE stats)
endif() endif()
# Signed integer overflow on user-provided data inside boost::geometry - ignore.
set_source_files_properties("pointInPolygon.cpp" PROPERTIES COMPILE_FLAGS -fno-sanitize=signed-integer-overflow)

View File

@ -704,7 +704,11 @@ struct DateTimeTransformImpl
{ {
using Op = Transformer<typename FromDataType::FieldType, typename ToDataType::FieldType, Transform>; using Op = Transformer<typename FromDataType::FieldType, typename ToDataType::FieldType, Transform>;
const DateLUTImpl & time_zone = extractTimeZoneFromFunctionArguments(arguments, 1, 0); size_t time_zone_argument_position = 1;
if constexpr (std::is_same_v<ToDataType, DataTypeDateTime64>)
time_zone_argument_position = 2;
const DateLUTImpl & time_zone = extractTimeZoneFromFunctionArguments(arguments, time_zone_argument_position, 0);
const ColumnPtr source_col = arguments[0].column; const ColumnPtr source_col = arguments[0].column;
if (const auto * sources = checkAndGetColumn<typename FromDataType::ColumnType>(source_col.get())) if (const auto * sources = checkAndGetColumn<typename FromDataType::ColumnType>(source_col.get()))
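A hedged reading of this change, based on the SQL-level signatures as I understand them: toDateTime64(expr, scale[, timezone]) carries the scale at argument index 1, so the optional time zone shifts to index 2, while other date/time conversions keep it at index 1.
// toDateTime(x, tz)           -> time zone at argument index 1
// toDateTime64(x, scale, tz)  -> time zone at argument index 2 (index 1 is the scale)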

View File

@ -477,6 +477,61 @@ template <typename Name> struct ConvertImpl<DataTypeDate, DataTypeDateTime64, Na
template <typename Name> struct ConvertImpl<DataTypeDateTime, DataTypeDateTime64, Name, ConvertDefaultBehaviorTag> template <typename Name> struct ConvertImpl<DataTypeDateTime, DataTypeDateTime64, Name, ConvertDefaultBehaviorTag>
: DateTimeTransformImpl<DataTypeDateTime, DataTypeDateTime64, ToDateTime64Transform> {}; : DateTimeTransformImpl<DataTypeDateTime, DataTypeDateTime64, ToDateTime64Transform> {};
/** Conversion of numeric to DateTime64
*/
template <typename FromType>
struct ToDateTime64TransformUnsigned
{
static constexpr auto name = "toDateTime64";
const DateTime64::NativeType scale_multiplier = 1;
ToDateTime64TransformUnsigned(UInt32 scale = 0)
: scale_multiplier(DecimalUtils::scaleMultiplier<DateTime64::NativeType>(scale))
{}
inline NO_SANITIZE_UNDEFINED DateTime64::NativeType execute(FromType from, const DateLUTImpl &) const
{
from = std::min(time_t(from), time_t(0xFFFFFFFF));
return DecimalUtils::decimalFromComponentsWithMultiplier<DateTime64>(from, 0, scale_multiplier);
}
};
template <typename FromType>
struct ToDateTime64TransformSigned
{
static constexpr auto name = "toDateTime64";
const DateTime64::NativeType scale_multiplier = 1;
ToDateTime64TransformSigned(UInt32 scale = 0)
: scale_multiplier(DecimalUtils::scaleMultiplier<DateTime64::NativeType>(scale))
{}
inline NO_SANITIZE_UNDEFINED DateTime64::NativeType execute(FromType from, const DateLUTImpl &) const
{
if (from < 0)
return 0;
from = std::min(time_t(from), time_t(0xFFFFFFFF));
return DecimalUtils::decimalFromComponentsWithMultiplier<DateTime64>(from, 0, scale_multiplier);
}
};
template <typename Name> struct ConvertImpl<DataTypeInt8, DataTypeDateTime64, Name>
: DateTimeTransformImpl<DataTypeInt8, DataTypeDateTime64, ToDateTime64TransformSigned<Int8>> {};
template <typename Name> struct ConvertImpl<DataTypeInt16, DataTypeDateTime64, Name>
: DateTimeTransformImpl<DataTypeInt16, DataTypeDateTime64, ToDateTime64TransformSigned<Int16>> {};
template <typename Name> struct ConvertImpl<DataTypeInt32, DataTypeDateTime64, Name>
: DateTimeTransformImpl<DataTypeInt32, DataTypeDateTime64, ToDateTime64TransformSigned<Int32>> {};
template <typename Name> struct ConvertImpl<DataTypeInt64, DataTypeDateTime64, Name>
: DateTimeTransformImpl<DataTypeInt64, DataTypeDateTime64, ToDateTime64TransformSigned<Int64>> {};
template <typename Name> struct ConvertImpl<DataTypeUInt64, DataTypeDateTime64, Name>
: DateTimeTransformImpl<DataTypeUInt64, DataTypeDateTime64, ToDateTime64TransformUnsigned<UInt64>> {};
template <typename Name> struct ConvertImpl<DataTypeFloat32, DataTypeDateTime64, Name>
: DateTimeTransformImpl<DataTypeFloat32, DataTypeDateTime64, ToDateTime64TransformSigned<Float32>> {};
template <typename Name> struct ConvertImpl<DataTypeFloat64, DataTypeDateTime64, Name>
: DateTimeTransformImpl<DataTypeFloat64, DataTypeDateTime64, ToDateTime64TransformSigned<Float64>> {};
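A minimal sketch of the clamping behaviour of these transforms, assuming scale = 3 (multiplier 1000) and ignoring the unused time zone argument; the inputs are arbitrary:
// ToDateTime64TransformSigned<Int64>{3}.execute(-5, lut)            -> 0              (negatives clamp to zero)
// ToDateTime64TransformSigned<Int64>{3}.execute(1234567890, lut)    -> 1234567890000  (seconds * 1000)
// ToDateTime64TransformUnsigned<UInt64>{3}.execute(5000000000, lut) -> 4294967295000  (capped at 0xFFFFFFFF seconds)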
/** Conversion of DateTime64 to Date or DateTime: discards fractional part. /** Conversion of DateTime64 to Date or DateTime: discards fractional part.
*/ */
template <typename Transform> template <typename Transform>
@ -1294,7 +1349,12 @@ public:
bool useDefaultImplementationForNulls() const override { return checked_return_type; } bool useDefaultImplementationForNulls() const override { return checked_return_type; }
bool useDefaultImplementationForConstants() const override { return true; } bool useDefaultImplementationForConstants() const override { return true; }
ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {1}; } ColumnNumbers getArgumentsThatAreAlwaysConstant() const override
{
if constexpr (std::is_same_v<ToDataType, DataTypeDateTime64>)
return {2};
return {1};
}
bool canBeExecutedOnDefaultArguments() const override { return false; } bool canBeExecutedOnDefaultArguments() const override { return false; }
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count) const override ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count) const override
@ -2313,7 +2373,7 @@ private:
using LeftDataType = typename Types::LeftType; using LeftDataType = typename Types::LeftType;
using RightDataType = typename Types::RightType; using RightDataType = typename Types::RightType;
if constexpr (IsDataTypeDecimalOrNumber<LeftDataType> && IsDataTypeDecimalOrNumber<RightDataType>) if constexpr (IsDataTypeDecimalOrNumber<LeftDataType> && IsDataTypeDecimalOrNumber<RightDataType> && !std::is_same_v<DataTypeDateTime64, RightDataType>)
{ {
if (wrapper_cast_type == CastType::accurate) if (wrapper_cast_type == CastType::accurate)
{ {

View File

@ -45,6 +45,41 @@ struct ArrayCumSumImpl
} }
template <typename Src, typename Dst>
static void NO_SANITIZE_UNDEFINED implConst(
size_t size, const IColumn::Offset * __restrict offsets, Dst * __restrict res_values, Src src_value)
{
size_t pos = 0;
for (const auto * end = offsets + size; offsets < end; ++offsets)
{
auto offset = *offsets;
Dst accumulated{};
for (; pos < offset; ++pos)
{
accumulated += src_value;
res_values[pos] = accumulated;
}
}
}
template <typename Src, typename Dst>
static void NO_SANITIZE_UNDEFINED implVector(
size_t size, const IColumn::Offset * __restrict offsets, Dst * __restrict res_values, const Src * __restrict src_values)
{
size_t pos = 0;
for (const auto * end = offsets + size; offsets < end; ++offsets)
{
auto offset = *offsets;
Dst accumulated{};
for (; pos < offset; ++pos)
{
accumulated += src_values[pos];
res_values[pos] = accumulated;
}
}
}
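To make the refactor easier to follow, a hedged trace of the two helpers on invented inputs:
// implVector(2, offsets {3, 5}, res, values {1, 2, 3, 10, 20}) -> res {1, 3, 6, 10, 30}   (the sum restarts at each array boundary)
// implConst (1, offsets {3},    res, src_value 2)              -> res {2, 4, 6}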
template <typename Element, typename Result> template <typename Element, typename Result>
static bool executeType(const ColumnPtr & mapped, const ColumnArray & array, ColumnPtr & res_ptr) static bool executeType(const ColumnPtr & mapped, const ColumnArray & array, ColumnPtr & res_ptr)
{ {
@ -75,19 +110,7 @@ struct ArrayCumSumImpl
typename ColVecResult::Container & res_values = res_nested->getData(); typename ColVecResult::Container & res_values = res_nested->getData();
res_values.resize(column_const->size()); res_values.resize(column_const->size());
implConst(offsets.size(), offsets.data(), res_values.data(), x);
size_t pos = 0;
for (auto offset : offsets)
{
// skip empty arrays
if (pos < offset)
{
res_values[pos++] = x; // NOLINT
for (; pos < offset; ++pos)
res_values[pos] = res_values[pos - 1] + x;
}
}
res_ptr = ColumnArray::create(std::move(res_nested), array.getOffsetsPtr()); res_ptr = ColumnArray::create(std::move(res_nested), array.getOffsetsPtr());
return true; return true;
} }
@ -103,18 +126,7 @@ struct ArrayCumSumImpl
typename ColVecResult::Container & res_values = res_nested->getData(); typename ColVecResult::Container & res_values = res_nested->getData();
res_values.resize(data.size()); res_values.resize(data.size());
implVector(offsets.size(), offsets.data(), res_values.data(), data.data());
size_t pos = 0;
for (auto offset : offsets)
{
// skip empty arrays
if (pos < offset)
{
res_values[pos] = data[pos]; // NOLINT
for (++pos; pos < offset; ++pos)
res_values[pos] = res_values[pos - 1] + data[pos];
}
}
res_ptr = ColumnArray::create(std::move(res_nested), array.getOffsetsPtr()); res_ptr = ColumnArray::create(std::move(res_nested), array.getOffsetsPtr());
return true; return true;

View File

@ -48,6 +48,26 @@ struct ArrayCumSumNonNegativeImpl
} }
template <typename Src, typename Dst>
static void NO_SANITIZE_UNDEFINED implVector(
size_t size, const IColumn::Offset * __restrict offsets, Dst * __restrict res_values, const Src * __restrict src_values)
{
size_t pos = 0;
for (const auto * end = offsets + size; offsets < end; ++offsets)
{
auto offset = *offsets;
Dst accumulated{};
for (; pos < offset; ++pos)
{
accumulated += src_values[pos];
if (accumulated < 0)
accumulated = 0;
res_values[pos] = accumulated;
}
}
}
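The same idea with the non-negative clamp; an illustrative trace:
// implVector(1, offsets {4}, res, values {1, 1, -4, 1}) -> res {1, 2, 0, 1}   (the running sum is reset to 0 when it drops below zero)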
template <typename Element, typename Result> template <typename Element, typename Result>
static bool executeType(const ColumnPtr & mapped, const ColumnArray & array, ColumnPtr & res_ptr) static bool executeType(const ColumnPtr & mapped, const ColumnArray & array, ColumnPtr & res_ptr)
{ {
@ -70,26 +90,7 @@ struct ArrayCumSumNonNegativeImpl
typename ColVecResult::Container & res_values = res_nested->getData(); typename ColVecResult::Container & res_values = res_nested->getData();
res_values.resize(data.size()); res_values.resize(data.size());
implVector(offsets.size(), offsets.data(), res_values.data(), data.data());
size_t pos = 0;
Result accum_sum = 0;
for (auto offset : offsets)
{
// skip empty arrays
if (pos < offset)
{
accum_sum = data[pos] > 0 ? data[pos] : Element(0); // NOLINT
res_values[pos] = accum_sum;
for (++pos; pos < offset; ++pos)
{
accum_sum = accum_sum + data[pos];
if (accum_sum < 0)
accum_sum = 0;
res_values[pos] = accum_sum;
}
}
}
res_ptr = ColumnArray::create(std::move(res_nested), array.getOffsetsPtr()); res_ptr = ColumnArray::create(std::move(res_nested), array.getOffsetsPtr());
return true; return true;

View File

@ -16,6 +16,7 @@ namespace ErrorCodes
extern const int ILLEGAL_COLUMN; extern const int ILLEGAL_COLUMN;
extern const int ILLEGAL_TYPE_OF_ARGUMENT; extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH; extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
extern const int TOO_LARGE_ARRAY_SIZE;
} }
class FunctionMapPopulateSeries : public IFunction class FunctionMapPopulateSeries : public IFunction
@ -188,9 +189,13 @@ private:
} }
} }
static constexpr size_t MAX_ARRAY_SIZE = 1ULL << 30;
if (static_cast<size_t>(max_key - min_key) > MAX_ARRAY_SIZE)
throw Exception(ErrorCodes::TOO_LARGE_ARRAY_SIZE, "Too large array size in the result of function {}", getName());
/* fill the result arrays */ /* fill the result arrays */
KeyType key; KeyType key;
for (key = min_key; key <= max_key; ++key) for (key = min_key;; ++key)
{ {
to_keys_data.insert(key); to_keys_data.insert(key);
@ -205,6 +210,8 @@ private:
} }
++offset; ++offset;
if (key == max_key)
break;
} }
to_keys_offsets.push_back(offset); to_keys_offsets.push_back(offset);
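A hedged note on why the loop now breaks explicitly (my reading, not stated in the diff): if max_key equals the largest value of KeyType, `key <= max_key` never becomes false because `++key` wraps around, so the old form could spin forever; breaking on `key == max_key` before the increment avoids that, and the new MAX_ARRAY_SIZE check bounds the result size.
// Illustrative hazard with the old pattern, for a hypothetical KeyType = UInt8:
// for (UInt8 key = min_key; key <= 255; ++key)   // never terminates: incrementing 255 wraps to 0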

View File

@ -532,7 +532,7 @@ private:
return nullptr; return nullptr;
} }
ColumnPtr executeTuple(const ColumnsWithTypeAndName & arguments, size_t input_rows_count) const ColumnPtr executeTuple(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count) const
{ {
/// Calculate function for each corresponding elements of tuples. /// Calculate function for each corresponding elements of tuples.
@ -558,6 +558,7 @@ private:
const DataTypeTuple & type1 = static_cast<const DataTypeTuple &>(*arg1.type); const DataTypeTuple & type1 = static_cast<const DataTypeTuple &>(*arg1.type);
const DataTypeTuple & type2 = static_cast<const DataTypeTuple &>(*arg2.type); const DataTypeTuple & type2 = static_cast<const DataTypeTuple &>(*arg2.type);
const DataTypeTuple & tuple_result = static_cast<const DataTypeTuple &>(*result_type);
ColumnsWithTypeAndName temporary_columns(3); ColumnsWithTypeAndName temporary_columns(3);
temporary_columns[0] = arguments[0]; temporary_columns[0] = arguments[0];
@ -570,7 +571,7 @@ private:
temporary_columns[1] = {col1_contents[i], type1.getElements()[i], {}}; temporary_columns[1] = {col1_contents[i], type1.getElements()[i], {}};
temporary_columns[2] = {col2_contents[i], type2.getElements()[i], {}}; temporary_columns[2] = {col2_contents[i], type2.getElements()[i], {}};
tuple_columns[i] = executeImpl(temporary_columns, std::make_shared<DataTypeUInt8>(), input_rows_count); tuple_columns[i] = executeImpl(temporary_columns, tuple_result.getElements()[i], input_rows_count);
} }
return ColumnTuple::create(tuple_columns); return ColumnTuple::create(tuple_columns);
@ -988,7 +989,7 @@ public:
|| (res = executeTyped<UInt128, UInt128>(cond_col, arguments, result_type, input_rows_count)) || (res = executeTyped<UInt128, UInt128>(cond_col, arguments, result_type, input_rows_count))
|| (res = executeString(cond_col, arguments, result_type)) || (res = executeString(cond_col, arguments, result_type))
|| (res = executeGenericArray(cond_col, arguments, result_type)) || (res = executeGenericArray(cond_col, arguments, result_type))
|| (res = executeTuple(arguments, input_rows_count)))) || (res = executeTuple(arguments, result_type, input_rows_count))))
{ {
return executeGeneric(cond_col, arguments, input_rows_count); return executeGeneric(cond_col, arguments, input_rows_count);
} }

View File

@ -258,7 +258,7 @@ TEST(NumberTraits, Others)
ASSERT_EQ(getTypeString(DB::NumberTraits::ResultOfFloatingPointDivision<DB::UInt16, DB::Int16>::Type()), "Float64"); ASSERT_EQ(getTypeString(DB::NumberTraits::ResultOfFloatingPointDivision<DB::UInt16, DB::Int16>::Type()), "Float64");
ASSERT_EQ(getTypeString(DB::NumberTraits::ResultOfFloatingPointDivision<DB::UInt32, DB::Int16>::Type()), "Float64"); ASSERT_EQ(getTypeString(DB::NumberTraits::ResultOfFloatingPointDivision<DB::UInt32, DB::Int16>::Type()), "Float64");
ASSERT_EQ(getTypeString(DB::NumberTraits::ResultOfIntegerDivision<DB::UInt8, DB::Int16>::Type()), "Int8"); ASSERT_EQ(getTypeString(DB::NumberTraits::ResultOfIntegerDivision<DB::UInt8, DB::Int16>::Type()), "Int8");
ASSERT_EQ(getTypeString(DB::NumberTraits::ResultOfModulo<DB::UInt32, DB::Int8>::Type()), "Int8"); ASSERT_EQ(getTypeString(DB::NumberTraits::ResultOfModulo<DB::UInt32, DB::Int8>::Type()), "UInt8");
} }
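A hedged rationale for the updated expectation: with an unsigned dividend the remainder is non-negative and strictly smaller than |divisor|, so UInt32 % Int8 fits into UInt8.
// e.g. modulo(UInt32(200), Int8(-3)) -> a value in [0, 2], representable as UInt8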

View File

@ -77,7 +77,7 @@ bool BrotliReadBuffer::nextImpl()
if (in->eof()) if (in->eof())
{ {
eof = true; eof = true;
return working_buffer.size() != 0; return !working_buffer.empty();
} }
else else
{ {

View File

@ -40,6 +40,7 @@ public:
inline Position end() const { return end_pos; } inline Position end() const { return end_pos; }
inline size_t size() const { return size_t(end_pos - begin_pos); } inline size_t size() const { return size_t(end_pos - begin_pos); }
inline void resize(size_t size) { end_pos = begin_pos + size; } inline void resize(size_t size) { end_pos = begin_pos + size; }
inline bool empty() const { return size() == 0; }
inline void swap(Buffer & other) inline void swap(Buffer & other)
{ {

View File

@ -25,11 +25,16 @@ protected:
return false; return false;
/// First reading /// First reading
if (working_buffer.size() == 0 && (*current)->hasPendingData()) if (working_buffer.empty())
{
if ((*current)->hasPendingData())
{ {
working_buffer = Buffer((*current)->position(), (*current)->buffer().end()); working_buffer = Buffer((*current)->position(), (*current)->buffer().end());
return true; return true;
} }
}
else
(*current)->position() = position();
if (!(*current)->next()) if (!(*current)->next())
{ {
@ -51,14 +56,12 @@ protected:
} }
public: public:
ConcatReadBuffer(const ReadBuffers & buffers_) : ReadBuffer(nullptr, 0), buffers(buffers_), current(buffers.begin()) {} explicit ConcatReadBuffer(const ReadBuffers & buffers_) : ReadBuffer(nullptr, 0), buffers(buffers_), current(buffers.begin())
ConcatReadBuffer(ReadBuffer & buf1, ReadBuffer & buf2) : ReadBuffer(nullptr, 0)
{ {
buffers.push_back(&buf1); assert(!buffers.empty());
buffers.push_back(&buf2);
current = buffers.begin();
} }
ConcatReadBuffer(ReadBuffer & buf1, ReadBuffer & buf2) : ConcatReadBuffer({&buf1, &buf2}) {}
}; };
} }
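A hypothetical usage sketch (not part of this commit) showing that the two-buffer constructor now simply delegates to the vector form; ReadBufferFromString and readStringUntilEOF are assumed to come from the usual ClickHouse IO headers:
/// ReadBufferFromString part1("foo"), part2("bar");
/// ConcatReadBuffer concat(part1, part2);    // same as ConcatReadBuffer({&part1, &part2})
/// String all;
/// readStringUntilEOF(all, concat);          // all == "foobar"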

View File

@ -1,10 +1,11 @@
#pragma once #pragma once
#include <IO/ReadBuffer.h>
#include <IO/HashingWriteBuffer.h> #include <IO/HashingWriteBuffer.h>
#include <IO/ReadBuffer.h>
namespace DB namespace DB
{ {
/* /*
* Calculates the hash from the read data. When reading, the data is read from the nested ReadBuffer. * Calculates the hash from the read data. When reading, the data is read from the nested ReadBuffer.
* Small pieces are copied into its own memory. * Small pieces are copied into its own memory.
@ -12,14 +13,14 @@ namespace DB
class HashingReadBuffer : public IHashingBuffer<ReadBuffer> class HashingReadBuffer : public IHashingBuffer<ReadBuffer>
{ {
public: public:
HashingReadBuffer(ReadBuffer & in_, size_t block_size_ = DBMS_DEFAULT_HASHING_BLOCK_SIZE) : explicit HashingReadBuffer(ReadBuffer & in_, size_t block_size_ = DBMS_DEFAULT_HASHING_BLOCK_SIZE)
IHashingBuffer<ReadBuffer>(block_size_), in(in_) : IHashingBuffer<ReadBuffer>(block_size_), in(in_)
{ {
working_buffer = in.buffer(); working_buffer = in.buffer();
pos = in.position(); pos = in.position();
/// calculate hash from the data already read /// calculate hash from the data already read
if (working_buffer.size()) if (!working_buffer.empty())
{ {
calculateHash(pos, working_buffer.end() - pos); calculateHash(pos, working_buffer.end() - pos);
} }
@ -39,7 +40,7 @@ private:
return res; return res;
} }
private:
ReadBuffer & in; ReadBuffer & in;
}; };
} }

View File

@ -66,7 +66,7 @@ bool LZMAInflatingReadBuffer::nextImpl()
if (in->eof()) if (in->eof())
{ {
eof = true; eof = true;
return working_buffer.size() != 0; return !working_buffer.empty();
} }
else else
{ {

View File

@ -1,4 +1,5 @@
#include <IO/LimitReadBuffer.h> #include <IO/LimitReadBuffer.h>
#include <Common/Exception.h> #include <Common/Exception.h>
@ -13,6 +14,8 @@ namespace ErrorCodes
bool LimitReadBuffer::nextImpl() bool LimitReadBuffer::nextImpl()
{ {
assert(position() >= in.position());
/// Let underlying buffer calculate read bytes in `next()` call. /// Let underlying buffer calculate read bytes in `next()` call.
in.position() = position(); in.position() = position();
@ -25,7 +28,10 @@ bool LimitReadBuffer::nextImpl()
} }
if (!in.next()) if (!in.next())
{
working_buffer = in.buffer();
return false; return false;
}
working_buffer = in.buffer(); working_buffer = in.buffer();
@ -50,7 +56,7 @@ LimitReadBuffer::LimitReadBuffer(ReadBuffer & in_, UInt64 limit_, bool throw_exc
LimitReadBuffer::~LimitReadBuffer() LimitReadBuffer::~LimitReadBuffer()
{ {
/// Update underlying buffer's position in case when limit wasn't reached. /// Update underlying buffer's position in case when limit wasn't reached.
if (working_buffer.size() != 0) if (!working_buffer.empty())
in.position() = position(); in.position() = position();
} }

Some files were not shown because too many files have changed in this diff.