Merge branch 'master' into fix_watch_race_testkeeper

Commit 9d17f01dc9 by alesapin, 2020-12-15 13:41:04 +03:00
65 changed files with 1723 additions and 1545 deletions


@@ -1,3 +1,126 @@
## ClickHouse release 20.12
### ClickHouse release v20.12.3.3-stable, 2020-12-13
#### Backward Incompatible Change
* Enable `use_compact_format_in_distributed_parts_names` by default (see the documentation for the reference). [#16728](https://github.com/ClickHouse/ClickHouse/pull/16728) ([Azat Khuzhin](https://github.com/azat)).
* Accept user settings related to file formats (e.g. `format_csv_delimiter`) in the `SETTINGS` clause when creating a table that uses `File` engine, and use these settings in all `INSERT`s and `SELECT`s. The file format settings changed in the current user session, or in the `SETTINGS` clause of a DML query itself, no longer affect the query. [#16591](https://github.com/ClickHouse/ClickHouse/pull/16591) ([Alexander Kuzmenkov](https://github.com/akuzm)).
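As a rough sketch of the `File` engine change above (table, column, and delimiter values are invented for illustration), format settings declared at table creation now stick to the table instead of following the session:

```bash
# Hypothetical example: the delimiter is fixed in the table definition,
# so later session-level changes to format_csv_delimiter no longer apply.
clickhouse-client --query "
    CREATE TABLE csv_events (id UInt64, name String)
    ENGINE = File(CSV)
    SETTINGS format_csv_delimiter = ';'"
clickhouse-client --query "INSERT INTO csv_events VALUES (1, 'first')"
clickhouse-client --query "SELECT * FROM csv_events"
```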
#### New Feature
* Add `*.xz` compression/decompression support. It enables using `*.xz` in the `file()` function. This closes [#8828](https://github.com/ClickHouse/ClickHouse/issues/8828). [#16578](https://github.com/ClickHouse/ClickHouse/pull/16578) ([Abi Palagashvili](https://github.com/fibersel)).
* Introduce the query `ALTER TABLE ... DROP|DETACH PART 'part_name'`. [#15511](https://github.com/ClickHouse/ClickHouse/pull/15511) ([nvartolomei](https://github.com/nvartolomei)).
* Added new ALTER UPDATE/DELETE IN PARTITION syntax. [#13403](https://github.com/ClickHouse/ClickHouse/pull/13403) ([Vladimir Chebotarev](https://github.com/excitoon)).
* Allow formatting named tuples as JSON objects when using JSON input/output formats, controlled by the `output_format_json_named_tuples_as_objects` setting, disabled by default. [#17175](https://github.com/ClickHouse/ClickHouse/pull/17175) ([Alexander Kuzmenkov](https://github.com/akuzm)).
* Add a possibility to input an enum value as its id in TSV and CSV formats by default. [#16834](https://github.com/ClickHouse/ClickHouse/pull/16834) ([Kruglov Pavel](https://github.com/Avogar)).
* Add COLLATE support for Nullable, LowCardinality, Array and Tuple, where nested type is String. Also refactor the code associated with collations in ColumnString.cpp. [#16273](https://github.com/ClickHouse/ClickHouse/pull/16273) ([Kruglov Pavel](https://github.com/Avogar)).
* New `tcpPort` function returns the TCP port this server listens on. [#17134](https://github.com/ClickHouse/ClickHouse/pull/17134) ([Ivan](https://github.com/abyss7)).
* Add new math functions: `acosh`, `asinh`, `atan2`, `atanh`, `cosh`, `hypot`, `log1p`, `sinh`. [#16636](https://github.com/ClickHouse/ClickHouse/pull/16636) ([Konstantin Malanchev](https://github.com/hombit)).
* Possibility to distribute the merges between different replicas. Introduces the `execute_merges_on_single_replica_time_threshold` mergetree setting. [#16424](https://github.com/ClickHouse/ClickHouse/pull/16424) ([filimonov](https://github.com/filimonov)).
* Add setting `aggregate_functions_null_for_empty` for SQL standard compatibility. This option will rewrite all aggregate functions in a query, adding -OrNull suffix to them. Implements [10273](https://github.com/ClickHouse/ClickHouse/issues/10273). [#16123](https://github.com/ClickHouse/ClickHouse/pull/16123) ([flynn](https://github.com/ucasFL)).
* Updated DateTime, DateTime64 parsing to accept string Date literal format. [#16040](https://github.com/ClickHouse/ClickHouse/pull/16040) ([Maksim Kita](https://github.com/kitaisreal)).
* Make it possible to change the path to history file in `clickhouse-client` using the `--history_file` parameter. [#15960](https://github.com/ClickHouse/ClickHouse/pull/15960) ([Maksim Kita](https://github.com/kitaisreal)).
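For a quick feel of a few of the new features above, a hedged sketch from the shell (the file path, table structure, and history location are made up for the example):

```bash
# Custom history file location for clickhouse-client, plus the new tcpPort() function:
clickhouse-client --history_file=/tmp/clickhouse_history --query "SELECT tcpPort()"

# Reading an xz-compressed file through the file() table function
# (the file is assumed to live in the server's user_files directory):
clickhouse-client --query "SELECT count() FROM file('data.csv.xz', 'CSV', 'x UInt64')"
```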
#### Bug Fix
* Fix the issue when server can stop accepting connections in very rare cases. [#17542](https://github.com/ClickHouse/ClickHouse/pull/17542) ([Amos Bird](https://github.com/amosbird)).
* Fixed `Function not implemented` error when executing `RENAME` query in `Atomic` database with ClickHouse running on Windows Subsystem for Linux. Fixes [#17661](https://github.com/ClickHouse/ClickHouse/issues/17661). [#17664](https://github.com/ClickHouse/ClickHouse/pull/17664) ([tavplubix](https://github.com/tavplubix)).
* Do not restore parts from WAL if `in_memory_parts_enable_wal` is disabled. [#17802](https://github.com/ClickHouse/ClickHouse/pull/17802) ([detailyang](https://github.com/detailyang)).
* Fix incorrect initialization of `max_compress_block_size` of MergeTreeWriterSettings with `min_compress_block_size`. [#17833](https://github.com/ClickHouse/ClickHouse/pull/17833) ([flynn](https://github.com/ucasFL)).
* Exception message about max table size to drop was displayed incorrectly. [#17764](https://github.com/ClickHouse/ClickHouse/pull/17764) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fixed possible segfault when there is not enough space when inserting into `Distributed` table. [#17737](https://github.com/ClickHouse/ClickHouse/pull/17737) ([tavplubix](https://github.com/tavplubix)).
* Fixed problem when ClickHouse fails to resume connection to MySQL servers. [#17681](https://github.com/ClickHouse/ClickHouse/pull/17681) ([Alexander Kazakov](https://github.com/Akazz)).
* It might be determined incorrectly whether the cluster is circular- (cross-) replicated or not when executing an `ON CLUSTER` query, due to a race condition when `pool_size` > 1. It's fixed. [#17640](https://github.com/ClickHouse/ClickHouse/pull/17640) ([tavplubix](https://github.com/tavplubix)).
* Exception `fmt::v7::format_error` can be logged in background for MergeTree tables. This fixes [#17613](https://github.com/ClickHouse/ClickHouse/issues/17613). [#17615](https://github.com/ClickHouse/ClickHouse/pull/17615) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* When clickhouse-client is used in interactive mode with multiline queries, a single-line comment was erroneously extended to the end of the query. This fixes [#13654](https://github.com/ClickHouse/ClickHouse/issues/13654). [#17565](https://github.com/ClickHouse/ClickHouse/pull/17565) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix alter query hang when the corresponding mutation was killed on the different replica. Fixes [#16953](https://github.com/ClickHouse/ClickHouse/issues/16953). [#17499](https://github.com/ClickHouse/ClickHouse/pull/17499) ([alesapin](https://github.com/alesapin)).
* Fix issue when mark cache size was underestimated by ClickHouse. It may happen when there are a lot of tiny files with marks. [#17496](https://github.com/ClickHouse/ClickHouse/pull/17496) ([alesapin](https://github.com/alesapin)).
* Fix `ORDER BY` with enabled setting `optimize_redundant_functions_in_order_by`. [#17471](https://github.com/ClickHouse/ClickHouse/pull/17471) ([Anton Popov](https://github.com/CurtizJ)).
* Fix duplicates after `DISTINCT` which were possible because of incorrect optimization. Fixes [#17294](https://github.com/ClickHouse/ClickHouse/issues/17294). [#17296](https://github.com/ClickHouse/ClickHouse/pull/17296) ([li chengxiang](https://github.com/chengxianglibra)). [#17439](https://github.com/ClickHouse/ClickHouse/pull/17439) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix crash while reading from `JOIN` table with `LowCardinality` types. Fixes [#17228](https://github.com/ClickHouse/ClickHouse/issues/17228). [#17397](https://github.com/ClickHouse/ClickHouse/pull/17397) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix `toInt256(inf)` stack overflow. Int256 is an experimental feature. Closed [#17235](https://github.com/ClickHouse/ClickHouse/issues/17235). [#17257](https://github.com/ClickHouse/ClickHouse/pull/17257) ([flynn](https://github.com/ucasFL)).
* Fix possible `Unexpected packet Data received from client` error logged for Distributed queries with `LIMIT`. [#17254](https://github.com/ClickHouse/ClickHouse/pull/17254) ([Azat Khuzhin](https://github.com/azat)).
* Fix set index invalidation when there are const columns in the subquery. This fixes [#17246](https://github.com/ClickHouse/ClickHouse/issues/17246). [#17249](https://github.com/ClickHouse/ClickHouse/pull/17249) ([Amos Bird](https://github.com/amosbird)).
* Fix possible wrong index analysis when the types of the index comparison are different. This fixes [#17122](https://github.com/ClickHouse/ClickHouse/issues/17122). [#17145](https://github.com/ClickHouse/ClickHouse/pull/17145) ([Amos Bird](https://github.com/amosbird)).
* Fix ColumnConst comparison which leads to crash. This fixes [#17088](https://github.com/ClickHouse/ClickHouse/issues/17088). [#17135](https://github.com/ClickHouse/ClickHouse/pull/17135) ([Amos Bird](https://github.com/amosbird)).
* Multiple fixes for MaterializeMySQL (experimental feature). Fixes [#16923](https://github.com/ClickHouse/ClickHouse/issues/16923). Fixes [#15883](https://github.com/ClickHouse/ClickHouse/issues/15883). Fix MaterializeMySQL SYNC failure when modifying the MySQL `binlog_checksum`. [#17091](https://github.com/ClickHouse/ClickHouse/pull/17091) ([Winter Zhang](https://github.com/zhang2014)).
* Fix bug when `ON CLUSTER` queries may hang forever for non-leader ReplicatedMergeTreeTables. [#17089](https://github.com/ClickHouse/ClickHouse/pull/17089) ([alesapin](https://github.com/alesapin)).
* Fixed crash on `CREATE TABLE ... AS some_table` query when `some_table` was created `AS table_function()` Fixes [#16944](https://github.com/ClickHouse/ClickHouse/issues/16944). [#17072](https://github.com/ClickHouse/ClickHouse/pull/17072) ([tavplubix](https://github.com/tavplubix)).
* Fix unfinished implementation of the function `fuzzBits`, related issue: [#16980](https://github.com/ClickHouse/ClickHouse/issues/16980). [#17051](https://github.com/ClickHouse/ClickHouse/pull/17051) ([hexiaoting](https://github.com/hexiaoting)).
* Fix LLVM's libunwind in the case when CFA register is RAX. This is the [bug](https://bugs.llvm.org/show_bug.cgi?id=48186) in [LLVM's libunwind](https://github.com/llvm/llvm-project/tree/master/libunwind). We already have workarounds for this bug. [#17046](https://github.com/ClickHouse/ClickHouse/pull/17046) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Avoid unnecessary network errors for remote queries which may be cancelled while execution, like queries with `LIMIT`. [#17006](https://github.com/ClickHouse/ClickHouse/pull/17006) ([Azat Khuzhin](https://github.com/azat)).
* Fix `optimize_distributed_group_by_sharding_key` setting (that is disabled by default) for query with OFFSET only. [#16996](https://github.com/ClickHouse/ClickHouse/pull/16996) ([Azat Khuzhin](https://github.com/azat)).
* Fix for Merge tables over Distributed tables with JOIN. [#16993](https://github.com/ClickHouse/ClickHouse/pull/16993) ([Azat Khuzhin](https://github.com/azat)).
* Fixed wrong result in big integers (128, 256 bit) when casting from double. Big integers support is experimental. [#16986](https://github.com/ClickHouse/ClickHouse/pull/16986) ([Mike](https://github.com/myrrc)).
* Fix possible server crash after `ALTER TABLE ... MODIFY COLUMN ... NewType` when a `SELECT` has a `WHERE` expression on the altered column and the alter hasn't finished yet. [#16968](https://github.com/ClickHouse/ClickHouse/pull/16968) ([Amos Bird](https://github.com/amosbird)).
* Blame info was not calculated correctly in `clickhouse-git-import`. [#16959](https://github.com/ClickHouse/ClickHouse/pull/16959) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix order by optimization with monotonous functions. Fixes [#16107](https://github.com/ClickHouse/ClickHouse/issues/16107). [#16956](https://github.com/ClickHouse/ClickHouse/pull/16956) ([Anton Popov](https://github.com/CurtizJ)).
* Fix optimization of group by with enabled setting `optimize_aggregators_of_group_by_keys` and joins. Fixes [#12604](https://github.com/ClickHouse/ClickHouse/issues/12604). [#16951](https://github.com/ClickHouse/ClickHouse/pull/16951) ([Anton Popov](https://github.com/CurtizJ)).
* Fix possible error `Illegal type of argument` for queries with `ORDER BY`. Fixes [#16580](https://github.com/ClickHouse/ClickHouse/issues/16580). [#16928](https://github.com/ClickHouse/ClickHouse/pull/16928) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix strange code in InterpreterShowAccessQuery. [#16866](https://github.com/ClickHouse/ClickHouse/pull/16866) ([tavplubix](https://github.com/tavplubix)).
* Prevent clickhouse server crashes when using the function `timeSeriesGroupSum`. The function is removed from newer ClickHouse releases. [#16865](https://github.com/ClickHouse/ClickHouse/pull/16865) ([filimonov](https://github.com/filimonov)).
* Fix rare silent crashes when query profiler is on and ClickHouse is installed on OS with glibc version that has (supposedly) broken asynchronous unwind tables for some functions. This fixes [#15301](https://github.com/ClickHouse/ClickHouse/issues/15301). This fixes [#13098](https://github.com/ClickHouse/ClickHouse/issues/13098). [#16846](https://github.com/ClickHouse/ClickHouse/pull/16846) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix crash when using `any` without any arguments. This is for [#16803](https://github.com/ClickHouse/ClickHouse/issues/16803) . cc @azat. [#16826](https://github.com/ClickHouse/ClickHouse/pull/16826) ([Amos Bird](https://github.com/amosbird)).
* If no memory can be allocated while writing table metadata on disk, broken metadata file can be written. [#16772](https://github.com/ClickHouse/ClickHouse/pull/16772) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix trivial query optimization with partition predicate. [#16767](https://github.com/ClickHouse/ClickHouse/pull/16767) ([Azat Khuzhin](https://github.com/azat)).
* Fix `IN` operator over several columns and tuples with enabled `transform_null_in` setting. Fixes [#15310](https://github.com/ClickHouse/ClickHouse/issues/15310). [#16722](https://github.com/ClickHouse/ClickHouse/pull/16722) ([Anton Popov](https://github.com/CurtizJ)).
* Return number of affected rows for INSERT queries via MySQL protocol. Previously ClickHouse used to always return 0, it's fixed. Fixes [#16605](https://github.com/ClickHouse/ClickHouse/issues/16605). [#16715](https://github.com/ClickHouse/ClickHouse/pull/16715) ([Winter Zhang](https://github.com/zhang2014)).
* Fix remote query failure when using 'if' suffix aggregate function. Fixes [#16574](https://github.com/ClickHouse/ClickHouse/issues/16574) Fixes [#16231](https://github.com/ClickHouse/ClickHouse/issues/16231) [#16610](https://github.com/ClickHouse/ClickHouse/pull/16610) ([Winter Zhang](https://github.com/zhang2014)).
* Fix inconsistent behavior caused by `select_sequential_consistency` for optimized trivial count query and system.tables. [#16309](https://github.com/ClickHouse/ClickHouse/pull/16309) ([Hao Chen](https://github.com/haoch)).
#### Improvement
* Remove empty parts after they were pruned by TTL, mutation, or collapsing merge algorithm. [#16895](https://github.com/ClickHouse/ClickHouse/pull/16895) ([Anton Popov](https://github.com/CurtizJ)).
* Enable compact format of directories for asynchronous sends in Distributed tables: `use_compact_format_in_distributed_parts_names` is set to 1 by default. [#16788](https://github.com/ClickHouse/ClickHouse/pull/16788) ([Azat Khuzhin](https://github.com/azat)).
* Abort multipart upload if no data was written to S3. [#16840](https://github.com/ClickHouse/ClickHouse/pull/16840) ([Pavel Kovalenko](https://github.com/Jokser)).
* Reresolve the IP of the `format_avro_schema_registry_url` in case of errors. [#16985](https://github.com/ClickHouse/ClickHouse/pull/16985) ([filimonov](https://github.com/filimonov)).
* Mask password in data_path in the system.distribution_queue. [#16727](https://github.com/ClickHouse/ClickHouse/pull/16727) ([Azat Khuzhin](https://github.com/azat)).
* Throw an error when a column transformer replaces a non-existing column. [#16183](https://github.com/ClickHouse/ClickHouse/pull/16183) ([hexiaoting](https://github.com/hexiaoting)).
* Turn off parallel parsing when there is not enough memory for all threads to work simultaneously. Also, there could be exceptions like "Memory limit exceeded" when somebody tries to insert extremely huge rows (> `min_chunk_bytes_for_parallel_parsing`), because each piece to parse has to be an independent set of strings (one or more). [#16721](https://github.com/ClickHouse/ClickHouse/pull/16721) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Install script should always create subdirs in config folders. This is only relevant for Docker build with custom config. [#16936](https://github.com/ClickHouse/ClickHouse/pull/16936) ([filimonov](https://github.com/filimonov)).
* Correct grammar in error message in JSONEachRow, JSONCompactEachRow, and RegexpRow input formats. [#17205](https://github.com/ClickHouse/ClickHouse/pull/17205) ([nico piderman](https://github.com/sneako)).
* Set default `host` and `port` parameters for `SOURCE(CLICKHOUSE(...))` to current instance and set default `user` value to `'default'`. [#16997](https://github.com/ClickHouse/ClickHouse/pull/16997) ([vdimir](https://github.com/vdimir)).
* Throw an informative error message when doing `ATTACH/DETACH TABLE <DICTIONARY>`. Before this PR, `detach table <dict>` worked but led to ill-formed in-memory metadata. [#16885](https://github.com/ClickHouse/ClickHouse/pull/16885) ([Amos Bird](https://github.com/amosbird)).
* Add `cutToFirstSignificantSubdomainWithWWW()` (a small usage example follows this list). [#16845](https://github.com/ClickHouse/ClickHouse/pull/16845) ([Azat Khuzhin](https://github.com/azat)).
* The server refuses to start up with an exception message if a wrong config is given (e.g. `metric_log`.`collect_interval_milliseconds` is missing). [#16815](https://github.com/ClickHouse/ClickHouse/pull/16815) ([Ivan](https://github.com/abyss7)).
* Better exception message when configuration for distributed DDL is absent. This fixes [#5075](https://github.com/ClickHouse/ClickHouse/issues/5075). [#16769](https://github.com/ClickHouse/ClickHouse/pull/16769) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Usability improvement: better suggestions in syntax error message when `CODEC` expression is misplaced in `CREATE TABLE` query. This fixes [#12493](https://github.com/ClickHouse/ClickHouse/issues/12493). [#16768](https://github.com/ClickHouse/ClickHouse/pull/16768) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Remove empty directories for async INSERT at start of Distributed engine. [#16729](https://github.com/ClickHouse/ClickHouse/pull/16729) ([Azat Khuzhin](https://github.com/azat)).
* Workaround for using S3 with an nginx server as a proxy. Nginx currently does not accept URLs with an empty path like `http://domain.com?delete`, but vanilla aws-sdk-cpp produces this kind of URL. This commit uses a patched aws-sdk-cpp version, which makes URLs with "/" as the path in these cases, like `http://domain.com/?delete`. [#16709](https://github.com/ClickHouse/ClickHouse/pull/16709) ([ianton-ru](https://github.com/ianton-ru)).
* Allow `reinterpretAs*` functions to work for integers and floats of the same size. Implements [16640](https://github.com/ClickHouse/ClickHouse/issues/16640). [#16657](https://github.com/ClickHouse/ClickHouse/pull/16657) ([flynn](https://github.com/ucasFL)).
* Now, `<auxiliary_zookeepers>` configuration can be changed in `config.xml` and reloaded without a server restart. [#16627](https://github.com/ClickHouse/ClickHouse/pull/16627) ([Amos Bird](https://github.com/amosbird)).
* Support SNI in https connections to remote resources. This will allow to connect to Cloudflare servers that require SNI. This fixes [#10055](https://github.com/ClickHouse/ClickHouse/issues/10055). [#16252](https://github.com/ClickHouse/ClickHouse/pull/16252) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Make it possible to connect to `clickhouse-server` secure endpoint which requires SNI. This is possible when `clickhouse-server` is hosted behind TLS proxy. [#16938](https://github.com/ClickHouse/ClickHouse/pull/16938) ([filimonov](https://github.com/filimonov)).
* Fix possible stack overflow if a loop of materialized views is created. This closes [#15732](https://github.com/ClickHouse/ClickHouse/issues/15732). [#16048](https://github.com/ClickHouse/ClickHouse/pull/16048) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Simplify the implementation of background tasks processing for the MergeTree table engines family. There should be no visible changes for user. [#15983](https://github.com/ClickHouse/ClickHouse/pull/15983) ([alesapin](https://github.com/alesapin)).
* Improvement for MaterializeMySQL (experimental feature). Throw an exception describing the required sync privileges when the MySQL sync user has wrong privileges. [#15977](https://github.com/ClickHouse/ClickHouse/pull/15977) ([TCeason](https://github.com/TCeason)).
* Made `indexOf()` use BloomFilter. [#14977](https://github.com/ClickHouse/ClickHouse/pull/14977) ([achimbab](https://github.com/achimbab)).
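A hedged one-liner to try the new URL helper mentioned above; the expected output assumes the function behaves like `cutToFirstSignificantSubdomain` but keeps the leading `www`:

```bash
# Should print "www.example.com" if the assumption above holds;
# plain cutToFirstSignificantSubdomain would print "example.com".
clickhouse-client --query "SELECT cutToFirstSignificantSubdomainWithWWW('http://www.example.com/path')"
```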
#### Performance Improvement
* Use the Floyd-Rivest algorithm, it is the best for the ClickHouse use case of partial sorting. Benchmarks are in https://github.com/danlark1/miniselect and [here](https://drive.google.com/drive/folders/1DHEaeXgZuX6AJ9eByeZ8iQVQv0ueP8XM). [#16825](https://github.com/ClickHouse/ClickHouse/pull/16825) ([Danila Kutenin](https://github.com/danlark1)).
* Now the `ReplicatedMergeTree` engine family uses a separate thread pool for replicated fetches. The size of the pool is limited by the setting `background_fetches_pool_size`, which can be tuned with a server restart. The default value of the setting is 3, which means that the maximum number of parallel fetches is 3 (enough to utilize a 10G network). Fixes #520. [#16390](https://github.com/ClickHouse/ClickHouse/pull/16390) ([alesapin](https://github.com/alesapin)).
* Fixed uncontrolled growth of the state of `quantileTDigest`. [#16680](https://github.com/ClickHouse/ClickHouse/pull/16680) ([hrissan](https://github.com/hrissan)).
* Add `VIEW` subquery description to `EXPLAIN`. Limit push down optimisation for `VIEW`. Add local replicas of `Distributed` to query plan. [#14936](https://github.com/ClickHouse/ClickHouse/pull/14936) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix optimize_read_in_order/optimize_aggregation_in_order with max_threads > 0 and expression in ORDER BY. [#16637](https://github.com/ClickHouse/ClickHouse/pull/16637) ([Azat Khuzhin](https://github.com/azat)).
* Fix performance of reading from `Merge` tables over huge number of `MergeTree` tables. Fixes [#7748](https://github.com/ClickHouse/ClickHouse/issues/7748). [#16988](https://github.com/ClickHouse/ClickHouse/pull/16988) ([Anton Popov](https://github.com/CurtizJ)).
* Now we can safely prune partitions with an exact match. Useful case: suppose the table is partitioned by `intHash64(x) % 100` and the query has a condition on `intHash64(x) % 100` verbatim, not on `x`. [#16253](https://github.com/ClickHouse/ClickHouse/pull/16253) ([Amos Bird](https://github.com/amosbird)).
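A minimal sketch of that last pruning case, with an invented table; the `SELECT` condition repeats the partition expression verbatim, so only the matching partition should be read:

```bash
# Hypothetical table partitioned by a hash expression; the query filters on the
# same expression, which now allows exact-match partition pruning.
clickhouse-client --multiquery --query "
    CREATE TABLE pruning_demo (x UInt64, v String)
    ENGINE = MergeTree
    PARTITION BY intHash64(x) % 100
    ORDER BY x;
    INSERT INTO pruning_demo SELECT number, toString(number) FROM numbers(1000);
    SELECT count() FROM pruning_demo WHERE intHash64(x) % 100 = 42;"
```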
#### Experimental Feature
* Add `EmbeddedRocksDB` table engine (can be used for dictionaries). [#15073](https://github.com/ClickHouse/ClickHouse/pull/15073) ([sundyli](https://github.com/sundy-li)).
#### Build/Testing/Packaging Improvement
* Improvements in test coverage building images. [#17233](https://github.com/ClickHouse/ClickHouse/pull/17233) ([alesapin](https://github.com/alesapin)).
* Update embedded timezone data to version 2020d (also update cctz to the latest master). [#17204](https://github.com/ClickHouse/ClickHouse/pull/17204) ([filimonov](https://github.com/filimonov)).
* Fix UBSan report in Poco. This closes [#12719](https://github.com/ClickHouse/ClickHouse/issues/12719). [#16765](https://github.com/ClickHouse/ClickHouse/pull/16765) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Do not instrument 3rd-party libraries with UBSan. [#16764](https://github.com/ClickHouse/ClickHouse/pull/16764) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix UBSan report in cache dictionaries. This closes [#12641](https://github.com/ClickHouse/ClickHouse/issues/12641). [#16763](https://github.com/ClickHouse/ClickHouse/pull/16763) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix UBSan report when trying to convert infinite floating point number to integer. This closes [#14190](https://github.com/ClickHouse/ClickHouse/issues/14190). [#16677](https://github.com/ClickHouse/ClickHouse/pull/16677) ([alexey-milovidov](https://github.com/alexey-milovidov)).
## ClickHouse release 20.11
### ClickHouse release v20.11.3.3-stable, 2020-11-13


@@ -41,9 +41,10 @@ if (SANITIZE)
if (COMPILER_CLANG)
set (TSAN_FLAGS "${TSAN_FLAGS} -fsanitize-blacklist=${CMAKE_SOURCE_DIR}/tests/tsan_suppressions.txt")
else()
message (WARNING "TSAN suppressions was not passed to the compiler (since the compiler is not clang)")
message (WARNING "Use the following command to pass them manually:")
message (WARNING " export TSAN_OPTIONS=\"$TSAN_OPTIONS suppressions=${CMAKE_SOURCE_DIR}/tests/tsan_suppressions.txt\"")
set (MESSAGE "TSAN suppressions was not passed to the compiler (since the compiler is not clang)\n")
set (MESSAGE "${MESSAGE}Use the following command to pass them manually:\n")
set (MESSAGE "${MESSAGE} export TSAN_OPTIONS=\"$TSAN_OPTIONS suppressions=${CMAKE_SOURCE_DIR}/tests/tsan_suppressions.txt\"")
message (WARNING "${MESSAGE}")
endif()
@@ -57,8 +58,18 @@ if (SANITIZE)
endif ()
elseif (SANITIZE STREQUAL "undefined")
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${SAN_FLAGS} -fsanitize=undefined -fno-sanitize-recover=all -fno-sanitize=float-divide-by-zero -fsanitize-blacklist=${CMAKE_SOURCE_DIR}/tests/ubsan_suppressions.txt")
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${SAN_FLAGS} -fsanitize=undefined -fno-sanitize-recover=all -fno-sanitize=float-divide-by-zero -fsanitize-blacklist=${CMAKE_SOURCE_DIR}/tests/ubsan_suppressions.txt")
set (UBSAN_FLAGS "-fsanitize=undefined -fno-sanitize-recover=all -fno-sanitize=float-divide-by-zero")
if (COMPILER_CLANG)
set (UBSAN_FLAGS "${UBSAN_FLAGS} -fsanitize-blacklist=${CMAKE_SOURCE_DIR}/tests/ubsan_suppressions.txt")
else()
set (MESSAGE "UBSAN suppressions was not passed to the compiler (since the compiler is not clang)\n")
set (MESSAGE "${MESSAGE}Use the following command to pass them manually:\n")
set (MESSAGE "${MESSAGE} export UBSAN_OPTIONS=\"$UBSAN_OPTIONS suppressions=${CMAKE_SOURCE_DIR}/tests/ubsan_suppressions.txt\"")
message (WARNING "${MESSAGE}")
endif()
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${SAN_FLAGS} ${UBSAN_FLAGS}")
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${SAN_FLAGS} ${UBSAN_FLAGS}")
if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -fsanitize=undefined")
endif()
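
As the warning messages in these hunks suggest, when the build is not done with clang the suppression lists can still be applied at run time through the sanitizer environment variables; a sketch, assuming the sources are checked out under `$HOME/ClickHouse`:

```bash
# Run-time alternative to the compile-time -fsanitize-blacklist flag
# (the path is an assumption about where the source tree is checked out):
export TSAN_OPTIONS="$TSAN_OPTIONS suppressions=$HOME/ClickHouse/tests/tsan_suppressions.txt"
export UBSAN_OPTIONS="$UBSAN_OPTIONS suppressions=$HOME/ClickHouse/tests/ubsan_suppressions.txt"
# ...then run the sanitized clickhouse binary as usual.
```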

contrib/librdkafka (vendored submodule)

@@ -1 +1 @@
Subproject commit 9902bc4fb18bb441fa55ca154b341cdda191e5d3
Subproject commit f2f6616419d567c9198aef0d1133a2e9b4f02276


@@ -148,6 +148,10 @@ def parse_env_variables(build_type, compiler, sanitizer, package_type, image_typ
if split_binary:
cmake_flags.append('-DUSE_STATIC_LIBRARIES=0 -DSPLIT_SHARED_LIBRARIES=1 -DCLICKHOUSE_SPLIT_BINARY=1')
# We can't always build utils because it requires too much space, but
# we have to build them at least in some way in CI. The split build is
# probably the least heavy disk-wise.
cmake_flags.append('-DENABLE_UTILS=1')
if clang_tidy:
cmake_flags.append('-DENABLE_CLANG_TIDY=1')


@@ -15,6 +15,8 @@ For more information and documentation see https://clickhouse.yandex/.
$ docker run -d --name some-clickhouse-server --ulimit nofile=262144:262144 yandex/clickhouse-server
```
By default ClickHouse will be accessible only via docker network. See the [networking section below](#networking).
### connect to it from a native client
```bash
$ docker run -it --rm --link some-clickhouse-server:clickhouse-server yandex/clickhouse-client --host clickhouse-server
@@ -22,6 +24,70 @@ $ docker run -it --rm --link some-clickhouse-server:clickhouse-server yandex/cli
More information about [ClickHouse client](https://clickhouse.yandex/docs/en/interfaces/cli/).
### connect to it using curl
```bash
echo "SELECT 'Hello, ClickHouse!'" | docker run -i --rm --link some-clickhouse-server:clickhouse-server curlimages/curl 'http://clickhouse-server:8123/?query=' -s --data-binary @-
```
More information about [ClickHouse HTTP Interface](https://clickhouse.tech/docs/en/interfaces/http/).
### stopping / removing the container
```bash
$ docker stop some-clickhouse-server
$ docker rm some-clickhouse-server
```
### networking
You can expose your ClickHouse running in docker by [mapping particular ports](https://docs.docker.com/config/containers/container-networking/) from inside the container to host ports:
```bash
$ docker run -d -p 18123:8123 -p19000:9000 --name some-clickhouse-server --ulimit nofile=262144:262144 yandex/clickhouse-server
$ echo 'SELECT version()' | curl 'http://localhost:18123/' --data-binary @-
20.12.3.3
```
or by allowing the container to use [host ports directly](https://docs.docker.com/network/host/) using `--network=host` (which also allows achieving better network performance):
```bash
$ docker run -d --network=host --name some-clickhouse-server --ulimit nofile=262144:262144 yandex/clickhouse-server
$ echo 'SELECT version()' | curl 'http://localhost:8123/' --data-binary @-
20.12.3.3
```
### Volumes
Typically you may want to mount the following folders inside your container to achieve persistency:
* `/var/lib/clickhouse/` - main folder where ClickHouse stores the data
* `/var/log/clickhouse-server/` - logs
```bash
$ docker run -d \
-v $(realpath ./ch_data):/var/lib/clickhouse/ \
-v $(realpath ./ch_logs):/var/log/clickhouse-server/ \
--name some-clickhouse-server --ulimit nofile=262144:262144 yandex/clickhouse-server
```
You may also want to mount:
* `/etc/clickhouse-server/config.d/*.xml` - files with server configuration adjustments
* `/etc/clickhouse-server/users.d/*.xml` - files with user settings adjustments
* `/docker-entrypoint-initdb.d/` - folder with database initialization scripts (see below).
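For example (the database name and script file are invented; this assumes the image's entrypoint picks up `*.sql` and `*.sh` files from `/docker-entrypoint-initdb.d/` on first start):
```bash
# Prepare a hypothetical init script on the host...
mkdir -p ./initdb
cat > ./initdb/01_create_db.sql <<'EOF'
CREATE DATABASE IF NOT EXISTS app;
EOF

# ...and mount it into the container:
docker run -d \
    -v $(realpath ./initdb):/docker-entrypoint-initdb.d/ \
    --name some-clickhouse-server --ulimit nofile=262144:262144 yandex/clickhouse-server
```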
### Linux capabilities
ClickHouse has some advanced functionality which requires enabling several [linux capabilities](https://man7.org/linux/man-pages/man7/capabilities.7.html).
It is optional and can be enabled using the following [docker command line arguments](https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities):
```bash
$ docker run -d \
--cap-add=SYS_NICE --cap-add=NET_ADMIN --cap-add=IPC_LOCK \
--name some-clickhouse-server --ulimit nofile=262144:262144 yandex/clickhouse-server
```
## Configuration
The container exposes port 8123 for the [HTTP interface](https://clickhouse.yandex/docs/en/interfaces/http_interface/) and port 9000 for the [native client](https://clickhouse.yandex/docs/en/interfaces/tcp/).


@@ -3,6 +3,7 @@
<mysql_port remove="remove"/>
<interserver_http_port remove="remove"/>
<tcp_with_proxy_port remove="remove"/>
<test_keeper_server remove="remove"/>
<listen_host>::</listen_host>
<logger>


@@ -41,42 +41,7 @@ namespace
}
void applyParamsToAccessRights(AccessRights & access, const ContextAccessParams & params)
{
static const AccessFlags table_ddl = AccessType::CREATE_DATABASE | AccessType::CREATE_TABLE | AccessType::CREATE_VIEW
| AccessType::ALTER_TABLE | AccessType::ALTER_VIEW | AccessType::DROP_DATABASE | AccessType::DROP_TABLE | AccessType::DROP_VIEW
| AccessType::TRUNCATE;
static const AccessFlags dictionary_ddl = AccessType::CREATE_DICTIONARY | AccessType::DROP_DICTIONARY;
static const AccessFlags table_and_dictionary_ddl = table_ddl | dictionary_ddl;
static const AccessFlags write_table_access = AccessType::INSERT | AccessType::OPTIMIZE;
static const AccessFlags write_dcl_access = AccessType::ACCESS_MANAGEMENT - AccessType::SHOW_ACCESS;
if (params.readonly)
access.revoke(write_table_access | table_and_dictionary_ddl | write_dcl_access | AccessType::SYSTEM | AccessType::KILL_QUERY);
if (params.readonly == 1)
{
/// Table functions are forbidden in readonly mode.
/// For readonly = 2 they're allowed.
access.revoke(AccessType::CREATE_TEMPORARY_TABLE);
}
if (!params.allow_ddl)
access.revoke(table_and_dictionary_ddl);
if (!params.allow_introspection)
access.revoke(AccessType::INTROSPECTION);
if (params.readonly)
{
/// No grant option in readonly mode.
access.revokeGrantOption(AccessType::ALL);
}
}
void addImplicitAccessRights(AccessRights & access)
AccessRights addImplicitAccessRights(const AccessRights & access)
{
auto modifier = [&](const AccessFlags & flags, const AccessFlags & min_flags_with_children, const AccessFlags & max_flags_with_children, const std::string_view & database, const std::string_view & table, const std::string_view & column) -> AccessFlags
{
@@ -150,23 +115,12 @@ namespace
return res;
};
access.modifyFlags(modifier);
/// Transform access to temporary tables into access to "_temporary_and_external_tables" database.
if (access.isGranted(AccessType::CREATE_TEMPORARY_TABLE))
access.grant(AccessFlags::allTableFlags() | AccessFlags::allColumnFlags(), DatabaseCatalog::TEMPORARY_DATABASE);
AccessRights res = access;
res.modifyFlags(modifier);
/// Anyone has access to the "system" database.
access.grant(AccessType::SELECT, DatabaseCatalog::SYSTEM_DATABASE);
}
AccessRights calculateFinalAccessRights(const AccessRights & access_from_user_and_roles, const ContextAccessParams & params)
{
AccessRights res_access = access_from_user_and_roles;
applyParamsToAccessRights(res_access, params);
addImplicitAccessRights(res_access);
return res_access;
res.grant(AccessType::SELECT, DatabaseCatalog::SYSTEM_DATABASE);
return res;
}
@@ -176,6 +130,12 @@ namespace
ids[0] = id;
return ids;
}
/// Helper for using in templates.
std::string_view getDatabase() { return {}; }
template <typename... OtherArgs>
std::string_view getDatabase(const std::string_view & arg1, const OtherArgs &...) { return arg1; }
}
@@ -203,10 +163,7 @@ void ContextAccess::setUser(const UserPtr & user_) const
/// User has been dropped.
auto nothing_granted = std::make_shared<AccessRights>();
access = nothing_granted;
access_without_readonly = nothing_granted;
access_with_allow_ddl = nothing_granted;
access_with_allow_introspection = nothing_granted;
access_from_user_and_roles = nothing_granted;
access_with_implicit = nothing_granted;
subscription_for_user_change = {};
subscription_for_roles_changes = {};
enabled_roles = nullptr;
@@ -270,12 +227,8 @@ void ContextAccess::setRolesInfo(const std::shared_ptr<const EnabledRolesInfo> &
void ContextAccess::calculateAccessRights() const
{
access_from_user_and_roles = std::make_shared<AccessRights>(mixAccessRightsFromUserAndRoles(*user, *roles_info));
access = std::make_shared<AccessRights>(calculateFinalAccessRights(*access_from_user_and_roles, params));
access_without_readonly = nullptr;
access_with_allow_ddl = nullptr;
access_with_allow_introspection = nullptr;
access = std::make_shared<AccessRights>(mixAccessRightsFromUserAndRoles(*user, *roles_info));
access_with_implicit = std::make_shared<AccessRights>(addImplicitAccessRights(*access));
if (trace_log)
{
@@ -287,6 +240,7 @@ void ContextAccess::calculateAccessRights() const
}
LOG_TRACE(trace_log, "Settings: readonly={}, allow_ddl={}, allow_introspection_functions={}", params.readonly, params.allow_ddl, params.allow_introspection);
LOG_TRACE(trace_log, "List of all grants: {}", access->toString());
LOG_TRACE(trace_log, "List of all grants including implicit: {}", access_with_implicit->toString());
}
}
@@ -340,6 +294,7 @@ std::shared_ptr<const ContextAccess> ContextAccess::getFullAccess()
static const std::shared_ptr<const ContextAccess> res = []
{
auto full_access = std::shared_ptr<ContextAccess>(new ContextAccess);
full_access->is_full_access = true;
full_access->access = std::make_shared<AccessRights>(AccessRights::getFullAccess());
full_access->enabled_quota = EnabledQuota::getUnlimitedQuota();
return full_access;
@@ -362,323 +317,303 @@ std::shared_ptr<const SettingsConstraints> ContextAccess::getSettingsConstraints
}
std::shared_ptr<const AccessRights> ContextAccess::getAccess() const
std::shared_ptr<const AccessRights> ContextAccess::getAccessRights() const
{
std::lock_guard lock{mutex};
return access;
}
template <bool grant_option, typename... Args>
bool ContextAccess::isGrantedImpl2(const AccessFlags & flags, const Args &... args) const
std::shared_ptr<const AccessRights> ContextAccess::getAccessRightsWithImplicit() const
{
bool access_granted;
if constexpr (grant_option)
access_granted = getAccess()->hasGrantOption(flags, args...);
else
access_granted = getAccess()->isGranted(flags, args...);
if (trace_log)
LOG_TRACE(trace_log, "Access {}: {}{}", (access_granted ? "granted" : "denied"), (AccessRightsElement{flags, args...}.toString()),
(grant_option ? " WITH GRANT OPTION" : ""));
return access_granted;
std::lock_guard lock{mutex};
return access_with_implicit;
}
template <bool grant_option>
bool ContextAccess::isGrantedImpl(const AccessFlags & flags) const
template <bool throw_if_denied, bool grant_option>
bool ContextAccess::checkAccessImpl(const AccessFlags & flags) const
{
return isGrantedImpl2<grant_option>(flags);
return checkAccessImpl2<throw_if_denied, grant_option>(flags);
}
template <bool grant_option, typename... Args>
bool ContextAccess::isGrantedImpl(const AccessFlags & flags, const std::string_view & database, const Args &... args) const
template <bool throw_if_denied, bool grant_option, typename... Args>
bool ContextAccess::checkAccessImpl(const AccessFlags & flags, const std::string_view & database, const Args &... args) const
{
return isGrantedImpl2<grant_option>(flags, database.empty() ? params.current_database : database, args...);
return checkAccessImpl2<throw_if_denied, grant_option>(flags, database.empty() ? params.current_database : database, args...);
}
template <bool grant_option>
bool ContextAccess::isGrantedImpl(const AccessRightsElement & element) const
template <bool throw_if_denied, bool grant_option>
bool ContextAccess::checkAccessImpl(const AccessRightsElement & element) const
{
if (element.any_database)
return isGrantedImpl<grant_option>(element.access_flags);
return checkAccessImpl<throw_if_denied, grant_option>(element.access_flags);
else if (element.any_table)
return isGrantedImpl<grant_option>(element.access_flags, element.database);
return checkAccessImpl<throw_if_denied, grant_option>(element.access_flags, element.database);
else if (element.any_column)
return isGrantedImpl<grant_option>(element.access_flags, element.database, element.table);
return checkAccessImpl<throw_if_denied, grant_option>(element.access_flags, element.database, element.table);
else
return isGrantedImpl<grant_option>(element.access_flags, element.database, element.table, element.columns);
return checkAccessImpl<throw_if_denied, grant_option>(element.access_flags, element.database, element.table, element.columns);
}
template <bool grant_option>
bool ContextAccess::isGrantedImpl(const AccessRightsElements & elements) const
template <bool throw_if_denied, bool grant_option>
bool ContextAccess::checkAccessImpl(const AccessRightsElements & elements) const
{
for (const auto & element : elements)
if (!isGrantedImpl<grant_option>(element))
if (!checkAccessImpl<throw_if_denied, grant_option>(element))
return false;
return true;
}
bool ContextAccess::isGranted(const AccessFlags & flags) const { return isGrantedImpl<false>(flags); }
bool ContextAccess::isGranted(const AccessFlags & flags, const std::string_view & database) const { return isGrantedImpl<false>(flags, database); }
bool ContextAccess::isGranted(const AccessFlags & flags, const std::string_view & database, const std::string_view & table) const { return isGrantedImpl<false>(flags, database, table); }
bool ContextAccess::isGranted(const AccessFlags & flags, const std::string_view & database, const std::string_view & table, const std::string_view & column) const { return isGrantedImpl<false>(flags, database, table, column); }
bool ContextAccess::isGranted(const AccessFlags & flags, const std::string_view & database, const std::string_view & table, const std::vector<std::string_view> & columns) const { return isGrantedImpl<false>(flags, database, table, columns); }
bool ContextAccess::isGranted(const AccessFlags & flags, const std::string_view & database, const std::string_view & table, const Strings & columns) const { return isGrantedImpl<false>(flags, database, table, columns); }
bool ContextAccess::isGranted(const AccessRightsElement & element) const { return isGrantedImpl<false>(element); }
bool ContextAccess::isGranted(const AccessRightsElements & elements) const { return isGrantedImpl<false>(elements); }
bool ContextAccess::hasGrantOption(const AccessFlags & flags) const { return isGrantedImpl<true>(flags); }
bool ContextAccess::hasGrantOption(const AccessFlags & flags, const std::string_view & database) const { return isGrantedImpl<true>(flags, database); }
bool ContextAccess::hasGrantOption(const AccessFlags & flags, const std::string_view & database, const std::string_view & table) const { return isGrantedImpl<true>(flags, database, table); }
bool ContextAccess::hasGrantOption(const AccessFlags & flags, const std::string_view & database, const std::string_view & table, const std::string_view & column) const { return isGrantedImpl<true>(flags, database, table, column); }
bool ContextAccess::hasGrantOption(const AccessFlags & flags, const std::string_view & database, const std::string_view & table, const std::vector<std::string_view> & columns) const { return isGrantedImpl<true>(flags, database, table, columns); }
bool ContextAccess::hasGrantOption(const AccessFlags & flags, const std::string_view & database, const std::string_view & table, const Strings & columns) const { return isGrantedImpl<true>(flags, database, table, columns); }
bool ContextAccess::hasGrantOption(const AccessRightsElement & element) const { return isGrantedImpl<true>(element); }
bool ContextAccess::hasGrantOption(const AccessRightsElements & elements) const { return isGrantedImpl<true>(elements); }
template <bool grant_option, typename... Args>
void ContextAccess::checkAccessImpl2(const AccessFlags & flags, const Args &... args) const
template <bool throw_if_denied, bool grant_option, typename... Args>
bool ContextAccess::checkAccessImpl2(const AccessFlags & flags, const Args &... args) const
{
if constexpr (grant_option)
auto access_granted = [&]
{
if (hasGrantOption(flags, args...))
return;
}
else
{
if (isGranted(flags, args...))
return;
}
auto show_error = [&](const String & msg, int error_code)
{
throw Exception(user_name + ": " + msg, error_code);
if (trace_log)
LOG_TRACE(trace_log, "Access granted: {}{}", (AccessRightsElement{flags, args...}.toString()),
(grant_option ? " WITH GRANT OPTION" : ""));
return true;
};
std::lock_guard lock{mutex};
if (!user)
show_error("User has been dropped", ErrorCodes::UNKNOWN_USER);
if (grant_option && access->isGranted(flags, args...))
auto access_denied = [&](const String & error_msg, int error_code)
{
show_error(
"Not enough privileges. "
"The required privileges have been granted, but without grant option. "
"To execute this query it's necessary to have the grant "
+ AccessRightsElement{flags, args...}.toString() + " WITH GRANT OPTION",
if (trace_log)
LOG_TRACE(trace_log, "Access denied: {}{}", (AccessRightsElement{flags, args...}.toString()),
(grant_option ? " WITH GRANT OPTION" : ""));
if constexpr (throw_if_denied)
throw Exception(getUserName() + ": " + error_msg, error_code);
return false;
};
if (!flags || is_full_access)
return access_granted();
if (!getUser())
return access_denied("User has been dropped", ErrorCodes::UNKNOWN_USER);
/// If the current user was allowed to create a temporary table
/// then he is allowed to do with it whatever he wants.
if ((sizeof...(args) >= 2) && (getDatabase(args...) == DatabaseCatalog::TEMPORARY_DATABASE))
return access_granted();
auto acs = getAccessRightsWithImplicit();
bool granted;
if constexpr (grant_option)
granted = acs->hasGrantOption(flags, args...);
else
granted = acs->isGranted(flags, args...);
if (!granted)
{
if (grant_option && acs->isGranted(flags, args...))
{
return access_denied(
"Not enough privileges. "
"The required privileges have been granted, but without grant option. "
"To execute this query it's necessary to have grant "
+ AccessRightsElement{flags, args...}.toString() + " WITH GRANT OPTION",
ErrorCodes::ACCESS_DENIED);
}
return access_denied(
"Not enough privileges. To execute this query it's necessary to have grant "
+ AccessRightsElement{flags, args...}.toString() + (grant_option ? " WITH GRANT OPTION" : ""),
ErrorCodes::ACCESS_DENIED);
}
struct PrecalculatedFlags
{
const AccessFlags table_ddl = AccessType::CREATE_DATABASE | AccessType::CREATE_TABLE | AccessType::CREATE_VIEW
| AccessType::ALTER_TABLE | AccessType::ALTER_VIEW | AccessType::DROP_DATABASE | AccessType::DROP_TABLE | AccessType::DROP_VIEW
| AccessType::TRUNCATE;
const AccessFlags dictionary_ddl = AccessType::CREATE_DICTIONARY | AccessType::DROP_DICTIONARY;
const AccessFlags table_and_dictionary_ddl = table_ddl | dictionary_ddl;
const AccessFlags write_table_access = AccessType::INSERT | AccessType::OPTIMIZE;
const AccessFlags write_dcl_access = AccessType::ACCESS_MANAGEMENT - AccessType::SHOW_ACCESS;
const AccessFlags not_readonly_flags = write_table_access | table_and_dictionary_ddl | write_dcl_access | AccessType::SYSTEM | AccessType::KILL_QUERY;
const AccessFlags not_readonly_1_flags = AccessType::CREATE_TEMPORARY_TABLE;
const AccessFlags ddl_flags = table_ddl | dictionary_ddl;
const AccessFlags introspection_flags = AccessType::INTROSPECTION;
};
static const PrecalculatedFlags precalc;
if (params.readonly)
{
if (!access_without_readonly)
{
Params changed_params = params;
changed_params.readonly = 0;
access_without_readonly = std::make_shared<AccessRights>(calculateFinalAccessRights(*access_from_user_and_roles, changed_params));
}
if (access_without_readonly->isGranted(flags, args...))
if constexpr (grant_option)
return access_denied("Cannot change grants in readonly mode.", ErrorCodes::READONLY);
if ((flags & precalc.not_readonly_flags) ||
((params.readonly == 1) && (flags & precalc.not_readonly_1_flags)))
{
if (params.interface == ClientInfo::Interface::HTTP && params.http_method == ClientInfo::HTTPMethod::GET)
show_error(
{
return access_denied(
"Cannot execute query in readonly mode. "
"For queries over HTTP, method GET implies readonly. You should use method POST for modifying queries",
ErrorCodes::READONLY);
}
else
show_error("Cannot execute query in readonly mode", ErrorCodes::READONLY);
return access_denied("Cannot execute query in readonly mode", ErrorCodes::READONLY);
}
}
if (!params.allow_ddl)
if (!params.allow_ddl && !grant_option)
{
if (!access_with_allow_ddl)
{
Params changed_params = params;
changed_params.allow_ddl = true;
access_with_allow_ddl = std::make_shared<AccessRights>(calculateFinalAccessRights(*access_from_user_and_roles, changed_params));
}
if (access_with_allow_ddl->isGranted(flags, args...))
{
show_error("Cannot execute query. DDL queries are prohibited for the user", ErrorCodes::QUERY_IS_PROHIBITED);
}
if (flags & precalc.ddl_flags)
return access_denied("Cannot execute query. DDL queries are prohibited for the user", ErrorCodes::QUERY_IS_PROHIBITED);
}
if (!params.allow_introspection)
if (!params.allow_introspection && !grant_option)
{
if (!access_with_allow_introspection)
{
Params changed_params = params;
changed_params.allow_introspection = true;
access_with_allow_introspection = std::make_shared<AccessRights>(calculateFinalAccessRights(*access_from_user_and_roles, changed_params));
}
if (access_with_allow_introspection->isGranted(flags, args...))
{
show_error("Introspection functions are disabled, because setting 'allow_introspection_functions' is set to 0", ErrorCodes::FUNCTION_NOT_ALLOWED);
}
if (flags & precalc.introspection_flags)
return access_denied("Introspection functions are disabled, because setting 'allow_introspection_functions' is set to 0", ErrorCodes::FUNCTION_NOT_ALLOWED);
}
show_error(
"Not enough privileges. To execute this query it's necessary to have the grant "
+ AccessRightsElement{flags, args...}.toString() + (grant_option ? " WITH GRANT OPTION" : ""),
ErrorCodes::ACCESS_DENIED);
return access_granted();
}
template <bool grant_option>
void ContextAccess::checkAccessImpl(const AccessFlags & flags) const
bool ContextAccess::isGranted(const AccessFlags & flags) const { return checkAccessImpl<false, false>(flags); }
bool ContextAccess::isGranted(const AccessFlags & flags, const std::string_view & database) const { return checkAccessImpl<false, false>(flags, database); }
bool ContextAccess::isGranted(const AccessFlags & flags, const std::string_view & database, const std::string_view & table) const { return checkAccessImpl<false, false>(flags, database, table); }
bool ContextAccess::isGranted(const AccessFlags & flags, const std::string_view & database, const std::string_view & table, const std::string_view & column) const { return checkAccessImpl<false, false>(flags, database, table, column); }
bool ContextAccess::isGranted(const AccessFlags & flags, const std::string_view & database, const std::string_view & table, const std::vector<std::string_view> & columns) const { return checkAccessImpl<false, false>(flags, database, table, columns); }
bool ContextAccess::isGranted(const AccessFlags & flags, const std::string_view & database, const std::string_view & table, const Strings & columns) const { return checkAccessImpl<false, false>(flags, database, table, columns); }
bool ContextAccess::isGranted(const AccessRightsElement & element) const { return checkAccessImpl<false, false>(element); }
bool ContextAccess::isGranted(const AccessRightsElements & elements) const { return checkAccessImpl<false, false>(elements); }
bool ContextAccess::hasGrantOption(const AccessFlags & flags) const { return checkAccessImpl<false, true>(flags); }
bool ContextAccess::hasGrantOption(const AccessFlags & flags, const std::string_view & database) const { return checkAccessImpl<false, true>(flags, database); }
bool ContextAccess::hasGrantOption(const AccessFlags & flags, const std::string_view & database, const std::string_view & table) const { return checkAccessImpl<false, true>(flags, database, table); }
bool ContextAccess::hasGrantOption(const AccessFlags & flags, const std::string_view & database, const std::string_view & table, const std::string_view & column) const { return checkAccessImpl<false, true>(flags, database, table, column); }
bool ContextAccess::hasGrantOption(const AccessFlags & flags, const std::string_view & database, const std::string_view & table, const std::vector<std::string_view> & columns) const { return checkAccessImpl<false, true>(flags, database, table, columns); }
bool ContextAccess::hasGrantOption(const AccessFlags & flags, const std::string_view & database, const std::string_view & table, const Strings & columns) const { return checkAccessImpl<false, true>(flags, database, table, columns); }
bool ContextAccess::hasGrantOption(const AccessRightsElement & element) const { return checkAccessImpl<false, true>(element); }
bool ContextAccess::hasGrantOption(const AccessRightsElements & elements) const { return checkAccessImpl<false, true>(elements); }
void ContextAccess::checkAccess(const AccessFlags & flags) const { checkAccessImpl<true, false>(flags); }
void ContextAccess::checkAccess(const AccessFlags & flags, const std::string_view & database) const { checkAccessImpl<true, false>(flags, database); }
void ContextAccess::checkAccess(const AccessFlags & flags, const std::string_view & database, const std::string_view & table) const { checkAccessImpl<true, false>(flags, database, table); }
void ContextAccess::checkAccess(const AccessFlags & flags, const std::string_view & database, const std::string_view & table, const std::string_view & column) const { checkAccessImpl<true, false>(flags, database, table, column); }
void ContextAccess::checkAccess(const AccessFlags & flags, const std::string_view & database, const std::string_view & table, const std::vector<std::string_view> & columns) const { checkAccessImpl<true, false>(flags, database, table, columns); }
void ContextAccess::checkAccess(const AccessFlags & flags, const std::string_view & database, const std::string_view & table, const Strings & columns) const { checkAccessImpl<true, false>(flags, database, table, columns); }
void ContextAccess::checkAccess(const AccessRightsElement & element) const { checkAccessImpl<true, false>(element); }
void ContextAccess::checkAccess(const AccessRightsElements & elements) const { checkAccessImpl<true, false>(elements); }
void ContextAccess::checkGrantOption(const AccessFlags & flags) const { checkAccessImpl<true, true>(flags); }
void ContextAccess::checkGrantOption(const AccessFlags & flags, const std::string_view & database) const { checkAccessImpl<true, true>(flags, database); }
void ContextAccess::checkGrantOption(const AccessFlags & flags, const std::string_view & database, const std::string_view & table) const { checkAccessImpl<true, true>(flags, database, table); }
void ContextAccess::checkGrantOption(const AccessFlags & flags, const std::string_view & database, const std::string_view & table, const std::string_view & column) const { checkAccessImpl<true, true>(flags, database, table, column); }
void ContextAccess::checkGrantOption(const AccessFlags & flags, const std::string_view & database, const std::string_view & table, const std::vector<std::string_view> & columns) const { checkAccessImpl<true, true>(flags, database, table, columns); }
void ContextAccess::checkGrantOption(const AccessFlags & flags, const std::string_view & database, const std::string_view & table, const Strings & columns) const { checkAccessImpl<true, true>(flags, database, table, columns); }
void ContextAccess::checkGrantOption(const AccessRightsElement & element) const { checkAccessImpl<true, true>(element); }
void ContextAccess::checkGrantOption(const AccessRightsElements & elements) const { checkAccessImpl<true, true>(elements); }
template <bool throw_if_denied>
bool ContextAccess::checkAdminOptionImpl(const UUID & role_id) const
{
checkAccessImpl2<grant_option>(flags);
return checkAdminOptionImpl2<throw_if_denied>(to_array(role_id), [this](const UUID & id, size_t) { return manager->tryReadName(id); });
}
template <bool grant_option, typename... Args>
void ContextAccess::checkAccessImpl(const AccessFlags & flags, const std::string_view & database, const Args &... args) const
template <bool throw_if_denied>
bool ContextAccess::checkAdminOptionImpl(const UUID & role_id, const String & role_name) const
{
checkAccessImpl2<grant_option>(flags, database.empty() ? params.current_database : database, args...);
return checkAdminOptionImpl2<throw_if_denied>(to_array(role_id), [&role_name](const UUID &, size_t) { return std::optional<String>{role_name}; });
}
template <bool grant_option>
void ContextAccess::checkAccessImpl(const AccessRightsElement & element) const
template <bool throw_if_denied>
bool ContextAccess::checkAdminOptionImpl(const UUID & role_id, const std::unordered_map<UUID, String> & names_of_roles) const
{
if (element.any_database)
checkAccessImpl<grant_option>(element.access_flags);
else if (element.any_table)
checkAccessImpl<grant_option>(element.access_flags, element.database);
else if (element.any_column)
checkAccessImpl<grant_option>(element.access_flags, element.database, element.table);
else
checkAccessImpl<grant_option>(element.access_flags, element.database, element.table, element.columns);
return checkAdminOptionImpl2<throw_if_denied>(to_array(role_id), [&names_of_roles](const UUID & id, size_t) { auto it = names_of_roles.find(id); return (it != names_of_roles.end()) ? it->second : std::optional<String>{}; });
}
template <bool grant_option>
void ContextAccess::checkAccessImpl(const AccessRightsElements & elements) const
template <bool throw_if_denied>
bool ContextAccess::checkAdminOptionImpl(const std::vector<UUID> & role_ids) const
{
for (const auto & element : elements)
checkAccessImpl<grant_option>(element);
return checkAdminOptionImpl2<throw_if_denied>(role_ids, [this](const UUID & id, size_t) { return manager->tryReadName(id); });
}
void ContextAccess::checkAccess(const AccessFlags & flags) const { checkAccessImpl<false>(flags); }
void ContextAccess::checkAccess(const AccessFlags & flags, const std::string_view & database) const { checkAccessImpl<false>(flags, database); }
void ContextAccess::checkAccess(const AccessFlags & flags, const std::string_view & database, const std::string_view & table) const { checkAccessImpl<false>(flags, database, table); }
void ContextAccess::checkAccess(const AccessFlags & flags, const std::string_view & database, const std::string_view & table, const std::string_view & column) const { checkAccessImpl<false>(flags, database, table, column); }
void ContextAccess::checkAccess(const AccessFlags & flags, const std::string_view & database, const std::string_view & table, const std::vector<std::string_view> & columns) const { checkAccessImpl<false>(flags, database, table, columns); }
void ContextAccess::checkAccess(const AccessFlags & flags, const std::string_view & database, const std::string_view & table, const Strings & columns) const { checkAccessImpl<false>(flags, database, table, columns); }
void ContextAccess::checkAccess(const AccessRightsElement & element) const { checkAccessImpl<false>(element); }
void ContextAccess::checkAccess(const AccessRightsElements & elements) const { checkAccessImpl<false>(elements); }
void ContextAccess::checkGrantOption(const AccessFlags & flags) const { checkAccessImpl<true>(flags); }
void ContextAccess::checkGrantOption(const AccessFlags & flags, const std::string_view & database) const { checkAccessImpl<true>(flags, database); }
void ContextAccess::checkGrantOption(const AccessFlags & flags, const std::string_view & database, const std::string_view & table) const { checkAccessImpl<true>(flags, database, table); }
void ContextAccess::checkGrantOption(const AccessFlags & flags, const std::string_view & database, const std::string_view & table, const std::string_view & column) const { checkAccessImpl<true>(flags, database, table, column); }
void ContextAccess::checkGrantOption(const AccessFlags & flags, const std::string_view & database, const std::string_view & table, const std::vector<std::string_view> & columns) const { checkAccessImpl<true>(flags, database, table, columns); }
void ContextAccess::checkGrantOption(const AccessFlags & flags, const std::string_view & database, const std::string_view & table, const Strings & columns) const { checkAccessImpl<true>(flags, database, table, columns); }
void ContextAccess::checkGrantOption(const AccessRightsElement & element) const { checkAccessImpl<true>(element); }
void ContextAccess::checkGrantOption(const AccessRightsElements & elements) const { checkAccessImpl<true>(elements); }
template <typename Container, typename GetNameFunction>
bool ContextAccess::checkAdminOptionImpl(bool throw_on_error, const Container & role_ids, const GetNameFunction & get_name_function) const
template <bool throw_if_denied>
bool ContextAccess::checkAdminOptionImpl(const std::vector<UUID> & role_ids, const Strings & names_of_roles) const
{
return checkAdminOptionImpl2<throw_if_denied>(role_ids, [&names_of_roles](const UUID &, size_t i) { return std::optional<String>{names_of_roles[i]}; });
}
template <bool throw_if_denied>
bool ContextAccess::checkAdminOptionImpl(const std::vector<UUID> & role_ids, const std::unordered_map<UUID, String> & names_of_roles) const
{
return checkAdminOptionImpl2<throw_if_denied>(role_ids, [&names_of_roles](const UUID & id, size_t) { auto it = names_of_roles.find(id); return (it != names_of_roles.end()) ? it->second : std::optional<String>{}; });
}
template <bool throw_if_denied, typename Container, typename GetNameFunction>
bool ContextAccess::checkAdminOptionImpl2(const Container & role_ids, const GetNameFunction & get_name_function) const
{
if (!std::size(role_ids) || is_full_access)
return true;
auto show_error = [this](const String & msg, int error_code)
{
UNUSED(this);
if constexpr (throw_if_denied)
throw Exception(getUserName() + ": " + msg, error_code);
};
if (!getUser())
{
show_error("User has been dropped", ErrorCodes::UNKNOWN_USER);
return false;
}
if (isGranted(AccessType::ROLE_ADMIN))
return true;
auto info = getRolesInfo();
if (!info)
{
if (!user)
{
if (throw_on_error)
throw Exception(user_name + ": User has been dropped", ErrorCodes::UNKNOWN_USER);
else
return false;
}
return true;
}
size_t i = 0;
for (auto it = std::begin(role_ids); it != std::end(role_ids); ++it, ++i)
{
const UUID & role_id = *it;
if (info->enabled_roles_with_admin_option.count(role_id))
if (info && info->enabled_roles_with_admin_option.count(role_id))
continue;
auto role_name = get_name_function(role_id, i);
if (!role_name)
role_name = "ID {" + toString(role_id) + "}";
String msg = "To execute this query it's necessary to have the role " + backQuoteIfNeed(*role_name) + " granted with ADMIN option";
if (info->enabled_roles.count(role_id))
msg = "Role " + backQuote(*role_name) + " is granted, but without ADMIN option. " + msg;
if (throw_on_error)
throw Exception(getUserName() + ": Not enough privileges. " + msg, ErrorCodes::ACCESS_DENIED);
else
return false;
if (throw_if_denied)
{
auto role_name = get_name_function(role_id, i);
if (!role_name)
role_name = "ID {" + toString(role_id) + "}";
if (info && info->enabled_roles.count(role_id))
show_error("Not enough privileges. "
"Role " + backQuote(*role_name) + " is granted, but without ADMIN option. "
"To execute this query it's necessary to have the role " + backQuoteIfNeed(*role_name) + " granted with ADMIN option.",
ErrorCodes::ACCESS_DENIED);
else
show_error("Not enough privileges. "
"To execute this query it's necessary to have the role " + backQuoteIfNeed(*role_name) + " granted with ADMIN option.",
ErrorCodes::ACCESS_DENIED);
}
return false;
}
return true;
}
bool ContextAccess::hasAdminOption(const UUID & role_id) const
{
return checkAdminOptionImpl(false, to_array(role_id), [this](const UUID & id, size_t) { return manager->tryReadName(id); });
}
bool ContextAccess::hasAdminOption(const UUID & role_id) const { return checkAdminOptionImpl<false>(role_id); }
bool ContextAccess::hasAdminOption(const UUID & role_id, const String & role_name) const { return checkAdminOptionImpl<false>(role_id, role_name); }
bool ContextAccess::hasAdminOption(const UUID & role_id, const std::unordered_map<UUID, String> & names_of_roles) const { return checkAdminOptionImpl<false>(role_id, names_of_roles); }
bool ContextAccess::hasAdminOption(const std::vector<UUID> & role_ids) const { return checkAdminOptionImpl<false>(role_ids); }
bool ContextAccess::hasAdminOption(const std::vector<UUID> & role_ids, const Strings & names_of_roles) const { return checkAdminOptionImpl<false>(role_ids, names_of_roles); }
bool ContextAccess::hasAdminOption(const std::vector<UUID> & role_ids, const std::unordered_map<UUID, String> & names_of_roles) const { return checkAdminOptionImpl<false>(role_ids, names_of_roles); }
bool ContextAccess::hasAdminOption(const UUID & role_id, const String & role_name) const
{
return checkAdminOptionImpl(false, to_array(role_id), [&role_name](const UUID &, size_t) { return std::optional<String>{role_name}; });
}
bool ContextAccess::hasAdminOption(const UUID & role_id, const std::unordered_map<UUID, String> & names_of_roles) const
{
return checkAdminOptionImpl(false, to_array(role_id), [&names_of_roles](const UUID & id, size_t) { auto it = names_of_roles.find(id); return (it != names_of_roles.end()) ? it->second : std::optional<String>{}; });
}
bool ContextAccess::hasAdminOption(const std::vector<UUID> & role_ids) const
{
return checkAdminOptionImpl(false, role_ids, [this](const UUID & id, size_t) { return manager->tryReadName(id); });
}
bool ContextAccess::hasAdminOption(const std::vector<UUID> & role_ids, const Strings & names_of_roles) const
{
return checkAdminOptionImpl(false, role_ids, [&names_of_roles](const UUID &, size_t i) { return std::optional<String>{names_of_roles[i]}; });
}
bool ContextAccess::hasAdminOption(const std::vector<UUID> & role_ids, const std::unordered_map<UUID, String> & names_of_roles) const
{
return checkAdminOptionImpl(false, role_ids, [&names_of_roles](const UUID & id, size_t) { auto it = names_of_roles.find(id); return (it != names_of_roles.end()) ? it->second : std::optional<String>{}; });
}
void ContextAccess::checkAdminOption(const UUID & role_id) const
{
checkAdminOptionImpl(true, to_array(role_id), [this](const UUID & id, size_t) { return manager->tryReadName(id); });
}
void ContextAccess::checkAdminOption(const UUID & role_id, const String & role_name) const
{
checkAdminOptionImpl(true, to_array(role_id), [&role_name](const UUID &, size_t) { return std::optional<String>{role_name}; });
}
void ContextAccess::checkAdminOption(const UUID & role_id, const std::unordered_map<UUID, String> & names_of_roles) const
{
checkAdminOptionImpl(true, to_array(role_id), [&names_of_roles](const UUID & id, size_t) { auto it = names_of_roles.find(id); return (it != names_of_roles.end()) ? it->second : std::optional<String>{}; });
}
void ContextAccess::checkAdminOption(const std::vector<UUID> & role_ids) const
{
checkAdminOptionImpl(true, role_ids, [this](const UUID & id, size_t) { return manager->tryReadName(id); });
}
void ContextAccess::checkAdminOption(const std::vector<UUID> & role_ids, const Strings & names_of_roles) const
{
checkAdminOptionImpl(true, role_ids, [&names_of_roles](const UUID &, size_t i) { return std::optional<String>{names_of_roles[i]}; });
}
void ContextAccess::checkAdminOption(const std::vector<UUID> & role_ids, const std::unordered_map<UUID, String> & names_of_roles) const
{
checkAdminOptionImpl(true, role_ids, [&names_of_roles](const UUID & id, size_t) { auto it = names_of_roles.find(id); return (it != names_of_roles.end()) ? it->second : std::optional<String>{}; });
}
void ContextAccess::checkAdminOption(const UUID & role_id) const { checkAdminOptionImpl<true>(role_id); }
void ContextAccess::checkAdminOption(const UUID & role_id, const String & role_name) const { checkAdminOptionImpl<true>(role_id, role_name); }
void ContextAccess::checkAdminOption(const UUID & role_id, const std::unordered_map<UUID, String> & names_of_roles) const { checkAdminOptionImpl<true>(role_id, names_of_roles); }
void ContextAccess::checkAdminOption(const std::vector<UUID> & role_ids) const { checkAdminOptionImpl<true>(role_ids); }
void ContextAccess::checkAdminOption(const std::vector<UUID> & role_ids, const Strings & names_of_roles) const { checkAdminOptionImpl<true>(role_ids, names_of_roles); }
void ContextAccess::checkAdminOption(const std::vector<UUID> & role_ids, const std::unordered_map<UUID, String> & names_of_roles) const { checkAdminOptionImpl<true>(role_ids, names_of_roles); }
}
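The refactoring above replaces the runtime `throw_on_error` flag with the compile-time template parameters `throw_if_denied` and `grant_option`, so the `isGranted`/`hasAdminOption` entry points (return false on denial) and the `checkAccess`/`checkAdminOption` entry points (throw on denial) all funnel into a single implementation. A minimal standalone sketch of that dispatch pattern, using hypothetical names rather than the real ClickHouse classes:

#include <iostream>
#include <stdexcept>
#include <string>

/// Illustrative sketch only: one templated implementation, two front ends.
class AccessChecker
{
public:
    /// hasPermission(): returns false when denied; checkPermission(): throws when denied.
    bool hasPermission(const std::string & name) const { return checkImpl<false>(name); }
    void checkPermission(const std::string & name) const { checkImpl<true>(name); }

private:
    template <bool throw_if_denied>
    bool checkImpl(const std::string & name) const
    {
        const bool granted = (name == "SELECT");    /// stand-in for the real grants lookup
        if (granted)
            return true;
        if constexpr (throw_if_denied)
            throw std::runtime_error("Not enough privileges: " + name);
        return false;
    }
};

int main()
{
    AccessChecker checker;
    std::cout << checker.hasPermission("INSERT") << '\n';    /// prints 0 without throwing
    checker.checkPermission("SELECT");                        /// passes silently
}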


@ -96,7 +96,8 @@ public:
std::shared_ptr<const SettingsConstraints> getSettingsConstraints() const;
/// Returns the current access rights.
std::shared_ptr<const AccessRights> getAccess() const;
std::shared_ptr<const AccessRights> getAccessRights() const;
std::shared_ptr<const AccessRights> getAccessRightsWithImplicit() const;
/// Checks if a specified access is granted.
bool isGranted(const AccessFlags & flags) const;
@ -166,41 +167,45 @@ private:
void setSettingsAndConstraints() const;
void calculateAccessRights() const;
template <bool grant_option>
bool isGrantedImpl(const AccessFlags & flags) const;
template <bool throw_if_denied, bool grant_option>
bool checkAccessImpl(const AccessFlags & flags) const;
template <bool grant_option, typename... Args>
bool isGrantedImpl(const AccessFlags & flags, const std::string_view & database, const Args &... args) const;
template <bool throw_if_denied, bool grant_option, typename... Args>
bool checkAccessImpl(const AccessFlags & flags, const std::string_view & database, const Args &... args) const;
template <bool grant_option>
bool isGrantedImpl(const AccessRightsElement & element) const;
template <bool throw_if_denied, bool grant_option>
bool checkAccessImpl(const AccessRightsElement & element) const;
template <bool grant_option>
bool isGrantedImpl(const AccessRightsElements & elements) const;
template <bool throw_if_denied, bool grant_option>
bool checkAccessImpl(const AccessRightsElements & elements) const;
template <bool grant_option, typename... Args>
bool isGrantedImpl2(const AccessFlags & flags, const Args &... args) const;
template <bool throw_if_denied, bool grant_option, typename... Args>
bool checkAccessImpl2(const AccessFlags & flags, const Args &... args) const;
template <bool grant_option>
void checkAccessImpl(const AccessFlags & flags) const;
template <bool throw_if_denied>
bool checkAdminOptionImpl(const UUID & role_id) const;
template <bool grant_option, typename... Args>
void checkAccessImpl(const AccessFlags & flags, const std::string_view & database, const Args &... args) const;
template <bool throw_if_denied>
bool checkAdminOptionImpl(const UUID & role_id, const String & role_name) const;
template <bool grant_option>
void checkAccessImpl(const AccessRightsElement & element) const;
template <bool throw_if_denied>
bool checkAdminOptionImpl(const UUID & role_id, const std::unordered_map<UUID, String> & names_of_roles) const;
template <bool grant_option>
void checkAccessImpl(const AccessRightsElements & elements) const;
template <bool throw_if_denied>
bool checkAdminOptionImpl(const std::vector<UUID> & role_ids) const;
template <bool grant_option, typename... Args>
void checkAccessImpl2(const AccessFlags & flags, const Args &... args) const;
template <bool throw_if_denied>
bool checkAdminOptionImpl(const std::vector<UUID> & role_ids, const Strings & names_of_roles) const;
template <typename Container, typename GetNameFunction>
bool checkAdminOptionImpl(bool throw_on_error, const Container & role_ids, const GetNameFunction & get_name_function) const;
template <bool throw_if_denied>
bool checkAdminOptionImpl(const std::vector<UUID> & role_ids, const std::unordered_map<UUID, String> & names_of_roles) const;
template <bool throw_if_denied, typename Container, typename GetNameFunction>
bool checkAdminOptionImpl2(const Container & role_ids, const GetNameFunction & get_name_function) const;
const AccessControlManager * manager = nullptr;
const Params params;
bool is_full_access = false;
mutable Poco::Logger * trace_log = nullptr;
mutable UserPtr user;
mutable String user_name;
@ -209,13 +214,10 @@ private:
mutable ext::scope_guard subscription_for_roles_changes;
mutable std::shared_ptr<const EnabledRolesInfo> roles_info;
mutable std::shared_ptr<const AccessRights> access;
mutable std::shared_ptr<const AccessRights> access_with_implicit;
mutable std::shared_ptr<const EnabledRowPolicies> enabled_row_policies;
mutable std::shared_ptr<const EnabledQuota> enabled_quota;
mutable std::shared_ptr<const EnabledSettings> enabled_settings;
mutable std::shared_ptr<const AccessRights> access_without_readonly;
mutable std::shared_ptr<const AccessRights> access_with_allow_ddl;
mutable std::shared_ptr<const AccessRights> access_with_allow_introspection;
mutable std::shared_ptr<const AccessRights> access_from_user_and_roles;
mutable std::mutex mutex;
};


@ -33,7 +33,7 @@ struct AvgFraction
/// Allow division by zero as sometimes we need to return NaN.
/// Invoked only if either Numerator or Denominator is Decimal.
Float64 NO_SANITIZE_UNDEFINED divideIfAnyDecimal(UInt32 num_scale, UInt32 denom_scale) const
Float64 NO_SANITIZE_UNDEFINED divideIfAnyDecimal(UInt32 num_scale, UInt32 denom_scale [[maybe_unused]]) const
{
if constexpr (IsDecimalNumber<Numerator> && IsDecimalNumber<Denominator>)
{


@ -65,6 +65,7 @@ class IColumn;
M(UInt64, distributed_connections_pool_size, DBMS_DEFAULT_DISTRIBUTED_CONNECTIONS_POOL_SIZE, "Maximum number of connections with one remote server in the pool.", 0) \
M(UInt64, connections_with_failover_max_tries, DBMS_CONNECTION_POOL_WITH_FAILOVER_DEFAULT_MAX_TRIES, "The maximum number of attempts to connect to replicas.", 0) \
M(UInt64, s3_min_upload_part_size, 512*1024*1024, "The minimum size of part to upload during multipart upload to S3.", 0) \
M(UInt64, s3_max_single_part_upload_size, 64*1024*1024, "The maximum size of object to upload using singlepart upload to S3.", 0) \
M(UInt64, s3_max_redirects, 10, "Max number of S3 redirects hops allowed.", 0) \
M(Bool, extremes, false, "Calculate minimums and maximums of the result columns. They can be output in JSON-formats.", IMPORTANT) \
M(Bool, use_uncompressed_cache, true, "Whether to use the cache of uncompressed blocks.", 0) \
@ -407,8 +408,6 @@ class IColumn;
\
M(UInt64, max_memory_usage_for_all_queries, 0, "Obsolete. Will be removed after 2020-10-20", 0) \
M(UInt64, multiple_joins_rewriter_version, 0, "Obsolete setting, does nothing. Will be removed after 2021-03-31", 0) \
M(Bool, experimental_use_processors, true, "Obsolete setting, does nothing. Will be removed after 2020-11-29.", 0) \
M(Bool, force_optimize_skip_unused_shards_no_nested, false, "Obsolete setting, does nothing. Will be removed after 2020-12-01. Use force_optimize_skip_unused_shards_nesting instead.", 0) \
M(Bool, enable_debug_queries, false, "Enabled debug queries, but now is obsolete", 0) \
M(Bool, allow_experimental_database_atomic, true, "Obsolete setting, does nothing. Will be removed after 2021-02-12", 0) \
M(UnionMode, union_default_mode, UnionMode::DISTINCT, "Set default Union Mode in SelectWithUnion query. Possible values: empty string, 'ALL', 'DISTINCT'. If empty, query without Union Mode will throw exception.", 0)


@ -35,8 +35,8 @@ public:
};
DatabaseAtomic::DatabaseAtomic(String name_, String metadata_path_, UUID uuid, Context & context_)
: DatabaseOrdinary(name_, std::move(metadata_path_), "store/", "DatabaseAtomic (" + name_ + ")", context_)
DatabaseAtomic::DatabaseAtomic(String name_, String metadata_path_, UUID uuid, const String & logger_name, const Context & context_)
: DatabaseOrdinary(name_, std::move(metadata_path_), "store/", logger_name, context_)
, path_to_table_symlinks(global_context.getPath() + "data/" + escapeForFileName(name_) + "/")
, path_to_metadata_symlink(global_context.getPath() + "metadata/" + escapeForFileName(name_))
, db_uuid(uuid)
@ -46,6 +46,11 @@ DatabaseAtomic::DatabaseAtomic(String name_, String metadata_path_, UUID uuid, C
tryCreateMetadataSymlink();
}
DatabaseAtomic::DatabaseAtomic(String name_, String metadata_path_, UUID uuid, const Context & context_)
: DatabaseAtomic(name_, std::move(metadata_path_), uuid, "DatabaseAtomic (" + name_ + ")", context_)
{
}
String DatabaseAtomic::getTableDataPath(const String & table_name) const
{
std::lock_guard lock(mutex);


@ -20,8 +20,8 @@ namespace DB
class DatabaseAtomic : public DatabaseOrdinary
{
public:
DatabaseAtomic(String name_, String metadata_path_, UUID uuid, Context & context_);
DatabaseAtomic(String name_, String metadata_path_, UUID uuid, const String & logger_name, const Context & context_);
DatabaseAtomic(String name_, String metadata_path_, UUID uuid, const Context & context_);
String getEngineName() const override { return "Atomic"; }
UUID getUUID() const override { return db_uuid; }
@ -51,14 +51,14 @@ public:
void loadStoredObjects(Context & context, bool has_force_restore_data_flag, bool force_attach) override;
/// Atomic database cannot be detached if there is a detached table which is still in use
void assertCanBeDetached(bool cleanup);
void assertCanBeDetached(bool cleanup) override;
UUID tryGetTableUUID(const String & table_name) const override;
void tryCreateSymlink(const String & table_name, const String & actual_data_path, bool if_data_path_exist = false);
void tryRemoveSymlink(const String & table_name);
void waitDetachedTableNotInUse(const UUID & uuid);
void waitDetachedTableNotInUse(const UUID & uuid) override;
private:
void commitAlterTable(const StorageID & table_id, const String & table_metadata_tmp_path, const String & table_metadata_path) override;


@ -120,27 +120,32 @@ DatabasePtr DatabaseFactory::getImpl(const ASTCreateQuery & create, const String
const auto & [remote_host_name, remote_port] = parseAddress(host_name_and_port, 3306);
auto mysql_pool = mysqlxx::Pool(mysql_database_name, remote_host_name, mysql_user_name, mysql_user_password, remote_port);
if (engine_name == "MaterializeMySQL")
if (engine_name == "MySQL")
{
MySQLClient client(remote_host_name, remote_port, mysql_user_name, mysql_user_password);
auto mysql_database_settings = std::make_unique<ConnectionMySQLSettings>();
auto materialize_mode_settings = std::make_unique<MaterializeMySQLSettings>();
mysql_database_settings->loadFromQueryContext(context);
mysql_database_settings->loadFromQuery(*engine_define); /// higher priority
if (engine_define->settings)
materialize_mode_settings->loadFromQuery(*engine_define);
return std::make_shared<DatabaseMaterializeMySQL>(
context, database_name, metadata_path, engine_define, mysql_database_name, std::move(mysql_pool), std::move(client)
, std::move(materialize_mode_settings));
return std::make_shared<DatabaseConnectionMySQL>(
context, database_name, metadata_path, engine_define, mysql_database_name, std::move(mysql_database_settings), std::move(mysql_pool));
}
auto mysql_database_settings = std::make_unique<ConnectionMySQLSettings>();
MySQLClient client(remote_host_name, remote_port, mysql_user_name, mysql_user_password);
mysql_database_settings->loadFromQueryContext(context);
mysql_database_settings->loadFromQuery(*engine_define); /// higher priority
auto materialize_mode_settings = std::make_unique<MaterializeMySQLSettings>();
return std::make_shared<DatabaseConnectionMySQL>(
context, database_name, metadata_path, engine_define, mysql_database_name, std::move(mysql_database_settings), std::move(mysql_pool));
if (engine_define->settings)
materialize_mode_settings->loadFromQuery(*engine_define);
if (create.uuid == UUIDHelpers::Nil)
return std::make_shared<DatabaseMaterializeMySQL<DatabaseOrdinary>>(
context, database_name, metadata_path, uuid, mysql_database_name, std::move(mysql_pool), std::move(client)
, std::move(materialize_mode_settings));
else
return std::make_shared<DatabaseMaterializeMySQL<DatabaseAtomic>>(
context, database_name, metadata_path, uuid, mysql_database_name, std::move(mysql_pool), std::move(client)
, std::move(materialize_mode_settings));
}
catch (...)
{


@ -400,7 +400,7 @@ void DatabaseOnDisk::iterateMetadataFiles(const Context & context, const Iterati
{
auto process_tmp_drop_metadata_file = [&](const String & file_name)
{
assert(getEngineName() != "Atomic");
assert(getUUID() == UUIDHelpers::Nil);
static const char * tmp_drop_ext = ".sql.tmp_drop";
const std::string object_name = file_name.substr(0, file_name.size() - strlen(tmp_drop_ext));
if (Poco::File(context.getPath() + getDataPath() + '/' + object_name).exists())


@ -80,7 +80,7 @@ StoragePtr DatabaseWithOwnTablesBase::detachTableUnlocked(const String & table_n
auto table_id = res->getStorageID();
if (table_id.hasUUID())
{
assert(database_name == DatabaseCatalog::TEMPORARY_DATABASE || getEngineName() == "Atomic");
assert(database_name == DatabaseCatalog::TEMPORARY_DATABASE || getUUID() != UUIDHelpers::Nil);
DatabaseCatalog::instance().removeUUIDMapping(table_id.uuid);
}
@ -102,7 +102,7 @@ void DatabaseWithOwnTablesBase::attachTableUnlocked(const String & table_name, c
if (table_id.hasUUID())
{
assert(database_name == DatabaseCatalog::TEMPORARY_DATABASE || getEngineName() == "Atomic");
assert(database_name == DatabaseCatalog::TEMPORARY_DATABASE || getUUID() != UUIDHelpers::Nil);
DatabaseCatalog::instance().addUUIDMapping(table_id.uuid, shared_from_this(), table);
}
@ -131,7 +131,7 @@ void DatabaseWithOwnTablesBase::shutdown()
kv.second->shutdown();
if (table_id.hasUUID())
{
assert(getDatabaseName() == DatabaseCatalog::TEMPORARY_DATABASE || getEngineName() == "Atomic");
assert(getDatabaseName() == DatabaseCatalog::TEMPORARY_DATABASE || getUUID() != UUIDHelpers::Nil);
DatabaseCatalog::instance().removeUUIDMapping(table_id.uuid);
}
}


@ -334,6 +334,10 @@ public:
/// All tables and dictionaries should be detached before detaching the database.
virtual bool shouldBeEmptyOnDetach() const { return true; }
virtual void assertCanBeDetached(bool /*cleanup*/) {}
virtual void waitDetachedTableNotInUse(const UUID & /*uuid*/) { assert(false); }
/// Ask all tables to complete the background threads they are using and delete all table objects.
virtual void shutdown() = 0;


@ -8,6 +8,7 @@
# include <Interpreters/Context.h>
# include <Databases/DatabaseOrdinary.h>
# include <Databases/DatabaseAtomic.h>
# include <Databases/MySQL/DatabaseMaterializeTablesIterator.h>
# include <Databases/MySQL/MaterializeMySQLSyncThread.h>
# include <Parsers/ASTCreateQuery.h>
@ -22,21 +23,37 @@ namespace DB
namespace ErrorCodes
{
extern const int NOT_IMPLEMENTED;
extern const int LOGICAL_ERROR;
}
DatabaseMaterializeMySQL::DatabaseMaterializeMySQL(
const Context & context, const String & database_name_, const String & metadata_path_, const IAST * database_engine_define_
, const String & mysql_database_name_, mysqlxx::Pool && pool_, MySQLClient && client_, std::unique_ptr<MaterializeMySQLSettings> settings_)
: IDatabase(database_name_), global_context(context.getGlobalContext()), engine_define(database_engine_define_->clone())
, nested_database(std::make_shared<DatabaseOrdinary>(database_name_, metadata_path_, context))
, settings(std::move(settings_)), log(&Poco::Logger::get("DatabaseMaterializeMySQL"))
template<>
DatabaseMaterializeMySQL<DatabaseOrdinary>::DatabaseMaterializeMySQL(
const Context & context, const String & database_name_, const String & metadata_path_, UUID /*uuid*/,
const String & mysql_database_name_, mysqlxx::Pool && pool_, MySQLClient && client_, std::unique_ptr<MaterializeMySQLSettings> settings_)
: DatabaseOrdinary(database_name_
, metadata_path_
, "data/" + escapeForFileName(database_name_) + "/"
, "DatabaseMaterializeMySQL<Ordinary> (" + database_name_ + ")", context
)
, settings(std::move(settings_))
, materialize_thread(context, database_name_, mysql_database_name_, std::move(pool_), std::move(client_), settings.get())
{
}
void DatabaseMaterializeMySQL::rethrowExceptionIfNeed() const
template<>
DatabaseMaterializeMySQL<DatabaseAtomic>::DatabaseMaterializeMySQL(
const Context & context, const String & database_name_, const String & metadata_path_, UUID uuid,
const String & mysql_database_name_, mysqlxx::Pool && pool_, MySQLClient && client_, std::unique_ptr<MaterializeMySQLSettings> settings_)
: DatabaseAtomic(database_name_, metadata_path_, uuid, "DatabaseMaterializeMySQL<Atomic> (" + database_name_ + ")", context)
, settings(std::move(settings_))
, materialize_thread(context, database_name_, mysql_database_name_, std::move(pool_), std::move(client_), settings.get())
{
std::unique_lock<std::mutex> lock(mutex);
}
template<typename Base>
void DatabaseMaterializeMySQL<Base>::rethrowExceptionIfNeed() const
{
std::unique_lock<std::mutex> lock(Base::mutex);
if (!settings->allows_query_when_mysql_lost && exception)
{
@ -46,129 +63,71 @@ void DatabaseMaterializeMySQL::rethrowExceptionIfNeed() const
}
catch (Exception & ex)
{
/// This method can be called from multiple threads
/// and Exception can be modified concurrently by calling addMessage(...),
/// so we rethrow a copy.
throw Exception(ex);
}
}
}
void DatabaseMaterializeMySQL::setException(const std::exception_ptr & exception_)
template<typename Base>
void DatabaseMaterializeMySQL<Base>::setException(const std::exception_ptr & exception_)
{
std::unique_lock<std::mutex> lock(mutex);
std::unique_lock<std::mutex> lock(Base::mutex);
exception = exception_;
}
ASTPtr DatabaseMaterializeMySQL::getCreateDatabaseQuery() const
{
const auto & create_query = std::make_shared<ASTCreateQuery>();
create_query->database = database_name;
create_query->set(create_query->storage, engine_define);
return create_query;
}
void DatabaseMaterializeMySQL::loadStoredObjects(Context & context, bool has_force_restore_data_flag, bool force_attach)
template<typename Base>
void DatabaseMaterializeMySQL<Base>::loadStoredObjects(Context & context, bool has_force_restore_data_flag, bool force_attach)
{
Base::loadStoredObjects(context, has_force_restore_data_flag, force_attach);
try
{
std::unique_lock<std::mutex> lock(mutex);
nested_database->loadStoredObjects(context, has_force_restore_data_flag, force_attach);
materialize_thread.startSynchronization();
started_up = true;
}
catch (...)
{
tryLogCurrentException(log, "Cannot load MySQL nested database stored objects.");
tryLogCurrentException(Base::log, "Cannot load MySQL nested database stored objects.");
if (!force_attach)
throw;
}
}
void DatabaseMaterializeMySQL::shutdown()
template<typename Base>
void DatabaseMaterializeMySQL<Base>::createTable(const Context & context, const String & name, const StoragePtr & table, const ASTPtr & query)
{
materialize_thread.stopSynchronization();
auto iterator = nested_database->getTablesIterator(global_context, {});
/// We only shut down the tables here; the table objects are cleaned up when the database is destructed
for (; iterator->isValid(); iterator->next())
iterator->table()->shutdown();
assertCalledFromSyncThreadOrDrop("create table");
Base::createTable(context, name, table, query);
}
bool DatabaseMaterializeMySQL::empty() const
template<typename Base>
void DatabaseMaterializeMySQL<Base>::dropTable(const Context & context, const String & name, bool no_delay)
{
return nested_database->empty();
assertCalledFromSyncThreadOrDrop("drop table");
Base::dropTable(context, name, no_delay);
}
String DatabaseMaterializeMySQL::getDataPath() const
template<typename Base>
void DatabaseMaterializeMySQL<Base>::attachTable(const String & name, const StoragePtr & table, const String & relative_table_path)
{
return nested_database->getDataPath();
assertCalledFromSyncThreadOrDrop("attach table");
Base::attachTable(name, table, relative_table_path);
}
String DatabaseMaterializeMySQL::getMetadataPath() const
template<typename Base>
StoragePtr DatabaseMaterializeMySQL<Base>::detachTable(const String & name)
{
return nested_database->getMetadataPath();
assertCalledFromSyncThreadOrDrop("detach table");
return Base::detachTable(name);
}
String DatabaseMaterializeMySQL::getTableDataPath(const String & table_name) const
template<typename Base>
void DatabaseMaterializeMySQL<Base>::renameTable(const Context & context, const String & name, IDatabase & to_database, const String & to_name, bool exchange, bool dictionary)
{
return nested_database->getTableDataPath(table_name);
}
String DatabaseMaterializeMySQL::getTableDataPath(const ASTCreateQuery & query) const
{
return nested_database->getTableDataPath(query);
}
String DatabaseMaterializeMySQL::getObjectMetadataPath(const String & table_name) const
{
return nested_database->getObjectMetadataPath(table_name);
}
UUID DatabaseMaterializeMySQL::tryGetTableUUID(const String & table_name) const
{
return nested_database->tryGetTableUUID(table_name);
}
time_t DatabaseMaterializeMySQL::getObjectMetadataModificationTime(const String & name) const
{
return nested_database->getObjectMetadataModificationTime(name);
}
void DatabaseMaterializeMySQL::createTable(const Context & context, const String & name, const StoragePtr & table, const ASTPtr & query)
{
if (!MaterializeMySQLSyncThread::isMySQLSyncThread())
throw Exception("MaterializeMySQL database not support create table.", ErrorCodes::NOT_IMPLEMENTED);
nested_database->createTable(context, name, table, query);
}
void DatabaseMaterializeMySQL::dropTable(const Context & context, const String & name, bool no_delay)
{
if (!MaterializeMySQLSyncThread::isMySQLSyncThread())
throw Exception("MaterializeMySQL database not support drop table.", ErrorCodes::NOT_IMPLEMENTED);
nested_database->dropTable(context, name, no_delay);
}
void DatabaseMaterializeMySQL::attachTable(const String & name, const StoragePtr & table, const String & relative_table_path)
{
if (!MaterializeMySQLSyncThread::isMySQLSyncThread())
throw Exception("MaterializeMySQL database not support attach table.", ErrorCodes::NOT_IMPLEMENTED);
nested_database->attachTable(name, table, relative_table_path);
}
StoragePtr DatabaseMaterializeMySQL::detachTable(const String & name)
{
if (!MaterializeMySQLSyncThread::isMySQLSyncThread())
throw Exception("MaterializeMySQL database not support detach table.", ErrorCodes::NOT_IMPLEMENTED);
return nested_database->detachTable(name);
}
void DatabaseMaterializeMySQL::renameTable(const Context & context, const String & name, IDatabase & to_database, const String & to_name, bool exchange, bool dictionary)
{
if (!MaterializeMySQLSyncThread::isMySQLSyncThread())
throw Exception("MaterializeMySQL database not support rename table.", ErrorCodes::NOT_IMPLEMENTED);
assertCalledFromSyncThreadOrDrop("rename table");
if (exchange)
throw Exception("MaterializeMySQL database not support exchange table.", ErrorCodes::NOT_IMPLEMENTED);
@ -176,57 +135,37 @@ void DatabaseMaterializeMySQL::renameTable(const Context & context, const String
if (dictionary)
throw Exception("MaterializeMySQL database not support rename dictionary.", ErrorCodes::NOT_IMPLEMENTED);
if (to_database.getDatabaseName() != getDatabaseName())
if (to_database.getDatabaseName() != Base::getDatabaseName())
throw Exception("Cannot rename with other database for MaterializeMySQL database.", ErrorCodes::NOT_IMPLEMENTED);
nested_database->renameTable(context, name, *nested_database, to_name, exchange, dictionary);
Base::renameTable(context, name, *this, to_name, exchange, dictionary);
}
void DatabaseMaterializeMySQL::alterTable(const Context & context, const StorageID & table_id, const StorageInMemoryMetadata & metadata)
template<typename Base>
void DatabaseMaterializeMySQL<Base>::alterTable(const Context & context, const StorageID & table_id, const StorageInMemoryMetadata & metadata)
{
if (!MaterializeMySQLSyncThread::isMySQLSyncThread())
throw Exception("MaterializeMySQL database not support alter table.", ErrorCodes::NOT_IMPLEMENTED);
nested_database->alterTable(context, table_id, metadata);
assertCalledFromSyncThreadOrDrop("alter table");
Base::alterTable(context, table_id, metadata);
}
bool DatabaseMaterializeMySQL::shouldBeEmptyOnDetach() const
template<typename Base>
void DatabaseMaterializeMySQL<Base>::drop(const Context & context)
{
return false;
/// Remove metadata info
Poco::File metadata(Base::getMetadataPath() + "/.metadata");
if (metadata.exists())
metadata.remove(false);
Base::drop(context);
}
void DatabaseMaterializeMySQL::drop(const Context & context)
{
if (nested_database->shouldBeEmptyOnDetach())
{
for (auto iterator = nested_database->getTablesIterator(context, {}); iterator->isValid(); iterator->next())
{
TableExclusiveLockHolder table_lock = iterator->table()->lockExclusively(
context.getCurrentQueryId(), context.getSettingsRef().lock_acquire_timeout);
nested_database->dropTable(context, iterator->name(), true);
}
/// Remove metadata info
Poco::File metadata(getMetadataPath() + "/.metadata");
if (metadata.exists())
metadata.remove(false);
}
nested_database->drop(context);
}
bool DatabaseMaterializeMySQL::isTableExist(const String & name, const Context & context) const
{
return nested_database->isTableExist(name, context);
}
StoragePtr DatabaseMaterializeMySQL::tryGetTable(const String & name, const Context & context) const
template<typename Base>
StoragePtr DatabaseMaterializeMySQL<Base>::tryGetTable(const String & name, const Context & context) const
{
if (!MaterializeMySQLSyncThread::isMySQLSyncThread())
{
StoragePtr nested_storage = nested_database->tryGetTable(name, context);
StoragePtr nested_storage = Base::tryGetTable(name, context);
if (!nested_storage)
return {};
@ -234,20 +173,71 @@ StoragePtr DatabaseMaterializeMySQL::tryGetTable(const String & name, const Cont
return std::make_shared<StorageMaterializeMySQL>(std::move(nested_storage), this);
}
return nested_database->tryGetTable(name, context);
return Base::tryGetTable(name, context);
}
DatabaseTablesIteratorPtr DatabaseMaterializeMySQL::getTablesIterator(const Context & context, const FilterByNameFunction & filter_by_table_name)
template<typename Base>
DatabaseTablesIteratorPtr DatabaseMaterializeMySQL<Base>::getTablesIterator(const Context & context, const DatabaseOnDisk::FilterByNameFunction & filter_by_table_name)
{
if (!MaterializeMySQLSyncThread::isMySQLSyncThread())
{
DatabaseTablesIteratorPtr iterator = nested_database->getTablesIterator(context, filter_by_table_name);
DatabaseTablesIteratorPtr iterator = Base::getTablesIterator(context, filter_by_table_name);
return std::make_unique<DatabaseMaterializeTablesIterator>(std::move(iterator), this);
}
return nested_database->getTablesIterator(context, filter_by_table_name);
return Base::getTablesIterator(context, filter_by_table_name);
}
template<typename Base>
void DatabaseMaterializeMySQL<Base>::assertCalledFromSyncThreadOrDrop(const char * method) const
{
if (!MaterializeMySQLSyncThread::isMySQLSyncThread() && started_up)
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "MaterializeMySQL database not support {}", method);
}
template<typename Base>
void DatabaseMaterializeMySQL<Base>::shutdownSynchronizationThread()
{
materialize_thread.stopSynchronization();
started_up = false;
}
template<typename Database, template<class> class Helper, typename... Args>
auto castToMaterializeMySQLAndCallHelper(Database * database, Args && ... args)
{
using Ordinary = DatabaseMaterializeMySQL<DatabaseOrdinary>;
using Atomic = DatabaseMaterializeMySQL<DatabaseAtomic>;
using ToOrdinary = typename std::conditional_t<std::is_const_v<Database>, const Ordinary *, Ordinary *>;
using ToAtomic = typename std::conditional_t<std::is_const_v<Database>, const Atomic *, Atomic *>;
if (auto * database_materialize = typeid_cast<ToOrdinary>(database))
return (database_materialize->*Helper<Ordinary>::v)(std::forward<Args>(args)...);
if (auto * database_materialize = typeid_cast<ToAtomic>(database))
return (database_materialize->*Helper<Atomic>::v)(std::forward<Args>(args)...);
throw Exception("LOGICAL_ERROR: cannot cast to DatabaseMaterializeMySQL, it is a bug.", ErrorCodes::LOGICAL_ERROR);
}
template<typename T> struct HelperSetException { static constexpr auto v = &T::setException; };
void setSynchronizationThreadException(const DatabasePtr & materialize_mysql_db, const std::exception_ptr & exception)
{
castToMaterializeMySQLAndCallHelper<IDatabase, HelperSetException>(materialize_mysql_db.get(), exception);
}
template<typename T> struct HelperStopSync { static constexpr auto v = &T::shutdownSynchronizationThread; };
void stopDatabaseSynchronization(const DatabasePtr & materialize_mysql_db)
{
castToMaterializeMySQLAndCallHelper<IDatabase, HelperStopSync>(materialize_mysql_db.get());
}
template<typename T> struct HelperRethrow { static constexpr auto v = &T::rethrowExceptionIfNeed; };
void rethrowSyncExceptionIfNeed(const IDatabase * materialize_mysql_db)
{
castToMaterializeMySQLAndCallHelper<const IDatabase, HelperRethrow>(materialize_mysql_db);
}
template class DatabaseMaterializeMySQL<DatabaseOrdinary>;
template class DatabaseMaterializeMySQL<DatabaseAtomic>;
}
#endif
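The free functions at the end of this file no longer know the concrete database type, because `DatabaseMaterializeMySQL` is now a template instantiated over either `DatabaseOrdinary` or `DatabaseAtomic`; `castToMaterializeMySQLAndCallHelper` tries a cast to each instantiation and calls the same member through a pointer-to-member carried by a small trait. A self-contained sketch of that pattern follows, with hypothetical names standing in for the real classes and `dynamic_cast` standing in for `typeid_cast`:

#include <iostream>
#include <typeinfo>
#include <utility>

struct IDatabaseLike { virtual ~IDatabaseLike() = default; };    /// stand-in for IDatabase

struct OrdinaryTag {};
struct AtomicTag {};

template <typename Nested>
struct MaterializedWrapper : IDatabaseLike
{
    void report() const { std::cout << "wrapper over " << typeid(Nested).name() << '\n'; }
};

/// The trait carries the pointer-to-member so one dispatcher can call any member uniformly.
template <typename T> struct HelperReport { static constexpr auto v = &T::report; };

template <template <class> class Helper, typename... Args>
void castAndCall(const IDatabaseLike * base, Args &&... args)
{
    using OrdinaryWrapper = MaterializedWrapper<OrdinaryTag>;
    using AtomicWrapper = MaterializedWrapper<AtomicTag>;
    if (auto * o = dynamic_cast<const OrdinaryWrapper *>(base))
        return (o->*Helper<OrdinaryWrapper>::v)(std::forward<Args>(args)...);
    if (auto * a = dynamic_cast<const AtomicWrapper *>(base))
        return (a->*Helper<AtomicWrapper>::v)(std::forward<Args>(args)...);
    throw std::bad_cast();
}

int main()
{
    MaterializedWrapper<AtomicTag> db;
    castAndCall<HelperReport>(&db);    /// dispatches to the <AtomicTag> instantiation
}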


@ -17,48 +17,34 @@ namespace DB
*
* All table structure and data will be written to the local file system
*/
class DatabaseMaterializeMySQL : public IDatabase
template<typename Base>
class DatabaseMaterializeMySQL : public Base
{
public:
DatabaseMaterializeMySQL(
const Context & context, const String & database_name_, const String & metadata_path_,
const IAST * database_engine_define_, const String & mysql_database_name_, mysqlxx::Pool && pool_,
const Context & context, const String & database_name_, const String & metadata_path_, UUID uuid,
const String & mysql_database_name_, mysqlxx::Pool && pool_,
MySQLClient && client_, std::unique_ptr<MaterializeMySQLSettings> settings_);
void rethrowExceptionIfNeed() const;
void setException(const std::exception_ptr & exception);
protected:
const Context & global_context;
ASTPtr engine_define;
DatabasePtr nested_database;
std::unique_ptr<MaterializeMySQLSettings> settings;
Poco::Logger * log;
MaterializeMySQLSyncThread materialize_thread;
std::exception_ptr exception;
std::atomic_bool started_up{false};
public:
String getEngineName() const override { return "MaterializeMySQL"; }
ASTPtr getCreateDatabaseQuery() const override;
void loadStoredObjects(Context & context, bool has_force_restore_data_flag, bool force_attach) override;
void shutdown() override;
bool empty() const override;
String getDataPath() const override;
String getTableDataPath(const String & table_name) const override;
String getTableDataPath(const ASTCreateQuery & query) const override;
UUID tryGetTableUUID(const String & table_name) const override;
void createTable(const Context & context, const String & name, const StoragePtr & table, const ASTPtr & query) override;
void dropTable(const Context & context, const String & name, bool no_delay) override;
@ -71,23 +57,22 @@ public:
void alterTable(const Context & context, const StorageID & table_id, const StorageInMemoryMetadata & metadata) override;
time_t getObjectMetadataModificationTime(const String & name) const override;
String getMetadataPath() const override;
String getObjectMetadataPath(const String & table_name) const override;
bool shouldBeEmptyOnDetach() const override;
void drop(const Context & context) override;
bool isTableExist(const String & name, const Context & context) const override;
StoragePtr tryGetTable(const String & name, const Context & context) const override;
DatabaseTablesIteratorPtr getTablesIterator(const Context & context, const FilterByNameFunction & filter_by_table_name) override;
DatabaseTablesIteratorPtr getTablesIterator(const Context & context, const DatabaseOnDisk::FilterByNameFunction & filter_by_table_name) override;
void assertCalledFromSyncThreadOrDrop(const char * method) const;
void shutdownSynchronizationThread();
};
void setSynchronizationThreadException(const DatabasePtr & materialize_mysql_db, const std::exception_ptr & exception);
void stopDatabaseSynchronization(const DatabasePtr & materialize_mysql_db);
void rethrowSyncExceptionIfNeed(const IDatabase * materialize_mysql_db);
}
#endif


@ -2,7 +2,6 @@
#include <Databases/IDatabase.h>
#include <Storages/StorageMaterializeMySQL.h>
#include <Databases/MySQL/DatabaseMaterializeMySQL.h>
namespace DB
{
@ -30,7 +29,7 @@ public:
UUID uuid() const override { return nested_iterator->uuid(); }
DatabaseMaterializeTablesIterator(DatabaseTablesIteratorPtr nested_iterator_, DatabaseMaterializeMySQL * database_)
DatabaseMaterializeTablesIterator(DatabaseTablesIteratorPtr nested_iterator_, const IDatabase * database_)
: nested_iterator(std::move(nested_iterator_)), database(database_)
{
}
@ -38,8 +37,7 @@ public:
private:
mutable std::vector<StoragePtr> tables;
DatabaseTablesIteratorPtr nested_iterator;
DatabaseMaterializeMySQL * database;
const IDatabase * database;
};
}


@ -71,15 +71,6 @@ static BlockIO tryToExecuteQuery(const String & query_to_execute, Context & quer
}
}
static inline DatabaseMaterializeMySQL & getDatabase(const String & database_name)
{
DatabasePtr database = DatabaseCatalog::instance().getDatabase(database_name);
if (DatabaseMaterializeMySQL * database_materialize = typeid_cast<DatabaseMaterializeMySQL *>(database.get()))
return *database_materialize;
throw Exception("LOGICAL_ERROR: cannot cast to DatabaseMaterializeMySQL, it is a bug.", ErrorCodes::LOGICAL_ERROR);
}
MaterializeMySQLSyncThread::~MaterializeMySQLSyncThread()
{
@ -190,7 +181,8 @@ void MaterializeMySQLSyncThread::synchronization()
{
client.disconnect();
tryLogCurrentException(log);
getDatabase(database_name).setException(std::current_exception());
auto db = DatabaseCatalog::instance().getDatabase(database_name);
setSynchronizationThreadException(db, std::current_exception());
}
}
@ -343,7 +335,7 @@ std::optional<MaterializeMetadata> MaterializeMySQLSyncThread::prepareSynchroniz
opened_transaction = false;
MaterializeMetadata metadata(
connection, getDatabase(database_name).getMetadataPath() + "/.metadata", mysql_database_name, opened_transaction);
connection, DatabaseCatalog::instance().getDatabase(database_name)->getMetadataPath() + "/.metadata", mysql_database_name, opened_transaction);
if (!metadata.need_dumping_tables.empty())
{


@ -348,11 +348,11 @@ public:
DiskS3::Metadata metadata_,
const String & s3_path_,
std::optional<DiskS3::ObjectMetadata> object_metadata_,
bool is_multipart,
size_t min_upload_part_size,
size_t max_single_part_upload_size,
size_t buf_size_)
: WriteBufferFromFileBase(buf_size_, nullptr, 0)
, impl(WriteBufferFromS3(client_ptr_, bucket_, metadata_.s3_root_path + s3_path_, min_upload_part_size, is_multipart, std::move(object_metadata_), buf_size_))
, impl(WriteBufferFromS3(client_ptr_, bucket_, metadata_.s3_root_path + s3_path_, min_upload_part_size, max_single_part_upload_size, std::move(object_metadata_), buf_size_))
, metadata(std::move(metadata_))
, s3_path(s3_path_)
{
@ -542,7 +542,7 @@ DiskS3::DiskS3(
String s3_root_path_,
String metadata_path_,
size_t min_upload_part_size_,
size_t min_multi_part_upload_size_,
size_t max_single_part_upload_size_,
size_t min_bytes_for_seek_,
bool send_metadata_)
: IDisk(std::make_unique<AsyncExecutor>())
@ -553,7 +553,7 @@ DiskS3::DiskS3(
, s3_root_path(std::move(s3_root_path_))
, metadata_path(std::move(metadata_path_))
, min_upload_part_size(min_upload_part_size_)
, min_multi_part_upload_size(min_multi_part_upload_size_)
, max_single_part_upload_size(max_single_part_upload_size_)
, min_bytes_for_seek(min_bytes_for_seek_)
, send_metadata(send_metadata_)
{
@ -665,7 +665,7 @@ std::unique_ptr<ReadBufferFromFileBase> DiskS3::readFile(const String & path, si
return std::make_unique<SeekAvoidingReadBuffer>(std::move(reader), min_bytes_for_seek);
}
std::unique_ptr<WriteBufferFromFileBase> DiskS3::writeFile(const String & path, size_t buf_size, WriteMode mode, size_t estimated_size, size_t)
std::unique_ptr<WriteBufferFromFileBase> DiskS3::writeFile(const String & path, size_t buf_size, WriteMode mode, size_t, size_t)
{
bool exist = exists(path);
if (exist && readMeta(path).read_only)
@ -674,7 +674,6 @@ std::unique_ptr<WriteBufferFromFileBase> DiskS3::writeFile(const String & path,
/// Path to store new S3 object.
auto s3_path = getRandomName();
auto object_metadata = createObjectMetadata(path);
bool is_multipart = estimated_size >= min_multi_part_upload_size;
if (!exist || mode == WriteMode::Rewrite)
{
/// If metadata file exists - remove and create new.
@ -687,7 +686,8 @@ std::unique_ptr<WriteBufferFromFileBase> DiskS3::writeFile(const String & path,
LOG_DEBUG(&Poco::Logger::get("DiskS3"), "Write to file by path: {}. New S3 path: {}", backQuote(metadata_path + path), s3_root_path + s3_path);
return std::make_unique<WriteIndirectBufferFromS3>(client, bucket, metadata, s3_path, object_metadata, is_multipart, min_upload_part_size, buf_size);
return std::make_unique<WriteIndirectBufferFromS3>(
client, bucket, metadata, s3_path, object_metadata, min_upload_part_size, max_single_part_upload_size, buf_size);
}
else
{
@ -696,7 +696,8 @@ std::unique_ptr<WriteBufferFromFileBase> DiskS3::writeFile(const String & path,
LOG_DEBUG(&Poco::Logger::get("DiskS3"), "Append to file by path: {}. New S3 path: {}. Existing S3 objects: {}.",
backQuote(metadata_path + path), s3_root_path + s3_path, metadata.s3_objects.size());
return std::make_unique<WriteIndirectBufferFromS3>(client, bucket, metadata, s3_path, object_metadata, is_multipart, min_upload_part_size, buf_size);
return std::make_unique<WriteIndirectBufferFromS3>(
client, bucket, metadata, s3_path, object_metadata, min_upload_part_size, max_single_part_upload_size, buf_size);
}
}
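With these changes DiskS3 no longer guesses `is_multipart` from the caller-provided `estimated_size`; the write buffer now receives both `min_upload_part_size` and the new `s3_max_single_part_upload_size` and decides on its own how to upload. A rough sketch of that decision, using a hypothetical planner class rather than the real WriteBufferFromS3:

#include <cstddef>
#include <iostream>

/// Hypothetical stand-in for the buffer-side logic: stay single-part until the
/// written data exceeds max_single_part_upload_size, then switch to multipart
/// and flush parts of at least min_upload_part_size.
class S3UploadPlanner
{
public:
    S3UploadPlanner(size_t min_upload_part_size_, size_t max_single_part_upload_size_)
        : min_upload_part_size(min_upload_part_size_), max_single_part_upload_size(max_single_part_upload_size_) {}

    void append(size_t bytes)
    {
        total_bytes += bytes;
        if (!multipart_started && total_bytes > max_single_part_upload_size)
        {
            multipart_started = true;    /// the real code would create a multipart upload here
            std::cout << "switching to multipart upload\n";
        }
        if (multipart_started && total_bytes - flushed_bytes >= min_upload_part_size)
        {
            flushed_bytes = total_bytes;    /// the real code would upload a part here
            std::cout << "flushing an upload part\n";
        }
    }

    void finalize() const
    {
        if (multipart_started)
            std::cout << "completing multipart upload of " << total_bytes << " bytes\n";
        else
            std::cout << "single PutObject of " << total_bytes << " bytes\n";
    }

private:
    size_t min_upload_part_size = 0;
    size_t max_single_part_upload_size = 0;
    size_t total_bytes = 0;
    size_t flushed_bytes = 0;
    bool multipart_started = false;
};

int main()
{
    /// Defaults from the Settings.h hunk above: 512 MiB minimum part size, 64 MiB single-part limit.
    S3UploadPlanner planner(512 * 1024 * 1024, 64 * 1024 * 1024);
    planner.append(16 * 1024 * 1024);
    planner.append(128 * 1024 * 1024);    /// crosses the single-part limit
    planner.finalize();
}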


@ -34,7 +34,7 @@ public:
String s3_root_path_,
String metadata_path_,
size_t min_upload_part_size_,
size_t min_multi_part_upload_size_,
size_t max_single_part_upload_size_,
size_t min_bytes_for_seek_,
bool send_metadata_);
@ -133,7 +133,7 @@ private:
const String s3_root_path;
const String metadata_path;
size_t min_upload_part_size;
size_t min_multi_part_upload_size;
size_t max_single_part_upload_size;
size_t min_bytes_for_seek;
bool send_metadata;


@ -148,7 +148,7 @@ void registerDiskS3(DiskFactory & factory)
uri.key,
metadata_path,
context.getSettingsRef().s3_min_upload_part_size,
config.getUInt64(config_prefix + ".min_multi_part_upload_size", 10 * 1024 * 1024),
context.getSettingsRef().s3_max_single_part_upload_size,
config.getUInt64(config_prefix + ".min_bytes_for_seek", 1024 * 1024),
config.getBool(config_prefix + ".send_object_metadata", false));


@ -0,0 +1,311 @@
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypesDecimal.h>
#include <DataTypes/DataTypeDateTime64.h>
#include <Columns/ColumnsNumber.h>
#include <Columns/ColumnDecimal.h>
#include "FunctionArrayMapped.h"
#include <Functions/FunctionFactory.h>
namespace DB
{
namespace ErrorCodes
{
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int ILLEGAL_COLUMN;
}
enum class AggregateOperation
{
min,
max,
sum,
average
};
/**
* During array aggregation we derive result type from operation.
* For array min or array max we use array element as result type.
* For array average we use Float64.
* For array sum we use Decimal128 for decimal numbers, Float64 for floating point numbers,
* Int64 for signed integers, and UInt64 for unsigned integers.
*/
template <typename ArrayElement, AggregateOperation operation>
struct ArrayAggregateResultImpl;
template <typename ArrayElement>
struct ArrayAggregateResultImpl<ArrayElement, AggregateOperation::min>
{
using Result = ArrayElement;
};
template <typename ArrayElement>
struct ArrayAggregateResultImpl<ArrayElement, AggregateOperation::max>
{
using Result = ArrayElement;
};
template <typename ArrayElement>
struct ArrayAggregateResultImpl<ArrayElement, AggregateOperation::average>
{
using Result = Float64;
};
template <typename ArrayElement>
struct ArrayAggregateResultImpl<ArrayElement, AggregateOperation::sum>
{
using Result = std::conditional_t<
IsDecimalNumber<ArrayElement>,
Decimal128,
std::conditional_t<
std::is_floating_point_v<ArrayElement>,
Float64,
std::conditional_t<std::is_signed_v<ArrayElement>, Int64, UInt64>>>;
};
template <typename ArrayElement, AggregateOperation operation>
using ArrayAggregateResult = typename ArrayAggregateResultImpl<ArrayElement, operation>::Result;
template<AggregateOperation aggregate_operation>
struct ArrayAggregateImpl
{
static bool needBoolean() { return false; }
static bool needExpression() { return false; }
static bool needOneArray() { return false; }
static DataTypePtr getReturnType(const DataTypePtr & expression_return, const DataTypePtr & /*array_element*/)
{
DataTypePtr result;
auto call = [&](const auto & types)
{
using Types = std::decay_t<decltype(types)>;
using DataType = typename Types::LeftType;
if constexpr (aggregate_operation == AggregateOperation::average)
{
result = std::make_shared<DataTypeFloat64>();
return true;
}
else if constexpr (IsDataTypeNumber<DataType>)
{
using NumberReturnType = ArrayAggregateResult<typename DataType::FieldType, aggregate_operation>;
result = std::make_shared<DataTypeNumber<NumberReturnType>>();
return true;
}
else if constexpr (IsDataTypeDecimal<DataType> && !IsDataTypeDateOrDateTime<DataType>)
{
using DecimalReturnType = ArrayAggregateResult<typename DataType::FieldType, aggregate_operation>;
UInt32 scale = getDecimalScale(*expression_return);
result = std::make_shared<DataTypeDecimal<DecimalReturnType>>(DecimalUtils::maxPrecision<DecimalReturnType>(), scale);
return true;
}
return false;
};
if (!callOnIndexAndDataType<void>(expression_return->getTypeId(), call))
{
throw Exception(
"array aggregation function cannot be performed on type " + expression_return->getName(),
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
}
return result;
}
template <typename Element>
static bool executeType(const ColumnPtr & mapped, const ColumnArray::Offsets & offsets, ColumnPtr & res_ptr)
{
using Result = ArrayAggregateResult<Element, aggregate_operation>;
using ColVecType = std::conditional_t<IsDecimalNumber<Element>, ColumnDecimal<Element>, ColumnVector<Element>>;
using ColVecResult = std::conditional_t<IsDecimalNumber<Result>, ColumnDecimal<Result>, ColumnVector<Result>>;
/// For average on a decimal array we return Float64 as the result,
/// but to keep decimal precision we convert to Float64 only as the last step of the average computation
static constexpr bool use_decimal_for_average_aggregation
= aggregate_operation == AggregateOperation::average && IsDecimalNumber<Element>;
using AggregationType = std::conditional_t<use_decimal_for_average_aggregation, Decimal128, Result>;
const ColVecType * column = checkAndGetColumn<ColVecType>(&*mapped);
/// Constant case.
if (!column)
{
const ColumnConst * column_const = checkAndGetColumnConst<ColVecType>(&*mapped);
if (!column_const)
return false;
const AggregationType x = column_const->template getValue<Element>(); // NOLINT
const typename ColVecType::Container & data
= checkAndGetColumn<ColVecType>(&column_const->getDataColumn())->getData();
typename ColVecResult::MutablePtr res_column;
if constexpr (IsDecimalNumber<Element>)
{
res_column = ColVecResult::create(offsets.size(), data.getScale());
}
else
res_column = ColVecResult::create(offsets.size());
typename ColVecResult::Container & res = res_column->getData();
size_t pos = 0;
for (size_t i = 0; i < offsets.size(); ++i)
{
if constexpr (aggregate_operation == AggregateOperation::sum)
{
size_t array_size = offsets[i] - pos;
/// Just multiply the value by array size.
res[i] = x * array_size;
}
else if constexpr (aggregate_operation == AggregateOperation::min ||
aggregate_operation == AggregateOperation::max)
{
res[i] = x;
}
else if constexpr (aggregate_operation == AggregateOperation::average)
{
if constexpr (IsDecimalNumber<Element>)
{
res[i] = DecimalUtils::convertTo<Result>(x, data.getScale());
}
else
{
res[i] = x;
}
}
pos = offsets[i];
}
res_ptr = std::move(res_column);
return true;
}
const typename ColVecType::Container & data = column->getData();
typename ColVecResult::MutablePtr res_column;
if constexpr (IsDecimalNumber<Element>)
res_column = ColVecResult::create(offsets.size(), data.getScale());
else
res_column = ColVecResult::create(offsets.size());
typename ColVecResult::Container & res = res_column->getData();
size_t pos = 0;
for (size_t i = 0; i < offsets.size(); ++i)
{
AggregationType s = 0;
/// Array is empty
if (offsets[i] == pos)
{
res[i] = s;
continue;
}
size_t count = 1;
s = data[pos]; // NOLINT
++pos;
for (; pos < offsets[i]; ++pos)
{
auto element = data[pos];
if constexpr (aggregate_operation == AggregateOperation::sum ||
aggregate_operation == AggregateOperation::average)
{
s += element;
}
else if constexpr (aggregate_operation == AggregateOperation::min)
{
if (element < s)
{
s = element;
}
}
else if constexpr (aggregate_operation == AggregateOperation::max)
{
if (element > s)
{
s = element;
}
}
++count;
}
if constexpr (aggregate_operation == AggregateOperation::average)
{
s = s / count;
}
if constexpr (use_decimal_for_average_aggregation)
{
res[i] = DecimalUtils::convertTo<Result>(s, data.getScale());
}
else
{
res[i] = s;
}
}
res_ptr = std::move(res_column);
return true;
}
static ColumnPtr execute(const ColumnArray & array, ColumnPtr mapped)
{
const IColumn::Offsets & offsets = array.getOffsets();
ColumnPtr res;
if (executeType<UInt8>(mapped, offsets, res) ||
executeType<UInt16>(mapped, offsets, res) ||
executeType<UInt32>(mapped, offsets, res) ||
executeType<UInt64>(mapped, offsets, res) ||
executeType<Int8>(mapped, offsets, res) ||
executeType<Int16>(mapped, offsets, res) ||
executeType<Int32>(mapped, offsets, res) ||
executeType<Int64>(mapped, offsets, res) ||
executeType<Float32>(mapped, offsets, res) ||
executeType<Float64>(mapped, offsets, res) ||
executeType<Decimal32>(mapped, offsets, res) ||
executeType<Decimal64>(mapped, offsets, res) ||
executeType<Decimal128>(mapped, offsets, res))
return res;
else
throw Exception("Unexpected column for arraySum: " + mapped->getName(), ErrorCodes::ILLEGAL_COLUMN);
}
};
struct NameArrayMin { static constexpr auto name = "arrayMin"; };
using FunctionArrayMin = FunctionArrayMapped<ArrayAggregateImpl<AggregateOperation::min>, NameArrayMin>;
struct NameArrayMax { static constexpr auto name = "arrayMax"; };
using FunctionArrayMax = FunctionArrayMapped<ArrayAggregateImpl<AggregateOperation::max>, NameArrayMax>;
struct NameArraySum { static constexpr auto name = "arraySum"; };
using FunctionArraySum = FunctionArrayMapped<ArrayAggregateImpl<AggregateOperation::sum>, NameArraySum>;
struct NameArrayAverage { static constexpr auto name = "arrayAvg"; };
using FunctionArrayAverage = FunctionArrayMapped<ArrayAggregateImpl<AggregateOperation::average>, NameArrayAverage>;
void registerFunctionArrayAggregation(FunctionFactory & factory)
{
factory.registerFunction<FunctionArrayMin>();
factory.registerFunction<FunctionArrayMax>();
factory.registerFunction<FunctionArraySum>();
factory.registerFunction<FunctionArrayAverage>();
}
}
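As a worked illustration of the result-type table described in the comment at the top of this file, the same mapping can be reproduced with plain standard types (decimals omitted). This is only a sketch for checking the rules, not ClickHouse code:

#include <cstdint>
#include <type_traits>

enum class Op { min, max, sum, average };

/// min/max keep the element type, average becomes double (Float64),
/// sum widens integers to 64 bits and floats to double.
template <typename Element, Op op>
using AggregateResult = std::conditional_t<
    op == Op::average,
    double,
    std::conditional_t<
        op == Op::sum,
        std::conditional_t<
            std::is_floating_point_v<Element>,
            double,
            std::conditional_t<std::is_signed_v<Element>, int64_t, uint64_t>>,
        Element>>;

static_assert(std::is_same_v<AggregateResult<uint8_t, Op::min>, uint8_t>);
static_assert(std::is_same_v<AggregateResult<int32_t, Op::sum>, int64_t>);
static_assert(std::is_same_v<AggregateResult<uint16_t, Op::sum>, uint64_t>);
static_assert(std::is_same_v<AggregateResult<float, Op::average>, double>);

int main() {}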


@ -1,146 +0,0 @@
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypesDecimal.h>
#include <Columns/ColumnsNumber.h>
#include <Columns/ColumnDecimal.h>
#include "FunctionArrayMapped.h"
#include <Functions/FunctionFactory.h>
namespace DB
{
namespace ErrorCodes
{
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int ILLEGAL_COLUMN;
}
struct ArraySumImpl
{
static bool needBoolean() { return false; }
static bool needExpression() { return false; }
static bool needOneArray() { return false; }
static DataTypePtr getReturnType(const DataTypePtr & expression_return, const DataTypePtr & /*array_element*/)
{
WhichDataType which(expression_return);
if (which.isNativeUInt())
return std::make_shared<DataTypeUInt64>();
if (which.isNativeInt())
return std::make_shared<DataTypeInt64>();
if (which.isFloat())
return std::make_shared<DataTypeFloat64>();
if (which.isDecimal())
{
UInt32 scale = getDecimalScale(*expression_return);
return std::make_shared<DataTypeDecimal<Decimal128>>(DecimalUtils::maxPrecision<Decimal128>(), scale);
}
throw Exception("arraySum cannot add values of type " + expression_return->getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
}
template <typename Element, typename Result>
static bool executeType(const ColumnPtr & mapped, const ColumnArray::Offsets & offsets, ColumnPtr & res_ptr)
{
using ColVecType = std::conditional_t<IsDecimalNumber<Element>, ColumnDecimal<Element>, ColumnVector<Element>>;
using ColVecResult = std::conditional_t<IsDecimalNumber<Result>, ColumnDecimal<Result>, ColumnVector<Result>>;
const ColVecType * column = checkAndGetColumn<ColVecType>(&*mapped);
/// Constant case.
if (!column)
{
const ColumnConst * column_const = checkAndGetColumnConst<ColVecType>(&*mapped);
if (!column_const)
return false;
const Result x = column_const->template getValue<Element>(); // NOLINT
typename ColVecResult::MutablePtr res_column;
if constexpr (IsDecimalNumber<Element>)
{
const typename ColVecType::Container & data =
checkAndGetColumn<ColVecType>(&column_const->getDataColumn())->getData();
res_column = ColVecResult::create(offsets.size(), data.getScale());
}
else
res_column = ColVecResult::create(offsets.size());
typename ColVecResult::Container & res = res_column->getData();
size_t pos = 0;
for (size_t i = 0; i < offsets.size(); ++i)
{
/// Just multiply the value by array size.
res[i] = x * (offsets[i] - pos);
pos = offsets[i];
}
res_ptr = std::move(res_column);
return true;
}
const typename ColVecType::Container & data = column->getData();
typename ColVecResult::MutablePtr res_column;
if constexpr (IsDecimalNumber<Element>)
res_column = ColVecResult::create(offsets.size(), data.getScale());
else
res_column = ColVecResult::create(offsets.size());
typename ColVecResult::Container & res = res_column->getData();
size_t pos = 0;
for (size_t i = 0; i < offsets.size(); ++i)
{
Result s = 0;
for (; pos < offsets[i]; ++pos)
{
s += data[pos];
}
res[i] = s;
}
res_ptr = std::move(res_column);
return true;
}
static ColumnPtr execute(const ColumnArray & array, ColumnPtr mapped)
{
const IColumn::Offsets & offsets = array.getOffsets();
ColumnPtr res;
if (executeType< UInt8 , UInt64>(mapped, offsets, res) ||
executeType< UInt16, UInt64>(mapped, offsets, res) ||
executeType< UInt32, UInt64>(mapped, offsets, res) ||
executeType< UInt64, UInt64>(mapped, offsets, res) ||
executeType< Int8 , Int64>(mapped, offsets, res) ||
executeType< Int16, Int64>(mapped, offsets, res) ||
executeType< Int32, Int64>(mapped, offsets, res) ||
executeType< Int64, Int64>(mapped, offsets, res) ||
executeType<Float32,Float64>(mapped, offsets, res) ||
executeType<Float64,Float64>(mapped, offsets, res) ||
executeType<Decimal32, Decimal128>(mapped, offsets, res) ||
executeType<Decimal64, Decimal128>(mapped, offsets, res) ||
executeType<Decimal128, Decimal128>(mapped, offsets, res))
return res;
else
throw Exception("Unexpected column for arraySum: " + mapped->getName(), ErrorCodes::ILLEGAL_COLUMN);
}
};
struct NameArraySum { static constexpr auto name = "arraySum"; };
using FunctionArraySum = FunctionArrayMapped<ArraySumImpl, NameArraySum>;
void registerFunctionArraySum(FunctionFactory & factory)
{
factory.registerFunction<FunctionArraySum>();
}
}

View File

@ -9,7 +9,7 @@ void registerFunctionArrayCount(FunctionFactory & factory);
void registerFunctionArrayExists(FunctionFactory & factory);
void registerFunctionArrayAll(FunctionFactory & factory);
void registerFunctionArrayCompact(FunctionFactory & factory);
void registerFunctionArraySum(FunctionFactory & factory);
void registerFunctionArrayAggregation(FunctionFactory & factory);
void registerFunctionArrayFirst(FunctionFactory & factory);
void registerFunctionArrayFirstIndex(FunctionFactory & factory);
void registerFunctionsArrayFill(FunctionFactory & factory);
@ -27,7 +27,7 @@ void registerFunctionsHigherOrder(FunctionFactory & factory)
registerFunctionArrayExists(factory);
registerFunctionArrayAll(factory);
registerFunctionArrayCompact(factory);
registerFunctionArraySum(factory);
registerFunctionArrayAggregation(factory);
registerFunctionArrayFirst(factory);
registerFunctionArrayFirstIndex(factory);
registerFunctionsArrayFill(factory);

View File

@ -120,6 +120,7 @@ SRCS(
appendTrailingCharIfAbsent.cpp
array/array.cpp
array/arrayAUC.cpp
array/arrayAggregation.cpp
array/arrayAll.cpp
array/arrayCompact.cpp
array/arrayConcat.cpp
@ -155,7 +156,6 @@ SRCS(
array/arraySlice.cpp
array/arraySort.cpp
array/arraySplit.cpp
array/arraySum.cpp
array/arrayUniq.cpp
array/arrayWithConstant.cpp
array/arrayZip.cpp

View File

@ -368,8 +368,12 @@ namespace S3
throw Exception(
"Bucket name length is out of bounds in virtual hosted style S3 URI: " + bucket + " (" + uri.toString() + ")", ErrorCodes::BAD_ARGUMENTS);
/// Remove leading '/' from path to extract key.
key = uri.getPath().substr(1);
if (!uri.getPath().empty())
{
/// Remove leading '/' from path to extract key.
key = uri.getPath().substr(1);
}
if (key.empty() || key == "/")
throw Exception("Key name is empty in virtual hosted style S3 URI: " + key + " (" + uri.toString() + ")", ErrorCodes::BAD_ARGUMENTS);
boost::to_upper(name);

View File

@ -5,12 +5,9 @@
# include <IO/WriteBufferFromS3.h>
# include <IO/WriteHelpers.h>
# include <aws/core/utils/memory/stl/AWSStreamFwd.h>
# include <aws/core/utils/memory/stl/AWSStringStream.h>
# include <aws/s3/S3Client.h>
# include <aws/s3/model/CompleteMultipartUploadRequest.h>
# include <aws/s3/model/AbortMultipartUploadRequest.h>
# include <aws/s3/model/CreateMultipartUploadRequest.h>
# include <aws/s3/model/CompleteMultipartUploadRequest.h>
# include <aws/s3/model/PutObjectRequest.h>
# include <aws/s3/model/UploadPartRequest.h>
# include <common/logger_useful.h>
@ -42,22 +39,19 @@ WriteBufferFromS3::WriteBufferFromS3(
const String & bucket_,
const String & key_,
size_t minimum_upload_part_size_,
bool is_multipart_,
size_t max_single_part_upload_size_,
std::optional<std::map<String, String>> object_metadata_,
size_t buffer_size_)
: BufferWithOwnMemory<WriteBuffer>(buffer_size_, nullptr, 0)
, is_multipart(is_multipart_)
, bucket(bucket_)
, key(key_)
, object_metadata(std::move(object_metadata_))
, client_ptr(std::move(client_ptr_))
, minimum_upload_part_size{minimum_upload_part_size_}
, temporary_buffer{std::make_unique<WriteBufferFromOwnString>()}
, last_part_size{0}
{
if (is_multipart)
initiate();
}
, minimum_upload_part_size(minimum_upload_part_size_)
, max_single_part_upload_size(max_single_part_upload_size_)
, temporary_buffer(Aws::MakeShared<Aws::StringStream>("temporary buffer"))
, last_part_size(0)
{ }
void WriteBufferFromS3::nextImpl()
{
@ -68,16 +62,17 @@ void WriteBufferFromS3::nextImpl()
ProfileEvents::increment(ProfileEvents::S3WriteBytes, offset());
if (is_multipart)
{
last_part_size += offset();
last_part_size += offset();
if (last_part_size > minimum_upload_part_size)
{
writePart(temporary_buffer->str());
last_part_size = 0;
temporary_buffer->restart();
}
/// Data size exceeds singlepart upload threshold, need to use multipart upload.
if (multipart_upload_id.empty() && last_part_size > max_single_part_upload_size)
createMultipartUpload();
if (!multipart_upload_id.empty() && last_part_size > minimum_upload_part_size)
{
writePart();
last_part_size = 0;
temporary_buffer = Aws::MakeShared<Aws::StringStream>("temporary buffer");
}
}
@ -88,17 +83,23 @@ void WriteBufferFromS3::finalize()
void WriteBufferFromS3::finalizeImpl()
{
if (!finalized)
if (finalized)
return;
next();
if (multipart_upload_id.empty())
{
next();
if (is_multipart)
writePart(temporary_buffer->str());
complete();
finalized = true;
makeSinglepartUpload();
}
else
{
/// Write rest of the data as last part.
writePart();
completeMultipartUpload();
}
finalized = true;
}
WriteBufferFromS3::~WriteBufferFromS3()
@ -113,7 +114,7 @@ WriteBufferFromS3::~WriteBufferFromS3()
}
}
void WriteBufferFromS3::initiate()
void WriteBufferFromS3::createMultipartUpload()
{
Aws::S3::Model::CreateMultipartUploadRequest req;
req.SetBucket(bucket);
@ -125,17 +126,17 @@ void WriteBufferFromS3::initiate()
if (outcome.IsSuccess())
{
upload_id = outcome.GetResult().GetUploadId();
LOG_DEBUG(log, "Multipart upload initiated. Upload id: {}", upload_id);
multipart_upload_id = outcome.GetResult().GetUploadId();
LOG_DEBUG(log, "Multipart upload has created. Upload id: {}", multipart_upload_id);
}
else
throw Exception(outcome.GetError().GetMessage(), ErrorCodes::S3_ERROR);
}
void WriteBufferFromS3::writePart(const String & data)
void WriteBufferFromS3::writePart()
{
if (data.empty())
if (temporary_buffer->tellp() <= 0)
return;
if (part_tags.size() == S3_WARN_MAX_PARTS)
@ -149,93 +150,71 @@ void WriteBufferFromS3::writePart(const String & data)
req.SetBucket(bucket);
req.SetKey(key);
req.SetPartNumber(part_tags.size() + 1);
req.SetUploadId(upload_id);
req.SetContentLength(data.size());
req.SetBody(std::make_shared<Aws::StringStream>(data));
req.SetUploadId(multipart_upload_id);
req.SetContentLength(temporary_buffer->tellp());
req.SetBody(temporary_buffer);
auto outcome = client_ptr->UploadPart(req);
LOG_TRACE(log, "Writing part. Bucket: {}, Key: {}, Upload_id: {}, Data size: {}", bucket, key, upload_id, data.size());
LOG_TRACE(log, "Writing part. Bucket: {}, Key: {}, Upload_id: {}, Data size: {}", bucket, key, multipart_upload_id, temporary_buffer->tellp());
if (outcome.IsSuccess())
{
auto etag = outcome.GetResult().GetETag();
part_tags.push_back(etag);
LOG_DEBUG(log, "Writing part finished. Total parts: {}, Upload_id: {}, Etag: {}", part_tags.size(), upload_id, etag);
LOG_DEBUG(log, "Writing part finished. Total parts: {}, Upload_id: {}, Etag: {}", part_tags.size(), multipart_upload_id, etag);
}
else
throw Exception(outcome.GetError().GetMessage(), ErrorCodes::S3_ERROR);
}
void WriteBufferFromS3::complete()
void WriteBufferFromS3::completeMultipartUpload()
{
if (is_multipart)
LOG_DEBUG(log, "Completing multipart upload. Bucket: {}, Key: {}, Upload_id: {}", bucket, key, multipart_upload_id);
Aws::S3::Model::CompleteMultipartUploadRequest req;
req.SetBucket(bucket);
req.SetKey(key);
req.SetUploadId(multipart_upload_id);
Aws::S3::Model::CompletedMultipartUpload multipart_upload;
for (size_t i = 0; i < part_tags.size(); ++i)
{
if (part_tags.empty())
{
LOG_DEBUG(log, "Multipart upload has no data. Aborting it. Bucket: {}, Key: {}, Upload_id: {}", bucket, key, upload_id);
Aws::S3::Model::AbortMultipartUploadRequest req;
req.SetBucket(bucket);
req.SetKey(key);
req.SetUploadId(upload_id);
auto outcome = client_ptr->AbortMultipartUpload(req);
if (outcome.IsSuccess())
LOG_DEBUG(log, "Aborting multipart upload completed. Upload_id: {}", upload_id);
else
throw Exception(outcome.GetError().GetMessage(), ErrorCodes::S3_ERROR);
return;
}
LOG_DEBUG(log, "Completing multipart upload. Bucket: {}, Key: {}, Upload_id: {}", bucket, key, upload_id);
Aws::S3::Model::CompleteMultipartUploadRequest req;
req.SetBucket(bucket);
req.SetKey(key);
req.SetUploadId(upload_id);
Aws::S3::Model::CompletedMultipartUpload multipart_upload;
for (size_t i = 0; i < part_tags.size(); ++i)
{
Aws::S3::Model::CompletedPart part;
multipart_upload.AddParts(part.WithETag(part_tags[i]).WithPartNumber(i + 1));
}
req.SetMultipartUpload(multipart_upload);
auto outcome = client_ptr->CompleteMultipartUpload(req);
if (outcome.IsSuccess())
LOG_DEBUG(log, "Multipart upload completed. Upload_id: {}", upload_id);
else
throw Exception(outcome.GetError().GetMessage(), ErrorCodes::S3_ERROR);
Aws::S3::Model::CompletedPart part;
multipart_upload.AddParts(part.WithETag(part_tags[i]).WithPartNumber(i + 1));
}
req.SetMultipartUpload(multipart_upload);
auto outcome = client_ptr->CompleteMultipartUpload(req);
if (outcome.IsSuccess())
LOG_DEBUG(log, "Multipart upload has completed. Upload_id: {}", multipart_upload_id);
else
{
LOG_DEBUG(log, "Making single part upload. Bucket: {}, Key: {}", bucket, key);
throw Exception(outcome.GetError().GetMessage(), ErrorCodes::S3_ERROR);
}
Aws::S3::Model::PutObjectRequest req;
req.SetBucket(bucket);
req.SetKey(key);
if (object_metadata.has_value())
req.SetMetadata(object_metadata.value());
void WriteBufferFromS3::makeSinglepartUpload()
{
if (temporary_buffer->tellp() <= 0)
return;
/// This could be improved using an adapter to WriteBuffer.
const std::shared_ptr<Aws::IOStream> input_data = Aws::MakeShared<Aws::StringStream>("temporary buffer", temporary_buffer->str());
temporary_buffer = std::make_unique<WriteBufferFromOwnString>();
req.SetBody(input_data);
LOG_DEBUG(log, "Making single part upload. Bucket: {}, Key: {}", bucket, key);
auto outcome = client_ptr->PutObject(req);
Aws::S3::Model::PutObjectRequest req;
req.SetBucket(bucket);
req.SetKey(key);
req.SetContentLength(temporary_buffer->tellp());
req.SetBody(temporary_buffer);
if (object_metadata.has_value())
req.SetMetadata(object_metadata.value());
if (outcome.IsSuccess())
LOG_DEBUG(log, "Single part upload has completed. Bucket: {}, Key: {}", bucket, key);
else
throw Exception(outcome.GetError().GetMessage(), ErrorCodes::S3_ERROR);
}
auto outcome = client_ptr->PutObject(req);
if (outcome.IsSuccess())
LOG_DEBUG(log, "Single part upload has completed. Bucket: {}, Key: {}", bucket, key);
else
throw Exception(outcome.GetError().GetMessage(), ErrorCodes::S3_ERROR);
}
}
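For readers following the hunks above out of order, the decision rule that nextImpl() and finalizeImpl() implement can be restated on its own. A minimal sketch, with the two thresholds passed as plain parameters and the actual S3 calls reduced to comments:

// Condensed restatement of the upload strategy in nextImpl()/finalizeImpl() (a sketch only;
// class members are modelled as plain fields, the real S3 calls are left as comments).
#include <cstddef>

struct UploadState
{
    bool multipart_started = false; /// corresponds to !multipart_upload_id.empty()
    size_t buffered_bytes = 0;      /// corresponds to last_part_size
};

void onNewData(UploadState & st, size_t appended, size_t minimum_upload_part_size, size_t max_single_part_upload_size)
{
    st.buffered_bytes += appended;
    /// Crossing the single-part threshold switches the writer into multipart mode.
    if (!st.multipart_started && st.buffered_bytes > max_single_part_upload_size)
        st.multipart_started = true; /// createMultipartUpload()
    /// In multipart mode, flush a part once enough bytes have accumulated.
    if (st.multipart_started && st.buffered_bytes > minimum_upload_part_size)
        st.buffered_bytes = 0;       /// writePart() and a fresh temporary buffer
}

const char * onFinalize(const UploadState & st)
{
    /// If multipart mode was never entered, the whole payload fits into one PutObject.
    return st.multipart_started
        ? "writePart() for the tail, then completeMultipartUpload()"
        : "makeSinglepartUpload()";
}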

View File

@ -6,11 +6,13 @@
# include <memory>
# include <vector>
# include <common/logger_useful.h>
# include <common/types.h>
# include <IO/BufferWithOwnMemory.h>
# include <IO/HTTPCommon.h>
# include <IO/WriteBuffer.h>
# include <IO/WriteBufferFromString.h>
# include <aws/core/utils/memory/stl/AWSStringStream.h>
namespace Aws::S3
{
@ -19,24 +21,30 @@ class S3Client;
namespace DB
{
/* Perform S3 HTTP PUT request.
/**
* Buffer to write data to an S3 object with the specified bucket and key.
* If the data size written to the buffer is less than 'max_single_part_upload_size', the write is performed as a singlepart upload.
* Otherwise a multipart upload is used:
* data is divided into chunks larger than 'minimum_upload_part_size'; the last chunk can be smaller than this threshold.
* Each chunk is written as a part to S3.
*/
class WriteBufferFromS3 : public BufferWithOwnMemory<WriteBuffer>
{
private:
bool is_multipart;
String bucket;
String key;
std::optional<std::map<String, String>> object_metadata;
std::shared_ptr<Aws::S3::S3Client> client_ptr;
size_t minimum_upload_part_size;
std::unique_ptr<WriteBufferFromOwnString> temporary_buffer;
size_t max_single_part_upload_size;
/// Buffer to accumulate data.
std::shared_ptr<Aws::StringStream> temporary_buffer;
size_t last_part_size;
/// Upload in S3 is made in parts.
/// We initiate the upload, then upload each part and get an ETag as a response, and finally finish the upload by listing all our parts.
String upload_id;
String multipart_upload_id;
std::vector<String> part_tags;
Poco::Logger * log = &Poco::Logger::get("WriteBufferFromS3");
@ -47,7 +55,7 @@ public:
const String & bucket_,
const String & key_,
size_t minimum_upload_part_size_,
bool is_multipart,
size_t max_single_part_upload_size_,
std::optional<std::map<String, String>> object_metadata_ = std::nullopt,
size_t buffer_size_ = DBMS_DEFAULT_BUFFER_SIZE);
@ -61,9 +69,11 @@ public:
private:
bool finalized = false;
void initiate();
void writePart(const String & data);
void complete();
void createMultipartUpload();
void writePart();
void completeMultipartUpload();
void makeSinglepartUpload();
void finalizeImpl();
};
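To see the new constructor and the single finalize() entry point together, here is a hypothetical caller. The client setup, bucket, key, and sizes are assumptions for illustration, and the position of the client argument is assumed from the initializer list in the .cpp hunk; only the rest of the signature mirrors the header above:

// Hypothetical usage of the reworked WriteBufferFromS3 (a sketch, not taken from this commit).
#include <IO/WriteBufferFromS3.h>
#include <memory>
#include <optional>
#include <string>

void uploadPayload(const std::shared_ptr<Aws::S3::S3Client> & client, const std::string & payload)
{
    DB::WriteBufferFromS3 out(
        client,
        "example-bucket",      /// bucket (assumed)
        "path/to/object.csv",  /// key (assumed)
        16 * 1024 * 1024,      /// minimum_upload_part_size
        64 * 1024 * 1024,      /// max_single_part_upload_size
        std::nullopt);         /// object_metadata

    out.write(payload.data(), payload.size());
    /// finalize() picks between makeSinglepartUpload() and writePart() + completeMultipartUpload(),
    /// depending on whether max_single_part_upload_size was crossed while writing.
    out.finalize();
}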

View File

@ -249,6 +249,19 @@ ReturnType readFloatTextPreciseImpl(T & x, ReadBuffer & buf)
}
// credit: https://johnnylee-sde.github.io/Fast-numeric-string-to-int/
static inline bool is_made_of_eight_digits_fast(uint64_t val) noexcept
{
return (((val & 0xF0F0F0F0F0F0F0F0) | (((val + 0x0606060606060606) & 0xF0F0F0F0F0F0F0F0) >> 4)) == 0x3333333333333333);
}
static inline bool is_made_of_eight_digits_fast(const char * chars) noexcept
{
uint64_t val;
::memcpy(&val, chars, 8);
return is_made_of_eight_digits_fast(val);
}
template <size_t N, typename T>
static inline void readUIntTextUpToNSignificantDigits(T & x, ReadBuffer & buf)
{
@ -266,9 +279,6 @@ static inline void readUIntTextUpToNSignificantDigits(T & x, ReadBuffer & buf)
else
return;
}
while (!buf.eof() && isNumericASCII(*buf.position()))
++buf.position();
}
else
{
@ -283,10 +293,16 @@ static inline void readUIntTextUpToNSignificantDigits(T & x, ReadBuffer & buf)
else
return;
}
while (!buf.eof() && isNumericASCII(*buf.position()))
++buf.position();
}
while (!buf.eof() && (buf.position() + 8 <= buf.buffer().end()) &&
is_made_of_eight_digits_fast(buf.position()))
{
buf.position() += 8;
}
while (!buf.eof() && isNumericASCII(*buf.position()))
++buf.position();
}
@ -319,7 +335,6 @@ ReturnType readFloatTextFastImpl(T & x, ReadBuffer & in)
++in.position();
}
auto count_after_sign = in.count();
constexpr int significant_digits = std::numeric_limits<UInt64>::digits10;
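The is_made_of_eight_digits_fast() helpers above use a SWAR check over eight bytes at once. A self-contained sketch of the same constants, runnable outside of ReadBuffer, may make the trick easier to verify:

// Standalone check of the eight-digits-at-once trick used above (a sketch).
#include <cassert>
#include <cstdint>
#include <cstring>

/// A byte is an ASCII digit iff its high nibble is 0x3 and adding 0x06 does not carry into
/// the high nibble (i.e. the low nibble is at most 0x9). The OR of the two masked terms is
/// 0x33 in every byte exactly when all eight bytes are digits.
static bool eightDigits(const char * chars)
{
    uint64_t val;
    std::memcpy(&val, chars, 8);
    return ((val & 0xF0F0F0F0F0F0F0F0)
            | (((val + 0x0606060606060606) & 0xF0F0F0F0F0F0F0F0) >> 4))
        == 0x3333333333333333;
}

int main()
{
    assert(eightDigits("12345678"));
    assert(!eightDigits("1234567a")); /// 'a' fails the high-nibble check
    assert(!eightDigits("12 45678")); /// ' ' is not a digit either
    return 0;
}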

View File

@ -4,7 +4,7 @@
#include <Storages/IStorage.h>
#include <Databases/IDatabase.h>
#include <Databases/DatabaseMemory.h>
#include <Databases/DatabaseAtomic.h>
#include <Databases/DatabaseOnDisk.h>
#include <Poco/File.h>
#include <Common/quoteString.h>
#include <Storages/StorageMemory.h>
@ -16,6 +16,15 @@
#include <Common/CurrentMetrics.h>
#include <common/logger_useful.h>
#if !defined(ARCADIA_BUILD)
# include "config_core.h"
#endif
#if USE_MYSQL
# include <Databases/MySQL/MaterializeMySQLSyncThread.h>
# include <Storages/StorageMaterializeMySQL.h>
#endif
#include <filesystem>
@ -217,6 +226,15 @@ DatabaseAndTable DatabaseCatalog::getTableImpl(
exception->emplace("Table " + table_id.getNameForLogs() + " doesn't exist.", ErrorCodes::UNKNOWN_TABLE);
return {};
}
#if USE_MYSQL
/// It's definitely not the best place for this logic, but behaviour must be consistent with DatabaseMaterializeMySQL::tryGetTable(...)
if (db_and_table.first->getEngineName() == "MaterializeMySQL")
{
if (!MaterializeMySQLSyncThread::isMySQLSyncThread())
db_and_table.second = std::make_shared<StorageMaterializeMySQL>(std::move(db_and_table.second), db_and_table.first.get());
}
#endif
return db_and_table;
}
@ -286,7 +304,6 @@ void DatabaseCatalog::attachDatabase(const String & database_name, const Databas
assertDatabaseDoesntExistUnlocked(database_name);
databases.emplace(database_name, database);
UUID db_uuid = database->getUUID();
assert((db_uuid != UUIDHelpers::Nil) ^ (dynamic_cast<DatabaseAtomic *>(database.get()) == nullptr));
if (db_uuid != UUIDHelpers::Nil)
db_uuid_map.emplace(db_uuid, database);
}
@ -313,9 +330,8 @@ DatabasePtr DatabaseCatalog::detachDatabase(const String & database_name, bool d
if (!db->empty())
throw Exception("New table appeared in database being dropped or detached. Try again.",
ErrorCodes::DATABASE_NOT_EMPTY);
auto * database_atomic = typeid_cast<DatabaseAtomic *>(db.get());
if (!drop && database_atomic)
database_atomic->assertCanBeDetached(false);
if (!drop)
db->assertCanBeDetached(false);
}
catch (...)
{

View File

@ -157,6 +157,35 @@ BlockIO InterpreterCreateQuery::createDatabase(ASTCreateQuery & create)
if (!create.attach && fs::exists(metadata_path))
throw Exception(ErrorCodes::DATABASE_ALREADY_EXISTS, "Metadata directory {} already exists", metadata_path.string());
}
else if (create.storage->engine->name == "MaterializeMySQL")
{
/// It creates a nested database with the Ordinary or Atomic engine, depending on the UUID in the query and the default engine setting.
/// Do nothing if it's an internal ATTACH on server startup or a short-syntax ATTACH query from the user,
/// because in this case we got the correct query from the metadata file.
/// If we got the query from the user, normalize it first.
bool attach_from_user = create.attach && !internal && !create.attach_short_syntax;
bool create_from_user = !create.attach;
if (create_from_user)
{
const auto & default_engine = context.getSettingsRef().default_database_engine.value;
if (create.uuid == UUIDHelpers::Nil && default_engine == DefaultDatabaseEngine::Atomic)
create.uuid = UUIDHelpers::generateV4(); /// Will enable Atomic engine for nested database
}
else if (attach_from_user && create.uuid == UUIDHelpers::Nil)
{
/// Ambiguity is possible: should we attach nested database as Ordinary
/// or throw "UUID must be specified" for Atomic? So we suggest short syntax for Ordinary.
throw Exception("Use short attach syntax ('ATTACH DATABASE name;' without engine) to attach existing database "
"or specify UUID to attach new database with Atomic engine", ErrorCodes::INCORRECT_QUERY);
}
/// Set metadata path according to nested engine
if (create.uuid == UUIDHelpers::Nil)
metadata_path = metadata_path / "metadata" / database_name_escaped;
else
metadata_path = metadata_path / "store" / DatabaseCatalog::getPathForUUID(create.uuid);
}
else
{
bool is_on_cluster = context.getClientInfo().query_kind == ClientInfo::QueryKind::SECONDARY_QUERY;
@ -655,7 +684,7 @@ void InterpreterCreateQuery::assertOrSetUUID(ASTCreateQuery & create, const Data
bool from_path = create.attach_from_path.has_value();
if (database->getEngineName() == "Atomic")
if (database->getUUID() != UUIDHelpers::Nil)
{
if (create.attach && !from_path && create.uuid == UUIDHelpers::Nil)
{

View File

@ -11,7 +11,14 @@
#include <Common/escapeForFileName.h>
#include <Common/quoteString.h>
#include <Common/typeid_cast.h>
#include <Databases/DatabaseAtomic.h>
#if !defined(ARCADIA_BUILD)
# include "config_core.h"
#endif
#if USE_MYSQL
# include <Databases/MySQL/DatabaseMaterializeMySQL.h>
#endif
namespace DB
@ -66,10 +73,7 @@ void InterpreterDropQuery::waitForTableToBeActuallyDroppedOrDetached(const ASTDr
if (query.kind == ASTDropQuery::Kind::Drop)
DatabaseCatalog::instance().waitTableFinallyDropped(uuid_to_wait);
else if (query.kind == ASTDropQuery::Kind::Detach)
{
if (auto * atomic = typeid_cast<DatabaseAtomic *>(db.get()))
atomic->waitDetachedTableNotInUse(uuid_to_wait);
}
db->waitDetachedTableNotInUse(uuid_to_wait);
}
BlockIO InterpreterDropQuery::executeToTable(const ASTDropQuery & query)
@ -122,7 +126,7 @@ BlockIO InterpreterDropQuery::executeToTableImpl(const ASTDropQuery & query, Dat
table->checkTableCanBeDetached();
table->shutdown();
TableExclusiveLockHolder table_lock;
if (database->getEngineName() != "Atomic")
if (database->getUUID() == UUIDHelpers::Nil)
table_lock = table->lockExclusively(context.getCurrentQueryId(), context.getSettingsRef().lock_acquire_timeout);
/// Drop table from memory, don't touch data and metadata
database->detachTable(table_id.table_name);
@ -145,7 +149,7 @@ BlockIO InterpreterDropQuery::executeToTableImpl(const ASTDropQuery & query, Dat
table->shutdown();
TableExclusiveLockHolder table_lock;
if (database->getEngineName() != "Atomic")
if (database->getUUID() == UUIDHelpers::Nil)
table_lock = table->lockExclusively(context.getCurrentQueryId(), context.getSettingsRef().lock_acquire_timeout);
database->dropTable(context, table_id.table_name, query.no_delay);
@ -282,6 +286,11 @@ BlockIO InterpreterDropQuery::executeToDatabaseImpl(const ASTDropQuery & query,
bool drop = query.kind == ASTDropQuery::Kind::Drop;
context.checkAccess(AccessType::DROP_DATABASE, database_name);
#if USE_MYSQL
if (database->getEngineName() == "MaterializeMySQL")
stopDatabaseSynchronization(database);
#endif
if (database->shouldBeEmptyOnDetach())
{
/// DETACH or DROP all tables and dictionaries inside database.
@ -312,9 +321,8 @@ BlockIO InterpreterDropQuery::executeToDatabaseImpl(const ASTDropQuery & query,
/// Protects from concurrent CREATE TABLE queries
auto db_guard = DatabaseCatalog::instance().getExclusiveDDLGuardForDatabase(database_name);
auto * database_atomic = typeid_cast<DatabaseAtomic *>(database.get());
if (!drop && database_atomic)
database_atomic->assertCanBeDetached(true);
if (!drop)
database->assertCanBeDetached(true);
/// DETACH or DROP database itself
DatabaseCatalog::instance().detachDatabase(database_name, drop, database->shouldBeEmptyOnDetach());

View File

@ -128,24 +128,24 @@ InterpreterSelectWithUnionQuery::InterpreterSelectWithUnionQuery(
const ASTPtr & query_ptr_, const Context & context_, const SelectQueryOptions & options_, const Names & required_result_column_names)
: IInterpreterUnionOrSelectQuery(query_ptr_, context_, options_)
{
auto & ast = query_ptr->as<ASTSelectWithUnionQuery &>();
ASTSelectWithUnionQuery * ast = query_ptr->as<ASTSelectWithUnionQuery>();
/// Normalize AST Tree
if (!ast.is_normalized)
if (!ast->is_normalized)
{
CustomizeASTSelectWithUnionQueryNormalizeVisitor::Data union_default_mode{context->getSettingsRef().union_default_mode};
CustomizeASTSelectWithUnionQueryNormalizeVisitor(union_default_mode).visit(query_ptr);
/// After normalization, if it only has one ASTSelectWithUnionQuery child,
/// we can lift it up, this can reduce one unnecessary recursion later.
if (ast.list_of_selects->children.size() == 1 && ast.list_of_selects->children.at(0)->as<ASTSelectWithUnionQuery>())
if (ast->list_of_selects->children.size() == 1 && ast->list_of_selects->children.at(0)->as<ASTSelectWithUnionQuery>())
{
query_ptr = std::move(ast.list_of_selects->children.at(0));
ast = query_ptr->as<ASTSelectWithUnionQuery &>();
query_ptr = std::move(ast->list_of_selects->children.at(0));
ast = query_ptr->as<ASTSelectWithUnionQuery>();
}
}
size_t num_children = ast.list_of_selects->children.size();
size_t num_children = ast->list_of_selects->children.size();
if (!num_children)
throw Exception("Logical error: no children in ASTSelectWithUnionQuery", ErrorCodes::LOGICAL_ERROR);
@ -161,7 +161,7 @@ InterpreterSelectWithUnionQuery::InterpreterSelectWithUnionQuery(
/// Result header if there are no filtering by 'required_result_column_names'.
/// We use it to determine positions of 'required_result_column_names' in SELECT clause.
Block full_result_header = getCurrentChildResultHeader(ast.list_of_selects->children.at(0), required_result_column_names);
Block full_result_header = getCurrentChildResultHeader(ast->list_of_selects->children.at(0), required_result_column_names);
std::vector<size_t> positions_of_required_result_columns(required_result_column_names.size());
@ -171,7 +171,7 @@ InterpreterSelectWithUnionQuery::InterpreterSelectWithUnionQuery(
for (size_t query_num = 1; query_num < num_children; ++query_num)
{
Block full_result_header_for_current_select
= getCurrentChildResultHeader(ast.list_of_selects->children.at(query_num), required_result_column_names);
= getCurrentChildResultHeader(ast->list_of_selects->children.at(query_num), required_result_column_names);
if (full_result_header_for_current_select.columns() != full_result_header.columns())
throw Exception("Different number of columns in UNION ALL elements:\n"
@ -192,7 +192,7 @@ InterpreterSelectWithUnionQuery::InterpreterSelectWithUnionQuery(
= query_num == 0 ? required_result_column_names : required_result_column_names_for_other_selects[query_num];
nested_interpreters.emplace_back(
buildCurrentChildInterpreter(ast.list_of_selects->children.at(query_num), current_required_result_column_names));
buildCurrentChildInterpreter(ast->list_of_selects->children.at(query_num), current_required_result_column_names));
}
/// Determine structure of the result.

View File

@ -449,12 +449,22 @@ bool ParserCreateTableQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expe
if (!s_rparen.ignore(pos, expected))
return false;
if (!storage_p.parse(pos, storage, expected) && !is_temporary)
auto storage_parse_result = storage_p.parse(pos, storage, expected);
if (storage_parse_result && s_as.ignore(pos, expected))
{
if (!select_p.parse(pos, select, expected))
return false;
}
if (!storage_parse_result && !is_temporary)
{
if (!s_as.ignore(pos, expected))
return false;
if (!table_function_p.parse(pos, as_table_function, expected))
{
return false;
}
}
}
else

View File

@ -20,7 +20,7 @@ class ASTStorage;
M(Milliseconds, kafka_poll_timeout_ms, 0, "Timeout for single poll from Kafka.", 0) \
/* default is min(max_block_size, kafka_max_block_size)*/ \
M(UInt64, kafka_poll_max_batch_size, 0, "Maximum amount of messages to be polled in a single Kafka poll.", 0) \
/* default is = min_insert_block_size / kafka_num_consumers */ \
/* default is = max_insert_block_size / kafka_num_consumers */ \
M(UInt64, kafka_max_block_size, 0, "Number of row collected by poll(s) for flushing data from Kafka.", 0) \
/* default is stream_flush_interval_ms */ \
M(Milliseconds, kafka_flush_interval_ms, 0, "Timeout for flushing data from Kafka.", 0) \

View File

@ -21,12 +21,13 @@
#include <Processors/Pipe.h>
#include <Processors/Transforms/FilterTransform.h>
#include <Databases/MySQL/DatabaseMaterializeMySQL.h>
#include <Storages/SelectQueryInfo.h>
namespace DB
{
StorageMaterializeMySQL::StorageMaterializeMySQL(const StoragePtr & nested_storage_, const DatabaseMaterializeMySQL * database_)
StorageMaterializeMySQL::StorageMaterializeMySQL(const StoragePtr & nested_storage_, const IDatabase * database_)
: StorageProxy(nested_storage_->getStorageID()), nested_storage(nested_storage_), database(database_)
{
auto nested_memory_metadata = nested_storage->getInMemoryMetadata();
@ -45,7 +46,7 @@ Pipe StorageMaterializeMySQL::read(
unsigned int num_streams)
{
/// If the background synchronization thread has an exception.
database->rethrowExceptionIfNeed();
rethrowSyncExceptionIfNeed(database);
NameSet column_names_set = NameSet(column_names.begin(), column_names.end());
auto lock = nested_storage->lockForShare(context.getCurrentQueryId(), context.getSettingsRef().lock_acquire_timeout);
@ -106,7 +107,7 @@ Pipe StorageMaterializeMySQL::read(
NamesAndTypesList StorageMaterializeMySQL::getVirtuals() const
{
/// If the background synchronization thread has an exception.
database->rethrowExceptionIfNeed();
rethrowSyncExceptionIfNeed(database);
return nested_storage->getVirtuals();
}

View File

@ -5,7 +5,6 @@
#if USE_MYSQL
#include <Storages/StorageProxy.h>
#include <Databases/MySQL/DatabaseMaterializeMySQL.h>
namespace DB
{
@ -21,7 +20,7 @@ class StorageMaterializeMySQL final : public ext::shared_ptr_helper<StorageMater
public:
String getName() const override { return "MaterializeMySQL"; }
StorageMaterializeMySQL(const StoragePtr & nested_storage_, const DatabaseMaterializeMySQL * database_);
StorageMaterializeMySQL(const StoragePtr & nested_storage_, const IDatabase * database_);
Pipe read(
const Names & column_names, const StorageMetadataPtr & metadata_snapshot, SelectQueryInfo & query_info,
@ -32,15 +31,18 @@ public:
NamesAndTypesList getVirtuals() const override;
ColumnSizeByName getColumnSizes() const override;
private:
StoragePtr getNested() const override { return nested_storage; }
void drop() override { nested_storage->drop(); }
private:
[[noreturn]] void throwNotAllowed() const
{
throw Exception("This method is not allowed for MaterializeMySQL", ErrorCodes::NOT_IMPLEMENTED);
}
StoragePtr nested_storage;
const DatabaseMaterializeMySQL * database;
const IDatabase * database;
};
}

View File

@ -23,7 +23,7 @@ namespace ErrorCodes
class MemorySource : public SourceWithProgress
{
using InitializerFunc = std::function<void(BlocksList::const_iterator &, size_t &, std::shared_ptr<const BlocksList> &)>;
using InitializerFunc = std::function<void(std::shared_ptr<const Blocks> &)>;
public:
/// Blocks are stored in std::list which may be appended in another thread.
/// We use pointer to the beginning of the list and its current size.
@ -32,17 +32,15 @@ public:
MemorySource(
Names column_names_,
BlocksList::const_iterator first_,
size_t num_blocks_,
const StorageMemory & storage,
const StorageMetadataPtr & metadata_snapshot,
std::shared_ptr<const BlocksList> data_,
InitializerFunc initializer_func_ = [](BlocksList::const_iterator &, size_t &, std::shared_ptr<const BlocksList> &) {})
std::shared_ptr<const Blocks> data_,
std::shared_ptr<std::atomic<size_t>> parallel_execution_index_,
InitializerFunc initializer_func_ = {})
: SourceWithProgress(metadata_snapshot->getSampleBlockForColumns(column_names_, storage.getVirtuals(), storage.getStorageID()))
, column_names(std::move(column_names_))
, current_it(first_)
, num_blocks(num_blocks_)
, data(data_)
, parallel_execution_index(parallel_execution_index_)
, initializer_func(std::move(initializer_func_))
{
}
@ -52,16 +50,20 @@ public:
protected:
Chunk generate() override
{
if (!postponed_init_done)
if (initializer_func)
{
initializer_func(current_it, num_blocks, data);
postponed_init_done = true;
initializer_func(data);
initializer_func = {};
}
if (current_block_idx == num_blocks)
return {};
size_t current_index = getAndIncrementExecutionIndex();
const Block & src = *current_it;
if (current_index >= data->size())
{
return {};
}
const Block & src = (*data)[current_index];
Columns columns;
columns.reserve(column_names.size());
@ -69,20 +71,26 @@ protected:
for (const auto & name : column_names)
columns.push_back(src.getByName(name).column);
if (++current_block_idx < num_blocks)
++current_it;
return Chunk(std::move(columns), src.rows());
}
private:
const Names column_names;
BlocksList::const_iterator current_it;
size_t num_blocks;
size_t current_block_idx = 0;
size_t getAndIncrementExecutionIndex()
{
if (parallel_execution_index)
{
return (*parallel_execution_index)++;
}
else
{
return execution_index++;
}
}
std::shared_ptr<const BlocksList> data;
bool postponed_init_done = false;
const Names column_names;
size_t execution_index = 0;
std::shared_ptr<const Blocks> data;
std::shared_ptr<std::atomic<size_t>> parallel_execution_index;
InitializerFunc initializer_func;
};
@ -107,7 +115,7 @@ public:
metadata_snapshot->check(block, true);
{
std::lock_guard lock(storage.mutex);
auto new_data = std::make_unique<BlocksList>(*(storage.data.get()));
auto new_data = std::make_unique<Blocks>(*(storage.data.get()));
new_data->push_back(block);
storage.data.set(std::move(new_data));
@ -123,7 +131,7 @@ private:
StorageMemory::StorageMemory(const StorageID & table_id_, ColumnsDescription columns_description_, ConstraintsDescription constraints_)
: IStorage(table_id_), data(std::make_unique<const BlocksList>())
: IStorage(table_id_), data(std::make_unique<const Blocks>())
{
StorageInMemoryMetadata storage_metadata;
storage_metadata.setColumns(std::move(columns_description_));
@ -155,21 +163,17 @@ Pipe StorageMemory::read(
return Pipe(std::make_shared<MemorySource>(
column_names,
data.get()->end(),
0,
*this,
metadata_snapshot,
data.get(),
[this](BlocksList::const_iterator & current_it, size_t & num_blocks, std::shared_ptr<const BlocksList> & current_data)
nullptr /* data */,
nullptr /* parallel execution index */,
[this](std::shared_ptr<const Blocks> & data_to_initialize)
{
current_data = data.get();
current_it = current_data->begin();
num_blocks = current_data->size();
data_to_initialize = data.get();
}));
}
auto current_data = data.get();
size_t size = current_data->size();
if (num_streams > size)
@ -177,23 +181,11 @@ Pipe StorageMemory::read(
Pipes pipes;
BlocksList::const_iterator it = current_data->begin();
auto parallel_execution_index = std::make_shared<std::atomic<size_t>>(0);
size_t offset = 0;
for (size_t stream = 0; stream < num_streams; ++stream)
{
size_t next_offset = (stream + 1) * size / num_streams;
size_t num_blocks = next_offset - offset;
assert(num_blocks > 0);
pipes.emplace_back(std::make_shared<MemorySource>(column_names, it, num_blocks, *this, metadata_snapshot, current_data));
while (offset < next_offset)
{
++it;
++offset;
}
pipes.emplace_back(std::make_shared<MemorySource>(column_names, *this, metadata_snapshot, current_data, parallel_execution_index));
}
return Pipe::unitePipes(std::move(pipes));
@ -208,7 +200,7 @@ BlockOutputStreamPtr StorageMemory::write(const ASTPtr & /*query*/, const Storag
void StorageMemory::drop()
{
data.set(std::make_unique<BlocksList>());
data.set(std::make_unique<Blocks>());
total_size_bytes.store(0, std::memory_order_relaxed);
total_size_rows.store(0, std::memory_order_relaxed);
}
@ -233,7 +225,7 @@ void StorageMemory::mutate(const MutationCommands & commands, const Context & co
auto in = interpreter->execute();
in->readPrefix();
BlocksList out;
Blocks out;
Block block;
while ((block = in->read()))
{
@ -241,17 +233,17 @@ void StorageMemory::mutate(const MutationCommands & commands, const Context & co
}
in->readSuffix();
std::unique_ptr<BlocksList> new_data;
std::unique_ptr<Blocks> new_data;
// all columns affected
if (interpreter->isAffectingAllColumns())
{
new_data = std::make_unique<BlocksList>(out);
new_data = std::make_unique<Blocks>(out);
}
else
{
/// just some of the columns are affected, we need to update them with the new columns
new_data = std::make_unique<BlocksList>(*(data.get()));
new_data = std::make_unique<Blocks>(*(data.get()));
auto data_it = new_data->begin();
auto out_it = out.begin();
@ -284,7 +276,7 @@ void StorageMemory::mutate(const MutationCommands & commands, const Context & co
void StorageMemory::truncate(
const ASTPtr &, const StorageMetadataPtr &, const Context &, TableExclusiveLockHolder &)
{
data.set(std::make_unique<BlocksList>());
data.set(std::make_unique<Blocks>());
total_size_bytes.store(0, std::memory_order_relaxed);
total_size_rows.store(0, std::memory_order_relaxed);
}
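The MemorySource changes above replace precomputed per-stream block ranges with a shared atomic counter, so streams pull blocks dynamically instead of being assigned fixed slices. The scheduling pattern in isolation, with std::thread standing in for pipeline streams (a sketch, names are illustrative):

// Each worker pulls the next block index from a shared atomic counter,
// so blocks are distributed dynamically rather than pre-partitioned.
#include <atomic>
#include <cstdio>
#include <memory>
#include <thread>
#include <vector>

int main()
{
    const std::vector<int> blocks = {10, 20, 30, 40, 50, 60, 70};
    auto parallel_execution_index = std::make_shared<std::atomic<size_t>>(0);

    auto worker = [&](size_t stream_id)
    {
        while (true)
        {
            const size_t current_index = (*parallel_execution_index)++;
            if (current_index >= blocks.size())
                return; /// no more blocks: this source is exhausted
            std::printf("stream %zu reads block %zu (value %d)\n",
                        stream_id, current_index, blocks[current_index]);
        }
    };

    std::vector<std::thread> streams;
    for (size_t i = 0; i < 3; ++i)
        streams.emplace_back(worker, i);
    for (auto & t : streams)
        t.join();
    return 0;
}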

View File

@ -91,7 +91,7 @@ public:
private:
/// MultiVersion data storage, so that we can copy the list of blocks to readers.
MultiVersion<BlocksList> data;
MultiVersion<Blocks> data;
mutable std::mutex mutex;

View File

@ -84,6 +84,22 @@ StorageMerge::StorageMerge(
setInMemoryMetadata(storage_metadata);
}
StorageMerge::StorageMerge(
const StorageID & table_id_,
const ColumnsDescription & columns_,
const String & source_database_,
const Tables & tables_,
const Context & context_)
: IStorage(table_id_)
, source_database(source_database_)
, tables(tables_)
, global_context(context_.getGlobalContext())
{
StorageInMemoryMetadata storage_metadata;
storage_metadata.setColumns(columns_);
setInMemoryMetadata(storage_metadata);
}
template <typename F>
StoragePtr StorageMerge::getFirstTable(F && predicate) const
{
@ -439,8 +455,12 @@ DatabaseTablesIteratorPtr StorageMerge::getDatabaseIterator(const Context & cont
e.addMessage("while getting table iterator of Merge table. Maybe caused by two Merge tables that will endlessly try to read each other's data");
throw;
}
if (tables)
return std::make_unique<DatabaseTablesSnapshotIterator>(*tables, source_database);
auto database = DatabaseCatalog::instance().getDatabase(source_database);
auto table_name_match = [this](const String & table_name_) { return table_name_regexp.match(table_name_); };
auto table_name_match = [this](const String & table_name_) { return table_name_regexp->match(table_name_); };
return database->getTablesIterator(context, table_name_match);
}

View File

@ -48,7 +48,8 @@ public:
private:
String source_database;
OptimizedRegularExpression table_name_regexp;
std::optional<OptimizedRegularExpression> table_name_regexp;
std::optional<Tables> tables;
const Context & global_context;
using StorageWithLockAndName = std::tuple<StoragePtr, TableLockHolder, String>;
@ -75,6 +76,13 @@ protected:
const String & table_name_regexp_,
const Context & context_);
StorageMerge(
const StorageID & table_id_,
const ColumnsDescription & columns_,
const String & source_database_,
const Tables & source_tables_,
const Context & context_);
Pipe createSources(
const StorageMetadataPtr & metadata_snapshot,
SelectQueryInfo & query_info,

View File

@ -141,17 +141,18 @@ namespace
public:
StorageS3BlockOutputStream(
const String & format,
UInt64 min_upload_part_size,
const Block & sample_block_,
const Context & context,
const CompressionMethod compression_method,
const std::shared_ptr<Aws::S3::S3Client> & client,
const String & bucket,
const String & key)
const String & key,
size_t min_upload_part_size,
size_t max_single_part_upload_size)
: sample_block(sample_block_)
{
write_buf = wrapWriteBufferWithCompressionMethod(
std::make_unique<WriteBufferFromS3>(client, bucket, key, min_upload_part_size, true), compression_method, 3);
std::make_unique<WriteBufferFromS3>(client, bucket, key, min_upload_part_size, max_single_part_upload_size), compression_method, 3);
writer = FormatFactory::instance().getOutput(format, *write_buf, sample_block, context);
}
@ -192,6 +193,7 @@ StorageS3::StorageS3(
const StorageID & table_id_,
const String & format_name_,
UInt64 min_upload_part_size_,
UInt64 max_single_part_upload_size_,
const ColumnsDescription & columns_,
const ConstraintsDescription & constraints_,
const Context & context_,
@ -201,6 +203,7 @@ StorageS3::StorageS3(
, global_context(context_.getGlobalContext())
, format_name(format_name_)
, min_upload_part_size(min_upload_part_size_)
, max_single_part_upload_size(max_single_part_upload_size_)
, compression_method(compression_method_)
, name(uri_.storage_name)
{
@ -331,9 +334,15 @@ Pipe StorageS3::read(
BlockOutputStreamPtr StorageS3::write(const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, const Context & /*context*/)
{
return std::make_shared<StorageS3BlockOutputStream>(
format_name, min_upload_part_size, metadata_snapshot->getSampleBlock(),
global_context, chooseCompressionMethod(uri.endpoint, compression_method),
client, uri.bucket, uri.key);
format_name,
metadata_snapshot->getSampleBlock(),
global_context,
chooseCompressionMethod(uri.endpoint, compression_method),
client,
uri.bucket,
uri.key,
min_upload_part_size,
max_single_part_upload_size);
}
void registerStorageS3Impl(const String & name, StorageFactory & factory)
@ -362,6 +371,7 @@ void registerStorageS3Impl(const String & name, StorageFactory & factory)
}
UInt64 min_upload_part_size = args.local_context.getSettingsRef().s3_min_upload_part_size;
UInt64 max_single_part_upload_size = args.local_context.getSettingsRef().s3_max_single_part_upload_size;
String compression_method;
String format_name;
@ -383,6 +393,7 @@ void registerStorageS3Impl(const String & name, StorageFactory & factory)
args.table_id,
format_name,
min_upload_part_size,
max_single_part_upload_size,
args.columns,
args.constraints,
args.context,

View File

@ -31,6 +31,7 @@ public:
const StorageID & table_id_,
const String & format_name_,
UInt64 min_upload_part_size_,
UInt64 max_single_part_upload_size_,
const ColumnsDescription & columns_,
const ConstraintsDescription & constraints_,
const Context & context_,
@ -59,7 +60,8 @@ private:
const Context & global_context;
String format_name;
UInt64 min_upload_part_size;
size_t min_upload_part_size;
size_t max_single_part_upload_size;
String compression_method;
std::shared_ptr<Aws::S3::S3Client> client;
String name;

View File

@ -6,6 +6,7 @@
#include <TableFunctions/ITableFunction.h>
#include <Interpreters/evaluateConstantExpression.h>
#include <Interpreters/Context.h>
#include <Access/ContextAccess.h>
#include <TableFunctions/TableFunctionMerge.h>
#include <TableFunctions/TableFunctionFactory.h>
#include <TableFunctions/registerTableFunctions.h>
@ -22,29 +23,6 @@ namespace ErrorCodes
}
static NamesAndTypesList chooseColumns(const String & source_database, const String & table_name_regexp_, const Context & context)
{
OptimizedRegularExpression table_name_regexp(table_name_regexp_);
auto table_name_match = [&](const String & table_name) { return table_name_regexp.match(table_name); };
StoragePtr any_table;
{
auto database = DatabaseCatalog::instance().getDatabase(source_database);
auto iterator = database->getTablesIterator(context, table_name_match);
if (iterator->isValid())
if (const auto & table = iterator->table())
any_table = table;
}
if (!any_table)
throw Exception("Error while executing table function merge. In database " + source_database + " no one matches regular expression: "
+ table_name_regexp_, ErrorCodes::UNKNOWN_TABLE);
return any_table->getInMemoryMetadataPtr()->getColumns().getAllPhysical();
}
void TableFunctionMerge::parseArguments(const ASTPtr & ast_function, const Context & context)
{
ASTs & args_func = ast_function->children;
@ -68,9 +46,46 @@ void TableFunctionMerge::parseArguments(const ASTPtr & ast_function, const Conte
table_name_regexp = args[1]->as<ASTLiteral &>().value.safeGet<String>();
}
const Tables & TableFunctionMerge::getMatchingTables(const Context & context) const
{
if (tables)
return *tables;
auto database = DatabaseCatalog::instance().getDatabase(source_database);
OptimizedRegularExpression re(table_name_regexp);
auto table_name_match = [&](const String & table_name_) { return re.match(table_name_); };
auto access = context.getAccess();
bool granted_show_on_all_tables = access->isGranted(AccessType::SHOW_TABLES, source_database);
bool granted_select_on_all_tables = access->isGranted(AccessType::SELECT, source_database);
tables.emplace();
for (auto it = database->getTablesIterator(context, table_name_match); it->isValid(); it->next())
{
if (!it->table())
continue;
bool granted_show = granted_show_on_all_tables || access->isGranted(AccessType::SHOW_TABLES, source_database, it->name());
if (!granted_show)
continue;
if (!granted_select_on_all_tables)
access->checkAccess(AccessType::SELECT, source_database, it->name());
tables->emplace(it->name(), it->table());
}
if (tables->empty())
throw Exception("Error while executing table function merge. In database " + source_database + " no one matches regular expression: "
+ table_name_regexp, ErrorCodes::UNKNOWN_TABLE);
return *tables;
}
ColumnsDescription TableFunctionMerge::getActualTableStructure(const Context & context) const
{
return ColumnsDescription{chooseColumns(source_database, table_name_regexp, context)};
auto first_table = getMatchingTables(context).begin()->second;
return ColumnsDescription{first_table->getInMemoryMetadataPtr()->getColumns().getAllPhysical()};
}
StoragePtr TableFunctionMerge::executeImpl(const ASTPtr & /*ast_function*/, const Context & context, const std::string & table_name, ColumnsDescription /*cached_columns*/) const
@ -79,7 +94,7 @@ StoragePtr TableFunctionMerge::executeImpl(const ASTPtr & /*ast_function*/, cons
StorageID(getDatabaseName(), table_name),
getActualTableStructure(context),
source_database,
table_name_regexp,
getMatchingTables(context),
context);
res->startup();
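getMatchingTables() above checks the database-wide grants once and only falls back to per-table checks when they are missing, skipping tables without SHOW and raising on tables without SELECT. A sketch of that pattern outside the access framework (all names here are illustrative):

// Database-wide grants short-circuit the per-table checks; a missing SHOW grant skips
// the table silently, a missing SELECT grant raises an error.
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

std::vector<std::string> filterVisibleTables(
    const std::vector<std::string> & tables,
    const std::function<bool(const std::string &)> & is_show_granted,   /// per-table SHOW TABLES check
    const std::function<bool(const std::string &)> & is_select_granted, /// per-table SELECT check
    bool show_granted_on_all,
    bool select_granted_on_all)
{
    std::vector<std::string> result;
    for (const auto & table : tables)
    {
        if (!show_granted_on_all && !is_show_granted(table))
            continue; /// invisible table: skip silently
        if (!select_granted_on_all && !is_select_granted(table))
            throw std::runtime_error("SELECT is not granted on table " + table);
        result.push_back(table);
    }
    return result;
}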

View File

@ -19,11 +19,13 @@ private:
StoragePtr executeImpl(const ASTPtr & ast_function, const Context & context, const std::string & table_name, ColumnsDescription cached_columns) const override;
const char * getStorageTypeName() const override { return "Merge"; }
const Tables & getMatchingTables(const Context & context) const;
ColumnsDescription getActualTableStructure(const Context & context) const override;
void parseArguments(const ASTPtr & ast_function, const Context & context) override;
String source_database;
String table_name_regexp;
mutable std::optional<Tables> tables;
};

View File

@ -67,6 +67,7 @@ StoragePtr TableFunctionS3::executeImpl(const ASTPtr & /*ast_function*/, const C
Poco::URI uri (filename);
S3::URI s3_uri (uri);
UInt64 min_upload_part_size = context.getSettingsRef().s3_min_upload_part_size;
UInt64 max_single_part_upload_size = context.getSettingsRef().s3_max_single_part_upload_size;
StoragePtr storage = StorageS3::create(
s3_uri,
@ -75,6 +76,7 @@ StoragePtr TableFunctionS3::executeImpl(const ASTPtr & /*ast_function*/, const C
StorageID(getDatabaseName(), table_name),
format,
min_upload_part_size,
max_single_part_upload_size,
getActualTableStructure(context),
ConstraintsDescription{},
const_cast<Context &>(context),

View File

@ -3,7 +3,7 @@
<profiles>
<default>
<allow_experimental_database_materialize_mysql>1</allow_experimental_database_materialize_mysql>
<allow_introspection_functions>1</allow_introspection_functions>
<default_database_engine>Ordinary</default_database_engine>
</default>
</profiles>

View File

@ -0,0 +1,19 @@
<?xml version="1.0"?>
<yandex>
<profiles>
<default>
<allow_experimental_database_materialize_mysql>1</allow_experimental_database_materialize_mysql>
<default_database_engine>Atomic</default_database_engine>
</default>
</profiles>
<users>
<default>
<password></password>
<networks incl="networks" replace="replace">
<ip>::/0</ip>
</networks>
<profile>default</profile>
</default>
</users>
</yandex>

View File

@ -15,6 +15,7 @@ from multiprocessing.dummy import Pool
def check_query(clickhouse_node, query, result_set, retry_count=60, interval_seconds=3):
lastest_result = ''
for i in range(retry_count):
try:
lastest_result = clickhouse_node.query(query)
@ -35,6 +36,7 @@ def dml_with_materialize_mysql_database(clickhouse_node, mysql_node, service_nam
clickhouse_node.query("DROP DATABASE IF EXISTS test_database")
mysql_node.query("CREATE DATABASE test_database DEFAULT CHARACTER SET 'utf8'")
# existed before the mapping was created
mysql_node.query("CREATE TABLE test_database.test_table_1 ("
"`key` INT NOT NULL PRIMARY KEY, "
"unsigned_tiny_int TINYINT UNSIGNED, tiny_int TINYINT, "
@ -51,9 +53,10 @@ def dml_with_materialize_mysql_database(clickhouse_node, mysql_node, service_nam
"_date Date, _datetime DateTime, _timestamp TIMESTAMP, _bool BOOLEAN) ENGINE = InnoDB;")
# it already has some data
mysql_node.query(
"INSERT INTO test_database.test_table_1 VALUES(1, 1, -1, 2, -2, 3, -3, 4, -4, 5, -5, 6, -6, 3.2, -3.2, 3.4, -3.4, 'varchar', 'char', "
"'2020-01-01', '2020-01-01 00:00:00', '2020-01-01 00:00:00', true);")
mysql_node.query("""
INSERT INTO test_database.test_table_1 VALUES(1, 1, -1, 2, -2, 3, -3, 4, -4, 5, -5, 6, -6, 3.2, -3.2, 3.4, -3.4, 'varchar', 'char',
'2020-01-01', '2020-01-01 00:00:00', '2020-01-01 00:00:00', true);
""")
clickhouse_node.query(
"CREATE DATABASE test_database ENGINE = MaterializeMySQL('{}:3306', 'test_database', 'root', 'clickhouse')".format(
@ -65,9 +68,10 @@ def dml_with_materialize_mysql_database(clickhouse_node, mysql_node, service_nam
"1\t1\t-1\t2\t-2\t3\t-3\t4\t-4\t5\t-5\t6\t-6\t3.2\t-3.2\t3.4\t-3.4\tvarchar\tchar\t2020-01-01\t"
"2020-01-01 00:00:00\t2020-01-01 00:00:00\t1\n")
mysql_node.query(
"INSERT INTO test_database.test_table_1 VALUES(2, 1, -1, 2, -2, 3, -3, 4, -4, 5, -5, 6, -6, 3.2, -3.2, 3.4, -3.4, 'varchar', 'char', "
"'2020-01-01', '2020-01-01 00:00:00', '2020-01-01 00:00:00', false);")
mysql_node.query("""
INSERT INTO test_database.test_table_1 VALUES(2, 1, -1, 2, -2, 3, -3, 4, -4, 5, -5, 6, -6, 3.2, -3.2, 3.4, -3.4, 'varchar', 'char',
'2020-01-01', '2020-01-01 00:00:00', '2020-01-01 00:00:00', false);
""")
check_query(clickhouse_node, "SELECT * FROM test_database.test_table_1 ORDER BY key FORMAT TSV",
"1\t1\t-1\t2\t-2\t3\t-3\t4\t-4\t5\t-5\t6\t-6\t3.2\t-3.2\t3.4\t-3.4\tvarchar\tchar\t2020-01-01\t"
@ -76,14 +80,16 @@ def dml_with_materialize_mysql_database(clickhouse_node, mysql_node, service_nam
mysql_node.query("UPDATE test_database.test_table_1 SET unsigned_tiny_int = 2 WHERE `key` = 1")
check_query(clickhouse_node, "SELECT key, unsigned_tiny_int, tiny_int, unsigned_small_int,"
" small_int, unsigned_medium_int, medium_int, unsigned_int, _int, unsigned_integer, _integer, "
" unsigned_bigint, _bigint, unsigned_float, _float, unsigned_double, _double, _varchar, _char, "
" _date, _datetime, /* exclude it, because ON UPDATE CURRENT_TIMESTAMP _timestamp, */ "
" _bool FROM test_database.test_table_1 ORDER BY key FORMAT TSV",
"1\t2\t-1\t2\t-2\t3\t-3\t4\t-4\t5\t-5\t6\t-6\t3.2\t-3.2\t3.4\t-3.4\tvarchar\tchar\t2020-01-01\t"
"2020-01-01 00:00:00\t1\n2\t1\t-1\t2\t-2\t3\t-3\t4\t-4\t5\t-5\t6\t-6\t3.2\t-3.2\t3.4\t-3.4\t"
"varchar\tchar\t2020-01-01\t2020-01-01 00:00:00\t0\n")
check_query(clickhouse_node, """
SELECT key, unsigned_tiny_int, tiny_int, unsigned_small_int,
small_int, unsigned_medium_int, medium_int, unsigned_int, _int, unsigned_integer, _integer,
unsigned_bigint, _bigint, unsigned_float, _float, unsigned_double, _double, _varchar, _char,
_date, _datetime, /* exclude it, because ON UPDATE CURRENT_TIMESTAMP _timestamp, */
_bool FROM test_database.test_table_1 ORDER BY key FORMAT TSV
""",
"1\t2\t-1\t2\t-2\t3\t-3\t4\t-4\t5\t-5\t6\t-6\t3.2\t-3.2\t3.4\t-3.4\tvarchar\tchar\t2020-01-01\t"
"2020-01-01 00:00:00\t1\n2\t1\t-1\t2\t-2\t3\t-3\t4\t-4\t5\t-5\t6\t-6\t3.2\t-3.2\t3.4\t-3.4\t"
"varchar\tchar\t2020-01-01\t2020-01-01 00:00:00\t0\n")
# update primary key
mysql_node.query("UPDATE test_database.test_table_1 SET `key` = 3 WHERE `unsigned_tiny_int` = 2")
@ -556,6 +562,12 @@ def err_sync_user_privs_with_materialize_mysql_database(clickhouse_node, mysql_n
assert 'MySQL SYNC USER ACCESS ERR:' in str(exception.value)
assert "priv_err_db" not in clickhouse_node.query("SHOW DATABASES")
mysql_node.query("GRANT SELECT ON priv_err_db.* TO 'test'@'%'")
time.sleep(3)
clickhouse_node.query("ATTACH DATABASE priv_err_db")
clickhouse_node.query("DROP DATABASE priv_err_db")
mysql_node.query("REVOKE SELECT ON priv_err_db.* FROM 'test'@'%'")
mysql_node.query("DROP DATABASE priv_err_db;")
mysql_node.query("DROP USER 'test'@'%'")

View File

@ -14,7 +14,10 @@ from . import materialize_with_ddl
DOCKER_COMPOSE_PATH = get_docker_compose_path()
cluster = ClickHouseCluster(__file__)
clickhouse_node = cluster.add_instance('node1', user_configs=["configs/users.xml"], with_mysql=False, stay_alive=True)
node_db_ordinary = cluster.add_instance('node1', user_configs=["configs/users.xml"], with_mysql=False, stay_alive=True)
node_db_atomic = cluster.add_instance('node2', user_configs=["configs/users_db_atomic.xml"], with_mysql=False, stay_alive=True)
@pytest.fixture(scope="module")
def started_cluster():
@ -119,39 +122,30 @@ def started_mysql_8_0():
'--remove-orphans'])
def test_materialize_database_dml_with_mysql_5_7(started_cluster, started_mysql_5_7):
@pytest.mark.parametrize(('clickhouse_node'), [node_db_ordinary, node_db_atomic])
def test_materialize_database_dml_with_mysql_5_7(started_cluster, started_mysql_5_7, clickhouse_node):
materialize_with_ddl.dml_with_materialize_mysql_database(clickhouse_node, started_mysql_5_7, "mysql1")
materialize_with_ddl.materialize_mysql_database_with_datetime_and_decimal(clickhouse_node, started_mysql_5_7, "mysql1")
def test_materialize_database_dml_with_mysql_8_0(started_cluster, started_mysql_8_0):
@pytest.mark.parametrize(('clickhouse_node'), [node_db_ordinary, node_db_atomic])
def test_materialize_database_dml_with_mysql_8_0(started_cluster, started_mysql_8_0, clickhouse_node):
materialize_with_ddl.dml_with_materialize_mysql_database(clickhouse_node, started_mysql_8_0, "mysql8_0")
materialize_with_ddl.materialize_mysql_database_with_datetime_and_decimal(clickhouse_node, started_mysql_8_0, "mysql8_0")
@pytest.mark.parametrize(('clickhouse_node'), [node_db_ordinary, node_db_atomic])
def test_materialize_database_ddl_with_mysql_5_7(started_cluster, started_mysql_5_7, clickhouse_node):
materialize_with_ddl.drop_table_with_materialize_mysql_database(clickhouse_node, started_mysql_5_7, "mysql1")
materialize_with_ddl.create_table_with_materialize_mysql_database(clickhouse_node, started_mysql_5_7, "mysql1")
materialize_with_ddl.rename_table_with_materialize_mysql_database(clickhouse_node, started_mysql_5_7, "mysql1")
materialize_with_ddl.alter_add_column_with_materialize_mysql_database(clickhouse_node, started_mysql_5_7, "mysql1")
materialize_with_ddl.alter_drop_column_with_materialize_mysql_database(clickhouse_node, started_mysql_5_7, "mysql1")
# mysql 5.7 cannot support alter rename column
# materialize_with_ddl.alter_rename_column_with_materialize_mysql_database(clickhouse_node, started_mysql_5_7, "mysql1")
materialize_with_ddl.alter_rename_table_with_materialize_mysql_database(clickhouse_node, started_mysql_5_7, "mysql1")
materialize_with_ddl.alter_modify_column_with_materialize_mysql_database(clickhouse_node, started_mysql_5_7, "mysql1")
def test_materialize_database_ddl_with_mysql_5_7(started_cluster, started_mysql_5_7):
try:
materialize_with_ddl.drop_table_with_materialize_mysql_database(clickhouse_node, started_mysql_5_7, "mysql1")
materialize_with_ddl.create_table_with_materialize_mysql_database(clickhouse_node, started_mysql_5_7, "mysql1")
materialize_with_ddl.rename_table_with_materialize_mysql_database(clickhouse_node, started_mysql_5_7, "mysql1")
materialize_with_ddl.alter_add_column_with_materialize_mysql_database(clickhouse_node, started_mysql_5_7,
"mysql1")
materialize_with_ddl.alter_drop_column_with_materialize_mysql_database(clickhouse_node, started_mysql_5_7,
"mysql1")
# mysql 5.7 cannot support alter rename column
# materialize_with_ddl.alter_rename_column_with_materialize_mysql_database(clickhouse_node, started_mysql_5_7, "mysql1")
materialize_with_ddl.alter_rename_table_with_materialize_mysql_database(clickhouse_node, started_mysql_5_7,
"mysql1")
materialize_with_ddl.alter_modify_column_with_materialize_mysql_database(clickhouse_node, started_mysql_5_7,
"mysql1")
except:
print((clickhouse_node.query(
"select '\n', thread_id, query_id, arrayStringConcat(arrayMap(x -> concat(demangle(addressToSymbol(x)), '\n ', addressToLine(x)), trace), '\n') AS sym from system.stack_trace format TSVRaw")))
raise
def test_materialize_database_ddl_with_mysql_8_0(started_cluster, started_mysql_8_0):
@pytest.mark.parametrize(('clickhouse_node'), [node_db_ordinary, node_db_atomic])
def test_materialize_database_ddl_with_mysql_8_0(started_cluster, started_mysql_8_0, clickhouse_node):
materialize_with_ddl.drop_table_with_materialize_mysql_database(clickhouse_node, started_mysql_8_0, "mysql8_0")
materialize_with_ddl.create_table_with_materialize_mysql_database(clickhouse_node, started_mysql_8_0, "mysql8_0")
materialize_with_ddl.rename_table_with_materialize_mysql_database(clickhouse_node, started_mysql_8_0, "mysql8_0")
@ -166,61 +160,72 @@ def test_materialize_database_ddl_with_mysql_8_0(started_cluster, started_mysql_
materialize_with_ddl.alter_modify_column_with_materialize_mysql_database(clickhouse_node, started_mysql_8_0,
"mysql8_0")
def test_materialize_database_ddl_with_empty_transaction_5_7(started_cluster, started_mysql_5_7):
@pytest.mark.parametrize(('clickhouse_node'), [node_db_ordinary, node_db_atomic])
def test_materialize_database_ddl_with_empty_transaction_5_7(started_cluster, started_mysql_5_7, clickhouse_node):
materialize_with_ddl.query_event_with_empty_transaction(clickhouse_node, started_mysql_5_7, "mysql1")
def test_materialize_database_ddl_with_empty_transaction_8_0(started_cluster, started_mysql_8_0):
@pytest.mark.parametrize(('clickhouse_node'), [node_db_ordinary, node_db_atomic])
def test_materialize_database_ddl_with_empty_transaction_8_0(started_cluster, started_mysql_8_0, clickhouse_node):
materialize_with_ddl.query_event_with_empty_transaction(clickhouse_node, started_mysql_8_0, "mysql8_0")
def test_select_without_columns_5_7(started_cluster, started_mysql_5_7):
@pytest.mark.parametrize(('clickhouse_node'), [node_db_ordinary, node_db_atomic])
def test_select_without_columns_5_7(started_cluster, started_mysql_5_7, clickhouse_node):
materialize_with_ddl.select_without_columns(clickhouse_node, started_mysql_5_7, "mysql1")
def test_select_without_columns_8_0(started_cluster, started_mysql_8_0):
@pytest.mark.parametrize(('clickhouse_node'), [node_db_ordinary, node_db_atomic])
def test_select_without_columns_8_0(started_cluster, started_mysql_8_0, clickhouse_node):
materialize_with_ddl.select_without_columns(clickhouse_node, started_mysql_8_0, "mysql8_0")
def test_insert_with_modify_binlog_checksum_5_7(started_cluster, started_mysql_5_7):
@pytest.mark.parametrize(('clickhouse_node'), [node_db_ordinary, node_db_atomic])
def test_insert_with_modify_binlog_checksum_5_7(started_cluster, started_mysql_5_7, clickhouse_node):
materialize_with_ddl.insert_with_modify_binlog_checksum(clickhouse_node, started_mysql_5_7, "mysql1")
def test_insert_with_modify_binlog_checksum_8_0(started_cluster, started_mysql_8_0):
@pytest.mark.parametrize(('clickhouse_node'), [node_db_ordinary, node_db_atomic])
def test_insert_with_modify_binlog_checksum_8_0(started_cluster, started_mysql_8_0, clickhouse_node):
materialize_with_ddl.insert_with_modify_binlog_checksum(clickhouse_node, started_mysql_8_0, "mysql8_0")
def test_materialize_database_err_sync_user_privs_5_7(started_cluster, started_mysql_5_7):
@pytest.mark.parametrize(('clickhouse_node'), [node_db_ordinary, node_db_atomic])
def test_materialize_database_err_sync_user_privs_5_7(started_cluster, started_mysql_5_7, clickhouse_node):
materialize_with_ddl.err_sync_user_privs_with_materialize_mysql_database(clickhouse_node, started_mysql_5_7, "mysql1")
def test_materialize_database_err_sync_user_privs_8_0(started_cluster, started_mysql_8_0):
@pytest.mark.parametrize(('clickhouse_node'), [node_db_ordinary, node_db_atomic])
def test_materialize_database_err_sync_user_privs_8_0(started_cluster, started_mysql_8_0, clickhouse_node):
materialize_with_ddl.err_sync_user_privs_with_materialize_mysql_database(clickhouse_node, started_mysql_8_0, "mysql8_0")
def test_network_partition_5_7(started_cluster, started_mysql_5_7):
@pytest.mark.parametrize(('clickhouse_node'), [node_db_ordinary, node_db_atomic])
def test_network_partition_5_7(started_cluster, started_mysql_5_7, clickhouse_node):
materialize_with_ddl.network_partition_test(clickhouse_node, started_mysql_5_7, "mysql1")
def test_network_partition_8_0(started_cluster, started_mysql_8_0):
@pytest.mark.parametrize(('clickhouse_node'), [node_db_ordinary, node_db_atomic])
def test_network_partition_8_0(started_cluster, started_mysql_8_0, clickhouse_node):
materialize_with_ddl.network_partition_test(clickhouse_node, started_mysql_8_0, "mysql8_0")
def test_mysql_kill_sync_thread_restore_5_7(started_cluster, started_mysql_5_7):
@pytest.mark.parametrize(('clickhouse_node'), [node_db_ordinary, node_db_atomic])
def test_mysql_kill_sync_thread_restore_5_7(started_cluster, started_mysql_5_7, clickhouse_node):
materialize_with_ddl.mysql_kill_sync_thread_restore_test(clickhouse_node, started_mysql_5_7, "mysql1")
def test_mysql_kill_sync_thread_restore_8_0(started_cluster, started_mysql_8_0):
@pytest.mark.parametrize(('clickhouse_node'), [node_db_ordinary, node_db_atomic])
def test_mysql_kill_sync_thread_restore_8_0(started_cluster, started_mysql_8_0, clickhouse_node):
materialize_with_ddl.mysql_kill_sync_thread_restore_test(clickhouse_node, started_mysql_8_0, "mysql8_0")
def test_mysql_killed_while_insert_5_7(started_cluster, started_mysql_5_7):
@pytest.mark.parametrize(('clickhouse_node'), [node_db_ordinary, node_db_atomic])
def test_mysql_killed_while_insert_5_7(started_cluster, started_mysql_5_7, clickhouse_node):
materialize_with_ddl.mysql_killed_while_insert(clickhouse_node, started_mysql_5_7, "mysql1")
def test_mysql_killed_while_insert_8_0(started_cluster, started_mysql_8_0):
@pytest.mark.parametrize(('clickhouse_node'), [node_db_ordinary, node_db_atomic])
def test_mysql_killed_while_insert_8_0(started_cluster, started_mysql_8_0, clickhouse_node):
materialize_with_ddl.mysql_killed_while_insert(clickhouse_node, started_mysql_8_0, "mysql8_0")
def test_clickhouse_killed_while_insert_5_7(started_cluster, started_mysql_5_7):
@pytest.mark.parametrize(('clickhouse_node'), [node_db_ordinary, node_db_atomic])
def test_clickhouse_killed_while_insert_5_7(started_cluster, started_mysql_5_7, clickhouse_node):
materialize_with_ddl.clickhouse_killed_while_insert(clickhouse_node, started_mysql_5_7, "mysql1")
def test_clickhouse_killed_while_insert_8_0(started_cluster, started_mysql_8_0):
@pytest.mark.parametrize(('clickhouse_node'), [node_db_ordinary, node_db_atomic])
def test_clickhouse_killed_while_insert_8_0(started_cluster, started_mysql_8_0, clickhouse_node):
materialize_with_ddl.clickhouse_killed_while_insert(clickhouse_node, started_mysql_8_0, "mysql8_0")

View File

@ -207,34 +207,26 @@ def test_show_profiles():
def test_allow_ddl():
assert "Not enough privileges" in instance.query_and_get_error("CREATE TABLE tbl(a Int32) ENGINE=Log", user="robin")
assert "DDL queries are prohibited" in instance.query_and_get_error("CREATE TABLE tbl(a Int32) ENGINE=Log",
settings={"allow_ddl": 0})
assert "Not enough privileges" in instance.query_and_get_error("GRANT CREATE ON tbl TO robin", user="robin")
assert "DDL queries are prohibited" in instance.query_and_get_error("GRANT CREATE ON tbl TO robin",
settings={"allow_ddl": 0})
assert "it's necessary to have grant" in instance.query_and_get_error("CREATE TABLE tbl(a Int32) ENGINE=Log", user="robin")
assert "it's necessary to have grant" in instance.query_and_get_error("GRANT CREATE ON tbl TO robin", user="robin")
assert "DDL queries are prohibited" in instance.query_and_get_error("CREATE TABLE tbl(a Int32) ENGINE=Log", settings={"allow_ddl": 0})
instance.query("GRANT CREATE ON tbl TO robin")
instance.query("CREATE TABLE tbl(a Int32) ENGINE=Log", user="robin")
instance.query("DROP TABLE tbl")
def test_allow_introspection():
assert "Introspection functions are disabled" in instance.query_and_get_error("SELECT demangle('a')")
assert "Not enough privileges" in instance.query_and_get_error("SELECT demangle('a')", user="robin")
assert "Not enough privileges" in instance.query_and_get_error("SELECT demangle('a')", user="robin",
settings={"allow_introspection_functions": 1})
assert "Introspection functions are disabled" in instance.query_and_get_error("GRANT demangle ON *.* TO robin")
assert "Not enough privileges" in instance.query_and_get_error("GRANT demangle ON *.* TO robin", user="robin")
assert "Not enough privileges" in instance.query_and_get_error("GRANT demangle ON *.* TO robin", user="robin",
settings={"allow_introspection_functions": 1})
assert instance.query("SELECT demangle('a')", settings={"allow_introspection_functions": 1}) == "signed char\n"
instance.query("GRANT demangle ON *.* TO robin", settings={"allow_introspection_functions": 1})
assert "Introspection functions are disabled" in instance.query_and_get_error("SELECT demangle('a')")
assert "it's necessary to have grant" in instance.query_and_get_error("SELECT demangle('a')", user="robin")
assert "it's necessary to have grant" in instance.query_and_get_error("SELECT demangle('a')", user="robin", settings={"allow_introspection_functions": 1})
instance.query("GRANT demangle ON *.* TO robin")
assert "Introspection functions are disabled" in instance.query_and_get_error("SELECT demangle('a')", user="robin")
assert instance.query("SELECT demangle('a')", user="robin", settings={"allow_introspection_functions": 1}) == "signed char\n"
instance.query("ALTER USER robin SETTINGS allow_introspection_functions=1")
assert instance.query("SELECT demangle('a')", user="robin") == "signed char\n"
@ -248,4 +240,4 @@ def test_allow_introspection():
assert "Introspection functions are disabled" in instance.query_and_get_error("SELECT demangle('a')", user="robin")
instance.query("REVOKE demangle ON *.* FROM robin", settings={"allow_introspection_functions": 1})
assert "Not enough privileges" in instance.query_and_get_error("SELECT demangle('a')", user="robin")
assert "it's necessary to have grant" in instance.query_and_get_error("SELECT demangle('a')", user="robin")

View File

@ -9,6 +9,7 @@ import time
import avro.schema
from confluent_kafka.avro.cached_schema_registry_client import CachedSchemaRegistryClient
from confluent_kafka.avro.serializer.message_serializer import MessageSerializer
from confluent_kafka import admin
import kafka.errors
import pytest
@ -1161,6 +1162,66 @@ def test_kafka_materialized_view(kafka_cluster):
kafka_check_result(result, True)
@pytest.mark.timeout(180)
def test_librdkafka_snappy_regression(kafka_cluster):
"""
Regression test for UB in snappy-c (which is used in librdkafka);
the backport PR is [1].
[1]: https://github.com/ClickHouse-Extras/librdkafka/pull/3
Example of corruption:
2020.12.10 09:59:56.831507 [ 20 ] {} <Error> void DB::StorageKafka::threadFunc(size_t): Code: 27, e.displayText() = DB::Exception: Cannot parse input: expected '"' before: 'foo"}': (while reading the value of key value): (at row 1)
, Stack trace (when copying this message, always include the lines below):
"""
# create topic with snappy compression
admin_client = admin.AdminClient({'bootstrap.servers': 'localhost:9092'})
topic_snappy = admin.NewTopic(topic='snappy_regression', num_partitions=1, replication_factor=1, config={
'compression.type': 'snappy',
})
admin_client.create_topics(new_topics=[topic_snappy], validate_only=False)
instance.query('''
CREATE TABLE test.kafka (key UInt64, value String)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka1:19092',
kafka_topic_list = 'snappy_regression',
kafka_group_name = 'ch_snappy_regression',
kafka_format = 'JSONEachRow';
''')
messages = []
expected = []
# To trigger this regression there should be duplicated messages
# The original reproducer is:
#
# $ gcc --version |& fgrep gcc
# gcc (GCC) 10.2.0
# $ yes foobarbaz | fold -w 80 | head -n10 >| in-…
# $ make clean && make CFLAGS='-Wall -g -O2 -ftree-loop-vectorize -DNDEBUG=1 -DSG=1 -fPIC'
# $ ./verify in
# final comparision of in failed at 20 of 100
value = 'foobarbaz'*10
number_of_messages = 50
for i in range(number_of_messages):
messages.append(json.dumps({'key': i, 'value': value}))
expected.append(f'{i}\t{value}')
kafka_produce('snappy_regression', messages)
expected = '\n'.join(expected)
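# Keep polling until a single SELECT returns all produced messages.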
while True:
result = instance.query('SELECT * FROM test.kafka')
rows = len(result.strip('\n').split('\n'))
print(rows)
if rows == number_of_messages:
break
assert TSV(result) == TSV(expected)
instance.query('DROP TABLE test.kafka')
@pytest.mark.timeout(180)
def test_kafka_materialized_view_with_subquery(kafka_cluster):

View File

@ -306,7 +306,8 @@ def test_multipart_put(cluster, maybe_auth, positive):
cluster.minio_redirect_host, cluster.minio_redirect_port, bucket, filename, maybe_auth, table_format)
try:
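# s3_max_single_part_upload_size = 0 disables single-part uploads, so this PUT exercises the multipart path.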
run_query(instance, put_query, stdin=csv_data, settings={'s3_min_upload_part_size': min_part_size_bytes})
run_query(instance, put_query, stdin=csv_data, settings={'s3_min_upload_part_size': min_part_size_bytes,
's3_max_single_part_upload_size': 0})
except helpers.client.QueryRuntimeException:
if positive:
raise

View File

@ -0,0 +1,55 @@
import pytest
from helpers.cluster import ClickHouseCluster
from helpers.test_tools import TSV
cluster = ClickHouseCluster(__file__)
instance = cluster.add_instance('instance')
@pytest.fixture(scope="module", autouse=True)
def started_cluster():
try:
cluster.start()
instance.query("CREATE TABLE table1(x UInt32) ENGINE = MergeTree ORDER BY tuple()")
instance.query("CREATE TABLE table2(x UInt32) ENGINE = MergeTree ORDER BY tuple()")
instance.query("INSERT INTO table1 VALUES (1)")
instance.query("INSERT INTO table2 VALUES (2)")
yield cluster
finally:
cluster.shutdown()
@pytest.fixture(autouse=True)
def cleanup_after_test():
try:
yield
finally:
instance.query("DROP USER IF EXISTS A")
def test_merge():
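# merge() needs the CREATE TEMPORARY TABLE grant plus SELECT on at least one table matching the regexp;
# only rows from tables the user can SELECT from are returned.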
select_query = "SELECT * FROM merge('default', 'table[0-9]+') ORDER BY x"
assert instance.query(select_query) == "1\n2\n"
instance.query("CREATE USER A")
assert "it's necessary to have the grant CREATE TEMPORARY TABLE ON *.*" in instance.query_and_get_error(select_query, user = 'A')
instance.query("GRANT CREATE TEMPORARY TABLE ON *.* TO A")
assert "no one matches regular expression" in instance.query_and_get_error(select_query, user = 'A')
instance.query("GRANT SELECT ON default.table1 TO A")
assert instance.query(select_query, user = 'A') == "1\n"
instance.query("GRANT SELECT ON default.* TO A")
assert instance.query(select_query, user = 'A') == "1\n2\n"
instance.query("REVOKE SELECT ON default.table1 FROM A")
assert instance.query(select_query, user = 'A') == "2\n"
instance.query("REVOKE ALL ON default.* FROM A")
instance.query("GRANT SELECT ON default.table1 TO A")
instance.query("GRANT INSERT ON default.table2 TO A")
assert "it's necessary to have the grant SELECT ON default.table2" in instance.query_and_get_error(select_query, user = 'A')

View File

@ -49,3 +49,7 @@ DROP DICTIONARY dict;
DROP TABLE test_01056_dict_data.dict_data;
DROP DATABASE test_01056_dict_data;
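-- CREATE TABLE ... AS SELECT with an explicit column list should cast the SELECT result to the declared type.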
CREATE TABLE t1 (x String) ENGINE = Memory AS SELECT 1;
SELECT x, toTypeName(x) FROM t1;
DROP TABLE t1;

View File

@ -9,8 +9,6 @@ select 1 from remote('127.{1,2}', currentDatabase(), test_01081) lhs join system
-- Code: 171. DB::Exception: Received from localhost:9000. DB::Exception: Received from 127.2:9000. DB::Exception: Block structure mismatch in function connect between PartialSortingTransform and LazyOutputFormat stream: different columns:
-- _dummy Int Int32(size = 0), 1 UInt8 UInt8(size = 0)
-- _dummy Int Int32(size = 0), 1 UInt8 Const(size = 0, UInt8(size = 1)).
--
-- With experimental_use_processors=1 (default at the time of writing).
insert into test_01081 select * from system.numbers limit 10;
select 1 from remote('127.{1,2}', currentDatabase(), test_01081) lhs join system.one as rhs on rhs.dummy = 1 order by 1;

View File

@ -1,17 +1,29 @@
-- Check remerge_sort_lowered_memory_bytes_ratio setting
set max_memory_usage='4Gi';
set max_memory_usage='300Mi';
-- enter remerge once limit*2 is reached
set max_bytes_before_remerge_sort='10Mi';
-- more blocks
set max_block_size=40960;
-- default 2, slightly not enough:
--
-- MergeSortingTransform: Memory usage is lowered from 1.91 GiB to 980.00 MiB
-- with the default remerge_sort_lowered_memory_bytes_ratio of 2 the achieved reduction is slightly not enough:
-- MergeSortingTransform: Re-merging intermediate ORDER BY data (20 blocks with 819200 rows) to save memory consumption
-- MergeSortingTransform: Memory usage is lowered from 186.25 MiB to 95.00 MiB
-- MergeSortingTransform: Re-merging is not useful (memory usage was not lowered by remerge_sort_lowered_memory_bytes_ratio=2.0)
select number k, repeat(toString(number), 101) v1, repeat(toString(number), 102) v2, repeat(toString(number), 103) v3 from numbers(toUInt64(10e6)) order by k limit 400e3 format Null; -- { serverError 241 }
select number k, repeat(toString(number), 101) v1, repeat(toString(number), 102) v2, repeat(toString(number), 103) v3 from numbers(toUInt64(10e6)) order by k limit 400e3 settings remerge_sort_lowered_memory_bytes_ratio=2. format Null; -- { serverError 241 }
select number k, repeat(toString(number), 11) v1, repeat(toString(number), 12) v2 from numbers(toUInt64(3e6)) order by k limit 400e3 format Null; -- { serverError 241 }
select number k, repeat(toString(number), 11) v1, repeat(toString(number), 12) v2 from numbers(toUInt64(3e6)) order by k limit 400e3 settings remerge_sort_lowered_memory_bytes_ratio=2. format Null; -- { serverError 241 }
-- 1.91/0.98=1.94 is good
select number k, repeat(toString(number), 101) v1, repeat(toString(number), 102) v2, repeat(toString(number), 103) v3 from numbers(toUInt64(10e6)) order by k limit 400e3 settings remerge_sort_lowered_memory_bytes_ratio=1.9 format Null;
-- remerge_sort_lowered_memory_bytes_ratio 1.9 is good (need at least 1.91/0.98=1.94)
-- MergeSortingTransform: Re-merging intermediate ORDER BY data (20 blocks with 819200 rows) to save memory consumption
-- MergeSortingTransform: Memory usage is lowered from 186.25 MiB to 95.00 MiB
-- MergeSortingTransform: Re-merging intermediate ORDER BY data (20 blocks with 809600 rows) to save memory consumption
-- MergeSortingTransform: Memory usage is lowered from 188.13 MiB to 95.00 MiB
-- MergeSortingTransform: Re-merging intermediate ORDER BY data (20 blocks with 809600 rows) to save memory consumption
-- MergeSortingTransform: Memory usage is lowered from 188.13 MiB to 95.00 MiB
-- MergeSortingTransform: Re-merging intermediate ORDER BY data (20 blocks with 809600 rows) to save memory consumption
-- MergeSortingTransform: Memory usage is lowered from 188.13 MiB to 95.00 MiB
-- MergeSortingTransform: Re-merging intermediate ORDER BY data (20 blocks with 809600 rows) to save memory consumption
-- MergeSortingTransform: Memory usage is lowered from 188.13 MiB to 95.00 MiB
-- MergeSortingTransform: Re-merging intermediate ORDER BY data (20 blocks with 809600 rows) to save memory consumption
-- MergeSortingTransform: Memory usage is lowered from 188.13 MiB to 95.00 MiB
select number k, repeat(toString(number), 11) v1, repeat(toString(number), 12) v2 from numbers(toUInt64(3e6)) order by k limit 400e3 settings remerge_sort_lowered_memory_bytes_ratio=1.9 format Null;

View File

@ -0,0 +1,56 @@
Array min 1
Array max 6
Array sum 21
Array avg 3.5
Table array int min
1
0
1
Table array int max
6
0
3
Table array int sum
21
0
6
Table array int avg
3.5
0
2
Table array decimal min
1.00000000
0.00000000
1.00000000
Table array decimal max
6.00000000
0.00000000
3.00000000
Table array decimal sum
21.00000000
0.00000000
6.00000000
Table array decimal avg
3.5
0
2
Types of aggregation result array min
Int8 Int16 Int32 Int64
UInt8 UInt16 UInt32 UInt64
Float32 Float64
Decimal(9, 8) Decimal(18, 8) Decimal(38, 8)
Types of aggregation result array max
Int8 Int16 Int32 Int64
UInt8 UInt16 UInt32 UInt64
Float32 Float64
Decimal(9, 8) Decimal(18, 8) Decimal(38, 8)
Types of aggregation result array sum
Int64 Int64 Int64 Int64
UInt64 UInt64 UInt64 UInt64
Float64 Float64
Decimal(38, 8) Decimal(38, 8) Decimal(38, 8)
Types of aggregation result array avg
Float64 Float64 Float64 Float64
Float64 Float64 Float64 Float64
Float64 Float64
Float64 Float64 Float64

View File

@ -0,0 +1,56 @@
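-- arrayMin / arrayMax / arraySum / arrayAvg: literals, Int and Decimal table columns, and result types.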
SELECT 'Array min ', (arrayMin(array(1,2,3,4,5,6)));
SELECT 'Array max ', (arrayMax(array(1,2,3,4,5,6)));
SELECT 'Array sum ', (arraySum(array(1,2,3,4,5,6)));
SELECT 'Array avg ', (arrayAvg(array(1,2,3,4,5,6)));
DROP TABLE IF EXISTS test_aggregation;
CREATE TABLE test_aggregation (x Array(Int)) ENGINE=TinyLog;
INSERT INTO test_aggregation VALUES ([1,2,3,4,5,6]), ([]), ([1,2,3]);
SELECT 'Table array int min';
SELECT arrayMin(x) FROM test_aggregation;
SELECT 'Table array int max';
SELECT arrayMax(x) FROM test_aggregation;
SELECT 'Table array int sum';
SELECT arraySum(x) FROM test_aggregation;
SELECT 'Table array int avg';
SELECT arrayAvg(x) FROM test_aggregation;
DROP TABLE test_aggregation;
CREATE TABLE test_aggregation (x Array(Decimal64(8))) ENGINE=TinyLog;
INSERT INTO test_aggregation VALUES ([1,2,3,4,5,6]), ([]), ([1,2,3]);
SELECT 'Table array decimal min';
SELECT arrayMin(x) FROM test_aggregation;
SELECT 'Table array decimal max';
SELECT arrayMax(x) FROM test_aggregation;
SELECT 'Table array decimal sum';
SELECT arraySum(x) FROM test_aggregation;
SELECT 'Table array decimal avg';
SELECT arrayAvg(x) FROM test_aggregation;
DROP TABLE test_aggregation;
SELECT 'Types of aggregation result array min';
SELECT toTypeName(arrayMin([toInt8(0)])), toTypeName(arrayMin([toInt16(0)])), toTypeName(arrayMin([toInt32(0)])), toTypeName(arrayMin([toInt64(0)]));
SELECT toTypeName(arrayMin([toUInt8(0)])), toTypeName(arrayMin([toUInt16(0)])), toTypeName(arrayMin([toUInt32(0)])), toTypeName(arrayMin([toUInt64(0)]));
SELECT toTypeName(arrayMin([toFloat32(0)])), toTypeName(arrayMin([toFloat64(0)]));
SELECT toTypeName(arrayMin([toDecimal32(0, 8)])), toTypeName(arrayMin([toDecimal64(0, 8)])), toTypeName(arrayMin([toDecimal128(0, 8)]));
SELECT 'Types of aggregation result array max';
SELECT toTypeName(arrayMax([toInt8(0)])), toTypeName(arrayMax([toInt16(0)])), toTypeName(arrayMax([toInt32(0)])), toTypeName(arrayMax([toInt64(0)]));
SELECT toTypeName(arrayMax([toUInt8(0)])), toTypeName(arrayMax([toUInt16(0)])), toTypeName(arrayMax([toUInt32(0)])), toTypeName(arrayMax([toUInt64(0)]));
SELECT toTypeName(arrayMax([toFloat32(0)])), toTypeName(arrayMax([toFloat64(0)]));
SELECT toTypeName(arrayMax([toDecimal32(0, 8)])), toTypeName(arrayMax([toDecimal64(0, 8)])), toTypeName(arrayMax([toDecimal128(0, 8)]));
SELECT 'Types of aggregation result array sum';
SELECT toTypeName(arraySum([toInt8(0)])), toTypeName(arraySum([toInt16(0)])), toTypeName(arraySum([toInt32(0)])), toTypeName(arraySum([toInt64(0)]));
SELECT toTypeName(arraySum([toUInt8(0)])), toTypeName(arraySum([toUInt16(0)])), toTypeName(arraySum([toUInt32(0)])), toTypeName(arraySum([toUInt64(0)]));
SELECT toTypeName(arraySum([toFloat32(0)])), toTypeName(arraySum([toFloat64(0)]));
SELECT toTypeName(arraySum([toDecimal32(0, 8)])), toTypeName(arraySum([toDecimal64(0, 8)])), toTypeName(arraySum([toDecimal128(0, 8)]));
SELECT 'Types of aggregation result array avg';
SELECT toTypeName(arrayAvg([toInt8(0)])), toTypeName(arrayAvg([toInt16(0)])), toTypeName(arrayAvg([toInt32(0)])), toTypeName(arrayAvg([toInt64(0)]));
SELECT toTypeName(arrayAvg([toUInt8(0)])), toTypeName(arrayAvg([toUInt16(0)])), toTypeName(arrayAvg([toUInt32(0)])), toTypeName(arrayAvg([toUInt64(0)]));
SELECT toTypeName(arrayAvg([toFloat32(0)])), toTypeName(arrayAvg([toFloat64(0)]));
SELECT toTypeName(arrayAvg([toDecimal32(0, 8)])), toTypeName(arrayAvg([toDecimal64(0, 8)])), toTypeName(arrayAvg([toDecimal128(0, 8)]));

View File

@ -1,4 +1,5 @@
#include <DataTypes/DataTypeDate.h>
#include <DataTypes/DataTypesNumber.h>
#include <IO/HashingWriteBuffer.h>
#include <IO/ReadBufferFromFile.h>
#include <IO/WriteBufferFromFile.h>

View File

@ -1,507 +0,0 @@
#!/usr/bin/env python3
# Note: should work with python 2 and 3
import requests
import json
import subprocess
import re
import os
import time
import logging
import codecs
import argparse
GITHUB_API_URL = 'https://api.github.com/'
SCRIPT_DIR = os.path.dirname(os.path.realpath(__file__))
def http_get_json(url, token, max_retries, retry_timeout):
for t in range(max_retries):
if token:
resp = requests.get(url, headers={"Authorization": "token {}".format(token)})
else:
resp = requests.get(url)
if resp.status_code != 200:
msg = "Request {} failed with code {}.\n{}\n".format(url, resp.status_code, resp.text)
if resp.status_code == 403 or resp.status_code >= 500:
try:
if (resp.json()['message'].startswith('API rate limit exceeded') or resp.status_code >= 500) and t + 1 < max_retries:
logging.warning(msg)
time.sleep(retry_timeout)
continue
except Exception:
pass
raise Exception(msg)
return resp.json()
def github_api_get_json(query, token, max_retries, retry_timeout):
return http_get_json(GITHUB_API_URL + query, token, max_retries, retry_timeout)
def check_sha(sha):
if not (re.match('^[a-fA-F0-9]+$', sha) and len(sha) >= 7):
raise Exception("String " + sha + " doesn't look like git sha.")
def get_merge_base(first, second, project_root):
try:
command = "git merge-base {} {}".format(first, second)
text = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, cwd=project_root).stdout.read()
text = text.decode('utf-8', 'ignore')
sha = tuple(filter(len, text.split()))[0]
check_sha(sha)
return sha
except Exception:
logging.error('Cannot find merge base for %s and %s', first, second)
raise
def rev_parse(rev, project_root):
try:
command = "git rev-parse {}".format(rev)
text = subprocess.check_output(command, shell=True, cwd=project_root)
text = text.decode('utf-8', 'ignore')
sha = tuple(filter(len, text.split()))[0]
check_sha(sha)
return sha
except Exception:
logging.error('Cannot find revision %s', rev)
raise
# Get list of commits from branch to base_sha. Update commits_info.
def get_commits_from_branch(repo, branch, base_sha, commits_info, max_pages, token, max_retries, retry_timeout):
def get_commits_from_page(page):
query = 'repos/{}/commits?sha={}&page={}'.format(repo, branch, page)
resp = github_api_get_json(query, token, max_retries, retry_timeout)
for commit in resp:
sha = commit['sha']
if sha not in commits_info:
commits_info[sha] = commit
return [commit['sha'] for commit in resp]
commits = []
found_base_commit = False
for page in range(max_pages):
page_commits = get_commits_from_page(page)
for commit in page_commits:
if commit == base_sha:
found_base_commit = True
break
commits.append(commit)
if found_base_commit:
break
if not found_base_commit:
raise Exception("Can't found base commit sha {} in branch {}. Checked {} commits on {} pages.\nCommits: {}"
.format(base_sha, branch, len(commits), max_pages, ' '.join(commits)))
return commits
# Get list of commits a specified commit is cherry-picked from. Can return an empty list.
def parse_original_commits_from_cherry_pick_message(commit_message):
prefix = '(cherry picked from commits'
pos = commit_message.find(prefix)
if pos == -1:
prefix = '(cherry picked from commit'
pos = commit_message.find(prefix)
if pos == -1:
return []
pos += len(prefix)
endpos = commit_message.find(')', pos)
if endpos == -1:
return []
lst = [x.strip() for x in commit_message[pos:endpos].split(',')]
lst = [x for x in lst if x]
return lst
# Use the GitHub search API to check whether the commit belongs to any pull request. Update pull_requests info.
def find_pull_request_for_commit(commit_info, pull_requests, token, max_retries, retry_timeout):
commits = [commit_info['sha']] + parse_original_commits_from_cherry_pick_message(commit_info['commit']['message'])
# Special case for cherry-picked merge commits without the -x option. Parse the PR number from the commit message and search for it.
if commit_info['commit']['message'].startswith('Merge pull request'):
tokens = commit_info['commit']['message'][len('Merge pull request'):].split()
if len(tokens) > 0 and tokens[0].startswith('#'):
pr_number = tokens[0][1:]
if len(pr_number) > 0 and pr_number.isdigit():
commits = [pr_number]
query = 'search/issues?q={}+type:pr+repo:{}&sort=created&order=asc'.format(' '.join(commits), repo)
resp = github_api_get_json(query, token, max_retries, retry_timeout)
found = False
for item in resp['items']:
if 'pull_request' in item:
found = True
number = item['number']
if number not in pull_requests:
pull_requests[number] = {
'title': item['title'],
'description': item['body'],
'user': item['user']['login'],
}
return found
# Find pull requests from list of commits. If no pull request found, add commit to not_found_commits list.
def find_pull_requests(commits, commits_info, token, max_retries, retry_timeout):
not_found_commits = []
pull_requests = {}
for i, commit in enumerate(commits):
if (i + 1) % 10 == 0:
logging.info('Processed %d commits', i + 1)
if not find_pull_request_for_commit(commits_info[commit], pull_requests, token, max_retries, retry_timeout):
not_found_commits.append(commit)
return not_found_commits, pull_requests
# Find pull requests by list of numbers
def find_pull_requests_by_num(pull_requests_nums, token, max_retries, retry_timeout):
pull_requests = {}
for pr in pull_requests_nums:
item = github_api_get_json('repos/{}/pulls/{}'.format(repo, pr), token, max_retries, retry_timeout)
number = item['number']
if number not in pull_requests:
pull_requests[number] = {
'title': item['title'],
'description': item['body'],
'user': item['user']['login'],
}
return pull_requests
# Get users for all unknown commits and pull requests.
def get_users_info(pull_requests, commits_info, token, max_retries, retry_timeout):
users = {}
def update_user(user):
if user not in users:
query = 'users/{}'.format(user)
resp = github_api_get_json(query, token, max_retries, retry_timeout)
users[user] = resp
for pull_request in list(pull_requests.values()):
update_user(pull_request['user'])
for commit_info in list(commits_info.values()):
if 'committer' in commit_info and commit_info['committer'] is not None and 'login' in commit_info['committer']:
update_user(commit_info['committer']['login'])
else:
logging.warning('Author not found for commit %s.', commit_info['html_url'])
return users
# List of unknown commits -> text description.
def process_unknown_commits(commits, commits_info, users):
pattern = 'Commit: [{}]({})\nAuthor: {}\nMessage: {}'
texts = []
for commit in commits:
info = commits_info[commit]
html_url = info['html_url']
msg = info['commit']['message']
name = None
login = None
author = None
if not info['author']:
author = 'Unknown'
else:
# GitHub login
if 'login' in info['author']:
login = info['author']['login']
# First, try to get the name from the GitHub user
try:
name = users[login]['name']
except KeyError:
pass
else:
login = 'Unknown'
# Then, try to get the name from the commit
if not name:
try:
name = info['commit']['author']['name']
except KeyError:
pass
author = '[{}]({})'.format(name or login, info['author']['html_url'])
texts.append(pattern.format(commit, html_url, author, msg))
text = 'Commits which are not from any pull request:\n\n'
return text + '\n\n'.join(texts)
# This function mirrors the PR description checks in ClickhousePullRequestTrigger.
# Returns False if the PR should not be mentioned in the changelog.
def parse_one_pull_request(item):
description = item['description']
lines = [line for line in [x.strip() for x in description.split('\n')] if line] if description else []
lines = [re.sub(r'\s+', ' ', l) for l in lines]
cat_pos = None
short_descr_pos = None
long_descr_pos = None
if lines:
for i in range(len(lines) - 1):
if re.match(r'(?i).*category.*:$', lines[i]):
cat_pos = i
if re.match(r'(?i)^\**\s*(Short description|Change\s*log entry)', lines[i]):
short_descr_pos = i
if re.match(r'(?i)^\**\s*Detailed description', lines[i]):
long_descr_pos = i
if cat_pos is None:
return False
cat = lines[cat_pos + 1]
cat = re.sub(r'^[-*\s]*', '', cat)
# Filter out the PR categories that are not for changelog.
if re.match(r'(?i)doc|((non|in|not|un)[-\s]*significant)|(not[ ]*for[ ]*changelog)', cat):
return False
short_descr = ''
if short_descr_pos:
short_descr_end = long_descr_pos or len(lines)
short_descr = lines[short_descr_pos + 1]
if short_descr_pos + 2 != short_descr_end:
short_descr += ' ...'
# If we have nothing meaningful
if not re.match(r'\w', short_descr):
short_descr = item['title']
# TODO: Add detailed description somewhere
item['entry'] = short_descr
item['category'] = cat
return True
# List of pull requests -> text description.
def process_pull_requests(pull_requests, users, repo):
groups = {}
for id, item in list(pull_requests.items()):
if not parse_one_pull_request(item):
continue
pattern = "{} [#{}]({}) ({})"
link = 'https://github.com/{}/pull/{}'.format(repo, id)
author = 'author not found'
if item['user'] in users:
# TODO get user from any commit if no user name on github
user = users[item['user']]
author = '[{}]({})'.format(user['name'] or user['login'], user['html_url'])
cat = item['category']
if cat not in groups:
groups[cat] = []
groups[cat].append(pattern.format(item['entry'], id, link, author))
categories_preferred_order = ['Backward Incompatible Change', 'New Feature', 'Bug Fix', 'Improvement', 'Performance Improvement', 'Build/Testing/Packaging Improvement', 'Other']
def categories_sort_key(name):
if name in categories_preferred_order:
return str(categories_preferred_order.index(name)).zfill(3)
else:
return name.lower()
texts = []
for group, text in sorted(list(groups.items()), key = lambda kv: categories_sort_key(kv[0])):
items = ['* {}'.format(pr) for pr in text]
texts.append('### {}\n{}'.format(group if group else '[No category]', '\n'.join(items)))
return '\n\n'.join(texts)
# Load inner state. For debug purposes.
def load_state(state_file, base_sha, new_tag, prev_tag):
state = {}
if state_file:
try:
if os.path.exists(state_file):
logging.info('Reading state from %s', state_file)
with codecs.open(state_file, encoding='utf-8') as f:
state = json.loads(f.read())
else:
logging.info('State file does not exist. Will create new one.')
except Exception as e:
logging.warning('Cannot load state from %s. Reason: %s', state_file, str(e))
if state:
if 'base_sha' not in state or 'new_tag' not in state or 'prev_tag' not in state:
logging.warning('Invalid state. Will create new one.')
elif state['base_sha'] == base_sha and state['new_tag'] == new_tag and state['prev_tag'] == prev_tag:
logging.info('State loaded.')
else:
logging.info('Loaded state has different tags or merge base sha. Will create new state.')
state = {}
return state
# Save inner state. For debug purposes.
def save_state(state_file, state):
with codecs.open(state_file, 'w', encoding='utf-8') as f:
f.write(json.dumps(state, indent=4, separators=(',', ': ')))
def make_changelog(new_tag, prev_tag, pull_requests_nums, repo, repo_folder, state_file, token, max_retries, retry_timeout):
base_sha = None
if new_tag and prev_tag:
base_sha = get_merge_base(new_tag, prev_tag, repo_folder)
logging.info('Base sha: %s', base_sha)
# Step 1. Get commits from merge_base to new_tag HEAD.
# Result is a list of commits + map with commits info (author, message)
commits_info = {}
commits = []
is_commits_loaded = False
# Step 2. For each commit check if it is from any pull request (using github search api).
# Result is a list of unknown commits + map with pull request info (author, description).
unknown_commits = []
pull_requests = {}
is_pull_requests_loaded = False
# Step 3. Map users with their info (Name)
users = {}
is_users_loaded = False
# Step 4. Make changelog text from data above.
state = load_state(state_file, base_sha, new_tag, prev_tag)
if state:
if 'commits' in state and 'commits_info' in state:
logging.info('Loading commits from %s', state_file)
commits_info = state['commits_info']
commits = state['commits']
is_commits_loaded = True
if 'pull_requests' in state and 'unknown_commits' in state:
logging.info('Loading pull requests from %s', state_file)
unknown_commits = state['unknown_commits']
pull_requests = state['pull_requests']
is_pull_requests_loaded = True
if 'users' in state:
logging.info('Loading users from %s', state_file)
users = state['users']
is_users_loaded = True
if base_sha:
state['base_sha'] = base_sha
state['new_tag'] = new_tag
state['prev_tag'] = prev_tag
if not is_commits_loaded:
logging.info('Getting commits using github api.')
commits = get_commits_from_branch(repo, new_tag, base_sha, commits_info, 100, token, max_retries, retry_timeout)
state['commits'] = commits
state['commits_info'] = commits_info
logging.info('Found %d commits from %s to %s.\n', len(commits), new_tag, base_sha)
save_state(state_file, state)
if not is_pull_requests_loaded:
logging.info('Searching for pull requests using github api.')
unknown_commits, pull_requests = find_pull_requests(commits, commits_info, token, max_retries, retry_timeout)
state['unknown_commits'] = unknown_commits
state['pull_requests'] = pull_requests
else:
pull_requests = find_pull_requests_by_num(pull_requests_nums.split(','), token, max_retries, retry_timeout)
logging.info('Found %d pull requests and %d unknown commits.\n', len(pull_requests), len(unknown_commits))
save_state(state_file, state)
if not is_users_loaded:
logging.info('Getting users info using github api.')
users = get_users_info(pull_requests, commits_info, token, max_retries, retry_timeout)
state['users'] = users
logging.info('Found %d users.', len(users))
save_state(state_file, state)
changelog = '{}\n\n{}'.format(process_pull_requests(pull_requests, users, repo), process_unknown_commits(unknown_commits, commits_info, users))
# Substitute links to issues
changelog = re.sub(r'(?<!\[)#(\d{4,})(?!\])', r'[#\1](https://github.com/{}/issues/\1)'.format(repo), changelog)
# Remove double whitespaces and trailing whitespaces
changelog = re.sub(r' {2,}| +$', '', changelog)
print(changelog.encode('utf-8'))
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Make changelog.')
parser.add_argument('prev_release_tag', nargs='?', help='Git tag from previous release.')
parser.add_argument('new_release_tag', nargs='?', help='Git tag for new release.')
parser.add_argument('--pull-requests', help='Process the specified list of pull-request numbers (comma separated) instead of commits between tags.')
parser.add_argument('--token', help='Github token. Use it to increase github api query limit.')
parser.add_argument('--directory', help='ClickHouse repo directory. Script dir by default.')
parser.add_argument('--state', help='File to dump inner states result.', default='changelog_state.json')
parser.add_argument('--repo', help='ClickHouse repo on GitHub.', default='ClickHouse/ClickHouse')
parser.add_argument('--max_retry', default=100, type=int,
help='Max number of retries per API query in case of an API rate limit exceeded error.')
parser.add_argument('--retry_timeout', help='Timeout after retry in seconds.', type=int, default=5)
args = parser.parse_args()
prev_release_tag = args.prev_release_tag
new_release_tag = args.new_release_tag
token = args.token or ''
repo_folder = args.directory or SCRIPT_DIR
state_file = args.state
repo = args.repo
max_retry = args.max_retry
retry_timeout = args.retry_timeout
pull_requests = args.pull_requests
if (not prev_release_tag or not new_release_tag) and not pull_requests:
raise Exception('Either release tags or --pull-requests must be specified')
if prev_release_tag and new_release_tag and pull_requests:
raise Exception('Either release tags or --pull-requests must be specified')
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
repo_folder = os.path.expanduser(repo_folder)
new_release_tag = rev_parse(new_release_tag, repo_folder)
prev_release_tag = rev_parse(prev_release_tag, repo_folder)
make_changelog(new_release_tag, prev_release_tag, pull_requests, repo, repo_folder, state_file, token, max_retry, retry_timeout)