Merge branch 'master' into pufit/fix-dump-ast-with-sql-security

pufit, 2024-04-30 10:05:22 -04:00, committed by GitHub
commit 677a76c8d8
142 changed files with 2600 additions and 811 deletions

@@ -1,4 +1,5 @@
### Table of Contents
**[ClickHouse release v24.4, 2024-04-30](#244)**<br/>
**[ClickHouse release v24.3 LTS, 2024-03-27](#243)**<br/>
**[ClickHouse release v24.2, 2024-02-29](#242)**<br/>
**[ClickHouse release v24.1, 2024-01-30](#241)**<br/>
@@ -6,6 +7,168 @@
# 2024 Changelog
### <a id="244"></a> ClickHouse release 24.4 LTS, 2024-04-30
#### Upgrade Notes
* `clickhouse-odbc-bridge` and `clickhouse-library-bridge` are now separate packages. This closes [#61677](https://github.com/ClickHouse/ClickHouse/issues/61677). [#62114](https://github.com/ClickHouse/ClickHouse/pull/62114) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Don't allow setting `max_parallel_replicas` (for the experimental parallel reading from replicas) to `0`, as it doesn't make sense. Closes [#60140](https://github.com/ClickHouse/ClickHouse/issues/60140). [#61201](https://github.com/ClickHouse/ClickHouse/pull/61201) ([Kruglov Pavel](https://github.com/Avogar)).
* Remove support for `INSERT WATCH` query (part of the deprecated `LIVE VIEW` feature). [#62382](https://github.com/ClickHouse/ClickHouse/pull/62382) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Removed the `optimize_monotonous_functions_in_order_by` setting. [#63004](https://github.com/ClickHouse/ClickHouse/pull/63004) ([Raúl Marín](https://github.com/Algunenano)).
* Remove experimental tag from the `Replicated` database engine. Now it is in Beta stage. [#62937](https://github.com/ClickHouse/ClickHouse/pull/62937) ([Justin de Guzman](https://github.com/justindeguzman)).
#### New Feature
* Support recursive CTEs (a usage sketch follows this list). [#62074](https://github.com/ClickHouse/ClickHouse/pull/62074) ([Maksim Kita](https://github.com/kitaisreal)).
* Support the `QUALIFY` clause (a usage sketch follows this list). Closes [#47819](https://github.com/ClickHouse/ClickHouse/issues/47819). [#62619](https://github.com/ClickHouse/ClickHouse/pull/62619) ([Maksim Kita](https://github.com/kitaisreal)).
* Table engines are now grantable; this does not affect the behavior of existing users. [#60117](https://github.com/ClickHouse/ClickHouse/pull/60117) ([jsc0218](https://github.com/jsc0218)).
* Added a rewritable S3 disk, which supports INSERT operations and does not require locally stored metadata. [#61116](https://github.com/ClickHouse/ClickHouse/pull/61116) ([Julia Kartseva](https://github.com/jkartseva)). The main use case is system tables.
* The syntax highlighting while typing in the client will work on the syntax level (previously, it worked on the lexer level). [#62123](https://github.com/ClickHouse/ClickHouse/pull/62123) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Support dropping multiple tables at the same time, e.g. `DROP TABLE a, b, c`. [#58705](https://github.com/ClickHouse/ClickHouse/pull/58705) ([zhongyuankai](https://github.com/zhongyuankai)).
* Modifying `Memory` table settings through `ALTER TABLE ... MODIFY SETTING` is now supported. Example: `ALTER TABLE memory MODIFY SETTING min_rows_to_keep = 100, max_rows_to_keep = 1000;`. [#62039](https://github.com/ClickHouse/ClickHouse/pull/62039) ([zhongyuankai](https://github.com/zhongyuankai)).
* Added `role` query parameter to the HTTP interface. It works similarly to `SET ROLE x`, applying the role before the statement is executed. This allows for overcoming the limitation of the HTTP interface, as multiple statements are not allowed, and it is not possible to send both `SET ROLE x` and the statement itself at the same time. It is possible to set multiple roles that way, e.g., `?role=x&role=y`, which will be an equivalent of `SET ROLE x, y`. [#62669](https://github.com/ClickHouse/ClickHouse/pull/62669) ([Serge Klochkov](https://github.com/slvrtrn)).
* Add `SYSTEM UNLOAD PRIMARY KEY` to free up memory usage for a table's primary key. [#62738](https://github.com/ClickHouse/ClickHouse/pull/62738) ([Pablo Marcos](https://github.com/pamarcos)).
* Added `value1`, `value2`, ..., `value10` columns to `system.text_log`. These columns contain values that were used to format the message. [#59619](https://github.com/ClickHouse/ClickHouse/pull/59619) ([Alexey Katsman](https://github.com/alexkats)).
* Added a persistent virtual column `_block_offset`, which stores the original row number within the block, assigned at insert time. Persistence of the `_block_offset` column can be enabled by the MergeTree setting `enable_block_offset_column`. Added a virtual column `_part_data_version`, which contains either the minimum block number or the mutation version of the part. The persistent virtual column `_block_number` is no longer considered experimental. [#60676](https://github.com/ClickHouse/ClickHouse/pull/60676) ([Anton Popov](https://github.com/CurtizJ)).
* Add a setting `input_format_json_throw_on_bad_escape_sequence`; disabling it allows saving bad escape sequences in JSON input formats. [#61889](https://github.com/ClickHouse/ClickHouse/pull/61889) ([Kruglov Pavel](https://github.com/Avogar)).
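Illustrative sketches for the recursive CTE and `QUALIFY` features above (not from the release notes; values are arbitrary). A recursive CTE:

```sql
-- Sum the numbers 1..10 with a recursive CTE.
WITH RECURSIVE series AS
(
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM series WHERE n < 10
)
SELECT sum(n) FROM series;
```

And the `QUALIFY` clause, which filters on window-function results without a subquery:

```sql
-- Keep the first row of each group of numbers modulo 3.
SELECT
    number,
    row_number() OVER (PARTITION BY number % 3 ORDER BY number) AS rn
FROM numbers(10)
QUALIFY rn = 1;
```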
#### Performance Improvement
* Improve JOIN filter push-down using equivalent sets. [#61216](https://github.com/ClickHouse/ClickHouse/pull/61216) ([Maksim Kita](https://github.com/kitaisreal)).
* Added an optimization that converts OUTER JOIN to INNER JOIN when the filter after the JOIN always filters out default values. The optimization can be controlled with the setting `query_plan_convert_outer_join_to_inner_join`, enabled by default. [#62907](https://github.com/ClickHouse/ClickHouse/pull/62907) ([Maksim Kita](https://github.com/kitaisreal)).
* Enabled the fast Parquet encoder by default (`output_format_parquet_use_custom_encoder`). [#62088](https://github.com/ClickHouse/ClickHouse/pull/62088) ([Michael Kolupaev](https://github.com/al13n321)).
* Improvement for AWS S3: the client now sends a `Keep-Alive: timeout=X` header to the server, and if it receives a response carrying that header, it uses the timeout value provided by the server. The client also avoids reusing connections that are close to expiring, to prevent a connection-close race. [#62249](https://github.com/ClickHouse/ClickHouse/pull/62249) ([Sema Checherinda](https://github.com/CheSema)).
* Reduce overhead of the mutations for SELECTs (v2). [#60856](https://github.com/ClickHouse/ClickHouse/pull/60856) ([Azat Khuzhin](https://github.com/azat)).
* More frequently invoked functions in PODArray are now force-inlined. [#61144](https://github.com/ClickHouse/ClickHouse/pull/61144) ([李扬](https://github.com/taiyang-li)).
* Speed up parsing of JSON by skipping the rest of the object when all required columns are read. [#62210](https://github.com/ClickHouse/ClickHouse/pull/62210) ([lgbo](https://github.com/lgbo-ustc)).
* Improve trivial `INSERT SELECT` from files in the `file`/`s3`/`hdfs`/`url`/... table functions. Add a separate `max_parsing_threads` setting to control the number of threads used in parallel parsing (a usage sketch follows this list). [#62404](https://github.com/ClickHouse/ClickHouse/pull/62404) ([Kruglov Pavel](https://github.com/Avogar)).
* Functions `to_utc_timestamp` and `from_utc_timestamp` are now about 2x faster. [#62583](https://github.com/ClickHouse/ClickHouse/pull/62583) ([KevinyhZou](https://github.com/KevinyhZou)).
* Functions `parseDateTimeOrNull`, `parseDateTimeOrZero`, `parseDateTimeInJodaSyntaxOrNull` and `parseDateTimeInJodaSyntaxOrZero` now run significantly faster (10x - 1000x) when the input contains mostly non-parseable values. [#62634](https://github.com/ClickHouse/ClickHouse/pull/62634) ([LiuNeng](https://github.com/liuneng1994)).
* SELECTs against `system.query_cache` are now noticeably faster when the query cache contains lots of entries (e.g. more than 100,000). [#62671](https://github.com/ClickHouse/ClickHouse/pull/62671) ([Robert Schulze](https://github.com/rschu1ze)).
* Less contention in filesystem cache (part 3): execute removal from filesystem without lock on space reservation attempt. [#61163](https://github.com/ClickHouse/ClickHouse/pull/61163) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Speed up dynamic resize of filesystem cache. [#61723](https://github.com/ClickHouse/ClickHouse/pull/61723) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Dictionary source with `INVALIDATE_QUERY` is not reloaded twice on startup. [#62050](https://github.com/ClickHouse/ClickHouse/pull/62050) ([vdimir](https://github.com/vdimir)).
* Fix an issue where, when a redundant `= 1` or `= 0` is added after a boolean expression involving the primary key, the primary index is not used. For example, both `SELECT * FROM <table> WHERE <primary-key> IN (<value>) = 1` and `SELECT * FROM <table> WHERE <primary-key> NOT IN (<value>) = 0` previously performed a full table scan even though the primary index could be used (see the sketch after this list). [#62142](https://github.com/ClickHouse/ClickHouse/pull/62142) ([josh-hildred](https://github.com/josh-hildred)).
* Return a stream of chunks from `system.remote_data_paths` instead of accumulating the whole result in one big chunk. This consumes less memory, shows intermediate progress, and allows cancelling the query. [#62613](https://github.com/ClickHouse/ClickHouse/pull/62613) ([Alexander Gololobov](https://github.com/davenger)).
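Two illustrative sketches for items above (the file and table names are hypothetical). Limiting parallel parsing with the new `max_parsing_threads` setting:

```sql
-- Read a hypothetical local file, capping the number of parallel-parsing threads.
SELECT count()
FROM file('data.jsonl', 'JSONEachRow')
SETTINGS max_parsing_threads = 4;
```

And the primary-index fix for redundant boolean comparisons:

```sql
-- Assuming `key` is the primary key of table `t`, both queries can now
-- use the primary index instead of performing a full table scan.
SELECT * FROM t WHERE key IN (1, 2, 3) = 1;
SELECT * FROM t WHERE key NOT IN (1, 2, 3) = 0;
```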
#### Experimental Feature
* Support parallel write buffer for Azure Blob Storage managed by setting `azure_allow_parallel_part_upload`. [#62534](https://github.com/ClickHouse/ClickHouse/pull/62534) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Userspace page cache works with static web storage (`disk(type = web)`) now. Use client setting `use_page_cache_for_disks_without_file_cache=1` to enable. [#61911](https://github.com/ClickHouse/ClickHouse/pull/61911) ([Michael Kolupaev](https://github.com/al13n321)).
* Don't treat Bool and number variants as suspicious in the `Variant` type. [#61999](https://github.com/ClickHouse/ClickHouse/pull/61999) ([Kruglov Pavel](https://github.com/Avogar)).
* Implement better conversion from String to `Variant` using parsing. [#62005](https://github.com/ClickHouse/ClickHouse/pull/62005) ([Kruglov Pavel](https://github.com/Avogar)).
* Support `Variant` in JSONExtract functions. [#62014](https://github.com/ClickHouse/ClickHouse/pull/62014) ([Kruglov Pavel](https://github.com/Avogar)).
* Mark the type `Variant` as comparable so that it can be used in a primary key (a usage sketch follows this list). [#62693](https://github.com/ClickHouse/ClickHouse/pull/62693) ([Kruglov Pavel](https://github.com/Avogar)).
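A minimal sketch of the `Variant` improvements above, assuming the experimental type is enabled (the table name is hypothetical):

```sql
SET allow_experimental_variant_type = 1;

-- `Variant` is now comparable, so it can be used in the primary key.
CREATE TABLE variant_demo (v Variant(UInt64, String))
ENGINE = MergeTree
ORDER BY v;

INSERT INTO variant_demo VALUES (1), ('hello');

SELECT v, variantType(v) FROM variant_demo;
```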
#### Improvement
* For convenience, `SELECT * FROM numbers()` works in the same way as `SELECT * FROM system.numbers`: without a limit. [#61969](https://github.com/ClickHouse/ClickHouse/pull/61969) ([YenchangChan](https://github.com/YenchangChan)).
* Introduce separate consumer/producer tags for the Kafka configuration. This avoids warnings from librdkafka (a bad C library with a lot of bugs) that consumer properties were specified for producer instances and vice versa (e.g. `Configuration property session.timeout.ms is a consumer property and will be ignored by this producer instance`). Closes: [#58983](https://github.com/ClickHouse/ClickHouse/issues/58983). [#58956](https://github.com/ClickHouse/ClickHouse/pull/58956) ([Aleksandr Musorin](https://github.com/AVMusorin)).
* Functions `date_diff` and `age` now calculate their result at nanosecond instead of microsecond precision. They now also offer `nanosecond` (or `nanoseconds` or `ns`) as a possible value for the `unit` parameter. [#61409](https://github.com/ClickHouse/ClickHouse/pull/61409) ([Austin Kothig](https://github.com/kothiga)).
* Added nano-, micro-, milliseconds unit for `date_trunc`. [#62335](https://github.com/ClickHouse/ClickHouse/pull/62335) ([Misz606](https://github.com/Misz606)).
* Reload certificate chain during certificate reload. [#61671](https://github.com/ClickHouse/ClickHouse/pull/61671) ([Pervakov Grigorii](https://github.com/GrigoryPervakov)).
* Try to prevent an error [#60432](https://github.com/ClickHouse/ClickHouse/issues/60432) by not allowing a table to be attached if there is an active replica for that replica path. [#61876](https://github.com/ClickHouse/ClickHouse/pull/61876) ([Arthur Passos](https://github.com/arthurpassos)).
* Implement support for `input` for `clickhouse-local`. [#61923](https://github.com/ClickHouse/ClickHouse/pull/61923) ([Azat Khuzhin](https://github.com/azat)).
* The `Join` table engine with strictness `ANY` is now consistent after reload. When several rows with the same key are inserted, the first one has higher priority (previously, a row was chosen randomly upon table loading). Closes [#51027](https://github.com/ClickHouse/ClickHouse/issues/51027). [#61972](https://github.com/ClickHouse/ClickHouse/pull/61972) ([vdimir](https://github.com/vdimir)).
* Automatically infer Nullable column types from Apache Arrow schema. [#61984](https://github.com/ClickHouse/ClickHouse/pull/61984) ([Maksim Kita](https://github.com/kitaisreal)).
* Allow to cancel parallel merge of aggregate states during aggregation. Example: `uniqExact`. [#61992](https://github.com/ClickHouse/ClickHouse/pull/61992) ([Maksim Kita](https://github.com/kitaisreal)).
* Use `system.keywords` to fill in the suggestions and also use them in the all places internally. [#62000](https://github.com/ClickHouse/ClickHouse/pull/62000) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* `OPTIMIZE FINAL` for `ReplicatedMergeTree` now will wait for currently active merges to finish and then reattempt to schedule a final merge. This will put it more in line with ordinary `MergeTree` behaviour. [#62067](https://github.com/ClickHouse/ClickHouse/pull/62067) ([Nikita Taranov](https://github.com/nickitat)).
* When reading data from a Hive text file, the number of input fields was inferred from the first line of the file, and that count could disagree with the table definition. For example, for a table defined as `test_tbl(a Int32, b Int32, c Int32)` whose text file's first line contains only 2 fields, the input fields were resized to 2, and the third field of subsequent 3-field lines could not be read and was silently set to the default value 0, which is not right. This is now fixed. [#62086](https://github.com/ClickHouse/ClickHouse/pull/62086) ([KevinyhZou](https://github.com/KevinyhZou)).
* `CREATE TABLE ... AS` now copies the source table's comment. [#62117](https://github.com/ClickHouse/ClickHouse/pull/62117) ([Pablo Marcos](https://github.com/pamarcos)).
* Add query progress for reads from the `system.zookeeper` table. [#62152](https://github.com/ClickHouse/ClickHouse/pull/62152) ([JackyWoo](https://github.com/JackyWoo)).
* Add ability to turn on trace collector (Real and CPU) server-wide. [#62189](https://github.com/ClickHouse/ClickHouse/pull/62189) ([alesapin](https://github.com/alesapin)).
* Added setting `lightweight_deletes_sync` (default value: 2, i.e. wait for all replicas synchronously). It is similar to the `mutations_sync` setting but affects only the behaviour of lightweight deletes (see the sketch after this list). [#62195](https://github.com/ClickHouse/ClickHouse/pull/62195) ([Anton Popov](https://github.com/CurtizJ)).
* Distinguish booleans from integers while parsing values for custom settings: `SET custom_a = true; SET custom_b = 1;` (see the sketch after this list). [#62206](https://github.com/ClickHouse/ClickHouse/pull/62206) ([Vitaly Baranov](https://github.com/vitlibar)).
* Support S3 access through AWS Private Link Interface endpoints. Closes [#60021](https://github.com/ClickHouse/ClickHouse/issues/60021), [#31074](https://github.com/ClickHouse/ClickHouse/issues/31074) and [#53761](https://github.com/ClickHouse/ClickHouse/issues/53761). [#62208](https://github.com/ClickHouse/ClickHouse/pull/62208) ([Arthur Passos](https://github.com/arthurpassos)).
* Do not create a directory for UDF in clickhouse-client if it does not exist. This closes [#59597](https://github.com/ClickHouse/ClickHouse/issues/59597). [#62366](https://github.com/ClickHouse/ClickHouse/pull/62366) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* The query cache now no longer caches results of queries against system tables (`system.*`, `information_schema.*`, `INFORMATION_SCHEMA.*`). [#62376](https://github.com/ClickHouse/ClickHouse/pull/62376) ([Robert Schulze](https://github.com/rschu1ze)).
* The `MOVE PARTITION TO TABLE` query can be delayed or can throw a `TOO_MANY_PARTS` exception to avoid exceeding limits on the part count. The same settings and limits are applied as for the `INSERT` query (see the `max_parts_in_total`, `parts_to_delay_insert`, `parts_to_throw_insert`, `inactive_parts_to_throw_insert`, `inactive_parts_to_delay_insert`, `max_avg_part_size_for_too_many_parts`, `min_delay_to_insert_ms` and `max_delay_to_insert` settings). [#62420](https://github.com/ClickHouse/ClickHouse/pull/62420) ([Sergei Trifonov](https://github.com/serxa)).
* Changed the default installation directory on macOS from `/usr/bin` to `/usr/local/bin`. This is necessary because Apple's System Integrity Protection introduced with macOS El Capitan (2015) prevents writing into `/usr/bin`, even with `sudo`. [#62489](https://github.com/ClickHouse/ClickHouse/pull/62489) ([haohang](https://github.com/yokofly)).
* Make the `transform` function always return the first match. [#62518](https://github.com/ClickHouse/ClickHouse/pull/62518) ([Raúl Marín](https://github.com/Algunenano)).
* Added the missing `hostname` column to system table `blob_storage_log`. [#62456](https://github.com/ClickHouse/ClickHouse/pull/62456) ([Jayme Bird](https://github.com/jaymebrd)).
* For consistency with other system tables, `system.backup_log` now has a column `event_time`. [#62541](https://github.com/ClickHouse/ClickHouse/pull/62541) ([Jayme Bird](https://github.com/jaymebrd)).
* Table `system.backup_log` now has the "default" sorting key which is `event_date, event_time`, the same as for other `_log` table engines. [#62667](https://github.com/ClickHouse/ClickHouse/pull/62667) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Avoid evaluating table DEFAULT expressions while executing `RESTORE`. [#62601](https://github.com/ClickHouse/ClickHouse/pull/62601) ([Vitaly Baranov](https://github.com/vitlibar)).
* S3 storage and backups now use the same default keep-alive settings as the S3 disk. [#62648](https://github.com/ClickHouse/ClickHouse/pull/62648) ([Sema Checherinda](https://github.com/CheSema)).
* Add librdkafka's (that infamous C library, which has a lot of bugs) client identifier to log messages to be able to differentiate log messages from different consumers of a single table. [#62813](https://github.com/ClickHouse/ClickHouse/pull/62813) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Allow special macros `{uuid}` and `{database}` in a Replicated database ZooKeeper path. [#62818](https://github.com/ClickHouse/ClickHouse/pull/62818) ([Vitaly Baranov](https://github.com/vitlibar)).
* Allow quota key with different auth scheme in HTTP requests. [#62842](https://github.com/ClickHouse/ClickHouse/pull/62842) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Reduce the verbosity of command line argument `--help` in `clickhouse client` and `clickhouse local`. The previous output is now generated by `--help --verbose`. [#62973](https://github.com/ClickHouse/ClickHouse/pull/62973) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* `log_bin_use_v1_row_events` was removed in MySQL 8.3, and we adjust the experimental `MaterializedMySQL` engine for it [#60479](https://github.com/ClickHouse/ClickHouse/issues/60479). [#63101](https://github.com/ClickHouse/ClickHouse/pull/63101) ([Eugene Klimov](https://github.com/Slach)). Author: Nikolay Yankin.
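Sketches for two of the improvements above. The custom-settings example assumes the server allows the `custom_` prefix via `custom_settings_prefixes`; the table name in the delete example is hypothetical:

```sql
-- Booleans and integers are now distinguished in custom settings.
SET custom_a = true;
SET custom_b = 1;
SELECT getSetting('custom_a') AS a, getSetting('custom_b') AS b;
```

```sql
-- A lightweight delete that returns without waiting for other replicas.
SET lightweight_deletes_sync = 0;
DELETE FROM t WHERE id = 42;  -- `t` is a hypothetical MergeTree table
```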
#### Build/Testing/Packaging Improvement
* Vendor in Rust dependencies, so the Rust code (that we use for minor features for hype and lulz) can be built in a sane way, similarly to C++. [#62297](https://github.com/ClickHouse/ClickHouse/pull/62297) ([Raúl Marín](https://github.com/Algunenano)).
* ClickHouse now uses OpenSSL 3.2 instead of BoringSSL. [#59870](https://github.com/ClickHouse/ClickHouse/pull/59870) ([Robert Schulze](https://github.com/rschu1ze)). Note that OpenSSL has generally worse engineering culture (such as non-zero number of sanitizer reports, that we had to patch, a complex build system with generated files, etc.) but has better compatibility.
* Ignore DROP queries in stress test with 1/2 probability, use TRUNCATE instead of ignoring DROP in upgrade check for Memory/JOIN tables. [#61476](https://github.com/ClickHouse/ClickHouse/pull/61476) ([Kruglov Pavel](https://github.com/Avogar)).
* Remove the volumes at `/etc/clickhouse-keeper` and `/var/log/clickhouse-keeper` from the Keeper Docker image. [#61683](https://github.com/ClickHouse/ClickHouse/pull/61683) ([Tristan](https://github.com/Tristan971)).
* Add tests for all issues which are no longer relevant with Analyzer being enabled by default. Closes: [#55794](https://github.com/ClickHouse/ClickHouse/issues/55794) Closes: [#49472](https://github.com/ClickHouse/ClickHouse/issues/49472) Closes: [#44414](https://github.com/ClickHouse/ClickHouse/issues/44414) Closes: [#13843](https://github.com/ClickHouse/ClickHouse/issues/13843) Closes: [#55803](https://github.com/ClickHouse/ClickHouse/issues/55803) Closes: [#48308](https://github.com/ClickHouse/ClickHouse/issues/48308) Closes: [#45535](https://github.com/ClickHouse/ClickHouse/issues/45535) Closes: [#44365](https://github.com/ClickHouse/ClickHouse/issues/44365) Closes: [#44153](https://github.com/ClickHouse/ClickHouse/issues/44153) Closes: [#42399](https://github.com/ClickHouse/ClickHouse/issues/42399) Closes: [#27115](https://github.com/ClickHouse/ClickHouse/issues/27115) Closes: [#23162](https://github.com/ClickHouse/ClickHouse/issues/23162) Closes: [#15395](https://github.com/ClickHouse/ClickHouse/issues/15395) Closes: [#15411](https://github.com/ClickHouse/ClickHouse/issues/15411) Closes: [#14978](https://github.com/ClickHouse/ClickHouse/issues/14978) Closes: [#17319](https://github.com/ClickHouse/ClickHouse/issues/17319) Closes: [#11813](https://github.com/ClickHouse/ClickHouse/issues/11813) Closes: [#13210](https://github.com/ClickHouse/ClickHouse/issues/13210) Closes: [#23053](https://github.com/ClickHouse/ClickHouse/issues/23053) Closes: [#37729](https://github.com/ClickHouse/ClickHouse/issues/37729) Closes: [#32639](https://github.com/ClickHouse/ClickHouse/issues/32639) Closes: [#9954](https://github.com/ClickHouse/ClickHouse/issues/9954) Closes: [#41964](https://github.com/ClickHouse/ClickHouse/issues/41964) Closes: [#54317](https://github.com/ClickHouse/ClickHouse/issues/54317) Closes: [#7520](https://github.com/ClickHouse/ClickHouse/issues/7520) Closes: [#36973](https://github.com/ClickHouse/ClickHouse/issues/36973) Closes: [#40955](https://github.com/ClickHouse/ClickHouse/issues/40955) Closes: [#19687](https://github.com/ClickHouse/ClickHouse/issues/19687) Closes: [#23104](https://github.com/ClickHouse/ClickHouse/issues/23104) Closes: [#21584](https://github.com/ClickHouse/ClickHouse/issues/21584) Closes: [#23344](https://github.com/ClickHouse/ClickHouse/issues/23344) Closes: [#22627](https://github.com/ClickHouse/ClickHouse/issues/22627) Closes: [#10276](https://github.com/ClickHouse/ClickHouse/issues/10276) Closes: [#19687](https://github.com/ClickHouse/ClickHouse/issues/19687) Closes: [#4567](https://github.com/ClickHouse/ClickHouse/issues/4567) Closes: [#17710](https://github.com/ClickHouse/ClickHouse/issues/17710) Closes: [#11068](https://github.com/ClickHouse/ClickHouse/issues/11068) Closes: [#24395](https://github.com/ClickHouse/ClickHouse/issues/24395) Closes: [#23416](https://github.com/ClickHouse/ClickHouse/issues/23416) Closes: [#23162](https://github.com/ClickHouse/ClickHouse/issues/23162) Closes: [#25655](https://github.com/ClickHouse/ClickHouse/issues/25655) Closes: [#11757](https://github.com/ClickHouse/ClickHouse/issues/11757) Closes: [#6571](https://github.com/ClickHouse/ClickHouse/issues/6571) Closes: [#4432](https://github.com/ClickHouse/ClickHouse/issues/4432) Closes: [#8259](https://github.com/ClickHouse/ClickHouse/issues/8259) Closes: [#9233](https://github.com/ClickHouse/ClickHouse/issues/9233) Closes: [#14699](https://github.com/ClickHouse/ClickHouse/issues/14699) Closes: [#27068](https://github.com/ClickHouse/ClickHouse/issues/27068) Closes: 
[#28687](https://github.com/ClickHouse/ClickHouse/issues/28687) Closes: [#28777](https://github.com/ClickHouse/ClickHouse/issues/28777) Closes: [#29734](https://github.com/ClickHouse/ClickHouse/issues/29734) Closes: [#61238](https://github.com/ClickHouse/ClickHouse/issues/61238) Closes: [#33825](https://github.com/ClickHouse/ClickHouse/issues/33825) Closes: [#35608](https://github.com/ClickHouse/ClickHouse/issues/35608) Closes: [#29838](https://github.com/ClickHouse/ClickHouse/issues/29838) Closes: [#35652](https://github.com/ClickHouse/ClickHouse/issues/35652) Closes: [#36189](https://github.com/ClickHouse/ClickHouse/issues/36189) Closes: [#39634](https://github.com/ClickHouse/ClickHouse/issues/39634) Closes: [#47432](https://github.com/ClickHouse/ClickHouse/issues/47432) Closes: [#54910](https://github.com/ClickHouse/ClickHouse/issues/54910) Closes: [#57321](https://github.com/ClickHouse/ClickHouse/issues/57321) Closes: [#59154](https://github.com/ClickHouse/ClickHouse/issues/59154) Closes: [#61014](https://github.com/ClickHouse/ClickHouse/issues/61014) Closes: [#61950](https://github.com/ClickHouse/ClickHouse/issues/61950) Closes: [#55647](https://github.com/ClickHouse/ClickHouse/issues/55647) Closes: [#61947](https://github.com/ClickHouse/ClickHouse/issues/61947). [#62185](https://github.com/ClickHouse/ClickHouse/pull/62185) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Add more tests from issues which are no longer relevant or fixed by analyzer. Closes: [#58985](https://github.com/ClickHouse/ClickHouse/issues/58985) Closes: [#59549](https://github.com/ClickHouse/ClickHouse/issues/59549) Closes: [#36963](https://github.com/ClickHouse/ClickHouse/issues/36963) Closes: [#39453](https://github.com/ClickHouse/ClickHouse/issues/39453) Closes: [#56521](https://github.com/ClickHouse/ClickHouse/issues/56521) Closes: [#47552](https://github.com/ClickHouse/ClickHouse/issues/47552) Closes: [#56503](https://github.com/ClickHouse/ClickHouse/issues/56503) Closes: [#59101](https://github.com/ClickHouse/ClickHouse/issues/59101) Closes: [#50271](https://github.com/ClickHouse/ClickHouse/issues/50271) Closes: [#54954](https://github.com/ClickHouse/ClickHouse/issues/54954) Closes: [#56466](https://github.com/ClickHouse/ClickHouse/issues/56466) Closes: [#11000](https://github.com/ClickHouse/ClickHouse/issues/11000) Closes: [#10894](https://github.com/ClickHouse/ClickHouse/issues/10894) Closes: https://github.com/ClickHouse/ClickHouse/issues/448 Closes: [#8030](https://github.com/ClickHouse/ClickHouse/issues/8030) Closes: [#32139](https://github.com/ClickHouse/ClickHouse/issues/32139) Closes: [#47288](https://github.com/ClickHouse/ClickHouse/issues/47288) Closes: [#50705](https://github.com/ClickHouse/ClickHouse/issues/50705) Closes: [#54511](https://github.com/ClickHouse/ClickHouse/issues/54511) Closes: [#55466](https://github.com/ClickHouse/ClickHouse/issues/55466) Closes: [#58500](https://github.com/ClickHouse/ClickHouse/issues/58500) Closes: [#39923](https://github.com/ClickHouse/ClickHouse/issues/39923) Closes: [#39855](https://github.com/ClickHouse/ClickHouse/issues/39855) Closes: [#4596](https://github.com/ClickHouse/ClickHouse/issues/4596) Closes: [#47422](https://github.com/ClickHouse/ClickHouse/issues/47422) Closes: [#33000](https://github.com/ClickHouse/ClickHouse/issues/33000) Closes: [#14739](https://github.com/ClickHouse/ClickHouse/issues/14739) Closes: [#44039](https://github.com/ClickHouse/ClickHouse/issues/44039) Closes: [#8547](https://github.com/ClickHouse/ClickHouse/issues/8547) Closes: [#22923](https://github.com/ClickHouse/ClickHouse/issues/22923) Closes: [#23865](https://github.com/ClickHouse/ClickHouse/issues/23865) Closes: [#29748](https://github.com/ClickHouse/ClickHouse/issues/29748) Closes: [#4222](https://github.com/ClickHouse/ClickHouse/issues/4222). [#62457](https://github.com/ClickHouse/ClickHouse/pull/62457) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Fixed build errors when OpenSSL is linked dynamically (note: this is generally unsupported and only required for IBM's s390x platforms). [#62888](https://github.com/ClickHouse/ClickHouse/pull/62888) ([Harry Lee](https://github.com/HarryLeeIBM)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix logical-error when undoing quorum insert transaction. [#61953](https://github.com/ClickHouse/ClickHouse/pull/61953) ([Han Fei](https://github.com/hanfei1991)).
* Fix parser error when using COUNT(*) with FILTER clause [#61357](https://github.com/ClickHouse/ClickHouse/pull/61357) ([Duc Canh Le](https://github.com/canhld94)).
* Fix logical error in `group_by_use_nulls` + grouping sets + analyzer + materialize/constant [#61567](https://github.com/ClickHouse/ClickHouse/pull/61567) ([Kruglov Pavel](https://github.com/Avogar)).
* Cancel merges before removing moved parts [#61610](https://github.com/ClickHouse/ClickHouse/pull/61610) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Fix abort in Apache Arrow [#61720](https://github.com/ClickHouse/ClickHouse/pull/61720) ([Kruglov Pavel](https://github.com/Avogar)).
* Search for `convert_to_replicated` flag at the correct path corresponding to the specific disk [#61769](https://github.com/ClickHouse/ClickHouse/pull/61769) ([Kirill](https://github.com/kirillgarbar)).
* Fix possible connections data-race for distributed_foreground_insert/distributed_background_insert_batch [#61867](https://github.com/ClickHouse/ClickHouse/pull/61867) ([Azat Khuzhin](https://github.com/azat)).
* Mark CANNOT_PARSE_ESCAPE_SEQUENCE error as parse error to be able to skip it in row input formats [#61883](https://github.com/ClickHouse/ClickHouse/pull/61883) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix writing exception message in output format in HTTP when http_wait_end_of_query is used [#61951](https://github.com/ClickHouse/ClickHouse/pull/61951) ([Kruglov Pavel](https://github.com/Avogar)).
* Proper fix for LowCardinality together with JSONExtract functions [#61957](https://github.com/ClickHouse/ClickHouse/pull/61957) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Fix crash in the Merge engine if a row policy has no expression [#61971](https://github.com/ClickHouse/ClickHouse/pull/61971) ([Ilya Golshtein](https://github.com/ilejn)).
* Fix WriteBufferAzureBlobStorage destructor uncaught exception [#61988](https://github.com/ClickHouse/ClickHouse/pull/61988) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Fix CREATE TABLE without columns definition for ReplicatedMergeTree [#62040](https://github.com/ClickHouse/ClickHouse/pull/62040) ([Azat Khuzhin](https://github.com/azat)).
* Fix optimize_skip_unused_shards_rewrite_in for composite sharding key [#62047](https://github.com/ClickHouse/ClickHouse/pull/62047) ([Azat Khuzhin](https://github.com/azat)).
* ReadWriteBufferFromHTTP set right header host when redirected [#62068](https://github.com/ClickHouse/ClickHouse/pull/62068) ([Sema Checherinda](https://github.com/CheSema)).
* Fix external table cannot parse data type Bool [#62115](https://github.com/ClickHouse/ClickHouse/pull/62115) ([Duc Canh Le](https://github.com/canhld94)).
* Analyzer: Fix query parameter resolution [#62186](https://github.com/ClickHouse/ClickHouse/pull/62186) ([Dmitry Novik](https://github.com/novikd)).
* Fix restoring parts while readonly [#62207](https://github.com/ClickHouse/ClickHouse/pull/62207) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix crash in index definition containing SQL UDF [#62225](https://github.com/ClickHouse/ClickHouse/pull/62225) ([vdimir](https://github.com/vdimir)).
* Fixing NULL random seed for generateRandom with analyzer. [#62248](https://github.com/ClickHouse/ClickHouse/pull/62248) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Correctly handle const columns in DistinctTransform [#62250](https://github.com/ClickHouse/ClickHouse/pull/62250) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix Parts Splitter for queries with the FINAL modifier [#62268](https://github.com/ClickHouse/ClickHouse/pull/62268) ([Nikita Taranov](https://github.com/nickitat)).
* Analyzer: Fix alias to parametrized view resolution [#62274](https://github.com/ClickHouse/ClickHouse/pull/62274) ([Dmitry Novik](https://github.com/novikd)).
* Analyzer: Fix name resolution from parent scopes [#62281](https://github.com/ClickHouse/ClickHouse/pull/62281) ([Dmitry Novik](https://github.com/novikd)).
* Fix argMax with nullable non native numeric column [#62285](https://github.com/ClickHouse/ClickHouse/pull/62285) ([Raúl Marín](https://github.com/Algunenano)).
* Fix BACKUP and RESTORE of a materialized view in Ordinary database [#62295](https://github.com/ClickHouse/ClickHouse/pull/62295) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix data race on scalars in Context [#62305](https://github.com/ClickHouse/ClickHouse/pull/62305) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix primary key in materialized view [#62319](https://github.com/ClickHouse/ClickHouse/pull/62319) ([Murat Khairulin](https://github.com/mxwell)).
* Do not build multithread insert pipeline for tables without support [#62333](https://github.com/ClickHouse/ClickHouse/pull/62333) ([vdimir](https://github.com/vdimir)).
* Fix analyzer with positional arguments in distributed query [#62362](https://github.com/ClickHouse/ClickHouse/pull/62362) ([flynn](https://github.com/ucasfl)).
* Fix filter pushdown from additional_table_filters in Merge engine in analyzer [#62398](https://github.com/ClickHouse/ClickHouse/pull/62398) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix GLOBAL IN table queries with analyzer. [#62409](https://github.com/ClickHouse/ClickHouse/pull/62409) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Respect settings truncate_on_insert/create_new_file_on_insert in s3/hdfs/azure engines during partitioned write [#62425](https://github.com/ClickHouse/ClickHouse/pull/62425) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix backup restore path for AzureBlobStorage [#62447](https://github.com/ClickHouse/ClickHouse/pull/62447) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Fix SimpleSquashingChunksTransform [#62451](https://github.com/ClickHouse/ClickHouse/pull/62451) ([Nikita Taranov](https://github.com/nickitat)).
* Fix capture of nested lambda. [#62462](https://github.com/ClickHouse/ClickHouse/pull/62462) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Avoid crash when reading protobuf with recursive types [#62506](https://github.com/ClickHouse/ClickHouse/pull/62506) ([Raúl Marín](https://github.com/Algunenano)).
* Fix a bug when moving a partition from a table to itself [#62524](https://github.com/ClickHouse/ClickHouse/pull/62524) ([helifu](https://github.com/helifu)).
* Fix scalar subquery in LIMIT [#62567](https://github.com/ClickHouse/ClickHouse/pull/62567) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix segfault in the experimental and unsupported Hive engine, which we don't like anyway [#62578](https://github.com/ClickHouse/ClickHouse/pull/62578) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix memory leak in groupArraySorted [#62597](https://github.com/ClickHouse/ClickHouse/pull/62597) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix crash in largestTriangleThreeBuckets [#62646](https://github.com/ClickHouse/ClickHouse/pull/62646) ([Raúl Marín](https://github.com/Algunenano)).
* Fix tumble\[Start,End\] and hop\[Start,End\] for bigger resolutions [#62705](https://github.com/ClickHouse/ClickHouse/pull/62705) ([Jordi Villar](https://github.com/jrdi)).
* Fix argMin/argMax combinator state [#62708](https://github.com/ClickHouse/ClickHouse/pull/62708) ([Raúl Marín](https://github.com/Algunenano)).
* Fix temporary data in cache failing because of cache lock contention optimization [#62715](https://github.com/ClickHouse/ClickHouse/pull/62715) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix crash in function `mergeTreeIndex` [#62762](https://github.com/ClickHouse/ClickHouse/pull/62762) ([Anton Popov](https://github.com/CurtizJ)).
* Fix size checks when updating nested materialized columns [#62773](https://github.com/ClickHouse/ClickHouse/pull/62773) ([Eliot Hautefeuille](https://github.com/hileef)).
* Fix FINAL modifier is not respected in CTE with analyzer [#62811](https://github.com/ClickHouse/ClickHouse/pull/62811) ([Duc Canh Le](https://github.com/canhld94)).
* Fix crash in function `formatRow` with `JSON` format and HTTP interface [#62840](https://github.com/ClickHouse/ClickHouse/pull/62840) ([Anton Popov](https://github.com/CurtizJ)).
* Azure: fix building final url from endpoint object [#62850](https://github.com/ClickHouse/ClickHouse/pull/62850) ([Daniel Pozo Escalona](https://github.com/danipozo)).
* Fix GCD codec [#62853](https://github.com/ClickHouse/ClickHouse/pull/62853) ([Nikita Taranov](https://github.com/nickitat)).
* Fix LowCardinality(Nullable) key in hyperrectangle [#62866](https://github.com/ClickHouse/ClickHouse/pull/62866) ([Amos Bird](https://github.com/amosbird)).
* Fix `fromUnixTimestampInJodaSyntax` when the input value is beyond `UInt32` [#62901](https://github.com/ClickHouse/ClickHouse/pull/62901) ([KevinyhZou](https://github.com/KevinyhZou)).
* Disable `optimize_rewrite_aggregate_function_with_if` for `sum(nullable)` (see the sketch after this list) [#62912](https://github.com/ClickHouse/ClickHouse/pull/62912) ([Raúl Marín](https://github.com/Algunenano)).
* Fix PREWHERE for StorageBuffer with different source table column types. [#62916](https://github.com/ClickHouse/ClickHouse/pull/62916) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix temporary data in cache incorrectly processing failure of cache key directory creation [#62925](https://github.com/ClickHouse/ClickHouse/pull/62925) ([Kseniia Sumarokova](https://github.com/kssenii)).
* gRPC: fix crash on IPv6 peer connection [#62978](https://github.com/ClickHouse/ClickHouse/pull/62978) ([Konstantin Bogdanov](https://github.com/thevar1able)).
* Fix possible CHECKSUM_DOESNT_MATCH (and others) during replicated fetches [#62987](https://github.com/ClickHouse/ClickHouse/pull/62987) ([Azat Khuzhin](https://github.com/azat)).
* Fix terminate with uncaught exception in temporary data in cache [#62998](https://github.com/ClickHouse/ClickHouse/pull/62998) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix optimize_rewrite_aggregate_function_with_if implicit cast [#62999](https://github.com/ClickHouse/ClickHouse/pull/62999) ([Raúl Marín](https://github.com/Algunenano)).
* Fix unhandled exception in ~RestorerFromBackup [#63040](https://github.com/ClickHouse/ClickHouse/pull/63040) ([Vitaly Baranov](https://github.com/vitlibar)).
* Do not remove server constants from GROUP BY key for secondary query. [#63047](https://github.com/ClickHouse/ClickHouse/pull/63047) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix incorrect judgement of monotonicity of the function `abs` [#63097](https://github.com/ClickHouse/ClickHouse/pull/63097) ([Duc Canh Le](https://github.com/canhld94)).
* Set server name for SSL handshake in MongoDB engine [#63122](https://github.com/ClickHouse/ClickHouse/pull/63122) ([Alexander Gololobov](https://github.com/davenger)).
* Use user specified db instead of "config" for MongoDB wire protocol version check [#63126](https://github.com/ClickHouse/ClickHouse/pull/63126) ([Alexander Gololobov](https://github.com/davenger)).
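For context on the `optimize_rewrite_aggregate_function_with_if` fixes above, the rewrite this setting controls is roughly the following (illustrative only):

```sql
-- With optimize_rewrite_aggregate_function_with_if = 1, this query:
SELECT avg(if(number % 2 = 0, number, NULL)) FROM numbers(10);
-- is internally rewritten to approximately:
SELECT avgIf(number, number % 2 = 0) FROM numbers(10);
```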
### <a id="243"></a> ClickHouse release 24.3 LTS, 2024-03-27
#### Upgrade Notes

@@ -769,6 +769,7 @@ In addition to local block devices, ClickHouse supports these storage types:
- [`web` for read-only from web](#web-storage)
- [`cache` for local caching](/docs/en/operations/storing-data.md/#using-local-cache)
- [`s3_plain` for backups to S3](/docs/en/operations/backup#backuprestore-using-an-s3-disk)
- [`s3_plain_rewritable` for immutable, non-replicated tables in S3](/docs/en/operations/storing-data.md#s3-plain-rewritable-storage)
## Using Multiple Block Devices for Data Storage {#table_engine-mergetree-multiple-volumes}

@@ -28,7 +28,7 @@ Starting from ClickHouse version 24.1, it is possible to use a new configuration
It requires specifying:
1. `type` equal to `object_storage`
2. `object_storage_type`, equal to one of `s3`, `azure_blob_storage` (or just `azure` from `24.3`), `hdfs`, `local_blob_storage` (or just `local` from `24.3`), `web`.
Optionally, `metadata_type` can be specified (it is equal to `local` by default), but it can also be set to `plain`, `web` and, starting from `24.4`, `plain_rewritable`.
Usage of the `plain` metadata type is described in the [plain storage section](/docs/en/operations/storing-data.md/#storing-data-on-webserver); the `web` metadata type can be used only with the `web` object storage type; the `local` metadata type stores metadata files locally (each metadata file contains a mapping to files in object storage and some additional meta information about them).
E.g. configuration option
@@ -341,6 +341,36 @@ Configuration:
</s3_plain>
```
### Using S3 Plain Rewritable Storage {#s3-plain-rewritable-storage}
A new disk type `s3_plain_rewritable` was introduced in `24.4`.
Similar to the `s3_plain` disk type, it does not require additional storage for metadata files; instead, metadata is stored in S3.
Unlike the `s3_plain` disk type, `s3_plain_rewritable` allows executing merges and supports INSERT operations.
[Mutations](/docs/en/sql-reference/statements/alter#mutations) and replication of tables are not supported.
A use case for this disk type is non-replicated `MergeTree` tables. Although the `s3` disk type is also suitable for non-replicated
`MergeTree` tables, you may opt for the `s3_plain_rewritable` disk type if you do not require local metadata for the table and are
willing to accept a limited set of operations. This could be useful, for example, for system tables. A table-creation sketch follows the configuration examples below.
Configuration:
``` xml
<s3_plain_rewritable>
<type>s3_plain_rewritable</type>
<endpoint>https://s3.eu-west-1.amazonaws.com/clickhouse-eu-west-1.clickhouse.com/data/</endpoint>
<use_environment_credentials>1</use_environment_credentials>
</s3_plain_rewritable>
```
is equivalent to
``` xml
<s3_plain_rewritable>
<type>object_storage</type>
<object_storage_type>s3</object_storage_type>
<metadata_type>plain_rewritable</metadata_type>
<endpoint>https://s3.eu-west-1.amazonaws.com/clickhouse-eu-west-1.clickhouse.com/data/</endpoint>
<use_environment_credentials>1</use_environment_credentials>
</s3_plain_rewritable>
```
### Using Azure Blob Storage {#azure-blob-storage}
`MergeTree` family table engines can store data to [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/) using a disk with type `azure_blob_storage`.

@@ -1147,13 +1147,13 @@ tryBase58Decode(encoded)
Query:
```sql
SELECT tryBase58Decode('3dc8KtHrwM') as res, tryBase58Decode('invalid') as res_invalid;
```
```response
┌─res─────┬─res_invalid─┐
│ Encoded │             │
└─────────┴─────────────┘
```
## base64Encode
@@ -1187,13 +1187,13 @@ tryBase64Decode(encoded)
Query:
```sql
SELECT tryBase64Decode('RW5jb2RlZA==') as res, tryBase64Decode('invalid') as res_invalid;
```
```response
┌─res─────┬─res_invalid─┐
│ Encoded │             │
└─────────┴─────────────┘
```
## endsWith {#endswith}

@@ -532,3 +532,15 @@ If there's a refresh in progress for the given view, interrupt and cancel it. Ot
```sql
SYSTEM CANCEL VIEW [db.]name
```
### SYSTEM UNLOAD PRIMARY KEY
Unload the primary keys for the given table or for all tables.
Unload the primary key for a specific table:
```sql
SYSTEM UNLOAD PRIMARY KEY [db.]name
```
Unload the primary keys for all tables:
```sql
SYSTEM UNLOAD PRIMARY KEY
```

@@ -121,9 +121,12 @@ if (BUILD_STANDALONE_KEEPER)
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/DiskType.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/IObjectStorage.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/MetadataOperationsHolder.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/MetadataStorageFromPlainObjectStorageOperations.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/MetadataStorageFromPlainObjectStorage.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/MetadataStorageFromPlainRewritableObjectStorage.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/MetadataStorageFromDisk.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/MetadataStorageTransactionState.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/DiskObjectStorageMetadata.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/MetadataStorageFromDiskTransactionOperations.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/DiskObjectStorage.cpp
@@ -137,6 +140,7 @@ if (BUILD_STANDALONE_KEEPER)
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/S3/S3Capabilities.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/S3/diskSettings.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/S3/DiskS3Utils.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/CommonPathPrefixKeyGenerator.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/ObjectStorageFactory.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/MetadataStorageFactory.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/RegisterDiskObjectStorage.cpp

@@ -210,6 +210,7 @@ enum class AccessType
M(SYSTEM_FAILPOINT, "SYSTEM ENABLE FAILPOINT, SYSTEM DISABLE FAILPOINT, SYSTEM WAIT FAILPOINT", GLOBAL, SYSTEM) \
M(SYSTEM_LISTEN, "SYSTEM START LISTEN, SYSTEM STOP LISTEN", GLOBAL, SYSTEM) \
M(SYSTEM_JEMALLOC, "SYSTEM JEMALLOC PURGE, SYSTEM JEMALLOC ENABLE PROFILE, SYSTEM JEMALLOC DISABLE PROFILE, SYSTEM JEMALLOC FLUSH PROFILE", GLOBAL, SYSTEM) \
M(SYSTEM_UNLOAD_PRIMARY_KEY, "SYSTEM UNLOAD PRIMARY KEY", TABLE, SYSTEM) \
M(SYSTEM, "", GROUP, ALL) /* allows to execute SYSTEM {SHUTDOWN|RELOAD CONFIG|...} */ \
\
M(dictGet, "dictHas, dictGetHierarchy, dictIsIn", DICTIONARY, ALL) /* allows to execute functions dictGet(), dictHas(), dictGetHierarchy(), dictIsIn() */\

@@ -53,8 +53,9 @@ TEST(AccessRights, Union)
"SHOW ROW POLICIES, SYSTEM MERGES, SYSTEM TTL MERGES, SYSTEM FETCHES, "
"SYSTEM MOVES, SYSTEM PULLING REPLICATION LOG, SYSTEM CLEANUP, SYSTEM VIEWS, SYSTEM SENDS, SYSTEM REPLICATION QUEUES, SYSTEM VIRTUAL PARTS UPDATE, "
"SYSTEM DROP REPLICA, SYSTEM SYNC REPLICA, SYSTEM RESTART REPLICA, "
"SYSTEM RESTORE REPLICA, SYSTEM WAIT LOADING PARTS, SYSTEM SYNC DATABASE REPLICA, SYSTEM FLUSH DISTRIBUTED, dictGet ON db1.*, "
"GRANT TABLE ENGINE ON db1, GRANT SET DEFINER ON db1, GRANT NAMED COLLECTION ADMIN ON db1");
"SYSTEM RESTORE REPLICA, SYSTEM WAIT LOADING PARTS, SYSTEM SYNC DATABASE REPLICA, SYSTEM FLUSH DISTRIBUTED, "
"SYSTEM UNLOAD PRIMARY KEY, dictGet ON db1.*, GRANT TABLE ENGINE ON db1, "
"GRANT SET DEFINER ON db1, GRANT NAMED COLLECTION ADMIN ON db1");
}

@@ -10,9 +10,10 @@
#include <Interpreters/Context.h>
#include <Analyzer/ConstantNode.h>
#include <Analyzer/FunctionNode.h>
#include <Analyzer/InDepthQueryTreeVisitor.h>
#include <Analyzer/Utils.h>
namespace DB
{
@@ -52,17 +53,24 @@ public:
const auto & second_const_value = second_const_node->getValue();
if (second_const_value.isNull()
|| (lower_name == "sum" && isInt64OrUInt64FieldType(second_const_value.getType()) && second_const_value.get<UInt64>() == 0
&& !if_node->getResultType()->isNullable()))
{
/// avg(if(cond, a, null)) -> avgIf(a::ResultTypeIf, cond)
/// avg(if(cond, nullable_a, null)) -> avgIf(nullable_a, cond)
/// sum(if(cond, a, 0)) -> sumIf(a, cond)
/// sum(if(cond, nullable_a, 0)) **is not** equivalent to sumIfOrNull(cond, nullable_a) as
/// it changes the output when no rows pass the condition (from 0 to NULL)
QueryTreeNodes new_arguments{2};
/// We need to preserve the output type from if()
if (if_arguments_nodes[1]->getResultType()->getName() != if_node->getResultType()->getName())
new_arguments[0] = createCastFunction(std::move(if_arguments_nodes[1]), if_node->getResultType(), getContext());
else
new_arguments[0] = std::move(if_arguments_nodes[1]);
new_arguments[1] = std::move(if_arguments_nodes[0]);
function_arguments_nodes = std::move(new_arguments);
resolveAsAggregateFunctionWithIf(
*function_node, {function_arguments_nodes[0]->getResultType(), function_arguments_nodes[1]->getResultType()});
}
@@ -72,21 +80,27 @@ public:
const auto & first_const_value = first_const_node->getValue();
if (first_const_value.isNull()
|| (lower_name == "sum" && isInt64OrUInt64FieldType(first_const_value.getType()) && first_const_value.get<UInt64>() == 0
&& !if_node->getResultType()->isNullable()))
{
/// avg(if(cond, null, a) -> avgIf(a::ResultTypeIf, !cond))
/// sum(if(cond, 0, a) -> sumIf(a, !cond))
/// sum(if(cond, 0, nullable_a) **is not** sumIf(a, !cond)) -> Same as above
QueryTreeNodes new_arguments{2};
if (if_arguments_nodes[2]->getResultType()->getName() != if_node->getResultType()->getName())
new_arguments[0] = createCastFunction(std::move(if_arguments_nodes[2]), if_node->getResultType(), getContext());
else
new_arguments[0] = std::move(if_arguments_nodes[2]);
auto not_function = std::make_shared<FunctionNode>("not");
auto & not_function_arguments = not_function->getArguments().getNodes();
not_function_arguments.push_back(std::move(if_arguments_nodes[0]));
not_function->resolveAsFunction(
FunctionFactory::instance().get("not", getContext())->build(not_function->getArgumentColumns()));
new_arguments[1] = std::move(not_function);
function_arguments_nodes = std::move(new_arguments);
resolveAsAggregateFunctionWithIf(
*function_node, {function_arguments_nodes[0]->getResultType(), function_arguments_nodes[1]->getResultType()});
}
@@ -98,13 +112,9 @@ private:
{
auto result_type = function_node.getResultType();
std::string suffix = "If";
if (result_type->isNullable())
suffix = "OrNullIf";
AggregateFunctionProperties properties;
auto aggregate_function = AggregateFunctionFactory::instance().get(
function_node.getFunctionName() + "If",
function_node.getNullsAction(),
argument_types,
function_node.getAggregateFunction()->getParameters(),

@@ -258,8 +258,6 @@ void addQueryTreePasses(QueryTreePassManager & manager, bool only_analyze)
manager.addPass(std::make_unique<RewriteSumFunctionWithSumAndCountPass>());
manager.addPass(std::make_unique<CountDistinctPass>());
manager.addPass(std::make_unique<UniqToCountPass>());
manager.addPass(std::make_unique<RewriteArrayExistsToHasPass>());
manager.addPass(std::make_unique<NormalizeCountVariantsPass>());
@@ -276,9 +274,12 @@ void addQueryTreePasses(QueryTreePassManager & manager, bool only_analyze)
manager.addPass(std::make_unique<OptimizeGroupByFunctionKeysPass>());
manager.addPass(std::make_unique<OptimizeGroupByInjectiveFunctionsPass>());
/// The order here is important as we want to keep collapsing in order
manager.addPass(std::make_unique<MultiIfToIfPass>());
manager.addPass(std::make_unique<IfConstantConditionPass>());
manager.addPass(std::make_unique<IfChainToMultiIfPass>());
manager.addPass(std::make_unique<RewriteAggregateFunctionWithIfPass>());
manager.addPass(std::make_unique<SumIfToCountIfPass>());
manager.addPass(std::make_unique<ComparisonTupleEliminationPass>());

@@ -125,7 +125,7 @@ void highlight(const String & query, std::vector<replxx::Replxx::Color> & colors
const char * begin = query.data();
const char * end = begin + query.size();
Tokens tokens(begin, end, 10000, true);
IParser::Pos token_iterator(tokens, static_cast<uint32_t>(1000), static_cast<uint32_t>(10000));
Expected expected;
expected.enable_highlighting = true;

@@ -14,10 +14,7 @@ public:
, re_gen(key_template)
{
}
DB::ObjectStorageKey generate(const String &, bool) const override { return DB::ObjectStorageKey::createAsAbsolute(re_gen.generate()); }
private:
String key_template;
@@ -32,7 +29,7 @@ public:
: key_prefix(std::move(key_prefix_))
{}
DB::ObjectStorageKey generate(const String &, bool) const override
{
/// Path to store the new S3 object.
@@ -63,7 +60,7 @@ public:
: key_prefix(std::move(key_prefix_))
{}
DB::ObjectStorageKey generate(const String & path, bool) const override
{
return DB::ObjectStorageKey::createAsRelative(key_prefix, path);
}

@@ -1,7 +1,7 @@
#pragma once
#include "ObjectStorageKey.h"
#include <memory>
#include "ObjectStorageKey.h"
namespace DB
{
@@ -9,8 +9,9 @@ namespace DB
class IObjectStorageKeysGenerator
{
public:
virtual ~IObjectStorageKeysGenerator() = default;
virtual ObjectStorageKey generate(const String & path, bool is_directory) const = 0;
};
using ObjectStorageKeysGeneratorPtr = std::shared_ptr<IObjectStorageKeysGenerator>;

@@ -50,6 +50,7 @@ class IColumn;
M(MaxThreads, max_threads, 0, "The maximum number of threads to execute the request. By default, it is determined automatically.", 0) \
M(Bool, use_concurrency_control, true, "Respect the server's concurrency control (see the `concurrent_threads_soft_limit_num` and `concurrent_threads_soft_limit_ratio_to_cores` global server settings). If disabled, it allows using a larger number of threads even if the server is overloaded (not recommended for normal usage, and needed mostly for tests).", 0) \
M(MaxThreads, max_download_threads, 4, "The maximum number of threads to download data (e.g. for URL engine).", 0) \
M(MaxThreads, max_parsing_threads, 0, "The maximum number of threads to parse data in input formats that support parallel parsing. By default, it is determined automatically", 0) \
M(UInt64, max_download_buffer_size, 10*1024*1024, "The maximal size of buffer for parallel downloading (e.g. for URL engine) per each thread.", 0) \
M(UInt64, max_read_buffer_size, DBMS_DEFAULT_BUFFER_SIZE, "The maximum size of the buffer to read from the filesystem.", 0) \
M(UInt64, max_read_buffer_size_local_fs, 128*1024, "The maximum size of the buffer to read from local filesystem. If set to 0 then max_read_buffer_size will be used.", 0) \
@@ -682,7 +683,7 @@ class IColumn;
M(Bool, query_cache_share_between_users, false, "Allow other users to read entry in the query cache", 0) \
M(Bool, enable_sharing_sets_for_mutations, true, "Allow sharing set objects build for IN subqueries between different tasks of the same mutation. This reduces memory usage and CPU consumption", 0) \
\
M(Bool, optimize_rewrite_sum_if_to_count_if, true, "Rewrite sumIf() and sum(if()) functions to the countIf() function when logically equivalent", 0) \
M(Bool, optimize_rewrite_aggregate_function_with_if, true, "Rewrite aggregate functions with if expression as argument when logically equivalent. For example, avg(if(cond, col, null)) can be rewritten to avgIf(cond, col)", 0) \
M(Bool, optimize_rewrite_array_exists_to_has, false, "Rewrite arrayExists() functions to has() when logically equivalent. For example, arrayExists(x -> x = 1, arr) can be rewritten to has(arr, 1)", 0) \
M(UInt64, insert_shard_id, 0, "If non zero, when insert into a distributed table, the data will be inserted into the shard `insert_shard_id` synchronously. Possible values range from 1 to `shards_number` of corresponding distributed table", 0) \

View File

@ -86,6 +86,7 @@ namespace SettingsChangesHistory
static std::map<ClickHouseVersion, SettingsChangesHistory::SettingsChanges> settings_changes_history =
{
{"24.4", {{"input_format_json_throw_on_bad_escape_sequence", true, true, "Allow to save JSON strings with bad escape sequences"},
{"max_parsing_threads", 0, 0, "Add a separate setting to control number of threads in parallel parsing from files"},
{"ignore_drop_queries_probability", 0, 0, "Allow to ignore drop queries in server with specified probability for testing purposes"},
{"lightweight_deletes_sync", 2, 2, "The same as 'mutation_sync', but controls only execution of lightweight deletes"},
{"query_cache_system_table_handling", "save", "throw", "The query cache no longer caches results of queries against system tables"},
@ -94,6 +95,7 @@ static std::map<ClickHouseVersion, SettingsChangesHistory::SettingsChanges> sett
{"first_day_of_week", "Monday", "Monday", "Added a setting for the first day of the week for date/time functions"},
{"allow_experimental_database_replicated", false, true, "Database engine Replicated is now in Beta stage"},
{"temporary_data_in_cache_reserve_space_wait_lock_timeout_milliseconds", (10 * 60 * 1000), (10 * 60 * 1000), "Wait time to lock cache for sapce reservation in temporary data in filesystem cache"},
{"optimize_rewrite_sum_if_to_count_if", false, true, "Only available for the analyzer, where it works correctly"},
{"azure_allow_parallel_part_upload", "true", "true", "Use multiple threads for azure multipart upload."},
{"max_recursive_cte_evaluation_depth", DBMS_RECURSIVE_CTE_MAX_EVALUATION_DEPTH, DBMS_RECURSIVE_CTE_MAX_EVALUATION_DEPTH, "Maximum limit on recursive CTE evaluation depth"},
{"query_plan_convert_outer_join_to_inner_join", false, true, "Allow to convert OUTER JOIN to INNER JOIN if filter after JOIN always filters default values"},

View File

@ -48,11 +48,6 @@ bool queryProfilerWorks() { return false; }
namespace DB
{
namespace ErrorCodes
{
extern const int INVALID_SETTING_VALUE;
}
/// Update some settings defaults to avoid some known issues.
void applySettingsQuirks(Settings & settings, LoggerPtr log)
{
@ -95,7 +90,7 @@ void applySettingsQuirks(Settings & settings, LoggerPtr log)
}
}
void doSettingsSanityCheck(const Settings & current_settings)
void doSettingsSanityCheckClamp(Settings & current_settings, LoggerPtr log)
{
auto getCurrentValue = [&current_settings](const std::string_view name) -> Field
{
@ -106,8 +101,13 @@ void doSettingsSanityCheck(const Settings & current_settings)
};
UInt64 max_threads = getCurrentValue("max_threads").get<UInt64>();
if (max_threads > getNumberOfPhysicalCPUCores() * 65536)
throw Exception(ErrorCodes::INVALID_SETTING_VALUE, "Sanity check: Too many threads requested ({})", max_threads);
UInt64 max_threads_max_value = 256 * getNumberOfPhysicalCPUCores();
if (max_threads > max_threads_max_value)
{
if (log)
LOG_WARNING(log, "Sanity check: Too many threads requested ({}). Reduced to {}", max_threads, max_threads_max_value);
current_settings.set("max_threads", max_threads_max_value);
}
constexpr UInt64 max_sane_block_rows_size = 4294967296; // 2^32
std::unordered_set<String> block_rows_settings{
@ -122,7 +122,11 @@ void doSettingsSanityCheck(const Settings & current_settings)
{
auto block_size = getCurrentValue(setting).get<UInt64>();
if (block_size > max_sane_block_rows_size)
throw Exception(ErrorCodes::INVALID_SETTING_VALUE, "Sanity check: '{}' value is too high ({})", setting, block_size);
{
if (log)
LOG_WARNING(log, "Sanity check: '{}' value is too high ({}). Reduced to {}", setting, block_size, max_sane_block_rows_size);
current_settings.set(setting, max_sane_block_rows_size);
}
}
}
}
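The behavioral change above (clamp and warn instead of throw) can be shown with a self-contained sketch; plain stderr stands in for the logger, and the function name is hypothetical:

#include <cstdint>
#include <cstdio>

// Sketch of the new sanity check: instead of throwing on an insane
// `max_threads`, warn and clamp it to 256 * physical cores.
uint64_t clampMaxThreads(uint64_t requested, uint64_t physical_cores)
{
    const uint64_t max_value = 256 * physical_cores;
    if (requested <= max_value)
        return requested;
    std::fprintf(stderr,
                 "Sanity check: Too many threads requested (%llu). Reduced to %llu\n",
                 static_cast<unsigned long long>(requested),
                 static_cast<unsigned long long>(max_value));
    return max_value;
}

int main()
{
    // With 16 physical cores the cap is 4096, so 1000000 is clamped.
    std::printf("%llu\n", static_cast<unsigned long long>(clampMaxThreads(1000000, 16)));
}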

View File

@ -10,6 +10,6 @@ struct Settings;
/// Update some settings defaults to avoid some known issues.
void applySettingsQuirks(Settings & settings, LoggerPtr log = nullptr);
/// Verify that some settings have sane values. Throws if not
void doSettingsSanityCheck(const Settings & settings);
/// Verify that some settings have sane values. Alters the value to a reasonable one if not
void doSettingsSanityCheckClamp(Settings & settings, LoggerPtr log);
}

View File

@ -151,8 +151,7 @@ static void checkMySQLVariables(const mysqlxx::Pool::Entry & connection, const S
{"log_bin", "ON"},
{"binlog_format", "ROW"},
{"binlog_row_image", "FULL"},
{"default_authentication_plugin", "mysql_native_password"},
{"log_bin_use_v1_row_events", "OFF"}
{"default_authentication_plugin", "mysql_native_password"}
};
QueryPipeline pipeline(std::move(variables_input));

View File

@ -16,6 +16,8 @@ MetadataStorageType metadataTypeFromString(const String & type)
return MetadataStorageType::Local;
if (check_type == "plain")
return MetadataStorageType::Plain;
if (check_type == "plain_rewritable")
return MetadataStorageType::PlainRewritable;
if (check_type == "web")
return MetadataStorageType::StaticWeb;

View File

@ -28,6 +28,7 @@ enum class MetadataStorageType
None,
Local,
Plain,
PlainRewritable,
StaticWeb,
};

View File

@ -363,6 +363,8 @@ public:
virtual bool isWriteOnce() const { return false; }
virtual bool supportsHardLinks() const { return true; }
/// Check if disk is broken. Broken disks will have 0 space and cannot be used.
virtual bool isBroken() const { return false; }

View File

@ -0,0 +1,72 @@
#include "CommonPathPrefixKeyGenerator.h"
#include <Common/getRandomASCIIString.h>
#include <deque>
#include <filesystem>
#include <tuple>
namespace DB
{
CommonPathPrefixKeyGenerator::CommonPathPrefixKeyGenerator(
String key_prefix_, SharedMutex & shared_mutex_, std::weak_ptr<PathMap> path_map_)
: storage_key_prefix(key_prefix_), shared_mutex(shared_mutex_), path_map(std::move(path_map_))
{
}
ObjectStorageKey CommonPathPrefixKeyGenerator::generate(const String & path, bool is_directory) const
{
const auto & [object_key_prefix, suffix_parts] = getLongestObjectKeyPrefix(path);
auto key = std::filesystem::path(object_key_prefix.empty() ? storage_key_prefix : object_key_prefix);
/// The longest prefix is the same as path, meaning that the path is already mapped.
if (suffix_parts.empty())
return ObjectStorageKey::createAsRelative(std::move(key));
/// File and top-level directory paths are mapped as is.
if (!is_directory || object_key_prefix.empty())
for (const auto & part : suffix_parts)
key /= part;
/// Replace the last part of the directory path with a pseudorandom suffix.
else
{
for (size_t i = 0; i + 1 < suffix_parts.size(); ++i)
key /= suffix_parts[i];
constexpr size_t part_size = 16;
key /= getRandomASCIIString(part_size);
}
return ObjectStorageKey::createAsRelative(key);
}
std::tuple<std::string, std::vector<std::string>> CommonPathPrefixKeyGenerator::getLongestObjectKeyPrefix(const std::string & path) const
{
std::filesystem::path p(path);
std::deque<std::string> dq;
std::shared_lock lock(shared_mutex);
auto ptr = path_map.lock();
while (p != p.root_path())
{
auto it = ptr->find(p / "");
if (it != ptr->end())
{
std::vector<std::string> vec(std::make_move_iterator(dq.begin()), std::make_move_iterator(dq.end()));
return std::make_tuple(it->second, std::move(vec));
}
if (!p.filename().empty())
dq.push_front(p.filename());
p = p.parent_path();
}
return {std::string(), std::vector<std::string>(std::make_move_iterator(dq.begin()), std::make_move_iterator(dq.end()))};
}
}
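A worked sketch of the longest-prefix lookup implemented above, with the path map reduced to a plain std::map (the mutex and weak_ptr of the real generator are omitted; the mapped values are assumed):

#include <deque>
#include <filesystem>
#include <iostream>
#include <map>
#include <string>
#include <utility>

namespace fs = std::filesystem;

// Simplified getLongestObjectKeyPrefix: walk up from `path` until a mapped
// directory prefix is found, collecting the unresolved parts on the way.
std::pair<std::string, std::deque<std::string>> longestPrefix(
    const std::map<fs::path, std::string> & path_map, fs::path p)
{
    std::deque<std::string> suffix_parts;
    while (p != p.root_path())
    {
        auto it = path_map.find(p / "");
        if (it != path_map.end())
            return {it->second, suffix_parts};
        if (!p.filename().empty())
            suffix_parts.push_front(p.filename().string());
        p = p.parent_path();
    }
    return {std::string(), suffix_parts};
}

int main()
{
    // Assume the local directory "a/b/" is already mapped to object prefix "xyz".
    std::map<fs::path, std::string> path_map{{"a/b/", "xyz"}};
    auto [prefix, parts] = longestPrefix(path_map, "a/b/c/file.txt");
    fs::path key(prefix);
    for (const auto & part : parts)
        key /= part;                    // files are mapped as-is
    std::cout << key.string() << '\n'; // prints "xyz/c/file.txt"
}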

View File

@ -0,0 +1,41 @@
#pragma once
#include <Common/ObjectStorageKeyGenerator.h>
#include <Common/SharedMutex.h>
#include <filesystem>
#include <map>
namespace DB
{
/// Object storage key generator used specifically with the
/// MetadataStorageFromPlainObjectStorage if multiple writes are allowed.
/// It searches for the local (metadata) path in a pre-loaded path map.
/// If no such path exists, it searches for the parent path, until it is found
/// or no parent path exists.
///
/// The key generator ensures that the original directory hierarchy is
/// preserved, which is required for the MergeTree family.
class CommonPathPrefixKeyGenerator : public IObjectStorageKeysGenerator
{
public:
/// Local to remote path map. Leverages filesystem::path comparator for paths.
using PathMap = std::map<std::filesystem::path, std::string>;
explicit CommonPathPrefixKeyGenerator(String key_prefix_, SharedMutex & shared_mutex_, std::weak_ptr<PathMap> path_map_);
ObjectStorageKey generate(const String & path, bool is_directory) const override;
private:
/// Longest key prefix and unresolved parts of the source path.
std::tuple<std::string, std::vector<String>> getLongestObjectKeyPrefix(const String & path) const;
const String storage_key_prefix;
SharedMutex & shared_mutex;
std::weak_ptr<PathMap> path_map;
};
}

View File

@ -112,20 +112,21 @@ size_t DiskObjectStorage::getFileSize(const String & path) const
return metadata_storage->getFileSize(path);
}
void DiskObjectStorage::moveDirectory(const String & from_path, const String & to_path)
{
if (send_metadata)
sendMoveMetadata(from_path, to_path);
auto transaction = createObjectStorageTransaction();
transaction->moveDirectory(from_path, to_path);
transaction->commit();
}
void DiskObjectStorage::moveFile(const String & from_path, const String & to_path, bool should_send_metadata)
{
if (should_send_metadata)
{
auto revision = metadata_helper->revision_counter + 1;
metadata_helper->revision_counter += 1;
const ObjectAttributes object_metadata {
{"from_path", from_path},
{"to_path", to_path}
};
metadata_helper->createFileOperationObject("rename", revision, object_metadata);
}
sendMoveMetadata(from_path, to_path);
auto transaction = createObjectStorageTransaction();
transaction->moveFile(from_path, to_path);
@ -409,6 +410,15 @@ bool DiskObjectStorage::tryReserve(UInt64 bytes)
return false;
}
void DiskObjectStorage::sendMoveMetadata(const String & from_path, const String & to_path)
{
chassert(send_metadata);
auto revision = metadata_helper->revision_counter + 1;
metadata_helper->revision_counter += 1;
const ObjectAttributes object_metadata{{"from_path", from_path}, {"to_path", to_path}};
metadata_helper->createFileOperationObject("rename", revision, object_metadata);
}
bool DiskObjectStorage::supportsCache() const
{
@ -425,6 +435,11 @@ bool DiskObjectStorage::isWriteOnce() const
return object_storage->isWriteOnce();
}
bool DiskObjectStorage::supportsHardLinks() const
{
return !isWriteOnce() && !object_storage->isPlain();
}
DiskObjectStoragePtr DiskObjectStorage::createDiskObjectStorage()
{
const auto config_prefix = "storage_configuration.disks." + name;

View File

@ -112,7 +112,7 @@ public:
void clearDirectory(const String & path) override;
void moveDirectory(const String & from_path, const String & to_path) override { moveFile(from_path, to_path); }
void moveDirectory(const String & from_path, const String & to_path) override;
void removeDirectory(const String & path) override;
@ -183,6 +183,8 @@ public:
/// MergeTree table on this disk.
bool isWriteOnce() const override;
bool supportsHardLinks() const override;
/// Get structure of object storage this disk works with. Examples:
/// DiskObjectStorage(S3ObjectStorage)
/// DiskObjectStorage(CachedObjectStorage(S3ObjectStorage))
@ -228,6 +230,7 @@ private:
std::mutex reservation_mutex;
bool tryReserve(UInt64 bytes);
void sendMoveMetadata(const String & from_path, const String & to_path);
const bool send_metadata;

View File

@ -507,7 +507,7 @@ struct CopyFileObjectStorageOperation final : public IDiskObjectStorageOperation
std::string to_path;
StoredObjects created_objects;
IObjectStorage& destination_object_storage;
IObjectStorage & destination_object_storage;
CopyFileObjectStorageOperation(
IObjectStorage & object_storage_,
@ -714,7 +714,7 @@ std::unique_ptr<WriteBufferFromFileBase> DiskObjectStorageTransaction::writeFile
{
/// Otherwise we will produce lost blobs which nobody points to
/// WriteOnce storages are not affected by the issue
if (!tx->object_storage.isWriteOnce() && tx->metadata_storage.exists(path))
if (!tx->object_storage.isPlain() && tx->metadata_storage.exists(path))
tx->object_storage.removeObjectsIfExist(tx->metadata_storage.getStorageObjects(path));
tx->metadata_transaction->createMetadataFile(path, key_, count);
@ -747,10 +747,9 @@ std::unique_ptr<WriteBufferFromFileBase> DiskObjectStorageTransaction::writeFile
{
/// Otherwise we will produce lost blobs which nobody points to
/// WriteOnce storages are not affected by the issue
if (!object_storage_tx->object_storage.isWriteOnce() && object_storage_tx->metadata_storage.exists(path))
if (!object_storage_tx->object_storage.isPlain() && object_storage_tx->metadata_storage.exists(path))
{
object_storage_tx->object_storage.removeObjectsIfExist(
object_storage_tx->metadata_storage.getStorageObjects(path));
object_storage_tx->object_storage.removeObjectsIfExist(object_storage_tx->metadata_storage.getStorageObjects(path));
}
tx->createMetadataFile(path, key_, count);
@ -877,14 +876,14 @@ void DiskObjectStorageTransaction::createFile(const std::string & path)
void DiskObjectStorageTransaction::copyFile(const std::string & from_file_path, const std::string & to_file_path, const ReadSettings & read_settings, const WriteSettings & write_settings)
{
operations_to_execute.emplace_back(
std::make_unique<CopyFileObjectStorageOperation>(object_storage, metadata_storage, object_storage, read_settings, write_settings, from_file_path, to_file_path));
operations_to_execute.emplace_back(std::make_unique<CopyFileObjectStorageOperation>(
object_storage, metadata_storage, object_storage, read_settings, write_settings, from_file_path, to_file_path));
}
void MultipleDisksObjectStorageTransaction::copyFile(const std::string & from_file_path, const std::string & to_file_path, const ReadSettings & read_settings, const WriteSettings & write_settings)
{
operations_to_execute.emplace_back(
std::make_unique<CopyFileObjectStorageOperation>(object_storage, metadata_storage, destination_object_storage, read_settings, write_settings, from_file_path, to_file_path));
operations_to_execute.emplace_back(std::make_unique<CopyFileObjectStorageOperation>(
object_storage, metadata_storage, destination_object_storage, read_settings, write_settings, from_file_path, to_file_path));
}
void DiskObjectStorageTransaction::commit()

View File

@ -0,0 +1,19 @@
#pragma once
#include <mutex>
#include <Common/SharedMutex.h>
namespace DB
{
struct IMetadataOperation
{
virtual void execute(std::unique_lock<SharedMutex> & metadata_lock) = 0;
virtual void undo(std::unique_lock<SharedMutex> & metadata_lock) = 0;
virtual void finalize() { }
virtual ~IMetadataOperation() = default;
};
using MetadataOperationPtr = std::unique_ptr<IMetadataOperation>;
}
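A minimal sketch of an operation against this interface (standalone: std::shared_mutex stands in for DB::SharedMutex, and the operation itself is hypothetical):

#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>

// Restatement of the interface above so the sketch compiles on its own.
struct IMetadataOperation
{
    virtual void execute(std::unique_lock<std::shared_mutex> & metadata_lock) = 0;
    virtual void undo(std::unique_lock<std::shared_mutex> & metadata_lock) = 0;
    virtual void finalize() { }
    virtual ~IMetadataOperation() = default;
};

/// Hypothetical operation: execute() remembers the old value so undo() can
/// restore it, matching the rollback contract used by the holders below.
struct SetValueOperation final : IMetadataOperation
{
    SetValueOperation(std::string & slot_, std::string new_value_)
        : slot(slot_), new_value(std::move(new_value_)) {}

    void execute(std::unique_lock<std::shared_mutex> &) override
    {
        old_value = slot;
        slot = new_value;
    }

    void undo(std::unique_lock<std::shared_mutex> &) override { slot = old_value; }

private:
    std::string & slot;
    std::string new_value;
    std::string old_value;
};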

View File

@ -145,7 +145,7 @@ public:
virtual ~IMetadataTransaction() = default;
private:
protected:
[[noreturn]] static void throwNotImplemented()
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Operation is not implemented");
@ -229,7 +229,7 @@ public:
/// object_storage_path is absolute.
virtual StoredObjects getStorageObjects(const std::string & path) const = 0;
private:
protected:
[[noreturn]] static void throwNotImplemented()
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Operation is not implemented");

View File

@ -1,11 +1,12 @@
#include <Disks/ObjectStorages/IObjectStorage.h>
#include <Disks/IO/ThreadPoolRemoteFSReader.h>
#include <Common/Exception.h>
#include <Disks/ObjectStorages/IObjectStorage.h>
#include <Disks/ObjectStorages/ObjectStorageIterator.h>
#include <IO/ReadBufferFromFileBase.h>
#include <IO/WriteBufferFromFileBase.h>
#include <IO/copyData.h>
#include <IO/ReadBufferFromFileBase.h>
#include <Interpreters/Context.h>
#include <Disks/ObjectStorages/ObjectStorageIterator.h>
#include <Common/Exception.h>
#include <Common/ObjectStorageKeyGenerator.h>
namespace DB

View File

@ -83,6 +83,9 @@ using ObjectKeysWithMetadata = std::vector<ObjectKeyWithMetadata>;
class IObjectStorageIterator;
using ObjectStorageIteratorPtr = std::shared_ptr<IObjectStorageIterator>;
class IObjectStorageKeysGenerator;
using ObjectStorageKeysGeneratorPtr = std::shared_ptr<IObjectStorageKeysGenerator>;
/// Base class for all object storages which implement some subset of ordinary filesystem operations.
///
/// Examples of object storages are S3, Azure Blob Storage, HDFS.
@ -208,6 +211,12 @@ public:
/// Path can be generated either independently or based on `path`.
virtual ObjectStorageKey generateObjectKeyForPath(const std::string & path) const = 0;
/// Object key prefix for local paths in the directory 'path'.
virtual ObjectStorageKey generateObjectKeyPrefixForDirectoryPath(const std::string & /* path */) const
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Method 'generateObjectKeyPrefixForDirectoryPath' is not implemented");
}
/// Get unique id for passed absolute path in object storage.
virtual std::string getUniqueId(const std::string & path) const { return path; }
@ -226,6 +235,8 @@ public:
virtual WriteSettings patchSettings(const WriteSettings & write_settings) const;
virtual void setKeysGenerator(ObjectStorageKeysGeneratorPtr) { }
#if USE_AZURE_BLOB_STORAGE
virtual std::shared_ptr<const Azure::Storage::Blobs::BlobContainerClient> getAzureBlobStorageClient()
{

View File

@ -1,23 +0,0 @@
#include <base/defines.h>
#include <Disks/ObjectStorages/MetadataFromDiskTransactionState.h>
namespace DB
{
std::string toString(MetadataFromDiskTransactionState state)
{
switch (state)
{
case MetadataFromDiskTransactionState::PREPARING:
return "PREPARING";
case MetadataFromDiskTransactionState::FAILED:
return "FAILED";
case MetadataFromDiskTransactionState::COMMITTED:
return "COMMITTED";
case MetadataFromDiskTransactionState::PARTIALLY_ROLLED_BACK:
return "PARTIALLY_ROLLED_BACK";
}
UNREACHABLE();
}
}

View File

@ -0,0 +1,93 @@
#include "MetadataOperationsHolder.h"
#include <Common/Exception.h>
namespace DB
{
namespace ErrorCodes
{
extern const int FS_METADATA_ERROR;
}
void MetadataOperationsHolder::rollback(std::unique_lock<SharedMutex> & lock, size_t until_pos)
{
/// Otherwise everything is alright
if (state == MetadataStorageTransactionState::FAILED)
{
for (int64_t i = until_pos; i >= 0; --i)
{
try
{
operations[i]->undo(lock);
}
catch (Exception & ex)
{
state = MetadataStorageTransactionState::PARTIALLY_ROLLED_BACK;
ex.addMessage(fmt::format("While rolling back operation #{}", i));
throw;
}
}
}
else
{
/// Nothing to do, transaction committed or not even started to commit
}
}
void MetadataOperationsHolder::addOperation(MetadataOperationPtr && operation)
{
if (state != MetadataStorageTransactionState::PREPARING)
throw Exception(
ErrorCodes::FS_METADATA_ERROR,
"Cannot add operations to transaction in {} state, it should be in {} state",
toString(state),
toString(MetadataStorageTransactionState::PREPARING));
operations.emplace_back(std::move(operation));
}
void MetadataOperationsHolder::commitImpl(SharedMutex & metadata_mutex)
{
if (state != MetadataStorageTransactionState::PREPARING)
throw Exception(
ErrorCodes::FS_METADATA_ERROR,
"Cannot commit transaction in {} state, it should be in {} state",
toString(state),
toString(MetadataStorageTransactionState::PREPARING));
{
std::unique_lock lock(metadata_mutex);
for (size_t i = 0; i < operations.size(); ++i)
{
try
{
operations[i]->execute(lock);
}
catch (Exception & ex)
{
tryLogCurrentException(__PRETTY_FUNCTION__);
ex.addMessage(fmt::format("While committing metadata operation #{}", i));
state = MetadataStorageTransactionState::FAILED;
rollback(lock, i);
throw;
}
}
}
/// Do it in "best effort" mode
for (size_t i = 0; i < operations.size(); ++i)
{
try
{
operations[i]->finalize();
}
catch (...)
{
tryLogCurrentException(__PRETTY_FUNCTION__, fmt::format("Failed to finalize operation #{}", i));
}
}
state = MetadataStorageTransactionState::COMMITTED;
}
}
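The execute-then-rollback protocol above can be exercised in isolation; the sketch below is a stripped-down stand-in for IMetadataOperation and commitImpl (no real metadata, hypothetical operations):

#include <cstdint>
#include <cstdio>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <stdexcept>
#include <vector>

struct Op
{
    virtual void execute() = 0;
    virtual void undo() = 0;
    virtual ~Op() = default;
};

/// Simplified commitImpl: execute in order under an exclusive lock; on
/// failure, undo in reverse everything up to and including the failing op.
void commit(std::vector<std::unique_ptr<Op>> & ops, std::shared_mutex & mutex)
{
    std::unique_lock lock(mutex);
    for (size_t i = 0; i < ops.size(); ++i)
    {
        try
        {
            ops[i]->execute();
        }
        catch (...)
        {
            for (int64_t j = static_cast<int64_t>(i); j >= 0; --j)
                ops[j]->undo();
            throw;
        }
    }
}

struct Append final : Op
{
    std::vector<int> & log;
    int value;
    Append(std::vector<int> & log_, int value_) : log(log_), value(value_) {}
    void execute() override { log.push_back(value); }
    void undo() override { log.pop_back(); }
};

struct Fail final : Op
{
    void execute() override { throw std::runtime_error("boom"); }
    void undo() override {}
};

int main()
{
    std::shared_mutex mutex;
    std::vector<int> log;
    std::vector<std::unique_ptr<Op>> ops;
    ops.push_back(std::make_unique<Append>(log, 1));
    ops.push_back(std::make_unique<Append>(log, 2));
    ops.push_back(std::make_unique<Fail>());
    try { commit(ops, mutex); } catch (const std::exception &) { /* rolled back */ }
    std::printf("%zu\n", log.size());  // prints 0: both appends were undone
}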

View File

@ -0,0 +1,29 @@
#pragma once
#include <mutex>
#include <Disks/ObjectStorages/IMetadataOperation.h>
#include <Disks/ObjectStorages/MetadataStorageTransactionState.h>
#include <Common/SharedMutex.h>
/**
* Implementations for transactional operations with metadata used by MetadataStorageFromDisk
* and MetadataStorageFromPlainObjectStorage.
*/
namespace DB
{
class MetadataOperationsHolder
{
private:
std::vector<MetadataOperationPtr> operations;
MetadataStorageTransactionState state{MetadataStorageTransactionState::PREPARING};
void rollback(std::unique_lock<SharedMutex> & lock, size_t until_pos);
protected:
void addOperation(MetadataOperationPtr && operation);
void commitImpl(SharedMutex & metadata_mutex);
};
}

View File

@ -1,6 +1,7 @@
#include <Disks/ObjectStorages/MetadataStorageFactory.h>
#include <Disks/ObjectStorages/MetadataStorageFromDisk.h>
#include <Disks/ObjectStorages/MetadataStorageFromPlainObjectStorage.h>
#include <Disks/ObjectStorages/MetadataStorageFromPlainRewritableObjectStorage.h>
#ifndef CLICKHOUSE_KEEPER_STANDALONE_BUILD
#include <Disks/ObjectStorages/Web/MetadataStorageFromStaticFilesWebServer.h>
#endif
@ -118,6 +119,20 @@ void registerPlainMetadataStorage(MetadataStorageFactory & factory)
});
}
void registerPlainRewritableMetadataStorage(MetadataStorageFactory & factory)
{
factory.registerMetadataStorageType(
"plain_rewritable",
[](const std::string & /* name */,
const Poco::Util::AbstractConfiguration & config,
const std::string & config_prefix,
ObjectStoragePtr object_storage) -> MetadataStoragePtr
{
auto key_compatibility_prefix = getObjectKeyCompatiblePrefix(*object_storage, config, config_prefix);
return std::make_shared<MetadataStorageFromPlainRewritableObjectStorage>(object_storage, key_compatibility_prefix);
});
}
#ifndef CLICKHOUSE_KEEPER_STANDALONE_BUILD
void registerMetadataStorageFromStaticFilesWebServer(MetadataStorageFactory & factory)
{
@ -137,6 +152,7 @@ void registerMetadataStorages()
auto & factory = MetadataStorageFactory::instance();
registerMetadataStorageFromDisk(factory);
registerPlainMetadataStorage(factory);
registerPlainRewritableMetadataStorage(factory);
#ifndef CLICKHOUSE_KEEPER_STANDALONE_BUILD
registerMetadataStorageFromStaticFilesWebServer(factory);
#endif

View File

@ -10,14 +10,8 @@
namespace DB
{
namespace ErrorCodes
{
extern const int FS_METADATA_ERROR;
}
MetadataStorageFromDisk::MetadataStorageFromDisk(DiskPtr disk_, String compatible_key_prefix_)
: disk(disk_)
, compatible_key_prefix(compatible_key_prefix_)
: disk(disk_), compatible_key_prefix(compatible_key_prefix_)
{
}
@ -158,83 +152,9 @@ const IMetadataStorage & MetadataStorageFromDiskTransaction::getStorageForNonTra
return metadata_storage;
}
void MetadataStorageFromDiskTransaction::addOperation(MetadataOperationPtr && operation)
{
if (state != MetadataFromDiskTransactionState::PREPARING)
throw Exception(
ErrorCodes::FS_METADATA_ERROR,
"Cannot add operations to transaction in {} state, it should be in {} state",
toString(state), toString(MetadataFromDiskTransactionState::PREPARING));
operations.emplace_back(std::move(operation));
}
void MetadataStorageFromDiskTransaction::commit()
{
if (state != MetadataFromDiskTransactionState::PREPARING)
throw Exception(
ErrorCodes::FS_METADATA_ERROR,
"Cannot commit transaction in {} state, it should be in {} state",
toString(state), toString(MetadataFromDiskTransactionState::PREPARING));
{
std::unique_lock lock(metadata_storage.metadata_mutex);
for (size_t i = 0; i < operations.size(); ++i)
{
try
{
operations[i]->execute(lock);
}
catch (Exception & ex)
{
tryLogCurrentException(__PRETTY_FUNCTION__);
ex.addMessage(fmt::format("While committing metadata operation #{}", i));
state = MetadataFromDiskTransactionState::FAILED;
rollback(i);
throw;
}
}
}
/// Do it in "best effort" mode
for (size_t i = 0; i < operations.size(); ++i)
{
try
{
operations[i]->finalize();
}
catch (...)
{
tryLogCurrentException(__PRETTY_FUNCTION__, fmt::format("Failed to finalize operation #{}", i));
}
}
state = MetadataFromDiskTransactionState::COMMITTED;
}
void MetadataStorageFromDiskTransaction::rollback(size_t until_pos)
{
/// Otherwise everything is alright
if (state == MetadataFromDiskTransactionState::FAILED)
{
for (int64_t i = until_pos; i >= 0; --i)
{
try
{
operations[i]->undo();
}
catch (Exception & ex)
{
state = MetadataFromDiskTransactionState::PARTIALLY_ROLLED_BACK;
ex.addMessage(fmt::format("While rolling back operation #{}", i));
throw;
}
}
}
else
{
/// Nothing to do, transaction committed or not even started to commit
}
MetadataOperationsHolder::commitImpl(metadata_storage.metadata_mutex);
}
void MetadataStorageFromDiskTransaction::writeStringToFile(

View File

@ -5,8 +5,9 @@
#include <Disks/IDisk.h>
#include <Disks/ObjectStorages/DiskObjectStorageMetadata.h>
#include <Disks/ObjectStorages/MetadataFromDiskTransactionState.h>
#include <Disks/ObjectStorages/MetadataOperationsHolder.h>
#include <Disks/ObjectStorages/MetadataStorageFromDiskTransactionOperations.h>
#include <Disks/ObjectStorages/MetadataStorageTransactionState.h>
namespace DB
{
@ -74,18 +75,11 @@ public:
DiskObjectStorageMetadataPtr readMetadataUnlocked(const std::string & path, std::shared_lock<SharedMutex> & lock) const;
};
class MetadataStorageFromDiskTransaction final : public IMetadataTransaction
class MetadataStorageFromDiskTransaction final : public IMetadataTransaction, private MetadataOperationsHolder
{
private:
const MetadataStorageFromDisk & metadata_storage;
std::vector<MetadataOperationPtr> operations;
MetadataFromDiskTransactionState state{MetadataFromDiskTransactionState::PREPARING};
void addOperation(MetadataOperationPtr && operation);
void rollback(size_t until_pos);
public:
explicit MetadataStorageFromDiskTransaction(const MetadataStorageFromDisk & metadata_storage_)
: metadata_storage(metadata_storage_)
@ -135,7 +129,6 @@ public:
UnlinkMetadataFileOperationOutcomePtr unlinkMetadata(const std::string & path) override;
};

View File

@ -32,7 +32,7 @@ void SetLastModifiedOperation::execute(std::unique_lock<SharedMutex> &)
disk.setLastModified(path, new_timestamp);
}
void SetLastModifiedOperation::undo()
void SetLastModifiedOperation::undo(std::unique_lock<SharedMutex> &)
{
disk.setLastModified(path, old_timestamp);
}
@ -50,7 +50,7 @@ void ChmodOperation::execute(std::unique_lock<SharedMutex> &)
disk.chmod(path, mode);
}
void ChmodOperation::undo()
void ChmodOperation::undo(std::unique_lock<SharedMutex> &)
{
disk.chmod(path, old_mode);
}
@ -68,7 +68,7 @@ void UnlinkFileOperation::execute(std::unique_lock<SharedMutex> &)
disk.removeFile(path);
}
void UnlinkFileOperation::undo()
void UnlinkFileOperation::undo(std::unique_lock<SharedMutex> &)
{
auto buf = disk.writeFile(path);
writeString(prev_data, *buf);
@ -86,7 +86,7 @@ void CreateDirectoryOperation::execute(std::unique_lock<SharedMutex> &)
disk.createDirectory(path);
}
void CreateDirectoryOperation::undo()
void CreateDirectoryOperation::undo(std::unique_lock<SharedMutex> &)
{
disk.removeDirectory(path);
}
@ -112,7 +112,7 @@ void CreateDirectoryRecursiveOperation::execute(std::unique_lock<SharedMutex> &)
disk.createDirectory(path_to_create);
}
void CreateDirectoryRecursiveOperation::undo()
void CreateDirectoryRecursiveOperation::undo(std::unique_lock<SharedMutex> &)
{
for (const auto & path_created : paths_created)
disk.removeDirectory(path_created);
@ -129,7 +129,7 @@ void RemoveDirectoryOperation::execute(std::unique_lock<SharedMutex> &)
disk.removeDirectory(path);
}
void RemoveDirectoryOperation::undo()
void RemoveDirectoryOperation::undo(std::unique_lock<SharedMutex> &)
{
disk.createDirectory(path);
}
@ -149,7 +149,7 @@ void RemoveRecursiveOperation::execute(std::unique_lock<SharedMutex> &)
disk.moveDirectory(path, temp_path);
}
void RemoveRecursiveOperation::undo()
void RemoveRecursiveOperation::undo(std::unique_lock<SharedMutex> &)
{
if (disk.isFile(temp_path))
disk.moveFile(temp_path, path);
@ -187,10 +187,10 @@ void CreateHardlinkOperation::execute(std::unique_lock<SharedMutex> & lock)
disk.createHardLink(path_from, path_to);
}
void CreateHardlinkOperation::undo()
void CreateHardlinkOperation::undo(std::unique_lock<SharedMutex> & lock)
{
if (write_operation)
write_operation->undo();
write_operation->undo(lock);
disk.removeFile(path_to);
}
@ -206,7 +206,7 @@ void MoveFileOperation::execute(std::unique_lock<SharedMutex> &)
disk.moveFile(path_from, path_to);
}
void MoveFileOperation::undo()
void MoveFileOperation::undo(std::unique_lock<SharedMutex> &)
{
disk.moveFile(path_to, path_from);
}
@ -223,7 +223,7 @@ void MoveDirectoryOperation::execute(std::unique_lock<SharedMutex> &)
disk.moveDirectory(path_from, path_to);
}
void MoveDirectoryOperation::undo()
void MoveDirectoryOperation::undo(std::unique_lock<SharedMutex> &)
{
disk.moveDirectory(path_to, path_from);
}
@ -244,7 +244,7 @@ void ReplaceFileOperation::execute(std::unique_lock<SharedMutex> &)
disk.replaceFile(path_from, path_to);
}
void ReplaceFileOperation::undo()
void ReplaceFileOperation::undo(std::unique_lock<SharedMutex> &)
{
disk.moveFile(path_to, path_from);
disk.moveFile(temp_path_to, path_to);
@ -275,7 +275,7 @@ void WriteFileOperation::execute(std::unique_lock<SharedMutex> &)
buf->finalize();
}
void WriteFileOperation::undo()
void WriteFileOperation::undo(std::unique_lock<SharedMutex> &)
{
if (!existed)
{
@ -303,10 +303,10 @@ void AddBlobOperation::execute(std::unique_lock<SharedMutex> & metadata_lock)
write_operation->execute(metadata_lock);
}
void AddBlobOperation::undo()
void AddBlobOperation::undo(std::unique_lock<SharedMutex> & lock)
{
if (write_operation)
write_operation->undo();
write_operation->undo(lock);
}
void UnlinkMetadataFileOperation::execute(std::unique_lock<SharedMutex> & metadata_lock)
@ -325,17 +325,17 @@ void UnlinkMetadataFileOperation::execute(std::unique_lock<SharedMutex> & metada
unlink_operation->execute(metadata_lock);
}
void UnlinkMetadataFileOperation::undo()
void UnlinkMetadataFileOperation::undo(std::unique_lock<SharedMutex> & lock)
{
/// Operations MUST be reverted in the reversed order, so
/// when we apply operation #1 (write) and operation #2 (unlink)
/// we should revert #2 and only after it #1. Otherwise #1 will overwrite
/// file with incorrect data.
if (unlink_operation)
unlink_operation->undo();
unlink_operation->undo(lock);
if (write_operation)
write_operation->undo();
write_operation->undo(lock);
/// Update outcome to reflect the fact that we have restored the file.
outcome->num_hardlinks++;
@ -349,10 +349,10 @@ void SetReadonlyFileOperation::execute(std::unique_lock<SharedMutex> & metadata_
write_operation->execute(metadata_lock);
}
void SetReadonlyFileOperation::undo()
void SetReadonlyFileOperation::undo(std::unique_lock<SharedMutex> & lock)
{
if (write_operation)
write_operation->undo();
write_operation->undo(lock);
}
}

View File

@ -1,6 +1,6 @@
#pragma once
#include <Common/SharedMutex.h>
#include <Disks/ObjectStorages/IMetadataOperation.h>
#include <Disks/ObjectStorages/IMetadataStorage.h>
#include <numeric>
@ -14,24 +14,13 @@ class IDisk;
* Implementations for transactional operations with metadata used by MetadataStorageFromDisk.
*/
struct IMetadataOperation
{
virtual void execute(std::unique_lock<SharedMutex> & metadata_lock) = 0;
virtual void undo() = 0;
virtual void finalize() {}
virtual ~IMetadataOperation() = default;
};
using MetadataOperationPtr = std::unique_ptr<IMetadataOperation>;
struct SetLastModifiedOperation final : public IMetadataOperation
{
SetLastModifiedOperation(const std::string & path_, Poco::Timestamp new_timestamp_, IDisk & disk_);
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
private:
std::string path;
@ -46,7 +35,7 @@ struct ChmodOperation final : public IMetadataOperation
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
private:
std::string path;
@ -62,7 +51,7 @@ struct UnlinkFileOperation final : public IMetadataOperation
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
private:
std::string path;
@ -77,7 +66,7 @@ struct CreateDirectoryOperation final : public IMetadataOperation
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
private:
std::string path;
@ -91,7 +80,7 @@ struct CreateDirectoryRecursiveOperation final : public IMetadataOperation
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
private:
std::string path;
@ -106,7 +95,7 @@ struct RemoveDirectoryOperation final : public IMetadataOperation
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
private:
std::string path;
@ -119,7 +108,7 @@ struct RemoveRecursiveOperation final : public IMetadataOperation
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
void finalize() override;
@ -135,7 +124,8 @@ struct WriteFileOperation final : public IMetadataOperation
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
private:
std::string path;
IDisk & disk;
@ -154,7 +144,7 @@ struct CreateHardlinkOperation final : public IMetadataOperation
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
private:
std::string path_from;
@ -171,7 +161,7 @@ struct MoveFileOperation final : public IMetadataOperation
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
private:
std::string path_from;
@ -186,7 +176,7 @@ struct MoveDirectoryOperation final : public IMetadataOperation
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
private:
std::string path_from;
@ -201,7 +191,7 @@ struct ReplaceFileOperation final : public IMetadataOperation
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
void finalize() override;
@ -229,7 +219,7 @@ struct AddBlobOperation final : public IMetadataOperation
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
private:
std::string path;
@ -257,7 +247,7 @@ struct UnlinkMetadataFileOperation final : public IMetadataOperation
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
private:
std::string path;
@ -282,7 +272,7 @@ struct SetReadonlyFileOperation final : public IMetadataOperation
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo() override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
private:
std::string path;

View File

@ -1,18 +1,27 @@
#include "MetadataStorageFromPlainObjectStorage.h"
#include <Disks/IDisk.h>
#include <Disks/ObjectStorages/MetadataStorageFromPlainObjectStorageOperations.h>
#include <Disks/ObjectStorages/StaticDirectoryIterator.h>
#include <Common/filesystemHelpers.h>
#include <Common/logger_useful.h>
#include <Common/StringUtils/StringUtils.h>
#include <IO/WriteHelpers.h>
#include <Common/filesystemHelpers.h>
#include <filesystem>
#include <tuple>
namespace DB
{
MetadataStorageFromPlainObjectStorage::MetadataStorageFromPlainObjectStorage(
ObjectStoragePtr object_storage_,
String storage_path_prefix_)
namespace
{
std::filesystem::path normalizeDirectoryPath(const std::filesystem::path & path)
{
return path / "";
}
}
MetadataStorageFromPlainObjectStorage::MetadataStorageFromPlainObjectStorage(ObjectStoragePtr object_storage_, String storage_path_prefix_)
: object_storage(object_storage_)
, storage_path_prefix(std::move(storage_path_prefix_))
{
@ -20,7 +29,7 @@ MetadataStorageFromPlainObjectStorage::MetadataStorageFromPlainObjectStorage(
MetadataTransactionPtr MetadataStorageFromPlainObjectStorage::createTransaction()
{
return std::make_shared<MetadataStorageFromPlainObjectStorageTransaction>(*this);
return std::make_shared<MetadataStorageFromPlainObjectStorageTransaction>(*this, object_storage);
}
const std::string & MetadataStorageFromPlainObjectStorage::getPath() const
@ -44,10 +53,9 @@ bool MetadataStorageFromPlainObjectStorage::isFile(const std::string & path) con
bool MetadataStorageFromPlainObjectStorage::isDirectory(const std::string & path) const
{
auto object_key = object_storage->generateObjectKeyForPath(path);
std::string directory = object_key.serialize();
if (!directory.ends_with('/'))
directory += '/';
auto key_prefix = object_storage->generateObjectKeyForPath(path).serialize();
auto directory = std::filesystem::path(std::move(key_prefix)) / "";
return object_storage->existsOrHasAnyChild(directory);
}
@ -62,33 +70,16 @@ uint64_t MetadataStorageFromPlainObjectStorage::getFileSize(const String & path)
std::vector<std::string> MetadataStorageFromPlainObjectStorage::listDirectory(const std::string & path) const
{
auto object_key = object_storage->generateObjectKeyForPath(path);
auto key_prefix = object_storage->generateObjectKeyForPath(path).serialize();
RelativePathsWithMetadata files;
std::string abs_key = object_key.serialize();
std::string abs_key = key_prefix;
if (!abs_key.ends_with('/'))
abs_key += '/';
object_storage->listObjects(abs_key, files, 0);
std::vector<std::string> result;
for (const auto & path_size : files)
{
result.push_back(path_size.relative_path);
}
std::unordered_set<std::string> duplicates_filter;
for (auto & row : result)
{
chassert(row.starts_with(abs_key));
row.erase(0, abs_key.size());
auto slash_pos = row.find_first_of('/');
if (slash_pos != std::string::npos)
row.erase(slash_pos, row.size() - slash_pos);
duplicates_filter.insert(row);
}
return std::vector<std::string>(duplicates_filter.begin(), duplicates_filter.end());
return getDirectChildrenOnDisk(abs_key, files, path);
}
DirectoryIteratorPtr MetadataStorageFromPlainObjectStorage::iterateDirectory(const std::string & path) const
@ -108,6 +99,25 @@ StoredObjects MetadataStorageFromPlainObjectStorage::getStorageObjects(const std
return {StoredObject(object_key.serialize(), path, object_size)};
}
std::vector<std::string> MetadataStorageFromPlainObjectStorage::getDirectChildrenOnDisk(
const std::string & storage_key, const RelativePathsWithMetadata & remote_paths, const std::string & /* local_path */) const
{
std::unordered_set<std::string> duplicates_filter;
for (const auto & elem : remote_paths)
{
const auto & path = elem.relative_path;
chassert(path.find(storage_key) == 0);
const auto child_pos = storage_key.size();
/// string::npos is ok.
const auto slash_pos = path.find('/', child_pos);
if (slash_pos == std::string::npos)
duplicates_filter.emplace(path.substr(child_pos));
else
duplicates_filter.emplace(path.substr(child_pos, slash_pos - child_pos));
}
return std::vector<std::string>(std::make_move_iterator(duplicates_filter.begin()), std::make_move_iterator(duplicates_filter.end()));
}
const IMetadataStorage & MetadataStorageFromPlainObjectStorageTransaction::getStorageForNonTransactionalReads() const
{
return metadata_storage;
@ -122,18 +132,44 @@ void MetadataStorageFromPlainObjectStorageTransaction::unlinkFile(const std::str
void MetadataStorageFromPlainObjectStorageTransaction::removeDirectory(const std::string & path)
{
for (auto it = metadata_storage.iterateDirectory(path); it->isValid(); it->next())
metadata_storage.object_storage->removeObject(StoredObject(it->path()));
if (metadata_storage.object_storage->isWriteOnce())
{
for (auto it = metadata_storage.iterateDirectory(path); it->isValid(); it->next())
metadata_storage.object_storage->removeObject(StoredObject(it->path()));
}
else
{
addOperation(std::make_unique<MetadataStorageFromPlainObjectStorageRemoveDirectoryOperation>(
normalizeDirectoryPath(path), *metadata_storage.getPathMap(), object_storage));
}
}
void MetadataStorageFromPlainObjectStorageTransaction::createDirectory(const std::string &)
void MetadataStorageFromPlainObjectStorageTransaction::createDirectory(const std::string & path)
{
/// Noop. It is an Object Storage not a filesystem.
if (metadata_storage.object_storage->isWriteOnce())
return;
auto normalized_path = normalizeDirectoryPath(path);
auto key_prefix = object_storage->generateObjectKeyPrefixForDirectoryPath(normalized_path).serialize();
auto op = std::make_unique<MetadataStorageFromPlainObjectStorageCreateDirectoryOperation>(
std::move(normalized_path), std::move(key_prefix), *metadata_storage.getPathMap(), object_storage);
addOperation(std::move(op));
}
void MetadataStorageFromPlainObjectStorageTransaction::createDirectoryRecursive(const std::string &)
void MetadataStorageFromPlainObjectStorageTransaction::createDirectoryRecursive(const std::string & path)
{
/// Noop. It is an Object Storage not a filesystem.
return createDirectory(path);
}
void MetadataStorageFromPlainObjectStorageTransaction::moveDirectory(const std::string & path_from, const std::string & path_to)
{
if (metadata_storage.object_storage->isWriteOnce())
throwNotImplemented();
addOperation(std::make_unique<MetadataStorageFromPlainObjectStorageMoveDirectoryOperation>(
normalizeDirectoryPath(path_from), normalizeDirectoryPath(path_to), *metadata_storage.getPathMap(), object_storage));
}
void MetadataStorageFromPlainObjectStorageTransaction::addBlobToMetadata(
const std::string &, ObjectStorageKey /* object_key */, uint64_t /* size_in_bytes */)
{
@ -146,4 +182,8 @@ UnlinkMetadataFileOperationOutcomePtr MetadataStorageFromPlainObjectStorageTrans
return std::make_shared<UnlinkMetadataFileOperationOutcome>(UnlinkMetadataFileOperationOutcome{0});
}
void MetadataStorageFromPlainObjectStorageTransaction::commit()
{
MetadataOperationsHolder::commitImpl(metadata_storage.metadata_mutex);
}
}
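A standalone restatement of the child-extraction logic, with a usage example (metadata is dropped and remote paths are plain strings; this mirrors, but is not, the member function above):

#include <iostream>
#include <string>
#include <unordered_set>
#include <vector>

/// Simplified getDirectChildrenOnDisk: keep only the path component
/// immediately under `storage_key`, deduplicated. Assumes every remote
/// path starts with `storage_key` (the chassert above).
std::vector<std::string> directChildren(
    const std::string & storage_key, const std::vector<std::string> & remote_paths)
{
    std::unordered_set<std::string> filter;
    for (const auto & path : remote_paths)
    {
        const auto child_pos = storage_key.size();
        const auto slash_pos = path.find('/', child_pos);
        /// substr clamps the count, so npos (a plain file) also works.
        filter.emplace(path.substr(child_pos, slash_pos - child_pos));
    }
    return {filter.begin(), filter.end()};
}

int main()
{
    // A flat listing under "data/" mixes files and nested objects.
    for (const auto & c : directChildren(
             "data/", {"data/a.bin", "data/dir/b.bin", "data/dir/c.bin"}))
        std::cout << c << '\n';  // prints "a.bin" and "dir" (order unspecified)
}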

View File

@ -2,9 +2,10 @@
#include <Disks/IDisk.h>
#include <Disks/ObjectStorages/IMetadataStorage.h>
#include <Disks/ObjectStorages/MetadataFromDiskTransactionState.h>
#include <Disks/ObjectStorages/MetadataStorageFromDiskTransactionOperations.h>
#include <Disks/ObjectStorages/MetadataOperationsHolder.h>
#include <Disks/ObjectStorages/MetadataStorageTransactionState.h>
#include <map>
namespace DB
{
@ -23,14 +24,21 @@ using UnlinkMetadataFileOperationOutcomePtr = std::shared_ptr<UnlinkMetadataFile
/// It is used to allow BACKUP/RESTORE to ObjectStorage (S3/...) with the same
/// structure as on-disk MergeTree, and does not require metadata from local
/// disk to restore.
class MetadataStorageFromPlainObjectStorage final : public IMetadataStorage
class MetadataStorageFromPlainObjectStorage : public IMetadataStorage
{
public:
/// Local path prefixes mapped to storage key prefixes.
using PathMap = std::map<std::filesystem::path, std::string>;
private:
friend class MetadataStorageFromPlainObjectStorageTransaction;
protected:
ObjectStoragePtr object_storage;
String storage_path_prefix;
mutable SharedMutex metadata_mutex;
public:
MetadataStorageFromPlainObjectStorage(ObjectStoragePtr object_storage_, String storage_path_prefix_);
@ -69,23 +77,37 @@ public:
bool supportsChmod() const override { return false; }
bool supportsStat() const override { return false; }
protected:
virtual std::shared_ptr<PathMap> getPathMap() const { throwNotImplemented(); }
virtual std::vector<std::string> getDirectChildrenOnDisk(
const std::string & storage_key, const RelativePathsWithMetadata & remote_paths, const std::string & local_path) const;
};
class MetadataStorageFromPlainObjectStorageTransaction final : public IMetadataTransaction
class MetadataStorageFromPlainObjectStorageTransaction final : public IMetadataTransaction, private MetadataOperationsHolder
{
private:
const MetadataStorageFromPlainObjectStorage & metadata_storage;
MetadataStorageFromPlainObjectStorage & metadata_storage;
ObjectStoragePtr object_storage;
std::vector<MetadataOperationPtr> operations;
public:
explicit MetadataStorageFromPlainObjectStorageTransaction(const MetadataStorageFromPlainObjectStorage & metadata_storage_)
: metadata_storage(metadata_storage_)
explicit MetadataStorageFromPlainObjectStorageTransaction(
MetadataStorageFromPlainObjectStorage & metadata_storage_, ObjectStoragePtr object_storage_)
: metadata_storage(metadata_storage_), object_storage(object_storage_)
{}
const IMetadataStorage & getStorageForNonTransactionalReads() const override;
void addBlobToMetadata(const std::string & path, ObjectStorageKey object_key, uint64_t size_in_bytes) override;
void setLastModified(const String &, const Poco::Timestamp &) override
{
/// Noop
}
void createEmptyMetadataFile(const std::string & /* path */) override
{
/// No metadata, no need to create anything.
@ -100,17 +122,15 @@ public:
void createDirectoryRecursive(const std::string & path) override;
void moveDirectory(const std::string & path_from, const std::string & path_to) override;
void unlinkFile(const std::string & path) override;
void removeDirectory(const std::string & path) override;
UnlinkMetadataFileOperationOutcomePtr unlinkMetadata(const std::string & path) override;
void commit() override
{
/// TODO: rewrite with transactions
}
void commit() override;
bool supportsChmod() const override { return false; }
};
}

View File

@ -0,0 +1,190 @@
#include "MetadataStorageFromPlainObjectStorageOperations.h"
#include <IO/ReadHelpers.h>
#include <IO/WriteHelpers.h>
#include <Common/Exception.h>
#include <Common/logger_useful.h>
namespace DB
{
namespace ErrorCodes
{
extern const int FILE_DOESNT_EXIST;
extern const int FILE_ALREADY_EXISTS;
extern const int INCORRECT_DATA;
};
namespace
{
constexpr auto PREFIX_PATH_FILE_NAME = "prefix.path";
}
MetadataStorageFromPlainObjectStorageCreateDirectoryOperation::MetadataStorageFromPlainObjectStorageCreateDirectoryOperation(
std::filesystem::path && path_,
std::string && key_prefix_,
MetadataStorageFromPlainObjectStorage::PathMap & path_map_,
ObjectStoragePtr object_storage_)
: path(std::move(path_)), key_prefix(key_prefix_), path_map(path_map_), object_storage(object_storage_)
{
}
void MetadataStorageFromPlainObjectStorageCreateDirectoryOperation::execute(std::unique_lock<SharedMutex> &)
{
if (path_map.contains(path))
return;
LOG_TRACE(getLogger("MetadataStorageFromPlainObjectStorageCreateDirectoryOperation"), "Creating metadata for directory '{}'", path);
auto object_key = ObjectStorageKey::createAsRelative(key_prefix, PREFIX_PATH_FILE_NAME);
auto object = StoredObject(object_key.serialize(), path / PREFIX_PATH_FILE_NAME);
auto buf = object_storage->writeObject(
object,
WriteMode::Rewrite,
/* object_attributes */ std::nullopt,
/* buf_size */ DBMS_DEFAULT_BUFFER_SIZE,
/* settings */ {});
write_created = true;
[[maybe_unused]] auto result = path_map.emplace(path, key_prefix); /// keep key_prefix intact: undo() rebuilds the object key from it
chassert(result.second);
writeString(path.string(), *buf);
buf->finalize();
write_finalized = true;
}
void MetadataStorageFromPlainObjectStorageCreateDirectoryOperation::undo(std::unique_lock<SharedMutex> &)
{
auto object_key = ObjectStorageKey::createAsRelative(key_prefix, PREFIX_PATH_FILE_NAME);
if (write_finalized)
{
path_map.erase(path);
object_storage->removeObject(StoredObject(object_key.serialize(), path / PREFIX_PATH_FILE_NAME));
}
else if (write_created)
object_storage->removeObjectIfExists(StoredObject(object_key.serialize(), path / PREFIX_PATH_FILE_NAME));
}
MetadataStorageFromPlainObjectStorageMoveDirectoryOperation::MetadataStorageFromPlainObjectStorageMoveDirectoryOperation(
std::filesystem::path && path_from_,
std::filesystem::path && path_to_,
MetadataStorageFromPlainObjectStorage::PathMap & path_map_,
ObjectStoragePtr object_storage_)
: path_from(std::move(path_from_)), path_to(std::move(path_to_)), path_map(path_map_), object_storage(object_storage_)
{
}
std::unique_ptr<WriteBufferFromFileBase> MetadataStorageFromPlainObjectStorageMoveDirectoryOperation::createWriteBuf(
const std::filesystem::path & expected_path, const std::filesystem::path & new_path, bool validate_content)
{
auto expected_it = path_map.find(expected_path);
if (expected_it == path_map.end())
throw Exception(ErrorCodes::FILE_DOESNT_EXIST, "Metadata object for the expected (source) path '{}' does not exist", expected_path);
if (path_map.contains(new_path))
throw Exception(ErrorCodes::FILE_ALREADY_EXISTS, "Metadata object for the new (destination) path '{}' already exists", new_path);
auto object_key = ObjectStorageKey::createAsRelative(expected_it->second, PREFIX_PATH_FILE_NAME);
auto object = StoredObject(object_key.serialize(), expected_path / PREFIX_PATH_FILE_NAME);
if (validate_content)
{
std::string data;
auto read_buf = object_storage->readObject(object);
readStringUntilEOF(data, *read_buf);
if (data != path_from)
throw Exception(
ErrorCodes::INCORRECT_DATA,
"Incorrect data for object key {}, expected {}, got {}",
object_key.serialize(),
expected_path,
data);
}
auto write_buf = object_storage->writeObject(
object,
WriteMode::Rewrite,
/* object_attributes */ std::nullopt,
/*buf_size*/ DBMS_DEFAULT_BUFFER_SIZE,
/*settings*/ {});
return write_buf;
}
void MetadataStorageFromPlainObjectStorageMoveDirectoryOperation::execute(std::unique_lock<SharedMutex> & /* metadata_lock */)
{
LOG_TRACE(
getLogger("MetadataStorageFromPlainObjectStorageMoveDirectoryOperation"), "Moving directory '{}' to '{}'", path_from, path_to);
auto write_buf = createWriteBuf(path_from, path_to, /* validate_content */ true);
write_created = true;
writeString(path_to.string(), *write_buf);
write_buf->finalize();
[[maybe_unused]] auto result = path_map.emplace(path_to, path_map.extract(path_from).mapped());
chassert(result.second);
write_finalized = true;
}
void MetadataStorageFromPlainObjectStorageMoveDirectoryOperation::undo(std::unique_lock<SharedMutex> &)
{
if (write_finalized)
path_map.emplace(path_from, path_map.extract(path_to).mapped());
if (write_created)
{
auto write_buf = createWriteBuf(path_to, path_from, /* validate_content */ false);
writeString(path_from.string(), *write_buf);
write_buf->finalize();
}
}
MetadataStorageFromPlainObjectStorageRemoveDirectoryOperation::MetadataStorageFromPlainObjectStorageRemoveDirectoryOperation(
std::filesystem::path && path_, MetadataStorageFromPlainObjectStorage::PathMap & path_map_, ObjectStoragePtr object_storage_)
: path(std::move(path_)), path_map(path_map_), object_storage(object_storage_)
{
}
void MetadataStorageFromPlainObjectStorageRemoveDirectoryOperation::execute(std::unique_lock<SharedMutex> & /* metadata_lock */)
{
auto path_it = path_map.find(path);
if (path_it == path_map.end())
return;
LOG_TRACE(getLogger("MetadataStorageFromPlainObjectStorageRemoveDirectoryOperation"), "Removing directory '{}'", path);
key_prefix = path_it->second;
auto object_key = ObjectStorageKey::createAsRelative(key_prefix, PREFIX_PATH_FILE_NAME);
auto object = StoredObject(object_key.serialize(), path / PREFIX_PATH_FILE_NAME);
object_storage->removeObject(object);
path_map.erase(path_it);
removed = true; /// mark success so undo() can restore the object
}
void MetadataStorageFromPlainObjectStorageRemoveDirectoryOperation::undo(std::unique_lock<SharedMutex> &)
{
if (!removed)
return;
auto object_key = ObjectStorageKey::createAsRelative(key_prefix, PREFIX_PATH_FILE_NAME);
auto object = StoredObject(object_key.serialize(), path / PREFIX_PATH_FILE_NAME);
auto buf = object_storage->writeObject(
object,
WriteMode::Rewrite,
/* object_attributes */ std::nullopt,
/* buf_size */ DBMS_DEFAULT_BUFFER_SIZE,
/* settings */ {});
writeString(path.string(), *buf);
buf->finalize();
path_map.emplace(path, std::move(key_prefix));
}
}
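For intuition, a hypothetical object layout produced by these operations (the key prefix "wxyz0123" is illustrative):

    createDirectory("a/b/") with generated prefix "wxyz0123" writes one object,
        <root>/wxyz0123/prefix.path      -- content: "a/b/"
    and records path_map["a/b/"] = "wxyz0123".
    moveDirectory("a/b/", "a/c/") rewrites that same prefix.path object to contain
    "a/c/" (after validating the old content) and re-keys the map entry.
    removeDirectory("a/c/") deletes the prefix.path object and erases the entry.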

View File

@ -0,0 +1,80 @@
#pragma once
#include <Disks/ObjectStorages/IMetadataOperation.h>
#include <Disks/ObjectStorages/MetadataStorageFromPlainObjectStorage.h>
#include <filesystem>
#include <map>
namespace DB
{
class MetadataStorageFromPlainObjectStorageCreateDirectoryOperation final : public IMetadataOperation
{
private:
std::filesystem::path path;
std::string key_prefix;
MetadataStorageFromPlainObjectStorage::PathMap & path_map;
ObjectStoragePtr object_storage;
bool write_created = false;
bool write_finalized = false;
public:
// Assuming that paths are normalized.
MetadataStorageFromPlainObjectStorageCreateDirectoryOperation(
std::filesystem::path && path_,
std::string && key_prefix_,
MetadataStorageFromPlainObjectStorage::PathMap & path_map_,
ObjectStoragePtr object_storage_);
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
};
class MetadataStorageFromPlainObjectStorageMoveDirectoryOperation final : public IMetadataOperation
{
private:
std::filesystem::path path_from;
std::filesystem::path path_to;
MetadataStorageFromPlainObjectStorage::PathMap & path_map;
ObjectStoragePtr object_storage;
bool write_created = false;
bool write_finalized = false;
std::unique_ptr<WriteBufferFromFileBase>
createWriteBuf(const std::filesystem::path & expected_path, const std::filesystem::path & new_path, bool validate_content);
public:
MetadataStorageFromPlainObjectStorageMoveDirectoryOperation(
std::filesystem::path && path_from_,
std::filesystem::path && path_to_,
MetadataStorageFromPlainObjectStorage::PathMap & path_map_,
ObjectStoragePtr object_storage_);
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
};
class MetadataStorageFromPlainObjectStorageRemoveDirectoryOperation final : public IMetadataOperation
{
private:
std::filesystem::path path;
MetadataStorageFromPlainObjectStorage::PathMap & path_map;
ObjectStoragePtr object_storage;
std::string key_prefix;
bool removed = false;
public:
MetadataStorageFromPlainObjectStorageRemoveDirectoryOperation(
std::filesystem::path && path_, MetadataStorageFromPlainObjectStorage::PathMap & path_map_, ObjectStoragePtr object_storage_);
void execute(std::unique_lock<SharedMutex> & metadata_lock) override;
void undo(std::unique_lock<SharedMutex> & metadata_lock) override;
};
}

View File

@ -0,0 +1,143 @@
#include <Disks/ObjectStorages/MetadataStorageFromPlainRewritableObjectStorage.h>
#include <IO/ReadHelpers.h>
#include <Common/ErrorCodes.h>
#include <Common/logger_useful.h>
#include "CommonPathPrefixKeyGenerator.h"
namespace DB
{
namespace ErrorCodes
{
extern const int LOGICAL_ERROR;
}
namespace
{
constexpr auto PREFIX_PATH_FILE_NAME = "prefix.path";
MetadataStorageFromPlainObjectStorage::PathMap loadPathPrefixMap(const std::string & root, ObjectStoragePtr object_storage)
{
MetadataStorageFromPlainObjectStorage::PathMap result;
RelativePathsWithMetadata files;
object_storage->listObjects(root, files, 0);
for (const auto & file : files)
{
auto remote_path = std::filesystem::path(file.relative_path);
if (remote_path.filename() != PREFIX_PATH_FILE_NAME)
continue;
StoredObject object{file.relative_path};
auto read_buf = object_storage->readObject(object);
String local_path;
readStringUntilEOF(local_path, *read_buf);
chassert(remote_path.has_parent_path());
auto res = result.emplace(local_path, remote_path.parent_path());
/// This can happen if table replication is enabled: the same local path is then
/// written to `prefix.path` by each replica.
/// TODO: should replicated tables (e.g., RMT) be explicitly disallowed?
if (!res.second)
LOG_WARNING(
getLogger("MetadataStorageFromPlainObjectStorage"),
"The local path '{}' is already mapped to a remote path '{}', ignoring: '{}'",
local_path,
res.first->second,
remote_path.parent_path().string());
}
return result;
}
std::vector<std::string> getDirectChildrenOnRewritableDisk(
const std::string & storage_key,
const RelativePathsWithMetadata & remote_paths,
const std::string & local_path,
const MetadataStorageFromPlainObjectStorage::PathMap & local_path_prefixes,
SharedMutex & shared_mutex)
{
using PathMap = MetadataStorageFromPlainObjectStorage::PathMap;
std::unordered_set<std::string> duplicates_filter;
/// Map remote paths into local subdirectories.
std::unordered_map<PathMap::mapped_type, PathMap::key_type> remote_to_local_subdir;
{
std::shared_lock lock(shared_mutex);
auto end_it = local_path_prefixes.end();
for (auto it = local_path_prefixes.lower_bound(local_path); it != end_it; ++it)
{
const auto & [k, v] = std::make_tuple(it->first.string(), it->second);
if (!k.starts_with(local_path))
break;
auto slash_num = count(k.begin() + local_path.size(), k.end(), '/');
if (slash_num != 1)
continue;
chassert(k.back() == '/');
remote_to_local_subdir.emplace(v, std::string(k.begin() + local_path.size(), k.end() - 1));
}
}
auto skip_list = std::set<std::string>{PREFIX_PATH_FILE_NAME};
for (const auto & elem : remote_paths)
{
const auto & path = elem.relative_path;
chassert(path.find(storage_key) == 0);
const auto child_pos = storage_key.size();
auto slash_pos = path.find('/', child_pos);
if (slash_pos == std::string::npos)
{
/// File names.
auto filename = path.substr(child_pos);
if (!skip_list.contains(filename))
duplicates_filter.emplace(std::move(filename));
}
else
{
/// Subdirectories.
auto it = remote_to_local_subdir.find(path.substr(0, slash_pos));
/// Mapped subdirectories.
if (it != remote_to_local_subdir.end())
duplicates_filter.emplace(it->second);
/// The remote subdirectory name is the same as the local subdirectory.
else
duplicates_filter.emplace(path.substr(child_pos, slash_pos - child_pos));
}
}
return std::vector<std::string>(std::make_move_iterator(duplicates_filter.begin()), std::make_move_iterator(duplicates_filter.end()));
}
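/// Worked example with assumed values: storage_key = "<root>/abc/" and a
/// mapping { "<root>/abc/def" -> "x" } derived from the path map. Then:
///     "<root>/abc/prefix.path"    -> skipped (listed in skip_list)
///     "<root>/abc/file.bin"       -> direct child "file.bin"
///     "<root>/abc/def/data.bin"   -> mapped local subdirectory "x"
///     "<root>/abc/ghi/data.bin"   -> unmapped, remote name "ghi" used as-is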
}
MetadataStorageFromPlainRewritableObjectStorage::MetadataStorageFromPlainRewritableObjectStorage(
ObjectStoragePtr object_storage_, String storage_path_prefix_)
: MetadataStorageFromPlainObjectStorage(object_storage_, storage_path_prefix_)
, path_map(std::make_shared<PathMap>(loadPathPrefixMap(object_storage->getCommonKeyPrefix(), object_storage)))
{
if (object_storage->isWriteOnce())
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"MetadataStorageFromPlainRewritableObjectStorage is not compatible with write-once storage '{}'",
object_storage->getName());
auto keys_gen = std::make_shared<CommonPathPrefixKeyGenerator>(object_storage->getCommonKeyPrefix(), metadata_mutex, path_map);
object_storage->setKeysGenerator(keys_gen);
}
std::vector<std::string> MetadataStorageFromPlainRewritableObjectStorage::getDirectChildrenOnDisk(
const std::string & storage_key, const RelativePathsWithMetadata & remote_paths, const std::string & local_path) const
{
return getDirectChildrenOnRewritableDisk(storage_key, remote_paths, local_path, *getPathMap(), metadata_mutex);
}
}

View File

@ -0,0 +1,26 @@
#pragma once
#include <Disks/ObjectStorages/MetadataStorageFromPlainObjectStorage.h>
#include <memory>
namespace DB
{
class MetadataStorageFromPlainRewritableObjectStorage final : public MetadataStorageFromPlainObjectStorage
{
private:
std::shared_ptr<PathMap> path_map;
public:
MetadataStorageFromPlainRewritableObjectStorage(ObjectStoragePtr object_storage_, String storage_path_prefix_);
MetadataStorageType getType() const override { return MetadataStorageType::PlainRewritable; }
protected:
std::shared_ptr<PathMap> getPathMap() const override { return path_map; }
std::vector<std::string> getDirectChildrenOnDisk(
const std::string & storage_key, const RelativePathsWithMetadata & remote_paths, const std::string & local_path) const override;
};
}

View File

@ -0,0 +1,23 @@
#include <Disks/ObjectStorages/MetadataStorageTransactionState.h>
#include <base/defines.h>
namespace DB
{
std::string toString(MetadataStorageTransactionState state)
{
switch (state)
{
case MetadataStorageTransactionState::PREPARING:
return "PREPARING";
case MetadataStorageTransactionState::FAILED:
return "FAILED";
case MetadataStorageTransactionState::COMMITTED:
return "COMMITTED";
case MetadataStorageTransactionState::PARTIALLY_ROLLED_BACK:
return "PARTIALLY_ROLLED_BACK";
}
UNREACHABLE();
}
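/// Minimal usage sketch (hypothetical call site):
///     if (state != MetadataStorageTransactionState::PREPARING)
///         throw Exception(ErrorCodes::LOGICAL_ERROR,
///             "Cannot commit transaction in the {} state", toString(state));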
}

View File

@ -4,7 +4,7 @@
namespace DB
{
enum class MetadataFromDiskTransactionState
enum class MetadataStorageTransactionState
{
PREPARING,
FAILED,
@ -12,6 +12,5 @@ enum class MetadataFromDiskTransactionState
PARTIALLY_ROLLED_BACK,
};
std::string toString(MetadataFromDiskTransactionState state);
std::string toString(MetadataStorageTransactionState state);
}

View File

@ -1,9 +1,11 @@
#include "config.h"
#include <utility>
#include <Disks/ObjectStorages/ObjectStorageFactory.h>
#include "Disks/DiskType.h"
#include "config.h"
#if USE_AWS_S3
#include <Disks/ObjectStorages/S3/DiskS3Utils.h>
#include <Disks/ObjectStorages/S3/S3ObjectStorage.h>
#include <Disks/ObjectStorages/S3/diskSettings.h>
#include <Disks/ObjectStorages/S3/DiskS3Utils.h>
#endif
#if USE_HDFS && !defined(CLICKHOUSE_KEEPER_STANDALONE_BUILD)
#include <Disks/ObjectStorages/HDFS/HDFSObjectStorage.h>
@ -20,6 +22,7 @@
#endif
#include <Disks/ObjectStorages/MetadataStorageFactory.h>
#include <Disks/ObjectStorages/PlainObjectStorage.h>
#include <Disks/ObjectStorages/PlainRewritableObjectStorage.h>
#include <Interpreters/Context.h>
#include <Common/Macros.h>
@ -35,36 +38,50 @@ namespace ErrorCodes
extern const int UNKNOWN_ELEMENT_IN_CONFIG;
extern const int BAD_ARGUMENTS;
extern const int LOGICAL_ERROR;
extern const int NOT_IMPLEMENTED;
}
namespace
{
bool isPlainStorage(
ObjectStorageType type,
const Poco::Util::AbstractConfiguration & config,
const std::string & config_prefix)
{
auto compatibility_hint = MetadataStorageFactory::getCompatibilityMetadataTypeHint(type);
auto metadata_type = MetadataStorageFactory::getMetadataType(config, config_prefix, compatibility_hint);
return metadataTypeFromString(metadata_type) == MetadataStorageType::Plain;
}
template <typename BaseObjectStorage, class ...Args>
ObjectStoragePtr createObjectStorage(
ObjectStorageType type,
const Poco::Util::AbstractConfiguration & config,
const std::string & config_prefix,
Args && ...args)
bool isCompatibleWithMetadataStorage(
ObjectStorageType storage_type,
const Poco::Util::AbstractConfiguration & config,
const std::string & config_prefix,
MetadataStorageType target_metadata_type)
{
auto compatibility_hint = MetadataStorageFactory::getCompatibilityMetadataTypeHint(storage_type);
auto metadata_type = MetadataStorageFactory::getMetadataType(config, config_prefix, compatibility_hint);
return metadataTypeFromString(metadata_type) == target_metadata_type;
}
bool isPlainStorage(ObjectStorageType type, const Poco::Util::AbstractConfiguration & config, const std::string & config_prefix)
{
return isCompatibleWithMetadataStorage(type, config, config_prefix, MetadataStorageType::Plain);
}
bool isPlainRewritableStorage(ObjectStorageType type, const Poco::Util::AbstractConfiguration & config, const std::string & config_prefix)
{
return isCompatibleWithMetadataStorage(type, config, config_prefix, MetadataStorageType::PlainRewritable);
}
template <typename BaseObjectStorage, class... Args>
ObjectStoragePtr createObjectStorage(
ObjectStorageType type, const Poco::Util::AbstractConfiguration & config, const std::string & config_prefix, Args &&... args)
{
if (isPlainStorage(type, config, config_prefix))
return std::make_shared<PlainObjectStorage<BaseObjectStorage>>(std::forward<Args>(args)...);
else if (isPlainRewritableStorage(type, config, config_prefix))
{
if (isPlainStorage(type, config, config_prefix))
{
return std::make_shared<PlainObjectStorage<BaseObjectStorage>>(std::forward<Args>(args)...);
}
else
{
return std::make_shared<BaseObjectStorage>(std::forward<Args>(args)...);
}
/// TODO(jkartseva@): Test support for generic disk type
if (type != ObjectStorageType::S3)
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "plain_rewritable metadata storage support is implemented only for S3");
return std::make_shared<PlainRewritableObjectStorage<BaseObjectStorage>>(std::forward<Args>(args)...);
}
else
return std::make_shared<BaseObjectStorage>(std::forward<Args>(args)...);
}
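/// Dispatch sketch (hypothetical metadata_type values) for createObjectStorage<S3ObjectStorage>:
///     "plain"            -> PlainObjectStorage<S3ObjectStorage>
///     "plain_rewritable" -> PlainRewritableObjectStorage<S3ObjectStorage> (S3 only for now)
///     anything else      -> plain S3ObjectStorage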
}
ObjectStorageFactory & ObjectStorageFactory::instance()
@ -76,10 +93,7 @@ ObjectStorageFactory & ObjectStorageFactory::instance()
void ObjectStorageFactory::registerObjectStorageType(const std::string & type, Creator creator)
{
if (!registry.emplace(type, creator).second)
{
throw Exception(ErrorCodes::LOGICAL_ERROR,
"ObjectStorageFactory: the metadata type '{}' is not unique", type);
}
throw Exception(ErrorCodes::LOGICAL_ERROR, "ObjectStorageFactory: the metadata type '{}' is not unique", type);
}
ObjectStoragePtr ObjectStorageFactory::create(
@ -91,13 +105,9 @@ ObjectStoragePtr ObjectStorageFactory::create(
{
std::string type;
if (config.has(config_prefix + ".object_storage_type"))
{
type = config.getString(config_prefix + ".object_storage_type");
}
else if (config.has(config_prefix + ".type")) /// .type -- for compatibility.
{
type = config.getString(config_prefix + ".type");
}
else
{
throw Exception(ErrorCodes::NO_ELEMENTS_IN_CONFIG, "Expected `object_storage_type` in config");
@ -210,31 +220,66 @@ void registerS3PlainObjectStorage(ObjectStorageFactory & factory)
return object_storage;
});
}
void registerS3PlainRewritableObjectStorage(ObjectStorageFactory & factory)
{
static constexpr auto disk_type = "s3_plain_rewritable";
factory.registerObjectStorageType(
disk_type,
[](const std::string & name,
const Poco::Util::AbstractConfiguration & config,
const std::string & config_prefix,
const ContextPtr & context,
bool skip_access_check) -> ObjectStoragePtr
{
/// send_metadata changes the file names (it embeds a revision), while
/// s3_plain_rewritable does not support file renaming.
if (config.getBool(config_prefix + ".send_metadata", false))
throw Exception(ErrorCodes::BAD_ARGUMENTS, "s3_plain_rewritable does not support send_metadata");
auto uri = getS3URI(config, config_prefix, context);
auto s3_capabilities = getCapabilitiesFromConfig(config, config_prefix);
auto settings = getSettings(config, config_prefix, context);
auto client = getClient(config, config_prefix, context, *settings);
auto key_generator = getKeyGenerator(uri, config, config_prefix);
auto object_storage = std::make_shared<PlainRewritableObjectStorage<S3ObjectStorage>>(
std::move(client), std::move(settings), uri, s3_capabilities, key_generator, name);
/// NOTE: should we still perform this check for clickhouse-disks?
if (!skip_access_check)
checkS3Capabilities(*dynamic_cast<S3ObjectStorage *>(object_storage.get()), s3_capabilities, name);
return object_storage;
});
}
#endif
#if USE_HDFS && !defined(CLICKHOUSE_KEEPER_STANDALONE_BUILD)
void registerHDFSObjectStorage(ObjectStorageFactory & factory)
{
factory.registerObjectStorageType("hdfs", [](
const std::string & /* name */,
const Poco::Util::AbstractConfiguration & config,
const std::string & config_prefix,
const ContextPtr & context,
bool /* skip_access_check */) -> ObjectStoragePtr
{
auto uri = context->getMacros()->expand(config.getString(config_prefix + ".endpoint"));
checkHDFSURL(uri);
if (uri.back() != '/')
throw Exception(ErrorCodes::BAD_ARGUMENTS, "HDFS path must ends with '/', but '{}' doesn't.", uri);
factory.registerObjectStorageType(
"hdfs",
[](const std::string & /* name */,
const Poco::Util::AbstractConfiguration & config,
const std::string & config_prefix,
const ContextPtr & context,
bool /* skip_access_check */) -> ObjectStoragePtr
{
auto uri = context->getMacros()->expand(config.getString(config_prefix + ".endpoint"));
checkHDFSURL(uri);
if (uri.back() != '/')
throw Exception(ErrorCodes::BAD_ARGUMENTS, "HDFS path must ends with '/', but '{}' doesn't.", uri);
std::unique_ptr<HDFSObjectStorageSettings> settings = std::make_unique<HDFSObjectStorageSettings>(
config.getUInt64(config_prefix + ".min_bytes_for_seek", 1024 * 1024),
config.getInt(config_prefix + ".objects_chunk_size_to_delete", 1000),
context->getSettingsRef().hdfs_replication
);
std::unique_ptr<HDFSObjectStorageSettings> settings = std::make_unique<HDFSObjectStorageSettings>(
config.getUInt64(config_prefix + ".min_bytes_for_seek", 1024 * 1024),
config.getInt(config_prefix + ".objects_chunk_size_to_delete", 1000),
context->getSettingsRef().hdfs_replication);
return createObjectStorage<HDFSObjectStorage>(ObjectStorageType::HDFS, config, config_prefix, uri, std::move(settings), config);
});
return createObjectStorage<HDFSObjectStorage>(ObjectStorageType::HDFS, config, config_prefix, uri, std::move(settings), config);
});
}
#endif
@ -317,6 +362,7 @@ void registerObjectStorages()
#if USE_AWS_S3
registerS3ObjectStorage(factory);
registerS3PlainObjectStorage(factory);
registerS3PlainRewritableObjectStorage(factory);
#endif
#if USE_HDFS && !defined(CLICKHOUSE_KEEPER_STANDALONE_BUILD)

View File

@ -0,0 +1,24 @@
#pragma once
#include <Disks/ObjectStorages/IObjectStorage.h>
namespace DB
{
template <typename BaseObjectStorage>
class PlainRewritableObjectStorage : public BaseObjectStorage
{
public:
template <class... Args>
explicit PlainRewritableObjectStorage(Args &&... args) : BaseObjectStorage(std::forward<Args>(args)...)
{
}
std::string getName() const override { return "PlainRewritable" + BaseObjectStorage::getName(); }
bool isWriteOnce() const override { return false; }
bool isPlain() const override { return true; }
};
}
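/// Instantiation sketch (mirrors the S3 registration elsewhere in this commit):
/// the wrapper keeps the base storage's behavior but reports itself as plain
/// and not write-once:
///     auto storage = std::make_shared<PlainRewritableObjectStorage<S3ObjectStorage>>(
///         std::move(client), std::move(settings), uri, s3_capabilities, key_generator, name);
///     chassert(storage->isPlain() && !storage->isWriteOnce());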

View File

@ -29,7 +29,10 @@ void registerDiskObjectStorage(DiskFactory & factory, bool global_skip_access_ch
if (!config.has(config_prefix + ".metadata_type"))
{
if (object_storage->isPlain())
compatibility_metadata_type_hint = "plain";
if (object_storage->isWriteOnce())
compatibility_metadata_type_hint = "plain";
else
compatibility_metadata_type_hint = "plain_rewritable";
else
compatibility_metadata_type_hint = MetadataStorageFactory::getCompatibilityMetadataTypeHint(object_storage->getType());
}
@ -53,6 +56,7 @@ void registerDiskObjectStorage(DiskFactory & factory, bool global_skip_access_ch
#if USE_AWS_S3
factory.registerDiskType("s3", creator); /// For compatibility
factory.registerDiskType("s3_plain", creator); /// For compatibility
factory.registerDiskType("s3_plain_rewritable", creator); // For compatibility
#endif
#if USE_HDFS
factory.registerDiskType("hdfs", creator); /// For compatibility

View File

@ -1,4 +1,5 @@
#include <Disks/ObjectStorages/S3/S3ObjectStorage.h>
#include "Common/ObjectStorageKey.h"
#if USE_AWS_S3
@ -568,10 +569,17 @@ ObjectStorageKey S3ObjectStorage::generateObjectKeyForPath(const std::string & p
{
if (!key_generator)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Key generator is not set");
return key_generator->generate(path);
return key_generator->generate(path, /* is_directory */ false);
}
ObjectStorageKey S3ObjectStorage::generateObjectKeyPrefixForDirectoryPath(const std::string & path) const
{
if (!key_generator)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Key generator is not set");
return key_generator->generate(path, /* is_directory */ true);
}
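/// Illustrative contrast (hypothetical generator output):
///     generateObjectKeyForPath("store/123/tbl/data.bin")         -> key for a single object
///     generateObjectKeyPrefixForDirectoryPath("store/123/tbl/")  -> prefix under which all
///         objects of that directory are placed (used by plain_rewritable disks)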
}
#endif

View File

@ -43,8 +43,6 @@ struct S3ObjectStorageSettings
class S3ObjectStorage : public IObjectStorage
{
private:
friend class S3PlainObjectStorage;
S3ObjectStorage(
const char * logger_name,
std::unique_ptr<S3::Client> && client_,
@ -54,11 +52,11 @@ private:
ObjectStorageKeysGeneratorPtr key_generator_,
const String & disk_name_)
: uri(uri_)
, key_generator(std::move(key_generator_))
, disk_name(disk_name_)
, client(std::move(client_))
, s3_settings(std::move(s3_settings_))
, s3_capabilities(s3_capabilities_)
, key_generator(std::move(key_generator_))
, log(getLogger(logger_name))
{
}
@ -161,9 +159,12 @@ public:
bool supportParallelWrite() const override { return true; }
ObjectStorageKey generateObjectKeyForPath(const std::string & path) const override;
ObjectStorageKey generateObjectKeyPrefixForDirectoryPath(const std::string & path) const override;
bool isReadOnly() const override { return s3_settings.get()->read_only; }
void setKeysGenerator(ObjectStorageKeysGeneratorPtr gen) override { key_generator = gen; }
private:
void setNewSettings(std::unique_ptr<S3ObjectStorageSettings> && s3_settings_);
@ -172,13 +173,14 @@ private:
const S3::URI uri;
ObjectStorageKeysGeneratorPtr key_generator;
std::string disk_name;
MultiVersion<S3::Client> client;
MultiVersion<S3ObjectStorageSettings> s3_settings;
S3Capabilities s3_capabilities;
ObjectStorageKeysGeneratorPtr key_generator;
LoggerPtr log;
};

View File

@ -1,9 +1,9 @@
#pragma once
#include <Disks/ObjectStorages/IMetadataStorage.h>
#include <Disks/ObjectStorages/MetadataFromDiskTransactionState.h>
#include <Disks/ObjectStorages/Web/WebObjectStorage.h>
#include <Disks/IDisk.h>
#include <Disks/ObjectStorages/IMetadataStorage.h>
#include <Disks/ObjectStorages/MetadataStorageTransactionState.h>
#include <Disks/ObjectStorages/Web/WebObjectStorage.h>
namespace DB

View File

@ -305,7 +305,7 @@ InputFormatPtr FormatFactory::getInput(
auto format_settings = _format_settings ? *_format_settings : getFormatSettings(context);
const Settings & settings = context->getSettingsRef();
size_t max_parsing_threads = _max_parsing_threads.value_or(settings.max_threads);
size_t max_parsing_threads = _max_parsing_threads.value_or(settings.max_parsing_threads);
size_t max_download_threads = _max_download_threads.value_or(settings.max_download_threads);
RowInputFormatParams row_input_format_params;

View File

@ -45,7 +45,7 @@ template <> struct FunctionUnaryArithmeticMonotonicity<NameAbs>
if ((left_float < 0 && right_float > 0) || (left_float > 0 && right_float < 0))
return {};
return { .is_monotonic = true, .is_positive = left_float > 0, .is_strict = true, };
return { .is_monotonic = true, .is_positive = std::min(left_float, right_float) >= 0, .is_strict = true, };
}
};
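/// Worked example: on [0, 10], abs() is increasing and min(0, 10) >= 0 holds,
/// whereas checking `left > 0` alone would misclassify the range as decreasing;
/// on [-10, -1] the minimum is negative and abs() is indeed decreasing.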

View File

@ -1374,18 +1374,18 @@ std::shared_ptr<const EnabledRolesInfo> Context::getRolesInfo() const
namespace
{
ALWAYS_INLINE inline void
contextSanityCheckWithLock(const Context & context, const Settings & settings, const std::lock_guard<ContextSharedMutex> &)
contextSanityClampSettingsWithLock(const Context & context, Settings & settings, const std::lock_guard<ContextSharedMutex> &)
{
const auto type = context.getApplicationType();
if (type == Context::ApplicationType::LOCAL || type == Context::ApplicationType::SERVER)
doSettingsSanityCheck(settings);
doSettingsSanityCheckClamp(settings, getLogger("SettingsSanity"));
}
ALWAYS_INLINE inline void contextSanityCheck(const Context & context, const Settings & settings)
ALWAYS_INLINE inline void contextSanityClampSettings(const Context & context, Settings & settings)
{
const auto type = context.getApplicationType();
if (type == Context::ApplicationType::LOCAL || type == Context::ApplicationType::SERVER)
doSettingsSanityCheck(settings);
doSettingsSanityCheckClamp(settings, getLogger("SettingsSanity"));
}
}
@ -1498,7 +1498,7 @@ void Context::setCurrentProfilesWithLock(const SettingsProfilesInfo & profiles_i
checkSettingsConstraintsWithLock(profiles_info.settings, SettingSource::PROFILE);
applySettingsChangesWithLock(profiles_info.settings, lock);
settings_constraints_and_current_profiles = profiles_info.getConstraintsAndProfileIDs(settings_constraints_and_current_profiles);
contextSanityCheckWithLock(*this, settings, lock);
contextSanityClampSettingsWithLock(*this, settings, lock);
}
void Context::setCurrentProfile(const String & profile_name, bool check_constraints)
@ -2101,7 +2101,7 @@ void Context::setSettings(const Settings & settings_)
std::lock_guard lock(mutex);
settings = settings_;
need_recalculate_access = true;
contextSanityCheck(*this, settings);
contextSanityClampSettings(*this, settings);
}
void Context::setSettingWithLock(std::string_view name, const String & value, const std::lock_guard<ContextSharedMutex> & lock)
@ -2114,7 +2114,7 @@ void Context::setSettingWithLock(std::string_view name, const String & value, co
settings.set(name, value);
if (ContextAccessParams::dependsOnSettingName(name))
need_recalculate_access = true;
contextSanityCheckWithLock(*this, settings, lock);
contextSanityClampSettingsWithLock(*this, settings, lock);
}
void Context::setSettingWithLock(std::string_view name, const Field & value, const std::lock_guard<ContextSharedMutex> & lock)
@ -2134,7 +2134,7 @@ void Context::applySettingChangeWithLock(const SettingChange & change, const std
try
{
setSettingWithLock(change.name, change.value, lock);
contextSanityCheckWithLock(*this, settings, lock);
contextSanityClampSettingsWithLock(*this, settings, lock);
}
catch (Exception & e)
{
@ -2162,7 +2162,7 @@ void Context::setSetting(std::string_view name, const Field & value)
{
std::lock_guard lock(mutex);
setSettingWithLock(name, value, lock);
contextSanityCheckWithLock(*this, settings, lock);
contextSanityClampSettingsWithLock(*this, settings, lock);
}
void Context::applySettingChange(const SettingChange & change)
@ -2187,39 +2187,39 @@ void Context::applySettingsChanges(const SettingsChanges & changes)
applySettingsChangesWithLock(changes, lock);
}
void Context::checkSettingsConstraintsWithLock(const SettingsProfileElements & profile_elements, SettingSource source) const
void Context::checkSettingsConstraintsWithLock(const SettingsProfileElements & profile_elements, SettingSource source)
{
getSettingsConstraintsAndCurrentProfilesWithLock()->constraints.check(settings, profile_elements, source);
if (getApplicationType() == ApplicationType::LOCAL || getApplicationType() == ApplicationType::SERVER)
doSettingsSanityCheck(settings);
doSettingsSanityCheckClamp(settings, getLogger("SettingsSanity"));
}
void Context::checkSettingsConstraintsWithLock(const SettingChange & change, SettingSource source) const
void Context::checkSettingsConstraintsWithLock(const SettingChange & change, SettingSource source)
{
getSettingsConstraintsAndCurrentProfilesWithLock()->constraints.check(settings, change, source);
if (getApplicationType() == ApplicationType::LOCAL || getApplicationType() == ApplicationType::SERVER)
doSettingsSanityCheck(settings);
doSettingsSanityCheckClamp(settings, getLogger("SettingsSanity"));
}
void Context::checkSettingsConstraintsWithLock(const SettingsChanges & changes, SettingSource source) const
void Context::checkSettingsConstraintsWithLock(const SettingsChanges & changes, SettingSource source)
{
getSettingsConstraintsAndCurrentProfilesWithLock()->constraints.check(settings, changes, source);
if (getApplicationType() == ApplicationType::LOCAL || getApplicationType() == ApplicationType::SERVER)
doSettingsSanityCheck(settings);
doSettingsSanityCheckClamp(settings, getLogger("SettingsSanity"));
}
void Context::checkSettingsConstraintsWithLock(SettingsChanges & changes, SettingSource source) const
void Context::checkSettingsConstraintsWithLock(SettingsChanges & changes, SettingSource source)
{
getSettingsConstraintsAndCurrentProfilesWithLock()->constraints.check(settings, changes, source);
if (getApplicationType() == ApplicationType::LOCAL || getApplicationType() == ApplicationType::SERVER)
doSettingsSanityCheck(settings);
doSettingsSanityCheckClamp(settings, getLogger("SettingsSanity"));
}
void Context::clampToSettingsConstraintsWithLock(SettingsChanges & changes, SettingSource source) const
void Context::clampToSettingsConstraintsWithLock(SettingsChanges & changes, SettingSource source)
{
getSettingsConstraintsAndCurrentProfilesWithLock()->constraints.clamp(settings, changes, source);
if (getApplicationType() == ApplicationType::LOCAL || getApplicationType() == ApplicationType::SERVER)
doSettingsSanityCheck(settings);
doSettingsSanityCheckClamp(settings, getLogger("SettingsSanity"));
}
void Context::checkMergeTreeSettingsConstraintsWithLock(const MergeTreeSettings & merge_tree_settings, const SettingsChanges & changes) const
@ -2227,32 +2227,32 @@ void Context::checkMergeTreeSettingsConstraintsWithLock(const MergeTreeSettings
getSettingsConstraintsAndCurrentProfilesWithLock()->constraints.check(merge_tree_settings, changes);
}
void Context::checkSettingsConstraints(const SettingsProfileElements & profile_elements, SettingSource source) const
void Context::checkSettingsConstraints(const SettingsProfileElements & profile_elements, SettingSource source)
{
SharedLockGuard lock(mutex);
checkSettingsConstraintsWithLock(profile_elements, source);
}
void Context::checkSettingsConstraints(const SettingChange & change, SettingSource source) const
void Context::checkSettingsConstraints(const SettingChange & change, SettingSource source)
{
SharedLockGuard lock(mutex);
checkSettingsConstraintsWithLock(change, source);
}
void Context::checkSettingsConstraints(const SettingsChanges & changes, SettingSource source) const
void Context::checkSettingsConstraints(const SettingsChanges & changes, SettingSource source)
{
SharedLockGuard lock(mutex);
getSettingsConstraintsAndCurrentProfilesWithLock()->constraints.check(settings, changes, source);
doSettingsSanityCheck(settings);
doSettingsSanityCheckClamp(settings, getLogger("SettingsSanity"));
}
void Context::checkSettingsConstraints(SettingsChanges & changes, SettingSource source) const
void Context::checkSettingsConstraints(SettingsChanges & changes, SettingSource source)
{
SharedLockGuard lock(mutex);
checkSettingsConstraintsWithLock(changes, source);
}
void Context::clampToSettingsConstraints(SettingsChanges & changes, SettingSource source) const
void Context::clampToSettingsConstraints(SettingsChanges & changes, SettingSource source)
{
SharedLockGuard lock(mutex);
clampToSettingsConstraintsWithLock(changes, source);
@ -4484,7 +4484,7 @@ void Context::setDefaultProfiles(const Poco::Util::AbstractConfiguration & confi
setCurrentProfile(shared->system_profile_name);
applySettingsQuirks(settings, getLogger("SettingsQuirks"));
doSettingsSanityCheck(settings);
doSettingsSanityCheckClamp(settings, getLogger("SettingsSanity"));
shared->buffer_profile_name = config.getString("buffer_profile", shared->system_profile_name);
buffer_context = Context::createCopy(shared_from_this());

View File

@ -315,6 +315,7 @@ protected:
/// This parameter can be set by the HTTP client to tune the behavior of output formats for compatibility.
UInt64 client_protocol_version = 0;
public:
/// Record entities accessed by current query, and store this information in system.query_log.
struct QueryAccessInfo
{
@ -339,8 +340,10 @@ protected:
return *this;
}
void swap(QueryAccessInfo & rhs) noexcept
void swap(QueryAccessInfo & rhs) noexcept TSA_NO_THREAD_SAFETY_ANALYSIS
{
/// TSA_NO_THREAD_SAFETY_ANALYSIS because it doesn't support scoped_lock
std::scoped_lock lck{mutex, rhs.mutex};
std::swap(databases, rhs.databases);
std::swap(tables, rhs.tables);
std::swap(columns, rhs.columns);
@ -351,19 +354,21 @@ protected:
/// To prevent a race between copy-constructor and other uses of this structure.
mutable std::mutex mutex{};
std::set<std::string> databases{};
std::set<std::string> tables{};
std::set<std::string> columns{};
std::set<std::string> partitions{};
std::set<std::string> projections{};
std::set<std::string> views{};
std::set<std::string> databases TSA_GUARDED_BY(mutex){};
std::set<std::string> tables TSA_GUARDED_BY(mutex){};
std::set<std::string> columns TSA_GUARDED_BY(mutex){};
std::set<std::string> partitions TSA_GUARDED_BY(mutex){};
std::set<std::string> projections TSA_GUARDED_BY(mutex){};
std::set<std::string> views TSA_GUARDED_BY(mutex){};
};
using QueryAccessInfoPtr = std::shared_ptr<QueryAccessInfo>;
protected:
/// In some situations, we want to be able to transfer the access info from children back to parents (e.g. the definer's context).
/// Therefore, query_access_info must be a pointer.
QueryAccessInfoPtr query_access_info;
public:
/// Record the names of objects created by factories (for testing, etc.)
struct QueryFactoriesInfo
{
@ -385,19 +390,20 @@ protected:
QueryFactoriesInfo(QueryFactoriesInfo && rhs) = delete;
std::unordered_set<std::string> aggregate_functions;
std::unordered_set<std::string> aggregate_function_combinators;
std::unordered_set<std::string> database_engines;
std::unordered_set<std::string> data_type_families;
std::unordered_set<std::string> dictionaries;
std::unordered_set<std::string> formats;
std::unordered_set<std::string> functions;
std::unordered_set<std::string> storages;
std::unordered_set<std::string> table_functions;
std::unordered_set<std::string> aggregate_functions TSA_GUARDED_BY(mutex);
std::unordered_set<std::string> aggregate_function_combinators TSA_GUARDED_BY(mutex);
std::unordered_set<std::string> database_engines TSA_GUARDED_BY(mutex);
std::unordered_set<std::string> data_type_families TSA_GUARDED_BY(mutex);
std::unordered_set<std::string> dictionaries TSA_GUARDED_BY(mutex);
std::unordered_set<std::string> formats TSA_GUARDED_BY(mutex);
std::unordered_set<std::string> functions TSA_GUARDED_BY(mutex);
std::unordered_set<std::string> storages TSA_GUARDED_BY(mutex);
std::unordered_set<std::string> table_functions TSA_GUARDED_BY(mutex);
mutable std::mutex mutex;
};
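/// Clang thread-safety sketch: TSA_GUARDED_BY(mutex) makes the analyzer reject
/// any access to these sets without the mutex held, e.g.
///     std::lock_guard lock(info.mutex);   /// required first
///     info.formats.emplace("CSV");        /// OK under the lock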
protected:
/// Mutable because factory methods receive a const context but still need to record what was used
mutable QueryFactoriesInfo query_factories_info;
/// Query metrics for reading data asynchronously with IAsynchronousReader.
@ -783,11 +789,11 @@ public:
void applySettingsChanges(const SettingsChanges & changes);
/// Checks the constraints.
void checkSettingsConstraints(const SettingsProfileElements & profile_elements, SettingSource source) const;
void checkSettingsConstraints(const SettingChange & change, SettingSource source) const;
void checkSettingsConstraints(const SettingsChanges & changes, SettingSource source) const;
void checkSettingsConstraints(SettingsChanges & changes, SettingSource source) const;
void clampToSettingsConstraints(SettingsChanges & changes, SettingSource source) const;
void checkSettingsConstraints(const SettingsProfileElements & profile_elements, SettingSource source);
void checkSettingsConstraints(const SettingChange & change, SettingSource source);
void checkSettingsConstraints(const SettingsChanges & changes, SettingSource source);
void checkSettingsConstraints(SettingsChanges & changes, SettingSource source);
void clampToSettingsConstraints(SettingsChanges & changes, SettingSource source);
void checkMergeTreeSettingsConstraints(const MergeTreeSettings & merge_tree_settings, const SettingsChanges & changes) const;
/// Reset settings to default value
@ -1293,15 +1299,15 @@ private:
void setCurrentDatabaseWithLock(const String & name, const std::lock_guard<ContextSharedMutex> & lock);
void checkSettingsConstraintsWithLock(const SettingsProfileElements & profile_elements, SettingSource source) const;
void checkSettingsConstraintsWithLock(const SettingsProfileElements & profile_elements, SettingSource source);
void checkSettingsConstraintsWithLock(const SettingChange & change, SettingSource source) const;
void checkSettingsConstraintsWithLock(const SettingChange & change, SettingSource source);
void checkSettingsConstraintsWithLock(const SettingsChanges & changes, SettingSource source) const;
void checkSettingsConstraintsWithLock(const SettingsChanges & changes, SettingSource source);
void checkSettingsConstraintsWithLock(SettingsChanges & changes, SettingSource source) const;
void checkSettingsConstraintsWithLock(SettingsChanges & changes, SettingSource source);
void clampToSettingsConstraintsWithLock(SettingsChanges & changes, SettingSource source) const;
void clampToSettingsConstraintsWithLock(SettingsChanges & changes, SettingSource source);
void checkMergeTreeSettingsConstraintsWithLock(const MergeTreeSettings & merge_tree_settings, const SettingsChanges & changes) const;

View File

@ -780,6 +780,11 @@ BlockIO InterpreterSystemQuery::execute()
resetCoverage();
break;
}
case Type::UNLOAD_PRIMARY_KEY:
{
unloadPrimaryKeys();
break;
}
#if USE_JEMALLOC
case Type::JEMALLOC_PURGE:
@ -1157,6 +1162,42 @@ void InterpreterSystemQuery::waitLoadingParts()
}
}
void InterpreterSystemQuery::unloadPrimaryKeys()
{
if (!table_id.empty())
{
getContext()->checkAccess(AccessType::SYSTEM_UNLOAD_PRIMARY_KEY, table_id.database_name, table_id.table_name);
StoragePtr table = DatabaseCatalog::instance().getTable(table_id, getContext());
if (auto * merge_tree = dynamic_cast<MergeTreeData *>(table.get()))
{
LOG_TRACE(log, "Unloading primary keys for table {}", table_id.getFullTableName());
merge_tree->unloadPrimaryKeys();
}
else
{
throw Exception(ErrorCodes::BAD_ARGUMENTS,
"Command UNLOAD PRIMARY KEY is supported only for MergeTree table, but got: {}", table->getName());
}
}
else
{
getContext()->checkAccess(AccessType::SYSTEM_UNLOAD_PRIMARY_KEY);
LOG_TRACE(log, "Unloading primary keys for all tables");
for (auto & database : DatabaseCatalog::instance().getDatabases())
{
for (auto it = database.second->getTablesIterator(getContext()); it->isValid(); it->next())
{
if (auto * merge_tree = dynamic_cast<MergeTreeData *>(it->table().get()))
{
merge_tree->unloadPrimaryKeys();
}
}
}
}
}
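/// Usage sketch (SQL forms accepted by the parser changes below):
///     SYSTEM UNLOAD PRIMARY KEY db.table;  -- a single MergeTree table
///     SYSTEM UNLOAD PRIMARY KEY;           -- all MergeTree tables
/// Both forms require the SYSTEM UNLOAD PRIMARY KEY access type checked above.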
void InterpreterSystemQuery::syncReplicatedDatabase(ASTSystemQuery & query)
{
const auto database_name = query.getDatabase();
@ -1470,6 +1511,14 @@ AccessRightsElements InterpreterSystemQuery::getRequiredAccessForDDLOnCluster()
required_access.emplace_back(AccessType::SYSTEM_JEMALLOC);
break;
}
case Type::UNLOAD_PRIMARY_KEY:
{
if (!query.table)
required_access.emplace_back(AccessType::SYSTEM_UNLOAD_PRIMARY_KEY);
else
required_access.emplace_back(AccessType::SYSTEM_UNLOAD_PRIMARY_KEY, query.getDatabase(), query.getTable());
break;
}
case Type::STOP_THREAD_FUZZER:
case Type::START_THREAD_FUZZER:
case Type::ENABLE_FAILPOINT:

View File

@ -60,6 +60,7 @@ private:
void syncReplica(ASTSystemQuery & query);
void setReplicaReadiness(bool ready);
void waitLoadingParts();
void unloadPrimaryKeys();
void syncReplicatedDatabase(ASTSystemQuery & query);

View File

@ -1,121 +0,0 @@
#include <AggregateFunctions/AggregateFunctionFactory.h>
#include <Interpreters/RewriteSumIfFunctionVisitor.h>
#include <Parsers/ASTFunction.h>
#include <Parsers/ASTLiteral.h>
#include <Common/typeid_cast.h>
namespace DB
{
void RewriteSumIfFunctionMatcher::visit(ASTPtr & ast, Data & data)
{
if (auto * func = ast->as<ASTFunction>())
{
if (func->is_window_function)
return;
visit(*func, ast, data);
}
}
void RewriteSumIfFunctionMatcher::visit(const ASTFunction & func, ASTPtr & ast, Data &)
{
if (!func.arguments || func.arguments->children.empty())
return;
auto lower_name = Poco::toLower(func.name);
/// sumIf, SumIf or sUMIf are valid function names, but sumIF or sumiF are not
if (lower_name != "sum" && (lower_name != "sumif" || !endsWith(func.name, "If")))
return;
const auto & func_arguments = func.arguments->children;
if (lower_name == "sumif")
{
const auto * literal = func_arguments[0]->as<ASTLiteral>();
if (!literal || !DB::isInt64OrUInt64FieldType(literal->value.getType()))
return;
if (func_arguments.size() == 2)
{
std::shared_ptr<ASTFunction> new_func;
if (literal->value.get<UInt64>() == 1)
{
/// sumIf(1, cond) -> countIf(cond)
new_func = makeASTFunction("countIf", func_arguments[1]);
}
else
{
/// sumIf(123, cond) -> 123 * countIf(cond)
auto count_if_func = makeASTFunction("countIf", func_arguments[1]);
new_func = makeASTFunction("multiply", func_arguments[0], std::move(count_if_func));
}
new_func->setAlias(func.alias);
ast = std::move(new_func);
return;
}
}
else
{
const auto * nested_func = func_arguments[0]->as<ASTFunction>();
if (!nested_func || Poco::toLower(nested_func->name) != "if" || nested_func->arguments->children.size() != 3)
return;
const auto & if_arguments = nested_func->arguments->children;
const auto * first_literal = if_arguments[1]->as<ASTLiteral>();
const auto * second_literal = if_arguments[2]->as<ASTLiteral>();
if (first_literal && second_literal)
{
if (!DB::isInt64OrUInt64FieldType(first_literal->value.getType()) || !DB::isInt64OrUInt64FieldType(second_literal->value.getType()))
return;
auto first_value = first_literal->value.get<UInt64>();
auto second_value = second_literal->value.get<UInt64>();
std::shared_ptr<ASTFunction> new_func;
if (second_value == 0)
{
if (first_value == 1)
{
/// sum(if(cond, 1, 0)) -> countIf(cond)
new_func = makeASTFunction("countIf", if_arguments[0]);
}
else
{
/// sum(if(cond, 123, 0)) -> 123 * countIf(cond)
auto count_if_func = makeASTFunction("countIf", if_arguments[0]);
new_func = makeASTFunction("multiply", if_arguments[1], std::move(count_if_func));
}
new_func->setAlias(func.alias);
ast = std::move(new_func);
return;
}
if (first_value == 0)
{
auto not_func = makeASTFunction("not", if_arguments[0]);
if (second_value == 1)
{
/// sum(if(cond, 0, 1)) -> countIf(not(cond))
new_func = makeASTFunction("countIf", std::move(not_func));
}
else
{
/// sum(if(cond, 0, 123)) -> 123 * countIf(not(cond))
auto count_if_func = makeASTFunction("countIf", std::move(not_func));
new_func = makeASTFunction("multiply", if_arguments[2], std::move(count_if_func));
}
new_func->setAlias(func.alias);
ast = std::move(new_func);
return;
}
}
}
}
}

View File

@ -1,33 +0,0 @@
#pragma once
#include <unordered_set>
#include <Parsers/IAST.h>
#include <Interpreters/InDepthNodeVisitor.h>
namespace DB
{
class ASTFunction;
/// Rewrite 'sum(if())' and 'sumIf' functions to countIf.
/// sumIf(1, cond) -> countIf(cond)
/// sumIf(123, cond) -> 123 * countIf(cond)
/// sum(if(cond, 1, 0)) -> countIf(cond)
/// sum(if(cond, 123, 0)) -> 123 * countIf(cond)
/// sum(if(cond, 0, 1)) -> countIf(not(cond))
/// sum(if(cond, 0, 123)) -> 123 * countIf(not(cond))
class RewriteSumIfFunctionMatcher
{
public:
struct Data
{
};
static void visit(ASTPtr & ast, Data &);
static void visit(const ASTFunction &, ASTPtr & ast, Data &);
static bool needChildVisit(const ASTPtr &, const ASTPtr &) { return true; }
};
using RewriteSumIfFunctionVisitor = InDepthNodeVisitor<RewriteSumIfFunctionMatcher, false>;
}

View File

@ -22,7 +22,6 @@
#include <Interpreters/Context.h>
#include <Interpreters/ExternalDictionariesLoader.h>
#include <Interpreters/GatherFunctionQuantileVisitor.h>
#include <Interpreters/RewriteSumIfFunctionVisitor.h>
#include <Interpreters/RewriteArrayExistsFunctionVisitor.h>
#include <Interpreters/RewriteSumFunctionWithSumAndCountVisitor.h>
#include <Interpreters/OptimizeDateOrDateTimeConverterWithPreimageVisitor.h>
@ -516,12 +515,6 @@ void optimizeAggregationFunctions(ASTPtr & query)
ArithmeticOperationsInAgrFuncVisitor(data).visit(query);
}
void optimizeSumIfFunctions(ASTPtr & query)
{
RewriteSumIfFunctionVisitor::Data data = {};
RewriteSumIfFunctionVisitor(data).visit(query);
}
void optimizeArrayExistsFunctions(ASTPtr & query)
{
RewriteArrayExistsFunctionVisitor::Data data = {};
@ -682,9 +675,6 @@ void TreeOptimizer::apply(ASTPtr & query, TreeRewriterResult & result,
if (settings.optimize_normalize_count_variants)
optimizeCountConstantAndSumOne(query, context);
if (settings.optimize_rewrite_sum_if_to_count_if)
optimizeSumIfFunctions(query);
if (settings.optimize_rewrite_array_exists_to_has)
optimizeArrayExistsFunctions(query);

View File

@ -260,23 +260,31 @@ addStatusInfoToQueryLogElement(QueryLogElement & element, const QueryStatusInfo
/// We need to refresh the access info since dependent views might have added extra information, either during
/// creation of the view (PushingToViews chain) or while executing its internal SELECT
const auto & access_info = context_ptr->getQueryAccessInfo();
element.query_databases.insert(access_info.databases.begin(), access_info.databases.end());
element.query_tables.insert(access_info.tables.begin(), access_info.tables.end());
element.query_columns.insert(access_info.columns.begin(), access_info.columns.end());
element.query_partitions.insert(access_info.partitions.begin(), access_info.partitions.end());
element.query_projections.insert(access_info.projections.begin(), access_info.projections.end());
element.query_views.insert(access_info.views.begin(), access_info.views.end());
{
std::lock_guard lock(access_info.mutex);
element.query_databases.insert(access_info.databases.begin(), access_info.databases.end());
element.query_tables.insert(access_info.tables.begin(), access_info.tables.end());
element.query_columns.insert(access_info.columns.begin(), access_info.columns.end());
element.query_partitions.insert(access_info.partitions.begin(), access_info.partitions.end());
element.query_projections.insert(access_info.projections.begin(), access_info.projections.end());
element.query_views.insert(access_info.views.begin(), access_info.views.end());
}
const auto factories_info = context_ptr->getQueryFactoriesInfo();
element.used_aggregate_functions = factories_info.aggregate_functions;
element.used_aggregate_function_combinators = factories_info.aggregate_function_combinators;
element.used_database_engines = factories_info.database_engines;
element.used_data_type_families = factories_info.data_type_families;
element.used_dictionaries = factories_info.dictionaries;
element.used_formats = factories_info.formats;
element.used_functions = factories_info.functions;
element.used_storages = factories_info.storages;
element.used_table_functions = factories_info.table_functions;
/// Copy the fields of QueryFactoriesInfo under its mutex for thread safety, because the query
/// context may still be modified by some processor even after the query is finished.
const auto & factories_info(context_ptr->getQueryFactoriesInfo());
{
std::lock_guard lock(factories_info.mutex);
element.used_aggregate_functions = factories_info.aggregate_functions;
element.used_aggregate_function_combinators = factories_info.aggregate_function_combinators;
element.used_database_engines = factories_info.database_engines;
element.used_data_type_families = factories_info.data_type_families;
element.used_dictionaries = factories_info.dictionaries;
element.used_formats = factories_info.formats;
element.used_functions = factories_info.functions;
element.used_storages = factories_info.storages;
element.used_table_functions = factories_info.table_functions;
}
element.async_read_counters = context_ptr->getAsyncReadCounters();
}
@ -325,6 +333,7 @@ QueryLogElement logQueryStart(
if (pipeline.initialized())
{
const auto & info = context->getQueryAccessInfo();
std::lock_guard lock(info.mutex);
elem.query_databases = info.databases;
elem.query_tables = info.tables;
elem.query_columns = info.columns;
@ -1248,7 +1257,12 @@ static std::tuple<ASTPtr, BlockIO> executeQueryImpl(
if (!settings.force_optimize_projection_name.value.empty())
{
bool found = false;
std::set<std::string> projections = context->getQueryAccessInfo().projections;
std::set<std::string> projections;
{
const auto & access_info = context->getQueryAccessInfo();
std::lock_guard lock(access_info.mutex);
projections = access_info.projections;
}
for (const auto &projection : projections)
{

View File

@ -173,6 +173,7 @@ void ASTSystemQuery::formatImpl(const FormatSettings & settings, FormatState & s
case Type::START_PULLING_REPLICATION_LOG:
case Type::STOP_CLEANUP:
case Type::START_CLEANUP:
case Type::UNLOAD_PRIMARY_KEY:
{
if (table)
{

View File

@ -101,6 +101,7 @@ public:
STOP_VIEWS,
CANCEL_VIEW,
TEST_VIEW,
UNLOAD_PRIMARY_KEY,
END
};

View File

@ -2085,28 +2085,28 @@ bool ParserOrderByElement::parseImpl(Pos & pos, ASTPtr & node, Expected & expect
int direction = 1;
if (descending.ignore(pos) || desc.ignore(pos))
if (descending.ignore(pos, expected) || desc.ignore(pos, expected))
direction = -1;
else
ascending.ignore(pos) || asc.ignore(pos);
ascending.ignore(pos, expected) || asc.ignore(pos, expected);
int nulls_direction = direction;
bool nulls_direction_was_explicitly_specified = false;
if (nulls.ignore(pos))
if (nulls.ignore(pos, expected))
{
nulls_direction_was_explicitly_specified = true;
if (first.ignore(pos))
if (first.ignore(pos, expected))
nulls_direction = -direction;
else if (last.ignore(pos))
else if (last.ignore(pos, expected))
;
else
return false;
}
ASTPtr locale_node;
if (collate.ignore(pos))
if (collate.ignore(pos, expected))
{
if (!collate_locale_parser.parse(pos, locale_node, expected))
return false;
@ -2117,16 +2117,16 @@ bool ParserOrderByElement::parseImpl(Pos & pos, ASTPtr & node, Expected & expect
ASTPtr fill_from;
ASTPtr fill_to;
ASTPtr fill_step;
if (with_fill.ignore(pos))
if (with_fill.ignore(pos, expected))
{
has_with_fill = true;
if (from.ignore(pos) && !exp_parser.parse(pos, fill_from, expected))
if (from.ignore(pos, expected) && !exp_parser.parse(pos, fill_from, expected))
return false;
if (to.ignore(pos) && !exp_parser.parse(pos, fill_to, expected))
if (to.ignore(pos, expected) && !exp_parser.parse(pos, fill_to, expected))
return false;
if (step.ignore(pos) && !exp_parser.parse(pos, fill_step, expected))
if (step.ignore(pos, expected) && !exp_parser.parse(pos, fill_step, expected))
return false;
}
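/// Example of the clause handled here (illustrative):
///     ORDER BY d WITH FILL FROM toDate('2024-01-01') TO toDate('2024-02-01') STEP 1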
@ -2254,27 +2254,27 @@ bool ParserTTLElement::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
DataDestinationType destination_type = DataDestinationType::DELETE;
String destination_name;
if (s_to_disk.ignore(pos))
if (s_to_disk.ignore(pos, expected))
{
mode = TTLMode::MOVE;
destination_type = DataDestinationType::DISK;
}
else if (s_to_volume.ignore(pos))
else if (s_to_volume.ignore(pos, expected))
{
mode = TTLMode::MOVE;
destination_type = DataDestinationType::VOLUME;
}
else if (s_group_by.ignore(pos))
else if (s_group_by.ignore(pos, expected))
{
mode = TTLMode::GROUP_BY;
}
else if (s_recompress.ignore(pos))
else if (s_recompress.ignore(pos, expected))
{
mode = TTLMode::RECOMPRESS;
}
else
{
s_delete.ignore(pos);
s_delete.ignore(pos, expected);
mode = TTLMode::DELETE;
}
@ -2286,7 +2286,7 @@ bool ParserTTLElement::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
if (mode == TTLMode::MOVE)
{
if (s_if_exists.ignore(pos))
if (s_if_exists.ignore(pos, expected))
if_exists = true;
ASTPtr ast_space_name;
@ -2300,7 +2300,7 @@ bool ParserTTLElement::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
if (!parser_keys_list.parse(pos, group_by_key, expected))
return false;
if (s_set.ignore(pos))
if (s_set.ignore(pos, expected))
{
ParserList parser_assignment_list(
std::make_unique<ParserAssignment>(), std::make_unique<ParserToken>(TokenType::Comma));
@ -2309,14 +2309,14 @@ bool ParserTTLElement::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
return false;
}
}
else if (mode == TTLMode::DELETE && s_where.ignore(pos))
else if (mode == TTLMode::DELETE && s_where.ignore(pos, expected))
{
if (!parser_exp.parse(pos, where_expr, expected))
return false;
}
else if (mode == TTLMode::RECOMPRESS)
{
if (!s_codec.ignore(pos))
if (!s_codec.ignore(pos, expected))
return false;
if (!parser_codec.parse(pos, recompression_codec, expected))

View File

@ -47,6 +47,7 @@ public:
protected:
const char * getName() const override { return "string literal table identifier"; }
bool parseImpl(Pos & pos, ASTPtr & node, Expected & expected) override;
Highlight highlight() const override { return Highlight::identifier; }
};

View File

@ -416,13 +416,13 @@ bool ParserKeyValuePair::parseImpl(Pos & pos, ASTPtr & node, Expected & expected
ParserToken open(TokenType::OpeningRoundBracket);
ParserToken close(TokenType::ClosingRoundBracket);
if (!open.ignore(pos))
if (!open.ignore(pos, expected))
return false;
if (!kv_pairs_list.parse(pos, value, expected))
return false;
if (!close.ignore(pos))
if (!close.ignore(pos, expected))
return false;
with_brackets = true;

View File

@ -495,11 +495,11 @@ bool ParserAlterCommand::parseImpl(Pos & pos, ASTPtr & node, Expected & expected
command->type = ASTAlterCommand::MOVE_PARTITION;
command->part = true;
if (s_to_disk.ignore(pos))
if (s_to_disk.ignore(pos, expected))
command->move_destination_type = DataDestinationType::DISK;
else if (s_to_volume.ignore(pos))
else if (s_to_volume.ignore(pos, expected))
command->move_destination_type = DataDestinationType::VOLUME;
else if (s_to_shard.ignore(pos))
else if (s_to_shard.ignore(pos, expected))
{
command->move_destination_type = DataDestinationType::SHARD;
}
@ -519,11 +519,11 @@ bool ParserAlterCommand::parseImpl(Pos & pos, ASTPtr & node, Expected & expected
command->type = ASTAlterCommand::MOVE_PARTITION;
if (s_to_disk.ignore(pos))
if (s_to_disk.ignore(pos, expected))
command->move_destination_type = DataDestinationType::DISK;
else if (s_to_volume.ignore(pos))
else if (s_to_volume.ignore(pos, expected))
command->move_destination_type = DataDestinationType::VOLUME;
else if (s_to_table.ignore(pos))
else if (s_to_table.ignore(pos, expected))
{
if (!parseDatabaseAndTableName(pos, expected, command->to_database, command->to_table))
return false;
@ -584,7 +584,7 @@ bool ParserAlterCommand::parseImpl(Pos & pos, ASTPtr & node, Expected & expected
if (!parser_partition.parse(pos, command_partition, expected))
return false;
if (s_from.ignore(pos))
if (s_from.ignore(pos, expected))
{
if (!parseDatabaseAndTableName(pos, expected, command->from_database, command->from_table))
return false;

View File

@ -66,13 +66,13 @@ bool ParserNestedTable::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
if (!name_p.parse(pos, name, expected))
return false;
if (!open.ignore(pos))
if (!open.ignore(pos, expected))
return false;
if (!columns_p.parse(pos, columns, expected))
return false;
if (!close.ignore(pos))
if (!close.ignore(pos, expected))
return false;
auto func = std::make_shared<ASTFunction>();

View File

@ -93,9 +93,9 @@ bool ParserDataType::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
else if (type_name_upper.find("INT") != std::string::npos)
{
/// Support SIGNED and UNSIGNED integer type modifiers for compatibility with MySQL
if (ParserKeyword(Keyword::SIGNED).ignore(pos))
if (ParserKeyword(Keyword::SIGNED).ignore(pos, expected))
type_name_suffix = toStringView(Keyword::SIGNED);
else if (ParserKeyword(Keyword::UNSIGNED).ignore(pos))
else if (ParserKeyword(Keyword::UNSIGNED).ignore(pos, expected))
type_name_suffix = toStringView(Keyword::UNSIGNED);
else if (pos->type == TokenType::OpeningRoundBracket)
{
@ -105,9 +105,9 @@ bool ParserDataType::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
if (pos->type != TokenType::ClosingRoundBracket)
return false;
++pos;
if (ParserKeyword(Keyword::SIGNED).ignore(pos))
if (ParserKeyword(Keyword::SIGNED).ignore(pos, expected))
type_name_suffix = toStringView(Keyword::SIGNED);
else if (ParserKeyword(Keyword::UNSIGNED).ignore(pos))
else if (ParserKeyword(Keyword::UNSIGNED).ignore(pos, expected))
type_name_suffix = toStringView(Keyword::UNSIGNED);
}

View File

@ -188,7 +188,7 @@ bool ParserDictionary::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
ASTPtr ast_settings;
/// The primary key clause is required to come first in the dictionary definition
if (primary_key_keyword.ignore(pos))
if (primary_key_keyword.ignore(pos, expected))
{
bool was_open = false;
@ -208,13 +208,13 @@ bool ParserDictionary::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
if (!ast_source && source_keyword.ignore(pos, expected))
{
if (!open.ignore(pos))
if (!open.ignore(pos, expected))
return false;
if (!key_value_pairs_p.parse(pos, ast_source, expected))
return false;
if (!close.ignore(pos))
if (!close.ignore(pos, expected))
return false;
continue;
@ -222,13 +222,13 @@ bool ParserDictionary::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
if (!ast_lifetime && lifetime_keyword.ignore(pos, expected))
{
if (!open.ignore(pos))
if (!open.ignore(pos, expected))
return false;
if (!lifetime_p.parse(pos, ast_lifetime, expected))
return false;
if (!close.ignore(pos))
if (!close.ignore(pos, expected))
return false;
continue;
@ -236,13 +236,13 @@ bool ParserDictionary::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
if (!ast_layout && layout_keyword.ignore(pos, expected))
{
if (!open.ignore(pos))
if (!open.ignore(pos, expected))
return false;
if (!layout_p.parse(pos, ast_layout, expected))
return false;
if (!close.ignore(pos))
if (!close.ignore(pos, expected))
return false;
continue;
@ -250,13 +250,13 @@ bool ParserDictionary::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
if (!ast_range && range_keyword.ignore(pos, expected))
{
if (!open.ignore(pos))
if (!open.ignore(pos, expected))
return false;
if (!range_p.parse(pos, ast_range, expected))
return false;
if (!close.ignore(pos))
if (!close.ignore(pos, expected))
return false;
continue;
@ -264,13 +264,13 @@ bool ParserDictionary::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
if (!ast_settings && settings_keyword.ignore(pos, expected))
{
if (!open.ignore(pos))
if (!open.ignore(pos, expected))
return false;
if (!settings_p.parse(pos, ast_settings, expected))
return false;
if (!close.ignore(pos))
if (!close.ignore(pos, expected))
return false;
continue;

View File

@ -72,8 +72,7 @@ bool ParserRenameQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
else
return false;
const auto ignore_delim = [&] { return exchange ? s_and.ignore(pos) : s_to.ignore(pos); };
const auto ignore_delim = [&] { return exchange ? s_and.ignore(pos, expected) : s_to.ignore(pos, expected); };
ASTRenameQuery::Elements elements;

View File

@ -161,7 +161,7 @@ bool ParserShowTablesQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expec
}
else
{
if (s_temporary.ignore(pos))
if (s_temporary.ignore(pos, expected))
query->temporary = true;
if (!s_tables.ignore(pos, expected))

View File

@ -323,6 +323,7 @@ bool ParserSystemQuery::parseImpl(IParser::Pos & pos, ASTPtr & node, Expected &
/// START/STOP DISTRIBUTED SENDS does not require a table
case Type::STOP_DISTRIBUTED_SENDS:
case Type::START_DISTRIBUTED_SENDS:
case Type::UNLOAD_PRIMARY_KEY:
{
if (!parseQueryWithOnClusterAndMaybeTable(res, pos, expected, /* require table = */ false, /* allow_string_literal = */ false))
return false;

View File

@ -109,17 +109,17 @@ bool ParserArrayJoin::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
}
void ParserTablesInSelectQueryElement::parseJoinStrictness(Pos & pos, ASTTableJoin & table_join)
static void parseJoinStrictness(IParser::Pos & pos, ASTTableJoin & table_join, Expected & expected)
{
if (ParserKeyword(Keyword::ANY).ignore(pos))
if (ParserKeyword(Keyword::ANY).ignore(pos, expected))
table_join.strictness = JoinStrictness::Any;
else if (ParserKeyword(Keyword::ALL).ignore(pos))
else if (ParserKeyword(Keyword::ALL).ignore(pos, expected))
table_join.strictness = JoinStrictness::All;
else if (ParserKeyword(Keyword::ASOF).ignore(pos))
else if (ParserKeyword(Keyword::ASOF).ignore(pos, expected))
table_join.strictness = JoinStrictness::Asof;
else if (ParserKeyword(Keyword::SEMI).ignore(pos))
else if (ParserKeyword(Keyword::SEMI).ignore(pos, expected))
table_join.strictness = JoinStrictness::Semi;
else if (ParserKeyword(Keyword::ANTI).ignore(pos) || ParserKeyword(Keyword::ONLY).ignore(pos))
else if (ParserKeyword(Keyword::ANTI).ignore(pos, expected) || ParserKeyword(Keyword::ONLY).ignore(pos, expected))
table_join.strictness = JoinStrictness::Anti;
}
@ -146,41 +146,41 @@ bool ParserTablesInSelectQueryElement::parseImpl(Pos & pos, ASTPtr & node, Expec
}
else
{
if (ParserKeyword(Keyword::GLOBAL).ignore(pos))
if (ParserKeyword(Keyword::GLOBAL).ignore(pos, expected))
table_join->locality = JoinLocality::Global;
else if (ParserKeyword(Keyword::LOCAL).ignore(pos))
else if (ParserKeyword(Keyword::LOCAL).ignore(pos, expected))
table_join->locality = JoinLocality::Local;
table_join->strictness = JoinStrictness::Unspecified;
/// Legacy: allow JOIN type before JOIN kind
parseJoinStrictness(pos, *table_join);
parseJoinStrictness(pos, *table_join, expected);
bool no_kind = false;
if (ParserKeyword(Keyword::INNER).ignore(pos))
if (ParserKeyword(Keyword::INNER).ignore(pos, expected))
table_join->kind = JoinKind::Inner;
else if (ParserKeyword(Keyword::LEFT).ignore(pos))
else if (ParserKeyword(Keyword::LEFT).ignore(pos, expected))
table_join->kind = JoinKind::Left;
else if (ParserKeyword(Keyword::RIGHT).ignore(pos))
else if (ParserKeyword(Keyword::RIGHT).ignore(pos, expected))
table_join->kind = JoinKind::Right;
else if (ParserKeyword(Keyword::FULL).ignore(pos))
else if (ParserKeyword(Keyword::FULL).ignore(pos, expected))
table_join->kind = JoinKind::Full;
else if (ParserKeyword(Keyword::CROSS).ignore(pos))
else if (ParserKeyword(Keyword::CROSS).ignore(pos, expected))
table_join->kind = JoinKind::Cross;
else if (ParserKeyword(Keyword::PASTE).ignore(pos))
else if (ParserKeyword(Keyword::PASTE).ignore(pos, expected))
table_join->kind = JoinKind::Paste;
else
no_kind = true;
/// Standard position: JOIN type after JOIN kind
parseJoinStrictness(pos, *table_join);
parseJoinStrictness(pos, *table_join, expected);
/// Optional OUTER keyword for outer joins.
if (table_join->kind == JoinKind::Left
|| table_join->kind == JoinKind::Right
|| table_join->kind == JoinKind::Full)
{
ParserKeyword(Keyword::OUTER).ignore(pos);
ParserKeyword(Keyword::OUTER).ignore(pos, expected);
}
if (no_kind)

View File

@ -38,8 +38,6 @@ protected:
private:
bool is_first;
bool allow_alias_without_as_keyword;
static void parseJoinStrictness(Pos & pos, ASTTableJoin & table_join);
};

View File

@@ -85,6 +85,8 @@ std::optional<AggregationAnalysisResult> analyzeAggregation(const QueryTreeNodeP
     bool group_by_use_nulls = planner_context->getQueryContext()->getSettingsRef().group_by_use_nulls &&
         (query_node.isGroupByWithGroupingSets() || query_node.isGroupByWithRollup() || query_node.isGroupByWithCube());

+    bool is_secondary_query = planner_context->getQueryContext()->getClientInfo().query_kind == ClientInfo::QueryKind::SECONDARY_QUERY;
+
     if (query_node.hasGroupBy())
     {
         if (query_node.isGroupByWithGroupingSets())
@@ -100,7 +102,7 @@ std::optional<AggregationAnalysisResult> analyzeAggregation(const QueryTreeNodeP
                 auto is_constant_key = grouping_set_key_node->as<ConstantNode>() != nullptr;
                 group_by_with_constant_keys |= is_constant_key;

-                if (is_constant_key && !aggregates_descriptions.empty())
+                if (!is_secondary_query && is_constant_key && !aggregates_descriptions.empty())
                     continue;

                 auto expression_dag_nodes = actions_visitor.visit(before_aggregation_actions, grouping_set_key_node);
@@ -152,7 +154,7 @@ std::optional<AggregationAnalysisResult> analyzeAggregation(const QueryTreeNodeP
             auto is_constant_key = group_by_key_node->as<ConstantNode>() != nullptr;
             group_by_with_constant_keys |= is_constant_key;

-            if (is_constant_key && !aggregates_descriptions.empty())
+            if (!is_secondary_query && is_constant_key && !aggregates_descriptions.empty())
                 continue;

             auto expression_dag_nodes = actions_visitor.visit(before_aggregation_actions, group_by_key_node);
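
The planner hunks above stop skipping constant `GROUP BY` keys when the query is a secondary query, i.e. the shard-local stage of a distributed query; presumably the initiator still needs the key column that the shard would otherwise optimize away. A toy model of the added guard (names and types are simplified, not the real planner interfaces):

```cpp
#include <iostream>

/// Simplified model: constant GROUP BY keys are dropped from the aggregation
/// input only on the initiator, never on the per-shard (secondary) stage.
enum class QueryKind { Initial, Secondary };

bool shouldSkipConstantKey(QueryKind kind, bool is_constant_key, bool has_aggregates)
{
    return kind != QueryKind::Secondary && is_constant_key && has_aggregates;
}

int main()
{
    std::cout << shouldSkipConstantKey(QueryKind::Initial, true, true) << '\n';   /// 1: skipped
    std::cout << shouldSkipConstantKey(QueryKind::Secondary, true, true) << '\n'; /// 0: kept for the initiator
}
```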


@@ -131,7 +131,7 @@ std::shared_ptr<NumpyDataType> parseType(String type)
     NumpyDataType::Endianness endianness;
     if (type[0] == '<')
         endianness = NumpyDataType::Endianness::LITTLE;
-    else if (type[1] == '>')
+    else if (type[0] == '>')
         endianness = NumpyDataType::Endianness::BIG;
     else if (type[0] == '|')
         endianness = NumpyDataType::Endianness::NONE;
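
Context for the one-character fix above: in NumPy dtype strings the byte-order flag is always the first character (`<` little-endian, `>` big-endian, `|` not applicable), so testing `type[1]` for `>` could never match. A small standalone illustration with assumed example dtype strings:

```cpp
#include <cassert>
#include <string>

enum class Endianness { LITTLE, BIG, NONE };

/// The first character of a NumPy dtype string ("<i4", ">f8", "|b1") encodes byte order.
Endianness parseEndianness(const std::string & type)
{
    assert(!type.empty());
    switch (type[0])
    {
        case '<': return Endianness::LITTLE;
        case '>': return Endianness::BIG;
        default:  return Endianness::NONE; /// '|' and anything else
    }
}

int main()
{
    assert(parseEndianness("<i4") == Endianness::LITTLE);
    assert(parseEndianness(">f8") == Endianness::BIG);  /// exactly the case the bug broke
    assert(parseEndianness("|b1") == Endianness::NONE);
}
```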


@@ -92,7 +92,7 @@ public:
         /// Just heuristic. We need one thread for collecting, one thread for receiving chunks
         /// and n threads for formatting.
-        processing_units.resize(params.max_threads_for_parallel_formatting + 2);
+        processing_units.resize(std::min(params.max_threads_for_parallel_formatting + 2, size_t{1024}));

         /// Do not put any code that could throw an exception under this line.
         /// Because otherwise the destructor of this class won't be called and this thread won't be joined.
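
The cap above bounds the number of pre-allocated processing units, which previously scaled directly with the thread setting. A worked example with a hypothetical (absurdly high) setting value:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>

int main()
{
    size_t max_threads_for_parallel_formatting = 10'000; /// hypothetical extreme setting
    size_t units_before = max_threads_for_parallel_formatting + 2;
    size_t units_after = std::min(max_threads_for_parallel_formatting + 2, size_t{1024});
    std::printf("%zu -> %zu\n", units_before, units_after); /// 10002 -> 1024
}
```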


@@ -47,16 +47,6 @@ void MergingSortedAlgorithm::addInput()
     cursors.emplace_back();
 }

-static void prepareChunk(Chunk & chunk)
-{
-    auto num_rows = chunk.getNumRows();
-    auto columns = chunk.detachColumns();
-    for (auto & column : columns)
-        column = column->convertToFullColumnIfConst();
-    chunk.setColumns(std::move(columns), num_rows);
-}
-
 void MergingSortedAlgorithm::initialize(Inputs inputs)
 {
     current_inputs = std::move(inputs);
@@ -68,7 +58,7 @@ void MergingSortedAlgorithm::initialize(Inputs inputs)
         if (!chunk)
             continue;

-        prepareChunk(chunk);
+        convertToFullIfConst(chunk);
         cursors[source_num] = SortCursorImpl(header, chunk.getColumns(), description, source_num);
     }
@@ -92,7 +82,7 @@ void MergingSortedAlgorithm::initialize(Inputs inputs)
 void MergingSortedAlgorithm::consume(Input & input, size_t source_num)
 {
-    prepareChunk(input.chunk);
+    convertToFullIfConst(input.chunk);
     current_inputs[source_num].swap(input);
     cursors[source_num].reset(current_inputs[source_num].chunk.getColumns(), header);
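
The file-local `prepareChunk` helper is removed in favor of a shared `convertToFullIfConst` utility with the same behavior (the same call also appears in the `MergeSorter` hunk further down), so the const-to-full conversion is defined once. Judging from the deleted code, the shared helper has this shape (a sketch reconstructed from the removed lines, using ClickHouse's `Chunk` API as shown above; the real declaration lives in a common header):

```cpp
/// Materialize any ColumnConst into a full column so sort cursors can index rows.
void convertToFullIfConst(Chunk & chunk)
{
    auto num_rows = chunk.getNumRows();
    auto columns = chunk.detachColumns();
    for (auto & column : columns)
        column = column->convertToFullColumnIfConst();
    chunk.setColumns(std::move(columns), num_rows);
}
```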


@@ -131,17 +131,17 @@ public:
         /// Some suffix of index columns might not be loaded (see `primary_key_ratio_of_unique_prefix_values_to_skip_suffix_columns`)
         /// and we need to use the same set of index columns across all parts.
         for (const auto & part : parts)
-            loaded_columns = std::min(loaded_columns, part.data_part->getIndex().size());
+            loaded_columns = std::min(loaded_columns, part.data_part->getIndex()->size());
     }

     Values getValue(size_t part_idx, size_t mark) const
     {
         const auto & index = parts[part_idx].data_part->getIndex();
-        chassert(index.size() >= loaded_columns);
+        chassert(index->size() >= loaded_columns);
         Values values(loaded_columns);
         for (size_t i = 0; i < loaded_columns; ++i)
         {
-            index[i]->get(mark, values[i]);
+            index->at(i)->get(mark, values[i]);
             if (values[i].isNull())
                 values[i] = POSITIVE_INFINITY;
         }


@@ -367,9 +367,9 @@ namespace
 }

-bool isMongoDBWireProtocolOld(Poco::MongoDB::Connection & connection_)
+bool isMongoDBWireProtocolOld(Poco::MongoDB::Connection & connection_, const std::string & database_name_)
 {
-    Poco::MongoDB::Database db("config");
+    Poco::MongoDB::Database db(database_name_);
     Poco::MongoDB::Document::Ptr doc = db.queryServerHello(connection_, false);

     if (doc->exists("maxWireVersion"))
@@ -395,7 +395,7 @@ MongoDBCursor::MongoDBCursor(
     const Block & sample_block_to_select,
     const Poco::MongoDB::Document & query,
     Poco::MongoDB::Connection & connection)
-    : is_wire_protocol_old(isMongoDBWireProtocolOld(connection))
+    : is_wire_protocol_old(isMongoDBWireProtocolOld(connection, database))
 {
     Poco::MongoDB::Document projection;
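
The wire-protocol probe above now runs the server handshake against the caller's own database rather than the hard-coded `config` database, presumably because credentials may only be valid for that database. The resulting call shape, assembled from the lines in the hunk (a sketch; the real version-threshold check is more detailed):

```cpp
#include <Poco/MongoDB/Connection.h>
#include <Poco/MongoDB/Database.h>
#include <Poco/MongoDB/Document.h>
#include <string>

bool isOldWireProtocol(Poco::MongoDB::Connection & connection, const std::string & database_name)
{
    Poco::MongoDB::Database db(database_name); /// was: Poco::MongoDB::Database db("config");
    Poco::MongoDB::Document::Ptr doc = db.queryServerHello(connection, false);
    /// Simplified: the real code also compares the maxWireVersion value against a threshold.
    return !doc->exists("maxWireVersion");
}
```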


@@ -33,7 +33,7 @@ struct MongoDBArrayInfo
 void authenticate(Poco::MongoDB::Connection & connection, const std::string & database, const std::string & user, const std::string & password);

-bool isMongoDBWireProtocolOld(Poco::MongoDB::Connection & connection_);
+bool isMongoDBWireProtocolOld(Poco::MongoDB::Connection & connection_, const std::string & database_name_);

 class MongoDBCursor
 {


@@ -39,6 +39,9 @@ MergeSorter::MergeSorter(const Block & header, Chunks chunks_, SortDescription &
         /// which can be inefficient.
         convertToFullIfSparse(chunk);

+        /// Convert to full column, because some cursors expect non-constant columns
+        convertToFullIfConst(chunk);
+
         cursors.emplace_back(header, chunk.getColumns(), description, chunk_index);
         has_collation |= cursors.back().has_collation;


@@ -326,6 +326,7 @@ IMergeTreeDataPart::IMergeTreeDataPart(
     incrementStateMetric(state);
     incrementTypeMetric(part_type);

+    index = std::make_shared<Columns>();
     minmax_idx = std::make_shared<MinMaxIndex>();

     initializeIndexGranularityInfo();
@@ -339,7 +340,7 @@ IMergeTreeDataPart::~IMergeTreeDataPart()
 }

-const IMergeTreeDataPart::Index & IMergeTreeDataPart::getIndex() const
+IMergeTreeDataPart::Index IMergeTreeDataPart::getIndex() const
 {
     std::scoped_lock lock(index_mutex);
     if (!index_loaded)
@@ -349,15 +350,21 @@ const IMergeTreeDataPart::Index & IMergeTreeDataPart::getIndex() const
 }

-void IMergeTreeDataPart::setIndex(Columns index_)
+void IMergeTreeDataPart::setIndex(Index index_)
 {
     std::scoped_lock lock(index_mutex);
-    if (!index.empty())
+    if (!index->empty())
         throw Exception(ErrorCodes::LOGICAL_ERROR, "The index of data part can be set only once");
-    index = std::move(index_);
+    index = index_;
     index_loaded = true;
 }

+void IMergeTreeDataPart::unloadIndex()
+{
+    std::scoped_lock lock(index_mutex);
+    index = std::make_shared<Columns>();
+    index_loaded = false;
+}
+
 void IMergeTreeDataPart::setName(const String & new_name)
 {
@@ -567,7 +574,7 @@ UInt64 IMergeTreeDataPart::getIndexSizeInBytes() const
 {
     std::scoped_lock lock(index_mutex);
     UInt64 res = 0;
-    for (const ColumnPtr & column : index)
+    for (const ColumnPtr & column : *index)
         res += column->byteSize();
     return res;
 }
@@ -576,7 +583,7 @@ UInt64 IMergeTreeDataPart::getIndexSizeInAllocatedBytes() const
 {
     std::scoped_lock lock(index_mutex);
     UInt64 res = 0;
-    for (const ColumnPtr & column : index)
+    for (const ColumnPtr & column : *index)
         res += column->allocatedBytes();
     return res;
 }
@@ -843,10 +850,6 @@ void IMergeTreeDataPart::loadIndex() const
     /// Memory for index must not be accounted as memory usage for query, because it belongs to a table.
     MemoryTrackerBlockerInThread temporarily_disable_memory_tracker;

-    /// It can be empty in case of mutations
-    if (!index_granularity.isInitialized())
-        throw Exception(ErrorCodes::LOGICAL_ERROR, "Index granularity is not loaded before index loading");
-
     auto metadata_snapshot = storage.getInMemoryMetadataPtr();
     if (parent_part)
         metadata_snapshot = metadata_snapshot->projections.get(name).metadata;
@@ -910,7 +913,7 @@ void IMergeTreeDataPart::loadIndex() const
         if (!index_file->eof())
             throw Exception(ErrorCodes::EXPECTED_END_OF_FILE, "Index file {} is unexpectedly long", index_path);

-        index.assign(std::make_move_iterator(loaded_index.begin()), std::make_move_iterator(loaded_index.end()));
+        index->assign(std::make_move_iterator(loaded_index.begin()), std::make_move_iterator(loaded_index.end()));
     }
 }


@@ -79,7 +79,7 @@ public:
     using ColumnSizeByName = std::unordered_map<std::string, ColumnSize>;
     using NameToNumber = std::unordered_map<std::string, size_t>;

-    using Index = Columns;
+    using Index = std::shared_ptr<Columns>;
     using IndexSizeByName = std::unordered_map<std::string, ColumnSize>;

     using Type = MergeTreeDataPartType;
@@ -367,8 +367,9 @@ public:
     /// Version of part metadata (columns, pk and so on). Managed properly only for replicated merge tree.
     int32_t metadata_version;

-    const Index & getIndex() const;
-    void setIndex(Columns index_);
+    Index getIndex() const;
+    void setIndex(Index index_);
+    void unloadIndex();

     /// For data in RAM ('index')
     UInt64 getIndexSizeInBytes() const;
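
The switch of `IMergeTreeDataPart::Index` from a plain `Columns` to `std::shared_ptr<Columns>` is what makes the new `unloadIndex()` safe: `getIndex()` now hands out a reference-counted snapshot, so a reader keeps its copy alive even if the part unloads (and later reloads) the primary key concurrently. A minimal toy model of that pattern, with nothing ClickHouse-specific assumed:

```cpp
#include <memory>
#include <mutex>
#include <vector>

/// Toy model of the copy-out-under-lock pattern introduced above.
class Part
{
public:
    using Index = std::shared_ptr<std::vector<int>>;

    Index getIndex() const
    {
        std::scoped_lock lock(mutex);
        if (!index)
            index = std::make_shared<std::vector<int>>(std::vector<int>{1, 2, 3}); /// lazy load
        return index; /// shared_ptr copy: the caller's snapshot survives unloadIndex()
    }

    void unloadIndex()
    {
        std::scoped_lock lock(mutex);
        index = std::make_shared<std::vector<int>>(); /// readers holding the old snapshot are unaffected
    }

private:
    mutable std::mutex mutex;
    mutable Index index;
};

int main()
{
    Part part;
    auto snapshot = part.getIndex(); /// refcounted copy
    part.unloadIndex();              /// does not invalidate `snapshot`
    return static_cast<int>(snapshot->size()); /// still 3
}
```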


@@ -2864,9 +2864,9 @@ bool KeyCondition::mayBeTrueInRange(
 String KeyCondition::RPNElement::toString() const
 {
     if (argument_num_of_space_filling_curve)
-        return toString(fmt::format("argument {} of column {}", *argument_num_of_space_filling_curve, key_column), false);
+        return toString(fmt::format("argument {} of column {}", *argument_num_of_space_filling_curve, key_column), true);
     else
-        return toString(fmt::format("column {}", key_column), false);
+        return toString(fmt::format("column {}", key_column), true);
 }

 String KeyCondition::RPNElement::toString(std::string_view column_name, bool print_constants) const


@@ -860,6 +860,42 @@ void MergeTreeData::checkTTLExpressions(const StorageInMemoryMetadata & new_meta
     }
 }

+namespace
+{
+template <typename TMustHaveDataType>
+void checkSpecialColumn(const std::string_view column_meta_name, const AlterCommand & command)
+{
+    if (command.type == AlterCommand::MODIFY_COLUMN)
+    {
+        if (!typeid_cast<const TMustHaveDataType *>(command.data_type.get()))
+        {
+            throw Exception(
+                ErrorCodes::ALTER_OF_COLUMN_IS_FORBIDDEN,
+                "Cannot alter {} column ({}) to type {}, because it must have type {}",
+                column_meta_name,
+                command.column_name,
+                command.data_type->getName(),
+                TypeName<TMustHaveDataType>);
+        }
+    }
+    else if (command.type == AlterCommand::DROP_COLUMN)
+    {
+        throw Exception(
+            ErrorCodes::ALTER_OF_COLUMN_IS_FORBIDDEN,
+            "Trying to ALTER DROP {} ({}) column",
+            column_meta_name,
+            backQuoteIfNeed(command.column_name));
+    }
+    else if (command.type == AlterCommand::RENAME_COLUMN)
+    {
+        throw Exception(
+            ErrorCodes::ALTER_OF_COLUMN_IS_FORBIDDEN,
+            "Trying to ALTER RENAME {} ({}) column",
+            column_meta_name,
+            backQuoteIfNeed(command.column_name));
+    }
+};
+}
+
 void MergeTreeData::checkStoragePolicy(const StoragePolicyPtr & new_storage_policy) const
 {
@@ -985,6 +1021,11 @@ void MergeTreeData::MergingParams::check(const StorageInMemoryMetadata & metadat
     if (mode == MergingParams::Summing)
     {
+        auto columns_to_sum_copy = columns_to_sum;
+        std::sort(columns_to_sum_copy.begin(), columns_to_sum_copy.end());
+        if (const auto it = std::adjacent_find(columns_to_sum_copy.begin(), columns_to_sum_copy.end()); it != columns_to_sum_copy.end())
+            throw Exception(ErrorCodes::BAD_ARGUMENTS, "Column {} is listed multiple times in the list of columns to sum", *it);
+
         /// If columns_to_sum are set, then check that such columns exist.
         for (const auto & column_to_sum : columns_to_sum)
         {
@@ -993,9 +1034,10 @@ void MergeTreeData::MergingParams::check(const StorageInMemoryMetadata & metadat
                 return column_to_sum == Nested::extractTableName(name_and_type.name);
             };

             if (columns.end() == std::find_if(columns.begin(), columns.end(), check_column_to_sum_exists))
-                throw Exception(ErrorCodes::NO_SUCH_COLUMN_IN_TABLE,
-                    "Column {} listed in columns to sum does not exist in table declaration.",
-                    column_to_sum);
+                throw Exception(
+                    ErrorCodes::NO_SUCH_COLUMN_IN_TABLE,
+                    "Column {} listed in columns to sum does not exist in table declaration.",
+                    column_to_sum);
         }

         /// Check that summing columns are not in partition key.
@@ -1016,12 +1058,18 @@ void MergeTreeData::MergingParams::check(const StorageInMemoryMetadata & metadat
     if (mode == MergingParams::Replacing)
     {
+        if (!version_column.empty() && version_column == is_deleted_column)
+            throw Exception(ErrorCodes::BAD_ARGUMENTS, "The version and is_deleted column cannot be the same column ({})", version_column);
+
         check_is_deleted_column(true, "ReplacingMergeTree");
         check_version_column(true, "ReplacingMergeTree");
     }

     if (mode == MergingParams::VersionedCollapsing)
     {
+        if (!version_column.empty() && version_column == sign_column)
+            throw Exception(ErrorCodes::BAD_ARGUMENTS, "The version and sign column cannot be the same column ({})", version_column);
+
         check_sign_column(false, "VersionedCollapsingMergeTree");
         check_version_column(false, "VersionedCollapsingMergeTree");
     }
@@ -2964,6 +3012,10 @@ void MergeTreeData::checkAlterIsPossible(const AlterCommands & commands, Context
         throw Exception(ErrorCodes::SUPPORT_IS_DISABLED,
             "Experimental Inverted Index feature is not enabled (turn on setting 'allow_experimental_inverted_index')");

+    for (const auto & disk : getDisks())
+        if (!disk->supportsHardLinks())
+            throw Exception(ErrorCodes::SUPPORT_IS_DISABLED, "ALTER TABLE is not supported for immutable disk '{}'", disk->getName());
+
     /// Set of columns that shouldn't be altered.
     NameSet columns_alter_type_forbidden;
@@ -3070,6 +3122,14 @@ void MergeTreeData::checkAlterIsPossible(const AlterCommands & commands, Context
                     "Trying to ALTER RENAME version {} column", backQuoteIfNeed(command.column_name));
             }
         }
+        else if (command.column_name == merging_params.is_deleted_column)
+        {
+            checkSpecialColumn<DataTypeUInt8>("is_deleted", command);
+        }
+        else if (command.column_name == merging_params.sign_column)
+        {
+            checkSpecialColumn<DataTypeUInt8>("sign", command);
+        }

         if (command.type == AlterCommand::MODIFY_QUERY)
             throw Exception(ErrorCodes::NOT_IMPLEMENTED,
@@ -3334,7 +3394,9 @@ void MergeTreeData::checkAlterIsPossible(const AlterCommands & commands, Context
 void MergeTreeData::checkMutationIsPossible(const MutationCommands & /*commands*/, const Settings & /*settings*/) const
 {
-    /// Some validation will be added
+    for (const auto & disk : getDisks())
+        if (!disk->supportsHardLinks())
+            throw Exception(ErrorCodes::SUPPORT_IS_DISABLED, "Mutations are not supported for immutable disk '{}'", disk->getName());
 }

 MergeTreeDataPartFormat MergeTreeData::choosePartFormat(size_t bytes_uncompressed, size_t rows_count) const
@@ -4824,6 +4886,11 @@ void MergeTreeData::removePartContributionToColumnAndSecondaryIndexSizes(const D
 void MergeTreeData::checkAlterPartitionIsPossible(
     const PartitionCommands & commands, const StorageMetadataPtr & /*metadata_snapshot*/, const Settings & settings, ContextPtr local_context) const
 {
+    for (const auto & disk : getDisks())
+        if (!disk->supportsHardLinks())
+            throw Exception(
+                ErrorCodes::SUPPORT_IS_DISABLED, "ALTER TABLE PARTITION is not supported for immutable disk '{}'", disk->getName());
+
     for (const auto & command : commands)
     {
         if (command.type == PartitionCommand::DROP_DETACHED_PARTITION
@@ -6811,7 +6878,7 @@ Block MergeTreeData::getMinMaxCountProjectionBlock(
         {
             for (const auto & part : real_parts)
             {
-                const auto & primary_key_column = *part->getIndex()[0];
+                const auto & primary_key_column = *part->getIndex()->at(0);
                 auto & min_column = assert_cast<ColumnAggregateFunction &>(*partition_minmax_count_columns[pos]);
                 insert(min_column, primary_key_column[0]);
             }
@@ -6822,7 +6889,7 @@ Block MergeTreeData::getMinMaxCountProjectionBlock(
         {
             for (const auto & part : real_parts)
             {
-                const auto & primary_key_column = *part->getIndex()[0];
+                const auto & primary_key_column = *part->getIndex()->at(0);
                 auto & max_column = assert_cast<ColumnAggregateFunction &>(*partition_minmax_count_columns[pos]);
                 insert(max_column, primary_key_column[primary_key_column.size() - 1]);
             }
@@ -8372,6 +8439,14 @@ bool MergeTreeData::initializeDiskOnConfigChange(const std::set<String> & new_ad
     return true;
 }

+void MergeTreeData::unloadPrimaryKeys()
+{
+    for (auto & part : getAllDataPartsVector())
+    {
+        const_cast<IMergeTreeDataPart &>(*part).unloadIndex();
+    }
+}
+
 bool updateAlterConversionsMutations(const MutationCommands & commands, std::atomic<ssize_t> & alter_conversions_mutations, bool remove)
 {
     for (const auto & command : commands)
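
Among the validations added to this file, the duplicate detection for `columns_to_sum` uses the standard sort-then-`adjacent_find` idiom: after sorting, any duplicates become neighbors, so a single linear scan finds them. In isolation, with a hypothetical column list:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> columns_to_sum = {"b", "a", "b"}; /// hypothetical column list
    auto copy = columns_to_sum;                                /// sort a copy, keep the user's order intact
    std::sort(copy.begin(), copy.end());
    auto it = std::adjacent_find(copy.begin(), copy.end());   /// first pair of equal neighbors
    assert(it != copy.end() && *it == "b");                    /// duplicate found -> would throw BAD_ARGUMENTS
}
```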


@@ -1090,6 +1090,8 @@ public:
     static VirtualColumnsDescription createVirtuals(const StorageInMemoryMetadata & metadata);

+    void unloadPrimaryKeys();
+
 protected:
     friend class IMergeTreeDataPart;
     friend class MergeTreeDataMergerMutator;


@@ -1021,8 +1021,8 @@ MarkRanges MergeTreeDataSelectExecutor::markRangesFromPKRange(
     DataTypes key_types;
     for (size_t i : key_indices)
     {
-        if (i < index.size())
-            index_columns->emplace_back(index[i], primary_key.data_types[i], primary_key.column_names[i]);
+        if (i < index->size())
+            index_columns->emplace_back(index->at(i), primary_key.data_types[i], primary_key.column_names[i]);
         else
             index_columns->emplace_back(); /// The column of the primary key was not loaded in memory - we'll skip it.


@@ -181,7 +181,7 @@ MergedBlockOutputStream::Finalizer MergedBlockOutputStream::finalizePartAsync(
     new_part->rows_count = rows_count;
     new_part->modification_time = time(nullptr);
-    new_part->setIndex(writer->releaseIndexColumns());
+    new_part->setIndex(std::make_shared<Columns>(writer->releaseIndexColumns()));
     new_part->checksums = checksums;
     new_part->setBytesOnDisk(checksums.getTotalSizeOnDisk());
     new_part->setBytesUncompressedOnDisk(checksums.getTotalSizeUncompressedOnDisk());


@@ -1392,7 +1392,7 @@ Chunk StorageFileSource::generate()
                 chassert(file_num > 0);

-                const auto max_parsing_threads = std::max<size_t>(settings.max_threads / file_num, 1UL);
+                const auto max_parsing_threads = std::max<size_t>(settings.max_parsing_threads / file_num, 1UL);
                 input_format = FormatFactory::instance().getInput(
                     storage->format_name, *read_buf, block_for_format, getContext(), max_block_size, storage->format_settings,
                     max_parsing_threads, std::nullopt, /*is_remote_fs*/ false, CompressionMethod::None, need_only_count);
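
This hunk, together with the `StorageS3` and `StorageURL` hunks below, repoints the per-stream parser-thread computation from the general `max_threads` setting to the dedicated `max_parsing_threads` setting. The arithmetic itself is unchanged: the thread budget is divided across the input streams, with a floor of one thread. A worked example with assumed numbers:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>

int main()
{
    size_t max_parsing_threads = 16; /// hypothetical value of the setting
    size_t file_num = 4;             /// number of files (streams) being read
    size_t per_stream = std::max<size_t>(max_parsing_threads / file_num, size_t{1});
    std::printf("%zu parsing threads per stream\n", per_stream); /// -> 4
}
```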


@@ -82,9 +82,9 @@ protected:
         /// Some of the columns from suffix of primary index may be not loaded
         /// according to setting 'primary_key_ratio_of_unique_prefix_values_to_skip_suffix_columns'.
-        if (index_position < index.size())
+        if (index_position < index->size())
         {
-            result_columns[pos] = index[index_position];
+            result_columns[pos] = index->at(index_position);
         }
         else
         {


@@ -101,7 +101,7 @@ public:
         , db_name(db_name_)
         , metadata_snapshot{metadata_snapshot_}
         , connection(connection_)
-        , is_wire_protocol_old(isMongoDBWireProtocolOld(*connection_))
+        , is_wire_protocol_old(isMongoDBWireProtocolOld(*connection_, db_name))
     {
     }


@@ -42,6 +42,8 @@ Poco::Net::StreamSocket StorageMongoDBSocketFactory::createSecureSocket(const st
     Poco::Net::SocketAddress address(host, port);
     Poco::Net::SecureStreamSocket socket;

+    socket.setPeerHostName(host);
+
     socket.connect(address, connectTimeout);

     return socket;


@@ -1239,8 +1239,8 @@ void ReadFromStorageS3Step::initializePipeline(QueryPipelineBuilder & pipeline,
         /// Disclosed glob iterator can underestimate the amount of keys in some cases. We will keep one stream for this particular case.
         num_streams = 1;

-    const size_t max_threads = context->getSettingsRef().max_threads;
-    const size_t max_parsing_threads = num_streams >= max_threads ? 1 : (max_threads / std::max(num_streams, 1ul));
+    const auto & settings = context->getSettingsRef();
+    const size_t max_parsing_threads = num_streams >= settings.max_parsing_threads ? 1 : (settings.max_parsing_threads / std::max(num_streams, 1ul));
     LOG_DEBUG(getLogger("StorageS3"), "Reading in {} streams, {} threads per stream", num_streams, max_parsing_threads);

     Pipes pipes;


@@ -1171,8 +1171,8 @@ void ReadFromURL::initializePipeline(QueryPipelineBuilder & pipeline, const Buil
     Pipes pipes;
     pipes.reserve(num_streams);

-    const size_t max_threads = context->getSettingsRef().max_threads;
-    const size_t max_parsing_threads = num_streams >= max_threads ? 1 : (max_threads / num_streams);
+    const auto & settings = context->getSettingsRef();
+    const size_t max_parsing_threads = num_streams >= settings.max_parsing_threads ? 1 : (settings.max_parsing_threads / num_streams);

     for (size_t i = 0; i < num_streams; ++i)
     {
@@ -1203,7 +1203,7 @@ void ReadFromURL::initializePipeline(QueryPipelineBuilder & pipeline, const Buil
     auto pipe = Pipe::unitePipes(std::move(pipes));
     size_t output_ports = pipe.numOutputPorts();
-    const bool parallelize_output = context->getSettingsRef().parallelize_output_from_storages;
+    const bool parallelize_output = settings.parallelize_output_from_storages;
     if (parallelize_output && storage->parallelizeOutputAfterReading(context) && output_ports > 0 && output_ports < max_num_streams)
         pipe.resize(max_num_streams);


@@ -2574,7 +2574,7 @@ def reportLogStats(args):
         count() AS count,
         substr(replaceRegexpAll(message, '[^A-Za-z]+', ''), 1, 32) AS pattern,
         substr(any(message), 1, 256) as runtime_message,
-        any((extract(source_file, '\/[a-zA-Z0-9_]+\.[a-z]+'), source_line)) as line
+        any((extract(source_file, '/[a-zA-Z0-9_]+\\.[a-z]+'), source_line)) as line
     FROM system.text_log
     WHERE (now() - toIntervalMinute(mins)) < event_time AND message_format_string = ''
     GROUP BY pattern
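
The pattern fix above matters because the string passes through two escape layers: the host-language string literal and the regex engine. Newer Python versions warn on invalid string escapes such as `\/`, and a literal regex dot must reach the engine as `\.`, which in a plain string literal has to be written `\\.`. The same layering applies in C++:

```cpp
#include <cstdio>

int main()
{
    /// Written "\\." -> the regex engine receives "\." (match a literal dot).
    /// '/' is not a regex metacharacter, so it needs no escape at all.
    const char * pattern = "/[a-zA-Z0-9_]+\\.[a-z]+";
    std::puts(pattern); /// prints: /[a-zA-Z0-9_]+\.[a-z]+
}
```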


@@ -185,7 +185,7 @@ def test_grant_all_on_table():
     instance.query("GRANT ALL ON test.table TO B", user="A")
     assert (
         instance.query("SHOW GRANTS FOR B")
-        == "GRANT SHOW TABLES, SHOW COLUMNS, SHOW DICTIONARIES, SELECT, INSERT, ALTER TABLE, ALTER VIEW, CREATE TABLE, CREATE VIEW, CREATE DICTIONARY, DROP TABLE, DROP VIEW, DROP DICTIONARY, UNDROP TABLE, TRUNCATE, OPTIMIZE, BACKUP, CREATE ROW POLICY, ALTER ROW POLICY, DROP ROW POLICY, SHOW ROW POLICIES, SYSTEM MERGES, SYSTEM TTL MERGES, SYSTEM FETCHES, SYSTEM MOVES, SYSTEM PULLING REPLICATION LOG, SYSTEM CLEANUP, SYSTEM VIEWS, SYSTEM SENDS, SYSTEM REPLICATION QUEUES, SYSTEM VIRTUAL PARTS UPDATE, SYSTEM DROP REPLICA, SYSTEM SYNC REPLICA, SYSTEM RESTART REPLICA, SYSTEM RESTORE REPLICA, SYSTEM WAIT LOADING PARTS, SYSTEM FLUSH DISTRIBUTED, dictGet ON test.`table` TO B\n"
+        == "GRANT SHOW TABLES, SHOW COLUMNS, SHOW DICTIONARIES, SELECT, INSERT, ALTER TABLE, ALTER VIEW, CREATE TABLE, CREATE VIEW, CREATE DICTIONARY, DROP TABLE, DROP VIEW, DROP DICTIONARY, UNDROP TABLE, TRUNCATE, OPTIMIZE, BACKUP, CREATE ROW POLICY, ALTER ROW POLICY, DROP ROW POLICY, SHOW ROW POLICIES, SYSTEM MERGES, SYSTEM TTL MERGES, SYSTEM FETCHES, SYSTEM MOVES, SYSTEM PULLING REPLICATION LOG, SYSTEM CLEANUP, SYSTEM VIEWS, SYSTEM SENDS, SYSTEM REPLICATION QUEUES, SYSTEM VIRTUAL PARTS UPDATE, SYSTEM DROP REPLICA, SYSTEM SYNC REPLICA, SYSTEM RESTART REPLICA, SYSTEM RESTORE REPLICA, SYSTEM WAIT LOADING PARTS, SYSTEM FLUSH DISTRIBUTED, SYSTEM UNLOAD PRIMARY KEY, dictGet ON test.`table` TO B\n"
     )
     instance.query("REVOKE ALL ON test.table FROM B", user="A")
     assert instance.query("SHOW GRANTS FOR B") == ""
