diff --git a/CHANGELOG.md b/CHANGELOG.md index 6c0d21a4698..90285582b4e 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,4 +1,5 @@ ### Table of Contents +**[ClickHouse release v24.10, 2024-10-31](#2410)**
**[ClickHouse release v24.9, 2024-09-26](#249)**
**[ClickHouse release v24.8 LTS, 2024-08-20](#248)**
**[ClickHouse release v24.7, 2024-07-30](#247)**
@@ -12,6 +13,165 @@ # 2024 Changelog +### ClickHouse release 24.10, 2024-10-31 + +#### Backward Incompatible Change +* Allow to write `SETTINGS` before `FORMAT` in a chain of queries with `UNION` when subqueries are inside parentheses. This closes [#39712](https://github.com/ClickHouse/ClickHouse/issues/39712). Change the behavior when a query has the SETTINGS clause specified twice in a sequence. The closest SETTINGS clause will have a preference for the corresponding subquery. In the previous versions, the outermost SETTINGS clause could take a preference over the inner one. [#68614](https://github.com/ClickHouse/ClickHouse/pull/68614) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Reordering of filter conditions from the `[PRE]WHERE` clause is now allowed by default. It can be disabled by setting `allow_reorder_prewhere_conditions` to `false`. [#70657](https://github.com/ClickHouse/ClickHouse/pull/70657) ([Nikita Taranov](https://github.com/nickitat)). +* Remove the `idxd-config` library, which has an incompatible license. This also removes the experimental Intel DeflateQPL codec. [#70987](https://github.com/ClickHouse/ClickHouse/pull/70987) ([Alexey Milovidov](https://github.com/alexey-milovidov)). + +#### New Feature +* Allow to grant access to wildcard prefixes: `GRANT SELECT ON db.table_prefix_* TO user`. [#65311](https://github.com/ClickHouse/ClickHouse/pull/65311) ([pufit](https://github.com/pufit)). +* If you press the space bar during query runtime, the client will display a real-time table with detailed metrics. You can enable it globally with the new `--progress-table` option in clickhouse-client; a new `--enable-progress-table-toggle` option is associated with the `--progress-table` option and toggles the rendering of the progress table by pressing the control key (Space). [#63689](https://github.com/ClickHouse/ClickHouse/pull/63689) ([Maria Khristenko](https://github.com/mariaKhr)), [#70423](https://github.com/ClickHouse/ClickHouse/pull/70423) ([Julia Kartseva](https://github.com/jkartseva)). +* Allow to cache read files for object storage table engines and data lakes, using a hash of the ETag + file path as the cache key. [#70135](https://github.com/ClickHouse/ClickHouse/pull/70135) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Support creating a table with a query: `CREATE TABLE ... CLONE AS ...`. It clones the source table's schema and then attaches all partitions to the newly created table. This feature is only supported with tables of the `MergeTree` family. Closes [#65015](https://github.com/ClickHouse/ClickHouse/issues/65015). [#69091](https://github.com/ClickHouse/ClickHouse/pull/69091) ([tuanpach](https://github.com/tuanpach)). +* Add a new system table, `system.query_metric_log`, which contains a history of memory and metric values from the table system.events for individual queries, periodically flushed to disk. [#66532](https://github.com/ClickHouse/ClickHouse/pull/66532) ([Pablo Marcos](https://github.com/pamarcos)). +* A simple SELECT query can be written with implicit SELECT to enable calculator-style expressions, e.g., `ch "1 + 2"`. This is controlled by a new setting, `implicit_select`. [#68502](https://github.com/ClickHouse/ClickHouse/pull/68502) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Support the `--copy` mode for clickhouse local as a shortcut for format conversion [#68503](https://github.com/ClickHouse/ClickHouse/issues/68503).
[#68583](https://github.com/ClickHouse/ClickHouse/pull/68583) ([Denis Hananein](https://github.com/denis-hananein)). +* Add a built-in HTML page for visualizing merges, which is available at the `/merges` path. [#70821](https://github.com/ClickHouse/ClickHouse/pull/70821) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Add support for the `arrayUnion` function. [#68989](https://github.com/ClickHouse/ClickHouse/pull/68989) ([Peter Nguyen](https://github.com/petern48)). +* Allow parametrised SQL aliases. [#50665](https://github.com/ClickHouse/ClickHouse/pull/50665) ([Anton Kozlov](https://github.com/tonickkozlov)). +* A new aggregate function `quantileExactWeightedInterpolated`, which is an interpolated version based on quantileExactWeighted. Some people may wonder why we need a new `quantileExactWeightedInterpolated` since we already have `quantileExactInterpolatedWeighted`. The reason is that the new one is more accurate than the old one. This is for Spark compatibility. [#69619](https://github.com/ClickHouse/ClickHouse/pull/69619) ([李扬](https://github.com/taiyang-li)). +* A new function `arrayElementOrNull`. It returns `NULL` if the array index is out of range or a Map key is not found. [#69646](https://github.com/ClickHouse/ClickHouse/pull/69646) ([李扬](https://github.com/taiyang-li)). +* Allows users to specify regular expressions through new `message_regexp` and `message_regexp_negative` fields in the `config.xml` file to filter out logging. The filtering is applied to the formatted, un-colored text for the most intuitive developer experience. [#69657](https://github.com/ClickHouse/ClickHouse/pull/69657) ([Peter Nguyen](https://github.com/petern48)). +* Added the `RIPEMD160` function, which computes the RIPEMD-160 cryptographic hash of a string. Example: `SELECT HEX(RIPEMD160('The quick brown fox jumps over the lazy dog'))` returns `37F332F68DB77BD9D7EDD4969571AD671CF9DD3B`. [#70087](https://github.com/ClickHouse/ClickHouse/pull/70087) ([Dergousov Maxim](https://github.com/m7kss1)). +* Support reading `Iceberg` tables on `HDFS`. [#70268](https://github.com/ClickHouse/ClickHouse/pull/70268) ([flynn](https://github.com/ucasfl)). +* Support for CTE in the form of `WITH ... INSERT`, as previously we only supported `INSERT ... WITH ...`. [#70593](https://github.com/ClickHouse/ClickHouse/pull/70593) ([Shichao Jin](https://github.com/jsc0218)). +* MongoDB integration: support for all MongoDB types, support for WHERE and ORDER BY statements on the MongoDB side, and restriction of expressions unsupported by MongoDB. Note that the new integration is disabled by default; to use it, please set `` to `false` in the server config. [#63279](https://github.com/ClickHouse/ClickHouse/pull/63279) ([Kirill Nikiforov](https://github.com/allmazz)). +* A new function `getSettingOrDefault` was added to return a default value and avoid an exception if a custom setting is not found in the current profile. [#69917](https://github.com/ClickHouse/ClickHouse/pull/69917) ([Shankar](https://github.com/shiyer7474)). + +#### Experimental feature +* Refreshable materialized views are production ready. [#70550](https://github.com/ClickHouse/ClickHouse/pull/70550) ([Michael Kolupaev](https://github.com/al13n321)). Refreshable materialized views are now supported in Replicated databases. [#60669](https://github.com/ClickHouse/ClickHouse/pull/60669) ([Michael Kolupaev](https://github.com/al13n321)). +* Parallel replicas are moved from experimental to beta. Reworked settings that control the behavior of parallel replicas algorithms.
A quick recap: ClickHouse has four different algorithms for parallel reading involving multiple replicas, which is reflected in the setting `parallel_replicas_mode`; its default value is `read_tasks`. Additionally, the toggle-switch setting `enable_parallel_replicas` has been added. [#63151](https://github.com/ClickHouse/ClickHouse/pull/63151) ([Alexey Milovidov](https://github.com/alexey-milovidov)), ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)). +* Support for the `Dynamic` type in most functions by executing them on internal types inside `Dynamic`. [#69691](https://github.com/ClickHouse/ClickHouse/pull/69691) ([Pavel Kruglov](https://github.com/Avogar)). +* Allow to read/write the `JSON` type as a binary string in `RowBinary` format under settings `input_format_binary_read_json_as_string/output_format_binary_write_json_as_string`. [#70288](https://github.com/ClickHouse/ClickHouse/pull/70288) ([Pavel Kruglov](https://github.com/Avogar)). +* Allow to serialize/deserialize a `JSON` column as a single String column in the Native format. For output, use the setting `output_format_native_write_json_as_string`. For input, use serialization version `1` before the column data. [#70312](https://github.com/ClickHouse/ClickHouse/pull/70312) ([Pavel Kruglov](https://github.com/Avogar)). +* Introduced a special (experimental) mode of the merge selector for MergeTree tables which makes it more aggressive for partitions that are close to the limit on the number of parts. It is controlled by the `merge_selector_use_blurry_base` MergeTree-level setting. [#70645](https://github.com/ClickHouse/ClickHouse/pull/70645) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)). +* Implement generic ser/de between Avro's `Union` and ClickHouse's `Variant` types. Resolves [#69713](https://github.com/ClickHouse/ClickHouse/issues/69713). [#69712](https://github.com/ClickHouse/ClickHouse/pull/69712) ([Jiří Kozlovský](https://github.com/jirislav)). + +#### Performance Improvement +* Refactor `IDisk` and `IObjectStorage` for better performance. Tables from `plain` and `plain_rewritable` object storages will initialize faster. [#68146](https://github.com/ClickHouse/ClickHouse/pull/68146) ([Alexey Milovidov](https://github.com/alexey-milovidov), [Julia Kartseva](https://github.com/jkartseva)). Do not call the LIST object storage API when determining if a file or directory exists on the plain rewritable disk, as it can be cost-inefficient. [#70852](https://github.com/ClickHouse/ClickHouse/pull/70852) ([Julia Kartseva](https://github.com/jkartseva)). Reduce the number of object storage HEAD API requests in the plain_rewritable disk. [#70915](https://github.com/ClickHouse/ClickHouse/pull/70915) ([Julia Kartseva](https://github.com/jkartseva)). +* Added an ability to parse data directly into sparse columns. [#69828](https://github.com/ClickHouse/ClickHouse/pull/69828) ([Anton Popov](https://github.com/CurtizJ)). +* Improved performance of parsing formats with a high number of missed values (e.g. `JSONEachRow`). [#69875](https://github.com/ClickHouse/ClickHouse/pull/69875) ([Anton Popov](https://github.com/CurtizJ)). +* Support parallel reading of Parquet row groups and prefetching of row groups in single-threaded mode. [#69862](https://github.com/ClickHouse/ClickHouse/pull/69862) ([LiuNeng](https://github.com/liuneng1994)). +* Support minmax index for `pointInPolygon`. [#62085](https://github.com/ClickHouse/ClickHouse/pull/62085) ([JackyWoo](https://github.com/JackyWoo)).
+* Use bloom filters when reading Parquet files. [#62966](https://github.com/ClickHouse/ClickHouse/pull/62966) ([Arthur Passos](https://github.com/arthurpassos)). +* Lock-free parts rename to avoid an INSERT affecting SELECT (due to the parts lock): under normal circumstances with `fsync_part_directory`, the QPS of SELECT with a parallel INSERT increased 2x; under heavy load the effect is even bigger. Note that this only includes `ReplicatedMergeTree` for now. [#64955](https://github.com/ClickHouse/ClickHouse/pull/64955) ([Azat Khuzhin](https://github.com/azat)). +* Respect `ttl_only_drop_parts` on `materialize ttl`; only read necessary columns to recalculate TTL and drop parts by replacing them with an empty one. [#65488](https://github.com/ClickHouse/ClickHouse/pull/65488) ([Andrey Zvonov](https://github.com/zvonand)). +* Optimized thread creation in the ThreadPool to minimize lock contention. Thread creation is now performed outside of the critical section to avoid delays in job scheduling and thread management under high load conditions. This leads to a much more responsive ClickHouse under heavy concurrent load. [#68694](https://github.com/ClickHouse/ClickHouse/pull/68694) ([filimonov](https://github.com/filimonov)). +* Enable reading `LowCardinality` string columns from `ORC`. [#69481](https://github.com/ClickHouse/ClickHouse/pull/69481) ([李扬](https://github.com/taiyang-li)). +* Use `LowCardinality` for `ProfileEvents` in system logs such as `part_log`, `query_views_log`, `filesystem_cache_log`. [#70152](https://github.com/ClickHouse/ClickHouse/pull/70152) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Improve performance of the `fromUnixTimestamp`/`toUnixTimestamp` functions. [#71042](https://github.com/ClickHouse/ClickHouse/pull/71042) ([kevinyhzou](https://github.com/KevinyhZou)). +* Don't disable nonblocking read from the page cache for the entire server when reading from blocking I/O. This was leading to poorer performance when a single filesystem (e.g., tmpfs) didn't support the `preadv2` syscall while others did. [#70299](https://github.com/ClickHouse/ClickHouse/pull/70299) ([Antonio Andelic](https://github.com/antonio2368)). +* `ALTER TABLE .. REPLACE PARTITION` doesn't wait anymore for mutations/merges that happen in other partitions. [#59138](https://github.com/ClickHouse/ClickHouse/pull/59138) ([Vasily Nemkov](https://github.com/Enmk)). +* Don't do validation when synchronizing ACLs from Keeper; they are already validated during creation. It shouldn't matter that much, but there are installations with tens of thousands of users or even more, and the unnecessary hash validation can take a long time to finish during server startup (it synchronizes everything from Keeper). [#70644](https://github.com/ClickHouse/ClickHouse/pull/70644) ([Raúl Marín](https://github.com/Algunenano)). + +#### Improvement +* `CREATE TABLE AS` will copy `PRIMARY KEY`, `ORDER BY`, and similar clauses (of `MergeTree` tables). [#69739](https://github.com/ClickHouse/ClickHouse/pull/69739) ([sakulali](https://github.com/sakulali)). +* Support 64-bit XID in Keeper. It can be enabled with the `use_xid_64` configuration value. [#69908](https://github.com/ClickHouse/ClickHouse/pull/69908) ([Antonio Andelic](https://github.com/antonio2368)). +* Command-line arguments for Bool settings are set to true when no value is provided for the argument (e.g. `clickhouse-client --optimize_aggregation_in_order --query "SELECT 1"`).
[#70459](https://github.com/ClickHouse/ClickHouse/pull/70459) ([davidtsuk](https://github.com/davidtsuk)). +* Added user-level settings `min_free_disk_bytes_to_throw_insert` and `min_free_disk_ratio_to_throw_insert` to prevent insertions on disks that are almost full. [#69755](https://github.com/ClickHouse/ClickHouse/pull/69755) ([Marco Vilas Boas](https://github.com/marco-vb)). +* Embedded documentation for settings will be strictly more detailed and complete than the documentation on the website. This is the first step before making the website documentation always auto-generated from the source code. This has long-standing implications: - it will be guaranteed to have every setting; - there is no chance of default values becoming obsolete; - we can generate this documentation for each ClickHouse version; - the documentation can be displayed by the server itself even without Internet access. Generate the docs on the website from the source code. [#70289](https://github.com/ClickHouse/ClickHouse/pull/70289) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Allow an empty needle in the function `replace`, the same behavior as PostgreSQL. [#69918](https://github.com/ClickHouse/ClickHouse/pull/69918) ([zhanglistar](https://github.com/zhanglistar)). +* Allow an empty needle in functions `replaceRegexp*`. [#70053](https://github.com/ClickHouse/ClickHouse/pull/70053) ([zhanglistar](https://github.com/zhanglistar)). +* Symbolic links for tables in the `data/database_name/` directory are created for the actual paths to the table's data, depending on the storage policy, instead of the `store/...` directory on the default disk. [#61777](https://github.com/ClickHouse/ClickHouse/pull/61777) ([Kirill](https://github.com/kirillgarbar)). +* While parsing an `Enum` field from `JSON`, a string containing an integer will be interpreted as the corresponding `Enum` element. This closes [#65119](https://github.com/ClickHouse/ClickHouse/issues/65119). [#66801](https://github.com/ClickHouse/ClickHouse/pull/66801) ([scanhex12](https://github.com/scanhex12)). +* Allow `TRIM`-ing a `LEADING` or `TRAILING` empty string as a no-op. Closes [#67792](https://github.com/ClickHouse/ClickHouse/issues/67792). [#68455](https://github.com/ClickHouse/ClickHouse/pull/68455) ([Peter Nguyen](https://github.com/petern48)). +* Improve compatibility of `cast(timestamp as String)` with Spark. [#69179](https://github.com/ClickHouse/ClickHouse/pull/69179) ([Wenzheng Liu](https://github.com/lwz9103)). +* Always use the new analyzer to calculate constant expressions when `enable_analyzer` is set to `true`. Support calculation of `executable` table function arguments without using a `SELECT` query for constant expressions. [#69292](https://github.com/ClickHouse/ClickHouse/pull/69292) ([Dmitry Novik](https://github.com/novikd)). +* Add a setting `enable_secure_identifiers` to disallow identifiers with special characters. [#69411](https://github.com/ClickHouse/ClickHouse/pull/69411) ([tuanpach](https://github.com/tuanpach)). +* Add `show_create_query_identifier_quoting_rule` to define identifier quoting behavior in the `SHOW CREATE TABLE` query result. Possible values: - `user_display`: When the identifier is a keyword. - `when_necessary`: When the identifier is one of `{"distinct", "all", "table"}` and when it could lead to ambiguity: column names, dictionary attribute names. - `always`: Always quote identifiers. [#69448](https://github.com/ClickHouse/ClickHouse/pull/69448) ([tuanpach](https://github.com/tuanpach)).
+* Improve restoring of access entities' dependencies. [#69563](https://github.com/ClickHouse/ClickHouse/pull/69563) ([Vitaly Baranov](https://github.com/vitlibar)). +* If you run `clickhouse-client` or another CLI application, and it starts up slowly due to an overloaded server, and you start typing your query, such as `SELECT`, previous versions would display the remainder of the terminal echo contents before printing the greeting message, such as `SELECTClickHouse local version 24.10.1.1.` instead of `ClickHouse local version 24.10.1.1.`. Now it is fixed. This closes [#31696](https://github.com/ClickHouse/ClickHouse/issues/31696). [#69856](https://github.com/ClickHouse/ClickHouse/pull/69856) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Add a new column `readonly_duration` to the `system.replicas` table. Needed to be able to distinguish actual readonly replicas from sentinel ones in alerts. [#69871](https://github.com/ClickHouse/ClickHouse/pull/69871) ([Miсhael Stetsyuk](https://github.com/mstetsyuk)). +* Change the type of the `join_output_by_rowlist_perkey_rows_threshold` setting to unsigned integer. [#69886](https://github.com/ClickHouse/ClickHouse/pull/69886) ([kevinyhzou](https://github.com/KevinyhZou)). +* Enhance OpenTelemetry span logging to include query settings. [#70011](https://github.com/ClickHouse/ClickHouse/pull/70011) ([sharathks118](https://github.com/sharathks118)). +* Add diagnostic info about higher-order array functions if the lambda result type is unexpected. [#70093](https://github.com/ClickHouse/ClickHouse/pull/70093) ([ttanay](https://github.com/ttanay)). +* Keeper improvement: less locking during cluster changes. [#70275](https://github.com/ClickHouse/ClickHouse/pull/70275) ([Antonio Andelic](https://github.com/antonio2368)). +* Add `WITH IMPLICIT` and `FINAL` keywords to the `SHOW GRANTS` command. Fix a minor bug with implicit grants: [#70094](https://github.com/ClickHouse/ClickHouse/issues/70094). [#70293](https://github.com/ClickHouse/ClickHouse/pull/70293) ([pufit](https://github.com/pufit)). +* Respect `compatibility` for MergeTree settings. The `compatibility` value is taken from the `default` profile on server startup, and default MergeTree settings are changed accordingly. Further changes of the `compatibility` setting do not affect MergeTree settings. [#70322](https://github.com/ClickHouse/ClickHouse/pull/70322) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Avoid spamming the logs with large HTTP response bodies in case of errors during inter-server communication. [#70487](https://github.com/ClickHouse/ClickHouse/pull/70487) ([Vladimir Cherkasov](https://github.com/vdimir)). +* Added a new setting `max_parts_to_move` to control the maximum number of parts that can be moved at once. [#70520](https://github.com/ClickHouse/ClickHouse/pull/70520) ([Vladimir Cherkasov](https://github.com/vdimir)). +* Limit the frequency of certain log messages. [#70601](https://github.com/ClickHouse/ClickHouse/pull/70601) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* `CHECK TABLE` with the `PART` qualifier was incorrectly formatted in the client. [#70660](https://github.com/ClickHouse/ClickHouse/pull/70660) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Support writing the column index and the offset index using the Parquet native writer. [#70669](https://github.com/ClickHouse/ClickHouse/pull/70669) ([LiuNeng](https://github.com/liuneng1994)).
+* Support parsing `DateTime64` with microseconds and time zones in Joda syntax ("Joda" is a popular Java library for date and time, and the "Joda syntax" is that library's style). [#70737](https://github.com/ClickHouse/ClickHouse/pull/70737) ([kevinyhzou](https://github.com/KevinyhZou)). +* Changed the approach used to figure out whether a cloud storage supports [batch delete](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html) or not. [#70786](https://github.com/ClickHouse/ClickHouse/pull/70786) ([Vitaly Baranov](https://github.com/vitlibar)). +* Support for Parquet page v2 in the native reader. [#70807](https://github.com/ClickHouse/ClickHouse/pull/70807) ([Arthur Passos](https://github.com/arthurpassos)). +* Added a check for whether a table has both `storage_policy` and `disk` set, and a check that a new storage policy is compatible with the old one when the `disk` setting is used. [#70839](https://github.com/ClickHouse/ClickHouse/pull/70839) ([Kirill](https://github.com/kirillgarbar)). +* Add `system.s3_queue_settings` and `system.azure_queue_settings`. [#70841](https://github.com/ClickHouse/ClickHouse/pull/70841) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Functions `base58Encode` and `base58Decode` now accept arguments of type `FixedString`. Example: `SELECT base58Encode(toFixedString('plaintext', 9));`. [#70846](https://github.com/ClickHouse/ClickHouse/pull/70846) ([Faizan Patel](https://github.com/faizan2786)). +* Add the `partition` column to every entry type of the part log. Previously, it was set only for some entries. This closes [#70819](https://github.com/ClickHouse/ClickHouse/issues/70819). [#70848](https://github.com/ClickHouse/ClickHouse/pull/70848) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Add `MergeStart` and `MutateStart` events into `system.part_log`, which helps with merge analysis and visualization. [#70850](https://github.com/ClickHouse/ClickHouse/pull/70850) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Add a profile event about the number of merged source parts. It allows the monitoring of the fanout of the merge tree in production. [#70908](https://github.com/ClickHouse/ClickHouse/pull/70908) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Background downloads to the filesystem cache have been re-enabled. [#70929](https://github.com/ClickHouse/ClickHouse/pull/70929) ([Nikita Taranov](https://github.com/nickitat)). +* Add a new merge selector algorithm, named `Trivial`, for professional usage only. It is worse than the `Simple` merge selector. [#70969](https://github.com/ClickHouse/ClickHouse/pull/70969) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Support for atomic `CREATE OR REPLACE VIEW`. [#70536](https://github.com/ClickHouse/ClickHouse/pull/70536) ([tuanpach](https://github.com/tuanpach)). +* Added `strict_once` mode to the aggregate function `windowFunnel` to avoid counting one event several times in case it matches multiple conditions, closes [#21835](https://github.com/ClickHouse/ClickHouse/issues/21835). [#69738](https://github.com/ClickHouse/ClickHouse/pull/69738) ([Vladimir Cherkasov](https://github.com/vdimir)). + +#### Bug Fix (user-visible misbehavior in an official stable release) +* Apply configuration updates in the global context object. This fixes issues like [#62308](https://github.com/ClickHouse/ClickHouse/issues/62308). [#62944](https://github.com/ClickHouse/ClickHouse/pull/62944) ([Amos Bird](https://github.com/amosbird)).
+* Fix `ReadSettings` not using user-set values; previously, only defaults were used. [#65625](https://github.com/ClickHouse/ClickHouse/pull/65625) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Fix a type mismatch issue in `sumMapFiltered` when using signed arguments. [#58408](https://github.com/ClickHouse/ClickHouse/pull/58408) ([Chen768959](https://github.com/Chen768959)). +* Fix toHour-like conversion functions' monotonicity when an optional time zone argument is passed. [#60264](https://github.com/ClickHouse/ClickHouse/pull/60264) ([Amos Bird](https://github.com/amosbird)). +* Relax the `supportsPrewhere` check for `Merge` tables. This fixes [#61064](https://github.com/ClickHouse/ClickHouse/issues/61064). It was hardened unnecessarily in [#60082](https://github.com/ClickHouse/ClickHouse/issues/60082). [#61091](https://github.com/ClickHouse/ClickHouse/pull/61091) ([Amos Bird](https://github.com/amosbird)). +* Fix `use_concurrency_control` setting handling for proper `concurrent_threads_soft_limit_num` limit enforcing. This enables concurrency control by default because previously it was broken. [#61473](https://github.com/ClickHouse/ClickHouse/pull/61473) ([Sergei Trifonov](https://github.com/serxa)). +* Fix incorrect `JOIN ON` section optimization in case of an `IS NULL` check under any other function (like `NOT`) that may lead to wrong results. Closes [#67915](https://github.com/ClickHouse/ClickHouse/issues/67915). [#68049](https://github.com/ClickHouse/ClickHouse/pull/68049) ([Vladimir Cherkasov](https://github.com/vdimir)). +* Prevent `ALTER` queries that would make the `CREATE` query of tables invalid. [#68574](https://github.com/ClickHouse/ClickHouse/pull/68574) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). +* Fix inconsistent AST formatting for `negate` (`-`) and `NOT` functions with tuples and arrays. [#68600](https://github.com/ClickHouse/ClickHouse/pull/68600) ([Vladimir Cherkasov](https://github.com/vdimir)). +* Fix insertion of an incomplete type into `Dynamic` during deserialization. It could lead to `Parameter out of bound` errors. [#69291](https://github.com/ClickHouse/ClickHouse/pull/69291) ([Pavel Kruglov](https://github.com/Avogar)). +* Zero-copy replication, which is experimental and should not be used in production: fix an infinite loop after `restore replica` in the replicated merge tree with zero copy. [#69293](https://github.com/ClickHouse/ClickHouse/pull/69293) ([MikhailBurdukov](https://github.com/MikhailBurdukov)). +* Return the default value of `processing_threads_num` back to the number of CPU cores in storage `S3Queue`. [#69384](https://github.com/ClickHouse/ClickHouse/pull/69384) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Bypass the try/catch flow when de/serializing nested repeated protobuf to nested columns (fixes [#41971](https://github.com/ClickHouse/ClickHouse/issues/41971)). [#69556](https://github.com/ClickHouse/ClickHouse/pull/69556) ([Eliot Hautefeuille](https://github.com/hileef)). +* Fix a crash during insertion into a FixedString column in the PostgreSQL engine. [#69584](https://github.com/ClickHouse/ClickHouse/pull/69584) ([Pavel Kruglov](https://github.com/Avogar)). +* Fix a crash when executing `create view t as (with recursive 42 as ttt select ttt);`. [#69676](https://github.com/ClickHouse/ClickHouse/pull/69676) ([Han Fei](https://github.com/hanfei1991)). +* Fixed `maxMapState` throwing 'Bad get' if the value type is DateTime64. [#69787](https://github.com/ClickHouse/ClickHouse/pull/69787) ([Michael Kolupaev](https://github.com/al13n321)).
+* Fix `getSubcolumn` with `LowCardinality` columns by overriding `useDefaultImplementationForLowCardinalityColumns` to return `true`. [#69831](https://github.com/ClickHouse/ClickHouse/pull/69831) ([Miсhael Stetsyuk](https://github.com/mstetsyuk)). +* Fix permanently blocked distributed sends if a DROP of a distributed table failed. [#69843](https://github.com/ClickHouse/ClickHouse/pull/69843) ([Azat Khuzhin](https://github.com/azat)). +* Fix non-cancellable queries containing WITH FILL with NaN keys. This closes [#69261](https://github.com/ClickHouse/ClickHouse/issues/69261). [#69845](https://github.com/ClickHouse/ClickHouse/pull/69845) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Fix the analyzer default with an old compatibility value. [#69895](https://github.com/ClickHouse/ClickHouse/pull/69895) ([Raúl Marín](https://github.com/Algunenano)). +* Don't check dependencies during CREATE OR REPLACE VIEW during DROP of the old table. Previously, the CREATE OR REPLACE query failed when there were tables dependent on the recreated view. [#69907](https://github.com/ClickHouse/ClickHouse/pull/69907) ([Pavel Kruglov](https://github.com/Avogar)). +* Something for Decimal. Fixes [#69730](https://github.com/ClickHouse/ClickHouse/issues/69730). [#69978](https://github.com/ClickHouse/ClickHouse/pull/69978) ([Arthur Passos](https://github.com/arthurpassos)). +* Now DEFINER/INVOKER will work with parameterized views. [#69984](https://github.com/ClickHouse/ClickHouse/pull/69984) ([pufit](https://github.com/pufit)). +* Fix parsing for view's definers. [#69985](https://github.com/ClickHouse/ClickHouse/pull/69985) ([pufit](https://github.com/pufit)). +* Fixed a bug where the timezone could change the result of a query with `Date` or `Date32` arguments. [#70036](https://github.com/ClickHouse/ClickHouse/pull/70036) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). +* Fixes `Block structure mismatch` for queries with nested views and a `WHERE` condition. Fixes [#66209](https://github.com/ClickHouse/ClickHouse/issues/66209). [#70054](https://github.com/ClickHouse/ClickHouse/pull/70054) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Avoid reusing columns among different named tuples when evaluating `tuple` functions. This fixes [#70022](https://github.com/ClickHouse/ClickHouse/issues/70022). [#70103](https://github.com/ClickHouse/ClickHouse/pull/70103) ([Amos Bird](https://github.com/amosbird)). +* Fix a wrong LOGICAL_ERROR when replacing literals in ranges. [#70122](https://github.com/ClickHouse/ClickHouse/pull/70122) ([Pablo Marcos](https://github.com/pamarcos)). +* Check for the Nullable(Nothing) type during ALTER TABLE MODIFY COLUMN/QUERY to prevent tables with such a data type. [#70123](https://github.com/ClickHouse/ClickHouse/pull/70123) ([Pavel Kruglov](https://github.com/Avogar)). +* Proper error message for the illegal query `JOIN ... ON *`, closes [#68650](https://github.com/ClickHouse/ClickHouse/issues/68650). [#70124](https://github.com/ClickHouse/ClickHouse/pull/70124) ([Vladimir Cherkasov](https://github.com/vdimir)). +* Fix a wrong result with a skipping index. [#70127](https://github.com/ClickHouse/ClickHouse/pull/70127) ([Raúl Marín](https://github.com/Algunenano)). +* Fix a data race in the ColumnObject/ColumnTuple decompress method that could lead to heap use after free. [#70137](https://github.com/ClickHouse/ClickHouse/pull/70137) ([Pavel Kruglov](https://github.com/Avogar)). +* Fix a possible hang in ALTER COLUMN with the Dynamic type.
[#70144](https://github.com/ClickHouse/ClickHouse/pull/70144) ([Pavel Kruglov](https://github.com/Avogar)). +* Now ClickHouse will consider more errors as retriable and will not mark data parts as broken in case of such errors. [#70145](https://github.com/ClickHouse/ClickHouse/pull/70145) ([alesapin](https://github.com/alesapin)). +* Use correct `max_types` parameter during Dynamic type creation for JSON subcolumn. [#70147](https://github.com/ClickHouse/ClickHouse/pull/70147) ([Pavel Kruglov](https://github.com/Avogar)). +* Fix the password being displayed in `system.query_log` for users with bcrypt password authentication method. [#70148](https://github.com/ClickHouse/ClickHouse/pull/70148) ([Nikolay Degterinsky](https://github.com/evillique)). +* Fix event counter for the native interface (InterfaceNativeSendBytes). [#70153](https://github.com/ClickHouse/ClickHouse/pull/70153) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). +* Fix possible crash related to JSON columns. [#70172](https://github.com/ClickHouse/ClickHouse/pull/70172) ([Pavel Kruglov](https://github.com/Avogar)). +* Fix multiple issues with arrayMin and arrayMax. [#70207](https://github.com/ClickHouse/ClickHouse/pull/70207) ([Raúl Marín](https://github.com/Algunenano)). +* Respect setting allow_simdjson in the JSON type parser. [#70218](https://github.com/ClickHouse/ClickHouse/pull/70218) ([Pavel Kruglov](https://github.com/Avogar)). +* Fix a null pointer dereference on creating a materialized view with two selects and an `INTERSECT`, e.g. `CREATE MATERIALIZED VIEW v0 AS (SELECT 1) INTERSECT (SELECT 1);`. [#70264](https://github.com/ClickHouse/ClickHouse/pull/70264) ([Konstantin Bogdanov](https://github.com/thevar1able)). +* Don't modify global settings with startup scripts. Previously, changing a setting in a startup script would change it globally. [#70310](https://github.com/ClickHouse/ClickHouse/pull/70310) ([Antonio Andelic](https://github.com/antonio2368)). +* Fix ALTER of `Dynamic` type with reducing max_types parameter that could lead to server crash. [#70328](https://github.com/ClickHouse/ClickHouse/pull/70328) ([Pavel Kruglov](https://github.com/Avogar)). +* Fix crash when using WITH FILL incorrectly. [#70338](https://github.com/ClickHouse/ClickHouse/pull/70338) ([Raúl Marín](https://github.com/Algunenano)). +* Fix possible use-after-free in `SYSTEM DROP FORMAT SCHEMA CACHE FOR Protobuf`. [#70358](https://github.com/ClickHouse/ClickHouse/pull/70358) ([Azat Khuzhin](https://github.com/azat)). +* Fix crash during GROUP BY JSON sub-object subcolumn. [#70374](https://github.com/ClickHouse/ClickHouse/pull/70374) ([Pavel Kruglov](https://github.com/Avogar)). +* Don't prefetch parts for vertical merges if part has no rows. [#70452](https://github.com/ClickHouse/ClickHouse/pull/70452) ([Antonio Andelic](https://github.com/antonio2368)). +* Fix crash in WHERE with lambda functions. [#70464](https://github.com/ClickHouse/ClickHouse/pull/70464) ([Raúl Marín](https://github.com/Algunenano)). +* Fix table creation with `CREATE ... AS table_function(...)` with database `Replicated` and unavailable table function source on secondary replica. [#70511](https://github.com/ClickHouse/ClickHouse/pull/70511) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Ignore all output on async insert with `wait_for_async_insert=1`. Closes [#62644](https://github.com/ClickHouse/ClickHouse/issues/62644). [#70530](https://github.com/ClickHouse/ClickHouse/pull/70530) ([Konstantin Bogdanov](https://github.com/thevar1able)). 
+* Ignore frozen_metadata.txt while traversing shadow directory from system.remote_data_paths. [#70590](https://github.com/ClickHouse/ClickHouse/pull/70590) ([Aleksei Filatov](https://github.com/aalexfvk)). +* Fix creation of stateful window functions on misaligned memory. [#70631](https://github.com/ClickHouse/ClickHouse/pull/70631) ([Raúl Marín](https://github.com/Algunenano)). +* Fixed rare crashes in `SELECT`-s and merges after adding a column of `Array` type with non-empty default expression. [#70695](https://github.com/ClickHouse/ClickHouse/pull/70695) ([Anton Popov](https://github.com/CurtizJ)). +* Insert into table function s3 will respect query settings. [#70696](https://github.com/ClickHouse/ClickHouse/pull/70696) ([Vladimir Cherkasov](https://github.com/vdimir)). +* Fix infinite recursion when inferring a protobuf schema when skipping unsupported fields is enabled. [#70697](https://github.com/ClickHouse/ClickHouse/pull/70697) ([Raúl Marín](https://github.com/Algunenano)). +* Disable enable_named_columns_in_function_tuple by default. [#70833](https://github.com/ClickHouse/ClickHouse/pull/70833) ([Raúl Marín](https://github.com/Algunenano)). +* Fix S3Queue table engine setting processing_threads_num not being effective in case it was deduced from the number of cpu cores on the server. [#70837](https://github.com/ClickHouse/ClickHouse/pull/70837) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Normalize named tuple arguments in aggregation states. This fixes [#69732](https://github.com/ClickHouse/ClickHouse/issues/69732) . [#70853](https://github.com/ClickHouse/ClickHouse/pull/70853) ([Amos Bird](https://github.com/amosbird)). +* Fix a logical error due to negative zeros in the two-level hash table. This closes [#70973](https://github.com/ClickHouse/ClickHouse/issues/70973). [#70979](https://github.com/ClickHouse/ClickHouse/pull/70979) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Fix `limit by`, `limit with ties` for distributed and parallel replicas. [#70880](https://github.com/ClickHouse/ClickHouse/pull/70880) ([Nikita Taranov](https://github.com/nickitat)). 
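The snippet below is an illustrative sketch of a few of the 24.10 additions listed above (wildcard grants, `CREATE TABLE ... CLONE AS`, `arrayUnion`, `arrayElementOrNull`, `RIPEMD160`, and `getSettingOrDefault`); the database, table, and user names are hypothetical and not part of the release itself.

```sql
-- Hypothetical objects; shown only to illustrate the new syntax and functions described above.
GRANT SELECT ON db.events_* TO analyst;               -- grant on a wildcard prefix

CREATE TABLE db.events_backup CLONE AS db.events;     -- MergeTree-family tables only

SELECT
    arrayUnion([1, 2], [2, 3])          AS union_result,      -- e.g. [1, 2, 3] (element order not guaranteed)
    arrayElementOrNull([10, 20, 30], 5) AS out_of_range,      -- NULL instead of a default value
    hex(RIPEMD160('The quick brown fox jumps over the lazy dog')) AS ripemd160_hex,
    getSettingOrDefault('custom_x', 42) AS setting_or_default; -- no exception if the custom setting is absent
```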
+ + ### ClickHouse release 24.9, 2024-09-26 #### Backward Incompatible Change diff --git a/README.md b/README.md index 9d55d1fe9da..dcaeda13acd 100644 --- a/README.md +++ b/README.md @@ -48,6 +48,7 @@ Upcoming meetups * [Paris Meetup](https://www.meetup.com/clickhouse-france-user-group/events/303096434) - November 26 * [Amsterdam Meetup](https://www.meetup.com/clickhouse-netherlands-user-group/events/303638814) - December 3 * [New York Meetup](https://www.meetup.com/clickhouse-new-york-user-group/events/304268174) - December 9 +* [San Francisco Meetup](https://www.meetup.com/clickhouse-silicon-valley-meetup-group/events/304286951/) - December 12 Recently completed meetups diff --git a/docker/test/style/Dockerfile b/docker/test/style/Dockerfile index fa6b087eb7d..564301f447c 100644 --- a/docker/test/style/Dockerfile +++ b/docker/test/style/Dockerfile @@ -28,7 +28,7 @@ COPY requirements.txt / RUN pip3 install --no-cache-dir -r requirements.txt RUN echo "en_US.UTF-8 UTF-8" > /etc/locale.gen && locale-gen en_US.UTF-8 -ENV LC_ALL en_US.UTF-8 +ENV LC_ALL=en_US.UTF-8 # Architecture of the image when BuildKit/buildx is used ARG TARGETARCH diff --git a/docker/test/style/requirements.txt b/docker/test/style/requirements.txt index cc87f6e548d..aab20b5bee0 100644 --- a/docker/test/style/requirements.txt +++ b/docker/test/style/requirements.txt @@ -12,6 +12,7 @@ charset-normalizer==3.3.2 click==8.1.7 codespell==2.2.1 cryptography==43.0.1 +datacompy==0.7.3 Deprecated==1.2.14 dill==0.3.8 flake8==4.0.1 @@ -23,6 +24,7 @@ mccabe==0.6.1 multidict==6.0.5 mypy==1.8.0 mypy-extensions==1.0.0 +pandas==2.2.3 packaging==24.1 pathspec==0.9.0 pip==24.1.1 diff --git a/docs/en/engines/table-engines/integrations/s3.md b/docs/en/engines/table-engines/integrations/s3.md index 2675c193519..fd27d4b6ed9 100644 --- a/docs/en/engines/table-engines/integrations/s3.md +++ b/docs/en/engines/table-engines/integrations/s3.md @@ -290,6 +290,7 @@ The following settings can be specified in configuration file for given endpoint - `expiration_window_seconds` — Grace period for checking if expiration-based credentials have expired. Optional, default value is `120`. - `no_sign_request` - Ignore all the credentials so requests are not signed. Useful for accessing public buckets. - `header` — Adds specified HTTP header to a request to given endpoint. Optional, can be specified multiple times. +- `access_header` - Adds specified HTTP header to a request to given endpoint, in cases where there are no other credentials from another source. - `server_side_encryption_customer_key_base64` — If specified, required headers for accessing S3 objects with SSE-C encryption will be set. Optional. - `server_side_encryption_kms_key_id` - If specified, required headers for accessing S3 objects with [SSE-KMS encryption](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingKMSEncryption.html) will be set. If an empty string is specified, the AWS managed S3 key will be used. Optional. - `server_side_encryption_kms_encryption_context` - If specified alongside `server_side_encryption_kms_key_id`, the given encryption context header for SSE-KMS will be set. Optional. 
@@ -320,6 +321,32 @@ The following settings can be specified in configuration file for given endpoint ``` + +## Working with archives + +Suppose that we have several archive files with the following URIs on S3: + +- 'https://s3-us-west-1.amazonaws.com/umbrella-static/top-1m-2018-01-10.csv.zip' +- 'https://s3-us-west-1.amazonaws.com/umbrella-static/top-1m-2018-01-11.csv.zip' +- 'https://s3-us-west-1.amazonaws.com/umbrella-static/top-1m-2018-01-12.csv.zip' + +Extracting data from these archives is possible using `::`. Globs can be used both in the URL part as well as in the part after `::` (responsible for the name of a file inside the archive). + +``` sql +SELECT * +FROM s3( + 'https://s3-us-west-1.amazonaws.com/umbrella-static/top-1m-2018-01-1{0..2}.csv.zip :: *.csv' +); +``` + +:::note +ClickHouse supports three archive formats: +ZIP +TAR +7Z +While ZIP and TAR archives can be accessed from any supported storage location, 7Z archives can only be read from the local filesystem where ClickHouse is installed. +::: + + ## Accessing public buckets ClickHouse tries to fetch credentials from many different types of sources. diff --git a/docs/en/operations/system-tables/merge_tree_settings.md b/docs/en/operations/system-tables/merge_tree_settings.md index 48217d63f9d..473315d3941 100644 --- a/docs/en/operations/system-tables/merge_tree_settings.md +++ b/docs/en/operations/system-tables/merge_tree_settings.md @@ -18,6 +18,11 @@ Columns: - `1` — Current user can’t change the setting. - `type` ([String](../../sql-reference/data-types/string.md)) — Setting type (implementation specific string value). - `is_obsolete` ([UInt8](../../sql-reference/data-types/int-uint.md#uint-ranges)) - Shows whether a setting is obsolete. +- `tier` ([Enum8](../../sql-reference/data-types/enum.md)) — Support level for this feature. ClickHouse features are organized in tiers, varying depending on the current status of their development and the expectations one might have when using them. Values: + - `'Production'` — The feature is stable, safe to use and does not have issues interacting with other **production** features. + - `'Beta'` — The feature is stable and safe. The outcome of using it together with other features is unknown and correctness is not guaranteed. Testing and reports are welcome. + - `'Experimental'` — The feature is under development. Only intended for developers and ClickHouse enthusiasts. The feature might or might not work and could be removed at any time. + - `'Obsolete'` — No longer supported. Either it is already removed or it will be removed in future releases. **Example** ```sql diff --git a/docs/en/operations/system-tables/settings.md b/docs/en/operations/system-tables/settings.md index a04e095e990..1cfee0ba5f4 100644 --- a/docs/en/operations/system-tables/settings.md +++ b/docs/en/operations/system-tables/settings.md @@ -18,6 +18,11 @@ Columns: - `1` — Current user can’t change the setting. - `default` ([String](../../sql-reference/data-types/string.md)) — Setting default value. - `is_obsolete` ([UInt8](../../sql-reference/data-types/int-uint.md#uint-ranges)) - Shows whether a setting is obsolete. +- `tier` ([Enum8](../../sql-reference/data-types/enum.md)) — Support level for this feature. ClickHouse features are organized in tiers, varying depending on the current status of their development and the expectations one might have when using them. Values: + - `'Production'` — The feature is stable, safe to use and does not have issues interacting with other **production** features.
+ - `'Beta'` — The feature is stable and safe. The outcome of using it together with other features is unknown and correctness is not guaranteed. Testing and reports are welcome. + - `'Experimental'` — The feature is under development. Only intended for developers and ClickHouse enthusiasts. The feature might or might not work and could be removed at any time. + - `'Obsolete'` — No longer supported. Either it is already removed or it will be removed in future releases. **Example** @@ -26,19 +31,99 @@ The following example shows how to get information about settings which name con ``` sql SELECT * FROM system.settings -WHERE name LIKE '%min_i%' +WHERE name LIKE '%min_insert_block_size_%' +FORMAT Vertical ``` ``` text -┌─name───────────────────────────────────────────────_─value─────_─changed─_─description───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────_─min──_─max──_─readonly─_─type─────────_─default───_─alias_for─_─is_obsolete─┐ -│ min_insert_block_size_rows │ 1048449 │ 0 │ Squash blocks passed to INSERT query to specified size in rows, if blocks are not big enough. │ ____ │ ____ │ 0 │ UInt64 │ 1048449 │ │ 0 │ -│ min_insert_block_size_bytes │ 268402944 │ 0 │ Squash blocks passed to INSERT query to specified size in bytes, if blocks are not big enough. │ ____ │ ____ │ 0 │ UInt64 │ 268402944 │ │ 0 │ -│ min_insert_block_size_rows_for_materialized_views │ 0 │ 0 │ Like min_insert_block_size_rows, but applied only during pushing to MATERIALIZED VIEW (default: min_insert_block_size_rows) │ ____ │ ____ │ 0 │ UInt64 │ 0 │ │ 0 │ -│ min_insert_block_size_bytes_for_materialized_views │ 0 │ 0 │ Like min_insert_block_size_bytes, but applied only during pushing to MATERIALIZED VIEW (default: min_insert_block_size_bytes) │ ____ │ ____ │ 0 │ UInt64 │ 0 │ │ 0 │ -│ read_backoff_min_interval_between_events_ms │ 1000 │ 0 │ Settings to reduce the number of threads in case of slow reads. Do not pay attention to the event, if the previous one has passed less than a certain amount of time. │ ____ │ ____ │ 0 │ Milliseconds │ 1000 │ │ 0 │ -└────────────────────────────────────────────────────┴───────────┴─────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────── -──────────────────────────────────────────────────────┴──────┴──────┴──────────┴──────────────┴───────────┴───────────┴─────────────┘ -``` +Row 1: +────── +name: min_insert_block_size_rows +value: 1048449 +changed: 0 +description: Sets the minimum number of rows in the block that can be inserted into a table by an `INSERT` query. Smaller-sized blocks are squashed into bigger ones. + +Possible values: + +- Positive integer. +- 0 — Squashing disabled. +min: ᴺᵁᴸᴸ +max: ᴺᵁᴸᴸ +readonly: 0 +type: UInt64 +default: 1048449 +alias_for: +is_obsolete: 0 +tier: Production + +Row 2: +────── +name: min_insert_block_size_bytes +value: 268402944 +changed: 0 +description: Sets the minimum number of bytes in the block which can be inserted into a table by an `INSERT` query. Smaller-sized blocks are squashed into bigger ones. + +Possible values: + +- Positive integer. +- 0 — Squashing disabled. 
+min: ᴺᵁᴸᴸ +max: ᴺᵁᴸᴸ +readonly: 0 +type: UInt64 +default: 268402944 +alias_for: +is_obsolete: 0 +tier: Production + +Row 3: +────── +name: min_insert_block_size_rows_for_materialized_views +value: 0 +changed: 0 +description: Sets the minimum number of rows in the block which can be inserted into a table by an `INSERT` query. Smaller-sized blocks are squashed into bigger ones. This setting is applied only for blocks inserted into [materialized view](../../sql-reference/statements/create/view.md). By adjusting this setting, you control blocks squashing while pushing to materialized view and avoid excessive memory usage. + +Possible values: + +- Any positive integer. +- 0 — Squashing disabled. + +**See Also** + +- [min_insert_block_size_rows](#min-insert-block-size-rows) +min: ᴺᵁᴸᴸ +max: ᴺᵁᴸᴸ +readonly: 0 +type: UInt64 +default: 0 +alias_for: +is_obsolete: 0 +tier: Production + +Row 4: +────── +name: min_insert_block_size_bytes_for_materialized_views +value: 0 +changed: 0 +description: Sets the minimum number of bytes in the block which can be inserted into a table by an `INSERT` query. Smaller-sized blocks are squashed into bigger ones. This setting is applied only for blocks inserted into [materialized view](../../sql-reference/statements/create/view.md). By adjusting this setting, you control blocks squashing while pushing to materialized view and avoid excessive memory usage. + +Possible values: + +- Any positive integer. +- 0 — Squashing disabled. + +**See also** + +- [min_insert_block_size_bytes](#min-insert-block-size-bytes) +min: ᴺᵁᴸᴸ +max: ᴺᵁᴸᴸ +readonly: 0 +type: UInt64 +default: 0 +alias_for: +is_obsolete: 0 +tier: Production + ``` Using of `WHERE changed` can be useful, for example, when you want to check: diff --git a/docs/en/sql-reference/statements/alter/user.md b/docs/en/sql-reference/statements/alter/user.md index a56532e2ab0..1514b16a657 100644 --- a/docs/en/sql-reference/statements/alter/user.md +++ b/docs/en/sql-reference/statements/alter/user.md @@ -12,7 +12,7 @@ Syntax: ``` sql ALTER USER [IF EXISTS] name1 [RENAME TO new_name |, name2 [,...]] [ON CLUSTER cluster_name] - [NOT IDENTIFIED | RESET AUTHENTICATION METHODS TO NEW | {IDENTIFIED | ADD IDENTIFIED} {[WITH {plaintext_password | sha256_password | sha256_hash | double_sha1_password | double_sha1_hash}] BY {'password' | 'hash'}} | WITH NO_PASSWORD | {WITH ldap SERVER 'server_name'} | {WITH kerberos [REALM 'realm']} | {WITH ssl_certificate CN 'common_name' | SAN 'TYPE:subject_alt_name'} | {WITH ssh_key BY KEY 'public_key' TYPE 'ssh-rsa|...'} | {WITH http SERVER 'server_name' [SCHEME 'Basic']} + [NOT IDENTIFIED | RESET AUTHENTICATION METHODS TO NEW | {IDENTIFIED | ADD IDENTIFIED} {[WITH {plaintext_password | sha256_password | sha256_hash | double_sha1_password | double_sha1_hash}] BY {'password' | 'hash'}} | WITH NO_PASSWORD | {WITH ldap SERVER 'server_name'} | {WITH kerberos [REALM 'realm']} | {WITH ssl_certificate CN 'common_name' | SAN 'TYPE:subject_alt_name'} | {WITH ssh_key BY KEY 'public_key' TYPE 'ssh-rsa|...'} | {WITH http SERVER 'server_name' [SCHEME 'Basic']} [VALID UNTIL datetime] [, {[{plaintext_password | sha256_password | sha256_hash | ...}] BY {'password' | 'hash'}} | {ldap SERVER 'server_name'} | {...} | ... [,...]]] [[ADD | DROP] HOST {LOCAL | NAME 'name' | REGEXP 'name_regexp' | IP 'address' | LIKE 'pattern'} [,...] 
| ANY | NONE] [VALID UNTIL datetime] @@ -91,3 +91,15 @@ Reset authentication methods and keep the most recent added one: ``` sql ALTER USER user1 RESET AUTHENTICATION METHODS TO NEW ``` + +## VALID UNTIL Clause + +Allows you to specify the expiration date and, optionally, the time for an authentication method. It accepts a string as a parameter. It is recommended to use the `YYYY-MM-DD [hh:mm:ss] [timezone]` format for datetime. By default, this parameter equals `'infinity'`. +The `VALID UNTIL` clause can only be specified along with an authentication method, except for the case where no authentication method has been specified in the query. In this scenario, the `VALID UNTIL` clause will be applied to all existing authentication methods. + +Examples: + +- `ALTER USER name1 VALID UNTIL '2025-01-01'` +- `ALTER USER name1 VALID UNTIL '2025-01-01 12:00:00 UTC'` +- `ALTER USER name1 VALID UNTIL 'infinity'` +- `ALTER USER name1 IDENTIFIED WITH plaintext_password BY 'no_expiration', bcrypt_password BY 'expiration_set' VALID UNTIL '2025-01-01'` diff --git a/docs/en/sql-reference/statements/create/user.md b/docs/en/sql-reference/statements/create/user.md index 21f3dd6692c..03d93fc3365 100644 --- a/docs/en/sql-reference/statements/create/user.md +++ b/docs/en/sql-reference/statements/create/user.md @@ -11,7 +11,7 @@ Syntax: ``` sql CREATE USER [IF NOT EXISTS | OR REPLACE] name1 [, name2 [,...]] [ON CLUSTER cluster_name] - [NOT IDENTIFIED | IDENTIFIED {[WITH {plaintext_password | sha256_password | sha256_hash | double_sha1_password | double_sha1_hash}] BY {'password' | 'hash'}} | WITH NO_PASSWORD | {WITH ldap SERVER 'server_name'} | {WITH kerberos [REALM 'realm']} | {WITH ssl_certificate CN 'common_name' | SAN 'TYPE:subject_alt_name'} | {WITH ssh_key BY KEY 'public_key' TYPE 'ssh-rsa|...'} | {WITH http SERVER 'server_name' [SCHEME 'Basic']} + [NOT IDENTIFIED | IDENTIFIED {[WITH {plaintext_password | sha256_password | sha256_hash | double_sha1_password | double_sha1_hash}] BY {'password' | 'hash'}} | WITH NO_PASSWORD | {WITH ldap SERVER 'server_name'} | {WITH kerberos [REALM 'realm']} | {WITH ssl_certificate CN 'common_name' | SAN 'TYPE:subject_alt_name'} | {WITH ssh_key BY KEY 'public_key' TYPE 'ssh-rsa|...'} | {WITH http SERVER 'server_name' [SCHEME 'Basic']} [VALID UNTIL datetime] [, {[{plaintext_password | sha256_password | sha256_hash | ...}] BY {'password' | 'hash'}} | {ldap SERVER 'server_name'} | {...} | ... [,...]]] [HOST {LOCAL | NAME 'name' | REGEXP 'name_regexp' | IP 'address' | LIKE 'pattern'} [,...] | ANY | NONE] [VALID UNTIL datetime] @@ -178,7 +178,8 @@ ClickHouse treats `user_name@'address'` as a username as a whole. Thus, technica ## VALID UNTIL Clause -Allows you to specify the expiration date and, optionally, the time for user credentials. It accepts a string as a parameter. It is recommended to use the `YYYY-MM-DD [hh:mm:ss] [timezone]` format for datetime. By default, this parameter equals `'infinity'`. +Allows you to specify the expiration date and, optionally, the time for an authentication method. It accepts a string as a parameter. It is recommended to use the `YYYY-MM-DD [hh:mm:ss] [timezone]` format for datetime. By default, this parameter equals `'infinity'`. +The `VALID UNTIL` clause can only be specified along with an authentication method, except for the case where no authentication method has been specified in the query. In this scenario, the `VALID UNTIL` clause will be applied to all existing authentication methods.
Examples: @@ -186,6 +187,7 @@ Examples: - `CREATE USER name1 VALID UNTIL '2025-01-01 12:00:00 UTC'` - `CREATE USER name1 VALID UNTIL 'infinity'` - ```CREATE USER name1 VALID UNTIL '2025-01-01 12:00:00 `Asia/Tokyo`'``` +- `CREATE USER name1 IDENTIFIED WITH plaintext_password BY 'no_expiration', bcrypt_password BY 'expiration_set' VALID UNTIL '2025-01-01'` ## GRANTEES Clause diff --git a/docs/en/sql-reference/statements/kill.md b/docs/en/sql-reference/statements/kill.md index 667a5b51f5c..ff6f64a97fe 100644 --- a/docs/en/sql-reference/statements/kill.md +++ b/docs/en/sql-reference/statements/kill.md @@ -83,7 +83,7 @@ The presence of long-running or incomplete mutations often indicates that a Clic - Or manually kill some of these mutations by sending a `KILL` command. ``` sql -KILL MUTATION [ON CLUSTER cluster] +KILL MUTATION WHERE [TEST] [FORMAT format] @@ -135,7 +135,6 @@ KILL MUTATION WHERE database = 'default' AND table = 'table' -- Cancel the specific mutation: KILL MUTATION WHERE database = 'default' AND table = 'table' AND mutation_id = 'mutation_3.txt' ``` -:::tip If you are killing a mutation in ClickHouse Cloud or in a self-managed cluster, then be sure to use the ```ON CLUSTER [cluster-name]``` option, in order to ensure the mutation is killed on all replicas::: The query is useful when a mutation is stuck and cannot finish (e.g. if some function in the mutation query throws an exception when applied to the data contained in the table). diff --git a/docs/en/sql-reference/table-functions/s3.md b/docs/en/sql-reference/table-functions/s3.md index df4e10425a5..b14eb84392f 100644 --- a/docs/en/sql-reference/table-functions/s3.md +++ b/docs/en/sql-reference/table-functions/s3.md @@ -284,6 +284,14 @@ FROM s3( ); ``` + +:::note +ClickHouse supports three archive formats: +ZIP +TAR +7Z +While ZIP and TAR archives can be accessed from any supported storage location, 7Z archives can only be read from the local filesystem where ClickHouse is installed. +::: + ## Virtual Columns {#virtual-columns} diff --git a/docs/ru/engines/table-engines/integrations/s3.md b/docs/ru/engines/table-engines/integrations/s3.md index a1c69df4d0a..2bab78c0612 100644 --- a/docs/ru/engines/table-engines/integrations/s3.md +++ b/docs/ru/engines/table-engines/integrations/s3.md @@ -138,6 +138,7 @@ CREATE TABLE table_with_asterisk (name String, value UInt32) - `use_insecure_imds_request` — признак использования менее безопасного соединения при выполнении запроса к IMDS при получении учётных данных из метаданных Amazon EC2. Значение по умолчанию — `false`. - `region` — название региона S3. - `header` — добавляет указанный HTTP-заголовок к запросу на заданную точку приема запроса. Может быть определен несколько раз. +- `access_header` - добавляет указанный HTTP-заголовок к запросу на заданную точку приема запроса, в случае, если не указаны другие способы авторизации. - `server_side_encryption_customer_key_base64` — устанавливает необходимые заголовки для доступа к объектам S3 с шифрованием SSE-C. - `single_read_retries` — Максимальное количество попыток запроса при единичном чтении. Значение по умолчанию — `4`.
diff --git a/programs/keeper/Keeper.cpp b/programs/keeper/Keeper.cpp index 3007df60765..74af9950e13 100644 --- a/programs/keeper/Keeper.cpp +++ b/programs/keeper/Keeper.cpp @@ -590,6 +590,7 @@ try #if USE_SSL CertificateReloader::instance().tryLoad(*config); + CertificateReloader::instance().tryLoadClient(*config); #endif }); diff --git a/programs/server/Server.cpp b/programs/server/Server.cpp index 826100f68e2..1f481381b2b 100644 --- a/programs/server/Server.cpp +++ b/programs/server/Server.cpp @@ -2341,6 +2341,7 @@ try #if USE_SSL CertificateReloader::instance().tryLoad(config()); + CertificateReloader::instance().tryLoadClient(config()); #endif /// Must be done after initialization of `servers`, because async_metrics will access `servers` variable from its thread. diff --git a/src/Access/AuthenticationData.cpp b/src/Access/AuthenticationData.cpp index 57a1cd756ff..37a4e356af8 100644 --- a/src/Access/AuthenticationData.cpp +++ b/src/Access/AuthenticationData.cpp @@ -1,12 +1,16 @@ #include #include #include +#include #include #include #include #include #include #include +#include +#include +#include #include #include @@ -113,7 +117,8 @@ bool operator ==(const AuthenticationData & lhs, const AuthenticationData & rhs) && (lhs.ssh_keys == rhs.ssh_keys) #endif && (lhs.http_auth_scheme == rhs.http_auth_scheme) - && (lhs.http_auth_server_name == rhs.http_auth_server_name); + && (lhs.http_auth_server_name == rhs.http_auth_server_name) + && (lhs.valid_until == rhs.valid_until); } @@ -384,14 +389,34 @@ std::shared_ptr AuthenticationData::toAST() const throw Exception(ErrorCodes::LOGICAL_ERROR, "AST: Unexpected authentication type {}", toString(auth_type)); } + + if (valid_until) + { + WriteBufferFromOwnString out; + writeDateTimeText(valid_until, out); + + node->valid_until = std::make_shared(out.str()); + } + return node; } AuthenticationData AuthenticationData::fromAST(const ASTAuthenticationData & query, ContextPtr context, bool validate) { + time_t valid_until = 0; + + if (query.valid_until) + { + valid_until = getValidUntilFromAST(query.valid_until, context); + } + if (query.type && query.type == AuthenticationType::NO_PASSWORD) - return AuthenticationData(); + { + AuthenticationData auth_data; + auth_data.setValidUntil(valid_until); + return auth_data; + } /// For this type of authentication we have ASTPublicSSHKey as children for ASTAuthenticationData if (query.type && query.type == AuthenticationType::SSH_KEY) @@ -418,6 +443,7 @@ AuthenticationData AuthenticationData::fromAST(const ASTAuthenticationData & que } auth_data.setSSHKeys(std::move(keys)); + auth_data.setValidUntil(valid_until); return auth_data; #else throw Exception(ErrorCodes::SUPPORT_IS_DISABLED, "SSH is disabled, because ClickHouse is built without libssh"); @@ -451,6 +477,8 @@ AuthenticationData AuthenticationData::fromAST(const ASTAuthenticationData & que AuthenticationData auth_data(current_type); + auth_data.setValidUntil(valid_until); + if (validate) context->getAccessControl().checkPasswordComplexityRules(value); @@ -494,6 +522,7 @@ AuthenticationData AuthenticationData::fromAST(const ASTAuthenticationData & que } AuthenticationData auth_data(*query.type); + auth_data.setValidUntil(valid_until); if (query.contains_hash) { diff --git a/src/Access/AuthenticationData.h b/src/Access/AuthenticationData.h index a0c100264f8..2d8d008c925 100644 --- a/src/Access/AuthenticationData.h +++ b/src/Access/AuthenticationData.h @@ -74,6 +74,9 @@ public: const String & getHTTPAuthenticationServerName() const { return 
http_auth_server_name; } void setHTTPAuthenticationServerName(const String & name) { http_auth_server_name = name; } + time_t getValidUntil() const { return valid_until; } + void setValidUntil(time_t valid_until_) { valid_until = valid_until_; } + friend bool operator ==(const AuthenticationData & lhs, const AuthenticationData & rhs); friend bool operator !=(const AuthenticationData & lhs, const AuthenticationData & rhs) { return !(lhs == rhs); } @@ -106,6 +109,7 @@ private: /// HTTP authentication properties String http_auth_server_name; HTTPAuthenticationScheme http_auth_scheme = HTTPAuthenticationScheme::BASIC; + time_t valid_until = 0; }; } diff --git a/src/Access/IAccessStorage.cpp b/src/Access/IAccessStorage.cpp index 3249d89ba87..72e0933e214 100644 --- a/src/Access/IAccessStorage.cpp +++ b/src/Access/IAccessStorage.cpp @@ -554,7 +554,7 @@ std::optional IAccessStorage::authenticateImpl( continue; } - if (areCredentialsValid(user->getName(), user->valid_until, auth_method, credentials, external_authenticators, auth_result.settings)) + if (areCredentialsValid(user->getName(), auth_method, credentials, external_authenticators, auth_result.settings)) { auth_result.authentication_data = auth_method; return auth_result; @@ -579,7 +579,6 @@ std::optional IAccessStorage::authenticateImpl( bool IAccessStorage::areCredentialsValid( const std::string & user_name, - time_t valid_until, const AuthenticationData & authentication_method, const Credentials & credentials, const ExternalAuthenticators & external_authenticators, @@ -591,6 +590,7 @@ bool IAccessStorage::areCredentialsValid( if (credentials.getUserName() != user_name) return false; + auto valid_until = authentication_method.getValidUntil(); if (valid_until) { const time_t now = std::chrono::system_clock::to_time_t(std::chrono::system_clock::now()); diff --git a/src/Access/IAccessStorage.h b/src/Access/IAccessStorage.h index 84cbdd0a751..4e2b27a1864 100644 --- a/src/Access/IAccessStorage.h +++ b/src/Access/IAccessStorage.h @@ -236,7 +236,6 @@ protected: bool allow_plaintext_password) const; virtual bool areCredentialsValid( const std::string & user_name, - time_t valid_until, const AuthenticationData & authentication_method, const Credentials & credentials, const ExternalAuthenticators & external_authenticators, diff --git a/src/Access/User.cpp b/src/Access/User.cpp index 887abc213f9..1c92f467003 100644 --- a/src/Access/User.cpp +++ b/src/Access/User.cpp @@ -19,8 +19,7 @@ bool User::equal(const IAccessEntity & other) const return (authentication_methods == other_user.authentication_methods) && (allowed_client_hosts == other_user.allowed_client_hosts) && (access == other_user.access) && (granted_roles == other_user.granted_roles) && (default_roles == other_user.default_roles) - && (settings == other_user.settings) && (grantees == other_user.grantees) && (default_database == other_user.default_database) - && (valid_until == other_user.valid_until); + && (settings == other_user.settings) && (grantees == other_user.grantees) && (default_database == other_user.default_database); } void User::setName(const String & name_) @@ -88,7 +87,6 @@ void User::clearAllExceptDependencies() access = {}; settings.removeSettingsKeepProfiles(); default_database = {}; - valid_until = 0; } } diff --git a/src/Access/User.h b/src/Access/User.h index 03d62bf2277..f54e74a305d 100644 --- a/src/Access/User.h +++ b/src/Access/User.h @@ -23,7 +23,6 @@ struct User : public IAccessEntity SettingsProfileElements settings; RolesOrUsersSet grantees = 
RolesOrUsersSet::AllTag{}; String default_database; - time_t valid_until = 0; bool equal(const IAccessEntity & other) const override; std::shared_ptr clone() const override { return cloneImpl(); } diff --git a/src/Analyzer/Resolve/QueryAnalyzer.cpp b/src/Analyzer/Resolve/QueryAnalyzer.cpp index 381edee607d..cb3087af707 100644 --- a/src/Analyzer/Resolve/QueryAnalyzer.cpp +++ b/src/Analyzer/Resolve/QueryAnalyzer.cpp @@ -227,8 +227,13 @@ void QueryAnalyzer::resolveConstantExpression(QueryTreeNodePtr & node, const Que scope.context = context; auto node_type = node->getNodeType(); + if (node_type == QueryTreeNodeType::QUERY || node_type == QueryTreeNodeType::UNION) + { + evaluateScalarSubqueryIfNeeded(node, scope); + return; + } - if (table_expression && node_type != QueryTreeNodeType::QUERY && node_type != QueryTreeNodeType::UNION) + if (table_expression) { scope.expression_join_tree_node = table_expression; validateTableExpressionModifiers(scope.expression_join_tree_node, scope); diff --git a/src/Client/ClientApplicationBase.cpp b/src/Client/ClientApplicationBase.cpp index d26641fe5f9..f7d2d0035d9 100644 --- a/src/Client/ClientApplicationBase.cpp +++ b/src/Client/ClientApplicationBase.cpp @@ -418,7 +418,7 @@ void ClientApplicationBase::init(int argc, char ** argv) UInt64 max_client_memory_usage_int = parseWithSizeSuffix(max_client_memory_usage.c_str(), max_client_memory_usage.length()); total_memory_tracker.setHardLimit(max_client_memory_usage_int); - total_memory_tracker.setDescription("(total)"); + total_memory_tracker.setDescription("Global"); total_memory_tracker.setMetric(CurrentMetrics::MemoryTracking); } diff --git a/src/Client/ClientBase.cpp b/src/Client/ClientBase.cpp index 73885ba522d..b6bf637ab44 100644 --- a/src/Client/ClientBase.cpp +++ b/src/Client/ClientBase.cpp @@ -1454,8 +1454,22 @@ void ClientBase::resetOutput() /// Order is important: format, compression, file - if (output_format) - output_format->finalize(); + try + { + if (output_format) + output_format->finalize(); + } + catch (...) + { + /// We need to make sure we continue resetting output_format (will stop threads on parallel output) + /// as well as cleaning other output related setup + if (!have_error) + { + client_exception + = std::make_unique(getCurrentExceptionMessageAndPattern(print_stack_trace), getCurrentExceptionCode()); + have_error = true; + } + } output_format.reset(); logs_out_stream.reset(); diff --git a/src/Common/MemoryTracker.cpp b/src/Common/MemoryTracker.cpp index 3ed943f217d..f4af019605e 100644 --- a/src/Common/MemoryTracker.cpp +++ b/src/Common/MemoryTracker.cpp @@ -68,15 +68,15 @@ inline std::string_view toDescription(OvercommitResult result) case OvercommitResult::NONE: return ""; case OvercommitResult::DISABLED: - return "Memory overcommit isn't used. Waiting time or overcommit denominator are set to zero."; + return "Memory overcommit isn't used. 
Waiting time or overcommit denominator are set to zero"; case OvercommitResult::MEMORY_FREED: throw DB::Exception(DB::ErrorCodes::LOGICAL_ERROR, "OvercommitResult::MEMORY_FREED shouldn't be asked for description"); case OvercommitResult::SELECTED: - return "Query was selected to stop by OvercommitTracker."; + return "Query was selected to stop by OvercommitTracker"; case OvercommitResult::TIMEOUTED: - return "Waiting timeout for memory to be freed is reached."; + return "Waiting timeout for memory to be freed is reached"; case OvercommitResult::NOT_ENOUGH_FREED: - return "Memory overcommit has freed not enough memory."; + return "Memory overcommit has not freed enough memory"; } } @@ -150,15 +150,23 @@ void MemoryTracker::logPeakMemoryUsage() auto peak_bytes = peak.load(std::memory_order::relaxed); if (peak_bytes < 128 * 1024) return; - LOG_DEBUG(getLogger("MemoryTracker"), - "Peak memory usage{}: {}.", (description ? " " + std::string(description) : ""), ReadableSize(peak_bytes)); + LOG_DEBUG( + getLogger("MemoryTracker"), + "{}{} memory usage: {}.", + description ? std::string(description) : "", + description ? " peak" : "Peak", + ReadableSize(peak_bytes)); } void MemoryTracker::logMemoryUsage(Int64 current) const { const auto * description = description_ptr.load(std::memory_order_relaxed); - LOG_DEBUG(getLogger("MemoryTracker"), - "Current memory usage{}: {}.", (description ? " " + std::string(description) : ""), ReadableSize(current)); + LOG_DEBUG( + getLogger("MemoryTracker"), + "{}{} memory usage: {}.", + description ? std::string(description) : "", + description ? " current" : "Current", + ReadableSize(current)); } void MemoryTracker::injectFault() const @@ -178,9 +186,9 @@ void MemoryTracker::injectFault() const const auto * description = description_ptr.load(std::memory_order_relaxed); throw DB::Exception( DB::ErrorCodes::MEMORY_LIMIT_EXCEEDED, - "Memory tracker{}{}: fault injected (at specific point)", - description ? " " : "", - description ? description : ""); + "{}{}: fault injected (at specific point)", + description ? description : "", + description ? " memory tracker" : "Memory tracker"); } void MemoryTracker::debugLogBigAllocationWithoutCheck(Int64 size [[maybe_unused]]) @@ -282,9 +290,9 @@ AllocationTrace MemoryTracker::allocImpl(Int64 size, bool throw_if_memory_exceed const auto * description = description_ptr.load(std::memory_order_relaxed); throw DB::Exception( DB::ErrorCodes::MEMORY_LIMIT_EXCEEDED, - "Memory tracker{}{}: fault injected. Would use {} (attempt to allocate chunk of {} bytes), maximum: {}", - description ? " " : "", + "{}{}: fault injected. Would use {} (attempt to allocate chunk of {} bytes), maximum: {}", description ? description : "", + description ? 
" memory tracker" : "Memory tracker", formatReadableSizeWithBinarySuffix(will_be), size, formatReadableSizeWithBinarySuffix(current_hard_limit)); @@ -305,6 +313,8 @@ AllocationTrace MemoryTracker::allocImpl(Int64 size, bool throw_if_memory_exceed if (overcommit_result != OvercommitResult::MEMORY_FREED) { + bool overcommit_result_ignore + = overcommit_result == OvercommitResult::NONE || overcommit_result == OvercommitResult::DISABLED; /// Revert amount.fetch_sub(size, std::memory_order_relaxed); rss.fetch_sub(size, std::memory_order_relaxed); @@ -314,18 +324,18 @@ AllocationTrace MemoryTracker::allocImpl(Int64 size, bool throw_if_memory_exceed ProfileEvents::increment(ProfileEvents::QueryMemoryLimitExceeded); const auto * description = description_ptr.load(std::memory_order_relaxed); throw DB::Exception( - DB::ErrorCodes::MEMORY_LIMIT_EXCEEDED, - "Memory limit{}{} exceeded: " - "would use {} (attempt to allocate chunk of {} bytes), current RSS {}, maximum: {}." - "{}{}", - description ? " " : "", - description ? description : "", - formatReadableSizeWithBinarySuffix(will_be), - size, - formatReadableSizeWithBinarySuffix(rss.load(std::memory_order_relaxed)), - formatReadableSizeWithBinarySuffix(current_hard_limit), - overcommit_result == OvercommitResult::NONE ? "" : " OvercommitTracker decision: ", - toDescription(overcommit_result)); + DB::ErrorCodes::MEMORY_LIMIT_EXCEEDED, + "{}{} exceeded: " + "would use {} (attempt to allocate chunk of {} bytes), current RSS {}, maximum: {}." + "{}{}", + description ? description : "", + description ? " memory limit" : "Memory limit", + formatReadableSizeWithBinarySuffix(will_be), + size, + formatReadableSizeWithBinarySuffix(rss.load(std::memory_order_relaxed)), + formatReadableSizeWithBinarySuffix(current_hard_limit), + overcommit_result_ignore ? "" : " OvercommitTracker decision: ", + overcommit_result_ignore ? "" : toDescription(overcommit_result)); } // If OvercommitTracker::needToStopQuery returned false, it guarantees that enough memory is freed. 
diff --git a/src/Common/ThreadStatus.cpp b/src/Common/ThreadStatus.cpp index e38d3480664..268d97e62ef 100644 --- a/src/Common/ThreadStatus.cpp +++ b/src/Common/ThreadStatus.cpp @@ -78,7 +78,7 @@ ThreadStatus::ThreadStatus(bool check_current_thread_on_destruction_) last_rusage = std::make_unique(); - memory_tracker.setDescription("(for thread)"); + memory_tracker.setDescription("Thread"); log = getLogger("ThreadStatus"); current_thread = this; diff --git a/src/Core/BaseSettings.cpp b/src/Core/BaseSettings.cpp index c535b9ce65e..2cce94f9d0a 100644 --- a/src/Core/BaseSettings.cpp +++ b/src/Core/BaseSettings.cpp @@ -8,6 +8,7 @@ namespace DB { namespace ErrorCodes { + extern const int INCORRECT_DATA; extern const int UNKNOWN_SETTING; } @@ -31,11 +32,19 @@ void BaseSettingsHelpers::writeFlags(Flags flags, WriteBuffer & out) } -BaseSettingsHelpers::Flags BaseSettingsHelpers::readFlags(ReadBuffer & in) +UInt64 BaseSettingsHelpers::readFlags(ReadBuffer & in) { UInt64 res; readVarUInt(res, in); - return static_cast(res); + return res; +} + +SettingsTierType BaseSettingsHelpers::getTier(UInt64 flags) +{ + int8_t tier = static_cast(flags & Flags::TIER); + if (tier > SettingsTierType::BETA) + throw Exception(ErrorCodes::INCORRECT_DATA, "Unknown tier value: '{}'", tier); + return SettingsTierType{tier}; } diff --git a/src/Core/BaseSettings.h b/src/Core/BaseSettings.h index 2a2e0bb334e..949b884636f 100644 --- a/src/Core/BaseSettings.h +++ b/src/Core/BaseSettings.h @@ -2,6 +2,7 @@ #include #include +#include #include #include #include @@ -21,6 +22,27 @@ namespace DB class ReadBuffer; class WriteBuffer; +struct BaseSettingsHelpers +{ + [[noreturn]] static void throwSettingNotFound(std::string_view name); + static void warningSettingNotFound(std::string_view name); + + static void writeString(std::string_view str, WriteBuffer & out); + static String readString(ReadBuffer & in); + + enum Flags : UInt64 + { + IMPORTANT = 0x01, + CUSTOM = 0x02, + TIER = 0x0c, /// 0b1100 == 2 bits + /// If adding new flags, consider first if Tier might need more bits + }; + + static SettingsTierType getTier(UInt64 flags); + static void writeFlags(Flags flags, WriteBuffer & out); + static UInt64 readFlags(ReadBuffer & in); +}; + /** Template class to define collections of settings. 
* If you create a new setting, please also add it to ./utils/check-style/check-settings-style * for validation @@ -138,7 +160,7 @@ public: const char * getTypeName() const; const char * getDescription() const; bool isCustom() const; - bool isObsolete() const; + SettingsTierType getTier() const; bool operator==(const SettingFieldRef & other) const { return (getName() == other.getName()) && (getValue() == other.getValue()); } bool operator!=(const SettingFieldRef & other) const { return !(*this == other); } @@ -225,24 +247,6 @@ private: std::conditional_t custom_settings_map; }; -struct BaseSettingsHelpers -{ - [[noreturn]] static void throwSettingNotFound(std::string_view name); - static void warningSettingNotFound(std::string_view name); - - static void writeString(std::string_view str, WriteBuffer & out); - static String readString(ReadBuffer & in); - - enum Flags : UInt64 - { - IMPORTANT = 0x01, - CUSTOM = 0x02, - OBSOLETE = 0x04, - }; - static void writeFlags(Flags flags, WriteBuffer & out); - static Flags readFlags(ReadBuffer & in); -}; - template void BaseSettings::set(std::string_view name, const Field & value) { @@ -477,7 +481,7 @@ void BaseSettings::read(ReadBuffer & in, SettingsWriteFormat format) size_t index = accessor.find(name); using Flags = BaseSettingsHelpers::Flags; - Flags flags{0}; + UInt64 flags{0}; if (format >= SettingsWriteFormat::STRINGS_WITH_FLAGS) flags = BaseSettingsHelpers::readFlags(in); bool is_important = (flags & Flags::IMPORTANT); @@ -797,14 +801,14 @@ bool BaseSettings::SettingFieldRef::isCustom() const } template -bool BaseSettings::SettingFieldRef::isObsolete() const +SettingsTierType BaseSettings::SettingFieldRef::getTier() const { if constexpr (Traits::allow_custom_settings) { if (custom_setting) - return false; + return SettingsTierType::PRODUCTION; } - return accessor->isObsolete(index); + return accessor->getTier(index); } using AliasMap = std::unordered_map; @@ -835,8 +839,8 @@ using AliasMap = std::unordered_map; const String & getName(size_t index) const { return field_infos[index].name; } \ const char * getTypeName(size_t index) const { return field_infos[index].type; } \ const char * getDescription(size_t index) const { return field_infos[index].description; } \ - bool isImportant(size_t index) const { return field_infos[index].is_important; } \ - bool isObsolete(size_t index) const { return field_infos[index].is_obsolete; } \ + bool isImportant(size_t index) const { return field_infos[index].flags & BaseSettingsHelpers::Flags::IMPORTANT; } \ + SettingsTierType getTier(size_t index) const { return BaseSettingsHelpers::getTier(field_infos[index].flags); } \ Field castValueUtil(size_t index, const Field & value) const { return field_infos[index].cast_value_util_function(value); } \ String valueToStringUtil(size_t index, const Field & value) const { return field_infos[index].value_to_string_util_function(value); } \ Field stringToValueUtil(size_t index, const String & str) const { return field_infos[index].string_to_value_util_function(str); } \ @@ -856,8 +860,7 @@ using AliasMap = std::unordered_map; String name; \ const char * type; \ const char * description; \ - bool is_important; \ - bool is_obsolete; \ + UInt64 flags; \ Field (*cast_value_util_function)(const Field &); \ String (*value_to_string_util_function)(const Field &); \ Field (*string_to_value_util_function)(const String &); \ @@ -968,8 +971,8 @@ struct DefineAliases /// NOLINTNEXTLINE #define IMPLEMENT_SETTINGS_TRAITS_(TYPE, NAME, DEFAULT, DESCRIPTION, FLAGS) \ 
res.field_infos.emplace_back( \ - FieldInfo{#NAME, #TYPE, DESCRIPTION, (FLAGS) & IMPORTANT, \ - static_cast((FLAGS) & BaseSettingsHelpers::Flags::OBSOLETE), \ + FieldInfo{#NAME, #TYPE, DESCRIPTION, \ + static_cast(FLAGS), \ [](const Field & value) -> Field { return static_cast(SettingField##TYPE{value}); }, \ [](const Field & value) -> String { return SettingField##TYPE{value}.toString(); }, \ [](const String & str) -> Field { SettingField##TYPE temp; temp.parseFromString(str); return static_cast(temp); }, \ diff --git a/src/Core/ServerSettings.cpp b/src/Core/ServerSettings.cpp index ead40061493..d573377fc8b 100644 --- a/src/Core/ServerSettings.cpp +++ b/src/Core/ServerSettings.cpp @@ -192,6 +192,13 @@ namespace DB DECLARE(UInt64, parts_killer_pool_size, 128, "Threads for cleanup of shared merge tree outdated threads. Only available in ClickHouse Cloud", 0) \ DECLARE(UInt64, keeper_multiread_batch_size, 10'000, "Maximum size of batch for MultiRead request to [Zoo]Keeper that support batching. If set to 0, batching is disabled. Available only in ClickHouse Cloud.", 0) \ DECLARE(Bool, use_legacy_mongodb_integration, true, "Use the legacy MongoDB integration implementation. Note: it's highly recommended to set this option to false, since legacy implementation will be removed in the future. Please submit any issues you encounter with the new implementation.", 0) \ + \ + DECLARE(UInt64, prefetch_threadpool_pool_size, 100, "Size of background pool for prefetches for remote object storages", 0) \ + DECLARE(UInt64, prefetch_threadpool_queue_size, 1000000, "Number of tasks which is possible to push into prefetches pool", 0) \ + DECLARE(UInt64, load_marks_threadpool_pool_size, 50, "Size of background pool for marks loading", 0) \ + DECLARE(UInt64, load_marks_threadpool_queue_size, 1000000, "Number of tasks which is possible to push into prefetches pool", 0) \ + DECLARE(UInt64, threadpool_writer_pool_size, 100, "Size of background pool for write requests to object storages", 0) \ + DECLARE(UInt64, threadpool_writer_queue_size, 1000000, "Number of tasks which is possible to push into background pool for write requests to object storages", 0) /// If you add a setting which can be updated at runtime, please update 'changeable_settings' map in dumpToSystemServerSettingsColumns below @@ -339,7 +346,7 @@ void ServerSettings::dumpToSystemServerSettingsColumns(ServerSettingColumnsParam res_columns[4]->insert(setting.getDescription()); res_columns[5]->insert(setting.getTypeName()); res_columns[6]->insert(is_changeable ? changeable_settings_it->second.second : ChangeableWithoutRestart::No); - res_columns[7]->insert(setting.isObsolete()); + res_columns[7]->insert(setting.getTier() == SettingsTierType::OBSOLETE); } } } diff --git a/src/Core/Settings.cpp b/src/Core/Settings.cpp index eb20a8354cb..dcb2f0ffe21 100644 --- a/src/Core/Settings.cpp +++ b/src/Core/Settings.cpp @@ -1,7 +1,5 @@ -#include #include #include -#include #include #include #include @@ -40,10 +38,17 @@ namespace ErrorCodes * Note: as an alternative, we could implement settings to be completely dynamic in the form of the map: String -> Field, * but we are not going to do it, because settings are used everywhere as static struct fields. * - * `flags` can be either 0 or IMPORTANT. - * A setting is "IMPORTANT" if it affects the results of queries and can't be ignored by older versions. + * `flags` can include a Tier (BETA | EXPERIMENTAL) and an optional bitwise AND with IMPORTANT. 
+ * The default (0) means a PRODUCTION ready setting * - * When adding new or changing existing settings add them to the settings changes history in SettingsChangesHistory.h + * A setting is "IMPORTANT" if it affects the results of queries and can't be ignored by older versions. + * Tiers: + * EXPERIMENTAL: The feature is in active development stage. Mostly for developers or for ClickHouse enthusiasts. + * BETA: There are no known bugs problems in the functionality, but the outcome of using it together with other + * features/components is unknown and correctness is not guaranteed. + * PRODUCTION (Default): The feature is safe to use along with other features from the PRODUCTION tier. + * + * When adding new or changing existing settings add them to the settings changes history in SettingsChangesHistory.cpp * for tracking settings changes in different versions and for special `compatibility` settings to work correctly. */ @@ -437,6 +442,9 @@ Enables or disables creating a new file on each insert in azure engine tables )", 0) \ DECLARE(Bool, s3_check_objects_after_upload, false, R"( Check each uploaded object to s3 with head request to be sure that upload was successful +)", 0) \ + DECLARE(Bool, azure_check_objects_after_upload, false, R"( +Check each uploaded object in azure blob storage to be sure that upload was successful )", 0) \ DECLARE(Bool, s3_allow_parallel_part_upload, true, R"( Use multiple threads for s3 multipart upload. It may lead to slightly higher memory usage @@ -4451,9 +4459,8 @@ Optimize GROUP BY when all keys in block are constant DECLARE(Bool, legacy_column_name_of_tuple_literal, false, R"( List all names of element of large tuple literals in their column names instead of hash. This settings exists only for compatibility reasons. It makes sense to set to 'true', while doing rolling update of cluster from version lower than 21.7 to higher. )", 0) \ - DECLARE(Bool, enable_named_columns_in_function_tuple, true, R"( + DECLARE(Bool, enable_named_columns_in_function_tuple, false, R"( Generate named tuples in function tuple() when all names are unique and can be treated as unquoted identifiers. -Beware that this setting might currently result in broken queries. It's not recommended to use in production )", 0) \ \ DECLARE(Bool, query_plan_enable_optimizations, true, R"( @@ -5105,6 +5112,9 @@ Only in ClickHouse Cloud. A maximum number of unacknowledged in-flight packets i )", 0) \ DECLARE(UInt64, distributed_cache_data_packet_ack_window, DistributedCache::ACK_DATA_PACKET_WINDOW, R"( Only in ClickHouse Cloud. A window for sending ACK for DataPacket sequence in a single distributed cache read request +)", 0) \ + DECLARE(Bool, distributed_cache_discard_connection_if_unread_data, true, R"( +Only in ClickHouse Cloud. Discard connection if some data is unread. )", 0) \ \ DECLARE(Bool, parallelize_output_from_storages, true, R"( @@ -5504,90 +5514,102 @@ For testing purposes. Replaces all external table functions to Null to not initi DECLARE(Bool, restore_replace_external_dictionary_source_to_null, false, R"( Replace external dictionary sources to Null on restore. Useful for testing purposes )", 0) \ - DECLARE(Bool, create_if_not_exists, false, R"( -Enable `IF NOT EXISTS` for `CREATE` statement by default. If either this setting or `IF NOT EXISTS` is specified and a table with the provided name already exists, no exception will be thrown. 
-)", 0) \ - DECLARE(Bool, enforce_strict_identifier_format, false, R"( -If enabled, only allow identifiers containing alphanumeric characters and underscores. -)", 0) \ - DECLARE(Bool, mongodb_throw_on_unsupported_query, true, R"( -If enabled, MongoDB tables will return an error when a MongoDB query cannot be built. Otherwise, ClickHouse reads the full table and processes it locally. This option does not apply to the legacy implementation or when 'allow_experimental_analyzer=0'. -)", 0) \ - \ - /* ###################################### */ \ - /* ######## EXPERIMENTAL FEATURES ####### */ \ - /* ###################################### */ \ - DECLARE(Bool, allow_experimental_materialized_postgresql_table, false, R"( -Allows to use the MaterializedPostgreSQL table engine. Disabled by default, because this feature is experimental -)", 0) \ - DECLARE(Bool, allow_experimental_funnel_functions, false, R"( -Enable experimental functions for funnel analysis. -)", 0) \ - DECLARE(Bool, allow_experimental_nlp_functions, false, R"( -Enable experimental functions for natural language processing. -)", 0) \ - DECLARE(Bool, allow_experimental_hash_functions, false, R"( -Enable experimental hash functions -)", 0) \ - DECLARE(Bool, allow_experimental_object_type, false, R"( -Allow Object and JSON data types -)", 0) \ - DECLARE(Bool, allow_experimental_time_series_table, false, R"( -Allows creation of tables with the [TimeSeries](../../engines/table-engines/integrations/time-series.md) table engine. + /* Parallel replicas */ \ + DECLARE(UInt64, allow_experimental_parallel_reading_from_replicas, 0, R"( +Use up to `max_parallel_replicas` the number of replicas from each shard for SELECT query execution. Reading is parallelized and coordinated dynamically. 0 - disabled, 1 - enabled, silently disable them in case of failure, 2 - enabled, throw an exception in case of failure +)", BETA) ALIAS(enable_parallel_replicas) \ + DECLARE(NonZeroUInt64, max_parallel_replicas, 1, R"( +The maximum number of replicas for each shard when executing a query. Possible values: -- 0 — the [TimeSeries](../../engines/table-engines/integrations/time-series.md) table engine is disabled. -- 1 — the [TimeSeries](../../engines/table-engines/integrations/time-series.md) table engine is enabled. -)", 0) \ - DECLARE(Bool, allow_experimental_vector_similarity_index, false, R"( -Allow experimental vector similarity index -)", 0) \ - DECLARE(Bool, allow_experimental_variant_type, false, R"( -Allows creation of experimental [Variant](../../sql-reference/data-types/variant.md). -)", 0) \ - DECLARE(Bool, allow_experimental_dynamic_type, false, R"( -Allow Dynamic data type -)", 0) \ - DECLARE(Bool, allow_experimental_json_type, false, R"( -Allow JSON data type -)", 0) \ - DECLARE(Bool, allow_experimental_codecs, false, R"( -If it is set to true, allow to specify experimental compression codecs (but we don't have those yet and this option does nothing). -)", 0) \ - DECLARE(Bool, allow_experimental_shared_set_join, true, R"( -Only in ClickHouse Cloud. Allow to create ShareSet and SharedJoin -)", 0) \ - DECLARE(UInt64, max_limit_for_ann_queries, 1'000'000, R"( -SELECT queries with LIMIT bigger than this setting cannot use vector similarity indexes. Helps to prevent memory overflows in vector similarity indexes. -)", 0) \ - DECLARE(UInt64, hnsw_candidate_list_size_for_search, 256, R"( -The size of the dynamic candidate list when searching the vector similarity index, also known as 'ef_search'. 
-)", 0) \ - DECLARE(Bool, throw_on_unsupported_query_inside_transaction, true, R"( -Throw exception if unsupported query is used inside transaction -)", 0) \ - DECLARE(TransactionsWaitCSNMode, wait_changes_become_visible_after_commit_mode, TransactionsWaitCSNMode::WAIT_UNKNOWN, R"( -Wait for committed changes to become actually visible in the latest snapshot -)", 0) \ - DECLARE(Bool, implicit_transaction, false, R"( -If enabled and not already inside a transaction, wraps the query inside a full transaction (begin + commit or rollback) -)", 0) \ - DECLARE(UInt64, grace_hash_join_initial_buckets, 1, R"( -Initial number of grace hash join buckets -)", 0) \ - DECLARE(UInt64, grace_hash_join_max_buckets, 1024, R"( -Limit on the number of grace hash join buckets -)", 0) \ - DECLARE(UInt64, join_to_sort_minimum_perkey_rows, 40, R"( -The lower limit of per-key average rows in the right table to determine whether to rerange the right table by key in left or inner join. This setting ensures that the optimization is not applied for sparse table keys -)", 0) \ - DECLARE(UInt64, join_to_sort_maximum_table_rows, 10000, R"( -The maximum number of rows in the right table to determine whether to rerange the right table by key in left or inner join. -)", 0) \ - DECLARE(Bool, allow_experimental_join_right_table_sorting, false, R"( -If it is set to true, and the conditions of `join_to_sort_minimum_perkey_rows` and `join_to_sort_maximum_table_rows` are met, rerange the right table by key to improve the performance in left or inner hash join. +- Positive integer. + +**Additional Info** + +This options will produce different results depending on the settings used. + +:::note +This setting will produce incorrect results when joins or subqueries are involved, and all tables don't meet certain requirements. See [Distributed Subqueries and max_parallel_replicas](../../sql-reference/operators/in.md/#max_parallel_replica-subqueries) for more details. +::: + +### Parallel processing using `SAMPLE` key + +A query may be processed faster if it is executed on several servers in parallel. But the query performance may degrade in the following cases: + +- The position of the sampling key in the partitioning key does not allow efficient range scans. +- Adding a sampling key to the table makes filtering by other columns less efficient. +- The sampling key is an expression that is expensive to calculate. +- The cluster latency distribution has a long tail, so that querying more servers increases the query overall latency. + +### Parallel processing using [parallel_replicas_custom_key](#parallel_replicas_custom_key) + +This setting is useful for any replicated table. )", 0) \ + DECLARE(ParallelReplicasMode, parallel_replicas_mode, ParallelReplicasMode::READ_TASKS, R"( +Type of filter to use with custom key for parallel replicas. default - use modulo operation on the custom key, range - use range filter on custom key using all possible values for the value type of custom key. +)", BETA) \ + DECLARE(UInt64, parallel_replicas_count, 0, R"( +This is internal setting that should not be used directly and represents an implementation detail of the 'parallel replicas' mode. This setting will be automatically set up by the initiator server for distributed queries to the number of parallel replicas participating in query processing. +)", BETA) \ + DECLARE(UInt64, parallel_replica_offset, 0, R"( +This is internal setting that should not be used directly and represents an implementation detail of the 'parallel replicas' mode. 
This setting will be automatically set up by the initiator server for distributed queries to the index of the replica participating in query processing among parallel replicas. +)", BETA) \ + DECLARE(String, parallel_replicas_custom_key, "", R"( +An arbitrary integer expression that can be used to split work between replicas for a specific table. +The value can be any integer expression. + +Simple expressions using primary keys are preferred. + +If the setting is used on a cluster that consists of a single shard with multiple replicas, those replicas will be converted into virtual shards. +Otherwise, it will behave same as for `SAMPLE` key, it will use multiple replicas of each shard. +)", BETA) \ + DECLARE(UInt64, parallel_replicas_custom_key_range_lower, 0, R"( +Allows the filter type `range` to split the work evenly between replicas based on the custom range `[parallel_replicas_custom_key_range_lower, INT_MAX]`. + +When used in conjunction with [parallel_replicas_custom_key_range_upper](#parallel_replicas_custom_key_range_upper), it lets the filter evenly split the work over replicas for the range `[parallel_replicas_custom_key_range_lower, parallel_replicas_custom_key_range_upper]`. + +Note: This setting will not cause any additional data to be filtered during query processing, rather it changes the points at which the range filter breaks up the range `[0, INT_MAX]` for parallel processing. +)", BETA) \ + DECLARE(UInt64, parallel_replicas_custom_key_range_upper, 0, R"( +Allows the filter type `range` to split the work evenly between replicas based on the custom range `[0, parallel_replicas_custom_key_range_upper]`. A value of 0 disables the upper bound, setting it the max value of the custom key expression. + +When used in conjunction with [parallel_replicas_custom_key_range_lower](#parallel_replicas_custom_key_range_lower), it lets the filter evenly split the work over replicas for the range `[parallel_replicas_custom_key_range_lower, parallel_replicas_custom_key_range_upper]`. + +Note: This setting will not cause any additional data to be filtered during query processing, rather it changes the points at which the range filter breaks up the range `[0, INT_MAX]` for parallel processing +)", BETA) \ + DECLARE(String, cluster_for_parallel_replicas, "", R"( +Cluster for a shard in which current server is located +)", BETA) \ + DECLARE(Bool, parallel_replicas_allow_in_with_subquery, true, R"( +If true, subquery for IN will be executed on every follower replica. +)", BETA) \ + DECLARE(Float, parallel_replicas_single_task_marks_count_multiplier, 2, R"( +A multiplier which will be added during calculation for minimal number of marks to retrieve from coordinator. This will be applied only for remote replicas. +)", BETA) \ + DECLARE(Bool, parallel_replicas_for_non_replicated_merge_tree, false, R"( +If true, ClickHouse will use parallel replicas algorithm also for non-replicated MergeTree tables +)", BETA) \ + DECLARE(UInt64, parallel_replicas_min_number_of_rows_per_replica, 0, R"( +Limit the number of replicas used in a query to (estimated rows to read / min_number_of_rows_per_replica). The max is still limited by 'max_parallel_replicas' +)", BETA) \ + DECLARE(Bool, parallel_replicas_prefer_local_join, true, R"( +If true, and JOIN can be executed with parallel replicas algorithm, and all storages of right JOIN part are *MergeTree, local JOIN will be used instead of GLOBAL JOIN. 
+)", BETA) \ + DECLARE(UInt64, parallel_replicas_mark_segment_size, 0, R"( +Parts virtually divided into segments to be distributed between replicas for parallel reading. This setting controls the size of these segments. Not recommended to change until you're absolutely sure in what you're doing. Value should be in range [128; 16384] +)", BETA) \ + DECLARE(Bool, parallel_replicas_local_plan, false, R"( +Build local plan for local replica +)", BETA) \ + \ + DECLARE(Bool, allow_experimental_analyzer, true, R"( +Allow new query analyzer. +)", IMPORTANT | BETA) ALIAS(enable_analyzer) \ + DECLARE(Bool, analyzer_compatibility_join_using_top_level_identifier, false, R"( +Force to resolve identifier in JOIN USING from projection (for example, in `SELECT a + 1 AS b FROM t1 JOIN t2 USING (b)` join will be performed by `t1.a + 1 = t2.b`, rather then `t1.b = t2.b`). +)", BETA) \ + \ DECLARE(Timezone, session_timezone, "", R"( Sets the implicit time zone of the current session or query. The implicit time zone is the time zone applied to values of type DateTime/DateTime64 which have no explicitly specified time zone. @@ -5647,126 +5669,121 @@ This happens due to different parsing pipelines: **See also** - [timezone](../server-configuration-parameters/settings.md#timezone) +)", BETA) \ +DECLARE(Bool, create_if_not_exists, false, R"( +Enable `IF NOT EXISTS` for `CREATE` statement by default. If either this setting or `IF NOT EXISTS` is specified and a table with the provided name already exists, no exception will be thrown. +)", 0) \ + DECLARE(Bool, enforce_strict_identifier_format, false, R"( +If enabled, only allow identifiers containing alphanumeric characters and underscores. +)", 0) \ + DECLARE(Bool, mongodb_throw_on_unsupported_query, true, R"( +If enabled, MongoDB tables will return an error when a MongoDB query cannot be built. Otherwise, ClickHouse reads the full table and processes it locally. This option does not apply to the legacy implementation or when 'allow_experimental_analyzer=0'. +)", 0) \ + DECLARE(Bool, implicit_select, false, R"( +Allow writing simple SELECT queries without the leading SELECT keyword, which makes it simple for calculator-style usage, e.g. `1 + 2` becomes a valid query. )", 0) \ - DECLARE(Bool, use_hive_partitioning, false, R"( -When enabled, ClickHouse will detect Hive-style partitioning in path (`/name=value/`) in file-like table engines [File](../../engines/table-engines/special/file.md#hive-style-partitioning)/[S3](../../engines/table-engines/integrations/s3.md#hive-style-partitioning)/[URL](../../engines/table-engines/special/url.md#hive-style-partitioning)/[HDFS](../../engines/table-engines/integrations/hdfs.md#hive-style-partitioning)/[AzureBlobStorage](../../engines/table-engines/integrations/azureBlobStorage.md#hive-style-partitioning) and will allow to use partition columns as virtual columns in the query. These virtual columns will have the same names as in the partitioned path, but starting with `_`. -)", 0)\ \ - DECLARE(Bool, allow_statistics_optimize, false, R"( -Allows using statistics to optimize queries -)", 0) ALIAS(allow_statistic_optimize) \ - DECLARE(Bool, allow_experimental_statistics, false, R"( -Allows defining columns with [statistics](../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-creating-a-table) and [manipulate statistics](../../engines/table-engines/mergetree-family/mergetree.md#column-statistics). 
-)", 0) ALIAS(allow_experimental_statistic) \ \ - /* Parallel replicas */ \ - DECLARE(UInt64, allow_experimental_parallel_reading_from_replicas, 0, R"( -Use up to `max_parallel_replicas` the number of replicas from each shard for SELECT query execution. Reading is parallelized and coordinated dynamically. 0 - disabled, 1 - enabled, silently disable them in case of failure, 2 - enabled, throw an exception in case of failure -)", 0) ALIAS(enable_parallel_replicas) \ - DECLARE(NonZeroUInt64, max_parallel_replicas, 1, R"( -The maximum number of replicas for each shard when executing a query. + /* ####################################################### */ \ + /* ########### START OF EXPERIMENTAL FEATURES ############ */ \ + /* ## ADD PRODUCTION / BETA FEATURES BEFORE THIS BLOCK ## */ \ + /* ####################################################### */ \ + \ + DECLARE(Bool, allow_experimental_materialized_postgresql_table, false, R"( +Allows to use the MaterializedPostgreSQL table engine. Disabled by default, because this feature is experimental +)", EXPERIMENTAL) \ + DECLARE(Bool, allow_experimental_funnel_functions, false, R"( +Enable experimental functions for funnel analysis. +)", EXPERIMENTAL) \ + DECLARE(Bool, allow_experimental_nlp_functions, false, R"( +Enable experimental functions for natural language processing. +)", EXPERIMENTAL) \ + DECLARE(Bool, allow_experimental_hash_functions, false, R"( +Enable experimental hash functions +)", EXPERIMENTAL) \ + DECLARE(Bool, allow_experimental_object_type, false, R"( +Allow Object and JSON data types +)", EXPERIMENTAL) \ + DECLARE(Bool, allow_experimental_time_series_table, false, R"( +Allows creation of tables with the [TimeSeries](../../engines/table-engines/integrations/time-series.md) table engine. Possible values: -- Positive integer. - -**Additional Info** - -This options will produce different results depending on the settings used. - -:::note -This setting will produce incorrect results when joins or subqueries are involved, and all tables don't meet certain requirements. See [Distributed Subqueries and max_parallel_replicas](../../sql-reference/operators/in.md/#max_parallel_replica-subqueries) for more details. -::: - -### Parallel processing using `SAMPLE` key - -A query may be processed faster if it is executed on several servers in parallel. But the query performance may degrade in the following cases: - -- The position of the sampling key in the partitioning key does not allow efficient range scans. -- Adding a sampling key to the table makes filtering by other columns less efficient. -- The sampling key is an expression that is expensive to calculate. -- The cluster latency distribution has a long tail, so that querying more servers increases the query overall latency. - -### Parallel processing using [parallel_replicas_custom_key](#parallel_replicas_custom_key) - -This setting is useful for any replicated table. -)", 0) \ - DECLARE(ParallelReplicasMode, parallel_replicas_mode, ParallelReplicasMode::READ_TASKS, R"( -Type of filter to use with custom key for parallel replicas. default - use modulo operation on the custom key, range - use range filter on custom key using all possible values for the value type of custom key. -)", 0) \ - DECLARE(UInt64, parallel_replicas_count, 0, R"( -This is internal setting that should not be used directly and represents an implementation detail of the 'parallel replicas' mode. 
This setting will be automatically set up by the initiator server for distributed queries to the number of parallel replicas participating in query processing. -)", 0) \ - DECLARE(UInt64, parallel_replica_offset, 0, R"( -This is internal setting that should not be used directly and represents an implementation detail of the 'parallel replicas' mode. This setting will be automatically set up by the initiator server for distributed queries to the index of the replica participating in query processing among parallel replicas. -)", 0) \ - DECLARE(String, parallel_replicas_custom_key, "", R"( -An arbitrary integer expression that can be used to split work between replicas for a specific table. -The value can be any integer expression. - -Simple expressions using primary keys are preferred. - -If the setting is used on a cluster that consists of a single shard with multiple replicas, those replicas will be converted into virtual shards. -Otherwise, it will behave same as for `SAMPLE` key, it will use multiple replicas of each shard. -)", 0) \ - DECLARE(UInt64, parallel_replicas_custom_key_range_lower, 0, R"( -Allows the filter type `range` to split the work evenly between replicas based on the custom range `[parallel_replicas_custom_key_range_lower, INT_MAX]`. - -When used in conjunction with [parallel_replicas_custom_key_range_upper](#parallel_replicas_custom_key_range_upper), it lets the filter evenly split the work over replicas for the range `[parallel_replicas_custom_key_range_lower, parallel_replicas_custom_key_range_upper]`. - -Note: This setting will not cause any additional data to be filtered during query processing, rather it changes the points at which the range filter breaks up the range `[0, INT_MAX]` for parallel processing. -)", 0) \ - DECLARE(UInt64, parallel_replicas_custom_key_range_upper, 0, R"( -Allows the filter type `range` to split the work evenly between replicas based on the custom range `[0, parallel_replicas_custom_key_range_upper]`. A value of 0 disables the upper bound, setting it the max value of the custom key expression. - -When used in conjunction with [parallel_replicas_custom_key_range_lower](#parallel_replicas_custom_key_range_lower), it lets the filter evenly split the work over replicas for the range `[parallel_replicas_custom_key_range_lower, parallel_replicas_custom_key_range_upper]`. - -Note: This setting will not cause any additional data to be filtered during query processing, rather it changes the points at which the range filter breaks up the range `[0, INT_MAX]` for parallel processing -)", 0) \ - DECLARE(String, cluster_for_parallel_replicas, "", R"( -Cluster for a shard in which current server is located -)", 0) \ - DECLARE(Bool, parallel_replicas_allow_in_with_subquery, true, R"( -If true, subquery for IN will be executed on every follower replica. -)", 0) \ - DECLARE(Float, parallel_replicas_single_task_marks_count_multiplier, 2, R"( -A multiplier which will be added during calculation for minimal number of marks to retrieve from coordinator. This will be applied only for remote replicas. -)", 0) \ - DECLARE(Bool, parallel_replicas_for_non_replicated_merge_tree, false, R"( -If true, ClickHouse will use parallel replicas algorithm also for non-replicated MergeTree tables -)", 0) \ - DECLARE(UInt64, parallel_replicas_min_number_of_rows_per_replica, 0, R"( -Limit the number of replicas used in a query to (estimated rows to read / min_number_of_rows_per_replica). 
The max is still limited by 'max_parallel_replicas' -)", 0) \ - DECLARE(Bool, parallel_replicas_prefer_local_join, true, R"( -If true, and JOIN can be executed with parallel replicas algorithm, and all storages of right JOIN part are *MergeTree, local JOIN will be used instead of GLOBAL JOIN. -)", 0) \ - DECLARE(UInt64, parallel_replicas_mark_segment_size, 0, R"( -Parts virtually divided into segments to be distributed between replicas for parallel reading. This setting controls the size of these segments. Not recommended to change until you're absolutely sure in what you're doing. Value should be in range [128; 16384] +- 0 — the [TimeSeries](../../engines/table-engines/integrations/time-series.md) table engine is disabled. +- 1 — the [TimeSeries](../../engines/table-engines/integrations/time-series.md) table engine is enabled. )", 0) \ + DECLARE(Bool, allow_experimental_vector_similarity_index, false, R"( +Allow experimental vector similarity index +)", EXPERIMENTAL) \ + DECLARE(Bool, allow_experimental_variant_type, false, R"( +Allows creation of experimental [Variant](../../sql-reference/data-types/variant.md). +)", EXPERIMENTAL) \ + DECLARE(Bool, allow_experimental_dynamic_type, false, R"( +Allow Dynamic data type +)", EXPERIMENTAL) \ + DECLARE(Bool, allow_experimental_json_type, false, R"( +Allow JSON data type +)", EXPERIMENTAL) \ + DECLARE(Bool, allow_experimental_codecs, false, R"( +If it is set to true, allow to specify experimental compression codecs (but we don't have those yet and this option does nothing). +)", EXPERIMENTAL) \ + DECLARE(Bool, allow_experimental_shared_set_join, true, R"( +Only in ClickHouse Cloud. Allow to create ShareSet and SharedJoin +)", EXPERIMENTAL) \ + DECLARE(UInt64, max_limit_for_ann_queries, 1'000'000, R"( +SELECT queries with LIMIT bigger than this setting cannot use vector similarity indexes. Helps to prevent memory overflows in vector similarity indexes. +)", EXPERIMENTAL) \ + DECLARE(UInt64, hnsw_candidate_list_size_for_search, 256, R"( +The size of the dynamic candidate list when searching the vector similarity index, also known as 'ef_search'. +)", EXPERIMENTAL) \ + DECLARE(Bool, throw_on_unsupported_query_inside_transaction, true, R"( +Throw exception if unsupported query is used inside transaction +)", EXPERIMENTAL) \ + DECLARE(TransactionsWaitCSNMode, wait_changes_become_visible_after_commit_mode, TransactionsWaitCSNMode::WAIT_UNKNOWN, R"( +Wait for committed changes to become actually visible in the latest snapshot +)", EXPERIMENTAL) \ + DECLARE(Bool, implicit_transaction, false, R"( +If enabled and not already inside a transaction, wraps the query inside a full transaction (begin + commit or rollback) +)", EXPERIMENTAL) \ + DECLARE(UInt64, grace_hash_join_initial_buckets, 1, R"( +Initial number of grace hash join buckets +)", EXPERIMENTAL) \ + DECLARE(UInt64, grace_hash_join_max_buckets, 1024, R"( +Limit on the number of grace hash join buckets +)", EXPERIMENTAL) \ + DECLARE(UInt64, join_to_sort_minimum_perkey_rows, 40, R"( +The lower limit of per-key average rows in the right table to determine whether to rerange the right table by key in left or inner join. This setting ensures that the optimization is not applied for sparse table keys +)", EXPERIMENTAL) \ + DECLARE(UInt64, join_to_sort_maximum_table_rows, 10000, R"( +The maximum number of rows in the right table to determine whether to rerange the right table by key in left or inner join. 
+)", EXPERIMENTAL) \ + DECLARE(Bool, allow_experimental_join_right_table_sorting, false, R"( +If it is set to true, and the conditions of `join_to_sort_minimum_perkey_rows` and `join_to_sort_maximum_table_rows` are met, rerange the right table by key to improve the performance in left or inner hash join. +)", EXPERIMENTAL) \ + DECLARE(Bool, use_hive_partitioning, false, R"( +When enabled, ClickHouse will detect Hive-style partitioning in path (`/name=value/`) in file-like table engines [File](../../engines/table-engines/special/file.md#hive-style-partitioning)/[S3](../../engines/table-engines/integrations/s3.md#hive-style-partitioning)/[URL](../../engines/table-engines/special/url.md#hive-style-partitioning)/[HDFS](../../engines/table-engines/integrations/hdfs.md#hive-style-partitioning)/[AzureBlobStorage](../../engines/table-engines/integrations/azureBlobStorage.md#hive-style-partitioning) and will allow to use partition columns as virtual columns in the query. These virtual columns will have the same names as in the partitioned path, but starting with `_`. +)", EXPERIMENTAL)\ + \ + DECLARE(Bool, allow_statistics_optimize, false, R"( +Allows using statistics to optimize queries +)", EXPERIMENTAL) ALIAS(allow_statistic_optimize) \ + DECLARE(Bool, allow_experimental_statistics, false, R"( +Allows defining columns with [statistics](../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-creating-a-table) and [manipulate statistics](../../engines/table-engines/mergetree-family/mergetree.md#column-statistics). +)", EXPERIMENTAL) ALIAS(allow_experimental_statistic) \ + \ DECLARE(Bool, allow_archive_path_syntax, true, R"( File/S3 engines/table function will parse paths with '::' as '\\ :: \\' if archive has correct extension -)", 0) \ - DECLARE(Bool, parallel_replicas_local_plan, false, R"( -Build local plan for local replica -)", 0) \ +)", EXPERIMENTAL) \ \ DECLARE(Bool, allow_experimental_inverted_index, false, R"( If it is set to true, allow to use experimental inverted index. -)", 0) \ +)", EXPERIMENTAL) \ DECLARE(Bool, allow_experimental_full_text_index, false, R"( If it is set to true, allow to use experimental full-text index. -)", 0) \ +)", EXPERIMENTAL) \ \ DECLARE(Bool, allow_experimental_join_condition, false, R"( Support join with inequal conditions which involve columns from both left and right table. e.g. t1.y < t2.y. -)", 0) \ - \ - DECLARE(Bool, allow_experimental_analyzer, true, R"( -Allow new query analyzer. -)", IMPORTANT) ALIAS(enable_analyzer) \ - DECLARE(Bool, analyzer_compatibility_join_using_top_level_identifier, false, R"( -Force to resolve identifier in JOIN USING from projection (for example, in `SELECT a + 1 AS b FROM t1 JOIN t2 USING (b)` join will be performed by `t1.a + 1 = t2.b`, rather then `t1.b = t2.b`). )", 0) \ \ DECLARE(Bool, allow_experimental_live_view, false, R"( @@ -5779,43 +5796,43 @@ Possible values: )", 0) \ DECLARE(Seconds, live_view_heartbeat_interval, 15, R"( The heartbeat interval in seconds to indicate live query is alive. -)", 0) \ +)", EXPERIMENTAL) \ DECLARE(UInt64, max_live_view_insert_blocks_before_refresh, 64, R"( Limit maximum number of inserted blocks after which mergeable blocks are dropped and query is re-executed. -)", 0) \ +)", EXPERIMENTAL) \ \ DECLARE(Bool, allow_experimental_window_view, false, R"( Enable WINDOW VIEW. Not mature enough. -)", 0) \ +)", EXPERIMENTAL) \ DECLARE(Seconds, window_view_clean_interval, 60, R"( The clean interval of window view in seconds to free outdated data. 
-)", 0) \ +)", EXPERIMENTAL) \ DECLARE(Seconds, window_view_heartbeat_interval, 15, R"( The heartbeat interval in seconds to indicate watch query is alive. -)", 0) \ +)", EXPERIMENTAL) \ DECLARE(Seconds, wait_for_window_view_fire_signal_timeout, 10, R"( Timeout for waiting for window view fire signal in event time processing -)", 0) \ +)", EXPERIMENTAL) \ \ DECLARE(Bool, stop_refreshable_materialized_views_on_startup, false, R"( On server startup, prevent scheduling of refreshable materialized views, as if with SYSTEM STOP VIEWS. You can manually start them with SYSTEM START VIEWS or SYSTEM START VIEW \\ afterwards. Also applies to newly created views. Has no effect on non-refreshable materialized views. -)", 0) \ +)", EXPERIMENTAL) \ \ DECLARE(Bool, allow_experimental_database_materialized_mysql, false, R"( Allow to create database with Engine=MaterializedMySQL(...). -)", 0) \ +)", EXPERIMENTAL) \ DECLARE(Bool, allow_experimental_database_materialized_postgresql, false, R"( Allow to create database with Engine=MaterializedPostgreSQL(...). -)", 0) \ +)", EXPERIMENTAL) \ \ /** Experimental feature for moving data between shards. */ \ DECLARE(Bool, allow_experimental_query_deduplication, false, R"( Experimental data deduplication for SELECT queries based on part UUIDs -)", 0) \ - DECLARE(Bool, implicit_select, false, R"( -Allow writing simple SELECT queries without the leading SELECT keyword, which makes it simple for calculator-style usage, e.g. `1 + 2` becomes a valid query. -)", 0) - +)", EXPERIMENTAL) \ + \ + /* ####################################################### */ \ + /* ############ END OF EXPERIMENTAL FEATURES ############# */ \ + /* ####################################################### */ \ // End of COMMON_SETTINGS // Please add settings related to formats in Core/FormatFactorySettings.h, move obsolete settings to OBSOLETE_SETTINGS and obsolete format settings to OBSOLETE_FORMAT_SETTINGS. @@ -5894,13 +5911,14 @@ Allow writing simple SELECT queries without the leading SELECT keyword, which ma /** The section above is for obsolete settings. Do not add anything there. */ #endif /// __CLION_IDE__ - #define LIST_OF_SETTINGS(M, ALIAS) \ COMMON_SETTINGS(M, ALIAS) \ OBSOLETE_SETTINGS(M, ALIAS) \ FORMAT_FACTORY_SETTINGS(M, ALIAS) \ OBSOLETE_FORMAT_SETTINGS(M, ALIAS) \ +// clang-format on + DECLARE_SETTINGS_TRAITS_ALLOW_CUSTOM_SETTINGS(SettingsTraits, LIST_OF_SETTINGS) IMPLEMENT_SETTINGS_TRAITS(SettingsTraits, LIST_OF_SETTINGS) @@ -6008,7 +6026,7 @@ void SettingsImpl::checkNoSettingNamesAtTopLevel(const Poco::Util::AbstractConfi { const auto & name = setting.getName(); bool should_skip_check = name == "max_table_size_to_drop" || name == "max_partition_size_to_drop"; - if (config.has(name) && !setting.isObsolete() && !should_skip_check) + if (config.has(name) && (setting.getTier() != SettingsTierType::OBSOLETE) && !should_skip_check) { throw Exception(ErrorCodes::UNKNOWN_ELEMENT_IN_CONFIG, "A setting '{}' appeared at top level in config {}." " But it is user-level setting that should be located in users.xml inside section for specific profile." 
@@ -6184,7 +6202,7 @@ std::vector Settings::getChangedAndObsoleteNames() const std::vector setting_names; for (const auto & setting : impl->allChanged()) { - if (setting.isObsolete()) + if (setting.getTier() == SettingsTierType::OBSOLETE) setting_names.emplace_back(setting.getName()); } return setting_names; @@ -6233,7 +6251,8 @@ void Settings::dumpToSystemSettingsColumns(MutableColumnsAndConstraints & params res_columns[6]->insert(writability == SettingConstraintWritability::CONST); res_columns[7]->insert(setting.getTypeName()); res_columns[8]->insert(setting.getDefaultValueString()); - res_columns[10]->insert(setting.isObsolete()); + res_columns[10]->insert(setting.getTier() == SettingsTierType::OBSOLETE); + res_columns[11]->insert(setting.getTier()); }; const auto & settings_to_aliases = SettingsImpl::Traits::settingsToAliases(); diff --git a/src/Core/SettingsChangesHistory.cpp b/src/Core/SettingsChangesHistory.cpp index 40f7d64e667..65cf6de2300 100644 --- a/src/Core/SettingsChangesHistory.cpp +++ b/src/Core/SettingsChangesHistory.cpp @@ -65,6 +65,7 @@ static std::initializer_list +#include + +namespace DB +{ + +std::shared_ptr getSettingsTierEnum() +{ + return std::make_shared( + DataTypeEnum8::Values + { + {"Production", static_cast(SettingsTierType::PRODUCTION)}, + {"Obsolete", static_cast(SettingsTierType::OBSOLETE)}, + {"Experimental", static_cast(SettingsTierType::EXPERIMENTAL)}, + {"Beta", static_cast(SettingsTierType::BETA)} + }); +} + +} diff --git a/src/Core/SettingsTierType.h b/src/Core/SettingsTierType.h new file mode 100644 index 00000000000..d8bba89bc18 --- /dev/null +++ b/src/Core/SettingsTierType.h @@ -0,0 +1,26 @@ +#pragma once + +#include + +#include +#include + +namespace DB +{ + +template +class DataTypeEnum; +using DataTypeEnum8 = DataTypeEnum; + +// Make it signed for compatibility with DataTypeEnum8 +enum SettingsTierType : int8_t +{ + PRODUCTION = 0b0000, + OBSOLETE = 0b0100, + EXPERIMENTAL = 0b1000, + BETA = 0b1100 +}; + +std::shared_ptr getSettingsTierEnum(); + +} diff --git a/src/DataTypes/Serializations/ISerialization.cpp b/src/DataTypes/Serializations/ISerialization.cpp index fdcdf9e0cda..5a60dc30b02 100644 --- a/src/DataTypes/Serializations/ISerialization.cpp +++ b/src/DataTypes/Serializations/ISerialization.cpp @@ -161,7 +161,7 @@ String getNameForSubstreamPath( String stream_name, SubstreamIterator begin, SubstreamIterator end, - bool escape_tuple_delimiter) + bool escape_for_file_name) { using Substream = ISerialization::Substream; @@ -186,7 +186,7 @@ String getNameForSubstreamPath( /// Because nested data may be represented not by Array of Tuple, /// but by separate Array columns with names in a form of a.b, /// and name is encoded as a whole. - if (it->type == Substream::TupleElement && escape_tuple_delimiter) + if (it->type == Substream::TupleElement && escape_for_file_name) stream_name += escapeForFileName(substream_name); else stream_name += substream_name; @@ -206,7 +206,7 @@ String getNameForSubstreamPath( else if (it->type == SubstreamType::ObjectSharedData) stream_name += ".object_shared_data"; else if (it->type == SubstreamType::ObjectTypedPath || it->type == SubstreamType::ObjectDynamicPath) - stream_name += "." + it->object_path_name; + stream_name += "." + (escape_for_file_name ? 
escapeForFileName(it->object_path_name) : it->object_path_name); } return stream_name; @@ -434,6 +434,14 @@ bool ISerialization::isDynamicSubcolumn(const DB::ISerialization::SubstreamPath return false; } +bool ISerialization::isLowCardinalityDictionarySubcolumn(const DB::ISerialization::SubstreamPath & path) +{ + if (path.empty()) + return false; + + return path[path.size() - 1].type == SubstreamType::DictionaryKeys; +} + ISerialization::SubstreamData ISerialization::createFromPath(const SubstreamPath & path, size_t prefix_len) { assert(prefix_len <= path.size()); diff --git a/src/DataTypes/Serializations/ISerialization.h b/src/DataTypes/Serializations/ISerialization.h index 7bd58a8a981..400bdbf32d3 100644 --- a/src/DataTypes/Serializations/ISerialization.h +++ b/src/DataTypes/Serializations/ISerialization.h @@ -463,6 +463,8 @@ public: /// Returns true if stream with specified path corresponds to dynamic subcolumn. static bool isDynamicSubcolumn(const SubstreamPath & path, size_t prefix_len); + static bool isLowCardinalityDictionarySubcolumn(const SubstreamPath & path); + protected: template State * checkAndGetState(const StatePtr & state) const; diff --git a/src/DataTypes/Serializations/SerializationLowCardinality.cpp b/src/DataTypes/Serializations/SerializationLowCardinality.cpp index baaab6ba3c3..248fe2681b0 100644 --- a/src/DataTypes/Serializations/SerializationLowCardinality.cpp +++ b/src/DataTypes/Serializations/SerializationLowCardinality.cpp @@ -54,7 +54,7 @@ void SerializationLowCardinality::enumerateStreams( .withSerializationInfo(data.serialization_info); settings.path.back().data = dict_data; - dict_inner_serialization->enumerateStreams(settings, callback, dict_data); + callback(settings.path); settings.path.back() = Substream::DictionaryIndexes; settings.path.back().data = data; diff --git a/src/Disks/IO/AsynchronousBoundedReadBuffer.cpp b/src/Disks/IO/AsynchronousBoundedReadBuffer.cpp index b24b95af85c..c405d296e60 100644 --- a/src/Disks/IO/AsynchronousBoundedReadBuffer.cpp +++ b/src/Disks/IO/AsynchronousBoundedReadBuffer.cpp @@ -365,7 +365,7 @@ AsynchronousBoundedReadBuffer::~AsynchronousBoundedReadBuffer() } catch (...) 
{ - tryLogCurrentException(__PRETTY_FUNCTION__); + tryLogCurrentException(log); } } diff --git a/src/Disks/IO/WriteBufferFromAzureBlobStorage.cpp b/src/Disks/IO/WriteBufferFromAzureBlobStorage.cpp index a9c0b26aa8d..cf88a54db86 100644 --- a/src/Disks/IO/WriteBufferFromAzureBlobStorage.cpp +++ b/src/Disks/IO/WriteBufferFromAzureBlobStorage.cpp @@ -30,6 +30,7 @@ namespace DB namespace ErrorCodes { + extern const int AZURE_BLOB_STORAGE_ERROR; extern const int LOGICAL_ERROR; } @@ -72,6 +73,7 @@ WriteBufferFromAzureBlobStorage::WriteBufferFromAzureBlobStorage( std::move(schedule_), settings_->max_inflight_parts_for_one_file, limited_log)) + , check_objects_after_upload(settings_->check_objects_after_upload) { allocateBuffer(); } @@ -178,6 +180,24 @@ void WriteBufferFromAzureBlobStorage::finalizeImpl() execWithRetry([&](){ block_blob_client.CommitBlockList(block_ids); }, max_unexpected_write_error_retries); LOG_TRACE(log, "Committed {} blocks for blob `{}`", block_ids.size(), blob_path); } + + if (check_objects_after_upload) + { + try + { + auto blob_client = blob_container_client->GetBlobClient(blob_path); + blob_client.GetProperties(); + } + catch (const Azure::Storage::StorageException & e) + { + if (e.StatusCode == Azure::Core::Http::HttpStatusCode::NotFound) + throw Exception( + ErrorCodes::AZURE_BLOB_STORAGE_ERROR, + "Object {} not uploaded to azure blob storage, it's a bug in Azure Blob Storage or its API.", + blob_path); + throw; + } + } } void WriteBufferFromAzureBlobStorage::nextImpl() diff --git a/src/Disks/IO/WriteBufferFromAzureBlobStorage.h b/src/Disks/IO/WriteBufferFromAzureBlobStorage.h index 351fb309d2d..23dd89874bc 100644 --- a/src/Disks/IO/WriteBufferFromAzureBlobStorage.h +++ b/src/Disks/IO/WriteBufferFromAzureBlobStorage.h @@ -90,6 +90,7 @@ private: size_t hidden_size = 0; std::unique_ptr task_tracker; + bool check_objects_after_upload = false; std::deque detached_part_data; }; diff --git a/src/Disks/ObjectStorages/AzureBlobStorage/AzureBlobStorageCommon.cpp b/src/Disks/ObjectStorages/AzureBlobStorage/AzureBlobStorageCommon.cpp index 931deed30ce..49355c15491 100644 --- a/src/Disks/ObjectStorages/AzureBlobStorage/AzureBlobStorageCommon.cpp +++ b/src/Disks/ObjectStorages/AzureBlobStorage/AzureBlobStorageCommon.cpp @@ -39,6 +39,7 @@ namespace Setting extern const SettingsUInt64 azure_sdk_max_retries; extern const SettingsUInt64 azure_sdk_retry_initial_backoff_ms; extern const SettingsUInt64 azure_sdk_retry_max_backoff_ms; + extern const SettingsBool azure_check_objects_after_upload; } namespace ErrorCodes @@ -352,6 +353,7 @@ std::unique_ptr getRequestSettings(const Settings & query_setti settings->sdk_max_retries = query_settings[Setting::azure_sdk_max_retries]; settings->sdk_retry_initial_backoff_ms = query_settings[Setting::azure_sdk_retry_initial_backoff_ms]; settings->sdk_retry_max_backoff_ms = query_settings[Setting::azure_sdk_retry_max_backoff_ms]; + settings->check_objects_after_upload = query_settings[Setting::azure_check_objects_after_upload]; return settings; } @@ -389,6 +391,8 @@ std::unique_ptr getRequestSettings(const Poco::Util::AbstractCo settings->sdk_retry_initial_backoff_ms = config.getUInt64(config_prefix + ".retry_initial_backoff_ms", settings_ref[Setting::azure_sdk_retry_initial_backoff_ms]); settings->sdk_retry_max_backoff_ms = config.getUInt64(config_prefix + ".retry_max_backoff_ms", settings_ref[Setting::azure_sdk_retry_max_backoff_ms]); + settings->check_objects_after_upload = config.getBool(config_prefix + ".check_objects_after_upload", 
settings_ref[Setting::azure_check_objects_after_upload]); + if (config.has(config_prefix + ".curl_ip_resolve")) { using CurlOptions = Azure::Core::Http::CurlTransportOptions; diff --git a/src/Disks/ObjectStorages/AzureBlobStorage/AzureBlobStorageCommon.h b/src/Disks/ObjectStorages/AzureBlobStorage/AzureBlobStorageCommon.h index bb2f0270924..4f7fd4e88cc 100644 --- a/src/Disks/ObjectStorages/AzureBlobStorage/AzureBlobStorageCommon.h +++ b/src/Disks/ObjectStorages/AzureBlobStorage/AzureBlobStorageCommon.h @@ -50,6 +50,7 @@ struct RequestSettings size_t sdk_retry_initial_backoff_ms = 10; size_t sdk_retry_max_backoff_ms = 1000; bool use_native_copy = false; + bool check_objects_after_upload = false; using CurlOptions = Azure::Core::Http::CurlTransportOptions; CurlOptions::CurlOptIPResolve curl_ip_resolve = CurlOptions::CURL_IPRESOLVE_WHATEVER; diff --git a/src/Disks/ObjectStorages/HDFS/HDFSObjectStorage.cpp b/src/Disks/ObjectStorages/HDFS/HDFSObjectStorage.cpp index 182534529ea..7698193ee2f 100644 --- a/src/Disks/ObjectStorages/HDFS/HDFSObjectStorage.cpp +++ b/src/Disks/ObjectStorages/HDFS/HDFSObjectStorage.cpp @@ -103,15 +103,15 @@ std::unique_ptr HDFSObjectStorage::writeObject( /// NOL ErrorCodes::UNSUPPORTED_METHOD, "HDFS API doesn't support custom attributes/metadata for stored objects"); - std::string path = object.remote_path; - if (path.starts_with("/")) - path = path.substr(1); - if (!path.starts_with(url)) - path = fs::path(url) / path; - + auto path = extractObjectKeyFromURL(object); /// Single O_WRONLY in libhdfs adds O_TRUNC return std::make_unique( - path, config, settings->replication, patchSettings(write_settings), buf_size, + url_without_path, + fs::path(data_directory) / path, + config, + settings->replication, + patchSettings(write_settings), + buf_size, mode == WriteMode::Rewrite ? 
O_WRONLY : O_WRONLY | O_APPEND); } diff --git a/src/Disks/ObjectStorages/S3/diskSettings.cpp b/src/Disks/ObjectStorages/S3/diskSettings.cpp index 1ae3730e4c7..92be835560b 100644 --- a/src/Disks/ObjectStorages/S3/diskSettings.cpp +++ b/src/Disks/ObjectStorages/S3/diskSettings.cpp @@ -177,7 +177,7 @@ std::unique_ptr getClient( auth_settings[S3AuthSetting::secret_access_key], auth_settings[S3AuthSetting::server_side_encryption_customer_key_base64], auth_settings.server_side_encryption_kms_config, - auth_settings.headers, + auth_settings.getHeaders(), credentials_configuration, auth_settings[S3AuthSetting::session_token]); } diff --git a/src/Functions/FunctionsComparison.h b/src/Functions/FunctionsComparison.h index bd6f0361307..be0875581a5 100644 --- a/src/Functions/FunctionsComparison.h +++ b/src/Functions/FunctionsComparison.h @@ -1171,7 +1171,7 @@ public: if (left_tuple && right_tuple) { - auto func = FunctionToOverloadResolverAdaptor(std::make_shared>(check_decimal_overflow)); + auto func = std::make_shared(std::make_shared>(check_decimal_overflow)); bool has_nullable = false; bool has_null = false; @@ -1181,7 +1181,7 @@ public: { ColumnsWithTypeAndName args = {{nullptr, left_tuple->getElements()[i], ""}, {nullptr, right_tuple->getElements()[i], ""}}; - auto element_type = func.build(args)->getResultType(); + auto element_type = func->build(args)->getResultType(); has_nullable = has_nullable || element_type->isNullable(); has_null = has_null || element_type->onlyNull(); } diff --git a/src/Functions/array/FunctionsMapMiscellaneous.cpp b/src/Functions/array/FunctionsMapMiscellaneous.cpp index 368c0ad620f..c3586a57161 100644 --- a/src/Functions/array/FunctionsMapMiscellaneous.cpp +++ b/src/Functions/array/FunctionsMapMiscellaneous.cpp @@ -349,19 +349,14 @@ struct MapKeyLikeAdapter } }; -struct FunctionIdentityMap : public FunctionIdentity -{ - bool useDefaultImplementationForLowCardinalityColumns() const override { return false; } -}; - struct NameMapConcat { static constexpr auto name = "mapConcat"; }; using FunctionMapConcat = FunctionMapToArrayAdapter, NameMapConcat>; struct NameMapKeys { static constexpr auto name = "mapKeys"; }; -using FunctionMapKeys = FunctionMapToArrayAdapter, NameMapKeys>; +using FunctionMapKeys = FunctionMapToArrayAdapter, NameMapKeys>; struct NameMapValues { static constexpr auto name = "mapValues"; }; -using FunctionMapValues = FunctionMapToArrayAdapter, NameMapValues>; +using FunctionMapValues = FunctionMapToArrayAdapter, NameMapValues>; struct NameMapContains { static constexpr auto name = "mapContains"; }; using FunctionMapContains = FunctionMapToArrayAdapter, MapToSubcolumnAdapter, NameMapContains>; diff --git a/src/Functions/transform.cpp b/src/Functions/transform.cpp index 45f0a7f5c17..e5445b36809 100644 --- a/src/Functions/transform.cpp +++ b/src/Functions/transform.cpp @@ -211,7 +211,7 @@ namespace ColumnsWithTypeAndName args = arguments; args[0].column = args[0].column->cloneResized(input_rows_count)->convertToFullColumnIfConst(); - auto impl = FunctionToOverloadResolverAdaptor(std::make_shared()).build(args); + auto impl = std::make_shared(std::make_shared())->build(args); return impl->execute(args, result_type, input_rows_count); } diff --git a/src/IO/Archives/createArchiveReader.cpp b/src/IO/Archives/createArchiveReader.cpp index dfa098eede0..97597cc4db7 100644 --- a/src/IO/Archives/createArchiveReader.cpp +++ b/src/IO/Archives/createArchiveReader.cpp @@ -43,7 +43,10 @@ std::shared_ptr createArchiveReader( else if 
(hasSupported7zExtension(path_to_archive)) { #if USE_LIBARCHIVE - return std::make_shared(path_to_archive); + if (archive_read_function) + throw Exception(ErrorCodes::CANNOT_UNPACK_ARCHIVE, "7z archive supports only local files reading"); + else + return std::make_shared(path_to_archive); #else throw Exception(ErrorCodes::SUPPORT_IS_DISABLED, "libarchive library is disabled"); #endif diff --git a/src/IO/S3/Credentials.cpp b/src/IO/S3/Credentials.cpp index a3f671e76d9..cde9a7a3662 100644 --- a/src/IO/S3/Credentials.cpp +++ b/src/IO/S3/Credentials.cpp @@ -1,3 +1,5 @@ +#include +#include #include #include @@ -693,6 +695,7 @@ S3CredentialsProviderChain::S3CredentialsProviderChain( static const char AWS_ECS_CONTAINER_CREDENTIALS_RELATIVE_URI[] = "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"; static const char AWS_ECS_CONTAINER_CREDENTIALS_FULL_URI[] = "AWS_CONTAINER_CREDENTIALS_FULL_URI"; static const char AWS_ECS_CONTAINER_AUTHORIZATION_TOKEN[] = "AWS_CONTAINER_AUTHORIZATION_TOKEN"; + static const char AWS_ECS_CONTAINER_AUTHORIZATION_TOKEN_PATH[] = "AWS_CONTAINER_AUTHORIZATION_TOKEN_PATH"; static const char AWS_EC2_METADATA_DISABLED[] = "AWS_EC2_METADATA_DISABLED"; /// The only difference from DefaultAWSCredentialsProviderChain::DefaultAWSCredentialsProviderChain() @@ -750,7 +753,22 @@ S3CredentialsProviderChain::S3CredentialsProviderChain( } else if (!absolute_uri.empty()) { - const auto token = Aws::Environment::GetEnv(AWS_ECS_CONTAINER_AUTHORIZATION_TOKEN); + auto token = Aws::Environment::GetEnv(AWS_ECS_CONTAINER_AUTHORIZATION_TOKEN); + const auto token_path = Aws::Environment::GetEnv(AWS_ECS_CONTAINER_AUTHORIZATION_TOKEN_PATH); + + if (!token_path.empty()) + { + LOG_INFO(logger, "The environment variable value {} is {}", AWS_ECS_CONTAINER_AUTHORIZATION_TOKEN_PATH, token_path); + + String token_from_file; + + ReadBufferFromFile in(token_path); + readStringUntilEOF(token_from_file, in); + Poco::trimInPlace(token_from_file); + + token = token_from_file; + } + AddProvider(std::make_shared(absolute_uri.c_str(), token.c_str())); /// DO NOT log the value of the authorization token for security purposes. 
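For context, here is a minimal standalone sketch of the token resolution the hunk above introduces: when the AWS_CONTAINER_AUTHORIZATION_TOKEN_PATH environment variable is set, the token is read from that file and trimmed before use, otherwise the inline AWS_CONTAINER_AUTHORIZATION_TOKEN value is kept. This is an illustrative sketch only, using just the C++ standard library; the actual patch uses ReadBufferFromFile, readStringUntilEOF and Poco::trimInPlace as shown in the hunk, and the function name below is hypothetical.

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Resolve the ECS container authorization token the way the hunk above does:
// prefer a token file referenced by AWS_CONTAINER_AUTHORIZATION_TOKEN_PATH,
// falling back to the inline AWS_CONTAINER_AUTHORIZATION_TOKEN value.
static std::string resolveContainerAuthToken()
{
    const char * inline_token = std::getenv("AWS_CONTAINER_AUTHORIZATION_TOKEN");
    const char * token_path = std::getenv("AWS_CONTAINER_AUTHORIZATION_TOKEN_PATH");

    std::string token = inline_token ? inline_token : "";

    if (token_path != nullptr && *token_path != '\0')
    {
        // Read the whole file, then trim surrounding whitespace,
        // analogous to readStringUntilEOF + Poco::trimInPlace in the patch.
        std::ifstream in(token_path);
        std::ostringstream contents;
        contents << in.rdbuf();
        token = contents.str();

        const char * ws = " \t\r\n";
        const auto first = token.find_first_not_of(ws);
        const auto last = token.find_last_not_of(ws);
        token = (first == std::string::npos) ? std::string{} : token.substr(first, last - first + 1);
    }

    return token;
}

int main()
{
    // Do not print the token itself; the patch explicitly avoids logging its value.
    std::cout << "resolved token length: " << resolveContainerAuthToken().size() << '\n';
    return 0;
}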
diff --git a/src/IO/S3AuthSettings.cpp b/src/IO/S3AuthSettings.cpp index 799dc6692fa..5d7d4678977 100644 --- a/src/IO/S3AuthSettings.cpp +++ b/src/IO/S3AuthSettings.cpp @@ -105,7 +105,9 @@ S3AuthSettings::S3AuthSettings( } } - headers = getHTTPHeaders(config_prefix, config); + headers = getHTTPHeaders(config_prefix, config, "header"); + access_headers = getHTTPHeaders(config_prefix, config, "access_header"); + server_side_encryption_kms_config = getSSEKMSConfig(config_prefix, config); Poco::Util::AbstractConfiguration::Keys keys; @@ -119,6 +121,7 @@ S3AuthSettings::S3AuthSettings( S3AuthSettings::S3AuthSettings(const S3AuthSettings & settings) : headers(settings.headers) + , access_headers(settings.access_headers) , users(settings.users) , server_side_encryption_kms_config(settings.server_side_encryption_kms_config) , impl(std::make_unique(*settings.impl)) @@ -127,6 +130,7 @@ S3AuthSettings::S3AuthSettings(const S3AuthSettings & settings) S3AuthSettings::S3AuthSettings(S3AuthSettings && settings) noexcept : headers(std::move(settings.headers)) + , access_headers(std::move(settings.access_headers)) , users(std::move(settings.users)) , server_side_encryption_kms_config(std::move(settings.server_side_encryption_kms_config)) , impl(std::make_unique(std::move(*settings.impl))) @@ -145,6 +149,7 @@ S3AUTH_SETTINGS_SUPPORTED_TYPES(S3AuthSettings, IMPLEMENT_SETTING_SUBSCRIPT_OPER S3AuthSettings & S3AuthSettings::operator=(S3AuthSettings && settings) noexcept { headers = std::move(settings.headers); + access_headers = std::move(settings.access_headers); users = std::move(settings.users); server_side_encryption_kms_config = std::move(settings.server_side_encryption_kms_config); *impl = std::move(*settings.impl); @@ -157,6 +162,9 @@ bool S3AuthSettings::operator==(const S3AuthSettings & right) if (headers != right.headers) return false; + if (access_headers != right.access_headers) + return false; + if (users != right.users) return false; @@ -196,6 +204,9 @@ void S3AuthSettings::updateIfChanged(const S3AuthSettings & settings) if (!settings.headers.empty()) headers = settings.headers; + if (!settings.access_headers.empty()) + access_headers = settings.access_headers; + if (!settings.users.empty()) users.insert(settings.users.begin(), settings.users.end()); @@ -205,6 +216,17 @@ void S3AuthSettings::updateIfChanged(const S3AuthSettings & settings) server_side_encryption_kms_config = settings.server_side_encryption_kms_config; } +HTTPHeaderEntries S3AuthSettings::getHeaders() const +{ + bool auth_settings_is_default = !impl->isChanged("access_key_id"); + if (access_headers.empty() || !auth_settings_is_default) + return headers; + + HTTPHeaderEntries result(headers); + result.insert(result.end(), access_headers.begin(), access_headers.end()); + + return result; +} } } diff --git a/src/IO/S3AuthSettings.h b/src/IO/S3AuthSettings.h index 4026adb1e68..38f46cfeccd 100644 --- a/src/IO/S3AuthSettings.h +++ b/src/IO/S3AuthSettings.h @@ -55,8 +55,11 @@ struct S3AuthSettings bool hasUpdates(const S3AuthSettings & other) const; void updateIfChanged(const S3AuthSettings & settings); bool canBeUsedByUser(const String & user) const { return users.empty() || users.contains(user); } + HTTPHeaderEntries getHeaders() const; HTTPHeaderEntries headers; + HTTPHeaderEntries access_headers; + std::unordered_set users; ServerSideEncryptionKMSConfig server_side_encryption_kms_config; diff --git a/src/IO/S3Common.cpp b/src/IO/S3Common.cpp index 5c1ee6ccc78..f12de6a7b54 100644 --- a/src/IO/S3Common.cpp +++ b/src/IO/S3Common.cpp @@ 
-74,14 +74,14 @@ namespace ErrorCodes namespace S3 { -HTTPHeaderEntries getHTTPHeaders(const std::string & config_elem, const Poco::Util::AbstractConfiguration & config) +HTTPHeaderEntries getHTTPHeaders(const std::string & config_elem, const Poco::Util::AbstractConfiguration & config, const std::string header_key) { HTTPHeaderEntries headers; Poco::Util::AbstractConfiguration::Keys subconfig_keys; config.keys(config_elem, subconfig_keys); for (const std::string & subkey : subconfig_keys) { - if (subkey.starts_with("header")) + if (subkey.starts_with(header_key)) { auto header_str = config.getString(config_elem + "." + subkey); auto delimiter = header_str.find(':'); diff --git a/src/IO/S3Common.h b/src/IO/S3Common.h index 1e40108b09f..22b590dcb18 100644 --- a/src/IO/S3Common.h +++ b/src/IO/S3Common.h @@ -69,7 +69,7 @@ struct ProxyConfigurationResolver; namespace S3 { -HTTPHeaderEntries getHTTPHeaders(const std::string & config_elem, const Poco::Util::AbstractConfiguration & config); +HTTPHeaderEntries getHTTPHeaders(const std::string & config_elem, const Poco::Util::AbstractConfiguration & config, std::string header_key = "header"); ServerSideEncryptionKMSConfig getSSEKMSConfig(const std::string & config_elem, const Poco::Util::AbstractConfiguration & config); } diff --git a/src/Interpreters/Access/InterpreterCreateUserQuery.cpp b/src/Interpreters/Access/InterpreterCreateUserQuery.cpp index d1d41a45793..f2e65ca4a10 100644 --- a/src/Interpreters/Access/InterpreterCreateUserQuery.cpp +++ b/src/Interpreters/Access/InterpreterCreateUserQuery.cpp @@ -8,6 +8,7 @@ #include #include #include +#include #include #include #include @@ -44,7 +45,7 @@ namespace const std::optional & override_default_roles, const std::optional & override_settings, const std::optional & override_grantees, - const std::optional & valid_until, + const std::optional & global_valid_until, bool reset_authentication_methods, bool replace_authentication_methods, bool allow_implicit_no_password, @@ -105,12 +106,20 @@ namespace user.authentication_methods.emplace_back(authentication_method); } - bool has_no_password_authentication_method = std::find_if(user.authentication_methods.begin(), - user.authentication_methods.end(), - [](const AuthenticationData & authentication_data) - { - return authentication_data.getType() == AuthenticationType::NO_PASSWORD; - }) != user.authentication_methods.end(); + bool has_no_password_authentication_method = false; + + for (auto & authentication_method : user.authentication_methods) + { + if (global_valid_until) + { + authentication_method.setValidUntil(*global_valid_until); + } + + if (authentication_method.getType() == AuthenticationType::NO_PASSWORD) + { + has_no_password_authentication_method = true; + } + } if (has_no_password_authentication_method && user.authentication_methods.size() > 1) { @@ -133,9 +142,6 @@ namespace } } - if (valid_until) - user.valid_until = *valid_until; - if (override_name && !override_name->host_pattern.empty()) { user.allowed_client_hosts = AllowedClientHosts{}; @@ -175,34 +181,6 @@ namespace else if (query.grantees) user.grantees = *query.grantees; } - - time_t getValidUntilFromAST(ASTPtr valid_until, ContextPtr context) - { - if (context) - valid_until = evaluateConstantExpressionAsLiteral(valid_until, context); - - const String valid_until_str = checkAndGetLiteralArgument(valid_until, "valid_until"); - - if (valid_until_str == "infinity") - return 0; - - time_t time = 0; - ReadBufferFromString in(valid_until_str); - - if (context) - { - const auto & time_zone = 
DateLUT::instance(""); - const auto & utc_time_zone = DateLUT::instance("UTC"); - - parseDateTimeBestEffort(time, in, time_zone, utc_time_zone); - } - else - { - readDateTimeText(time, in); - } - - return time; - } } BlockIO InterpreterCreateUserQuery::execute() @@ -226,9 +204,9 @@ BlockIO InterpreterCreateUserQuery::execute() } } - std::optional valid_until; - if (query.valid_until) - valid_until = getValidUntilFromAST(query.valid_until, getContext()); + std::optional global_valid_until; + if (query.global_valid_until) + global_valid_until = getValidUntilFromAST(query.global_valid_until, getContext()); std::optional default_roles_from_query; if (query.default_roles) @@ -274,7 +252,7 @@ BlockIO InterpreterCreateUserQuery::execute() auto updated_user = typeid_cast>(entity->clone()); updateUserFromQueryImpl( *updated_user, query, authentication_methods, {}, default_roles_from_query, settings_from_query, grantees_from_query, - valid_until, query.reset_authentication_methods_to_new, query.replace_authentication_methods, + global_valid_until, query.reset_authentication_methods_to_new, query.replace_authentication_methods, implicit_no_password_allowed, no_password_allowed, plaintext_password_allowed, getContext()->getServerSettings()[ServerSetting::max_authentication_methods_per_user]); return updated_user; @@ -296,7 +274,7 @@ BlockIO InterpreterCreateUserQuery::execute() auto new_user = std::make_shared(); updateUserFromQueryImpl( *new_user, query, authentication_methods, name, default_roles_from_query, settings_from_query, RolesOrUsersSet::AllTag{}, - valid_until, query.reset_authentication_methods_to_new, query.replace_authentication_methods, + global_valid_until, query.reset_authentication_methods_to_new, query.replace_authentication_methods, implicit_no_password_allowed, no_password_allowed, plaintext_password_allowed, getContext()->getServerSettings()[ServerSetting::max_authentication_methods_per_user]); new_users.emplace_back(std::move(new_user)); @@ -351,9 +329,9 @@ void InterpreterCreateUserQuery::updateUserFromQuery( } } - std::optional valid_until; - if (query.valid_until) - valid_until = getValidUntilFromAST(query.valid_until, {}); + std::optional global_valid_until; + if (query.global_valid_until) + global_valid_until = getValidUntilFromAST(query.global_valid_until, {}); updateUserFromQueryImpl( user, @@ -363,7 +341,7 @@ void InterpreterCreateUserQuery::updateUserFromQuery( {}, {}, {}, - valid_until, + global_valid_until, query.reset_authentication_methods_to_new, query.replace_authentication_methods, allow_no_password, diff --git a/src/Interpreters/Access/InterpreterShowCreateAccessEntityQuery.cpp b/src/Interpreters/Access/InterpreterShowCreateAccessEntityQuery.cpp index ef6ddf1866d..8b7cef056ed 100644 --- a/src/Interpreters/Access/InterpreterShowCreateAccessEntityQuery.cpp +++ b/src/Interpreters/Access/InterpreterShowCreateAccessEntityQuery.cpp @@ -69,13 +69,6 @@ namespace query->authentication_methods.push_back(authentication_method.toAST()); } - if (user.valid_until) - { - WriteBufferFromOwnString out; - writeDateTimeText(user.valid_until, out); - query->valid_until = std::make_shared(out.str()); - } - if (!user.settings.empty()) { if (attach_mode) diff --git a/src/Interpreters/Access/getValidUntilFromAST.cpp b/src/Interpreters/Access/getValidUntilFromAST.cpp new file mode 100644 index 00000000000..caf831e61ee --- /dev/null +++ b/src/Interpreters/Access/getValidUntilFromAST.cpp @@ -0,0 +1,37 @@ +#include +#include +#include +#include +#include +#include + +namespace DB +{ + 
time_t getValidUntilFromAST(ASTPtr valid_until, ContextPtr context) + { + if (context) + valid_until = evaluateConstantExpressionAsLiteral(valid_until, context); + + const String valid_until_str = checkAndGetLiteralArgument(valid_until, "valid_until"); + + if (valid_until_str == "infinity") + return 0; + + time_t time = 0; + ReadBufferFromString in(valid_until_str); + + if (context) + { + const auto & time_zone = DateLUT::instance(""); + const auto & utc_time_zone = DateLUT::instance("UTC"); + + parseDateTimeBestEffort(time, in, time_zone, utc_time_zone); + } + else + { + readDateTimeText(time, in); + } + + return time; + } +} diff --git a/src/Interpreters/Access/getValidUntilFromAST.h b/src/Interpreters/Access/getValidUntilFromAST.h new file mode 100644 index 00000000000..ab0c6c8c9b6 --- /dev/null +++ b/src/Interpreters/Access/getValidUntilFromAST.h @@ -0,0 +1,9 @@ +#pragma once + +#include +#include + +namespace DB +{ + time_t getValidUntilFromAST(ASTPtr valid_until, ContextPtr context); +} diff --git a/src/Interpreters/AsynchronousInsertQueue.cpp b/src/Interpreters/AsynchronousInsertQueue.cpp index 359aae3e72c..1a2efa2461f 100644 --- a/src/Interpreters/AsynchronousInsertQueue.cpp +++ b/src/Interpreters/AsynchronousInsertQueue.cpp @@ -1120,6 +1120,13 @@ Chunk AsynchronousInsertQueue::processPreprocessedEntries( "Expected entry with data kind Preprocessed. Got: {}", entry->chunk.getDataKind()); Block block_to_insert = *block; + if (block_to_insert.rows() == 0) + { + add_to_async_insert_log(entry, /*parsing_exception=*/ "", block_to_insert.rows(), block_to_insert.bytes()); + entry->resetChunk(); + continue; + } + if (!isCompatibleHeader(block_to_insert, header)) convertBlockToHeader(block_to_insert, header); diff --git a/src/Interpreters/Context.cpp b/src/Interpreters/Context.cpp index fbf0cbd0eb7..4f82ed7b046 100644 --- a/src/Interpreters/Context.cpp +++ b/src/Interpreters/Context.cpp @@ -273,6 +273,13 @@ namespace ServerSetting extern const ServerSettingsUInt64 max_replicated_sends_network_bandwidth_for_server; extern const ServerSettingsUInt64 tables_loader_background_pool_size; extern const ServerSettingsUInt64 tables_loader_foreground_pool_size; + extern const ServerSettingsUInt64 prefetch_threadpool_pool_size; + extern const ServerSettingsUInt64 prefetch_threadpool_queue_size; + extern const ServerSettingsUInt64 load_marks_threadpool_pool_size; + extern const ServerSettingsUInt64 load_marks_threadpool_queue_size; + extern const ServerSettingsUInt64 threadpool_writer_pool_size; + extern const ServerSettingsUInt64 threadpool_writer_queue_size; + } namespace ErrorCodes @@ -3215,9 +3222,8 @@ void Context::clearMarkCache() const ThreadPool & Context::getLoadMarksThreadpool() const { callOnce(shared->load_marks_threadpool_initialized, [&] { - const auto & config = getConfigRef(); - auto pool_size = config.getUInt(".load_marks_threadpool_pool_size", 50); - auto queue_size = config.getUInt(".load_marks_threadpool_queue_size", 1000000); + auto pool_size = shared->server_settings[ServerSetting::load_marks_threadpool_pool_size]; + auto queue_size = shared->server_settings[ServerSetting::load_marks_threadpool_queue_size]; shared->load_marks_threadpool = std::make_unique( CurrentMetrics::MarksLoaderThreads, CurrentMetrics::MarksLoaderThreadsActive, CurrentMetrics::MarksLoaderThreadsScheduled, pool_size, pool_size, queue_size); }); @@ -3410,9 +3416,9 @@ AsynchronousMetrics * Context::getAsynchronousMetrics() const ThreadPool & Context::getPrefetchThreadpool() const { 
callOnce(shared->prefetch_threadpool_initialized, [&] { - const auto & config = getConfigRef(); - auto pool_size = config.getUInt(".prefetch_threadpool_pool_size", 100); - auto queue_size = config.getUInt(".prefetch_threadpool_queue_size", 1000000); + auto pool_size = shared->server_settings[ServerSetting::prefetch_threadpool_pool_size]; + auto queue_size = shared->server_settings[ServerSetting::prefetch_threadpool_queue_size]; + shared->prefetch_threadpool = std::make_unique( CurrentMetrics::IOPrefetchThreads, CurrentMetrics::IOPrefetchThreadsActive, CurrentMetrics::IOPrefetchThreadsScheduled, pool_size, pool_size, queue_size); }); @@ -3422,8 +3428,7 @@ ThreadPool & Context::getPrefetchThreadpool() const size_t Context::getPrefetchThreadpoolSize() const { - const auto & config = getConfigRef(); - return config.getUInt(".prefetch_threadpool_pool_size", 100); + return shared->server_settings[ServerSetting::prefetch_threadpool_pool_size]; } ThreadPool & Context::getBuildVectorSimilarityIndexThreadPool() const @@ -5696,9 +5701,8 @@ IOUringReader & Context::getIOUringReader() const ThreadPool & Context::getThreadPoolWriter() const { callOnce(shared->threadpool_writer_initialized, [&] { - const auto & config = getConfigRef(); - auto pool_size = config.getUInt(".threadpool_writer_pool_size", 100); - auto queue_size = config.getUInt(".threadpool_writer_queue_size", 1000000); + auto pool_size = shared->server_settings[ServerSetting::threadpool_writer_pool_size]; + auto queue_size = shared->server_settings[ServerSetting::threadpool_writer_queue_size]; shared->threadpool_writer = std::make_unique( CurrentMetrics::IOWriterThreads, CurrentMetrics::IOWriterThreadsActive, CurrentMetrics::IOWriterThreadsScheduled, pool_size, pool_size, queue_size); diff --git a/src/Interpreters/InterpreterSystemQuery.cpp b/src/Interpreters/InterpreterSystemQuery.cpp index f877b74c5ff..45636ab40b9 100644 --- a/src/Interpreters/InterpreterSystemQuery.cpp +++ b/src/Interpreters/InterpreterSystemQuery.cpp @@ -1025,7 +1025,7 @@ void InterpreterSystemQuery::dropReplica(ASTSystemQuery & query) { ReplicatedTableStatus status; storage_replicated->getStatus(status); - if (status.zookeeper_info.path == query.replica_zk_path) + if (status.replica_path == remote_replica_path) throw Exception(ErrorCodes::TABLE_WAS_NOT_DROPPED, "There is a local table {}, which has the same table path in ZooKeeper. " "Please check the path in query. " diff --git a/src/Interpreters/MetricLog.cpp b/src/Interpreters/MetricLog.cpp index 16a88b976ba..d0d799ea693 100644 --- a/src/Interpreters/MetricLog.cpp +++ b/src/Interpreters/MetricLog.cpp @@ -70,6 +70,15 @@ void MetricLog::stepFunction(const std::chrono::system_clock::time_point current { const ProfileEvents::Count new_value = ProfileEvents::global_counters[i].load(std::memory_order_relaxed); auto & old_value = prev_profile_events[i]; + + /// Profile event counters are supposed to be monotonic. However, at least the `NetworkReceiveBytes` can be inaccurate. + /// So, since in the future the counter should always have a bigger value than in the past, we skip this event. 
+ /// It can be reproduced with the following integration tests: + /// - test_hedged_requests/test.py::test_receive_timeout2 + /// - test_secure_socket::test + if (new_value < old_value) + continue; + elem.profile_events[i] = new_value - old_value; old_value = new_value; } diff --git a/src/Interpreters/ProcessList.cpp b/src/Interpreters/ProcessList.cpp index 177468f1c8b..21c30a60617 100644 --- a/src/Interpreters/ProcessList.cpp +++ b/src/Interpreters/ProcessList.cpp @@ -276,7 +276,7 @@ ProcessList::insert(const String & query_, const IAST * ast, ContextMutablePtr q thread_group->performance_counters.setTraceProfileEvents(settings[Setting::trace_profile_events]); } - thread_group->memory_tracker.setDescription("(for query)"); + thread_group->memory_tracker.setDescription("Query"); if (settings[Setting::memory_tracker_fault_probability] > 0.0) thread_group->memory_tracker.setFaultProbability(settings[Setting::memory_tracker_fault_probability]); @@ -311,7 +311,7 @@ ProcessList::insert(const String & query_, const IAST * ast, ContextMutablePtr q /// Track memory usage for all simultaneously running queries from single user. user_process_list.user_memory_tracker.setOrRaiseHardLimit(settings[Setting::max_memory_usage_for_user]); user_process_list.user_memory_tracker.setSoftLimit(settings[Setting::memory_overcommit_ratio_denominator_for_user]); - user_process_list.user_memory_tracker.setDescription("(for user)"); + user_process_list.user_memory_tracker.setDescription("User"); if (!total_network_throttler && settings[Setting::max_network_bandwidth_for_all_users]) { diff --git a/src/Interpreters/QueryMetricLog.cpp b/src/Interpreters/QueryMetricLog.cpp index fea2024d3e4..5ab3fe590e0 100644 --- a/src/Interpreters/QueryMetricLog.cpp +++ b/src/Interpreters/QueryMetricLog.cpp @@ -15,6 +15,7 @@ #include #include +#include #include @@ -86,11 +87,11 @@ void QueryMetricLog::shutdown() Base::shutdown(); } -void QueryMetricLog::startQuery(const String & query_id, TimePoint query_start_time, UInt64 interval_milliseconds) +void QueryMetricLog::startQuery(const String & query_id, TimePoint start_time, UInt64 interval_milliseconds) { QueryMetricLogStatus status; status.interval_milliseconds = interval_milliseconds; - status.next_collect_time = query_start_time + std::chrono::milliseconds(interval_milliseconds); + status.next_collect_time = start_time + std::chrono::milliseconds(interval_milliseconds); auto context = getContext(); const auto & process_list = context->getProcessList(); @@ -99,24 +100,21 @@ void QueryMetricLog::startQuery(const String & query_id, TimePoint query_start_t const auto query_info = process_list.getQueryInfo(query_id, false, true, false); if (!query_info) { - LOG_TRACE(logger, "Query {} is not running anymore, so we couldn't get its QueryInfo", query_id); + LOG_TRACE(logger, "Query {} is not running anymore, so we couldn't get its QueryStatusInfo", query_id); return; } auto elem = createLogMetricElement(query_id, *query_info, current_time); if (elem) add(std::move(elem.value())); - else - LOG_TRACE(logger, "Query {} finished already while this collecting task was running", query_id); }); - status.task->scheduleAfter(interval_milliseconds); - std::lock_guard lock(queries_mutex); + status.task->scheduleAfter(interval_milliseconds); queries.emplace(query_id, std::move(status)); } -void QueryMetricLog::finishQuery(const String & query_id, QueryStatusInfoPtr query_info) +void QueryMetricLog::finishQuery(const String & query_id, TimePoint finish_time, QueryStatusInfoPtr query_info) { 
std::unique_lock lock(queries_mutex); auto it = queries.find(query_id); @@ -128,7 +126,7 @@ void QueryMetricLog::finishQuery(const String & query_id, QueryStatusInfoPtr que if (query_info) { - auto elem = createLogMetricElement(query_id, *query_info, std::chrono::system_clock::now(), false); + auto elem = createLogMetricElement(query_id, *query_info, finish_time, false); if (elem) add(std::move(elem.value())); } @@ -137,57 +135,84 @@ void QueryMetricLog::finishQuery(const String & query_id, QueryStatusInfoPtr que /// deactivating the task, which happens automatically on its destructor. Thus, we cannot /// deactivate/destroy the task while it's running. Now, the task locks `queries_mutex` to /// prevent concurrent edition of the queries. In short, the mutex order is: exec_mutex -> - /// queries_mutex. Thus, to prevent a deadblock we need to make sure that we always lock them in + /// queries_mutex. So, to prevent a deadblock we need to make sure that we always lock them in /// that order. { - /// Take ownership of the task so that we can destroy it in this scope after unlocking `queries_lock`. + /// Take ownership of the task so that we can destroy it in this scope after unlocking `queries_mutex`. auto task = std::move(it->second.task); /// Build an empty task for the old task to make sure it does not lock any mutex on its destruction. it->second.task = {}; + queries.erase(query_id); + /// Ensure `queries_mutex` is unlocked before calling task's destructor at the end of this /// scope which will lock `exec_mutex`. lock.unlock(); } - - lock.lock(); - queries.erase(query_id); } -std::optional QueryMetricLog::createLogMetricElement(const String & query_id, const QueryStatusInfo & query_info, TimePoint current_time, bool schedule_next) +std::optional QueryMetricLog::createLogMetricElement(const String & query_id, const QueryStatusInfo & query_info, TimePoint query_info_time, bool schedule_next) { - LOG_DEBUG(logger, "Collecting query_metric_log for query {}. Schedule next: {}", query_id, schedule_next); - std::lock_guard lock(queries_mutex); + /// fmtlib supports subsecond formatting in 10.0.0. We're in 9.1.0, so we need to add the milliseconds ourselves. + auto seconds = std::chrono::time_point_cast(query_info_time); + auto microseconds = std::chrono::duration_cast(query_info_time - seconds).count(); + LOG_DEBUG(logger, "Collecting query_metric_log for query {} with QueryStatusInfo from {:%Y.%m.%d %H:%M:%S}.{:06}. Schedule next: {}", query_id, seconds, microseconds, schedule_next); + + std::unique_lock lock(queries_mutex); auto query_status_it = queries.find(query_id); /// The query might have finished while the scheduled task is running. if (query_status_it == queries.end()) + { + lock.unlock(); + LOG_TRACE(logger, "Query {} finished already while this collecting task was running", query_id); return {}; + } + + auto & query_status = query_status_it->second; + if (query_info_time <= query_status.last_collect_time) + { + lock.unlock(); + LOG_TRACE(logger, "Query {} has a more recent metrics collected. Skipping this one", query_id); + return {}; + } + + query_status.last_collect_time = query_info_time; QueryMetricLogElement elem; - elem.event_time = timeInSeconds(current_time); - elem.event_time_microseconds = timeInMicroseconds(current_time); + elem.event_time = timeInSeconds(query_info_time); + elem.event_time_microseconds = timeInMicroseconds(query_info_time); elem.query_id = query_status_it->first; elem.memory_usage = query_info.memory_usage > 0 ? 
query_info.memory_usage : 0; elem.peak_memory_usage = query_info.peak_memory_usage > 0 ? query_info.peak_memory_usage : 0; - auto & query_status = query_status_it->second; if (query_info.profile_counters) { for (ProfileEvents::Event i = ProfileEvents::Event(0), end = ProfileEvents::end(); i < end; ++i) { const auto & new_value = (*(query_info.profile_counters))[i]; - elem.profile_events[i] = new_value - query_status.last_profile_events[i]; - query_status.last_profile_events[i] = new_value; + auto & old_value = query_status.last_profile_events[i]; + + /// Profile event counters are supposed to be monotonic. However, at least the `NetworkReceiveBytes` can be inaccurate. + /// So, since in the future the counter should always have a bigger value than in the past, we skip this event. + /// It can be reproduced with the following integration tests: + /// - test_hedged_requests/test.py::test_receive_timeout2 + /// - test_secure_socket::test + if (new_value < old_value) + continue; + + elem.profile_events[i] = new_value - old_value; + old_value = new_value; } } else { - elem.profile_events = query_status.last_profile_events; + LOG_TRACE(logger, "Query {} has no profile counters", query_id); + elem.profile_events = std::vector(ProfileEvents::end()); } - if (query_status.task && schedule_next) + if (schedule_next) { query_status.next_collect_time += std::chrono::milliseconds(query_status.interval_milliseconds); const auto wait_time = std::chrono::duration_cast(query_status.next_collect_time - std::chrono::system_clock::now()).count(); diff --git a/src/Interpreters/QueryMetricLog.h b/src/Interpreters/QueryMetricLog.h index d7642bf0ab1..802cee7bf26 100644 --- a/src/Interpreters/QueryMetricLog.h +++ b/src/Interpreters/QueryMetricLog.h @@ -37,6 +37,7 @@ struct QueryMetricLogElement struct QueryMetricLogStatus { UInt64 interval_milliseconds; + std::chrono::system_clock::time_point last_collect_time; std::chrono::system_clock::time_point next_collect_time; std::vector last_profile_events = std::vector(ProfileEvents::end()); BackgroundSchedulePool::TaskHolder task; @@ -52,11 +53,11 @@ public: void shutdown() final; // Both startQuery and finishQuery are called from the thread that executes the query - void startQuery(const String & query_id, TimePoint query_start_time, UInt64 interval_milliseconds); - void finishQuery(const String & query_id, QueryStatusInfoPtr query_info = nullptr); + void startQuery(const String & query_id, TimePoint start_time, UInt64 interval_milliseconds); + void finishQuery(const String & query_id, TimePoint finish_time, QueryStatusInfoPtr query_info = nullptr); private: - std::optional createLogMetricElement(const String & query_id, const QueryStatusInfo & query_info, TimePoint current_time, bool schedule_next = true); + std::optional createLogMetricElement(const String & query_id, const QueryStatusInfo & query_info, TimePoint query_info_time, bool schedule_next = true); std::recursive_mutex queries_mutex; std::unordered_map queries; diff --git a/src/Interpreters/Session.cpp b/src/Interpreters/Session.cpp index c1286e9ac3e..bc6555af595 100644 --- a/src/Interpreters/Session.cpp +++ b/src/Interpreters/Session.cpp @@ -383,12 +383,12 @@ void Session::authenticate(const Credentials & credentials_, const Poco::Net::So void Session::checkIfUserIsStillValid() { - if (user && user->valid_until) + if (const auto valid_until = user_authenticated_with.getValidUntil()) { const time_t now = std::chrono::system_clock::to_time_t(std::chrono::system_clock::now()); - if (now > user->valid_until) - 
throw Exception(ErrorCodes::USER_EXPIRED, "User expired"); + if (now > valid_until) + throw Exception(ErrorCodes::USER_EXPIRED, "Authentication method used has expired"); } } diff --git a/src/Interpreters/Set.cpp b/src/Interpreters/Set.cpp index bf6be4c0349..c6f0455652a 100644 --- a/src/Interpreters/Set.cpp +++ b/src/Interpreters/Set.cpp @@ -6,7 +6,9 @@ #include #include +#include +#include #include #include @@ -278,6 +280,108 @@ void Set::checkIsCreated() const throw Exception(ErrorCodes::LOGICAL_ERROR, "Trying to use set before it has been built."); } +ColumnUInt8::Ptr checkDateTimePrecision(const ColumnWithTypeAndName & column_to_cast) +{ + // Handle nullable columns + const ColumnNullable * original_nullable_column = typeid_cast(column_to_cast.column.get()); + const IColumn * original_nested_column = original_nullable_column + ? &original_nullable_column->getNestedColumn() + : column_to_cast.column.get(); + + // Check if the original column is of ColumnDecimal type + const auto * original_decimal_column = typeid_cast *>(original_nested_column); + if (!original_decimal_column) + throw Exception(ErrorCodes::LOGICAL_ERROR, "Expected ColumnDecimal for DateTime64"); + + // Get the data array from the original column + const auto & original_data = original_decimal_column->getData(); + size_t vec_res_size = original_data.size(); + + // Prepare the precision null map + auto precision_null_map_column = ColumnUInt8::create(vec_res_size, 0); + NullMap & precision_null_map = precision_null_map_column->getData(); + + // Determine which rows should be null based on precision loss + const auto * datetime64_type = assert_cast(column_to_cast.type.get()); + auto scale = datetime64_type->getScale(); + if (scale >= 1) + { + Int64 scale_multiplier = common::exp10_i32(scale); + for (size_t row = 0; row < vec_res_size; ++row) + { + Int64 value = original_data[row]; + if (value % scale_multiplier != 0) + precision_null_map[row] = 1; // Mark as null due to precision loss + else + precision_null_map[row] = 0; + } + } + + return precision_null_map_column; +} + +ColumnPtr mergeNullMaps(const ColumnPtr & null_map_column1, const ColumnUInt8::Ptr & null_map_column2) +{ + if (!null_map_column1) + return null_map_column2; + if (!null_map_column2) + return null_map_column1; + + const auto & null_map1 = assert_cast(*null_map_column1).getData(); + const auto & null_map2 = (*null_map_column2).getData(); + + size_t size = null_map1.size(); + if (size != null_map2.size()) + throw Exception(ErrorCodes::LOGICAL_ERROR, "Null maps have different sizes"); + + auto merged_null_map_column = ColumnUInt8::create(size); + auto & merged_null_map = merged_null_map_column->getData(); + + for (size_t i = 0; i < size; ++i) + merged_null_map[i] = null_map1[i] || null_map2[i]; + + return merged_null_map_column; +} + +void Set::processDateTime64Column( + const ColumnWithTypeAndName & column_to_cast, + ColumnPtr & result, + ColumnPtr & null_map_holder, + ConstNullMapPtr & null_map) const +{ + // Check for sub-second precision and create a null map + ColumnUInt8::Ptr filtered_null_map_column = checkDateTimePrecision(column_to_cast); + + // Extract existing null map and nested column from the result + const ColumnNullable * result_nullable_column = typeid_cast(result.get()); + const IColumn * nested_result_column = result_nullable_column + ? &result_nullable_column->getNestedColumn() + : result.get(); + + ColumnPtr existing_null_map_column = result_nullable_column + ? 
result_nullable_column->getNullMapColumnPtr() + : nullptr; + + if (transform_null_in) + { + if (!null_map_holder) + null_map_holder = filtered_null_map_column; + else + null_map_holder = mergeNullMaps(null_map_holder, filtered_null_map_column); + + const ColumnUInt8 * null_map_column = checkAndGetColumn(null_map_holder.get()); + if (!null_map_column) + throw Exception(ErrorCodes::LOGICAL_ERROR, "Null map must be ColumnUInt8"); + + null_map = &null_map_column->getData(); + } + else + { + ColumnPtr merged_null_map_column = mergeNullMaps(existing_null_map_column, filtered_null_map_column); + result = ColumnNullable::create(nested_result_column->getPtr(), merged_null_map_column); + } +} + ColumnPtr Set::execute(const ColumnsWithTypeAndName & columns, bool negative) const { size_t num_key_columns = columns.size(); @@ -314,6 +418,10 @@ ColumnPtr Set::execute(const ColumnsWithTypeAndName & columns, bool negative) co Columns materialized_columns; materialized_columns.reserve(num_key_columns); + /// We will check existence in Set only for keys whose components do not contain any NULL value. + ConstNullMapPtr null_map{}; + ColumnPtr null_map_holder; + for (size_t i = 0; i < num_key_columns; ++i) { ColumnPtr result; @@ -331,13 +439,17 @@ ColumnPtr Set::execute(const ColumnsWithTypeAndName & columns, bool negative) co result = castColumnAccurate(column_to_cast, data_types[i], cast_cache.get()); } - materialized_columns.emplace_back() = result; - key_columns.emplace_back() = materialized_columns.back().get(); + // If the original column is DateTime64, check for sub-second precision + if (isDateTime64(column_to_cast.column->getDataType())) + { + processDateTime64Column(column_to_cast, result, null_map_holder, null_map); + } + + // Append the result to materialized columns + materialized_columns.emplace_back(std::move(result)); + key_columns.emplace_back(materialized_columns.back().get()); } - /// We will check existence in Set only for keys whose components do not contain any NULL value. - ConstNullMapPtr null_map{}; - ColumnPtr null_map_holder; if (!transform_null_in) null_map_holder = extractNestedColumnsAndNullMap(key_columns, null_map); diff --git a/src/Interpreters/Set.h b/src/Interpreters/Set.h index 8a821d87dfb..240d651352d 100644 --- a/src/Interpreters/Set.h +++ b/src/Interpreters/Set.h @@ -61,6 +61,8 @@ public: void checkIsCreated() const; + void processDateTime64Column(const ColumnWithTypeAndName & column_to_cast, ColumnPtr & result, ColumnPtr & null_map_holder, ConstNullMapPtr & null_map) const; + /** For columns of 'block', check belonging of corresponding rows to the set. * Return UInt8 column with the result. 
*/ diff --git a/src/Interpreters/ThreadStatusExt.cpp b/src/Interpreters/ThreadStatusExt.cpp index 0544bbcc92e..4d27a840d51 100644 --- a/src/Interpreters/ThreadStatusExt.cpp +++ b/src/Interpreters/ThreadStatusExt.cpp @@ -119,7 +119,7 @@ void ThreadGroup::unlinkThread() ThreadGroupPtr ThreadGroup::createForQuery(ContextPtr query_context_, std::function fatal_error_callback_) { auto group = std::make_shared(query_context_, std::move(fatal_error_callback_)); - group->memory_tracker.setDescription("(for query)"); + group->memory_tracker.setDescription("Query"); return group; } @@ -127,7 +127,7 @@ ThreadGroupPtr ThreadGroup::createForBackgroundProcess(ContextPtr storage_contex { auto group = std::make_shared(storage_context); - group->memory_tracker.setDescription("background process to apply mutate/merge in table"); + group->memory_tracker.setDescription("Background process (mutate/merge)"); /// However settings from storage context have to be applied const Settings & settings = storage_context->getSettingsRef(); group->memory_tracker.setProfilerStep(settings[Setting::memory_profiler_step]); @@ -384,7 +384,7 @@ void ThreadStatus::initPerformanceCounters() /// TODO: make separate query_thread_performance_counters and thread_performance_counters performance_counters.resetCounters(); memory_tracker.resetCounters(); - memory_tracker.setDescription("(for thread)"); + memory_tracker.setDescription("Thread"); query_start_time.setUp(); diff --git a/src/Interpreters/executeQuery.cpp b/src/Interpreters/executeQuery.cpp index a8fcfff65ad..9250c069283 100644 --- a/src/Interpreters/executeQuery.cpp +++ b/src/Interpreters/executeQuery.cpp @@ -81,6 +81,7 @@ #include #include +#include #include #include @@ -460,7 +461,7 @@ QueryLogElement logQueryStart( return elem; } -void logQueryMetricLogFinish(ContextPtr context, bool internal, String query_id, QueryStatusInfoPtr info) +void logQueryMetricLogFinish(ContextPtr context, bool internal, String query_id, std::chrono::system_clock::time_point finish_time, QueryStatusInfoPtr info) { if (auto query_metric_log = context->getQueryMetricLog(); query_metric_log && !internal) { @@ -475,11 +476,11 @@ void logQueryMetricLogFinish(ContextPtr context, bool internal, String query_id, /// to query the final state in query_log. auto collect_on_finish = info->elapsed_microseconds > interval_milliseconds * 1000; auto query_info = collect_on_finish ? 
info : nullptr; - query_metric_log->finishQuery(query_id, query_info); + query_metric_log->finishQuery(query_id, finish_time, query_info); } else { - query_metric_log->finishQuery(query_id, nullptr); + query_metric_log->finishQuery(query_id, finish_time, nullptr); } } } @@ -503,6 +504,7 @@ void logQueryFinish( /// Update performance counters before logging to query_log CurrentThread::finalizePerformanceCounters(); + auto time_now = std::chrono::system_clock::now(); QueryStatusInfo info = process_list_elem->getInfo(true, settings[Setting::log_profile_events]); elem.type = QueryLogElementType::QUERY_FINISH; @@ -597,7 +599,7 @@ void logQueryFinish( } } - logQueryMetricLogFinish(context, internal, elem.client_info.current_query_id, std::make_shared(info)); + logQueryMetricLogFinish(context, internal, elem.client_info.current_query_id, time_now, std::make_shared(info)); } if (query_span) @@ -697,7 +699,7 @@ void logQueryException( query_span->finish(); } - logQueryMetricLogFinish(context, internal, elem.client_info.current_query_id, info); + logQueryMetricLogFinish(context, internal, elem.client_info.current_query_id, time_now, info); } void logExceptionBeforeStart( @@ -796,7 +798,7 @@ void logExceptionBeforeStart( } } - logQueryMetricLogFinish(context, false, elem.client_info.current_query_id, nullptr); + logQueryMetricLogFinish(context, false, elem.client_info.current_query_id, std::chrono::system_clock::now(), nullptr); } void validateAnalyzerSettings(ASTPtr ast, bool context_value) diff --git a/src/Parsers/Access/ASTAuthenticationData.cpp b/src/Parsers/Access/ASTAuthenticationData.cpp index 7a1091d8a1a..c7a6429f6aa 100644 --- a/src/Parsers/Access/ASTAuthenticationData.cpp +++ b/src/Parsers/Access/ASTAuthenticationData.cpp @@ -14,6 +14,15 @@ namespace ErrorCodes extern const int LOGICAL_ERROR; } +namespace +{ + void formatValidUntil(const IAST & valid_until, const IAST::FormatSettings & settings) + { + settings.ostr << (settings.hilite ? IAST::hilite_keyword : "") << " VALID UNTIL " << (settings.hilite ? IAST::hilite_none : ""); + valid_until.format(settings); + } +} + std::optional ASTAuthenticationData::getPassword() const { if (contains_password) @@ -46,6 +55,12 @@ void ASTAuthenticationData::formatImpl(const FormatSettings & settings, FormatSt { settings.ostr << (settings.hilite ? IAST::hilite_keyword : "") << " no_password" << (settings.hilite ? 
IAST::hilite_none : ""); + + if (valid_until) + { + formatValidUntil(*valid_until, settings); + } + return; } @@ -205,6 +220,11 @@ void ASTAuthenticationData::formatImpl(const FormatSettings & settings, FormatSt children[1]->format(settings); } + if (valid_until) + { + formatValidUntil(*valid_until, settings); + } + } bool ASTAuthenticationData::hasSecretParts() const diff --git a/src/Parsers/Access/ASTAuthenticationData.h b/src/Parsers/Access/ASTAuthenticationData.h index 7f0644b3437..24c4c015efd 100644 --- a/src/Parsers/Access/ASTAuthenticationData.h +++ b/src/Parsers/Access/ASTAuthenticationData.h @@ -41,6 +41,7 @@ public: bool contains_password = false; bool contains_hash = false; + ASTPtr valid_until; protected: void formatImpl(const FormatSettings & settings, FormatState &, FormatStateStacked) const override; diff --git a/src/Parsers/Access/ASTCreateUserQuery.cpp b/src/Parsers/Access/ASTCreateUserQuery.cpp index ec48c32b684..eb4503acf82 100644 --- a/src/Parsers/Access/ASTCreateUserQuery.cpp +++ b/src/Parsers/Access/ASTCreateUserQuery.cpp @@ -260,8 +260,10 @@ void ASTCreateUserQuery::formatImpl(const FormatSettings & format, FormatState & formatAuthenticationData(authentication_methods, format); } - if (valid_until) - formatValidUntil(*valid_until, format); + if (global_valid_until) + { + formatValidUntil(*global_valid_until, format); + } if (hosts) formatHosts(nullptr, *hosts, format); diff --git a/src/Parsers/Access/ASTCreateUserQuery.h b/src/Parsers/Access/ASTCreateUserQuery.h index e1bae98f2f3..8926c7cad44 100644 --- a/src/Parsers/Access/ASTCreateUserQuery.h +++ b/src/Parsers/Access/ASTCreateUserQuery.h @@ -62,7 +62,7 @@ public: std::shared_ptr default_database; - ASTPtr valid_until; + ASTPtr global_valid_until; String getID(char) const override; ASTPtr clone() const override; diff --git a/src/Parsers/Access/ParserCreateUserQuery.cpp b/src/Parsers/Access/ParserCreateUserQuery.cpp index 8bfc84a28a6..657302574c2 100644 --- a/src/Parsers/Access/ParserCreateUserQuery.cpp +++ b/src/Parsers/Access/ParserCreateUserQuery.cpp @@ -43,6 +43,19 @@ namespace }); } + bool parseValidUntil(IParserBase::Pos & pos, Expected & expected, ASTPtr & valid_until) + { + return IParserBase::wrapParseImpl(pos, [&] + { + if (!ParserKeyword{Keyword::VALID_UNTIL}.ignore(pos, expected)) + return false; + + ParserStringAndSubstitution until_p; + + return until_p.parse(pos, valid_until, expected); + }); + } + bool parseAuthenticationData( IParserBase::Pos & pos, Expected & expected, @@ -223,6 +236,8 @@ namespace if (http_auth_scheme) auth_data->children.push_back(std::move(http_auth_scheme)); + parseValidUntil(pos, expected, auth_data->valid_until); + return true; }); } @@ -283,6 +298,8 @@ namespace authentication_methods.emplace_back(std::make_shared()); authentication_methods.back()->type = AuthenticationType::NO_PASSWORD; + parseValidUntil(pos, expected, authentication_methods.back()->valid_until); + return true; } @@ -471,19 +488,6 @@ namespace }); } - bool parseValidUntil(IParserBase::Pos & pos, Expected & expected, ASTPtr & valid_until) - { - return IParserBase::wrapParseImpl(pos, [&] - { - if (!ParserKeyword{Keyword::VALID_UNTIL}.ignore(pos, expected)) - return false; - - ParserStringAndSubstitution until_p; - - return until_p.parse(pos, valid_until, expected); - }); - } - bool parseAddIdentifiedWith(IParserBase::Pos & pos, Expected & expected, std::vector> & auth_data) { return IParserBase::wrapParseImpl(pos, [&] @@ -554,7 +558,7 @@ bool ParserCreateUserQuery::parseImpl(Pos & pos, ASTPtr & node, Expected 
& expec std::shared_ptr settings; std::shared_ptr grantees; std::shared_ptr default_database; - ASTPtr valid_until; + ASTPtr global_valid_until; String cluster; String storage_name; bool reset_authentication_methods_to_new = false; @@ -568,20 +572,27 @@ bool ParserCreateUserQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expec { parsed_identified_with = parseIdentifiedOrNotIdentified(pos, expected, auth_data); - if (!parsed_identified_with && alter) + if (parsed_identified_with) + { + continue; + } + else if (alter) { parsed_add_identified_with = parseAddIdentifiedWith(pos, expected, auth_data); + if (parsed_add_identified_with) + { + continue; + } } } if (!reset_authentication_methods_to_new && alter && auth_data.empty()) { reset_authentication_methods_to_new = parseResetAuthenticationMethods(pos, expected); - } - - if (!valid_until) - { - parseValidUntil(pos, expected, valid_until); + if (reset_authentication_methods_to_new) + { + continue; + } } AllowedClientHosts new_hosts; @@ -640,6 +651,14 @@ bool ParserCreateUserQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expec if (storage_name.empty() && ParserKeyword{Keyword::IN}.ignore(pos, expected) && parseAccessStorageName(pos, expected, storage_name)) continue; + if (auth_data.empty() && !global_valid_until) + { + if (parseValidUntil(pos, expected, global_valid_until)) + { + continue; + } + } + break; } @@ -674,7 +693,7 @@ bool ParserCreateUserQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expec query->settings = std::move(settings); query->grantees = std::move(grantees); query->default_database = std::move(default_database); - query->valid_until = std::move(valid_until); + query->global_valid_until = std::move(global_valid_until); query->storage_name = std::move(storage_name); query->reset_authentication_methods_to_new = reset_authentication_methods_to_new; query->add_identified_with = parsed_add_identified_with; @@ -685,8 +704,8 @@ bool ParserCreateUserQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expec query->children.push_back(authentication_method); } - if (query->valid_until) - query->children.push_back(query->valid_until); + if (query->global_valid_until) + query->children.push_back(query->global_valid_until); return true; } diff --git a/src/Processors/Formats/Impl/NativeFormat.cpp b/src/Processors/Formats/Impl/NativeFormat.cpp index 5411e2e7811..022cb38596b 100644 --- a/src/Processors/Formats/Impl/NativeFormat.cpp +++ b/src/Processors/Formats/Impl/NativeFormat.cpp @@ -15,16 +15,17 @@ namespace DB class NativeInputFormat final : public IInputFormat { public: - NativeInputFormat(ReadBuffer & buf, const Block & header_, const FormatSettings & settings) + NativeInputFormat(ReadBuffer & buf, const Block & header_, const FormatSettings & settings_) : IInputFormat(header_, &buf) , reader(std::make_unique( buf, header_, 0, - settings, - settings.defaults_for_omitted_fields ? &block_missing_values : nullptr)) + settings_, + settings_.defaults_for_omitted_fields ? &block_missing_values : nullptr)) , header(header_) , block_missing_values(header.columns()) + , settings(settings_) { } @@ -55,7 +56,7 @@ public: void setReadBuffer(ReadBuffer & in_) override { - reader = std::make_unique(in_, header, 0); + reader = std::make_unique(in_, header, 0, settings, settings.defaults_for_omitted_fields ? 
&block_missing_values : nullptr); IInputFormat::setReadBuffer(in_); } @@ -67,6 +68,7 @@ private: std::unique_ptr reader; Block header; BlockMissingValues block_missing_values; + const FormatSettings settings; size_t approx_bytes_read_for_chunk = 0; }; diff --git a/src/Processors/Formats/Impl/Parquet/ParquetDataValuesReader.cpp b/src/Processors/Formats/Impl/Parquet/ParquetDataValuesReader.cpp index b8e4db8700c..b471989076b 100644 --- a/src/Processors/Formats/Impl/Parquet/ParquetDataValuesReader.cpp +++ b/src/Processors/Formats/Impl/Parquet/ParquetDataValuesReader.cpp @@ -296,6 +296,31 @@ void ParquetPlainValuesReader::readBatch( ); } +template +void ParquetBitPlainReader::readBatch( + MutableColumnPtr & col_ptr, LazyNullMap & null_map, UInt32 num_values) +{ + auto cursor = col_ptr->size(); + auto * column_data = getResizedPrimitiveData(*assert_cast(col_ptr.get()), cursor + num_values); + + def_level_reader->visitNullableValues( + cursor, + num_values, + max_def_level, + null_map, + /* individual_visitor */ [&](size_t nest_cursor) + { + uint8_t byte; + bit_reader->GetValue(1, &byte); + column_data[nest_cursor] = byte; + }, + /* repeated_visitor */ [&](size_t nest_cursor, UInt32 count) + { + bit_reader->GetBatch(1, &column_data[nest_cursor], count); + } + ); +} + template <> void ParquetPlainValuesReader, ParquetReaderTypes::TimestampInt96>::readBatch( @@ -561,6 +586,9 @@ template class ParquetPlainValuesReader>; template class ParquetPlainValuesReader>; template class ParquetPlainValuesReader>; template class ParquetPlainValuesReader; +template class ParquetPlainValuesReader; + +template class ParquetBitPlainReader; template class ParquetFixedLenPlainReader>; template class ParquetFixedLenPlainReader>; @@ -569,6 +597,7 @@ template class ParquetRleLCReader; template class ParquetRleLCReader; template class ParquetRleLCReader; +template class ParquetRleDictReader; template class ParquetRleDictReader; template class ParquetRleDictReader; template class ParquetRleDictReader; diff --git a/src/Processors/Formats/Impl/Parquet/ParquetDataValuesReader.h b/src/Processors/Formats/Impl/Parquet/ParquetDataValuesReader.h index fbccb612b3c..db55f7e2d6a 100644 --- a/src/Processors/Formats/Impl/Parquet/ParquetDataValuesReader.h +++ b/src/Processors/Formats/Impl/Parquet/ParquetDataValuesReader.h @@ -172,6 +172,27 @@ private: ParquetDataBuffer plain_data_buffer; }; +template +class ParquetBitPlainReader : public ParquetDataValuesReader +{ +public: + ParquetBitPlainReader( + Int32 max_def_level_, + std::unique_ptr def_level_reader_, + std::unique_ptr bit_reader_) + : max_def_level(max_def_level_) + , def_level_reader(std::move(def_level_reader_)) + , bit_reader(std::move(bit_reader_)) + {} + + void readBatch(MutableColumnPtr & col_ptr, LazyNullMap & null_map, UInt32 num_values) override; + +private: + Int32 max_def_level; + std::unique_ptr def_level_reader; + std::unique_ptr bit_reader; +}; + /** * The data and definition level encoding are same as ParquetPlainValuesReader. * But the element size is const and bigger than primitive data type. 
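The ParquetBitPlainReader declared above decodes Parquet BOOLEAN values one bit at a time for the native record reader; the hunks that follow wire it into ParquetLeafColReader and ParquetRecordReader. A minimal sketch of exercising the new path from SQL, assuming a hypothetical file data.parquet with a Bool column named flag, and assuming the native reader is opted into via the input_format_parquet_use_native_reader setting:

SET input_format_parquet_use_native_reader = 1;
SELECT flag, count() FROM file('data.parquet', Parquet) GROUP BY flag;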
diff --git a/src/Processors/Formats/Impl/Parquet/ParquetLeafColReader.cpp b/src/Processors/Formats/Impl/Parquet/ParquetLeafColReader.cpp index 4b5880eba37..c3c7db510ed 100644 --- a/src/Processors/Formats/Impl/Parquet/ParquetLeafColReader.cpp +++ b/src/Processors/Formats/Impl/Parquet/ParquetLeafColReader.cpp @@ -425,16 +425,29 @@ void ParquetLeafColReader::initDataReader( degradeDictionary(); } - ParquetDataBuffer parquet_buffer = [&]() + if (col_descriptor.physical_type() == parquet::Type::BOOLEAN) { - if constexpr (!std::is_same_v, TColumn>) - return ParquetDataBuffer(buffer, max_size); + if constexpr (std::is_same_v) + { + auto bit_reader = std::make_unique(buffer, max_size); + data_values_reader = std::make_unique>(col_descriptor.max_definition_level(), + std::move(def_level_reader), + std::move(bit_reader)); + } + } + else + { + ParquetDataBuffer parquet_buffer = [&]() + { + if constexpr (!std::is_same_v, TColumn>) + return ParquetDataBuffer(buffer, max_size); - auto scale = assert_cast(*base_data_type).getScale(); - return ParquetDataBuffer(buffer, max_size, scale); - }(); - data_values_reader = createPlainReader( - col_descriptor, std::move(def_level_reader), std::move(parquet_buffer)); + auto scale = assert_cast(*base_data_type).getScale(); + return ParquetDataBuffer(buffer, max_size, scale); + }(); + data_values_reader = createPlainReader( + col_descriptor, std::move(def_level_reader), std::move(parquet_buffer)); + } break; } case parquet::Encoding::RLE_DICTIONARY: @@ -612,6 +625,12 @@ std::unique_ptr ParquetLeafColReader::createDi }); return res; } + + if (col_descriptor.physical_type() == parquet::Type::type::BOOLEAN) + { + throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Dictionary encoding for booleans is not supported"); + } + return std::make_unique>( col_descriptor.max_definition_level(), std::move(def_level_reader), @@ -620,6 +639,7 @@ std::unique_ptr ParquetLeafColReader::createDi } +template class ParquetLeafColReader; template class ParquetLeafColReader; template class ParquetLeafColReader; template class ParquetLeafColReader; diff --git a/src/Processors/Formats/Impl/Parquet/ParquetRecordReader.cpp b/src/Processors/Formats/Impl/Parquet/ParquetRecordReader.cpp index acf11a30162..971bb9e1be5 100644 --- a/src/Processors/Formats/Impl/Parquet/ParquetRecordReader.cpp +++ b/src/Processors/Formats/Impl/Parquet/ParquetRecordReader.cpp @@ -263,7 +263,7 @@ std::unique_ptr ColReaderFactory::makeReader() switch (col_descriptor.physical_type()) { case parquet::Type::BOOLEAN: - break; + return makeLeafReader(); case parquet::Type::INT32: return fromInt32(); case parquet::Type::INT64: diff --git a/src/Processors/QueryPlan/Optimizations/optimizeUseAggregateProjection.cpp b/src/Processors/QueryPlan/Optimizations/optimizeUseAggregateProjection.cpp index 511ae274101..dee16bfcb1a 100644 --- a/src/Processors/QueryPlan/Optimizations/optimizeUseAggregateProjection.cpp +++ b/src/Processors/QueryPlan/Optimizations/optimizeUseAggregateProjection.cpp @@ -752,7 +752,8 @@ std::optional optimizeUseAggregateProjections(QueryPlan::Node & node, Qu Pipe pipe(std::make_shared(std::move(block_with_count))); projection_reading = std::make_unique(std::move(pipe)); - selected_projection_name = "Optimized trivial count"; + /// Use @minmax_count_projection name as it goes through the same optimization. 
+ selected_projection_name = metadata->minmax_count_projection->name; has_ordinary_parts = reading->getAnalyzedResult() != nullptr; } else diff --git a/src/Server/CertificateReloader.cpp b/src/Server/CertificateReloader.cpp index 5b981fc7a87..aa84b26af69 100644 --- a/src/Server/CertificateReloader.cpp +++ b/src/Server/CertificateReloader.cpp @@ -91,6 +91,12 @@ void CertificateReloader::tryLoad(const Poco::Util::AbstractConfiguration & conf } +void CertificateReloader::tryLoadClient(const Poco::Util::AbstractConfiguration & config) +{ + tryLoad(config, nullptr, Poco::Net::SSLManager::CFG_CLIENT_PREFIX); +} + + void CertificateReloader::tryLoad(const Poco::Util::AbstractConfiguration & config, SSL_CTX * ctx, const std::string & prefix) { std::lock_guard lock{data_mutex}; @@ -107,7 +113,12 @@ std::list::iterator CertificateReloader::findOrI else { if (!ctx) - ctx = Poco::Net::SSLManager::instance().defaultServerContext()->sslContext(); + { + if (prefix == Poco::Net::SSLManager::CFG_CLIENT_PREFIX) + ctx = Poco::Net::SSLManager::instance().defaultClientContext()->sslContext(); + else + ctx = Poco::Net::SSLManager::instance().defaultServerContext()->sslContext(); + } data.push_back(MultiData(ctx)); --it; data_index[prefix] = it; diff --git a/src/Server/CertificateReloader.h b/src/Server/CertificateReloader.h index 28737988fdd..0e4ea8b989e 100644 --- a/src/Server/CertificateReloader.h +++ b/src/Server/CertificateReloader.h @@ -77,6 +77,9 @@ public: /// Handle configuration reload for default path void tryLoad(const Poco::Util::AbstractConfiguration & config); + /// Handle configuration reload client for default path + void tryLoadClient(const Poco::Util::AbstractConfiguration & config); + /// Handle configuration reload void tryLoad(const Poco::Util::AbstractConfiguration & config, SSL_CTX * ctx, const std::string & prefix); diff --git a/src/Storages/MergeTree/MergeTreeReaderWide.cpp b/src/Storages/MergeTree/MergeTreeReaderWide.cpp index 898bf5a2933..77231d8d392 100644 --- a/src/Storages/MergeTree/MergeTreeReaderWide.cpp +++ b/src/Storages/MergeTree/MergeTreeReaderWide.cpp @@ -262,7 +262,7 @@ MergeTreeReaderWide::FileStreams::iterator MergeTreeReaderWide::addStream(const /*num_columns_in_mark=*/ 1); auto stream_settings = settings; - stream_settings.is_low_cardinality_dictionary = substream_path.size() > 1 && substream_path[substream_path.size() - 2].type == ISerialization::Substream::Type::DictionaryKeys; + stream_settings.is_low_cardinality_dictionary = ISerialization::isLowCardinalityDictionarySubcolumn(substream_path); auto create_stream = [&]() { diff --git a/src/Storages/MergeTree/MergeTreeSettings.cpp b/src/Storages/MergeTree/MergeTreeSettings.cpp index 3d2c9c63598..3abba83758b 100644 --- a/src/Storages/MergeTree/MergeTreeSettings.cpp +++ b/src/Storages/MergeTree/MergeTreeSettings.cpp @@ -30,10 +30,11 @@ namespace ErrorCodes extern const int BAD_ARGUMENTS; } +// clang-format off + /** These settings represent fine tunes for internal details of MergeTree storages * and should not be changed by the user without a reason. */ - #define MERGE_TREE_SETTINGS(DECLARE, ALIAS) \ DECLARE(UInt64, min_compress_block_size, 0, "When granule is written, compress the data in buffer if the size of pending uncompressed data is larger or equal than the specified threshold. If this setting is not set, the corresponding global setting is used.", 0) \ DECLARE(UInt64, max_compress_block_size, 0, "Compress the pending uncompressed data in buffer if its size is larger or equal than the specified threshold. 
Block of data will be compressed even if the current granule is not finished. If this setting is not set, the corresponding global setting is used.", 0) \ @@ -88,7 +89,7 @@ namespace ErrorCodes DECLARE(UInt64, min_age_to_force_merge_seconds, 0, "If all parts in a certain range are older than this value, range will be always eligible for merging. Set to 0 to disable.", 0) \ DECLARE(Bool, min_age_to_force_merge_on_partition_only, false, "Whether min_age_to_force_merge_seconds should be applied only on the entire partition and not on subset.", false) \ DECLARE(UInt64, number_of_free_entries_in_pool_to_execute_optimize_entire_partition, 25, "When there is less than specified number of free entries in pool, do not try to execute optimize entire partition with a merge (this merge is created when set min_age_to_force_merge_seconds > 0 and min_age_to_force_merge_on_partition_only = true). This is to leave free threads for regular merges and avoid \"Too many parts\"", 0) \ - DECLARE(Bool, remove_rolled_back_parts_immediately, 1, "Setting for an incomplete experimental feature.", 0) \ + DECLARE(Bool, remove_rolled_back_parts_immediately, 1, "Setting for an incomplete experimental feature.", EXPERIMENTAL) \ DECLARE(UInt64, replicated_max_mutations_in_one_entry, 10000, "Max number of mutation commands that can be merged together and executed in one MUTATE_PART entry (0 means unlimited)", 0) \ DECLARE(UInt64, number_of_mutations_to_delay, 500, "If table has at least that many unfinished mutations, artificially slow down mutations of table. Disabled if set to 0", 0) \ DECLARE(UInt64, number_of_mutations_to_throw, 1000, "If table has at least that many unfinished mutations, throw 'Too many mutations' exception. Disabled if set to 0", 0) \ @@ -98,7 +99,7 @@ namespace ErrorCodes DECLARE(String, merge_workload, "", "Name of workload to be used to access resources for merges", 0) \ DECLARE(String, mutation_workload, "", "Name of workload to be used to access resources for mutations", 0) \ DECLARE(Milliseconds, background_task_preferred_step_execution_time_ms, 50, "Target time to execution of one step of merge or mutation. Can be exceeded if one step takes longer time", 0) \ - DECLARE(MergeSelectorAlgorithm, merge_selector_algorithm, MergeSelectorAlgorithm::SIMPLE, "The algorithm to select parts for merges assignment", 0) \ + DECLARE(MergeSelectorAlgorithm, merge_selector_algorithm, MergeSelectorAlgorithm::SIMPLE, "The algorithm to select parts for merges assignment", EXPERIMENTAL) \ \ /** Inserts settings. */ \ DECLARE(UInt64, parts_to_delay_insert, 1000, "If table contains at least that many active parts in single partition, artificially slow down insert into table. Disabled if set to 0", 0) \ @@ -214,14 +215,14 @@ namespace ErrorCodes DECLARE(Bool, enable_block_offset_column, false, "Enable persisting column _block_offset for each row.", 0) \ \ /** Experimental/work in progress feature. Unsafe for production. */ \ - DECLARE(UInt64, part_moves_between_shards_enable, 0, "Experimental/Incomplete feature to move parts between shards. 
Does not take into account sharding expressions.", 0) \ - DECLARE(UInt64, part_moves_between_shards_delay_seconds, 30, "Time to wait before/after moving parts between shards.", 0) \ - DECLARE(Bool, allow_remote_fs_zero_copy_replication, false, "Don't use this setting in production, because it is not ready.", 0) \ - DECLARE(String, remote_fs_zero_copy_zookeeper_path, "/clickhouse/zero_copy", "ZooKeeper path for zero-copy table-independent info.", 0) \ - DECLARE(Bool, remote_fs_zero_copy_path_compatible_mode, false, "Run zero-copy in compatible mode during conversion process.", 0) \ - DECLARE(Bool, cache_populated_by_fetch, false, "Only available in ClickHouse Cloud", 0) \ - DECLARE(Bool, force_read_through_cache_for_merges, false, "Force read-through filesystem cache for merges", 0) \ - DECLARE(Bool, allow_experimental_replacing_merge_with_cleanup, false, "Allow experimental CLEANUP merges for ReplacingMergeTree with is_deleted column.", 0) \ + DECLARE(UInt64, part_moves_between_shards_enable, 0, "Experimental/Incomplete feature to move parts between shards. Does not take into account sharding expressions.", EXPERIMENTAL) \ + DECLARE(UInt64, part_moves_between_shards_delay_seconds, 30, "Time to wait before/after moving parts between shards.", EXPERIMENTAL) \ + DECLARE(Bool, allow_remote_fs_zero_copy_replication, false, "Don't use this setting in production, because it is not ready.", BETA) \ + DECLARE(String, remote_fs_zero_copy_zookeeper_path, "/clickhouse/zero_copy", "ZooKeeper path for zero-copy table-independent info.", EXPERIMENTAL) \ + DECLARE(Bool, remote_fs_zero_copy_path_compatible_mode, false, "Run zero-copy in compatible mode during conversion process.", EXPERIMENTAL) \ + DECLARE(Bool, cache_populated_by_fetch, false, "Only available in ClickHouse Cloud", EXPERIMENTAL) \ + DECLARE(Bool, force_read_through_cache_for_merges, false, "Force read-through filesystem cache for merges", EXPERIMENTAL) \ + DECLARE(Bool, allow_experimental_replacing_merge_with_cleanup, false, "Allow experimental CLEANUP merges for ReplacingMergeTree with is_deleted column.", EXPERIMENTAL) \ \ /** Compress marks and primary key. */ \ DECLARE(Bool, compress_marks, true, "Marks support compression, reduce mark file size and speed up network transmission.", 0) \ @@ -240,7 +241,7 @@ namespace ErrorCodes DECLARE(DeduplicateMergeProjectionMode, deduplicate_merge_projection_mode, DeduplicateMergeProjectionMode::THROW, "Whether to allow create projection for the table with non-classic MergeTree. Ignore option is purely for compatibility which might result in incorrect answer. Otherwise, if allowed, what is the action when merge, drop or rebuild.", 0) \ #define MAKE_OBSOLETE_MERGE_TREE_SETTING(M, TYPE, NAME, DEFAULT) \ - M(TYPE, NAME, DEFAULT, "Obsolete setting, does nothing.", BaseSettingsHelpers::Flags::OBSOLETE) + M(TYPE, NAME, DEFAULT, "Obsolete setting, does nothing.", SettingsTierType::OBSOLETE) #define OBSOLETE_MERGE_TREE_SETTINGS(M, ALIAS) \ /** Obsolete settings that do nothing but left for compatibility reasons. */ \ @@ -278,8 +279,9 @@ namespace ErrorCodes MERGE_TREE_SETTINGS(M, ALIAS) \ OBSOLETE_MERGE_TREE_SETTINGS(M, ALIAS) -DECLARE_SETTINGS_TRAITS(MergeTreeSettingsTraits, LIST_OF_MERGE_TREE_SETTINGS) +// clang-format on +DECLARE_SETTINGS_TRAITS(MergeTreeSettingsTraits, LIST_OF_MERGE_TREE_SETTINGS) /** Settings for the MergeTree family of engines. * Could be loaded from config or from a CREATE TABLE query (SETTINGS clause). 
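The hunks above start tagging individual MergeTree settings with support tiers (EXPERIMENTAL, BETA) in place of the bare 0 flag, and the later hunks in this patch surface that tier through system.merge_tree_settings. A hedged example of inspecting the new column, assuming the tier enum values render as 'Production', 'Beta', 'Experimental' and 'Obsolete':

SELECT name, value, tier
FROM system.merge_tree_settings
WHERE tier IN ('Experimental', 'Beta')
ORDER BY tier, name;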
@@ -650,7 +652,8 @@ void MergeTreeSettings::dumpToSystemMergeTreeSettingsColumns(MutableColumnsAndCo res_columns[5]->insert(max); res_columns[6]->insert(writability == SettingConstraintWritability::CONST); res_columns[7]->insert(setting.getTypeName()); - res_columns[8]->insert(setting.isObsolete()); + res_columns[8]->insert(setting.getTier() == SettingsTierType::OBSOLETE); + res_columns[9]->insert(setting.getTier()); } } diff --git a/src/Storages/ObjectStorage/HDFS/WriteBufferFromHDFS.cpp b/src/Storages/ObjectStorage/HDFS/WriteBufferFromHDFS.cpp index 4f6f8c782f2..4879dc41d53 100644 --- a/src/Storages/ObjectStorage/HDFS/WriteBufferFromHDFS.cpp +++ b/src/Storages/ObjectStorage/HDFS/WriteBufferFromHDFS.cpp @@ -29,6 +29,7 @@ extern const int CANNOT_FSYNC; struct WriteBufferFromHDFS::WriteBufferFromHDFSImpl { std::string hdfs_uri; + std::string hdfs_file_path; hdfsFile fout; HDFSBuilderWrapper builder; HDFSFSPtr fs; @@ -36,25 +37,24 @@ struct WriteBufferFromHDFS::WriteBufferFromHDFSImpl WriteBufferFromHDFSImpl( const std::string & hdfs_uri_, + const std::string & hdfs_file_path_, const Poco::Util::AbstractConfiguration & config_, int replication_, const WriteSettings & write_settings_, int flags) : hdfs_uri(hdfs_uri_) + , hdfs_file_path(hdfs_file_path_) , builder(createHDFSBuilder(hdfs_uri, config_)) , fs(createHDFSFS(builder.get())) , write_settings(write_settings_) { - const size_t begin_of_path = hdfs_uri.find('/', hdfs_uri.find("//") + 2); - const String path = hdfs_uri.substr(begin_of_path); - /// O_WRONLY meaning create or overwrite i.e., implies O_TRUNCAT here - fout = hdfsOpenFile(fs.get(), path.c_str(), flags, 0, replication_, 0); + fout = hdfsOpenFile(fs.get(), hdfs_file_path.c_str(), flags, 0, replication_, 0); if (fout == nullptr) { throw Exception(ErrorCodes::CANNOT_OPEN_FILE, "Unable to open HDFS file: {} ({}) error: {}", - path, hdfs_uri, std::string(hdfsGetLastError())); + hdfs_file_path, hdfs_uri, std::string(hdfsGetLastError())); } } @@ -71,7 +71,7 @@ struct WriteBufferFromHDFS::WriteBufferFromHDFSImpl rlock.unlock(std::max(0, bytes_written)); if (bytes_written < 0) - throw Exception(ErrorCodes::NETWORK_ERROR, "Fail to write HDFS file: {} {}", hdfs_uri, std::string(hdfsGetLastError())); + throw Exception(ErrorCodes::NETWORK_ERROR, "Fail to write HDFS file: {}, hdfs_uri: {}, {}", hdfs_file_path, hdfs_uri, std::string(hdfsGetLastError())); if (write_settings.remote_throttler) write_settings.remote_throttler->add(bytes_written, ProfileEvents::RemoteWriteThrottlerBytes, ProfileEvents::RemoteWriteThrottlerSleepMicroseconds); @@ -83,20 +83,21 @@ struct WriteBufferFromHDFS::WriteBufferFromHDFSImpl { int result = hdfsSync(fs.get(), fout); if (result < 0) - throw ErrnoException(ErrorCodes::CANNOT_FSYNC, "Cannot HDFS sync {} {}", hdfs_uri, std::string(hdfsGetLastError())); + throw ErrnoException(ErrorCodes::CANNOT_FSYNC, "Cannot HDFS sync {}, hdfs_url: {}, {}", hdfs_file_path, hdfs_uri, std::string(hdfsGetLastError())); } }; WriteBufferFromHDFS::WriteBufferFromHDFS( - const std::string & hdfs_name_, + const std::string & hdfs_uri_, + const std::string & hdfs_file_path_, const Poco::Util::AbstractConfiguration & config_, int replication_, const WriteSettings & write_settings_, size_t buf_size_, int flags_) : WriteBufferFromFileBase(buf_size_, nullptr, 0) - , impl(std::make_unique(hdfs_name_, config_, replication_, write_settings_, flags_)) - , filename(hdfs_name_) + , impl(std::make_unique(hdfs_uri_, hdfs_file_path_, config_, replication_, write_settings_, flags_)) + , 
filename(hdfs_file_path_) { } diff --git a/src/Storages/ObjectStorage/HDFS/WriteBufferFromHDFS.h b/src/Storages/ObjectStorage/HDFS/WriteBufferFromHDFS.h index e3f0ae96a8f..8166da92e16 100644 --- a/src/Storages/ObjectStorage/HDFS/WriteBufferFromHDFS.h +++ b/src/Storages/ObjectStorage/HDFS/WriteBufferFromHDFS.h @@ -22,7 +22,8 @@ class WriteBufferFromHDFS final : public WriteBufferFromFileBase public: WriteBufferFromHDFS( - const String & hdfs_name_, + const String & hdfs_uri_, + const String & hdfs_file_path_, const Poco::Util::AbstractConfiguration & config_, int replication_, const WriteSettings & write_settings_ = {}, diff --git a/src/Storages/System/StorageSystemMergeTreeSettings.cpp b/src/Storages/System/StorageSystemMergeTreeSettings.cpp index 35d975216f6..1da4835dba5 100644 --- a/src/Storages/System/StorageSystemMergeTreeSettings.cpp +++ b/src/Storages/System/StorageSystemMergeTreeSettings.cpp @@ -1,4 +1,5 @@ -#include +#include +#include #include #include #include @@ -30,6 +31,14 @@ ColumnsDescription SystemMergeTreeSettings::getColumnsDescription() }, {"type", std::make_shared(), "Setting type (implementation specific string value)."}, {"is_obsolete", std::make_shared(), "Shows whether a setting is obsolete."}, + {"tier", getSettingsTierEnum(), R"( +Support level for this feature. ClickHouse features are organized in tiers, varying depending on the current status of their +development and the expectations one might have when using them: +* PRODUCTION: The feature is stable, safe to use and does not have issues interacting with other PRODUCTION features. +* BETA: The feature is stable and safe. The outcome of using it together with other features is unknown and correctness is not guaranteed. Testing and reports are welcome. +* EXPERIMENTAL: The feature is under development. Only intended for developers and ClickHouse enthusiasts. The feature might or might not work and could be removed at any time. +* OBSOLETE: No longer supported. Either it is already removed or it will be removed in future releases. +)"}, }; } diff --git a/src/Storages/System/StorageSystemSettings.cpp b/src/Storages/System/StorageSystemSettings.cpp index 9309f10378e..debd40386a6 100644 --- a/src/Storages/System/StorageSystemSettings.cpp +++ b/src/Storages/System/StorageSystemSettings.cpp @@ -2,6 +2,8 @@ #include #include +#include +#include #include #include #include @@ -34,6 +36,14 @@ ColumnsDescription StorageSystemSettings::getColumnsDescription() {"default", std::make_shared(), "Setting default value."}, {"alias_for", std::make_shared(), "Flag that shows whether this name is an alias to another setting."}, {"is_obsolete", std::make_shared(), "Shows whether a setting is obsolete."}, + {"tier", getSettingsTierEnum(), R"( +Support level for this feature. ClickHouse features are organized in tiers, varying depending on the current status of their +development and the expectations one might have when using them: +* PRODUCTION: The feature is stable, safe to use and does not have issues interacting with other PRODUCTION features. +* BETA: The feature is stable and safe. The outcome of using it together with other features is unknown and correctness is not guaranteed. Testing and reports are welcome. +* EXPERIMENTAL: The feature is under development. Only intended for developers and ClickHouse enthusiasts. The feature might or might not work and could be removed at any time. +* OBSOLETE: No longer supported. Either it is already removed or it will be removed in future releases. 
+)"}, }; } diff --git a/tests/ci/ci_config.py b/tests/ci/ci_config.py index b9885a89444..9f5d5f1983d 100644 --- a/tests/ci/ci_config.py +++ b/tests/ci/ci_config.py @@ -97,9 +97,9 @@ class CI: ), runner_type=Runners.BUILDER_ARM, ), - BuildNames.PACKAGE_AARCH64_ASAN: CommonJobConfigs.BUILD.with_properties( + BuildNames.PACKAGE_ARM_ASAN: CommonJobConfigs.BUILD.with_properties( build_config=BuildConfig( - name=BuildNames.PACKAGE_AARCH64_ASAN, + name=BuildNames.PACKAGE_ARM_ASAN, compiler="clang-18-aarch64", sanitizer="address", package_type="deb", @@ -283,6 +283,10 @@ class CI: JobNames.STATEFUL_TEST_ASAN: CommonJobConfigs.STATEFUL_TEST.with_properties( required_builds=[BuildNames.PACKAGE_ASAN] ), + JobNames.STATEFUL_TEST_ARM_ASAN: CommonJobConfigs.STATEFUL_TEST.with_properties( + required_builds=[BuildNames.PACKAGE_ARM_ASAN], + runner_type=Runners.FUNC_TESTER_ARM, + ), JobNames.STATEFUL_TEST_TSAN: CommonJobConfigs.STATEFUL_TEST.with_properties( required_builds=[BuildNames.PACKAGE_TSAN] ), @@ -331,6 +335,11 @@ class CI: JobNames.STATELESS_TEST_ASAN: CommonJobConfigs.STATELESS_TEST.with_properties( required_builds=[BuildNames.PACKAGE_ASAN], num_batches=2 ), + JobNames.STATELESS_TEST_ARM_ASAN: CommonJobConfigs.STATELESS_TEST.with_properties( + required_builds=[BuildNames.PACKAGE_ARM_ASAN], + num_batches=2, + runner_type=Runners.FUNC_TESTER_ARM, + ), JobNames.STATELESS_TEST_TSAN: CommonJobConfigs.STATELESS_TEST.with_properties( required_builds=[BuildNames.PACKAGE_TSAN], num_batches=4 ), diff --git a/tests/ci/ci_definitions.py b/tests/ci/ci_definitions.py index fc67959013b..dd86dc320c2 100644 --- a/tests/ci/ci_definitions.py +++ b/tests/ci/ci_definitions.py @@ -106,7 +106,7 @@ class BuildNames(metaclass=WithIter): PACKAGE_MSAN = "package_msan" PACKAGE_DEBUG = "package_debug" PACKAGE_AARCH64 = "package_aarch64" - PACKAGE_AARCH64_ASAN = "package_aarch64_asan" + PACKAGE_ARM_ASAN = "package_aarch64_asan" PACKAGE_RELEASE_COVERAGE = "package_release_coverage" BINARY_RELEASE = "binary_release" BINARY_TIDY = "binary_tidy" @@ -141,6 +141,7 @@ class JobNames(metaclass=WithIter): STATELESS_TEST_RELEASE_COVERAGE = "Stateless tests (coverage)" STATELESS_TEST_AARCH64 = "Stateless tests (aarch64)" STATELESS_TEST_ASAN = "Stateless tests (asan)" + STATELESS_TEST_ARM_ASAN = "Stateless tests (aarch64, asan)" STATELESS_TEST_TSAN = "Stateless tests (tsan)" STATELESS_TEST_MSAN = "Stateless tests (msan)" STATELESS_TEST_UBSAN = "Stateless tests (ubsan)" @@ -157,6 +158,7 @@ class JobNames(metaclass=WithIter): STATEFUL_TEST_RELEASE_COVERAGE = "Stateful tests (coverage)" STATEFUL_TEST_AARCH64 = "Stateful tests (aarch64)" STATEFUL_TEST_ASAN = "Stateful tests (asan)" + STATEFUL_TEST_ARM_ASAN = "Stateful tests (aarch64, asan)" STATEFUL_TEST_TSAN = "Stateful tests (tsan)" STATEFUL_TEST_MSAN = "Stateful tests (msan)" STATEFUL_TEST_UBSAN = "Stateful tests (ubsan)" @@ -241,7 +243,7 @@ class StatusNames(metaclass=WithIter): # mergeable status MERGEABLE = "Mergeable Check" # status of a sync pr - SYNC = "Cloud fork sync (only for ClickHouse Inc. 
employees)" + SYNC = "CH Inc sync" # PR formatting check status PR_CHECK = "PR Check" @@ -632,6 +634,8 @@ REQUIRED_CHECKS = [ JobNames.STATEFUL_TEST_RELEASE, JobNames.STATELESS_TEST_RELEASE, JobNames.STATELESS_TEST_ASAN, + JobNames.STATELESS_TEST_ARM_ASAN, + JobNames.STATEFUL_TEST_ARM_ASAN, JobNames.STATELESS_TEST_FLAKY_ASAN, JobNames.STATEFUL_TEST_ASAN, JobNames.STYLE_CHECK, diff --git a/tests/ci/test_ci_config.py b/tests/ci/test_ci_config.py index 29b184a4e61..0e396b827ea 100644 --- a/tests/ci/test_ci_config.py +++ b/tests/ci/test_ci_config.py @@ -36,7 +36,7 @@ class TestCIConfig(unittest.TestCase): elif "binary_" in job.lower() or "package_" in job.lower(): if job.lower() in ( CI.BuildNames.PACKAGE_AARCH64, - CI.BuildNames.PACKAGE_AARCH64_ASAN, + CI.BuildNames.PACKAGE_ARM_ASAN, ): self.assertTrue( CI.JOB_CONFIGS[job].runner_type in (CI.Runners.BUILDER_ARM,), @@ -95,69 +95,39 @@ class TestCIConfig(unittest.TestCase): self.assertTrue(CI.JOB_CONFIGS[job].required_builds is None) else: self.assertTrue(CI.JOB_CONFIGS[job].build_config is None) - if "asan" in job: - self.assertTrue( - CI.JOB_CONFIGS[job].required_builds[0] - == CI.BuildNames.PACKAGE_ASAN, - f"Job [{job}] probably has wrong required build [{CI.JOB_CONFIGS[job].required_builds[0]}] in JobConfig", - ) + if "asan" in job and "aarch" in job: + expected_builds = [CI.BuildNames.PACKAGE_ARM_ASAN] + elif "asan" in job: + expected_builds = [CI.BuildNames.PACKAGE_ASAN] elif "msan" in job: - self.assertTrue( - CI.JOB_CONFIGS[job].required_builds[0] - == CI.BuildNames.PACKAGE_MSAN, - f"Job [{job}] probably has wrong required build [{CI.JOB_CONFIGS[job].required_builds[0]}] in JobConfig", - ) + expected_builds = [CI.BuildNames.PACKAGE_MSAN] elif "tsan" in job: - self.assertTrue( - CI.JOB_CONFIGS[job].required_builds[0] - == CI.BuildNames.PACKAGE_TSAN, - f"Job [{job}] probably has wrong required build [{CI.JOB_CONFIGS[job].required_builds[0]}] in JobConfig", - ) + expected_builds = [CI.BuildNames.PACKAGE_TSAN] elif "ubsan" in job: - self.assertTrue( - CI.JOB_CONFIGS[job].required_builds[0] - == CI.BuildNames.PACKAGE_UBSAN, - f"Job [{job}] probably has wrong required build [{CI.JOB_CONFIGS[job].required_builds[0]}] in JobConfig", - ) + expected_builds = [CI.BuildNames.PACKAGE_UBSAN] elif "debug" in job: - self.assertTrue( - CI.JOB_CONFIGS[job].required_builds[0] - == CI.BuildNames.PACKAGE_DEBUG, - f"Job [{job}] probably has wrong required build [{CI.JOB_CONFIGS[job].required_builds[0]}] in JobConfig", - ) + expected_builds = [CI.BuildNames.PACKAGE_DEBUG] + elif job in ( + "Unit tests (release)", + "ClickHouse Keeper Jepsen", + "ClickHouse Server Jepsen", + ): + expected_builds = [CI.BuildNames.BINARY_RELEASE] elif "release" in job: - self.assertTrue( - CI.JOB_CONFIGS[job].required_builds[0] - in ( - CI.BuildNames.PACKAGE_RELEASE, - CI.BuildNames.BINARY_RELEASE, - ), - f"Job [{job}] probably has wrong required build [{CI.JOB_CONFIGS[job].required_builds[0]}] in JobConfig", - ) + expected_builds = [CI.BuildNames.PACKAGE_RELEASE] elif "coverage" in job: - self.assertTrue( - CI.JOB_CONFIGS[job].required_builds[0] - == CI.BuildNames.PACKAGE_RELEASE_COVERAGE, - f"Job [{job}] probably has wrong required build [{CI.JOB_CONFIGS[job].required_builds[0]}] in JobConfig", - ) + expected_builds = [CI.BuildNames.PACKAGE_RELEASE_COVERAGE] elif "aarch" in job: - self.assertTrue( - CI.JOB_CONFIGS[job].required_builds[0] - == CI.BuildNames.PACKAGE_AARCH64, - f"Job [{job}] probably has wrong required build [{CI.JOB_CONFIGS[job].required_builds[0]}] in 
JobConfig", - ) + expected_builds = [CI.BuildNames.PACKAGE_AARCH64] elif "amd64" in job: - self.assertTrue( - CI.JOB_CONFIGS[job].required_builds[0] - == CI.BuildNames.PACKAGE_RELEASE, - f"Job [{job}] probably has wrong required build [{CI.JOB_CONFIGS[job].required_builds[0]}] in JobConfig", - ) + expected_builds = [CI.BuildNames.PACKAGE_RELEASE] elif "uzzer" in job: - self.assertTrue( - CI.JOB_CONFIGS[job].required_builds[0] == CI.BuildNames.FUZZERS, - f"Job [{job}] probably has wrong required build [{CI.JOB_CONFIGS[job].required_builds[0]}] in JobConfig", - ) + expected_builds = [CI.BuildNames.FUZZERS] elif "Docker" in job: + expected_builds = [ + CI.BuildNames.PACKAGE_RELEASE, + CI.BuildNames.PACKAGE_AARCH64, + ] self.assertTrue( CI.JOB_CONFIGS[job].required_builds[0] in ( @@ -167,20 +137,12 @@ class TestCIConfig(unittest.TestCase): f"Job [{job}] probably has wrong required build [{CI.JOB_CONFIGS[job].required_builds[0]}] in JobConfig", ) elif "SQLTest" in job: - self.assertTrue( - CI.JOB_CONFIGS[job].required_builds[0] - == CI.BuildNames.PACKAGE_RELEASE, - f"Job [{job}] probably has wrong required build [{CI.JOB_CONFIGS[job].required_builds[0]}] in JobConfig", - ) + expected_builds = [CI.BuildNames.PACKAGE_RELEASE] elif "Jepsen" in job: - self.assertTrue( - CI.JOB_CONFIGS[job].required_builds[0] - in ( - CI.BuildNames.PACKAGE_RELEASE, - CI.BuildNames.BINARY_RELEASE, - ), - f"Job [{job}] probably has wrong required build [{CI.JOB_CONFIGS[job].required_builds[0]}] in JobConfig", - ) + expected_builds = [ + CI.BuildNames.PACKAGE_RELEASE, + CI.BuildNames.BINARY_RELEASE, + ] elif job in ( CI.JobNames.STYLE_CHECK, CI.JobNames.FAST_TEST, @@ -188,9 +150,16 @@ class TestCIConfig(unittest.TestCase): CI.JobNames.DOCS_CHECK, CI.JobNames.BUGFIX_VALIDATE, ): - self.assertTrue(CI.JOB_CONFIGS[job].required_builds is None) + expected_builds = [] else: print(f"Job [{job}] required build not checked") + assert False + + self.assertCountEqual( + expected_builds, + CI.JOB_CONFIGS[job].required_builds or [], + f"Required builds are not valid for job [{job}]", + ) def test_job_stage_config(self): """ diff --git a/tests/docker_scripts/stress_tests.lib b/tests/docker_scripts/stress_tests.lib index 3ab52c19dbd..5c346a2d17f 100644 --- a/tests/docker_scripts/stress_tests.lib +++ b/tests/docker_scripts/stress_tests.lib @@ -263,8 +263,12 @@ function check_logs_for_critical_errors() # Remove file logical_errors.txt if it's empty [ -s /test_output/logical_errors.txt ] || rm /test_output/logical_errors.txt - # No such key errors (ignore a.myext which is used in 02724_database_s3.sh and does not exist) - rg --text "Code: 499.*The specified key does not exist" /var/log/clickhouse-server/clickhouse-server*.log | grep -v "a.myext" > /test_output/no_such_key_errors.txt \ + # ignore: + # - a.myext which is used in 02724_database_s3.sh and does not exist + # - "DistributedCacheTCPHandler" and "caller id: None:DistribCache" because they happen inside distributed cache server + # - "ReadBufferFromDistributedCache", "AsynchronousBoundedReadBuffer", "ReadBufferFromS3", "ReadBufferFromAzureBlobStorage" + # exceptions printed internally by a buffer, exception will be rethrown and handled correctly + rg --text "Code: 499.*The specified key does not exist" /var/log/clickhouse-server/clickhouse-server*.log | grep -v -e "a.myext" -e "DistributedCacheTCPHandler" -e "ReadBufferFromDistributedCache" -e "ReadBufferFromS3" -e "ReadBufferFromAzureBlobStorage" -e "AsynchronousBoundedReadBuffer" -e "caller id: None:DistribCache" > 
/test_output/no_such_key_errors.txt \ && echo -e "S3_ERROR No such key thrown (see clickhouse-server.log or no_such_key_errors.txt)$FAIL$(trim_server_logs no_such_key_errors.txt)" >> /test_output/test_results.tsv \ || echo -e "No lost s3 keys$OK" >> /test_output/test_results.tsv diff --git a/tests/integration/test_distributed_directory_monitor_split_batch_on_failure/test.py b/tests/integration/test_distributed_directory_monitor_split_batch_on_failure/test.py index 7a843a87ec2..74c35e7f4ea 100644 --- a/tests/integration/test_distributed_directory_monitor_split_batch_on_failure/test.py +++ b/tests/integration/test_distributed_directory_monitor_split_batch_on_failure/test.py @@ -78,7 +78,7 @@ def test_distributed_background_insert_split_batch_on_failure_OFF(started_cluste with pytest.raises( QueryRuntimeException, # no DOTALL in pytest.raises, use '(.|\n)' - match=r"DB::Exception: Received from.*Memory limit \(for query\) exceeded: (.|\n)*While sending a batch", + match=r"DB::Exception: Received from.*Query memory limit exceeded: (.|\n)*While sending a batch", ): node2.query("system flush distributed dist") assert int(node2.query("select count() from dist_data")) == 0 diff --git a/tests/integration/test_drop_replica/test.py b/tests/integration/test_drop_replica/test.py index fc6c225a0d0..eae997340e5 100644 --- a/tests/integration/test_drop_replica/test.py +++ b/tests/integration/test_drop_replica/test.py @@ -9,6 +9,7 @@ def fill_nodes(nodes, shard): for node in nodes: node.query( """ + DROP DATABASE IF EXISTS test SYNC; CREATE DATABASE test; CREATE TABLE test.test_table(date Date, id UInt32) @@ -21,6 +22,7 @@ def fill_nodes(nodes, shard): node.query( """ + DROP DATABASE IF EXISTS test1 SYNC; CREATE DATABASE test1; CREATE TABLE test1.test_table(date Date, id UInt32) @@ -33,6 +35,7 @@ def fill_nodes(nodes, shard): node.query( """ + DROP DATABASE IF EXISTS test2 SYNC; CREATE DATABASE test2; CREATE TABLE test2.test_table(date Date, id UInt32) @@ -45,7 +48,8 @@ def fill_nodes(nodes, shard): node.query( """ - CREATE DATABASE test3; + DROP DATABASE IF EXISTS test3 SYNC; + CREATE DATABASE test3; CREATE TABLE test3.test_table(date Date, id UInt32) ENGINE = ReplicatedMergeTree('/clickhouse/tables/test3/{shard}/replicated/test_table', '{replica}') ORDER BY id PARTITION BY toYYYYMM(date) @@ -57,6 +61,7 @@ def fill_nodes(nodes, shard): node.query( """ + DROP DATABASE IF EXISTS test4 SYNC; CREATE DATABASE test4; CREATE TABLE test4.test_table(date Date, id UInt32) @@ -84,9 +89,6 @@ node_1_3 = cluster.add_instance( def start_cluster(): try: cluster.start() - - fill_nodes([node_1_1, node_1_2], 1) - yield cluster except Exception as ex: @@ -102,6 +104,8 @@ def check_exists(zk, path): def test_drop_replica(start_cluster): + fill_nodes([node_1_1, node_1_2], 1) + node_1_1.query( "INSERT INTO test.test_table SELECT number, toString(number) FROM numbers(100)" ) @@ -142,11 +146,7 @@ def test_drop_replica(start_cluster): shard=1 ) ) - assert "There is a local table" in node_1_2.query_and_get_error( - "SYSTEM DROP REPLICA 'node_1_1' FROM ZKPATH '/clickhouse/tables/test/{shard}/replicated/test_table'".format( - shard=1 - ) - ) + assert "There is a local table" in node_1_1.query_and_get_error( "SYSTEM DROP REPLICA 'node_1_1' FROM ZKPATH '/clickhouse/tables/test/{shard}/replicated/test_table'".format( shard=1 @@ -222,11 +222,22 @@ def test_drop_replica(start_cluster): ) assert exists_replica_1_1 == None - node_1_2.query("SYSTEM DROP REPLICA 'node_1_1'") - exists_replica_1_1 = check_exists( + node_1_1.query("ATTACH 
DATABASE test4") + + node_1_2.query("DETACH TABLE test4.test_table") + node_1_1.query( + "SYSTEM DROP REPLICA 'node_1_2' FROM ZKPATH '/clickhouse/tables/test4/{shard}/replicated/test_table'".format( + shard=1 + ) + ) + exists_replica_1_2 = check_exists( zk, "/clickhouse/tables/test4/{shard}/replicated/test_table/replicas/{replica}".format( - shard=1, replica="node_1_1" + shard=1, replica="node_1_2" ), ) - assert exists_replica_1_1 == None + assert exists_replica_1_2 == None + + node_1_1.query("ATTACH DATABASE test") + for i in range(1, 4): + node_1_1.query("ATTACH DATABASE test{}".format(i)) diff --git a/tests/integration/test_grpc_protocol/test.py b/tests/integration/test_grpc_protocol/test.py index 732907eed7a..561f5144aac 100644 --- a/tests/integration/test_grpc_protocol/test.py +++ b/tests/integration/test_grpc_protocol/test.py @@ -364,7 +364,7 @@ def test_logs(): ) assert query in logs assert "Read 1000000 rows" in logs - assert "Peak memory usage" in logs + assert "Query peak memory usage" in logs def test_progress(): diff --git a/tests/integration/test_peak_memory_usage/test.py b/tests/integration/test_peak_memory_usage/test.py index 877cf97bb18..51268dcf386 100644 --- a/tests/integration/test_peak_memory_usage/test.py +++ b/tests/integration/test_peak_memory_usage/test.py @@ -68,7 +68,8 @@ def get_memory_usage_from_client_output_and_close(client_output): for line in client_output: print(f"'{line}'\n") if not peek_memory_usage_str_found: - peek_memory_usage_str_found = "Peak memory usage" in line + # Can be both Peak/peak + peek_memory_usage_str_found = "eak memory usage" in line if peek_memory_usage_str_found: search_obj = re.search(r"[+-]?[0-9]+\.[0-9]+", line) @@ -97,9 +98,7 @@ def test_clickhouse_client_max_peak_memory_usage_distributed(started_cluster): peak_memory_usage = get_memory_usage_from_client_output_and_close(client_output) assert peak_memory_usage - assert shard_2.contains_in_log( - f"Peak memory usage (for query): {peak_memory_usage}" - ) + assert shard_2.contains_in_log(f"Query peak memory usage: {peak_memory_usage}") def test_clickhouse_client_max_peak_memory_single_node(started_cluster): @@ -118,6 +117,4 @@ def test_clickhouse_client_max_peak_memory_single_node(started_cluster): peak_memory_usage = get_memory_usage_from_client_output_and_close(client_output) assert peak_memory_usage - assert shard_1.contains_in_log( - f"Peak memory usage (for query): {peak_memory_usage}" - ) + assert shard_1.contains_in_log(f"Query peak memory usage: {peak_memory_usage}") diff --git a/tests/integration/test_reload_client_certificate/__init__.py b/tests/integration/test_reload_client_certificate/__init__.py new file mode 100644 index 00000000000..e69de29bb2d diff --git a/tests/integration/test_reload_client_certificate/configs/remote_servers.xml b/tests/integration/test_reload_client_certificate/configs/remote_servers.xml new file mode 100644 index 00000000000..63fdcea5dab --- /dev/null +++ b/tests/integration/test_reload_client_certificate/configs/remote_servers.xml @@ -0,0 +1,23 @@ + + + + + + node1 + 9000 + + + + node2 + 9000 + + + + node3 + 9000 + + + + + + diff --git a/tests/integration/test_reload_client_certificate/configs/zookeeper_config_with_ssl.xml b/tests/integration/test_reload_client_certificate/configs/zookeeper_config_with_ssl.xml new file mode 100644 index 00000000000..dc0fe771426 --- /dev/null +++ b/tests/integration/test_reload_client_certificate/configs/zookeeper_config_with_ssl.xml @@ -0,0 +1,20 @@ + + + + zoo1 + 2281 + 1 + + + zoo2 + 2281 + 1 + + + zoo3 + 2281 
+ 1 + + 3000 + + diff --git a/tests/integration/test_reload_client_certificate/configs_secure/conf.d/remote_servers.xml b/tests/integration/test_reload_client_certificate/configs_secure/conf.d/remote_servers.xml new file mode 100644 index 00000000000..548819a8c97 --- /dev/null +++ b/tests/integration/test_reload_client_certificate/configs_secure/conf.d/remote_servers.xml @@ -0,0 +1,17 @@ + + + + + + node1 + 9000 + + + + node2 + 9000 + + + + + diff --git a/tests/integration/test_reload_client_certificate/configs_secure/conf.d/ssl_conf.xml b/tests/integration/test_reload_client_certificate/configs_secure/conf.d/ssl_conf.xml new file mode 100644 index 00000000000..d620bcee919 --- /dev/null +++ b/tests/integration/test_reload_client_certificate/configs_secure/conf.d/ssl_conf.xml @@ -0,0 +1,16 @@ + + + + /etc/clickhouse-server/config.d/first_client.crt + /etc/clickhouse-server/config.d/first_client.key + true + true + sslv2,sslv3 + true + none + + RejectCertificateHandler + + + + diff --git a/tests/integration/test_reload_client_certificate/configs_secure/first_client.crt b/tests/integration/test_reload_client_certificate/configs_secure/first_client.crt new file mode 100644 index 00000000000..7ade2d96273 --- /dev/null +++ b/tests/integration/test_reload_client_certificate/configs_secure/first_client.crt @@ -0,0 +1,19 @@ +-----BEGIN CERTIFICATE----- +MIIC/TCCAeWgAwIBAgIJANjx1QSR77HBMA0GCSqGSIb3DQEBCwUAMBQxEjAQBgNV +BAMMCWxvY2FsaG9zdDAgFw0xODA3MzAxODE2MDhaGA8yMjkyMDUxNDE4MTYwOFow +FDESMBAGA1UEAwwJbG9jYWxob3N0MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIB +CgKCAQEAs9uSo6lJG8o8pw0fbVGVu0tPOljSWcVSXH9uiJBwlZLQnhN4SFSFohfI +4K8U1tBDTnxPLUo/V1K9yzoLiRDGMkwVj6+4+hE2udS2ePTQv5oaMeJ9wrs+5c9T +4pOtlq3pLAdm04ZMB1nbrEysceVudHRkQbGHzHp6VG29Fw7Ga6YpqyHQihRmEkTU +7UCYNA+Vk7aDPdMS/khweyTpXYZimaK9f0ECU3/VOeG3fH6Sp2X6FN4tUj/aFXEj +sRmU5G2TlYiSIUMF2JPdhSihfk1hJVALrHPTU38SOL+GyyBRWdNcrIwVwbpvsvPg +pryMSNxnpr0AK0dFhjwnupIv5hJIOQIDAQABo1AwTjAdBgNVHQ4EFgQUjPLb3uYC +kcamyZHK4/EV8jAP0wQwHwYDVR0jBBgwFoAUjPLb3uYCkcamyZHK4/EV8jAP0wQw +DAYDVR0TBAUwAwEB/zANBgkqhkiG9w0BAQsFAAOCAQEAM/ocuDvfPus/KpMVD51j +4IdlU8R0vmnYLQ+ygzOAo7+hUWP5j0yvq4ILWNmQX6HNvUggCgFv9bjwDFhb/5Vr +85ieWfTd9+LTjrOzTw4avdGwpX9G+6jJJSSq15tw5ElOIFb/qNA9O4dBiu8vn03C +L/zRSXrARhSqTW5w/tZkUcSTT+M5h28+Lgn9ysx4Ff5vi44LJ1NnrbJbEAIYsAAD ++UA+4MBFKx1r6hHINULev8+lCfkpwIaeS8RL+op4fr6kQPxnULw8wT8gkuc8I4+L +P9gg/xDHB44T3ADGZ5Ib6O0DJaNiToO6rnoaaxs0KkotbvDWvRoxEytSbXKoYjYp +0g== +-----END CERTIFICATE----- diff --git a/tests/integration/test_reload_client_certificate/configs_secure/first_client.key b/tests/integration/test_reload_client_certificate/configs_secure/first_client.key new file mode 100644 index 00000000000..f0fb61ac443 --- /dev/null +++ b/tests/integration/test_reload_client_certificate/configs_secure/first_client.key @@ -0,0 +1,28 @@ +-----BEGIN PRIVATE KEY----- +MIIEvQIBADANBgkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQCz25KjqUkbyjyn +DR9tUZW7S086WNJZxVJcf26IkHCVktCeE3hIVIWiF8jgrxTW0ENOfE8tSj9XUr3L +OguJEMYyTBWPr7j6ETa51LZ49NC/mhox4n3Cuz7lz1Pik62WreksB2bThkwHWdus +TKxx5W50dGRBsYfMenpUbb0XDsZrpimrIdCKFGYSRNTtQJg0D5WTtoM90xL+SHB7 +JOldhmKZor1/QQJTf9U54bd8fpKnZfoU3i1SP9oVcSOxGZTkbZOViJIhQwXYk92F +KKF+TWElUAusc9NTfxI4v4bLIFFZ01ysjBXBum+y8+CmvIxI3GemvQArR0WGPCe6 +ki/mEkg5AgMBAAECggEATrbIBIxwDJOD2/BoUqWkDCY3dGevF8697vFuZKIiQ7PP +TX9j4vPq0DfsmDjHvAPFkTHiTQXzlroFik3LAp+uvhCCVzImmHq0IrwvZ9xtB43f +7Pkc5P6h1l3Ybo8HJ6zRIY3TuLtLxuPSuiOMTQSGRL0zq3SQ5DKuGwkz+kVjHXUN +MR2TECFwMHKQ5VLrC+7PMpsJYyOMlDAWhRfUalxC55xOXTpaN8TxNnwQ8K2ISVY5 
+212Jz/a4hn4LdwxSz3Tiu95PN072K87HLWx3EdT6vW4Ge5P/A3y+smIuNAlanMnu +plHBRtpATLiTxZt/n6npyrfQVbYjSH7KWhB8hBHtaQKBgQDh9Cq1c/KtqDtE0Ccr +/r9tZNTUwBE6VP+3OJeKdEdtsfuxjOCkS1oAjgBJiSDOiWPh1DdoDeVZjPKq6pIu +Mq12OE3Doa8znfCXGbkSzEKOb2unKZMJxzrz99kXt40W5DtrqKPNb24CNqTiY8Aa +CjtcX+3weat82VRXvph6U8ltMwKBgQDLxjiQQzNoY7qvg7CwJCjf9qq8jmLK766g +1FHXopqS+dTxDLM8eJSRrpmxGWJvNeNc1uPhsKsKgotqAMdBUQTf7rSTbt4MyoH5 +bUcRLtr+0QTK9hDWMOOvleqNXha68vATkohWYfCueNsC60qD44o8RZAS6UNy3ENq +cM1cxqe84wKBgQDKkHutWnooJtajlTxY27O/nZKT/HA1bDgniMuKaz4R4Gr1PIez +on3YW3V0d0P7BP6PWRIm7bY79vkiMtLEKdiKUGWeyZdo3eHvhDb/3DCawtau8L2K +GZsHVp2//mS1Lfz7Qh8/L/NedqCQ+L4iWiPnZ3THjjwn3CoZ05ucpvrAMwKBgB54 +nay039MUVq44Owub3KDg+dcIU62U+cAC/9oG7qZbxYPmKkc4oL7IJSNecGHA5SbU +2268RFdl/gLz6tfRjbEOuOHzCjFPdvAdbysanpTMHLNc6FefJ+zxtgk9sJh0C4Jh +vxFrw9nTKKzfEl12gQ1SOaEaUIO0fEBGbe8ZpauRAoGAMAlGV+2/K4ebvAJKOVTa +dKAzQ+TD2SJmeR1HZmKDYddNqwtZlzg3v4ZhCk4eaUmGeC1Bdh8MDuB3QQvXz4Dr +vOIP4UVaOr+uM+7TgAgVnP4/K6IeJGzUDhX93pmpWhODfdu/oojEKVcpCojmEmS1 +KCBtmIrQLqzMpnBpLNuSY+Q= +-----END PRIVATE KEY----- diff --git a/tests/integration/test_reload_client_certificate/configs_secure/second_client.crt b/tests/integration/test_reload_client_certificate/configs_secure/second_client.crt new file mode 100644 index 00000000000..ff62438af62 --- /dev/null +++ b/tests/integration/test_reload_client_certificate/configs_secure/second_client.crt @@ -0,0 +1,19 @@ +-----BEGIN CERTIFICATE----- +MIIDEDCCAfigAwIBAgIUEAdT/eB4tswNzGZg1V0rVP8WzJwwDQYJKoZIhvcNAQEL +BQAwGDEWMBQGA1UEAwwNbG9jYWxob3N0X25ldzAgFw0yNDEwMjQyMzE5MjJaGA8y +Mjk4MDgwOTIzMTkyMlowGDEWMBQGA1UEAwwNbG9jYWxob3N0X25ldzCCASIwDQYJ +KoZIhvcNAQEBBQADggEPADCCAQoCggEBALPbkqOpSRvKPKcNH21RlbtLTzpY0lnF +Ulx/boiQcJWS0J4TeEhUhaIXyOCvFNbQQ058Ty1KP1dSvcs6C4kQxjJMFY+vuPoR +NrnUtnj00L+aGjHifcK7PuXPU+KTrZat6SwHZtOGTAdZ26xMrHHlbnR0ZEGxh8x6 +elRtvRcOxmumKash0IoUZhJE1O1AmDQPlZO2gz3TEv5IcHsk6V2GYpmivX9BAlN/ +1Tnht3x+kqdl+hTeLVI/2hVxI7EZlORtk5WIkiFDBdiT3YUooX5NYSVQC6xz01N/ +Eji/hssgUVnTXKyMFcG6b7Lz4Ka8jEjcZ6a9ACtHRYY8J7qSL+YSSDkCAwEAAaNQ +ME4wHQYDVR0OBBYEFIzy297mApHGpsmRyuPxFfIwD9MEMB8GA1UdIwQYMBaAFIzy +297mApHGpsmRyuPxFfIwD9MEMAwGA1UdEwQFMAMBAf8wDQYJKoZIhvcNAQELBQAD +ggEBAD0z8mRBdk93+HxqJdW1qZBN2g+AUc/GUaTUa8oW9baHOOvdwUacfdVXpyDo +ffdeTKfdQNs7JYMP5tWupHCrvAGK3sIzPMt7Yr06tBD720IIyPTR3J7A5RmpQNKm +2RCqfO49Pg6U8kx+bDBKNjdCGWowt31cZTlJNXk7NPewtWaGYhuskbvH8gJDtbMd +d9fOepIbzl3u+us8JHFVglBRgjy9sYjUYUT9mnTzfbpebmkdtiicJZNP1j08VZFR +lXoHiESasyzlP8DLI/PQcpL6Lh8KnIifKGEkvXVaryPT2wlEo6Kti2cY8AIJKQgl +0U1jwiNcCwjYoKIXjunOO8T8mKg= +-----END CERTIFICATE----- \ No newline at end of file diff --git a/tests/integration/test_reload_client_certificate/configs_secure/second_client.key b/tests/integration/test_reload_client_certificate/configs_secure/second_client.key new file mode 100644 index 00000000000..f0fb61ac443 --- /dev/null +++ b/tests/integration/test_reload_client_certificate/configs_secure/second_client.key @@ -0,0 +1,28 @@ +-----BEGIN PRIVATE KEY----- +MIIEvQIBADANBgkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQCz25KjqUkbyjyn +DR9tUZW7S086WNJZxVJcf26IkHCVktCeE3hIVIWiF8jgrxTW0ENOfE8tSj9XUr3L +OguJEMYyTBWPr7j6ETa51LZ49NC/mhox4n3Cuz7lz1Pik62WreksB2bThkwHWdus +TKxx5W50dGRBsYfMenpUbb0XDsZrpimrIdCKFGYSRNTtQJg0D5WTtoM90xL+SHB7 +JOldhmKZor1/QQJTf9U54bd8fpKnZfoU3i1SP9oVcSOxGZTkbZOViJIhQwXYk92F +KKF+TWElUAusc9NTfxI4v4bLIFFZ01ysjBXBum+y8+CmvIxI3GemvQArR0WGPCe6 +ki/mEkg5AgMBAAECggEATrbIBIxwDJOD2/BoUqWkDCY3dGevF8697vFuZKIiQ7PP +TX9j4vPq0DfsmDjHvAPFkTHiTQXzlroFik3LAp+uvhCCVzImmHq0IrwvZ9xtB43f +7Pkc5P6h1l3Ybo8HJ6zRIY3TuLtLxuPSuiOMTQSGRL0zq3SQ5DKuGwkz+kVjHXUN 
+MR2TECFwMHKQ5VLrC+7PMpsJYyOMlDAWhRfUalxC55xOXTpaN8TxNnwQ8K2ISVY5 +212Jz/a4hn4LdwxSz3Tiu95PN072K87HLWx3EdT6vW4Ge5P/A3y+smIuNAlanMnu +plHBRtpATLiTxZt/n6npyrfQVbYjSH7KWhB8hBHtaQKBgQDh9Cq1c/KtqDtE0Ccr +/r9tZNTUwBE6VP+3OJeKdEdtsfuxjOCkS1oAjgBJiSDOiWPh1DdoDeVZjPKq6pIu +Mq12OE3Doa8znfCXGbkSzEKOb2unKZMJxzrz99kXt40W5DtrqKPNb24CNqTiY8Aa +CjtcX+3weat82VRXvph6U8ltMwKBgQDLxjiQQzNoY7qvg7CwJCjf9qq8jmLK766g +1FHXopqS+dTxDLM8eJSRrpmxGWJvNeNc1uPhsKsKgotqAMdBUQTf7rSTbt4MyoH5 +bUcRLtr+0QTK9hDWMOOvleqNXha68vATkohWYfCueNsC60qD44o8RZAS6UNy3ENq +cM1cxqe84wKBgQDKkHutWnooJtajlTxY27O/nZKT/HA1bDgniMuKaz4R4Gr1PIez +on3YW3V0d0P7BP6PWRIm7bY79vkiMtLEKdiKUGWeyZdo3eHvhDb/3DCawtau8L2K +GZsHVp2//mS1Lfz7Qh8/L/NedqCQ+L4iWiPnZ3THjjwn3CoZ05ucpvrAMwKBgB54 +nay039MUVq44Owub3KDg+dcIU62U+cAC/9oG7qZbxYPmKkc4oL7IJSNecGHA5SbU +2268RFdl/gLz6tfRjbEOuOHzCjFPdvAdbysanpTMHLNc6FefJ+zxtgk9sJh0C4Jh +vxFrw9nTKKzfEl12gQ1SOaEaUIO0fEBGbe8ZpauRAoGAMAlGV+2/K4ebvAJKOVTa +dKAzQ+TD2SJmeR1HZmKDYddNqwtZlzg3v4ZhCk4eaUmGeC1Bdh8MDuB3QQvXz4Dr +vOIP4UVaOr+uM+7TgAgVnP4/K6IeJGzUDhX93pmpWhODfdu/oojEKVcpCojmEmS1 +KCBtmIrQLqzMpnBpLNuSY+Q= +-----END PRIVATE KEY----- diff --git a/tests/integration/test_reload_client_certificate/configs_secure/third_client.crt b/tests/integration/test_reload_client_certificate/configs_secure/third_client.crt new file mode 100644 index 00000000000..4efb8f1b7b9 --- /dev/null +++ b/tests/integration/test_reload_client_certificate/configs_secure/third_client.crt @@ -0,0 +1,19 @@ +-----BEGIN CERTIFICATE----- +MIIDCDCCAfCgAwIBAgIUC749qXQA+HcnMauXvrmGf+Yz7KswDQYJKoZIhvcNAQEL +BQAwFDESMBAGA1UEAwwJbG9jYWxob3N0MCAXDTI0MTAyNTA4NDg1N1oYDzIyOTgw +ODEwMDg0ODU3WjAUMRIwEAYDVQQDDAlsb2NhbGhvc3QwggEiMA0GCSqGSIb3DQEB +AQUAA4IBDwAwggEKAoIBAQCz25KjqUkbyjynDR9tUZW7S086WNJZxVJcf26IkHCV +ktCeE3hIVIWiF8jgrxTW0ENOfE8tSj9XUr3LOguJEMYyTBWPr7j6ETa51LZ49NC/ +mhox4n3Cuz7lz1Pik62WreksB2bThkwHWdusTKxx5W50dGRBsYfMenpUbb0XDsZr +pimrIdCKFGYSRNTtQJg0D5WTtoM90xL+SHB7JOldhmKZor1/QQJTf9U54bd8fpKn +ZfoU3i1SP9oVcSOxGZTkbZOViJIhQwXYk92FKKF+TWElUAusc9NTfxI4v4bLIFFZ +01ysjBXBum+y8+CmvIxI3GemvQArR0WGPCe6ki/mEkg5AgMBAAGjUDBOMB0GA1Ud +DgQWBBSM8tve5gKRxqbJkcrj8RXyMA/TBDAfBgNVHSMEGDAWgBSM8tve5gKRxqbJ +kcrj8RXyMA/TBDAMBgNVHRMEBTADAQH/MA0GCSqGSIb3DQEBCwUAA4IBAQB/QYNd +q8ub45u2tsCEr8xgON4CB2UGZD5RazY//W6kPWmLBf8fZjepF7yLjEWP6iQHWVWk +vIVmVsAnIyfOruUYQmxR4N770Tlit9PH7OqNtRzXHGV2el3Rp62mg8NneOx4SHX+ +HITyPF3Wcg7YyWCuwwGXXS2hZ20csQXZima1jVyTNRN0GDvp0xjX+o7gyANGxbxa +EnjXTc4IWbLJ/+k4I38suavXg8RToHt+1Ndp0sHoT7Fxj+mbxOcc3QVtYU/Ct1W7 +cirraodxjWkYX63zDeqteXU8JtNdJE43qFK4BVh3QTj7PhD3PFEAKcPbnJLbdTYC +ZU36rm75uOSdLXNB +-----END CERTIFICATE----- diff --git a/tests/integration/test_reload_client_certificate/configs_secure/third_client.key b/tests/integration/test_reload_client_certificate/configs_secure/third_client.key new file mode 100644 index 00000000000..f0fb61ac443 --- /dev/null +++ b/tests/integration/test_reload_client_certificate/configs_secure/third_client.key @@ -0,0 +1,28 @@ +-----BEGIN PRIVATE KEY----- +MIIEvQIBADANBgkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQCz25KjqUkbyjyn +DR9tUZW7S086WNJZxVJcf26IkHCVktCeE3hIVIWiF8jgrxTW0ENOfE8tSj9XUr3L +OguJEMYyTBWPr7j6ETa51LZ49NC/mhox4n3Cuz7lz1Pik62WreksB2bThkwHWdus +TKxx5W50dGRBsYfMenpUbb0XDsZrpimrIdCKFGYSRNTtQJg0D5WTtoM90xL+SHB7 +JOldhmKZor1/QQJTf9U54bd8fpKnZfoU3i1SP9oVcSOxGZTkbZOViJIhQwXYk92F +KKF+TWElUAusc9NTfxI4v4bLIFFZ01ysjBXBum+y8+CmvIxI3GemvQArR0WGPCe6 +ki/mEkg5AgMBAAECggEATrbIBIxwDJOD2/BoUqWkDCY3dGevF8697vFuZKIiQ7PP +TX9j4vPq0DfsmDjHvAPFkTHiTQXzlroFik3LAp+uvhCCVzImmHq0IrwvZ9xtB43f 
+7Pkc5P6h1l3Ybo8HJ6zRIY3TuLtLxuPSuiOMTQSGRL0zq3SQ5DKuGwkz+kVjHXUN +MR2TECFwMHKQ5VLrC+7PMpsJYyOMlDAWhRfUalxC55xOXTpaN8TxNnwQ8K2ISVY5 +212Jz/a4hn4LdwxSz3Tiu95PN072K87HLWx3EdT6vW4Ge5P/A3y+smIuNAlanMnu +plHBRtpATLiTxZt/n6npyrfQVbYjSH7KWhB8hBHtaQKBgQDh9Cq1c/KtqDtE0Ccr +/r9tZNTUwBE6VP+3OJeKdEdtsfuxjOCkS1oAjgBJiSDOiWPh1DdoDeVZjPKq6pIu +Mq12OE3Doa8znfCXGbkSzEKOb2unKZMJxzrz99kXt40W5DtrqKPNb24CNqTiY8Aa +CjtcX+3weat82VRXvph6U8ltMwKBgQDLxjiQQzNoY7qvg7CwJCjf9qq8jmLK766g +1FHXopqS+dTxDLM8eJSRrpmxGWJvNeNc1uPhsKsKgotqAMdBUQTf7rSTbt4MyoH5 +bUcRLtr+0QTK9hDWMOOvleqNXha68vATkohWYfCueNsC60qD44o8RZAS6UNy3ENq +cM1cxqe84wKBgQDKkHutWnooJtajlTxY27O/nZKT/HA1bDgniMuKaz4R4Gr1PIez +on3YW3V0d0P7BP6PWRIm7bY79vkiMtLEKdiKUGWeyZdo3eHvhDb/3DCawtau8L2K +GZsHVp2//mS1Lfz7Qh8/L/NedqCQ+L4iWiPnZ3THjjwn3CoZ05ucpvrAMwKBgB54 +nay039MUVq44Owub3KDg+dcIU62U+cAC/9oG7qZbxYPmKkc4oL7IJSNecGHA5SbU +2268RFdl/gLz6tfRjbEOuOHzCjFPdvAdbysanpTMHLNc6FefJ+zxtgk9sJh0C4Jh +vxFrw9nTKKzfEl12gQ1SOaEaUIO0fEBGbe8ZpauRAoGAMAlGV+2/K4ebvAJKOVTa +dKAzQ+TD2SJmeR1HZmKDYddNqwtZlzg3v4ZhCk4eaUmGeC1Bdh8MDuB3QQvXz4Dr +vOIP4UVaOr+uM+7TgAgVnP4/K6IeJGzUDhX93pmpWhODfdu/oojEKVcpCojmEmS1 +KCBtmIrQLqzMpnBpLNuSY+Q= +-----END PRIVATE KEY----- diff --git a/tests/integration/test_reload_client_certificate/test.py b/tests/integration/test_reload_client_certificate/test.py new file mode 100644 index 00000000000..cb091d92ea6 --- /dev/null +++ b/tests/integration/test_reload_client_certificate/test.py @@ -0,0 +1,197 @@ +import os +import threading +import time + +import pytest + +from helpers.cluster import ClickHouseCluster + +TEST_DIR = os.path.dirname(__file__) + +cluster = ClickHouseCluster( + __file__, + zookeeper_certfile=os.path.join(TEST_DIR, "configs_secure", "first_client.crt"), + zookeeper_keyfile=os.path.join(TEST_DIR, "configs_secure", "first_client.key"), +) + +node1 = cluster.add_instance( + "node1", + main_configs=[ + "configs_secure/first_client.crt", + "configs_secure/first_client.key", + "configs_secure/second_client.crt", + "configs_secure/second_client.key", + "configs_secure/third_client.crt", + "configs_secure/third_client.key", + "configs_secure/conf.d/remote_servers.xml", + "configs_secure/conf.d/ssl_conf.xml", + "configs/zookeeper_config_with_ssl.xml", + ], + with_zookeeper_secure=True, +) + +node2 = cluster.add_instance( + "node2", + main_configs=[ + "configs_secure/first_client.crt", + "configs_secure/first_client.key", + "configs_secure/second_client.crt", + "configs_secure/second_client.key", + "configs_secure/third_client.crt", + "configs_secure/third_client.key", + "configs_secure/conf.d/remote_servers.xml", + "configs_secure/conf.d/ssl_conf.xml", + "configs/zookeeper_config_with_ssl.xml", + ], + with_zookeeper_secure=True, +) + +nodes = [node1, node2] + + +@pytest.fixture(scope="module", autouse=True) +def started_cluster(): + try: + cluster.start() + yield cluster + finally: + cluster.shutdown() + + +def secure_connection_test(started_cluster): + # No asserts, connection works + + node1.query("SELECT count() FROM system.zookeeper WHERE path = '/'") + node2.query("SELECT count() FROM system.zookeeper WHERE path = '/'") + + threads_number = 4 + iterations = 4 + threads = [] + + # Just checking for race conditions + + for _ in range(threads_number): + threads.append( + threading.Thread( + target=lambda: [ + node1.query("SELECT count() FROM system.zookeeper WHERE path = '/'") + for _ in range(iterations) + ] + ) + ) + for thread in threads: + thread.start() + for thread in threads: + thread.join() + + +def change_config_to_key(name): + """ + Generate 
config with certificate/key name from args. + Reload config. + """ + for node in nodes: + node.exec_in_container( + [ + "bash", + "-c", + """cat > /etc/clickhouse-server/config.d/ssl_conf.xml << EOF + + + + /etc/clickhouse-server/config.d/{cur_name}_client.crt + /etc/clickhouse-server/config.d/{cur_name}_client.key + true + true + sslv2,sslv3 + true + none + + RejectCertificateHandler + + + + +EOF""".format( + cur_name=name + ), + ] + ) + + node.exec_in_container( + ["bash", "-c", "touch /etc/clickhouse-server/config.d/ssl_conf.xml"], + ) + + +def check_reload_successful(node, cert_name): + return node.grep_in_log( + f"Reloaded certificate (/etc/clickhouse-server/config.d/{cert_name}_client.crt)" + ) + + +def check_error_handshake(node): + return node.count_in_log("Code: 210.") + + +def clean_logs(): + for node in nodes: + node.exec_in_container( + [ + "bash", + "-c", + "echo -n > /var/log/clickhouse-server/clickhouse-server.log", + ] + ) + + +def check_certificate_switch(first, second): + # Set first certificate + + change_config_to_key(first) + + # Restart zookeeper to reload the session + + cluster.stop_zookeeper_nodes(["zoo1", "zoo2", "zoo3"]) + cluster.start_zookeeper_nodes(["zoo1", "zoo2", "zoo3"]) + cluster.wait_zookeeper_nodes_to_start(["zoo1", "zoo2", "zoo3"]) + clean_logs() + + # Change certificate + + change_config_to_key(second) + + # Time to log + + time.sleep(10) + + # Check information about client certificates reloading in log Clickhouse + + reload_successful = any(check_reload_successful(node, second) for node in nodes) + + # Restart zookeeper to reload the session and clean logs for new check + + cluster.stop_zookeeper_nodes(["zoo1", "zoo2", "zoo3"]) + cluster.start_zookeeper_nodes(["zoo1", "zoo2", "zoo3"]) + clean_logs() + cluster.wait_zookeeper_nodes_to_start(["zoo1", "zoo2", "zoo3"]) + + if second == "second": + try: + secure_connection_test(started_cluster) + assert False + except: + assert True + else: + secure_connection_test(started_cluster) + error_handshake = any(check_error_handshake(node) == "0\n" for node in nodes) + assert reload_successful and error_handshake + + +def test_wrong_cn_cert(): + """Checking the certificate reload with an incorrect CN, the expected behavior is Code: 210.""" + check_certificate_switch("first", "second") + + +def test_correct_cn_cert(): + """Replacement with a valid certificate, the expected behavior is to restore the connection with Zookeeper.""" + check_certificate_switch("second", "third") diff --git a/tests/integration/test_s3_access_headers/__init__.py b/tests/integration/test_s3_access_headers/__init__.py new file mode 100644 index 00000000000..e69de29bb2d diff --git a/tests/integration/test_s3_access_headers/configs/config.d/named_collections.xml b/tests/integration/test_s3_access_headers/configs/config.d/named_collections.xml new file mode 100644 index 00000000000..d08d3401778 --- /dev/null +++ b/tests/integration/test_s3_access_headers/configs/config.d/named_collections.xml @@ -0,0 +1,9 @@ + + + + http://resolver:8081/root/test_named_colections.csv + minio + minio123 + + + diff --git a/tests/integration/test_s3_access_headers/configs/config.d/s3_headers.xml b/tests/integration/test_s3_access_headers/configs/config.d/s3_headers.xml new file mode 100644 index 00000000000..2d2eeb3c7b1 --- /dev/null +++ b/tests/integration/test_s3_access_headers/configs/config.d/s3_headers.xml @@ -0,0 +1,8 @@ + + + + http://resolver:8081/ + custom-auth-token: ValidToken1234 + + + diff --git 
a/tests/integration/test_s3_access_headers/configs/users.d/users.xml b/tests/integration/test_s3_access_headers/configs/users.d/users.xml new file mode 100644 index 00000000000..4b6ba057ecb --- /dev/null +++ b/tests/integration/test_s3_access_headers/configs/users.d/users.xml @@ -0,0 +1,9 @@ + + + + + default + 1 + + + diff --git a/tests/integration/test_s3_access_headers/s3_mocks/mocker_s3.py b/tests/integration/test_s3_access_headers/s3_mocks/mocker_s3.py new file mode 100644 index 00000000000..0bbcb2e60e8 --- /dev/null +++ b/tests/integration/test_s3_access_headers/s3_mocks/mocker_s3.py @@ -0,0 +1,97 @@ +import http.client +import http.server +import random +import socketserver +import sys +import urllib.parse + +UPSTREAM_HOST = "minio1:9001" +random.seed("No list objects/1.0") + + +def request(command, url, headers={}, data=None): + """Mini-requests.""" + + class Dummy: + pass + + parts = urllib.parse.urlparse(url) + c = http.client.HTTPConnection(parts.hostname, parts.port) + c.request( + command, + urllib.parse.urlunparse(parts._replace(scheme="", netloc="")), + headers=headers, + body=data, + ) + r = c.getresponse() + result = Dummy() + result.status_code = r.status + result.headers = r.headers + result.content = r.read() + return result + + +CUSTOM_AUTH_TOKEN_HEADER = "custom-auth-token" +CUSTOM_AUTH_TOKEN_VALID_VALUE = "ValidToken1234" + + +class RequestHandler(http.server.BaseHTTPRequestHandler): + def do_GET(self): + if self.path == "/": + self.send_response(200) + self.send_header("Content-Type", "text/plain") + self.end_headers() + self.wfile.write(b"OK") + return + self.do_HEAD() + + def do_PUT(self): + self.do_HEAD() + + def do_DELETE(self): + self.do_HEAD() + + def do_POST(self): + self.do_HEAD() + + def do_HEAD(self): + + custom_auth_token = self.headers.get(CUSTOM_AUTH_TOKEN_HEADER) + if custom_auth_token and custom_auth_token != CUSTOM_AUTH_TOKEN_VALID_VALUE: + self.send_response(403) + self.send_header("Content-Type", "application/xml") + self.end_headers() + + body = f""" + + AccessDenied + Access Denied. Custom token was {custom_auth_token}, the correct one: {CUSTOM_AUTH_TOKEN_VALID_VALUE}. 
+ RESOURCE + REQUEST_ID + +""" + self.wfile.write(body.encode()) + return + + content_length = self.headers.get("Content-Length") + data = self.rfile.read(int(content_length)) if content_length else None + r = request( + self.command, + f"http://{UPSTREAM_HOST}{self.path}", + headers=self.headers, + data=data, + ) + self.send_response(r.status_code) + for k, v in r.headers.items(): + self.send_header(k, v) + self.end_headers() + self.wfile.write(r.content) + self.wfile.close() + + +class ThreadedHTTPServer(socketserver.ThreadingMixIn, http.server.HTTPServer): + """Handle requests in a separate thread.""" + + +httpd = ThreadedHTTPServer(("0.0.0.0", int(sys.argv[1])), RequestHandler) +httpd.serve_forever() diff --git a/tests/integration/test_s3_access_headers/test.py b/tests/integration/test_s3_access_headers/test.py new file mode 100644 index 00000000000..4d4a5b81230 --- /dev/null +++ b/tests/integration/test_s3_access_headers/test.py @@ -0,0 +1,124 @@ +import logging +import os + +import pytest + +from helpers.cluster import ClickHouseCluster +from helpers.mock_servers import start_mock_servers +from helpers.s3_tools import prepare_s3_bucket + +SCRIPT_DIR = os.path.dirname(os.path.realpath(__file__)) + + +def run_s3_mocks(started_cluster): + script_dir = os.path.join(os.path.dirname(__file__), "s3_mocks") + start_mock_servers( + started_cluster, + script_dir, + [ + ("mocker_s3.py", "resolver", "8081"), + ], + ) + + +@pytest.fixture(scope="module") +def started_cluster(): + cluster = ClickHouseCluster(__file__, with_spark=True) + try: + cluster.add_instance( + "node1", + main_configs=[ + "configs/config.d/named_collections.xml", + "configs/config.d/s3_headers.xml", + ], + user_configs=["configs/users.d/users.xml"], + with_minio=True, + ) + + logging.info("Starting cluster...") + cluster.start() + + prepare_s3_bucket(cluster) + logging.info("S3 bucket created") + + run_s3_mocks(cluster) + yield cluster + + finally: + cluster.shutdown() + + +CUSTOM_AUTH_TOKEN = "custom-auth-token" +CORRECT_TOKEN = "ValidToken1234" +INCORRECT_TOKEN = "InvalidToken1234" + + +@pytest.mark.parametrize( + "table_name, engine, query_with_invalid_token_must_fail", + [ + pytest.param( + "test_access_header", + "S3('http://resolver:8081/root/test_access_header.csv', 'CSV')", + True, + id="test_access_over_custom_header", + ), + pytest.param( + "test_static_override", + "S3('http://resolver:8081/root/test_static_override.csv', 'minio', 'minio123', 'CSV')", + False, + id="test_access_key_id_overrides_access_header", + ), + pytest.param( + "test_named_colections", + "S3(s3_mock, format='CSV')", + False, + id="test_named_coll_overrides_access_header", + ), + ], +) +def test_custom_access_header( + started_cluster, table_name, engine, query_with_invalid_token_must_fail +): + instance = started_cluster.instances["node1"] + + instance.query( + f""" + SET s3_truncate_on_insert=1; + INSERT INTO FUNCTION s3('http://minio1:9001/root/{table_name}.csv', 'minio', 'minio123','CSV') + SELECT number as a, toString(number) as b FROM numbers(3); + """ + ) + instance.query( + f""" + DROP TABLE IF EXISTS {table_name}; + CREATE TABLE {table_name} (name String, value UInt32) + ENGINE={engine}; + """ + ) + instance.query("SYSTEM DROP QUERY CACHE") + + assert instance.query(f"SELECT count(*) FROM {table_name}") == "3\n" + + config_path = "/etc/clickhouse-server/config.d/s3_headers.xml" + + instance.replace_in_config( + config_path, + f"{CUSTOM_AUTH_TOKEN}: {CORRECT_TOKEN}", + f"{CUSTOM_AUTH_TOKEN}: {INCORRECT_TOKEN}", + ) + 
instance.query("SYSTEM RELOAD CONFIG") + + if query_with_invalid_token_must_fail: + instance.query_and_get_error(f"SELECT count(*) FROM {table_name}") + + else: + assert instance.query(f"SELECT count(*) FROM {table_name}") == "3\n" + + instance.replace_in_config( + config_path, + f"{CUSTOM_AUTH_TOKEN}: {INCORRECT_TOKEN}", + f"{CUSTOM_AUTH_TOKEN}: {CORRECT_TOKEN}", + ) + + instance.query("SYSTEM RELOAD CONFIG") + assert instance.query(f"SELECT count(*) FROM {table_name}") == "3\n" diff --git a/tests/integration/test_storage_azure_blob_storage/test_check_after_upload.py b/tests/integration/test_storage_azure_blob_storage/test_check_after_upload.py new file mode 100644 index 00000000000..c647c96809d --- /dev/null +++ b/tests/integration/test_storage_azure_blob_storage/test_check_after_upload.py @@ -0,0 +1,129 @@ +import logging +import os + +import pytest + +from helpers.cluster import ClickHouseCluster +from test_storage_azure_blob_storage.test import azure_query + +SCRIPT_DIR = os.path.dirname(os.path.realpath(__file__)) +NODE_NAME = "node" +TABLE_NAME = "blob_storage_table" +AZURE_BLOB_STORAGE_DISK = "blob_storage_disk" +LOCAL_DISK = "hdd" +CONTAINER_NAME = "cont" + + +def generate_cluster_def(port): + path = os.path.join( + os.path.dirname(os.path.realpath(__file__)), + "./_gen/disk_storage_conf.xml", + ) + os.makedirs(os.path.dirname(path), exist_ok=True) + with open(path, "w") as f: + f.write( + f""" + + + + azure_blob_storage + http://azurite1:{port}/devstoreaccount1 + cont + false + devstoreaccount1 + Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw== + 100000 + 100000 + 10 + 10 + true + + + local + / + + + + + +
+                        <disk>blob_storage_disk</disk>
+                    </main>
+                    <external>
+                        <disk>hdd</disk>
+                    </external>
+                </volumes>
+            </blob_storage_policy>
+        </policies>
+    </storage_configuration>
+</clickhouse>
+""" + ) + return path + + +@pytest.fixture(scope="module") +def cluster(): + try: + cluster = ClickHouseCluster(__file__) + port = cluster.azurite_port + path = generate_cluster_def(port) + cluster.add_instance( + NODE_NAME, + main_configs=[ + path, + ], + with_azurite=True, + ) + logging.info("Starting cluster...") + cluster.start() + logging.info("Cluster started") + + yield cluster + finally: + cluster.shutdown() + + +# Note: use azure_query for selects and inserts and create table queries. +# For inserts there is no guarantee that retries will not result in duplicates. +# But it is better to retry anyway because connection related errors +# happens in fact only for inserts because reads already have build-in retries in code. + + +def create_table(node, table_name, **additional_settings): + settings = { + "storage_policy": "blob_storage_policy", + "old_parts_lifetime": 1, + "index_granularity": 512, + "temporary_directories_lifetime": 1, + } + settings.update(additional_settings) + + create_table_statement = f""" + CREATE TABLE {table_name} ( + dt Date, + id Int64, + data String, + INDEX min_max (id) TYPE minmax GRANULARITY 3 + ) ENGINE=MergeTree() + PARTITION BY dt + ORDER BY (dt, id) + SETTINGS {",".join((k+"="+repr(v) for k, v in settings.items()))}""" + + azure_query(node, f"DROP TABLE IF EXISTS {table_name}") + azure_query(node, create_table_statement) + assert ( + azure_query(node, f"SELECT COUNT(*) FROM {table_name} FORMAT Values") == "(0)" + ) + + +def test_simple(cluster): + node = cluster.instances[NODE_NAME] + create_table(node, TABLE_NAME) + + values = "('2021-11-13',3,'hello')" + azure_query(node, f"INSERT INTO {TABLE_NAME} VALUES {values}") + assert ( + azure_query(node, f"SELECT dt, id, data FROM {TABLE_NAME} FORMAT Values") + == values + ) diff --git a/tests/integration/test_storage_hdfs/test.py b/tests/integration/test_storage_hdfs/test.py index 362ea7d5bda..366bc28d2c9 100644 --- a/tests/integration/test_storage_hdfs/test.py +++ b/tests/integration/test_storage_hdfs/test.py @@ -396,6 +396,21 @@ def test_read_files_with_spaces(started_cluster): node1.query(f"drop table test") +def test_write_files_with_spaces(started_cluster): + fs = HdfsClient(hosts=started_cluster.hdfs_ip) + dir = "/itime=2024-10-24 10%3A02%3A04" + fs.mkdirs(dir) + + node1.query( + f"insert into function hdfs('hdfs://hdfs1:9000{dir}/test.csv', TSVRaw) select 123 settings hdfs_truncate_on_insert=1" + ) + result = node1.query( + f"select * from hdfs('hdfs://hdfs1:9000{dir}/test.csv', TSVRaw)" + ) + assert int(result) == 123 + fs.delete(dir, recursive=True) + + def test_truncate_table(started_cluster): hdfs_api = started_cluster.hdfs_api node1.query( diff --git a/tests/integration/test_storage_kafka/test.py b/tests/integration/test_storage_kafka/test.py index 0bade55415f..336ca824a2d 100644 --- a/tests/integration/test_storage_kafka/test.py +++ b/tests/integration/test_storage_kafka/test.py @@ -4193,7 +4193,7 @@ def test_kafka_formats_with_broken_message(kafka_cluster, create_query_generator ], "expected": { "raw_message": "050102696405496E743634000000000000000007626C6F636B4E6F06537472696E67034241440476616C3106537472696E6702414D0476616C3207466C6F617433320000003F0476616C330555496E743801", - "error": "Cannot convert: String to UInt16", + "error": "Cannot parse string 'BAD' as UInt16", }, "printable": False, }, diff --git a/tests/integration/test_storage_s3/test.py b/tests/integration/test_storage_s3/test.py index ad1842f4509..d8326711d84 100644 --- a/tests/integration/test_storage_s3/test.py +++ 
b/tests/integration/test_storage_s3/test.py @@ -1592,7 +1592,7 @@ def test_parallel_reading_with_memory_limit(started_cluster): f"select * from url('http://{started_cluster.minio_host}:{started_cluster.minio_port}/{bucket}/test_memory_limit.native') settings max_memory_usage=1000" ) - assert "Memory limit (for query) exceeded" in result + assert "Query memory limit exceeded" in result time.sleep(5) diff --git a/tests/integration/test_user_valid_until/test.py b/tests/integration/test_user_valid_until/test.py index eea05af9e45..828432f223e 100644 --- a/tests/integration/test_user_valid_until/test.py +++ b/tests/integration/test_user_valid_until/test.py @@ -76,6 +76,18 @@ def test_basic(started_cluster): node.query("DROP USER IF EXISTS user_basic") + # NOT IDENTIFIED test to make sure valid until is also parsed on its short-circuit + node.query("CREATE USER user_basic NOT IDENTIFIED VALID UNTIL '01/01/2010'") + + assert ( + node.query("SHOW CREATE USER user_basic") + == "CREATE USER user_basic IDENTIFIED WITH no_password VALID UNTIL \\'2010-01-01 00:00:00\\'\n" + ) + + assert error in node.query_and_get_error("SELECT 1", user="user_basic") + + node.query("DROP USER IF EXISTS user_basic") + def test_details(started_cluster): node.query("DROP USER IF EXISTS user_details_infinity, user_details_time_only") @@ -124,3 +136,51 @@ def test_restart(started_cluster): assert error in node.query_and_get_error("SELECT 1", user="user_restart") node.query("DROP USER IF EXISTS user_restart") + + +def test_multiple_authentication_methods(started_cluster): + node.query("DROP USER IF EXISTS user_basic") + + node.query( + "CREATE USER user_basic IDENTIFIED WITH plaintext_password BY 'no_expiration'," + "plaintext_password by 'not_expired' VALID UNTIL '06/11/2040', plaintext_password by 'expired' VALID UNTIL '06/11/2010'," + "plaintext_password by 'infinity' VALID UNTIL 'infinity'" + ) + + assert ( + node.query("SHOW CREATE USER user_basic") + == "CREATE USER user_basic IDENTIFIED WITH plaintext_password, plaintext_password VALID UNTIL \\'2040-11-06 00:00:00\\', " + "plaintext_password VALID UNTIL \\'2010-11-06 00:00:00\\', plaintext_password\n" + ) + assert node.query("SELECT 1", user="user_basic", password="no_expiration") == "1\n" + assert node.query("SELECT 1", user="user_basic", password="not_expired") == "1\n" + assert node.query("SELECT 1", user="user_basic", password="infinity") == "1\n" + + error = "Authentication failed" + assert error in node.query_and_get_error( + "SELECT 1", user="user_basic", password="expired" + ) + + # Expire them all + node.query("ALTER USER user_basic VALID UNTIL '06/11/2010 08:03:20'") + + assert ( + node.query("SHOW CREATE USER user_basic") + == "CREATE USER user_basic IDENTIFIED WITH plaintext_password VALID UNTIL \\'2010-11-06 08:03:20\\'," + " plaintext_password VALID UNTIL \\'2010-11-06 08:03:20\\'," + " plaintext_password VALID UNTIL \\'2010-11-06 08:03:20\\'," + " plaintext_password VALID UNTIL \\'2010-11-06 08:03:20\\'\n" + ) + + assert error in node.query_and_get_error( + "SELECT 1", user="user_basic", password="no_expiration" + ) + assert error in node.query_and_get_error( + "SELECT 1", user="user_basic", password="not_expired" + ) + assert error in node.query_and_get_error( + "SELECT 1", user="user_basic", password="infinity" + ) + assert error in node.query_and_get_error( + "SELECT 1", user="user_basic", password="expired" + ) diff --git a/tests/queries/0_stateless/01221_system_settings.reference b/tests/queries/0_stateless/01221_system_settings.reference index 
32a0ed11b6c..821d2e386a9 100644 --- a/tests/queries/0_stateless/01221_system_settings.reference +++ b/tests/queries/0_stateless/01221_system_settings.reference @@ -1,4 +1,4 @@ -send_timeout 300 0 Timeout for sending data to the network, in seconds. If a client needs to send some data but is not able to send any bytes in this interval, the exception is thrown. If you set this setting on the client, the \'receive_timeout\' for the socket will also be set on the corresponding connection end on the server. \N \N 0 Seconds 300 0 -storage_policy default 0 Name of storage disk policy \N \N 0 String 0 +send_timeout 300 0 Timeout for sending data to the network, in seconds. If a client needs to send some data but is not able to send any bytes in this interval, the exception is thrown. If you set this setting on the client, the \'receive_timeout\' for the socket will also be set on the corresponding connection end on the server. \N \N 0 Seconds 300 0 Production +storage_policy default 0 Name of storage disk policy \N \N 0 String 0 Production 1 1 diff --git a/tests/queries/0_stateless/01301_aggregate_state_exception_memory_leak.reference b/tests/queries/0_stateless/01301_aggregate_state_exception_memory_leak.reference index 6282bf366d0..76c31901df7 100644 --- a/tests/queries/0_stateless/01301_aggregate_state_exception_memory_leak.reference +++ b/tests/queries/0_stateless/01301_aggregate_state_exception_memory_leak.reference @@ -1,2 +1,2 @@ -Memory limit exceeded +Query memory limit exceeded Ok diff --git a/tests/queries/0_stateless/01301_aggregate_state_exception_memory_leak.sh b/tests/queries/0_stateless/01301_aggregate_state_exception_memory_leak.sh index 5b7cba77432..ceb7b60be0f 100755 --- a/tests/queries/0_stateless/01301_aggregate_state_exception_memory_leak.sh +++ b/tests/queries/0_stateless/01301_aggregate_state_exception_memory_leak.sh @@ -16,5 +16,5 @@ for _ in {1..1000}; do if [[ $elapsed -gt 30 ]]; then break fi -done 2>&1 | grep -o -P 'Memory limit .+ exceeded' | sed -r -e 's/(Memory limit)(.+)( exceeded)/\1\3/' | uniq +done 2>&1 | grep -o -P 'Query memory limit exceeded' | sed -r -e 's/(.*):([a-Z ]*)([mM]emory limit exceeded)(.*)/\2\3/' | uniq echo 'Ok' diff --git a/tests/queries/0_stateless/01383_log_broken_table.sh b/tests/queries/0_stateless/01383_log_broken_table.sh index 997daf1bf2f..d3c5a2e9aad 100755 --- a/tests/queries/0_stateless/01383_log_broken_table.sh +++ b/tests/queries/0_stateless/01383_log_broken_table.sh @@ -24,7 +24,7 @@ function test_func() $CLICKHOUSE_CLIENT --query "INSERT INTO log SELECT number, number, number FROM numbers(1000000)" --max_memory_usage $MAX_MEM > "${CLICKHOUSE_TMP}"/insert_result 2>&1 RES=$? 
- grep -o -F 'Memory limit' "${CLICKHOUSE_TMP}"/insert_result || cat "${CLICKHOUSE_TMP}"/insert_result + grep -o -F 'emory limit' "${CLICKHOUSE_TMP}"/insert_result || cat "${CLICKHOUSE_TMP}"/insert_result $CLICKHOUSE_CLIENT --query "SELECT count(), sum(x + y + z) FROM log" > "${CLICKHOUSE_TMP}"/select_result 2>&1; @@ -36,9 +36,9 @@ function test_func() $CLICKHOUSE_CLIENT --query "DROP TABLE log"; } -test_func TinyLog | grep -v -P '^(Memory limit|0\t0|[1-9]000000\t)' -test_func StripeLog | grep -v -P '^(Memory limit|0\t0|[1-9]000000\t)' -test_func Log | grep -v -P '^(Memory limit|0\t0|[1-9]000000\t)' +test_func TinyLog | grep -v -P '^(emory limit|0\t0|[1-9]000000\t)' +test_func StripeLog | grep -v -P '^(emory limit|0\t0|[1-9]000000\t)' +test_func Log | grep -v -P '^(emory limit|0\t0|[1-9]000000\t)' rm "${CLICKHOUSE_TMP}/insert_result" rm "${CLICKHOUSE_TMP}/select_result" diff --git a/tests/queries/0_stateless/01514_distributed_cancel_query_on_error.sh b/tests/queries/0_stateless/01514_distributed_cancel_query_on_error.sh index edf3683ccba..245aa3ceb99 100755 --- a/tests/queries/0_stateless/01514_distributed_cancel_query_on_error.sh +++ b/tests/queries/0_stateless/01514_distributed_cancel_query_on_error.sh @@ -19,6 +19,6 @@ opts=( ) ${CLICKHOUSE_CLIENT} "${opts[@]}" -q "SELECT groupArray(repeat('a', if(_shard_num == 2, 100000, 1))), number%100000 k from remote('127.{2,3}', system.numbers) GROUP BY k LIMIT 10e6" |& { # the query should fail earlier on 127.3 and 127.2 should not even go to the memory limit exceeded error. - grep -F -q 'DB::Exception: Received from 127.3:9000. DB::Exception: Memory limit (for query) exceeded:' + grep -F -q "DB::Exception: Received from 127.3:${CLICKHOUSE_PORT_TCP}. DB::Exception: Query memory limit exceeded:" # while if this will not correctly then it will got the exception from the 127.2:9000 and fail } diff --git a/tests/queries/0_stateless/01710_projection_pk_trivial_count.reference b/tests/queries/0_stateless/01710_projection_pk_trivial_count.reference index 43316772467..546c26a232b 100644 --- a/tests/queries/0_stateless/01710_projection_pk_trivial_count.reference +++ b/tests/queries/0_stateless/01710_projection_pk_trivial_count.reference @@ -1,3 +1,3 @@ ReadFromMergeTree (default.x) - ReadFromPreparedSource (Optimized trivial count) + ReadFromPreparedSource (_minmax_count_projection) 5 diff --git a/tests/queries/0_stateless/02117_show_create_table_system.reference b/tests/queries/0_stateless/02117_show_create_table_system.reference index b260e2dce6c..2ea62444cff 100644 --- a/tests/queries/0_stateless/02117_show_create_table_system.reference +++ b/tests/queries/0_stateless/02117_show_create_table_system.reference @@ -342,7 +342,8 @@ CREATE TABLE system.merge_tree_settings `max` Nullable(String), `readonly` UInt8, `type` String, - `is_obsolete` UInt8 + `is_obsolete` UInt8, + `tier` Enum8('Production' = 0, 'Obsolete' = 4, 'Experimental' = 8, 'Beta' = 12) ) ENGINE = SystemMergeTreeSettings COMMENT 'Contains a list of all MergeTree engine specific settings, their current and default values along with descriptions. You may change any of them in SETTINGS section in CREATE query.' 
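The `tier` column added to the settings tables in the hunks above and below can be filtered like any other column; a minimal usage sketch (not part of this changeset, enum values taken from the Enum8 definition shown above):

-- Sketch: list non-production MergeTree settings together with their tier.
SELECT name, value, tier
FROM system.merge_tree_settings
WHERE tier != 'Production'
ORDER BY tier, name;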
@@ -932,7 +933,8 @@ CREATE TABLE system.replicated_merge_tree_settings `max` Nullable(String), `readonly` UInt8, `type` String, - `is_obsolete` UInt8 + `is_obsolete` UInt8, + `tier` Enum8('Production' = 0, 'Obsolete' = 4, 'Experimental' = 8, 'Beta' = 12) ) ENGINE = SystemReplicatedMergeTreeSettings COMMENT 'Contains a list of all ReplicatedMergeTree engine specific settings, their current and default values along with descriptions. You may change any of them in SETTINGS section in CREATE query. ' @@ -1009,7 +1011,8 @@ CREATE TABLE system.settings `type` String, `default` String, `alias_for` String, - `is_obsolete` UInt8 + `is_obsolete` UInt8, + `tier` Enum8('Production' = 0, 'Obsolete' = 4, 'Experimental' = 8, 'Beta' = 12) ) ENGINE = SystemSettings COMMENT 'Contains a list of all user-level settings (which can be modified in a scope of query or session), their current and default values along with descriptions.' diff --git a/tests/queries/0_stateless/02240_get_type_serialization_streams.reference b/tests/queries/0_stateless/02240_get_type_serialization_streams.reference index 15e9bf87562..eb16198e877 100644 --- a/tests/queries/0_stateless/02240_get_type_serialization_streams.reference +++ b/tests/queries/0_stateless/02240_get_type_serialization_streams.reference @@ -1,7 +1,7 @@ ['{ArraySizes}','{ArrayElements, Regular}'] ['{ArraySizes}','{ArrayElements, TupleElement(keys), Regular}','{ArrayElements, TupleElement(values), Regular}'] ['{TupleElement(1), Regular}','{TupleElement(2), Regular}','{TupleElement(3), Regular}'] -['{DictionaryKeys, Regular}','{DictionaryIndexes}'] +['{DictionaryKeys}','{DictionaryIndexes}'] ['{NullMap}','{NullableElements, Regular}'] ['{ArraySizes}','{ArrayElements, Regular}'] ['{ArraySizes}','{ArrayElements, TupleElement(keys), Regular}','{ArrayElements, TupleElement(values), Regular}'] diff --git a/tests/queries/0_stateless/03096_text_log_format_string_args_not_empty.sql b/tests/queries/0_stateless/03096_text_log_format_string_args_not_empty.sql index a08f35cfc1d..a4eef59f442 100644 --- a/tests/queries/0_stateless/03096_text_log_format_string_args_not_empty.sql +++ b/tests/queries/0_stateless/03096_text_log_format_string_args_not_empty.sql @@ -7,7 +7,7 @@ select conut(); -- { serverError UNKNOWN_FUNCTION } system flush logs; SET max_rows_to_read = 0; -- system.text_log can be really big -select count() > 0 from system.text_log where message_format_string = 'Peak memory usage{}: {}.' and value1 is not null and value2 like '% MiB'; +select count() > 0 from system.text_log where message_format_string = '{}{} memory usage: {}.' 
and not empty(value1) and value3 like '% MiB'; select count() > 0 from system.text_log where level = 'Error' and message_format_string = 'Unknown {}{} identifier {} in scope {}{}' and value1 = 'expression' and value3 = '`count`' and value4 = 'SELECT count'; diff --git a/tests/queries/0_stateless/03203_system_query_metric_log.reference b/tests/queries/0_stateless/03203_system_query_metric_log.reference index d761659fce2..940b0c4e178 100644 --- a/tests/queries/0_stateless/03203_system_query_metric_log.reference +++ b/tests/queries/0_stateless/03203_system_query_metric_log.reference @@ -1,12 +1,30 @@ -number_of_metrics_1000_ok timestamp_diff_in_metrics_1000_ok -initial_data_1000_ok -data_1000_ok -number_of_metrics_1234_ok timestamp_diff_in_metrics_1234_ok -initial_data_1234_ok -data_1234_ok -number_of_metrics_123_ok timestamp_diff_in_metrics_123_ok -initial_data_123_ok -data_123_ok +--Interval 1000: check that amount of events is correct +1 +--Interval 1000: check that the delta/diff between the events is correct +1 +--Interval 1000: check that the Query, SelectQuery and InitialQuery values are correct for the first event +1 +--Interval 1000: check that the SleepFunctionCalls, SleepFunctionMilliseconds and ProfileEvent_SleepFunctionElapsedMicroseconds are correct +1 +--Interval 400: check that amount of events is correct +1 +--Interval 400: check that the delta/diff between the events is correct +1 +--Interval 400: check that the Query, SelectQuery and InitialQuery values are correct for the first event +1 +--Interval 400: check that the SleepFunctionCalls, SleepFunctionMilliseconds and ProfileEvent_SleepFunctionElapsedMicroseconds are correct +1 +--Interval 123: check that amount of events is correct +1 +--Interval 123: check that the delta/diff between the events is correct +1 +--Interval 123: check that the Query, SelectQuery and InitialQuery values are correct for the first event +1 +--Interval 123: check that the SleepFunctionCalls, SleepFunctionMilliseconds and ProfileEvent_SleepFunctionElapsedMicroseconds are correct +1 +--Check that a query_metric_log_interval=0 disables the collection 0 +-Check that a query which execution time is less than query_metric_log_interval is never collected 0 +--Check that there is a final event when queries finish 3 diff --git a/tests/queries/0_stateless/03203_system_query_metric_log.sh b/tests/queries/0_stateless/03203_system_query_metric_log.sh index 1c189c6ce41..bf94be79d7c 100755 --- a/tests/queries/0_stateless/03203_system_query_metric_log.sh +++ b/tests/queries/0_stateless/03203_system_query_metric_log.sh @@ -7,7 +7,7 @@ CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) readonly query_prefix=$CLICKHOUSE_DATABASE $CLICKHOUSE_CLIENT --query-id="${query_prefix}_1000" -q "SELECT sleep(2.5) FORMAT Null" & -$CLICKHOUSE_CLIENT --query-id="${query_prefix}_1234" -q "SELECT sleep(2.5) SETTINGS query_metric_log_interval=1234 FORMAT Null" & +$CLICKHOUSE_CLIENT --query-id="${query_prefix}_400" -q "SELECT sleep(2.5) SETTINGS query_metric_log_interval=400 FORMAT Null" & $CLICKHOUSE_CLIENT --query-id="${query_prefix}_123" -q "SELECT sleep(2.5) SETTINGS query_metric_log_interval=123 FORMAT Null" & $CLICKHOUSE_CLIENT --query-id="${query_prefix}_0" -q "SELECT sleep(2.5) SETTINGS query_metric_log_interval=0 FORMAT Null" & $CLICKHOUSE_CLIENT --query-id="${query_prefix}_fast" -q "SELECT sleep(0.1) FORMAT Null" & @@ -20,32 +20,42 @@ function check_log() { interval=$1 + # Check that the amount of events collected is correct, leaving a 20% of margin. 
+ $CLICKHOUSE_CLIENT -m -q """ + SELECT '--Interval $interval: check that amount of events is correct'; + SELECT + count() BETWEEN ((ceil(2500 / $interval) - 1) * 0.8) AND ((ceil(2500 / $interval) + 1) * 1.2) + FROM system.query_metric_log + WHERE event_date >= yesterday() AND query_id = '${query_prefix}_${interval}' + """ + # We calculate the diff of each row with its previous row to check whether the intervals at # which data is collected is right. The first row is always skipped because the diff with the # preceding one (itself) is 0. The last row is also skipped, because it doesn't contain a full # interval. $CLICKHOUSE_CLIENT --max_threads=1 -m -q """ - WITH diff AS ( - SELECT - row_number() OVER () AS row, - count() OVER () as total_rows, - event_time_microseconds, - first_value(event_time_microseconds) OVER (ORDER BY event_time_microseconds ROWS BETWEEN 1 PRECEDING AND 0 FOLLOWING) as prev, - dateDiff('ms', prev, event_time_microseconds) AS diff - FROM system.query_metric_log - WHERE event_date >= yesterday() AND query_id = '${query_prefix}_${interval}' - ORDER BY event_time_microseconds - OFFSET 1 - ) - SELECT if(count() BETWEEN ((ceil(2500 / $interval) - 2) * 0.8) AND ((ceil(2500 / $interval) - 2) * 1.2), 'number_of_metrics_${interval}_ok', 'number_of_metrics_${interval}_error'), - if(avg(diff) BETWEEN $interval * 0.8 AND $interval * 1.2, 'timestamp_diff_in_metrics_${interval}_ok', 'timestamp_diff_in_metrics_${interval}_error') - FROM diff WHERE row < total_rows + SELECT '--Interval $interval: check that the delta/diff between the events is correct'; + WITH diff AS ( + SELECT + row_number() OVER () AS row, + count() OVER () as total_rows, + event_time_microseconds, + first_value(event_time_microseconds) OVER (ORDER BY event_time_microseconds ROWS BETWEEN 1 PRECEDING AND 0 FOLLOWING) as prev, + dateDiff('ms', prev, event_time_microseconds) AS diff + FROM system.query_metric_log + WHERE event_date >= yesterday() AND query_id = '${query_prefix}_${interval}' + ORDER BY event_time_microseconds + OFFSET 1 + ) + SELECT avg(diff) BETWEEN $interval * 0.8 AND $interval * 1.2 + FROM diff WHERE row < total_rows """ # Check that the first event contains information from the beginning of the query. # Notice the rest of the events won't contain these because the diff will be 0. $CLICKHOUSE_CLIENT -m -q """ - SELECT if(ProfileEvent_Query = 1 AND ProfileEvent_SelectQuery = 1 AND ProfileEvent_InitialQuery = 1, 'initial_data_${interval}_ok', 'initial_data_${interval}_error') + SELECT '--Interval $interval: check that the Query, SelectQuery and InitialQuery values are correct for the first event'; + SELECT ProfileEvent_Query = 1 AND ProfileEvent_SelectQuery = 1 AND ProfileEvent_InitialQuery = 1 FROM system.query_metric_log WHERE event_date >= yesterday() AND query_id = '${query_prefix}_${interval}' ORDER BY event_time_microseconds @@ -55,27 +65,36 @@ function check_log() # Also check that it contains some data that we know it's going to be there. # Notice the Sleep events can be in any of the rows, not only in the first one. 
$CLICKHOUSE_CLIENT -m -q """ - SELECT if(sum(ProfileEvent_SleepFunctionCalls) = 1 AND - sum(ProfileEvent_SleepFunctionMicroseconds) = 2500000 AND - sum(ProfileEvent_SleepFunctionElapsedMicroseconds) = 2500000 AND - sum(ProfileEvent_Query) = 1 AND - sum(ProfileEvent_SelectQuery) = 1 AND - sum(ProfileEvent_InitialQuery) = 1, - 'data_${interval}_ok', 'data_${interval}_error') + SELECT '--Interval $interval: check that the SleepFunctionCalls, SleepFunctionMilliseconds and ProfileEvent_SleepFunctionElapsedMicroseconds are correct'; + SELECT sum(ProfileEvent_SleepFunctionCalls) = 1 AND + sum(ProfileEvent_SleepFunctionMicroseconds) = 2500000 AND + sum(ProfileEvent_SleepFunctionElapsedMicroseconds) = 2500000 AND + sum(ProfileEvent_Query) = 1 AND + sum(ProfileEvent_SelectQuery) = 1 AND + sum(ProfileEvent_InitialQuery) = 1 FROM system.query_metric_log WHERE event_date >= yesterday() AND query_id = '${query_prefix}_${interval}' """ } check_log 1000 -check_log 1234 +check_log 400 check_log 123 # query_metric_log_interval=0 disables the collection altogether -$CLICKHOUSE_CLIENT -m -q """SELECT count() FROM system.query_metric_log WHERE event_date >= yesterday() AND query_id = '${query_prefix}_0'""" +$CLICKHOUSE_CLIENT -m -q """ + SELECT '--Check that a query_metric_log_interval=0 disables the collection'; + SELECT count() FROM system.query_metric_log WHERE event_date >= yesterday() AND query_id = '${query_prefix}_0' +""" # a quick query that takes less than query_metric_log_interval is never collected -$CLICKHOUSE_CLIENT -m -q """SELECT count() FROM system.query_metric_log WHERE event_date >= yesterday() AND query_id = '${query_prefix}_fast'""" +$CLICKHOUSE_CLIENT -m -q """ + SELECT '-Check that a query which execution time is less than query_metric_log_interval is never collected'; + SELECT count() FROM system.query_metric_log WHERE event_date >= yesterday() AND query_id = '${query_prefix}_fast' +""" # a query that takes more than query_metric_log_interval is collected including the final row -$CLICKHOUSE_CLIENT -m -q """SELECT count() FROM system.query_metric_log WHERE event_date >= yesterday() AND query_id = '${query_prefix}_1000'""" +$CLICKHOUSE_CLIENT -m -q """ + SELECT '--Check that there is a final event when queries finish'; + SELECT count() FROM system.query_metric_log WHERE event_date >= yesterday() AND query_id = '${query_prefix}_1000' +""" diff --git a/tests/queries/0_stateless/03208_datetime_cast_losing_precision.reference b/tests/queries/0_stateless/03208_datetime_cast_losing_precision.reference new file mode 100644 index 00000000000..664ea35f7f6 --- /dev/null +++ b/tests/queries/0_stateless/03208_datetime_cast_losing_precision.reference @@ -0,0 +1,10 @@ +0 +0 +0 +0 +\N +0 +1 +0 +0 +0 diff --git a/tests/queries/0_stateless/03208_datetime_cast_losing_precision.sql b/tests/queries/0_stateless/03208_datetime_cast_losing_precision.sql new file mode 100644 index 00000000000..2e2c7009c2e --- /dev/null +++ b/tests/queries/0_stateless/03208_datetime_cast_losing_precision.sql @@ -0,0 +1,43 @@ +WITH toDateTime('2024-10-16 18:00:30') as t +SELECT toDateTime64(t, 3) + interval 100 milliseconds IN (SELECT t) settings transform_null_in=0; + +WITH toDateTime('2024-10-16 18:00:30') as t +SELECT toDateTime64(t, 3) + interval 100 milliseconds IN (SELECT t) settings transform_null_in=1; + +WITH toDateTime('1970-01-01 00:00:01') as t +SELECT toDateTime64(t, 3) + interval 100 milliseconds IN (now(), Null) settings transform_null_in=1; + +WITH toDateTime('1970-01-01 00:00:01') as t +SELECT toDateTime64(t, 3) + 
interval 100 milliseconds IN (now(), Null) settings transform_null_in=0; + +WITH toDateTime('1970-01-01 00:00:01') as t, + arrayJoin([Null, toDateTime64(t, 3) + interval 100 milliseconds]) as x +SELECT x IN (now(), Null) settings transform_null_in=0; + +WITH toDateTime('1970-01-01 00:00:01') as t, + arrayJoin([Null, toDateTime64(t, 3) + interval 100 milliseconds]) as x +SELECT x IN (now(), Null) settings transform_null_in=1; + +WITH toDateTime('2024-10-16 18:00:30') as t +SELECT ( + SELECT + toDateTime64(t, 3) + interval 100 milliseconds, + toDateTime64(t, 3) + interval 101 milliseconds +) +IN ( + SELECT + t, + t +) SETTINGS transform_null_in=0; + +WITH toDateTime('2024-10-16 18:00:30') as t +SELECT ( + SELECT + toDateTime64(t, 3) + interval 100 milliseconds, + toDateTime64(t, 3) + interval 101 milliseconds +) +IN ( + SELECT + t, + t +) SETTINGS transform_null_in=1; diff --git a/tests/queries/0_stateless/03252_optimize_functions_to_subcolumns_map.reference b/tests/queries/0_stateless/03252_optimize_functions_to_subcolumns_map.reference deleted file mode 100644 index 3bc835eaeac..00000000000 --- a/tests/queries/0_stateless/03252_optimize_functions_to_subcolumns_map.reference +++ /dev/null @@ -1 +0,0 @@ -['foo'] ['bar'] diff --git a/tests/queries/0_stateless/03252_optimize_functions_to_subcolumns_map.sql b/tests/queries/0_stateless/03252_optimize_functions_to_subcolumns_map.sql deleted file mode 100644 index e0cc932783d..00000000000 --- a/tests/queries/0_stateless/03252_optimize_functions_to_subcolumns_map.sql +++ /dev/null @@ -1,9 +0,0 @@ -drop table if exists x; -create table x -( - kv Map(LowCardinality(String), LowCardinality(String)), - k Array(LowCardinality(String)) alias mapKeys(kv), - v Array(LowCardinality(String)) alias mapValues(kv) -) engine=Memory(); -insert into x values (map('foo', 'bar')); -select k, v from x settings optimize_functions_to_subcolumns=1; diff --git a/tests/queries/0_stateless/03254_parquet_bool_native_reader.reference b/tests/queries/0_stateless/03254_parquet_bool_native_reader.reference new file mode 100644 index 00000000000..0c7e55ad234 --- /dev/null +++ b/tests/queries/0_stateless/03254_parquet_bool_native_reader.reference @@ -0,0 +1,20 @@ +0 false +1 \N +2 false +3 \N +4 false +5 \N +6 false +7 \N +8 true +9 \N +0 false +1 \N +2 false +3 \N +4 false +5 \N +6 false +7 \N +8 true +9 \N diff --git a/tests/queries/0_stateless/03254_parquet_bool_native_reader.sh b/tests/queries/0_stateless/03254_parquet_bool_native_reader.sh new file mode 100755 index 00000000000..c28523b3c54 --- /dev/null +++ b/tests/queries/0_stateless/03254_parquet_bool_native_reader.sh @@ -0,0 +1,21 @@ +#!/usr/bin/env bash +# Tags: no-ubsan, no-fasttest + +CUR_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +# shellcheck source=../shell_config.sh +. 
"$CUR_DIR"/../shell_config.sh + +USER_FILES_PATH=$($CLICKHOUSE_CLIENT_BINARY --query "select _path,_file from file('nonexist.txt', 'CSV', 'val1 char')" 2>&1 | grep Exception | awk '{gsub("/nonexist.txt","",$9); print $9}') + +WORKING_DIR="${USER_FILES_PATH}/${CLICKHOUSE_TEST_UNIQUE_NAME}" + +mkdir -p "${WORKING_DIR}" + +DATA_FILE="${CUR_DIR}/data_parquet/nullbool.parquet" + +DATA_FILE_USER_PATH="${WORKING_DIR}/nullbool.parquet" + +cp ${DATA_FILE} ${DATA_FILE_USER_PATH} + +${CLICKHOUSE_CLIENT} --query="select id, bool from file('${DATA_FILE_USER_PATH}', Parquet) order by id SETTINGS input_format_parquet_use_native_reader=false;" +${CLICKHOUSE_CLIENT} --query="select id, bool from file('${DATA_FILE_USER_PATH}', Parquet) order by id SETTINGS input_format_parquet_use_native_reader=true;" diff --git a/tests/queries/0_stateless/03257_async_insert_native_empty_block.reference b/tests/queries/0_stateless/03257_async_insert_native_empty_block.reference new file mode 100644 index 00000000000..6df2a541bff --- /dev/null +++ b/tests/queries/0_stateless/03257_async_insert_native_empty_block.reference @@ -0,0 +1,9 @@ +1 name1 +2 name2 +3 +4 +5 +Ok Preprocessed 2 +Ok Preprocessed 3 +Ok Preprocessed 0 +Ok Preprocessed 0 diff --git a/tests/queries/0_stateless/03257_async_insert_native_empty_block.sh b/tests/queries/0_stateless/03257_async_insert_native_empty_block.sh new file mode 100755 index 00000000000..43a5472914d --- /dev/null +++ b/tests/queries/0_stateless/03257_async_insert_native_empty_block.sh @@ -0,0 +1,27 @@ +#!/usr/bin/env bash + +CUR_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +# shellcheck source=../shell_config.sh +. "$CUR_DIR"/../shell_config.sh + +$CLICKHOUSE_CLIENT --query " + DROP TABLE IF EXISTS json_square_brackets; + CREATE TABLE json_square_brackets (id UInt32, name String) ENGINE = MergeTree ORDER BY tuple() +" + +MY_CLICKHOUSE_CLIENT="$CLICKHOUSE_CLIENT --async_insert 1 --wait_for_async_insert 1" + +echo '[{"id": 1, "name": "name1"}, {"id": 2, "name": "name2"}]' | $MY_CLICKHOUSE_CLIENT -q "INSERT INTO json_square_brackets FORMAT JSONEachRow" + +echo '[{"id": 3}, {"id": 4}, {"id": 5}]' | $MY_CLICKHOUSE_CLIENT -q "INSERT INTO json_square_brackets FORMAT JSONEachRow" + +echo '[]' | $MY_CLICKHOUSE_CLIENT -q "INSERT INTO json_square_brackets FORMAT JSONEachRow" + +echo '' | $MY_CLICKHOUSE_CLIENT -q "INSERT INTO json_square_brackets FORMAT JSONEachRow" + +$CLICKHOUSE_CLIENT --query " + SYSTEM FLUSH LOGS; + SELECT * FROM json_square_brackets ORDER BY id; + SELECT status, data_kind, rows FROM system.asynchronous_insert_log WHERE database = currentDatabase() AND table = 'json_square_brackets' ORDER BY event_time_microseconds; + DROP TABLE json_square_brackets; +" diff --git a/tests/queries/0_stateless/03257_json_escape_file_names.reference b/tests/queries/0_stateless/03257_json_escape_file_names.reference new file mode 100644 index 00000000000..f44e7d62cc1 --- /dev/null +++ b/tests/queries/0_stateless/03257_json_escape_file_names.reference @@ -0,0 +1,3 @@ +{"a-b-c":"43","a-b\\/c-d\\/e":"44","a\\/b\\/c":"42"} +42 43 44 +42 43 44 diff --git a/tests/queries/0_stateless/03257_json_escape_file_names.sql b/tests/queries/0_stateless/03257_json_escape_file_names.sql new file mode 100644 index 00000000000..9cc150170fd --- /dev/null +++ b/tests/queries/0_stateless/03257_json_escape_file_names.sql @@ -0,0 +1,10 @@ +set allow_experimental_json_type = 1; +drop table if exists test; +create table test (json JSON) engine=MergeTree order by tuple() settings min_rows_for_wide_part=0, 
min_bytes_for_wide_part=0; +insert into test format JSONAsObject {"a/b/c" : 42, "a-b-c" : 43, "a-b/c-d/e" : 44}; + +select * from test; +select json.`a/b/c`, json.`a-b-c`, json.`a-b/c-d/e` from test; +select json.`a/b/c`.:Int64, json.`a-b-c`.:Int64, json.`a-b/c-d/e`.:Int64 from test; +drop table test; + diff --git a/tests/queries/0_stateless/03257_scalar_in_format_table_expression.reference b/tests/queries/0_stateless/03257_scalar_in_format_table_expression.reference new file mode 100644 index 00000000000..5d60960bee9 --- /dev/null +++ b/tests/queries/0_stateless/03257_scalar_in_format_table_expression.reference @@ -0,0 +1,5 @@ +Hello 111 +World 123 +Hello 111 +World 123 +6 6 diff --git a/tests/queries/0_stateless/03257_scalar_in_format_table_expression.sql b/tests/queries/0_stateless/03257_scalar_in_format_table_expression.sql new file mode 100644 index 00000000000..ec89c9874e9 --- /dev/null +++ b/tests/queries/0_stateless/03257_scalar_in_format_table_expression.sql @@ -0,0 +1,84 @@ +SELECT * FROM format( + JSONEachRow, +$$ +{"a": "Hello", "b": 111} +{"a": "World", "b": 123} +$$ + ); + +-- Should be equivalent to the previous one +SELECT * FROM format( + JSONEachRow, + ( + SELECT $$ +{"a": "Hello", "b": 111} +{"a": "World", "b": 123} +$$ + ) + ); + +-- The scalar subquery is incorrect so it should throw the proper error +SELECT * FROM format( + JSONEachRow, + ( + SELECT $$ +{"a": "Hello", "b": 111} +{"a": "World", "b": 123} +$$ + WHERE column_does_not_exists = 4 + ) + ); -- { serverError UNKNOWN_IDENTIFIER } + +-- https://github.com/ClickHouse/ClickHouse/issues/70177 + +-- Resolution of the scalar subquery should work ok (already did, adding a test just for safety) +-- Disabled for the old analyzer since it incorrectly passes 's' to format, instead of resolving s and passing that +WITH (SELECT sum(number)::String as s FROM numbers(4)) as s +SELECT *, s +FROM format(TSVRaw, s) +SETTINGS enable_analyzer=1; + +SELECT count() +FROM format(TSVRaw, ( + SELECT where_qualified__fuzz_19 + FROM numbers(10000) +)); -- { serverError UNKNOWN_IDENTIFIER } + +SELECT count() +FROM format(TSVRaw, ( + SELECT where_qualified__fuzz_19 + FROM numbers(10000) + UNION ALL + SELECT where_qualified__fuzz_35 + FROM numbers(10000) +)); -- { serverError UNKNOWN_IDENTIFIER } + +WITH ( + SELECT where_qualified__fuzz_19 + FROM numbers(10000) +) as s SELECT count() +FROM format(TSVRaw, s); -- { serverError UNKNOWN_IDENTIFIER } + +-- https://github.com/ClickHouse/ClickHouse/issues/70675 +SELECT count() +FROM format(TSVRaw, ( + SELECT CAST(arrayStringConcat(groupArray(format(TSVRaw, ( + SELECT CAST(arrayStringConcat(1 GLOBAL IN ( + SELECT 1 + WHERE 1 GLOBAL IN ( + SELECT toUInt128(1) + GROUP BY + GROUPING SETS ((1)) + WITH ROLLUP + ) + GROUP BY 1 + WITH CUBE + ), groupArray('some long string')), 'LowCardinality(String)') + FROM numbers(10000) + )), toLowCardinality('some long string')) RESPECT NULLS, '\n'), 'LowCardinality(String)') + FROM numbers(10000) +)) +FORMAT TSVRaw; -- { serverError UNKNOWN_IDENTIFIER, ILLEGAL_TYPE_OF_ARGUMENT } + +-- Same but for table function numbers +SELECT 1 FROM numbers((SELECT DEFAULT)); -- { serverError UNKNOWN_IDENTIFIER } diff --git a/tests/queries/0_stateless/03257_setting_tiers.reference b/tests/queries/0_stateless/03257_setting_tiers.reference new file mode 100644 index 00000000000..d3d171221e8 --- /dev/null +++ b/tests/queries/0_stateless/03257_setting_tiers.reference @@ -0,0 +1,10 @@ +1 +1 +1 +1 +1 +1 +1 +1 +1 +1 diff --git a/tests/queries/0_stateless/03257_setting_tiers.sql 
b/tests/queries/0_stateless/03257_setting_tiers.sql new file mode 100644 index 00000000000..c7ffe87a80b --- /dev/null +++ b/tests/queries/0_stateless/03257_setting_tiers.sql @@ -0,0 +1,11 @@ +SELECT count() > 0 FROM system.settings WHERE tier = 'Production'; +SELECT count() > 0 FROM system.settings WHERE tier = 'Beta'; +SELECT count() > 0 FROM system.settings WHERE tier = 'Experimental'; +SELECT count() > 0 FROM system.settings WHERE tier = 'Obsolete'; +SELECT count() == countIf(tier IN ['Production', 'Beta', 'Experimental', 'Obsolete']) FROM system.settings; + +SELECT count() > 0 FROM system.merge_tree_settings WHERE tier = 'Production'; +SELECT count() > 0 FROM system.merge_tree_settings WHERE tier = 'Beta'; +SELECT count() > 0 FROM system.merge_tree_settings WHERE tier = 'Experimental'; +SELECT count() > 0 FROM system.merge_tree_settings WHERE tier = 'Obsolete'; +SELECT count() == countIf(tier IN ['Production', 'Beta', 'Experimental', 'Obsolete']) FROM system.merge_tree_settings; diff --git a/tests/queries/0_stateless/03258_dynamic_in_functions_weak_ptr_exception.reference b/tests/queries/0_stateless/03258_dynamic_in_functions_weak_ptr_exception.reference new file mode 100644 index 00000000000..e69de29bb2d diff --git a/tests/queries/0_stateless/03258_dynamic_in_functions_weak_ptr_exception.sql b/tests/queries/0_stateless/03258_dynamic_in_functions_weak_ptr_exception.sql new file mode 100644 index 00000000000..f825353c135 --- /dev/null +++ b/tests/queries/0_stateless/03258_dynamic_in_functions_weak_ptr_exception.sql @@ -0,0 +1,6 @@ +SET allow_experimental_dynamic_type = 1; +DROP TABLE IF EXISTS t0; +CREATE TABLE t0 (c0 Tuple(c1 Int,c2 Dynamic)) ENGINE = Memory(); +SELECT 1 FROM t0 tx JOIN t0 ty ON tx.c0 = ty.c0; +DROP TABLE t0; + diff --git a/tests/queries/0_stateless/03259_native_http_async_insert_settings.reference b/tests/queries/0_stateless/03259_native_http_async_insert_settings.reference new file mode 100644 index 00000000000..573541ac970 --- /dev/null +++ b/tests/queries/0_stateless/03259_native_http_async_insert_settings.reference @@ -0,0 +1 @@ +0 diff --git a/tests/queries/0_stateless/03259_native_http_async_insert_settings.sh b/tests/queries/0_stateless/03259_native_http_async_insert_settings.sh new file mode 100755 index 00000000000..c0934b06cc7 --- /dev/null +++ b/tests/queries/0_stateless/03259_native_http_async_insert_settings.sh @@ -0,0 +1,17 @@ +#!/usr/bin/env bash + +CUR_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +# shellcheck source=../shell_config.sh +. 
"$CUR_DIR"/../shell_config.sh + + +$CLICKHOUSE_CLIENT -q "drop table if exists test" +$CLICKHOUSE_CLIENT -q "create table test (x UInt32) engine=Memory"; + +url="${CLICKHOUSE_URL}&async_insert=1&wait_for_async_insert=1" + +$CLICKHOUSE_LOCAL -q "select NULL::Nullable(UInt32) as x format Native" | ${CLICKHOUSE_CURL} -sS "$url&query=INSERT%20INTO%20test%20FORMAT%20Native" --data-binary @- + +$CLICKHOUSE_CLIENT -q "select * from test"; +$CLICKHOUSE_CLIENT -q "drop table test" + diff --git a/tests/queries/0_stateless/03260_dynamic_low_cardinality_dict_bug.reference b/tests/queries/0_stateless/03260_dynamic_low_cardinality_dict_bug.reference new file mode 100644 index 00000000000..8ae0f8e9f14 --- /dev/null +++ b/tests/queries/0_stateless/03260_dynamic_low_cardinality_dict_bug.reference @@ -0,0 +1,20 @@ +12345678 +12345678 +12345678 +12345678 +12345678 +12345678 +12345678 +12345678 +12345678 +12345678 +12345678 +12345678 +12345678 +12345678 +12345678 +12345678 +12345678 +12345678 +12345678 +12345678 diff --git a/tests/queries/0_stateless/03260_dynamic_low_cardinality_dict_bug.sql b/tests/queries/0_stateless/03260_dynamic_low_cardinality_dict_bug.sql new file mode 100644 index 00000000000..c5b981d5965 --- /dev/null +++ b/tests/queries/0_stateless/03260_dynamic_low_cardinality_dict_bug.sql @@ -0,0 +1,12 @@ +set allow_experimental_dynamic_type = 1; +set min_bytes_to_use_direct_io = 0; + +drop table if exists test; +create table test (id UInt64, d Dynamic) engine=MergeTree order by id settings min_rows_for_wide_part=1, min_bytes_for_wide_part=1, index_granularity=1, use_adaptive_write_buffer_for_dynamic_subcolumns=0, max_compress_block_size=8, min_compress_block_size=8, use_compact_variant_discriminators_serialization=0; + +insert into test select number, '12345678'::LowCardinality(String) from numbers(20); + +select d.`LowCardinality(String)` from test settings max_threads=1; + +drop table test; + diff --git a/tests/queries/0_stateless/data_parquet/nullbool.parquet b/tests/queries/0_stateless/data_parquet/nullbool.parquet new file mode 100644 index 00000000000..d9b365bbe75 Binary files /dev/null and b/tests/queries/0_stateless/data_parquet/nullbool.parquet differ diff --git a/utils/check-style/check-doc-aspell b/utils/check-style/check-doc-aspell index b5a3958e6cf..0406b337575 100755 --- a/utils/check-style/check-doc-aspell +++ b/utils/check-style/check-doc-aspell @@ -53,7 +53,7 @@ done if (( STATUS != 0 )); then echo "====== Errors found ======" echo "To exclude some words add them to the dictionary file \"${ASPELL_IGNORE_PATH}/aspell-dict.txt\"" - echo "You can also run ${0} -i to see the errors interactively and fix them or add to the dictionary file" + echo "You can also run '$(realpath --relative-base=${ROOT_PATH} ${0}) -i' to see the errors interactively and fix them or add to the dictionary file" fi exit ${STATUS}